This holds the task lock (and, for ptrace_attach, the tasklist_lock)
over the actual attach event, which closes a race between attacking to a
thread that is either doing a PTRACE_TRACEME or getting de-threaded.
Thanks to Oleg Nesterov for reminding me about this, and Chris Wright
for noticing a lost return value in my first version.
John Heffner [Sat, 6 May 2006 00:41:44 +0000 (17:41 -0700)]
[TCP]: Fix snd_cwnd adjustments in tcp_highspeed.c
Xiaoliang (David) Wei wrote:
> Hi gurus,
>
> I am reading the code of tcp_highspeed.c in the kernel and have a
> question on the hstcp_cong_avoid function, specifically the following
> AI part (line 136~143 in net/ipv4/tcp_highspeed.c ):
>
> /* Do additive increase */
> if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
> tp->snd_cwnd_cnt += ca->ai;
> if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
> tp->snd_cwnd++;
> tp->snd_cwnd_cnt -= tp->snd_cwnd;
> }
> }
>
> In this part, when (tp->snd_cwnd_cnt == tp->snd_cwnd),
> snd_cwnd_cnt will be -1... snd_cwnd_cnt is defined as u16, will this
> small chance of getting -1 becomes a problem?
> Shall we change it by reversing the order of the cwnd++ and cwnd_cnt -=
> cwnd?
Absolutely correct. Thanks.
Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Sat, 6 May 2006 00:19:26 +0000 (17:19 -0700)]
[NETROM/ROSE]: Kill module init version kernel log messages.
There are out of date and don't tell the user anything useful.
The similar messages which IPV4 and the core networking used
to output were killed a long time ago.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 6 May 2006 00:09:13 +0000 (17:09 -0700)]
[DCCP]: Fix sock_orphan dead lock
Calling sock_orphan inside bh_lock_sock in dccp_close can lead to dead
locks. For example, the inet_diag code holds sk_callback_lock without
disabling BH. If an inbound packet arrives during that admittedly tiny
window, it will cause a dead lock on bh_lock_sock. Another possible
path would be through sock_wfree if the network device driver frees the
tx skb in process context with BH enabled.
We can fix this by moving sock_orphan out of bh_lock_sock.
The tricky bit is to work out when we need to destroy the socket
ourselves and when it has already been destroyed by someone else.
By moving sock_orphan before the release_sock we can solve this
problem. This is because as long as we own the socket lock its
state cannot change.
So we simply record the socket state before the release_sock
and then check the state again after we regain the socket lock.
If the socket state has transitioned to DCCP_CLOSED in the time being,
we know that the socket has been destroyed. Otherwise the socket is
still ours to keep.
This problem was discoverd by Ingo Molnar using his lock validator.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[SCTP]: Prevent possible infinite recursion with multiple bundled DATA.
There is a rare situation that causes lksctp to go into infinite recursion
and crash the system. The trigger is a packet that contains at least the
first two DATA fragments of a message bundled together. The recursion is
triggered when the user data buffer is smaller that the full data message.
The problem is that we clone the skb for every fragment in the message.
When reassembling the full message, we try to link skbs from the "first
fragment" clone using the frag_list. However, since the frag_list is shared
between two clones in this rare situation, we end up setting the frag_list
pointer of the second fragment to point to itself. This causes
sctp_skb_pull() to potentially recurse indefinitely.
Proposed solution is to make a copy of the skb when attempting to link
things using frag_list.
Signed-off-by: Vladislav Yasevich <vladsilav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Sat, 6 May 2006 00:02:09 +0000 (17:02 -0700)]
[SCTP]: Allow spillover of receive buffer to avoid deadlock.
This patch fixes a deadlock situation in the receive path by allowing
temporary spillover of the receive buffer.
- If the chunk we receive has a tsn that immediately follows the ctsn,
accept it even if we run out of receive buffer space and renege data with
higher TSNs.
- Once we accept one chunk in a packet, accept all the remaining chunks
even if we run out of receive buffer space.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Mark Butler <butlerm@middle.net> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Pitre [Fri, 5 May 2006 21:32:24 +0000 (22:32 +0100)]
[ARM] 3500/1: fix PXA27x DMA allocation priority
Patch from Nicolas Pitre
Intel PXA27x developers manual section 5.4.1.1 lists a priority
distribution for the DMA channels differently than what the code
currently assumes. This patch fixes that.
Noticed by Simon Vogl <vogl@soft.uni-linz.ac.at>
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
George G. Davis [Fri, 5 May 2006 21:32:23 +0000 (22:32 +0100)]
[ARM] 3499/1: Fix VFP FPSCR corruption for double exception case
Patch from George G. Davis
The ARM VFP FPSCR register is corrupted when a condition flags modifying
VFP instruction is followed by a non-condition flags modifying VFP
instruction and both instructions raise exceptions. The fix is to
read the current FPSCR in between emulation of these two instructions
and use the current FPSCR value when handling the second exception.
Signed-off-by: George G. Davis <gdavis@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Fri, 5 May 2006 16:57:52 +0000 (17:57 +0100)]
[BLOCK] Fix oops on removal of SD/MMC card
The block layer keeps a reference (driverfs_dev) to the struct
device associated with the block device, and uses it internally
for generating uevents in block_uevent.
Block device uevents include umounting the partition, which can
occur after the backing device has been removed.
Unfortunately, this reference is not counted. This means that
if the struct device is removed from the device tree, the block
layers reference will become stale.
Guard against this by holding a reference to the struct device
in add_disk(), and only drop the reference when we're releasing
the gendisk kobject - in other words when we can be sure that no
further uevents will be generated for this block device.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@suse.de>
Linus Torvalds [Thu, 4 May 2006 22:09:52 +0000 (15:09 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[PATCH] powerpc: Use the ibm,pa-features property if available
powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
[PATCH] ppc32 CPM_UART: fixes and improvements
[PATCH] ppc32 CPM_UART: Fixed break send on SCC
[PATCH] powerpc/kprobes: fix singlestep out-of-line
[PATCH] powerpc/pseries: avoid crash in PCI code if mem system not up
Linus Torvalds [Thu, 4 May 2006 21:52:43 +0000 (14:52 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3490/1: i.MX: move uart resources to board files
[ARM] 3488/1: make icedcc_putc do the right thing
[ARM] 3487/1: IXP4xx: Support non-PCI systems
[ARM] 3486/1: Mark memory as clobbered by the ARM _syscallX() macros
Russell King [Thu, 4 May 2006 17:22:51 +0000 (18:22 +0100)]
[MMC] Move set_ios debugging into mmc.c
Rather than having every driver duplicate the set_ios debugging,
provide a single version in mmc.c which can be expanded as we
add additional functionality.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sascha Hauer [Thu, 4 May 2006 13:07:42 +0000 (14:07 +0100)]
[ARM] 3490/1: i.MX: move uart resources to board files
Patch from Sascha Hauer
This patch moves the i.MX uart resources and the gpio pin setup to the
board files. This allows the boards to decide how many internal uarts
are connected to the outside world and whether they use rts/cts or
not.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jens Axboe [Thu, 4 May 2006 07:13:49 +0000 (09:13 +0200)]
[PATCH] compat_sys_vmsplice: one-off in UIO_MAXIOV check
nr_segs may not be > UIO_MAXIOV, however it may be equal to. This makes
the behaviour identical to the real sys_vmsplice(). The other foov
syscalls also agree that this is the way to go.
This patch fixes hello messages sent when a node is a level 1
router. Slightly contrary to the spec (maybe) VMS ignores hello
messages that do not name level2 routers that it also knows about.
So, here we simply name all the routers that the node knows about
rather just other level1 routers. (I hope the patch is clearer than
the description. sorry).
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 4 May 2006 06:31:35 +0000 (23:31 -0700)]
[TCP]: Fix sock_orphan dead lock
Calling sock_orphan inside bh_lock_sock in tcp_close can lead to dead
locks. For example, the inet_diag code holds sk_callback_lock without
disabling BH. If an inbound packet arrives during that admittedly tiny
window, it will cause a dead lock on bh_lock_sock. Another possible
path would be through sock_wfree if the network device driver frees the
tx skb in process context with BH enabled.
We can fix this by moving sock_orphan out of bh_lock_sock.
The tricky bit is to work out when we need to destroy the socket
ourselves and when it has already been destroyed by someone else.
By moving sock_orphan before the release_sock we can solve this
problem. This is because as long as we own the socket lock its
state cannot change.
So we simply record the socket state before the release_sock
and then check the state again after we regain the socket lock.
If the socket state has transitioned to TCP_CLOSE in the time being,
we know that the socket has been destroyed. Otherwise the socket is
still ours to keep.
Note that I've also moved the increment on the orphan count forward.
This may look like a problem as we're increasing it even if the socket
is just about to be destroyed where it'll be decreased again. However,
this simply enlarges a window that already exists. This also changes
the orphan count test by one.
Considering what the orphan count is meant to do this is no big deal.
This problem was discoverd by Ingo Molnar using his lock validator.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Thu, 4 May 2006 06:26:20 +0000 (23:26 -0700)]
[ROSE]: Fix routing table locking in rose_remove_neigh.
The locking rule for rose_remove_neigh() are that the caller needs to
hold rose_neigh_list_lock, so we better don't take it yet again in
rose_neigh_list_lock.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 4 May 2006 06:17:11 +0000 (23:17 -0700)]
[NETFILTER]: H.323 helper: fix use of uninitialized data
When a Choice element contains an unsupported choice no error is returned
and parsing continues normally, but the choice value is not set and
contains data from the last parsed message. This may in turn lead to
parsing of more stale data and following crashes.
Fixes a crash triggered by testcase 0003243 from the PROTOS c07-h2250v4
testsuite following random other testcases:
Jens Axboe [Wed, 3 May 2006 08:58:22 +0000 (10:58 +0200)]
[PATCH] splice: redo page lookup if add_to_page_cache() returns -EEXIST
This can happen quite easily, if several processes are trying to splice
the same file at the same time. It's not a failure, it just means someone
raced with us in allocating this file page. So just dump the allocated
page and relookup the original.
Jens Axboe [Wed, 3 May 2006 08:35:26 +0000 (10:35 +0200)]
[PATCH] splice: LRU fixups
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
seem to be on the LRU in the first place.
Mingming Cao [Thu, 4 May 2006 02:55:12 +0000 (19:55 -0700)]
[PATCH] ext3: multile block allocate little endian fixes
Some places in ext3 multiple block allocation code (in 2.6.17-rc3) don't
handle the little endian well. This was resulting in *wrong* block numbers
being assigned to in-memory block variables and then stored on disk
eventually. The following patch has been verified to fix an ext3
filesystem failure when run ltp test on a 64 bit machine.
Signed-off-by; Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Brent Casavant [Thu, 4 May 2006 02:55:10 +0000 (19:55 -0700)]
[PATCH] Altix: correct ioc4 port order
Currently loading the ioc3 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Brent Casavant <bcasavan@sgi.com> Cc: Pat Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mark gross [Thu, 4 May 2006 02:55:07 +0000 (19:55 -0700)]
[PATCH] EDAC Coexistence with BIOS
Address the issue of EDAC/BIOS coexistence for the e752x chip-sets.
We have found a problem where the BIOS will start the system with the error
registers (dev0:fun1) hidden and assuming it has exclusive access to them.
The edac driver violates this assumption.
The workaround this patch offers is to honor the hidden-ness as an
indication that it is not safe to use those registers.
Signed-off-by: Mark Gross <mark.gross@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Thu, 4 May 2006 02:55:03 +0000 (19:55 -0700)]
[PATCH] uml: change timer initialization
inet_init, which schedules, is called before the UML timer_init, which sets
up the timer. The result is the interval timers being manipulated before
the appropriate signal handlers are established, causing unhandled timers.
This is fixed by making timer_init be called earlier.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Wed, 3 May 2006 14:52:36 +0000 (10:52 -0400)]
[PATCH] selinux: Clear selinux_enabled flag upon runtime disable.
Clear selinux_enabled flag upon runtime disable of SELinux by userspace,
and make sure it is defined even if selinux= boot parameter support is
not enabled in configuration.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Tested-by: Jon Smirl <jonsmirl@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Wed, 3 May 2006 13:04:37 +0000 (23:04 +1000)]
[PATCH] powerpc: Use the ibm,pa-features property if available
Forthcoming IBM machines will have a "ibm,pa-features" property on CPU
nodes, that contains bits indicating which optional architecture
features are implemented by the CPU. This adds code to use the
property, if present, to update our CPU feature bitmaps. Note that
this means we can both set and clear feature bits based on what
the firmware tells us.
This is based on a patch by Will Schmidt <willschm@us.ibm.com>.
Paul Mackerras [Wed, 3 May 2006 13:02:04 +0000 (23:02 +1000)]
powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
We have a case where __get_user and __put_user can validly be used
on kernel addresses in interrupt context - namely, the alignment
exception handler, as our get/put_unaligned just do a single access
and rely on the alignment exception handler to fix things up in the
rare cases where the cpu can't handle it in hardware. Thus we can
get alignment exceptions in the network stack at interrupt level.
The alignment exception handler does a __get_user to read the
instruction and blows up in might_sleep().
Since a __get_user on a kernel address won't actually ever sleep,
this makes the might_sleep conditional on the address being less
than PAGE_OFFSET.
A number of small issues are fixed, and added the header file, missed from the
original series. With this, driver should be pretty stable as tested among
both platform-device-driven and "old way" boards. Also added missing GPL
statement , and updated year field on existing ones to reflect
code update.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
SCC uart sends a break sequence each time it is stopped with the
CPM_CR_STOP_TX command. That means that each time an application closes the
serial device, a break is transmitted. To fix this, graceful tx stop is
issued for SCC.
Signed-off-by: David Jander <david.jander@protonic.nl> Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently single-step inline if the instruction on which a kprobe is
inserted is a trap variant.
- variants (such as tdnei, used by BUG()) typically evaluate a condition
and cause a trap only if the condition is satisfied.
- kprobes uses the unconditional "trap" (0x7fe00008) and single-stepping
again on this instruction, resulting in another trap without
evaluating the condition is obviously incorrect.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
[PATCH] powerpc/pseries: avoid crash in PCI code if mem system not up
The powerpc code is currently performing PCI setup before memory
initialization. PCI setup touches PCI config space registers. If the PCI
card is bad, this will evoke an error, which currrently can't be handled,
as the PCI error recovery code expects kmalloc() to be functional. This
patch will cause the system to punt instead of crashing with
Patrick McHardy [Tue, 2 May 2006 21:23:07 +0000 (23:23 +0200)]
[NETFILTER] SCTP conntrack: fix infinite loop
fix infinite loop in the SCTP-netfilter code: check SCTP chunk size to
guarantee progress of for_each_sctp_chunk(). (all other uses of
for_each_sctp_chunk() are preceded by do_basic_checks(), so this fix
should be complete.)
Based on patch from Ingo Molnar <mingo@elte.hu>
CVE-2006-1527
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] via-rhine: zero pad short packets on Rhine I ethernet cards
Fixes Rhine I cards disclosing fragments of previously transmitted frames
in new transmissions.
Before transmission, any socket buffer (skb) shorter than the ethernet
minimum length of 60 bytes was zero-padded. On Rhine I cards the data can
later be copied into an aligned transmission buffer without copying this
padding. This resulted in the transmission of the frame with the extra
bytes beyond the provided content leaking the previous contents of this
buffer on to the network.
Now zero-padding is repeated in the local aligned buffer if one is used.
Following a suggestion from the via-rhine maintainer, no attempt is made
here to avoid the duplicated effort of padding the skb if it is known that
an aligned buffer will definitely be used. This is to make the change
"obviously correct" and allow it to be applied to a stable kernel if
necessary. There is no change to the flow of control and the changes are
only to the Rhine I code path.
The patch has run on an in-service Rhine-I host without incident. Frames
shorter than 60 bytes are now correctly zero-padded when captured on a
separate host. I see no unusual stats reported by ifconfig, and no unusual
log messages.
Signed-off-by: Craig Brind <craigbrind@gmail.com> Signed-off-by: Roger Luethi <rl@hellgate.ch> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Olaf Hering [Fri, 28 Apr 2006 01:23:49 +0000 (18:23 -0700)]
[PATCH] mv643xx_eth: provide sysfs class device symlink
On Sat, Mar 11, Olaf Hering wrote:
> Why is the /sys/class/net/eth0/device symlink not created for the
> mv643xx_eth driver? Does this work for other platform device drivers?
> Seems to work for the ps2 keyboard at least.
The SET_NETDEV_DEV has to be done before a call to register_netdev. With
the new patch below, the device symlink for the platform device was
created. Unfortunately, after the 4 ls commands, the network connection
died. No idea if the box crashed or if something else broke, lost remote
access.
Provide sysfs 'device' in /class/net/ethN Also, set module owner field,
like pcnet32 driver does.
Signed-off-by: Olaf Hering <olh@suse.de> Acked-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Russell King [Tue, 2 May 2006 19:18:53 +0000 (20:18 +0100)]
[MMC] PXA: reduce the number of lines PXAMCI debug uses
There's no reason for the PXAMCI debug code to print so many lines - it
causes the kernel buffer to overflow when trying to debug this driver.
Remove some debug messages which are duplicated by core code, and
combine other messages, resulting in fewer characters written to the
kernel log.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Tue, 2 May 2006 19:02:39 +0000 (20:02 +0100)]
[MMC] PXA and i.MX: don't avoid sending stop command on error
Always send a stop command at the end of a data transfer. If we avoid
sending the stop command, some cards remain in data transfer mode, and
refuse to accept further read/write commands.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Tue, 2 May 2006 16:24:59 +0000 (17:24 +0100)]
[MMC] extend data timeout for writes
The CSD contains a "read2write factor" which determines the multiplier to
be applied to the read timeout to obtain the write timeout. We were
ignoring this parameter, resulting in the possibility for writes being
timed out too early.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jens Axboe [Tue, 2 May 2006 13:03:27 +0000 (15:03 +0200)]
[PATCH] splice: fix page LRU accounting
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Jens Axboe [Tue, 2 May 2006 10:57:18 +0000 (12:57 +0200)]
[PATCH] vmsplice: fix badly placed end paranthesis
We need to use the minium of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
The latter doesn't make any sense, and could cause us to attempt negative
length transfers...
Linus Torvalds [Tue, 2 May 2006 04:43:05 +0000 (21:43 -0700)]
Merge branch 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Audit Filter Performance
[PATCH] Rework of IPC auditing
[PATCH] More user space subject labels
[PATCH] Reworked patch for labels on user space messages
[PATCH] change lspp ipc auditing
[PATCH] audit inode patch
[PATCH] support for context based audit filtering, part 2
[PATCH] support for context based audit filtering
[PATCH] no need to wank with task_lock() and pinning task down in audit_syscall_exit()
[PATCH] drop task argument of audit_syscall_{entry,exit}
[PATCH] drop gfp_mask in audit_log_exit()
[PATCH] move call of audit_free() into do_exit()
[PATCH] sockaddr patch
[PATCH] deal with deadlocks in audit_free()
struct xt_standard_target
{
struct xt_entry_target target;
int verdict;
};
xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.
The compat_ipt_standard_fn in the kernel adjusts the offsets by
which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.
Thanks to Linus for lots of debugging help and testing.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Tue, 2 May 2006 01:33:40 +0000 (18:33 -0700)]
Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vmsplice: allow user to pass in gift pages
[PATCH] pipe: enable atomic copying of pipe data to/from user space
[PATCH] splice: call handle_ra_miss() on failure to lookup page
[PATCH] Add ->splice_read/splice_write to def_blk_fops
[PATCH] pipe: introduce ->pin() buffer operation
[PATCH] splice: fix bugs in pipe_to_file()
[PATCH] splice: fix bugs with stealing regular pipe pages
Linus Torvalds [Tue, 2 May 2006 01:26:31 +0000 (18:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ipath: tidy up white space in a few files
IB/ipath: fix label name in interrupt handler
IB/ipath: improve sparse annotation
IB/ipath: simplify IB timer usage
IB/ipath: simplify RC send posting
IB/ipath: prevent hardware from being accessed during reset
IB/ipath: fix verbs registration
IB/ipath: change handling of PIO buffers
IB/ipath: iterate over correct number of ports during reset
IB/ipath: set up 32-bit DMA mask if 64-bit setup fails
IB/ipath: fix race with exposing reset file
IB/mthca: Fix offset in query_gid method
Shaohua Li [Mon, 1 May 2006 19:16:19 +0000 (12:16 -0700)]
[PATCH] timer TSC check suspend notifier change
At suspend time, the TSC CPUFREQ_SUSPENDCHANGE notifier change might
wrongly enable interrupt. cpufreq driver suspend/resume is in interrupt
disabled environment.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The PC Speaker driver's ->probe() routine doesn't even get called in the
64-bit kernels. The reason for that is that the arch code apparently has
to explictly add a "pcspkr" platform device in order for the driver core to
call the ->probe() routine. arch/i386/kernel/setup.c unconditionally adds
a "pcspkr" device, but the x86_64 kernel has no code at all related to the
PC Speaker.
The patch below copies the relevant code from i386 to x86_64, which makes
the PC Speaker work for me on x86_64.
Atsushi Nemoto [Mon, 1 May 2006 19:16:17 +0000 (12:16 -0700)]
[PATCH] genrtc: fix read on 64-bit platforms
Fix genrtc's read() routine for 64-bit platforms. Current gen_rtc_read()
stores 64bit integer and returns 8 even if an user tried to read a 32bit
integer.
Atsushi Nemoto [Mon, 1 May 2006 19:16:16 +0000 (12:16 -0700)]
[PATCH] RTC: rtc-dev tweak for 64-bit kernel
Make rtc-dev work well on 64-bit platforms with 32-bit userland. On those
platforms, users might try to read 32-bit integer value. This patch make
rtc-dev's read() work well for both "int" and "long" size. This tweak is came
from genrtc driver.
Heiko Carstens [Mon, 1 May 2006 19:16:14 +0000 (12:16 -0700)]
[PATCH] s390: fix ipd handling
As pointed out by Paulo Marques <pmarques@grupopie.com> MAX_IPD_TIME is by
a factor of ten too small. Since this means that we allow ten times more
IPDs in the intended time frame this could result in a cpu check stop of a
physical cpu.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeremy Kerr [Mon, 1 May 2006 19:16:12 +0000 (12:16 -0700)]
[PATCH] powerpc: Allow devices to register with numa topology
Change of_node_to_nid() to traverse the device tree, looking for a numa id.
Cell uses this to assign ids to SPUs, which are children of the CPU node.
Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(),
which doesn't do the traversal.
Export an attach_sysdev_to_node() function, allowing system devices (eg.
SPUs) to link themselves into the numa topology in sysfs.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Joel H Schopp [Mon, 1 May 2006 19:16:11 +0000 (12:16 -0700)]
[PATCH] spufs: fix for CONFIG_NUMA
Based on an older patch from Mike Kravetz <kravetz@us.ibm.com>
We need to have a mem_map for high addresses in order to make fops->no_page
work on spufs mem and register files. So far, we have used the
memory_present() function during early bootup, but that did not work when
CONFIG_NUMA was enabled.
We now use the __add_pages() function to add the mem_map when loading the
spufs module, which is a lot nicer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pat Gefre [Mon, 1 May 2006 19:16:08 +0000 (12:16 -0700)]
[PATCH] Altix: correct ioc3 port order
Currently loading the ioc3 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Patrick Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] page migration: Fix fallback behavior for dirty pages
Currently we check PageDirty() in order to make the decision to swap out
the page. However, the dirty information may be only be contained in the
ptes pointing to the page. We need to first unmap the ptes before checking
for PageDirty(). If unmap is successful then the page count of the page
will also be decreased so that pageout() works properly.
This is a fix necessary for 2.6.17. Without this fix we may migrate dirty
pages for filesystems without migration functions. Filesystems may keep
pointers to dirty pages. Migration of dirty pages can result in the
filesystem keeping pointers to freed pages.
Unmapping is currently not be separated out from removing all the
references to a page and moving the mapping. Therefore try_to_unmap will
be called again in migrate_page() if the writeout is successful. However,
it wont do anything since the ptes are already removed.
The coming updates to the page migration code will restructure the code
so that this is no longer necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Blaisorblade's uml-makefile-nicer makes a V=0 build say SYMLINK where
what's happening is really a LINK.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC hardened introduces additional symbol refererences (for the canary and
friends), also in modules - add weak export_symbols for them. We already
tested that the weak declaration creates no problem on both GCC's providing
the function definition and on GCC's which don't provide it.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: cleanup unprofile expression and build infrastructure
*) Rather than duplicate in various buggy ways the application of
CFLAGS_NO_HARDENING and UNPROFILE (which apply to the same files),
centralize it in Makefile.rules. UNPROFILE_OBJS mustn't be listed in
USER_OBJS but are compiled as such.
I've also verified that unprofile didn't work in the current form, because we
set _c_flags directly (using CFLAGS and not USER_CFLAGS, which is wrong),
which is normally used by c_flags, but we also override c_flags for all
USER_OBJS, and there we don't call unprofile.
Instead it only worked for unmap.o, the only one which wasn't a USER_OBJ.
We need to set c_flags (which is not a public Kbuild API) to clear a lot of
compilation flags like -nostdinc which Kbuild forces on everything.
*) Rather than $(CFLAGS_$(notdir $@)), which expands to CFLAGS_anObj.s when
building "anObj.s", use $(CFLAGS_$(*F).o) which always accesses
CFLAGS_anObj.o, like done by Kbuild.
*) Make c_flags apply to all targets having the same basename, rather than
listing .s, .i, .lst and .o, with the use (which I tested) of
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: fix compilation and execution with hardened GCC
To make some half-assembly stubs compile, disable various "hardened" GCC
features:
*) we can't make it build PIC code as we need %ebx to do syscalls and GCC
wants it free for PIC
*) we can't leave stack protection as the stub is moved (not relocated!) in
memory so the RIP-relative access to the canary tries reading from an
unmapped address and causes a segfault, since we move the stub of various
megabytes (the exact amount will be decided at runtime) away from the
link-time address.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: use Kbuild tracking for all files and fix compilation output
Move the build of user-offsets to arch/um/sys-$(SUBARCH), where it's located.
So we can also build it via Kbuild with its dependency tracking rather than by
hand. While hacking here, fix also a lot of little cosmetic things.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Mon, 1 May 2006 19:16:00 +0000 (12:16 -0700)]
[PATCH] uml: error handling fixes
Blairsorblade noticed some confusion between our use of a system
call's return value and errno. This patch fixes a number of related
bugs -
using errno instead of a return value
using a return value instead of errno
forgetting to negate a error return to get a positive error code
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Mon, 1 May 2006 19:15:59 +0000 (12:15 -0700)]
[PATCH] uml: update defconfig
Bring defconfig up to date.
Also disable CONFIG_BLK_DEV_UBD_SYNC by default. By performing synchronous
I/O to the host, it slows things down, only protects against host crashes, and
can make a UML appear to hang while it waits for the host's disk.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Mon, 1 May 2006 19:15:58 +0000 (12:15 -0700)]
[PATCH] uml: clean up after MADVISE_REMOVE
The MADVISE_REMOVE-checking code didn't clean up after itself.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A quick hack to allow skas0 mode to run on 2G/2G hosts.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>