Chuck Lever [Thu, 18 Aug 2005 18:24:12 +0000 (11:24 -0700)]
[PATCH] NFS: Introduce the use of inode->i_lock to protect fields in nfsi
Down the road we want to eliminate the use of the global kernel lock entirely
from the NFS client. To do this, we need to protect the fields in the
nfs_inode structure adequately. Start by serializing updates to the
"cache_validity" field.
Note this change addresses an SMP hang found by njw@osdl.org, where processes
deadlock because nfs_end_data_update and nfs_revalidate_mapping update the
"cache_validity" field without proper serialization.
Test plan:
Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on
large SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Lever [Thu, 18 Aug 2005 18:24:11 +0000 (11:24 -0700)]
[PATCH] NFS: use atomic bitops to manipulate flags in nfsi->flags
Introduce atomic bitops to manipulate the bits in the nfs_inode structure's
"flags" field.
Using bitops means we can use a generic wait_on_bit call instead of an ad hoc
locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from
nfs_inode at the same time.
The other new flags field will continue to use bitmask and logic AND and OR.
This permits several flags to be set at the same time efficiently. The
following patch adds a spin lock to protect these flags, and this spin lock
will later cover other fields in the nfs_inode structure, amortizing the cost
of using this type of serialization.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg KH [Thu, 18 Aug 2005 00:33:11 +0000 (17:33 -0700)]
[PATCH] Fix manual binding infinite loop
Fix for manual binding of drivers to devices. Problem is if you pass in
a valid device id, but the driver refuses to bind. Infinite loop as
write() tries to resubmit the data it just sent.
Thanks to Michal Ostrowski <mostrows@watson.ibm.com> for pointing the
problem out.
Brian King [Wed, 17 Aug 2005 21:32:18 +0000 (07:32 +1000)]
[PATCH] ppc64: iommu vmerge fix
This fixes a bug in the PPC64 iommu vmerge code which results in the
potential for iommu_unmap_sg to go off unmapping more than it should.
This was found on a test system which resulted in PCI bus errors due to
PCI memory being unmapped while DMAs were still in progress.
Signed-off-by: Brian King <brking@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Wed, 17 Aug 2005 20:07:28 +0000 (13:07 -0700)]
Revert unnecessary zlib_inflate/inftrees.c fix
It turns out that empty distance code tables are not an error, and that
a compressed block with only literals can validly have an empty table
and should not be flagged as a data error.
Some old versions of gzip had problems with this case, but it does not
affect the zlib code in the kernel.
Analysis and explanations thanks to Sergey Vlasov <vsu@altlinux.ru>
Steven Rostedt [Wed, 17 Aug 2005 18:25:23 +0000 (14:25 -0400)]
[PATCH] nfsd to unlock kernel before exiting
The nfsd holds the big kernel lock upon exit, when it really shouldn't.
Not to mention that this breaks Ingo's RT patch. This is a trivial fix
to release the lock.
Ingo, this patch also works with your kernel, and stops the problem with
nfsd.
Note, there's a "goto out;" where "out:" is right above svc_exit_thread.
The point of the goto also holds the kernel_lock, so I don't see any
problem here in releasing it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bhavesh P. Davda [Wed, 17 Aug 2005 18:26:33 +0000 (12:26 -0600)]
[PATCH] NPTL signal delivery deadlock fix
This bug is quite subtle and only happens in a very interesting
situation where a real-time threaded process is in the middle of a
coredump when someone whacks it with a SIGKILL. However, this deadlock
leaves the system pretty hosed and you have to reboot to recover.
Not good for real-time priority-preemption applications like our
telephony application, with 90+ real-time (SCHED_FIFO and SCHED_RR)
processes, many of them multi-threaded, interacting with each other for
high volume call processing.
Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Herbert Xu [Wed, 17 Aug 2005 19:03:59 +0000 (12:03 -0700)]
[TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented
since it was created before the ICMP event. The first fragment
is then sent out.
At this point the remaining fragment is allocated by tcp_fragment.
However, its size is padded to fit the L1 cache-line size therefore
creating tail-room up to 124 bytes long.
This fragment will also be sitting at sk_send_head.
4) tcp_sendmsg is called again and it stores data in the tail-room of
of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment
since the packet as a whole exceeds the MTU.
At this point we have a packet that has data in the head area being
fed to tso_fragment which bombs out.
My take on this is that we shouldn't ever call tcp_fragment on a TSO
socket for a packet that is yet to be transmitted since this creates
a packet on sk_send_head that cannot be extended.
So here is a patch to change it so that tso_fragment is always used
in this case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Aug 2005 19:03:32 +0000 (12:03 -0700)]
[IPV6]: Fix raw socket hardware checksum failures
When packets hit raw sockets the csum update isn't done yet, do it manually.
Packets can also reach rawv6_rcv on the output path through
ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this
codepath isn't executed.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dimitry Andric [Wed, 17 Aug 2005 12:01:19 +0000 (13:01 +0100)]
[ARM] 2850/1: Remove duplicate UART I/O mapping from s3c2410_iodesc
Patch from Dimitry Andric
This patch removes the initial UART I/O mapping from s3c2410_iodesc,
since the same mapping is already done in the function s3c24xx_init_io
in the file arch/arm/mach-s3c2410/cpu.c, through the s3c_iodesc array.
I'm not sure if duplicate mappings do any harm, but it's simply
redundant. Also, in s3c2440.c the UART I/O mapping is NOT done.
Additionally, I put a comma behind the last mapping, to ease
copy/pasting stuff around, and make the style consistent with
s3c2440.c and other files.
Signed-off-by: Dimitry Andric <dimitry@andric.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sean Lee [Wed, 17 Aug 2005 08:28:26 +0000 (09:28 +0100)]
[ARM] 2852/1: Correct the mistake in arch/arm/mm/Kconfig file
Patch from Sean Lee
In the arch/arm/mm/Kconfig file, the CPU_DCACHE_WRITETHROUGH
option is depend on the CPU_DISABLE_DCACHE, but the "Disable
D-Cache" option is configured as CPU_DCACHE_DISABLE.
The CPU_DISABLE_DCACHE should be CPU_DCACHE_DISABLE
Signed-off-by: Sean Lee <beginner2arm@eyou.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Stephen Rothwell [Wed, 17 Aug 2005 03:01:50 +0000 (13:01 +1000)]
[PATCH] iSeries build with newer assemblers and compilers
Paulus suggested that we put xLparMap in its own .c file so that we can
generate a .s file to be included into head.S. This doesn't get around
the problem of having it at a fixed address, but it makes it more
palatable.
It would be good if this could be included in 2.6.13 as it solves our
build problems with various versions of binutils and gcc. In
particular, it allows us to build an iSeries kernel on Debian unstable
using their biarch compiler.
This has been built and booted on iSeries and built for pSeries and g5.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maneesh Soni [Tue, 16 Aug 2005 22:15:48 +0000 (15:15 -0700)]
[PATCH] Driver core: potentially fix use after free in class_device_attr_show
This moves the code to free devt_attr from class_device_del() to
class_dev_release() which is called after the last reference to the
corresponding kobject() is gone.
This allows us to keep the devt_attr alive while the corresponding
sysfs file is open.
Herbert Xu [Wed, 17 Aug 2005 03:43:40 +0000 (20:43 -0700)]
[TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented
since it was created before the ICMP event. The first fragment
is then sent out.
At this point the remaining fragment is allocated by tcp_fragment.
However, its size is padded to fit the L1 cache-line size therefore
creating tail-room up to 124 bytes long.
This fragment will also be sitting at sk_send_head.
4) tcp_sendmsg is called again and it stores data in the tail-room of
of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment
since the packet as a whole exceeds the MTU.
At this point we have a packet that has data in the head area being
fed to tso_fragment which bombs out.
My take on this is that we shouldn't ever call tcp_fragment on a TSO
socket for a packet that is yet to be transmitted since this creates
a packet on sk_send_head that cannot be extended.
So here is a patch to change it so that tso_fragment is always used
in this case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Aug 2005 03:39:38 +0000 (20:39 -0700)]
[IPV6]: Fix raw socket hardware checksum failures
When packets hit raw sockets the csum update isn't done yet, do it manually.
Packets can also reach rawv6_rcv on the output path through
ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this
codepath isn't executed.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Chubb [Wed, 17 Aug 2005 00:27:00 +0000 (17:27 -0700)]
[IA64] Updated zx1 defconfig
Just `make oldconfig' doesn't help for the zx1 defconfig ---
because we need the MPT Fusion drivers, which are picked up as not
selected.
Tested on HP ZX2000 and ZX2600.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: Tony Luck <tony.luck@intel.com>
Zachary Amsden [Tue, 16 Aug 2005 19:05:09 +0000 (12:05 -0700)]
[PATCH] i386 / desc_empty macro is incorrect
Chuck Ebbert noticed that the desc_empty macro is incorrect. Fix it.
Thankfully, this is not used as a security check, but it can falsely
overwrite TLS segments with carefully chosen base / limits. I do not
believe this is an issue in practice, but it is a kernel bug.
Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Chris Wright <chrisw@osdl.org>
[ x86-64 had the same problem, and the same fix. Linus ]
NTFS: Complete the previous fix for the unset device when mapping buffers
for mft record writing. I had missed the writepage based mft record
write code path.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Trond Myklebust [Tue, 16 Aug 2005 15:49:44 +0000 (11:49 -0400)]
[PATCH] NFS: Ensure we always update inode->i_mode when doing O_EXCL creates
When the client performs an exclusive create and opens the file for writing,
a Netapp filer will first create the file using the mode 01777. It does this
since an NFSv3/v4 exclusive create cannot immediately set the mode bits.
The 01777 mode then gets put into the inode->i_mode. After the file creation
is successful, we then do a setattr to change the mode to the correct value
(as per the NFS spec).
The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so
the latter retains the 01777 mode. A bit later, the VFS notices this, and calls
remove_suid(). This of course now resets the file mode to inode->i_mode & 0777.
Hey presto, the file mode on the server is now magically changed to 0777. Duh...
NTFS: Fix bug in mft record writing where we forgot to set the device in
the buffers when mapping them after the VM had discarded them.
Thanks to Martin MOKREJŠ for the bug report.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
The exception handling code fails to compile if the extended
precision mode is enabled. This patch fixes those compile errors and
also stops _quiet functions from incorrectly raising exceptions. Reported-by: Ralph Siemsen <ralphs@netwinder.org> Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] intelfb/fbdev: Save info->flags in a local variable
Reported by: Pavel Kysilka (Bugzilla Bug 5059)
The intelfb driver does not keep resolution set with fbset after
switching to anot console and back.
Steps to reproduce:
initial options: tty1,tty2 - 1024x768-60
1) tty1 - fbset after booting (1024x768-60)
2) tty1 - fbset 800x600-100
tty1: 800x600-100
3) swith to tty2, swith to tty1
tty1: 1024x768-60 (the same resolution as default from kernel booting)
This bug is caused by intelfb unintentionally destroying info->flags in
set_par(). Therefore the flag, FBINFO_MISC_USEREVENT used to notify
fbcon of a mode change was cleared causing the above problem. This bug
though is not intelfb specific, as other drivers may also be affected.
The fix is to save info->flags in a local variable before calling any
of the driver hooks. A more definitive fix (for post 2.6.13) is to
separate info->flags into one that is set by the driver and another that
is set by core fbdev/fbcon.
Sylvain Meyer [Mon, 15 Aug 2005 13:27:13 +0000 (21:27 +0800)]
[PATCH] intelfb: Do not ioremap entire graphics aperture
Reported by: Pavel Kysilka (Bugzilla Bug 4738)
modprobe of intelfb results in the following error message:
intelfb: Framebuffer driver for Intel(R) 830M/845G/852GM/855GM/865G/915G chi
intelfb: Version 0.9.2
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
allocation failed: out of vmalloc space - use vmalloc=<size> to increase siz
intelfb: Cannot remap FB region.
This will fail if the graphics aperture size is greater than 128 MB.
Fix is to ioremap only from the beginning of graphics aperture to the
end of the used framebuffer memory.
John McCutchan [Mon, 15 Aug 2005 16:13:28 +0000 (12:13 -0400)]
[PATCH] inotify: add MOVE_SELF event
This adds a MOVE_SELF event to inotify. It is sent whenever the inode
you are watching is moved. We need this event so that we can catch
something like this:
Paul Mundt [Sat, 13 Aug 2005 17:28:06 +0000 (20:28 +0300)]
[PATCH] sh: Make _syscall6() do the right thing.
There was a rather silly and embarrassing typo in the sh _syscall6().
For the syscall ABI we have the trapa value specified as 0x10 + number
of arguments, this was being set incorrectly in the _syscall6() case
which ended up causing some problems for users.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sat, 13 Aug 2005 21:22:59 +0000 (14:22 -0700)]
Fix up mmap of /dev/kmem
This leaves the issue of whether we should deprecate the whole thing (or
if we should check the whole mmap range, for that matter) open. Just do
the minimal fix for now.
Oops. I knew I didn't have the physical versus logical cpu identifiers right
when I generated that patch. It's not nearly as bad as I feared at the time
though.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Fri, 12 Aug 2005 02:26:42 +0000 (19:26 -0700)]
[NETPOLL]: pre-fill skb pool
we could do one thing (see the patch below): i think it would be useful
to fill up the netlogging skb queue straight at initialization time.
Especially if netpoll is used for dumping alone, the system might not be
in a situation to fill up the queue at the point of crash, so better be
a bit more prepared and keep the pipeline filled.
[ I've modified this to be called earlier - mpm ]
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Mackall [Fri, 12 Aug 2005 02:25:54 +0000 (19:25 -0700)]
[NETPOLL]: add retry timeout
Add limited retry logic to netpoll_send_skb
Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter.
We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Moyer [Fri, 12 Aug 2005 02:23:50 +0000 (19:23 -0700)]
[NETPOLL]: deadlock bugfix
This fixes an obvious deadlock in the netpoll code. netpoll_rx takes the
npinfo->rx_lock. netpoll_rx is also the only caller of arp_reply (through
__netpoll_rx). As such, it is not necessary to take this lock.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Moyer [Fri, 12 Aug 2005 02:23:04 +0000 (19:23 -0700)]
[NETPOLL]: rx_flags bugfix
Initialize npinfo->rx_flags. The way it stands now, this will have random
garbage, and so will incur a locking penalty even when an rx_hook isn't
registered and we are not active in the netpoll polling code.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 11 Aug 2005 01:32:36 +0000 (18:32 -0700)]
[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()
Well I've only found one potential cause for the assertion
failure in tcp_mark_head_lost. First of all, this can only
occur if cnt > 1 since tp->packets_out is never zero here.
If it did hit zero we'd have much bigger problems.
So cnt is equal to fackets_out - reordering. Normally
fackets_out is less than packets_out. The only reason
I've found that might cause fackets_out to exceed packets_out
is if tcp_fragment is called from tcp_retransmit_skb with a
TSO skb and the current MSS is greater than the MSS stored
in the TSO skb. This might occur as the result of an expiring
dst entry.
In that case, packets_out may decrease (line 1380-1381 in
tcp_output.c). However, fackets_out is unchanged which means
that it may in fact exceed packets_out.
Previously tcp_retrans_try_collapse was the only place where
packets_out can go down and it takes care of this by decrementing
fackets_out.
So we should make sure that fackets_out is reduced by an appropriate
amount here as well.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
The PFM_LOAD_CONTEXT may fail silently and cause a session
to remain reserved even though it should not. This can happen
when the commands succeeds in reserving the session but fails
when it actually tries to attach to the load_pid. In that case,
the command has failed but will return 0. More importantly,
the session will remain reserved. This patch fixes the problem.
Signed-off-by: <stephane.eranian@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
James Bottomley [Wed, 10 Aug 2005 18:29:15 +0000 (11:29 -0700)]
[PATCH] remove name length check in a workqueue
We have a chek in there to make sure that the name won't overflow
task_struct.comm[], but it's triggering for scsi with lots of HBAs, only
scsi is using single-threaded workqueues which don't append the "/%d"
anyway.
All too hard. Just kill the BUG_ON.
Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org>
[ kthread_create() uses vsnprintf() and limits the thing, so no
actual overflow can actually happen regardless ]
[PATCH] ppc64: Fix Fan control for new PowerMac G5 2.7GHz machines
The workaround for broken device-tree that prevents fan control from
working on recent G5 models need to be "enabled" for machines with
revision 0x37 of the bridge in addition to machines with revision 0x35.
Alexander Nyberg [Wed, 10 Aug 2005 17:11:36 +0000 (10:11 -0700)]
[PATCH] ns558 list handling fix
Need to use list_for_entry_safe(), as we're removing items during the
traversal. list_for_each_entry() uses the first ptr also as an iterator, if
you kfree() it slab takes it, might poison it and then you try to use it to
iterate to the next object in list.
Fix the p-persistence CSMA algorithm which in simplex mode was starting
with a slottime delay before doing anything else as if there was carrier
collision resulting in bad performance on simplex links.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tejun Heo [Sun, 7 Aug 2005 05:53:40 +0000 (14:53 +0900)]
[PATCH] sata: fix sata_sx4 dma_prep to not use sg->length
sata_sx4 directly references sg->length to calculate total_len in
pdc20621_dma_prep(). This is incorrect as dma_map_sg() could have
merged multiple sg's into one and, in such case, sg->length doesn't
reflect true size of the entry. This patch makes it use
sg_dma_len(sg).
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Dave Kleikamp [Wed, 10 Aug 2005 16:14:39 +0000 (11:14 -0500)]
JFS: Fix race in txLock
TxAnchor.anon_list is protected by jfsTxnLock (TXN_LOCK), but there was
a place in txLock() that was removing an entry from the list without holding
the spinlock.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Ben Dooks [Wed, 10 Aug 2005 15:45:14 +0000 (16:45 +0100)]
[PATCH] ARM: 2849/1: S3C24XX - USB host update (2848/1)
Patch from Ben Dooks
Rename the s3c2410_report_oc() to s3c2410_usb_report_oc()
as this is an usb specific function.
Change port power on the usb-simtec implementation to only
power up the output if both are set, as per the usb 1.1
specification
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>