IB/mthca: Fix posting lists of 256 receive requests to SRQ for Tavor
If we post a list of length exactly a multiple of 256, nreq in
doorbell gets set to 256 which is wrong: it should be encoded by 0.
This is because we only zero it out on the next WR, which may not be
there. The solution is to ring the doorbell after posting a WQE, not
before posting the next one.
This is the same bug that we just fixed for QPs with non-shared RQ.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Linus Torvalds [Wed, 24 May 2006 15:55:12 +0000 (08:55 -0700)]
Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/ipath: deref correct pointer when using kernel SMA
IB/ipath: fix null deref during rdma ops
IB/ipath: register as IB device owner
IB/ipath: enable PE800 receive interrupts on user ports
IB/ipath: enable GPIO interrupt on HT-460
IB/ipath: fix NULL dereference during cleanup
IB/ipath: replace uses of LIST_POISON
IB/ipath: fix reporting of driver version to userspace
IB/ipath: don't modify QP if changes fail
IB/ipath: fix spinlock recursion bug
Linus Torvalds [Wed, 24 May 2006 15:36:31 +0000 (08:36 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] libata: add pio flush for via atapi (was: Re: TR: ASUS A8V Deluxe, x86_64)
Dave Kleikamp [Wed, 24 May 2006 12:43:38 +0000 (07:43 -0500)]
JFS: Fix multiple errors in metapage_releasepage
It looks like metapage_releasepage was making in invalid assumption that
the releasepage method would not be called on a dirty page. Instead of
issuing a warning and releasing the metapage, it should return 0, indicating
that the private data for the page cannot be released.
I also realized that metapage_releasepage had the return code all wrong. If
it is successful in releasing the private data, it should return 1, otherwise
it needs to return 0.
Lastly, there is no need to call wait_on_page_writeback, since
try_to_release_page will not call us with a page in writback state.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
The truncate threshold calculation to prevent receiver from getting stuck
was incorrect, and it didn't take into account the upper limit on bits
in the register so the jumbo packet support was broken.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Bridge will OOPS on removal if other application has the SAP open.
The bridge SAP might be shared with other usages, so need
to do reference counting on module removal rather than explicit
close/delete.
Since packet might arrive after or during removal, need to clear
the receive function handle, so LLC only hands it to user (if any).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Wright [Tue, 23 May 2006 22:08:13 +0000 (15:08 -0700)]
[NETFILTER]: SNMP NAT: fix memleak in snmp_object_decode
If kmalloc fails, error path leaks data allocated from asn1_oid_decode().
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The condition "> H323_ERROR_STOP" can never be true since H323_ERROR_STOP
is positive and is the highest possible return code, while real errors are
negative, fix the checks. Also only abort on real errors in some spots
that were just interpreting any return value != 0 as error.
Fixes crashes caused by use of stale data after a parsing error occured:
Bryan O'Sullivan [Tue, 23 May 2006 18:32:37 +0000 (11:32 -0700)]
IB/ipath: fix null deref during rdma ops
The problem was that node A's sending thread, which handles sending RDMA
read response data, would write the trigger word, the last packet would
be sent, node B would send a new RDMA read request, node A's interrupt
handler would initialize s_rdma_sge, then node A's sending thread would
update s_rdma_sge. This didn't happen very often naturally but was more
frequent with 1 byte RDMA reads. Rather than adding more locking or
increasing the QP structure size and copying sge data, I modified the
copy routine to update the pointers before writing the trigger word to
avoid the update race.
Signed-off-by: Ralph Campbell <ralphc@pathscale.com> Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Bryan O'Sullivan [Tue, 23 May 2006 18:32:29 +0000 (11:32 -0700)]
IB/ipath: fix spinlock recursion bug
The local loopback path for RC can lock the rkey table lock without
blocking interrupts. The receive interrupt path can then call
ipath_rkey_ok() and deadlock. Remove the redundant lock.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Jean Delvare [Tue, 23 May 2006 18:56:50 +0000 (15:56 -0300)]
V4L/DVB (4040a): Fix the following section warnings:
reference to .init.text: from .text between 'dvb_bt8xx_probe'
(at offset 0x122c) and 'dvb_bt8xx_remove'
reference to .init.text: from .text between 'dvb_bt8xx_probe'
(at offset 0x1267) and 'dvb_bt8xx_remove'
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
If CONFIG_VIDEO_DEV=m and CONFIG_VIDEO_V4L1_COMPAT=y, v4l1-compat should
be built as a module (currently, it isn't built at all leading to
problems with modules using it).
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Jens Axboe [Tue, 23 May 2006 09:23:49 +0000 (11:23 +0200)]
[PATCH] blk: fix gendisk->in_flight accounting during barrier sequence
While executing barrrier sequence, the bar_rq which carries actual
write was accounted as normal IO on completion, while it wasn't on
queueing. This caused gendisk->in_flight to be decremented by 1 after
each barrier thus messed up statistics.
This patch makes bar_rq not accounted as normal IO. As the containing
barrier request as a whole is accounted, part of it shouldn't be.
NeilBrown [Tue, 23 May 2006 05:35:26 +0000 (22:35 -0700)]
[PATCH] md: fix possible oops when starting a raid0 array
This loop that sets up the hash_table has problems.
Careful examination will show that the last time through, everything but
the first line is pointless. This is because all it does is change 'cur'
and 'size' and neither of these are used after the loop. This should ring
warning bells... That last time through the loop,
size += conf->strip_zone[cur].size
can index off the end of the strip_zone array. Depending on what it finds
there, it might exit the loop cleanly, or it might spin going further and
further beyond the array until it hits an unmapped address.
This patch rearranges the code so that the last, pointless, iteration of
the loop never happens. i.e. the one statement of the last loop that is
needed is moved the the end of the previous loop - or to before the loop
starts - and the loop counter starts from 1 instead of 0.
Cc: "Don Dupuis" <dondster@gmail.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Tue, 23 May 2006 05:35:25 +0000 (22:35 -0700)]
[PATCH] knfsd: Fix two problems that can cause rmmod nfsd to die
Both cause the 'entries' count in the export cache to be non-zero at module
removal time, so unregistering that cache fails and results in an oops.
1/ exp_pseudoroot (used for NFSv4 only) leaks a reference to an export
entry.
2/ sunrpc_cache_update doesn't increment the entries count when it adds
an entry.
Thanks to "david m. richter" <richterd@citi.umich.edu> for triggering the
problem and finding one of the bugs.
Cc: "david m. richter" <richterd@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David S. Miller [Tue, 23 May 2006 09:07:22 +0000 (02:07 -0700)]
[SPARC64]: Respect gfp_t argument to dma_alloc_coherent().
Using asm-generic/dma-mapping.h does not work because pushing
the call down to pci_alloc_coherent() causes the gfp_t argument
of dma_alloc_coherent() to be ignored.
Fix this by implementing things directly, and adding a gfp_t
argument we can use in the internal call down to the PCI DMA
implementation of pci_alloc_coherent().
This fixes massive memory corruption when using the sound driver
layer, which passes things like __GFP_COMP down into these
routines and (correctly) expects that to work.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 22 May 2006 23:55:14 +0000 (16:55 -0700)]
[NETFILTER]: SNMP NAT: fix memory corruption
Fix memory corruption caused by snmp_trap_decode:
- When snmp_trap_decode fails before the id and address are allocated,
the pointers contain random memory, but are freed by the caller
(snmp_parse_mangle).
- When snmp_trap_decode fails after allocating just the ID, it tries
to free both address and ID, but the address pointer still contains
random memory. The caller frees both ID and random memory again.
- When snmp_trap_decode fails after allocating both, it frees both,
and the callers frees both again.
The corruption can be triggered remotely when the ip_nat_snmp_basic
module is loaded and traffic on port 161 or 162 is NATed.
Found by multiple testcases of the trap-app and trap-enc groups of the
PROTOS c06-snmpv1 testsuite.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Sat, 20 May 2006 22:00:36 +0000 (15:00 -0700)]
[PATCH] ad1848 section fix
WARNING: sound/oss/ad1848.o - Section mismatch: reference to .init.data:ad1848_isapnp_list from .text between 'ad1848_init_generic' (at offset 0x46f0) and 'kmalloc'
WARNING: sound/oss/ad1848.o - Section mismatch: reference to .init.data:ad1848_isapnp_list from .text between 'ad1848_init_generic' (at offset 0x46f8) and 'kmalloc'
WARNING: sound/oss/ad1848.o - Section mismatch: reference to .init.data:ad1848_isapnp_list from .text between 'ad1848_init_generic' (at offset 0x4818) and 'kmalloc'
Also,
sound/oss/ad1848.c: In function `ad1848_init':
sound/oss/ad1848.c:2029: warning: cast to pointer from integer of different size
sound/oss/ad1848.c: In function `ad1848_unload':
sound/oss/ad1848.c:2178: warning: cast to pointer from integer of different size
sound/oss/ad1848.c: In function `adintr':
sound/oss/ad1848.c:2207: warning: cast from pointer to integer of different size
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 20 May 2006 22:00:35 +0000 (15:00 -0700)]
[PATCH] nm256_audio section fix
WARNING: sound/oss/nm256_audio.o - Section mismatch: reference to .init.text:nm256_peek_for_sig from .text between 'nm256_install' (at offset 0x3ba4) and 'nm256_probe' WARNING: sound/oss/nm256_audio.o - Section mismatch: reference to .init.text:nm256_peek_for_sig from .text between 'nm256_install' (at offset 0x3bac) and 'nm256_probe' WARNING: sound/oss/nm256_audio.o - Section mismatch: reference to .init.text: from .text between 'nm256_install' (at offset 0x3dcc) and 'nm256_probe' WARNING: sound/oss/nm256_audio.o - Section mismatch: reference to .init.text: from .text between 'nm256_install' (at offset 0x3dd0) and 'nm256_probe'
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 20 May 2006 22:00:34 +0000 (15:00 -0700)]
[PATCH] mpu401 section fix
WARNING: sound/drivers/mpu401/snd-mpu401.o - Section mismatch: reference to .init.text: from .text between 'snd_mpu401_pnp_probe' (at offset 0x1f7) and 'snd_mpu401_pnp_remove'
Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 20 May 2006 22:00:33 +0000 (15:00 -0700)]
[PATCH] i810 section fix
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_fix_offsets' (at offset 0x1b88) and 'i810_alloc_agp_mem'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_fix_offsets' (at offset 0x1b8f) and 'i810_alloc_agp_mem'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_fix_offsets' (at offset 0x1ba3) and 'i810_alloc_agp_mem'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_fix_offsets' (at offset 0x1bb5) and 'i810_alloc_agp_mem'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_fix_offsets' (at offset 0x1bc6) and 'i810_alloc_agp_mem'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_init_defaults' (at offset 0x1dd8) and 'i810_init_device'
WARNING: drivers/video/i810/i810fb.o - Section mismatch: reference to .init.data: from .text between 'i810_init_defaults' (at offset 0x1dfb) and 'i810_init_device'
Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 20 May 2006 22:00:32 +0000 (15:00 -0700)]
[PATCH] pd6729 section fix
WARNING: drivers/pcmcia/pd6729.o - Section mismatch: reference to .init.text: from .text between 'pd6729_pci_probe' (at offset 0x9a8) and 'pd6729_pci_remove'
Bob Picco [Sat, 20 May 2006 22:00:31 +0000 (15:00 -0700)]
[PATCH] Align the node_mem_map endpoints to a MAX_ORDER boundary
Andy added code to buddy allocator which does not require the zone's
endpoints to be aligned to MAX_ORDER. An issue is that the buddy allocator
requires the node_mem_map's endpoints to be MAX_ORDER aligned. Otherwise
__page_find_buddy could compute a buddy not in node_mem_map for partial
MAX_ORDER regions at zone's endpoints. page_is_buddy will detect that
these pages at endpoints are not PG_buddy (they were zeroed out by bootmem
allocator and not part of zone). Of course the negative here is we could
waste a little memory but the positive is eliminating all the old checks
for zone boundary conditions.
SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the logic
either because the holes and endpoints are handled differently. This
leaves checking alloc_remap and other arches which privately allocate for
node_mem_map.
Signed-off-by: Bob Picco <bob.picco@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Atsushi Nemoto [Sat, 20 May 2006 22:00:26 +0000 (15:00 -0700)]
[PATCH] kbuild: check SHT_REL sections
I found that modpost can not detect section mismatch on mips and i386. On
mips64, the modpost (with r_info layout fix) can detect it. The current
modpst only checks SHT_RELA section but I suppose SHT_REL section should be
checked also. This patch does not contain r_info layout fix. I'll post an
updated r_info layout fix on next mail.
Check SHT_REL sections as like as SHT_RELA sections to detect section
mismatch.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] s390: next_timer_interrupt overflow in stop_hz_timer
The 32 bit unsigned substraction (next - jiffies) in stop_hz_timer can
overflow if jiffies gets advanced between next_timer_interrupt and the read
under the xtime lock. The cast to a u64 then results in a large value
which causes the cpu to wait too long. Fix this by casting next and
jiffies independently to u64 before subtracting them.
(Spotted by Zachary Amsden <zach@vmware.com>)
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zachary Amsden [Sat, 20 May 2006 22:00:24 +0000 (15:00 -0700)]
[PATCH] Fix a NO_IDLE_HZ timer bug
Under certain timing conditions, a race during boot occurs where timer
ticks are being processed on remote CPUs. The remote timer ticks can
increment jiffies, and if this happens during a window when a timeout is
very close to expiring but a local tick has not yet been delivered, you can
end up with
1) No softirq pending
2) A local timer wheel which is not synced to jiffies
3) No high resolution timer active
4) A local timer which is supposed to fire before the current jiffies value.
In this circumstance, the comparison in next_timer_interrupt overflows,
because the base of the comparison for high resolution timers is jiffies,
but for the softirq timer wheel, it is relative the the current base of the
wheel (jiffies_base).
Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Satoshi Oshima [Sat, 20 May 2006 22:00:21 +0000 (15:00 -0700)]
[PATCH] kprobes: bad manipulation of 2 byte opcode on x86_64
Problem:
If we put a probe onto a callq instruction and the probe is executed,
kernel panic of Bad RIP value occurs.
Root cause:
If resume_execution() found 0xff at first byte of p->ainsn.insn, it must
check the _second_ byte. But current resume_execution check _first_ byte
again.
I changed it checks second byte of p->ainsn.insn.
Kprobes on i386 don't have this problem, because the implementation is a
little bit different from x86_64.
Vivek Goyal [Sat, 20 May 2006 22:00:21 +0000 (15:00 -0700)]
[PATCH] i386 kdump boot cpu physical apicid fix
o Kdump second kernel boot fails after a system crash if second kernel
is UP and acpi=off and if crash occurred on a non-boot cpu.
o Issue here is that MP tables report boot cpu lapic id as 0 but second
kernel is booting on a different processor and MP table data is stale
in this context. Hence apic_id_registered() check fails in setup_local_APIC()
when called from APIC_init_uniprocessor().
o Problem is not seen if ACPI is enabled as in that case
boot_cpu_physical_apicid is read from the LAPIC.
o Problem is not seen with SMP kernels as well because in this case also
boot_cpu_physical_apicid is read from LAPIC. (smp_boot_cpus()).
o The problem is fixed by reading boot_cpu_physical_apicid from LAPIC
if it is a UP kernel and CRASH_DUMP is enabled.
Stephen Street [Sat, 20 May 2006 22:00:19 +0000 (15:00 -0700)]
[PATCH] pxa2xx-spi update
Fix some outstanding issues with the pxa2xx_spi driver when running on a
PXA270:
- Wrong timeout calculation in the setup function due to different
peripheral clock rates in the PXAxxx family.
- Bad handling of SSSR_TFS interrupts in interrupt_transfer function.
- Added locking to interface between the pump_messages workqueue and the
pump_transfers tasklet.
Much thanks to Juergen Beisert for the extensive testing on the PXA270.
Signed-off-by: Stephen Street <stephen@streetfiresound.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This driver supports the SPI controller on the MPC83xx SoC devices from
Freescale. Note, this driver supports only the simple shift register SPI
controller and not the descriptor based CPM or QUICCEngine SPI controller.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dmitry pervushin [Sat, 20 May 2006 22:00:14 +0000 (15:00 -0700)]
[PATCH] minor SPI doc fix
Because several developers asked me about referenced but missing
spi_add_master(), I think that this patch should be applied ... it
corrects comments so they refer to spi_register_master() instead.
Signed-off-by: dmitry pervushin <dpervushin@ru.mvista.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Sesterhenn [Sat, 20 May 2006 22:00:12 +0000 (15:00 -0700)]
[PATCH] Overrun in isdn_tty.c
This fixes coverity bug id #1237. After the while loop, it is possible for
i == ISDN_LMSNLEN. If this happens the terminating '\0' is written after
the end of the array.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Cc: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sat, 20 May 2006 22:00:11 +0000 (15:00 -0700)]
[PATCH] cpuset: might_sleep_if check in cpuset_zones_allowed
It's too easy to incorrectly call cpuset_zone_allowed() in an atomic
context without __GFP_HARDWALL set, and when done, it is not noticed until
a tight memory situation forces allocations to be tried outside the current
cpuset.
Add a 'might_sleep_if()' check, to catch this earlier on, instead of
waiting for a similar check in the mutex_lock() code, which is only rarely
invoked.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the kernel/cpuset.c:cpuset_zone_allowed() comment.
The rule for when mm/page_alloc.c should call cpuset_zone_allowed()
was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
The explanation of this rule in the comment above cpuset_zone_allowed() was
stale, as a result of a restructuring of some __alloc_pages() code in
November 2005.
Rewrite that comment ...
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sat, 20 May 2006 22:00:09 +0000 (15:00 -0700)]
[PATCH] Cpuset: might sleep checking zones allowed fix
Fix a couple of infrequently encountered 'sleeping function called from
invalid context' in the cpuset hooks in __alloc_pages. Could sleep while
interrupts disabled.
The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
__alloc_pages() to determine if a zone is allowed in the current tasks
cpuset. This routine can sleep, for certain GFP_KERNEL allocations, if the
zone is on a memory node not allowed in the current cpuset, but might be
allowed in a parent cpuset.
But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).
The rule was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
This rule was being violated in a couple of places, due to a bogus change
made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
to cleanup its logic, and also due to a later fix to constrain which swap
daemons were awoken.
The bogus change can be seen at:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
[PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
This was first noticed on a tight memory system, in code that was disabling
interrupts and doing allocation requests with __GFP_WAIT not set, which
resulted in __might_sleep() writing complaints to the log "Debug: sleeping
function called ...", when the code in cpuset_zone_allowed() tried to take
the callback_sem cpuset semaphore.
We haven't seen a system hang on this 'might_sleep' yet, but we are at
decent risk of seeing it fairly soon, especially since the additional
cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
March 2006.
Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
to Nick Piggin who warned me of this back in Nov 2005, before I was ready
to listen.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kristen Accardi [Sat, 20 May 2006 22:00:08 +0000 (15:00 -0700)]
[PATCH] pci: correctly allocate return buffers for osc calls
The OSC set and query functions do not allocate enough space for return
values, and set the output buffer length to a false, too large value. This
causes the acpi-ca code to assume that the output buffer is larger than it
actually is, and overwrite memory when copying acpi return buffers into
this caller provided buffer. In some cases this can cause kernel oops if
the memory that is overwritten is a pointer. This patch will change these
calls to use a dynamically allocated output buffer, thus allowing the
acpi-ca code to decide how much space is needed.
Amy Griffis [Sat, 20 May 2006 22:00:07 +0000 (15:00 -0700)]
[PATCH] fix NULL dereference in inotify_ignore
Don't reassign to watch. If idr_find() returns NULL, then
put_inotify_watch() will choke.
Signed-off-by: Amy Griffis <amy.griffis@hp.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Amy Griffis [Sat, 20 May 2006 22:00:06 +0000 (15:00 -0700)]
[PATCH] fix race in inotify_release
While doing some inotify stress testing, I hit the following race. In
inotify_release(), it's possible for a watch to be removed from the lists
in between dropping dev->mutex and taking inode->inotify_mutex. The
reference we hold prevents the watch from being freed, but not from being
removed.
Checking the dev's idr mapping will prevent a double list_del of the
same watch.
Signed-off-by: Amy Griffis <amy.griffis@hp.com> Acked-by: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mike Kravetz [Sat, 20 May 2006 22:00:05 +0000 (15:00 -0700)]
[PATCH] SPARSEMEM incorrectly calculates section number
A bad calculation/loop in __section_nr() could result in incorrect section
information being put into sysfs memory entries. This primarily impacts
memory add operations as the sysfs information is used while onlining new
memory.
Fix suggested by Dave Hansen.
Note that the bug may not be obvious from the patch. It actually occurs in
the function's return statement:
In the existing code, root_nr has already been multiplied by
SECTIONS_PER_ROOT.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 20 May 2006 22:00:01 +0000 (15:00 -0700)]
[PATCH] binfmt_flat: don't check for EMFILE
Bernd Schmidt points out that binfmt_flat is now leaving the exec file open
while the application runs. This offsets all the application's fd numbers.
We should have closed the file within exec(), not at exit()-time.
But there doesn't seem to be a lot of point in doing all this just to avoid
going over RLIMIT_NOFILE by one fd for a few microseconds. So take the EMFILE
checking out again. This will cause binfmt_flat to again fail LTP's
exec-should-return-EMFILE-when-fdtable-is-full test. That test appears to be
wrong anyway - Open Group specs say nothing about exec() returning EMFILE.
Peter Staubach [Sat, 20 May 2006 21:59:56 +0000 (14:59 -0700)]
[PATCH] NFS server subtree_check returns dubious value
Address a problem found when a Linux NFS server uses the "subtree_check"
export option.
The "subtree_check" NFS export option was designed to prohibit a client
from using a file handle for which it should not have permission. The
algorithm used is to ensure that the entire path to the file being
referenced is accessible to the user attempting to use the file handle. If
some part of the path is not accessible, then the operation is aborted and
the appropriate version of ESTALE is returned to the NFS client.
The error, ESTALE, is unfortunate in that it causes NFS clients to make
certain assumptions about the continued existence of the file. They assume
that the file no longer exists and refuse to attempt to access it again.
In this case, the file really does exist, but access was denied by the
server for a particular user.
A better error to return would be an EACCES sort of error. This would
inform the client that the particular operation that it was attempting was
not allowed, without the nasty side effects of the ESTALE error.
Signed-off-by: Peter Staubach <staubach@redhat.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It looks like the generic ide code now wants ide_init_hwif_ports() to set
the parent struct device into the ide_hw structure (new field ?). Without
this, the mac ide code can cause the ide probing code to explode in flames
in sysfs registration due to what looks like a stale pointer in there
(happens when removing/re-inserting one of the hotswap media bays on some
laptops).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Ebbert [Sat, 20 May 2006 21:59:52 +0000 (14:59 -0700)]
[PATCH] i386: remove junk from stack dump
i386 stack dump has a "<0>" in the middle of the line and an extra space
between columns in multicolumn mode. Remove those and also remove an extra
blank line of source code.
Paul A. Clarke [Sat, 20 May 2006 21:59:51 +0000 (14:59 -0700)]
[PATCH] matroxfb: fix DVI setup to be more compatible
There has been a longstanding problem with the Matrox G450 and perhaps
other similar cards, with modes "above" 1280x1024-60 on ppc/ppc64 boxes
running Linux. Higher resolutions and/or higher refresh rates resulted in
a very noticably "jittery" display, and sometimes no display, depending on
the physical monitor. This patch fixes that problem on the systems I have
easy access to...
I've tested with SLES9SP3 (2.6.5+ kernel) and 2.6.16-rc6 custom kernels on
an IBM eServer p5 520 w/G450 (a.k.a GXT135P on IBM's ppc64 systems), and a
colleague of mine (Ian Romanick) tested it successfully on an Apple ppc32
box (w/GXT135P). I also tested it on IA32 box I have with a GXT135P to
verify that it didn't obviously break anything. In my testing, I covered
single-card, single and dual-head setups using both HD15 and DVI-D signals,
on both the IA32 and ppc64 boxes. While everything appeared fine on both
boxes, I did encounter one problem: I can't get any signal on the DVI-D
output on the ppc64 box. However, this is also the case without my patch.
I just noticed that screen-blanking only occurs on the primary display as
well.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Signed-off-by: Ian Romanick <idr@us.ibm.com> Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lin Feng Shen [Sat, 20 May 2006 21:59:49 +0000 (14:59 -0700)]
[PATCH] NFS: fix error handling on access_ok in compat_sys_nfsservctl
Functions compat_nfs_svc_trans, compat_nfs_clnt_trans,
compat_nfs_exp_trans, compat_nfs_getfd_trans and compat_nfs_getfs_trans,
which are called by compat_sys_nfsservctl(fs/compat.c), don't handle the
return value of access_ok properly. access_ok return 1 when the addr is
valid, and 0 when it's not, but these functions have the reversed
understanding. When the address is valid, they always return -EFAULT to
compat_sys_nfsservctl.
An example is to run /usr/sbin/rpc.nfsd(32bit program on Power5). It
doesn't function as expected. strace showes that nfsservctl returns
-EFAULT.
The patch fixes this by correcting the error handling on the return value
of access_ok in the five functions.
Signed-off-by: Lin Feng Shen <shenlinf@cn.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>