Zhu Yi [Wed, 1 Mar 2006 21:55:51 +0000 (05:55 +0800)]
[PATCH] ipw2200: Filter unsupported channels out in ad-hoc mode
Currently iwlist ethX freq[uency]/channel lists all the channels the card
supports for the current region, including some channels that can only
be used in infrastructure mode. This patch filters these channels out if
the card is currently in ad-hoc mode.
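Here is a minimal C sketch of the filtering idea. The `struct channel` type, the `CHAN_BSS_ONLY` flag and `filter_channels()` are hypothetical names chosen for illustration. They are not the ipw2200 or ieee80211 data structures, which the text does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the channel may only be used in infrastructure mode. */
#define CHAN_BSS_ONLY 0x1u

struct channel {
	int number;
	unsigned int flags;
};

/* Copy the channels usable in the current mode from 'all' into 'out'
 * and return how many were copied. In ad-hoc mode, channels flagged
 * CHAN_BSS_ONLY are skipped; in infrastructure mode all are kept. */
static size_t filter_channels(const struct channel *all, size_t n,
			      int adhoc, struct channel *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (adhoc && (all[i].flags & CHAN_BSS_ONLY))
			continue;
		out[kept++] = all[i];
	}
	return kept;
}
```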
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Zhu Yi [Thu, 16 Feb 2006 08:21:09 +0000 (16:21 +0800)]
[PATCH] ipw2200: Fix rf_kill is activated after mode change with 'disable=1'
When the ipw2200 module is loaded with disable=1, rf_kill is activated after
every mode change. This happens because ipw_sw_reset() is called whenever the
mode is changed. The patch fixes the problem by using the 'option' parameter
to distinguish the purpose of each ipw_sw_reset() call.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Zhu Yi [Thu, 16 Feb 2006 23:46:16 +0000 (07:46 +0800)]
[PATCH] ipw2200: remove the WPA card associates to non-WPA AP checking
wpa_supplicant needs to set wpa_enabled unconditionally; with this check in
place it has not been possible to connect to non-WPA networks using
wpa_supplicant. So remove the following check:
    if (priv->ieee->wpa_enabled &&
        network->wpa_ie_len == 0 && network->rsn_ie_len == 0)
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bill Moss [Wed, 15 Feb 2006 00:50:18 +0000 (08:50 +0800)]
[PATCH] ipw2200: Add signal level to iwlist scan output
This patch does two things. It uses the IW_QUAL_DBM flag, which is new in
WE-19, so that the wireless tools report signal level and noise in dBm. It
also defines the signal level as an unsigned integer so that the signal level
is reported by iwlist <iface> scan.
Signed-off-by: Bill Moss <bmoss@clemson.edu>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
James Ketrenos [Tue, 14 Feb 2006 01:10:51 +0000 (09:10 +0800)]
[PATCH] ipw2200: stop netdev queue if h/w doesn't have space for new packets
The patch rolls back the change we made to support starting/stopping
independent Tx queues within a single net device in order to support
802.11e QoS. The goal was to be able to indicate to the upper layers that
packets of a given priority cannot be sent any more, without halting
transmission of all packets and without rescheduling high-priority
packets down to the next priority level.
So we returned NETDEV_TX_BUSY in this case and relied on the stack to
take care of rescheduling... which it apparently does immediately,
consuming the CPU. This caused the ksoftirqd kernel thread to consume
almost all of the CPU.
To put the code back the way it was before we made these changes, we
put the netif_stop_queue() call back in ipw_tx_skb. This effectively
disables multiple priority-based transmit queues for 802.11e, but given
that it's broken anyway...
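The following user-space simulation illustrates the approach the patch returns to. It assumes a hypothetical fixed-size TX ring. `TX_OK` and `TX_BUSY` play the roles of NETDEV_TX_OK and NETDEV_TX_BUSY, and the `queue_stopped` flag stands in for netif_stop_queue()/netif_wake_queue(). None of this is the ipw2200 code. The point is that the driver stops the queue as soon as the ring fills, so the stack stops handing it packets instead of retrying a TX_BUSY return in a tight loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_ret { TX_OK = 0, TX_BUSY = 1 };

/* Hypothetical TX ring with a fixed number of descriptors. */
struct tx_ring {
	int capacity;
	int used;
	bool queue_stopped;   /* models the netif_stop_queue() state */
};

/* Queue one packet. When this fills the ring, stop the queue so the
 * upper layer stops calling us until space is freed. */
static enum tx_ret xmit(struct tx_ring *r)
{
	if (r->used >= r->capacity) {
		/* Only reached if a caller ignores the stopped queue. */
		r->queue_stopped = true;
		return TX_BUSY;
	}
	r->used++;
	if (r->used == r->capacity)
		r->queue_stopped = true;
	return TX_OK;
}

/* TX-completion path: free n descriptors and wake the queue. */
static void tx_complete(struct tx_ring *r, int n)
{
	r->used -= n;
	if (r->used < 0)
		r->used = 0;
	if (r->queue_stopped && r->used < r->capacity)
		r->queue_stopped = false;   /* models netif_wake_queue() */
}
```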
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[PATCH] ipw2200: print geography code upon module load
Given the number of support requests about the meaning of the geography
code, I've written a patch that prints this information on module load,
regardless of the debug level.
I've also added a section to the README.ipw2200 file listing the geography
codes and their meanings.
Signed-off-by: Henrik Brix Andersen <brix@gentoo.org>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Larry Finger [Tue, 28 Feb 2006 15:48:28 +0000 (09:48 -0600)]
[PATCH] Remove duplicated code from ipw2200.c
As stated in a comment, the ipw2200 driver uses several routines that
were borrowed from ieee80211_geo.c. Since ipw2200 already requires
ieee80211, these routines are duplicates. The attached patch, sent as an
attachment to preserve whitespace, converts ipw2200.c to use the
ieee80211 versions, thereby reducing bloat in both the source and the
binary.
Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Hong Liu [Wed, 8 Mar 2006 02:50:20 +0000 (10:50 +0800)]
[PATCH] ieee80211: Fix QoS is not active problem
Fix QoS not being active even when both the network and the card are
QoS-enabled. The problem is that we pass the wrong ieee80211_network
address to ipw_handle_beacon/ipw_handle_probe_response, so
ieee80211_network->qos_data.active is never set and the driver does not
send QoS frames at all.
Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[PATCH] Consistent capabilities associated with MPOL_MOVE_ALL
It seems that setting scheduling policy and priorities is also the kind of
thing that might be performed in apps that also use the NUMA API, so it
would seem consistent to use CAP_SYS_NICE for NUMA as well.
So use CAP_SYS_NICE for controlling migration permissions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Rework text in vm/page-migration to be clearer and reflect the final
version of page migration in 2.6.16. Mention Andi Kleen's numactl
package that contains user space tools for page migration via
libnuma. Add reference to numa_maps and to the manpage in numactl.
- Add todo list for outstanding issues
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] page migration: fail if page is in a vma flagged VM_LOCKED
If try_to_unmap() fails, page migration currently just retries a couple of
times without inspecting the return code.
However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs == 1). We can check for that return code
and avoid retrying the migration.
migrate_page_remove_references() now needs to return the reason for the
failure, so switch migrate_page_remove_references() to return -Exx style
error codes.
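A sketch of why -Exx codes help: the caller can tell a permanent failure from a transient one and stop retrying. The `enum unmap_result` values are hypothetical stand-ins for try_to_unmap() results. Mapping the VM_LOCKED case to -EPERM and everything else to -EAGAIN is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for try_to_unmap() outcomes. */
enum unmap_result { UNMAP_OK, UNMAP_AGAIN, UNMAP_LOCKED };

/* Return 0 on success or a negative errno saying why the references
 * could not be removed (assumed mapping, see above). */
static int remove_references(enum unmap_result r)
{
	switch (r) {
	case UNMAP_OK:
		return 0;
	case UNMAP_LOCKED:
		return -EPERM;    /* page is in a VM_LOCKED vma: permanent */
	case UNMAP_AGAIN:
	default:
		return -EAGAIN;   /* transient: worth retrying */
	}
}

/* Retry only while the failure is transient, at most 'tries' times. */
static int migrate_one(const enum unmap_result *attempts, int tries)
{
	int rc = -EAGAIN;

	for (int i = 0; i < tries && rc == -EAGAIN; i++)
		rc = remove_references(attempts[i]);
	return rc;
}
```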
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nathan Scott [Wed, 15 Mar 2006 04:14:45 +0000 (15:14 +1100)]
Fix a direct I/O locking issue revealed by the new mutex code.
This affects only XFS (i.e. the DIO_OWN_LOCKING case): currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem, because of nondeterminism in the
possible return code/lock/unlock combinations. This can cause
a direct read to attempt a double i_mutex unlock inside XFS.
We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.
Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2
Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Maneesh Soni [Tue, 14 Mar 2006 09:33:14 +0000 (15:03 +0530)]
[PATCH] Plug kdump shutdown race window
lapic_shutdown() re-enables interrupts, which is undesirable in the panic
case. So use local_irq_save() and local_irq_restore() to keep IRQs
disabled in the kexec-on-panic case, and to close a possible race window
during kdump shutdown, as shown in this stack trace
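The difference between disable/enable and save/restore can be modelled in user space with a simulated interrupt flag. `irqs_enabled`, `sim_irq_save()`, `sim_irq_restore()` and the two `lapic_shutdown_*()` variants are hypothetical stand-ins, not the x86 implementation. The point is only that restore puts back the caller's state instead of unconditionally enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt-enable state (stand-in for the CPU flag). */
static bool irqs_enabled = true;

typedef bool irq_flags_t;

/* Like local_irq_save(): remember the current state, then disable. */
static irq_flags_t sim_irq_save(void)
{
	irq_flags_t saved = irqs_enabled;

	irqs_enabled = false;
	return saved;
}

/* Like local_irq_restore(): put back exactly the remembered state. */
static void sim_irq_restore(irq_flags_t saved)
{
	irqs_enabled = saved;
}

/* Old pattern: disable, do the work, then unconditionally enable. */
static void lapic_shutdown_old(void)
{
	irqs_enabled = false;
	/* ... shut down the local APIC ... */
	irqs_enabled = true;
}

/* New pattern: a caller on the panic path that entered with
 * interrupts disabled also leaves with them disabled. */
static void lapic_shutdown_new(void)
{
	irq_flags_t flags = sim_irq_save();
	/* ... shut down the local APIC ... */
	sim_irq_restore(flags);
}
```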
Dave Peterson [Tue, 14 Mar 2006 05:20:50 +0000 (21:20 -0800)]
[PATCH] EDAC: disable sysfs interface
- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.
- Disable the default behavior of calling panic() when an uncorrectable
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trond Myklebust [Tue, 14 Mar 2006 05:20:48 +0000 (21:20 -0800)]
[PATCH] SUNRPC: Fix potential deadlock in RPC code
In rpc_wake_up() and rpc_wake_up_status(), it is possible for the call to
__rpc_wake_up_task() to fail if another thread happens to be calling
rpc_wake_up_task() on the same rpc_task.
Trond Myklebust [Tue, 14 Mar 2006 05:20:47 +0000 (21:20 -0800)]
[PATCH] NFSv4: fix mount segfault on errors returned that are < -1000
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
mapping them to kernel errors. Problem spotted by Neil Horman
<nhorman@tuxdriver.com>
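A sketch of the kind of mapping the text calls for. NFSv4 status codes (RFC 3530) are large positive numbers, such as NFS4ERR_BADHANDLE (10001), so a negated one falls far below any kernel errno. The -1000 threshold comes from the commit title. Collapsing such values to -EIO is an assumed fallback, and `map_nfs4_error()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

/* NFSv4 protocol status code from RFC 3530. */
#define NFS4ERR_BADHANDLE 10001

/* Turn a leaked, negated NFSv4 status into a kernel-style errno;
 * ordinary negative errnos and 0 pass through unchanged. */
static int map_nfs4_error(int err)
{
	if (err < -1000)
		return -EIO;
	return err;
}
```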
Trond Myklebust [Tue, 14 Mar 2006 05:20:46 +0000 (21:20 -0800)]
[PATCH] NFS: Fix a potential panic in O_DIRECT
Based on an original patch by Mike O'Connor and Greg Banks of SGI.
Mike states:
A normal user can panic an NFS client and cause a local DoS with
'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the
user buffer starts with a valid mapped page and contains an unmapped page,
will crash in this way. I haven't followed the code, but O_DIRECT reads with
similar user buffers will probably also crash albeit in different ways.
Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
correctly handles get_user_pages() returning an error, which happens if the
first page covered by the user buffer's address range is unmapped. However,
if the first page is mapped but some subsequent page isn't, get_user_pages()
will return a positive number which is less than the number of pages requested
(this behaviour is roughly analogous to a short write() call and appears to be
intentional). nfs_get_user_pages() doesn't detect this and hands the
array of pages (whose last few elements are random rubbish from the newly
allocated array memory) off to its caller, whence they go to
nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
and calculates its own idea of how many pages are in the array from the user
buffer length. Needless to say, when it comes to transmit those uninitialised
page* pointers, we see a crash in the network stack.
GOTO Masanori [Tue, 14 Mar 2006 05:20:44 +0000 (21:20 -0800)]
[PATCH] Fix sigaltstack corruption among cloned threads
This patch fixes alternate signal stack corruption among cloned threads
with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
Currently the alternate signal stack setting is inherited across
clone(... CLONE_SIGHAND | CLONE_VM). But if a parent thread sets up a
sigaltstack, and multiple cloned child threads (plus the parent thread)
then run signal handlers at the same time, the threads may conflict,
because they all use the same alternate signal stack region. Eventually
they get SIGSEGV. This is an undesirable race condition. Note that,
without this patch, child threads created by NPTL pthread_create() also
hit this conflict when the parent thread uses sigaltstack.
To fix this problem, this patch clears the child threads' sigaltstack
information, as exec() does. This behavior follows the SUSv3
specification: for pthread_create(), SUSv3 says "The alternate stack
shall not be inherited" (when new threads are initialized). This means
sigaltstack should be cleared when the sigaltstack memory space is shared
by threads cloned with CLONE_SIGHAND.
Note that I chose the "if (clone_flags & CLONE_SIGHAND)" test because:
- Without any clone_flags test, fork() would not inherit sigaltstack.
- CLONE_VM is another choice, but then vfork() would not inherit sigaltstack.
- CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
- CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
  but this flag has slightly different semantics.
I decided to use CLONE_SIGHAND.
[ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
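The final condition from the bracketed note can be written as a small predicate. This sketch uses the Linux `CLONE_*` constants from `<sched.h>`. `clear_sigaltstack()` is a hypothetical helper name and not the kernel's copy_process() code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Should the new task start with its sigaltstack cleared? Yes when it
 * shares the parent's address space (CLONE_VM) but is not a vfork()
 * child (CLONE_VFORK). */
static bool clear_sigaltstack(unsigned long clone_flags)
{
	return (clone_flags & CLONE_VM) && !(clone_flags & CLONE_VFORK);
}
```

With this test, fork() and vfork() children keep the parent's setting, while pthread_create()-style threads start without one.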
[PATCH] macintosh: correct AC Power info in /proc/pmu/info
Report AC Power present in /proc/pmu/info if there is no battery.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Tue, 14 Mar 2006 05:20:40 +0000 (21:20 -0800)]
[PATCH] mtd_dataflash, fix block vs page erase
Fix a bug in the block-erase optimization for Dataflash; it was using block
erase even for smaller segments that need page erase.
That wouldn't matter for JFFS2, which never erases less than one block
(sometimes several blocks), but for other callers it might.
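A sketch of choosing between block and page erase, so that a block erase is used only when a whole, block-aligned block lies inside the requested range. The geometry (528-byte pages, 8 pages per block) and all names are assumptions for illustration, and real DataFlash parts differ. The sketch only counts the operations it would issue.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration only. */
#define PAGE_SIZE_B   528u
#define PAGES_PER_BLK 8u
#define BLOCK_SIZE_B  (PAGE_SIZE_B * PAGES_PER_BLK)

struct erase_stats { unsigned int blocks, pages; };

/* Erase [addr, addr + len). Use block erase only where a full,
 * block-aligned block fits; page-erase everything else.
 * addr and len are assumed to be multiples of the page size. */
static struct erase_stats erase_range(uint32_t addr, uint32_t len)
{
	struct erase_stats s = { 0, 0 };

	while (len > 0) {
		if (addr % BLOCK_SIZE_B == 0 && len >= BLOCK_SIZE_B) {
			s.blocks++;
			addr += BLOCK_SIZE_B;
			len -= BLOCK_SIZE_B;
		} else {
			s.pages++;
			addr += PAGE_SIZE_B;
			len -= PAGE_SIZE_B;
		}
	}
	return s;
}
```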
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Herbert Xu [Mon, 13 Mar 2006 22:26:12 +0000 (14:26 -0800)]
[TCP]: Fix zero port problem in IPv6
When we link a socket into the hash table, we need to make sure that we
set the num/port fields so that it shows up with a non-zero port value
in proc/netlink and on the wire. This code and comment are copied over
from the IPv4 stack as-is.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andi Kleen [Sun, 12 Mar 2006 22:52:59 +0000 (23:52 +0100)]
[PATCH] x86-64: Fix up handling of non canonical user RIPs
EM64T CPUs have somewhat weird error reporting for non canonical RIPs in
SYSRET.
We can't handle any exceptions there, because the exception handler would
end up running on the user stack, which is unsafe.
To avoid problems, any code that might end up with a user-touched pt_regs
should return using int_ret_from_syscall. int_ret_from_syscall ends up
using IRET, which allows safe exceptions.
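For reference, here is a sketch of what "canonical" means, assuming 48-bit virtual addresses. Bits 63 through 47 must all be equal: all zeros for the lower (user) half, all ones for the upper half. `is_canonical_48()` is a hypothetical helper and not the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical for 48-bit addressing: the top 17 bits (63..47) are
 * either all 0 or all 1, i.e. bits 63..48 sign-extend bit 47. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;   /* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```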
Linus Torvalds [Sun, 12 Mar 2006 22:56:02 +0000 (14:56 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] iwmmxt thread state alignment
[ARM] 3350/1: Enable 1-wire on ARM
[ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
[ARM] 3355/1: NSLU2: remove prompt depends
[ARM] 3354/1: NAS100d: fix power led handling
[ARM] Fix muldi3.S
Russell King [Sun, 12 Mar 2006 22:36:06 +0000 (22:36 +0000)]
[ARM] iwmmxt thread state alignment
This patch removes the reliance of iwmmxt on hand coded alignments.
Since thread_info is always 8K aligned, specifying that fpstate is
8-byte aligned achieves the same effect without needing to resort
to hand coded alignments.
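A sketch of the idea, using GCC's `aligned` attribute on a hypothetical `struct thread_state`. This is not the real ARM thread_info layout. If the containing structure starts on an 8K boundary, as the text says thread_info does, an 8-byte-aligned member offset is enough to make the member itself 8-byte aligned. `fpstate_aligned_at()` checks that arithmetic for a given base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 4-byte field followed by FP state. */
struct thread_state {
	uint32_t flags;
	/* Ask the compiler for 8-byte alignment instead of hand padding. */
	uint64_t fpstate[4] __attribute__((aligned(8)));
};

/* Is fpstate 8-byte aligned when the struct starts at 'base'? */
static int fpstate_aligned_at(uintptr_t base)
{
	return (base + offsetof(struct thread_state, fpstate)) % 8 == 0;
}
```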
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Gregor Maier [Sun, 12 Mar 2006 02:51:25 +0000 (18:51 -0800)]
[NETFILTER]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG
Signed-off-by: Gregor Maier <gregor@net.in.tum.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Haley [Sun, 12 Mar 2006 02:50:14 +0000 (18:50 -0800)]
[IPV6]: fix ipv6_saddr_score struct element
The scope element in the ipv6_saddr_score struct used in
ipv6_dev_get_saddr() is an unsigned integer, but __ipv6_addr_src_scope()
returns a signed integer (and can return -1).
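A small illustration of why the field type matters. Stored in an unsigned field, -1 becomes UINT_MAX and compares as the largest scope rather than the smallest. `src_scope()` and the value 5 are hypothetical stand-ins for __ipv6_addr_src_scope() and a real scope value.

```c
#include <assert.h>

/* Hypothetical stand-in: a scope value, or -1 when unknown. */
static int src_scope(int known)
{
	return known ? 5 : -1;
}

/* Comparison as it behaves with an unsigned scope field... */
static int scope_less_unsigned(unsigned int a, unsigned int b)
{
	return a < b;
}

/* ...and with the field declared as a signed int. */
static int scope_less_signed(int a, int b)
{
	return a < b;
}
```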
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Sun, 12 Mar 2006 01:51:39 +0000 (17:51 -0800)]
[PATCH] drivers/net/e1000/: proper prototypes
This patch moves prototypes of global variables and functions to a header
file.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: John Ronciak <john.ronciak@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ralf Baechle [Wed, 8 Mar 2006 11:49:31 +0000 (11:49 +0000)]
[PATCH] Sparse: Cleanup sgiseeq sparse warnings.
o Make sgiseeq_dump_rings static.
o Delete unused sgiseeq_my_reset.
o Move DEBUG define to beginning where it's easier to spot and will be
seen by <linux/kernel.h> as well.
o Use NULL for pointer initialization.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Adrian Bunk [Wed, 8 Mar 2006 08:06:30 +0000 (00:06 -0800)]
[PATCH] CONFIG_FORCEDETH updates
This patch contains the following possible updates:
- let FORCEDETH no longer depend on EXPERIMENTAL
- remove the "Reverse Engineered" from the option text:
for the user it's important which hardware the driver supports, not
how it was developed
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sam Ravnborg [Wed, 8 Mar 2006 08:06:33 +0000 (00:06 -0800)]
[PATCH] de620: fix section mismatch warning
In the latest -mm, de620 gave the following warning:
WARNING: drivers/net/de620.o - Section mismatch: reference to \
.init.text:de620_probe from .text between 'init_module' (at offset \
0x1682) and 'cleanup_module'
init_module() calls de620_probe(), which is declared __init.
The fix is to declare init_module() __init too.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jon Mason [Fri, 10 Mar 2006 21:12:10 +0000 (15:12 -0600)]
[PATCH] dl2k: DMA freeing error
This patch fixes an error in the dl2k driver's DMA mapping/unmapping.
The adapter uses the upper 16 bits of the DMA address for the buffer
size. However, these bits are not masked off when the DMA address is
referenced, which can lead to errors from trying to free an out-of-range
DMA address.
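A sketch of packing and unpacking such a descriptor word. It assumes a 64-bit field with the DMA address in the low 48 bits and the size in the upper 16 bits, as the text describes. The macro and function names are hypothetical and not dl2k's. The fix corresponds to always going through `desc_addr()` before unmapping or freeing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 48 bits = DMA address, high 16 bits = size. */
#define DESC_ADDR_MASK  0x0000FFFFFFFFFFFFULL
#define DESC_SIZE_SHIFT 48

static uint64_t desc_pack(uint64_t dma_addr, uint16_t size)
{
	return (dma_addr & DESC_ADDR_MASK) |
	       ((uint64_t)size << DESC_SIZE_SHIFT);
}

/* Mask off the size bits before using the value as an address. */
static uint64_t desc_addr(uint64_t desc)
{
	return desc & DESC_ADDR_MASK;
}

static uint16_t desc_size(uint64_t desc)
{
	return (uint16_t)(desc >> DESC_SIZE_SHIFT);
}
```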
Thanks,
Jon
Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
David S. Miller [Sat, 11 Mar 2006 02:08:09 +0000 (18:08 -0800)]
[PATCH] Wrong return value corrupts free object in e1000 driver
For some reason, E1000's ->hard_start_xmit() routine returns -EFAULT
instead of one of the NETDEV_TX_* return codes. In fact, it frees the SKB
before returning this. This makes the queueing layer think the packet
should be requeued, and we subsequently corrupt a freed object.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The patch '[PATCH] RCU signal handling' [1] added an export for
__put_task_struct_cb, a put_task_struct helper newly introduced in that
patch. But put_task_struct couldn't previously be used from modules, as
__put_task_struct wasn't exported. There are no callers of it in modular
code, and it shouldn't be exported because we don't want drivers to hold
references to task_structs.
This patch removes the export and folds __put_task_struct into
__put_task_struct_cb, as there's no other caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Sat, 11 Mar 2006 11:27:16 +0000 (03:27 -0800)]
[PATCH] selinux: tracer SID fix
Fix SELinux to not reset the tracer SID when the child is already being
traced, since selinux_ptrace is also called by proc for access checking
outside of the context of a ptrace attach.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven [Sat, 11 Mar 2006 11:27:15 +0000 (03:27 -0800)]
[PATCH] edac: disable a few sysfs files to avoid them becoming an ABI
Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all
agree are very much wrong. These files shouldn't become part of the ABI by
the 2.6.16 release, so I'd rather have this minimal patch merged to disable
them for now; the real fix can then come during the 2.6.17 devel window.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Badari Pulavarty [Sat, 11 Mar 2006 11:27:14 +0000 (03:27 -0800)]
[PATCH] ext3: fix nobh mode for chattr +j inodes
One can do "chattr +j" on a file to change its journalling mode. Fix
writeback mode with "nobh" handling for it.
Even if we mount an ext3 filesystem in writeback mode with the "nobh"
option, someone can do "chattr +j" on a single file to force it into
journalled mode. In order to do journalling, ext3_block_truncate_page()
needs to fall back to the default case of creating buffers, adding them to
the transaction, etc.
Kirill Korotaev [Sat, 11 Mar 2006 11:27:13 +0000 (03:27 -0800)]
[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside
This patch fixes an illegal __GFP_FS allocation inside an ext3 transaction
in ext3_symlink(). Such an allocation may re-enter ext3 code from
try_to_free_pages. But JBD/ext3 code keeps a pointer to the current journal
handle in task_struct and hence is not reentrant.
This bug led to "Assertion failure in journal_dirty_metadata()" messages.
Dmitry Torokhov [Sat, 11 Mar 2006 05:23:38 +0000 (00:23 -0500)]
[PATCH] Input: psmouse - disable autoresync
Automatic resynchronization in psmouse driver causes problems on some
hardware so disable it by default for now. People with KVM switches
that require resync can still enable it via module parameter or sysfs
attribute.
Jan Beulich [Wed, 22 Feb 2006 12:29:04 +0000 (13:29 +0100)]
[PATCH] kbuild: version.h should depend on .kernelrelease
When a previously built tree is rebuilt using make's -j option, the
version.h check occasionally runs at the same time as the update of
.kernelrelease, leaving UTS_RELEASE as an empty string (and, as a side
effect, causing the entire kernel to be rebuilt).
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Catalin Marinas [Fri, 10 Mar 2006 22:26:47 +0000 (22:26 +0000)]
[ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
Patch from Catalin Marinas
ARM1136 erratum 371025 (category 2) specifies that, under rare
conditions, an invalidate I-cache by MVA (line or range) operation can
fail to invalidate a cache line. The recommended workaround is to
either invalidate the entire I-cache or invalidate the range by
set/way rather than MVA.
Note that for a 16K cache size, invalidating a 4K page by set/way is
equivalent to invalidating the entire I-cache.
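The equivalence is plain arithmetic. Assume, for this sketch, a 4-way set-associative I-cache with 32-byte lines. A 16K cache then has 16384 / (4 * 32) = 128 sets, so each way is 4K. A 4K range touches all 128 sets, and invalidating by set/way must hit every way of each set, which is all 512 lines of the cache. The helper names are hypothetical.

```c
#include <assert.h>

/* Lines a set/way invalidation must hit to cover 'range' bytes:
 * every set the range maps to, in every way. Assumes the range is
 * line-aligned and consecutive lines map to consecutive sets. */
static unsigned int setway_lines(unsigned int cache_size, unsigned int ways,
				 unsigned int line, unsigned int range)
{
	unsigned int sets = cache_size / (ways * line);
	unsigned int range_lines = range / line;
	unsigned int sets_hit = range_lines < sets ? range_lines : sets;

	return sets_hit * ways;
}

/* Total number of lines in the cache. */
static unsigned int total_lines(unsigned int cache_size, unsigned int line)
{
	return cache_size / line;
}
```

With a 32K cache of the same shape, the same 4K range would cover only half of the lines.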
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.
The cache reaper currently tries to free all alien caches and all remote
per cpu pages in each pass of cache_reap. For machines with a large number
of nodes (such as Altix) this may lead to sporadic delays of ~10ms.
Interrupts are disabled while reclaiming, which creates unacceptable delays.
This patch changes that behavior by adding a per cpu reap_node variable.
Instead of attempting to free all caches, we free only one alien cache and
the per cpu pages from one remote node. That reduces the time spent in
cache_reap. However, doing so will lengthen the time it takes to
completely drain all remote per cpu pagesets and all alien caches. The
time needed will grow with the number of nodes in the system. All caches
are drained when they overflow their respective capacity, so the only
drawback here is that a bit of memory may be wasted for a while longer.
Details:
1. Rename drain_remote_pages to drain_node_pages to allow the specification
of the node to drain of pcp pages.
2. Add additional functions init_reap_node, next_reap_node for NUMA
that manage a per cpu reap_node counter.
3. Add a reap_alien function that reaps only from the current reap_node.
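Here is a minimal sketch of a per-CPU rotor like the one described in steps 2 and 3. Each reaper pass handles one node and then advances round-robin, so covering all nodes takes as many passes as there are nodes. `struct reap_rotor`, `init_reap_rotor()` and `reap_node_for_pass()` are hypothetical names. The sketch also assumes nodes are numbered contiguously from 0, which real NUMA systems need not guarantee.

```c
#include <assert.h>

/* Hypothetical per-CPU rotor state. */
struct reap_rotor {
	int node;       /* node to reap on the next pass */
	int nr_nodes;   /* nodes assumed to be numbered 0..nr_nodes-1 */
};

static void init_reap_rotor(struct reap_rotor *r, int this_node, int nr_nodes)
{
	r->nr_nodes = nr_nodes;
	r->node = (this_node + 1) % nr_nodes;   /* start with the next node */
}

/* Return the node to reap in this pass, then advance the rotor. */
static int reap_node_for_pass(struct reap_rotor *r)
{
	int node = r->node;

	r->node = (r->node + 1) % r->nr_nodes;
	return node;
}
```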
For us this seems to be a critical issue. Holdoffs averaging ~7ms cause
some HPC benchmarks to slow down significantly. For example, NAS parallel
slows down dramatically: it has a 12-16 second runtime without the rotor,
compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with
the additional interrupt holdoff reductions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the code tries up to spin_retry times to grab a lock using the cs
instruction. The cs instruction has exclusive access to a memory region
and therefore invalidates the corresponding cache line on all other cpus. If
there is contention on a lock, this leads to cache line thrashing. This can
be avoided by first checking whether a cs instruction is likely to succeed
before actually executing it.
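Below is a sketch of the "check before compare-and-swap" idea, often called test-and-test-and-set. It uses C11 atomics as a portable stand-in for the s390 cs instruction. The lock word layout (0 = free, otherwise an owner id) and the function names are assumptions for illustration, not the s390 spinlock code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock word: 0 = unlocked, otherwise the owner's id. */
typedef struct { atomic_uint owner; } spin_t;

/* Try up to 'retries' times. Read the lock word first with a plain
 * load, which can be served from a shared cache line, and only issue
 * the compare-and-swap when the lock looks free. */
static bool spin_trylock_retry(spin_t *lp, unsigned int id, int retries)
{
	for (int i = 0; i < retries; i++) {
		if (atomic_load_explicit(&lp->owner, memory_order_relaxed) != 0)
			continue;   /* looks taken: skip the CAS */
		unsigned int expected = 0;
		if (atomic_compare_exchange_strong_explicit(&lp->owner, &expected, id,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

static void spin_unlock(spin_t *lp)
{
	atomic_store_explicit(&lp->owner, 0, memory_order_release);
}
```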
Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>