Paul Jackson [Fri, 24 Mar 2006 11:16:09 +0000 (03:16 -0800)]
[PATCH] cpuset memory spread slab cache hooks
Change the kmem_cache_create calls for certain slab caches to support cpuset
memory spreading.
See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
for memory spreading.
The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
caches, and buffer_head. This list may change over time. In particular,
other file system types that are used extensively on large NUMA systems may
want to allow for spreading their directory and inode slab cache entries.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hooks in the slab cache allocator code path for support of NUMA
mempolicies and cpuset memory spreading are in an important code path. Many
systems will use neither feature.
This patch optimizes those hooks down to a single check of some bits in the
current task's task_struct flags. For non-NUMA systems, this hook and related
code is already ifdef'd out.
The optimization is done by using another task flag, set if the task is using
a non-default NUMA mempolicy. Taking this flag bit along with the
PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
memory spreading' patch set, one can check for the combination of any of these
special-case memory placement mechanisms with a single test of the current
task's task_struct flags.
This patch also tightens up the code, to save a few bytes of kernel text
space, and moves some of it out of line. Due to the nested inlines called
from multiple places, we were ending up with three copies of this code, which
once we get off the main code path (for local node allocation) seems a bit
wasteful of instruction memory.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide the slab cache infrastructure to support cpuset memory spreading.
See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading.
This patch provides a slab cache SLAB_MEM_SPREAD flag. If set in the
kmem_cache_create() call defining a slab cache, then any task marked with the
process state flag PF_SPREAD_SLAB will spread memory page allocations for that
cache over all the allowed nodes, instead of preferring the local (faulting)
node.
On systems not configured with CONFIG_NUMA, this results in no change to the
page allocation code path for slab caches.
On systems with cpusets configured in the kernel, but the "memory_spread_slab"
cpuset option not enabled for the current task's cpuset, this adds a call to a
cpuset routine and a failed bit test of the process state flag PF_SPREAD_SLAB.
For tasks so marked, a second inline test is done for the slab cache flag
SLAB_MEM_SPREAD, and if that is set and the allocation is not
in_interrupt(), this adds a call to a cpuset routine that computes which of
the task's mems_allowed nodes should be preferred for this allocation.
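The order of the tests described above can be sketched as a small user-space function. The flag values are hypothetical, and to keep the model self-contained the spread node is passed in precomputed; in the real code the cpuset routine is only called on the branch that returns it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for the model. */
#define PF_SPREAD_SLAB  0x02000000u /* per-task: cpuset has memory_spread_slab */
#define SLAB_MEM_SPREAD 0x00100000u /* per-cache: passed to kmem_cache_create() */

/* Returns the node a slab allocation should prefer. local_node is the
 * node of the current cpu; spread_node stands for what the cpuset
 * routine would return. */
static int slab_pick_node(unsigned int task_flags, unsigned int cache_flags,
			  bool in_irq, int local_node, int spread_node)
{
	assert(local_node >= 0 && spread_node >= 0);
	/* First inline test: is the task marked for slab spreading? */
	if (!(task_flags & PF_SPREAD_SLAB))
		return local_node;
	/* Second test: cache opted in, and not in interrupt context. */
	if ((cache_flags & SLAB_MEM_SPREAD) && !in_irq)
		return spread_node; /* real code calls the cpuset routine here */
	return local_node;
}
```

Unmarked tasks pay only the first failed bit test; interrupt-time allocations keep preferring the local node.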
==> This patch adds another hook into the performance critical
code path to allocating objects from the slab cache, in the
____cache_alloc() chunk, below. The next patch optimizes this
hook, reducing the impact of the combined mempolicy plus memory
spreading hooks on this critical code path to a single check
against the task's task_struct flags word.
This patch provides the generic slab flags and logic needed to apply memory
spreading to a particular slab.
A subsequent patch will mark a few specific slab caches for this placement
policy.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Fri, 24 Mar 2006 11:16:06 +0000 (03:16 -0800)]
[PATCH] cpuset memory spread: slab cache format
Rewrap the overly long source code lines resulting from the previous
patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch
contains only formatting changes, and no function change.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.
If a slab cache is marked SLAB_MEM_SPREAD, then any time a task that is
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.
The following inode and similar caches are marked SLAB_MEM_SPREAD:
The choice of which slab caches to so mark was quite simple. I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch. Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.
Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Fri, 24 Mar 2006 11:16:04 +0000 (03:16 -0800)]
[PATCH] cpuset memory spread page cache implementation and hooks
Change the page cache allocation calls to support cpuset memory spreading.
See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
spreading.
On systems without cpusets configured in the kernel, this is no change.
On systems with cpusets configured in the kernel, but the "memory_spread_page"
cpuset option not enabled for the current task's cpuset, this adds a call to a
cpuset routine and a failed bit test of the process state flag PF_SPREAD_PAGE.
For tasks in cpusets with "memory_spread_page" enabled, this adds a call to a
cpuset routine that computes which of the task's mems_allowed nodes should be
preferred for this allocation.
If memory spreading applies to a particular allocation, then any other NUMA
mempolicy does not apply.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Fri, 24 Mar 2006 11:16:03 +0000 (03:16 -0800)]
[PATCH] cpuset memory spread basic implementation
This patch provides the implementation and cpuset interface for an alternative
memory allocation policy that can be applied to certain kinds of memory
allocations, such as the page cache (file system buffers) and some slab caches
(such as inode caches).
The policy is called "memory spreading." If enabled, it spreads out these
kinds of memory allocations over all the nodes allowed to a task, instead of
preferring to place them on the node where the task is executing.
All other kinds of allocations, including anonymous pages for a task's stack
and data regions, are not affected by this policy choice, and continue to be
allocated preferring the node local to execution, as modified by the NUMA
mempolicy.
There are two boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in-kernel data
structures. They are called 'memory_spread_page' and 'memory_spread_slab'.
If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.
If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
kernel will spread some file system related slab caches, such as those for
inodes and dentries, evenly over all the nodes that the faulting task is allowed to
use, instead of preferring to put those pages on the node where the task is
running.
The implementation is simple. Setting the cpuset flags 'memory_spread_page'
or 'memory_spread_slab' turns on the per-process flags PF_SPREAD_PAGE or
PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
subsequently joins that cpuset. In subsequent patches, the page allocation
calls for the affected page cache and slab caches are modified to perform an
inline check for these flags, and if set, a call to a new routine
cpuset_mem_spread_node() returns the node to prefer for the allocation.
The cpuset_mem_spread_node() routine is also simple. It uses the value of a
per-task rotor cpuset_mem_spread_rotor to select the next node in the current
task's mems_allowed to prefer for the allocation.
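The rotor can be modelled in user space as a round-robin walk over a bitmask. This sketch assumes mems_allowed fits in a 64-bit mask; struct task_model, next_allowed() and mem_spread_node() are hypothetical stand-ins for the task fields and the kernel's nodemask helpers, not the actual cpuset_mem_spread_node() code.

```c
#include <assert.h>

#define MODEL_MAX_NODES 64 /* hypothetical: one bit per node */

struct task_model {
	unsigned long long mems_allowed; /* bit n set => node n allowed */
	int mem_spread_rotor;            /* last node handed out */
};

/* First allowed node strictly after 'after', or MODEL_MAX_NODES. */
static int next_allowed(int after, unsigned long long mask)
{
	for (int n = after + 1; n < MODEL_MAX_NODES; n++)
		if (mask & (1ULL << n))
			return n;
	return MODEL_MAX_NODES;
}

/* Advance the rotor to the next allowed node, wrapping to the first. */
static int mem_spread_node(struct task_model *t)
{
	assert(t->mems_allowed != 0);
	int node = next_allowed(t->mem_spread_rotor, t->mems_allowed);
	if (node == MODEL_MAX_NODES)
		node = next_allowed(-1, t->mems_allowed);
	t->mem_spread_rotor = node;
	return node;
}
```

With mems_allowed = {1, 3, 4} and the rotor starting at 0, successive calls prefer nodes 1, 3, 4, 1, ... so a single reader thread's allocations land evenly across the job's nodes.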
This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large
file system data sets that need to be spread across the several nodes in the
job's cpuset in order to fit. Without this patch, especially for jobs that
might have one thread reading in the data set, the memory allocation across
the nodes in the job's cpuset can become very uneven.
A couple of Copyright year ranges are updated as well. And a couple of email
addresses that can be found in the MAINTAINERS file are removed.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Fri, 24 Mar 2006 11:16:00 +0000 (03:16 -0800)]
[PATCH] cpuset cleanup not not operators
Since the test_bit() bit operator is boolean (returns 0 or 1), the double-not
"!!" operations needed to convert a scalar (zero or nonzero) to a boolean are
not needed.
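A small illustration of when "!!" matters, using a hypothetical model_test_bit() rather than the kernel's test_bit(): a raw mask yields the bit's value (here 32), which "!!" folds to 1, while a helper that already returns 0 or 1 is unchanged by it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical test_bit()-style helper that already returns 0 or 1. */
static int model_test_bit(unsigned int nr, unsigned long word)
{
	assert(nr < sizeof(word) * CHAR_BIT);
	return (int)((word >> nr) & 1UL);
}

/* A raw masked test: returns 0 or the bit's value, not 0 or 1. */
static unsigned long masked(unsigned int nr, unsigned long word)
{
	assert(nr < sizeof(word) * CHAR_BIT);
	return word & (1UL << nr);
}
```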
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] cpusets: only wakeup kswapd for zones in the current cpuset
If we get under some memory pressure in a cpuset (we only scan zones that
are in the cpuset for memory), then kswapd is woken up for all zones. This
patch only wakes up kswapd in zones that are part of the current cpuset.
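The filtering can be sketched as a user-space loop, assuming a zone is "in the cpuset" when its node's bit is set in the cpuset's memory mask. struct zone_model and wake_allowed_kswapds() are hypothetical; the kswapd_woken field just records where the real code would wake kswapd.

```c
#include <assert.h>
#include <stdbool.h>

struct zone_model {
	int node;          /* NUMA node this zone belongs to */
	bool kswapd_woken; /* set where the real code would wake kswapd */
};

/* Wake kswapd only for zones whose node the cpuset allows,
 * instead of for every zone in the zonelist. */
static void wake_allowed_kswapds(struct zone_model *zones, int nzones,
				 unsigned long long cpuset_mems)
{
	for (int i = 0; i < nzones; i++) {
		assert(zones[i].node >= 0 && zones[i].node < 64);
		if (cpuset_mems & (1ULL << zones[i].node))
			zones[i].kswapd_woken = true;
	}
}
```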
Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul E. McKenney [Fri, 24 Mar 2006 11:15:58 +0000 (03:15 -0800)]
[PATCH] rcutorture: tag success/failure line with module parameters
A long-running rcutorture test can overflow dmesg, so that the line
containing the module parameters is lost. Although it is usually possible
to retrieve this information from the log files, it is much better to just
tag it onto the final success/failure line so that it may be easily found.
This patch does just that.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Patrick Mochel <mochel@digitalimplant.org> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Beulich [Fri, 24 Mar 2006 11:15:54 +0000 (03:15 -0800)]
[PATCH] tvec_bases too large for per-cpu data
With internal Xen-enabled kernels we see the kernel's static per-cpu data
area exceed the limit of 32k on x86-64, and even native x86-64 kernels get
fairly close to that limit. I generally question whether it is reasonable
to have data structures several kb in size allocated as per-cpu data when
the space there is rather limited.
The biggest arch-independent consumer is tvec_bases (over 4k on 32-bit
archs, over 8k on 64-bit ones), which now gets converted to use dynamically
allocated memory instead.
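The general shape of such a conversion, sketched in user space: keep only a pointer per cpu and allocate the large body when the cpu is brought up. MODEL_NR_CPUS, struct big_base and init_base() are hypothetical, and calloc() stands in for the kernel allocator; this is not the tvec_bases code itself.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4 /* hypothetical */

/* Stand-in for a per-cpu structure several KB in size. */
struct big_base {
	void *buckets[1024];
};

/* Only a pointer per cpu stays in static storage. */
static struct big_base *bases[MODEL_NR_CPUS];

/* Allocate the cpu's base on first bring-up; idempotent afterwards. */
static int init_base(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!bases[cpu]) {
		bases[cpu] = calloc(1, sizeof(*bases[cpu]));
		if (!bases[cpu])
			return -1; /* a kernel version would return -ENOMEM */
	}
	return 0;
}
```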
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Fri, 24 Mar 2006 11:15:52 +0000 (03:15 -0800)]
[PATCH] fs/9p/: possible cleanups
- mux.c: v9fs_poll_mux() was inline but not static, resulting in needless
object size bloat
- mux.c: remove all "inline"s: gcc should know best what to inline
- #if 0 the following unused global functions:
- 9p.c: v9fs_v9fs_t_flush()
- conv.c: v9fs_create_tauth()
- mux.c: v9fs_mux_rpcnb()
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Fri, 24 Mar 2006 11:15:50 +0000 (03:15 -0800)]
[PATCH] rcu_process_callbacks: don't cli() while testing ->nxtlist
__rcu_process_callbacks() disables interrupts to protect itself from
call_rcu() which adds new entries to ->nxtlist.
However, we can check "->nxtlist != NULL" with interrupts enabled: we can't
get "false positives", because call_rcu() can only change this condition
from 0 to 1.
Bart Samwel [Fri, 24 Mar 2006 11:15:50 +0000 (03:15 -0800)]
[PATCH] Range checking in do_proc_dointvec_(userhz_)jiffies_conv
When (integer) sysctl values are in either seconds or centiseconds, but
represented internally as jiffies, the allowable value range is decreased.
This patch adds range checks to the conversion routines.
For values in seconds: maximum LONG_MAX / HZ.
For values in centiseconds: maximum (LONG_MAX / HZ) * USER_HZ.
(BTW, does anyone else feel that an interface in seconds should not be
accepting negative values?)
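The two bounds can be checked as below. This is a sketch assuming a hypothetical HZ of 1000 and USER_HZ of 100 (so HZ is a multiple of USER_HZ and the centisecond conversion is an exact multiply); the function names are illustrative, not the kernel's conversion routines.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_HZ      1000L /* hypothetical kernel tick rate */
#define MODEL_USER_HZ 100L  /* userspace clock ticks (centiseconds) */

/* Seconds -> jiffies; returns 1 (reject) if secs * HZ would overflow. */
static int secs_to_jiffies_checked(long secs, long *jiffies)
{
	assert(secs >= 0);
	if (secs > LONG_MAX / MODEL_HZ)
		return 1;
	*jiffies = secs * MODEL_HZ;
	return 0;
}

/* Centiseconds -> jiffies with the (LONG_MAX / HZ) * USER_HZ bound.
 * Assumes HZ is a multiple of USER_HZ, so the product cannot overflow. */
static int centisecs_to_jiffies_checked(long cs, long *jiffies)
{
	assert(cs >= 0);
	if (cs > (LONG_MAX / MODEL_HZ) * MODEL_USER_HZ)
		return 1;
	*jiffies = cs * (MODEL_HZ / MODEL_USER_HZ);
	return 0;
}
```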
Bart Samwel [Fri, 24 Mar 2006 11:15:49 +0000 (03:15 -0800)]
[PATCH] Represent laptop_mode as jiffies internally
Make sure that the internal value for /proc/sys/vm/laptop_mode is stored as
jiffies instead of seconds. Let the sysctl interface do the conversions,
instead of doing on-the-fly conversions every time the value is used.
Add a description of the fact that laptop_mode doubles as a flag and a
timeout to the comment above the laptop_mode variable.
are stored as jiffies instead of centiseconds. Let the sysctl interface do
the conversions with full precision using clock_t_to_jiffies, instead of
doing overflow-sensitive on-the-fly conversions every time the values are
used.
Cons: apparent precision loss if HZ is not a multiple of 100, because of
conversion back and forth. This is a common problem for all sysctl values
that use proc_dointvec_userhz_jiffies. (There is only one other in-tree
use, in net/core/neighbour.c.)
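The apparent precision loss can be reproduced with simplified, truncating integer conversions, assuming a hypothetical HZ of 250 (not a multiple of USER_HZ = 100). These model functions are illustrative and are not the kernel's clock_t_to_jiffies()/jiffies_to_clock_t().

```c
#include <assert.h>

#define MODEL_HZ      250L /* hypothetical, not a multiple of USER_HZ */
#define MODEL_USER_HZ 100L

static long clock_t_to_jiffies_model(long ticks)
{
	assert(ticks >= 0);
	return ticks * MODEL_HZ / MODEL_USER_HZ; /* truncates */
}

static long jiffies_to_clock_t_model(long jiffies)
{
	assert(jiffies >= 0);
	return jiffies * MODEL_USER_HZ / MODEL_HZ; /* truncates again */
}
```

In this model, writing 5 centiseconds stores 12 jiffies (12.5 truncated), which reads back as 4 centiseconds; values such as 500 survive the round trip unchanged.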
Paul Jackson [Fri, 24 Mar 2006 11:15:46 +0000 (03:15 -0800)]
[PATCH] bitmap: region restructuring
Restructure the bitmap_*_region() operations, to avoid code duplication.
Also reduces binary text size by about 100 bytes (ia64 arch). The original
Bottomley bitmap_*_region patch added about 1000 bytes of compiled kernel text
(ia64). The Mundt multiword extension added another 600 bytes, and this
restructuring patch gets back about 100 bytes.
But the real motivation was the reduced amount of duplicated code.
Tested by Paul Mundt using <= BITS_PER_LONG as well as power of
2 aligned multiword spanning allocations.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mundt [Fri, 24 Mar 2006 11:15:45 +0000 (03:15 -0800)]
[PATCH] bitmap: region multiword spanning support
Add support to the lib/bitmap.c bitmap_*_region() routines
for bitmap regions larger than one word (nbits > BITS_PER_LONG). This removes
a BUG_ON() in lib/bitmap.c.
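To make "multiword spanning" concrete, here is a bit-at-a-time sketch of finding and claiming a free, naturally aligned region of 2^order bits that may cross word boundaries. The helper names are hypothetical, and the real lib/bitmap.c code works a word at a time rather than per bit.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static int region_is_free(const unsigned long *map, int pos, int nbits)
{
	for (int i = pos; i < pos + nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return 0;
	return 1;
}

static void region_set(unsigned long *map, int pos, int nbits)
{
	for (int i = pos; i < pos + nbits; i++)
		map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Claim a free region of 2^order bits aligned to its own size, even when
 * it spans several words. Returns its first bit, or -1 if none is free. */
static int find_free_region(unsigned long *map, int bits, int order)
{
	assert(order >= 0 && order < 30);
	int nbits = 1 << order;
	for (int pos = 0; pos + nbits <= bits; pos += nbits) {
		if (region_is_free(map, pos, nbits)) {
			region_set(map, pos, nbits);
			return pos;
		}
	}
	return -1;
}
```

In a 256-bit map, a 2-bit allocation at bit 0 forces a following 128-bit request to the aligned slot at bit 128, which spans two 64-bit words.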
I have an updated store queue API for SH that is currently using this with
relative success, and at first glance, it seems like this could be useful for
x86 (arch/i386/kernel/pci-dma.c) as well. Particularly for anything using
dma_declare_coherent_memory() on large areas and that attempts to allocate
large buffers from that space.
Paul Jackson also did some cleanup to this patch.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Fri, 24 Mar 2006 11:15:44 +0000 (03:15 -0800)]
[PATCH] bitmap: region cleanup
Paul Mundt <lethal@linux-sh.org> says:
This patch set implements a number of patches to clean up and restructure the
bitmap region code, in addition to extending the interface to support
multiword spanning allocations.
The current implementation (before this patch set) is limited by only being
able to allocate pages <= BITS_PER_LONG, as noted by the strategically
positioned BUG_ON() at lib/bitmap.c:752:
/* We don't do regions of pages > BITS_PER_LONG. The
* algorithm would be a simple look for multiple zeros in the
* array, but there's no driver today that needs this. If you
* trip this BUG(), you get to code it... */
BUG_ON(pages > BITS_PER_LONG);
As I seem to have been the first person to trigger this, the result ends up
being the following patch set with the help of Paul Jackson.
The final patch in the series eliminates quite a bit of code duplication, so
the bitmap code size ends up being smaller than the current implementation as
an added bonus.
After these are applied, it should already be possible to do multiword
allocations with dma_alloc_coherent() out of ranges established by
dma_declare_coherent_memory() on x86 without having to change any of the code,
and the SH store queue API will follow up on this as the other user that needs
support for this.
This patch:
Some code cleanup on the lib/bitmap.c bitmap_*_region() routines:
* spacing
* variable names
* comments
No functional change.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tobias Klauser [Fri, 24 Mar 2006 11:15:34 +0000 (03:15 -0800)]
[PATCH] fs: Use ARRAY_SIZE macro
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE. Some trailing whitespaces are also deleted.
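For reference, a simplified equivalent of the macro and its use (the kernel's own definition lives in include/linux/kernel.h). Like the open-coded form it replaces, it only works on true arrays, not on pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified equivalent of the kernel macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const char *const fs_names[] = { "ext2", "ntfs", "xfs" };

static size_t count_names(void)
{
	/* Replaces sizeof(fs_names) / sizeof(fs_names[0]). */
	return ARRAY_SIZE(fs_names);
}
```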
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Cc: David Howells <dhowells@redhat.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Rossman [Fri, 24 Mar 2006 11:15:30 +0000 (03:15 -0800)]
[PATCH] s390: CEX2A crt message length
Undetected edge case for CRT messages to CEX2A caused length to be too short,
thus truncating the message. The solution was to check a different variable
which actually determines which key type is being used.
Increment version number in z90main.c to correct level of 1.3.3, fix copyright
year and add comment about bitlength limit of CEX2A.
Signed-off-by: Eric Rossman <edrossma@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stefan Bader [Fri, 24 Mar 2006 11:15:29 +0000 (03:15 -0800)]
[PATCH] s390: 3590 tape driver
Michael Holzheu <holzheu@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stefan Bader <shbader@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Holzheu [Fri, 24 Mar 2006 11:15:28 +0000 (03:15 -0800)]
[PATCH] s390: fix endless retry loop in tape driver
If a tape device is assigned to another host, the interrupt for the assign
operation comes back with deferred condition code 1. Under some conditions
this can lead to an endless loop of retries. Check if the current request is
still in IO in deferred condition code handling and prevent retries when the
request has already been cancelled.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Holzheu [Fri, 24 Mar 2006 11:15:27 +0000 (03:15 -0800)]
[PATCH] s390: tape operation abortion leads to panic
When a request is aborted because of a signal, we currently stop the request
via csch, but we do not always wait for the interrupt of the csch. We free the
request structure, and therefore when the interrupt for the csch operation is
presented, the request object is no longer valid and an invalid callback
pointer is used.
To fix this, wait until the interrupt for csch arrives and until
wait_event_interruptible() does not return -ERESTARTSYS.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stefan Bader [Fri, 24 Mar 2006 11:15:26 +0000 (03:15 -0800)]
[PATCH] s390: tape retry flooding by deferred CC in interrupt
If a deferred CC happens, there will be lots of messages, because the retry is
done immediately in the interrupt handler, which can be too fast. To avoid this,
requeue the request and schedule the queue to be processed.
Signed-off-by: Stefan Bader <shbader@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stefan Weinhuber [Fri, 24 Mar 2006 11:15:25 +0000 (03:15 -0800)]
[PATCH] s390: dasd extended error reporting
The DASD extended error reporting is a facility that allows getting detailed
information about certain problems in the DASD I/O. This information can be
used to implement fail-over applications that can recover from these problems.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Horst Hummel [Fri, 24 Mar 2006 11:15:24 +0000 (03:15 -0800)]
[PATCH] s390: random values in result of BIODASDINFO2
Use kzalloc to get a zeroed buffer for the structure returned to user space by
the BIODASDINFO2 ioctl. Not all fields are set up, e.g. the read_devno is
missing.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dasd_cmd just implements three ioctls which are wrappers around functionality
in the core kernel or other modules. When merged into dasd_mod, they
add just 22 lines of code, which is far less than the amount of code removed in
the last two patches, and which doesn't spill into another 4k page when built
modular, while removing a 128-line module.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] s390: use normal switch statement for ioctls in dasd_ioctl.c
Add an ->ioctl method to the dasd_discipline structure. This allows applying
the same kind of cleanups the last patch applied to dasd_ioctl.c to
dasd_eckd.c (the only dasd discipline with special ioctls) as well.
Again, lots of code removed. While auditing the ioctls, I found two fishy
return value propagations from copy_{from,to}_user; maintainers, please check
those. I've marked them with XXX comments.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] s390: use normal switch statement for ioctls in dasd_ioctl.c
Handle ioctls implemented in dasd_ioctl through the normal switch statement
that most drivers use instead of the awkward dasd_ioctl_no_register routine.
This avoids searching a linear list on every call to dasd_ioctl(), and allows
giving the various ioctl implementation functions sane prototypes, as well as
moving the check for bdev->bd_disk->private_data from the individual functions
to dasd_ioctl. (I think it can't actually ever be NULL, but let's keep that
for later.)
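The shape of switch-based dispatch described above, as a user-space sketch. The command numbers, struct dev_model and the do_* handlers are all hypothetical; the point is that the private-data check happens once, up front, each handler gets a typed prototype, and no list is walked per call. Returning -EINVAL for unknown commands is a choice made for the model.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers and device state for the model. */
enum { MODEL_IOC_ENABLE = 1, MODEL_IOC_DISABLE = 2, MODEL_IOC_INFO = 3 };

struct dev_model {
	int enabled;
};

static int do_enable(struct dev_model *d)  { d->enabled = 1; return 0; }
static int do_disable(struct dev_model *d) { d->enabled = 0; return 0; }
static int do_info(struct dev_model *d)    { return d->enabled; }

static int model_ioctl(struct dev_model *d, unsigned int cmd)
{
	if (!d) /* common private-data check, done once */
		return -ENODEV;
	switch (cmd) {
	case MODEL_IOC_ENABLE:
		return do_enable(d);
	case MODEL_IOC_DISABLE:
		return do_disable(d);
	case MODEL_IOC_INFO:
		return do_info(d);
	default:
		return -EINVAL;
	}
}
```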
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Ryan [Fri, 24 Mar 2006 11:15:17 +0000 (03:15 -0800)]
[PATCH] s390: cpu up retries
Retry starting of new cpu if sigp restart returns condition code 2 (busy).
Signed-off-by: Michael Ryan <ryan@funsoft.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cornelia Huck [Fri, 24 Mar 2006 11:15:13 +0000 (03:15 -0800)]
[PATCH] s390: cio documentation update
Update documentation of the common I/O layer:
- Add MSS-specific example.
- Add more information on ccwgroup devices.
- Add channel path type attribute.
- Fix typo.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cornelia Huck [Fri, 24 Mar 2006 11:15:12 +0000 (03:15 -0800)]
[PATCH] s390: wrong interrupt delivered for hsch() or csch()
When cio waits for the interrupt for a basic sense, interrupts for hsch() or
csch() issued in the meantime are wrongly counted as interrupts for the basic
sense and the accumulated irb is passed to the device driver. In
ccw_device_w4sense(), check for clear or halt function in the irb and pass the
irb for the csch() or hsch() to the device driver.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Akinobu Mita [Fri, 24 Mar 2006 11:15:11 +0000 (03:15 -0800)]
[PATCH] x86_64: {set,clear,test}_bit() related cleanup and pci_mmcfg_init() fix
While working on this patch set, I found several possible cleanups on x86-64
and ia64.
akpm: I stole this from Andi's queue.
Not only does it clean up bitops. It also unrelatedly changes the prototype
of pci_mmcfg_init() and removes its arch_initcall(). It seems that the wrong
two patches got joined together, but this is the one which has been tested.
This patch fixes the current x86_64 build error (the pci_mmcfg_init()
declaration in arch/i386/pci/pci.h disagrees with the definition in
arch/x86_64/pci/mmconfig.c).
This also means that x86_64's pci_mmcfg_init() gets called in the same (new)
manner as x86's: from arch/i386/pci/init.c:pci_access_init(), rather than via
initcall.
The bitops cleanups came along for free.
All this worked OK in -mm testing (since 2.6.16-rc4-mm1) because x86_64 was
tested with both patches applied.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Con Kolivas <kernel@kolivas.org> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Theodore Ts'o [Fri, 24 Mar 2006 11:15:10 +0000 (03:15 -0800)]
[PATCH] vfs: MS_VERBOSE should be MS_SILENT
The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
"don't be verbose". This is confusing and counter-intuitive.
In addition, there is also no way to set the MS_VERBOSE flag in the
mount(8) program in util-linux, but interestingly, it does define options
that would do the right thing if MS_SILENT were defined, which
unfortunately we do not:
Patrick McHardy [Sat, 7 Jan 2006 23:44:15 +0000 (00:44 +0100)]
[PATCH] W1: Remove incorrect MODULE_ALIAS
The w1 netlink socket is created by a hardware specific driver calling
w1_add_master_device, so there is no point in including a module alias
for netlink autoloading in the core.
Adrian Bunk [Tue, 13 Dec 2005 22:04:33 +0000 (14:04 -0800)]
[PATCH] w1: misc cleanups
This patch contains the following cleanups:
- make needlessly global code static
- declarations for global code belong in header files
- w1.c: #if 0 the unused struct w1_slave_device
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs-2.6:
NTFS: 2.1.27 - Various bug fixes and cleanups.
NTFS: Semaphore to mutex conversion.
NTFS: Handle the recently introduced -ENAMETOOLONG return value from
NTFS: Add a missing call to flush_dcache_mft_record_page() in
NTFS: Fix a bug in fs/ntfs/inode.c::ntfs_read_locked_index_inode() where we
NTFS: Improve comments on file attribute flags in fs/ntfs/layout.h.
NTFS: Limit name length in fs/ntfs/unistr.c::ntfs_nlstoucs() to maximum
NTFS: Remove all the make_bad_inode() calls. This should only be called
NTFS: Add support for sparse files which have a compression unit of 0.
NTFS: Fix comparison of $MFT and $MFTMirr to not bail out when there are
NTFS: Use buffer_migrate_page() for the ->migratepage function of all ntfs
NTFS: Fix a buggette in an "should be impossible" case handling where we
NTFS: Fix an (innocent) off-by-one error in the runlist code.
NTFS: Fix two compiler warnings on Alpha. Thanks to Andrew Morton for
Linus Torvalds [Fri, 24 Mar 2006 00:24:24 +0000 (16:24 -0800)]
Merge branch 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
[PATCH] relay: consolidate sendfile() and read() code
[PATCH] relay: add sendfile() support
[PATCH] relay: migrate from relayfs to a generic relay API
Linus Torvalds [Thu, 23 Mar 2006 23:28:51 +0000 (15:28 -0800)]
Merge git://oss.sgi.com:8090/oss/git/xfs-2.6
* git://oss.sgi.com:8090/oss/git/xfs-2.6: (71 commits)
[XFS] Sync up one/two other minor changes missed in previous merges.
[XFS] Reenable the noikeep (delete inode cluster space) option by default.
[XFS] Check that a page has dirty buffers before finding it acceptable for
[XFS] Fixup naming inconsistencies found by Pekka Enberg and one from Jan
[XFS] Explain the race closed by the addition of vn_iowait() to the start
[XFS] Fixing the error caused by the conflict between DIO Write's
[XFS] Fixing KDB's xrwtrc command, also added the current process id into
[XFS] Fix compiler warning from xfs_file_compat_invis_ioctl prototype.
[XFS] remove bogus INT_GET for u8 variables in xfs_dir_leaf.c
[XFS] endianess annotations for xfs_da_node_hdr_t
[XFS] endianess annotations for xfs_da_node_entry_t
[XFS] store xfs_attr_inactive_list_t in native endian
[XFS] store xfs_attr_sf_sort in native endian
[XFS] endianess annotations for xfs_attr_shortform_t
[XFS] endianess annotations for xfs_attr_leaf_name_remote_t
[XFS] endianess annotations for xfs_attr_leaf_name_local_t
[XFS] endianess annotations for xfs_attr_leaf_entry_t
[XFS] endianess annotations for xfs_attr_leaf_hdr_t
[XFS] remove bogus INT_GET on u8 variables in xfs_dir2_block.c
[XFS] endianess annotations for xfs_da_blkinfo_t
...
Kristen Accardi [Fri, 3 Mar 2006 18:16:05 +0000 (10:16 -0800)]
[PATCH] PCI Hotplug: add common acpi functions to core
shpchprm_acpi.c and pciehprm_acpi.c are nearly identical. In addition,
there are functions in both these files that are also in acpiphp_glue.c.
This patch removes duplicate functions from shpchp, pciehp, and
acpiphp and moves this functionality to pci_hotplug, as it is not
hardware specific. It also gets rid of the shpchprm* and pciehprm* files,
since they are no longer needed: shpchprm_nonacpi.c, pciehprm_nonacpi.c
and shpchprm_legacy.c are identical and can be replaced with a
macro.
This patch also changes acpiphp to use the common hpp code.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kenji Kaneshige [Mon, 27 Feb 2006 13:15:49 +0000 (22:15 +0900)]
[PATCH] acpiphp: Scan slots under the nested P2P bridge
The current ACPIPHP driver scans only the slots under the top-level
PCI-to-PCI bridge, so hotplug PCI slots under a nested PCI-to-PCI bridge
are not detected. For example, if the system has an ACPI namespace
like the one below, the hotplug slots would not be detected.
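The fix amounts to recursing into nested bridges instead of stopping at the top level. A minimal sketch, assuming a hypothetical `struct bridge` tree (not the real acpiphp data structures) in which each bridge records its own hotplug slots and its child bridges:

```c
#include <stddef.h>

/* Hypothetical model of a PCI-to-PCI bridge; not the acpiphp structures. */
struct bridge {
	int nr_slots;             /* hotplug slots directly on this bridge */
	struct bridge **children; /* nested P2P bridges below this one */
	size_t nr_children;
};

/* Old behaviour (sketch): only slots on the top-level bridge are seen. */
int count_slots_top_level(const struct bridge *br)
{
	return br->nr_slots;
}

/* New behaviour (sketch): also walk every nested bridge recursively. */
int count_slots_recursive(const struct bridge *br)
{
	int total = br->nr_slots;
	size_t i;

	for (i = 0; i < br->nr_children; i++)
		total += count_slots_recursive(br->children[i]);
	return total;
}
```

With one slot on the top-level bridge and two on a nested bridge, the top-level-only scan finds one slot, while the recursive scan finds all three.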
Matthew Wilcox [Mon, 6 Mar 2006 05:33:34 +0000 (22:33 -0700)]
[PATCH] PCI: Provide a boot parameter to disable MSI
Several drivers are starting to grow options to disable MSI. However,
it's often a host chipset issue, not something which individual drivers
should handle. So we add the pci=nomsi kernel parameter to allow the user
to disable MSI modes for systems we haven't added to the quirk list yet.
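A user-space sketch of scanning a comma-separated `pci=` value for `nomsi` and exposing one global policy that drivers consult instead of growing their own options. `struct pci_options`, `parse_pci_options()` and `msi_allowed()` are hypothetical illustrations, not the kernel's actual parser or variables:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical global policy flags; not the kernel's real variables. */
struct pci_options {
	bool msi_disabled;
};

/*
 * Parse a "pci=" value such as "assign-busses,nomsi". Each comma-separated
 * token is compared exactly against known names; unknown tokens are ignored.
 */
void parse_pci_options(const char *value, struct pci_options *opts)
{
	const char *p = value;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len == strlen("nomsi") && strncmp(p, "nomsi", len) == 0)
			opts->msi_disabled = true;

		if (!end)
			break;
		p = end + 1;
	}
}

/* A driver checks the global policy before trying to enable MSI. */
bool msi_allowed(const struct pci_options *opts)
{
	return !opts->msi_disabled;
}
```

Matching on the whole token (length plus contents) keeps a hypothetical option like `nomsix` from being mistaken for `nomsi`.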
Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adrian Bunk [Sun, 26 Feb 2006 21:16:51 +0000 (22:16 +0100)]
[PATCH] PCI: cpqphp_ctrl.c: board_replaced(): remove dead code
The Coverity checker correctly noted that, in function board_replaced() in
drivers/pci/hotplug/cpqphp_ctrl.c, the variable src always has the
value 8, and therefore much of the code after the

	...
	if (rc || src) {
		...
		if (rc)
			return rc;
		else
			return 1;
	}
	...

can never be reached.
This patch removes the unreachable code in this function fixing kernel
Bugzilla #6073.
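Why the tail is dead can be shown with a reduced model. This is a hypothetical sketch, not the actual board_replaced() body: once src is a nonzero constant, the `if (rc || src)` branch is always taken and always returns, so nothing after it can run.

```c
/* Reduced, hypothetical model of the control flow the checker flagged. */
int board_replaced_model(int rc, int *reached_tail)
{
	int src = 8;            /* always 8, as the checker noted */

	*reached_tail = 0;
	if (rc || src) {        /* src != 0, so this condition is always true */
		if (rc)
			return rc;
		else
			return 1;
	}

	*reached_tail = 1;      /* unreachable: the branch above always returns */
	return 0;
}
```

Whatever rc is, the function returns from inside the branch and reached_tail stays 0, which is why the code after the branch can be deleted.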
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
MUNEDA Takahiro [Fri, 24 Feb 2006 01:56:08 +0000 (17:56 -0800)]
[PATCH] acpiphp - slot management fix - V4
o This patch removes IDs (for slot management).
o This patch removes the slot register/unregister processes
  from the init/exit phases. Instead, it adds these processes
  to the bridge add/cleanup phases.
o Currently, this change has no functional effect, but it
  is needed to support P2P bridges (with hotplug
  slots).
Kristen Accardi [Fri, 24 Feb 2006 01:56:06 +0000 (17:56 -0800)]
[PATCH] acpi: remove dock event handling from ibm_acpi
Remove dock station support from ibm_acpi by default. This support has
been put into acpiphp instead. Allow ibm_acpi to continue to provide
docking station support via a config option for laptops/docking stations
that are not supported by acpiphp.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kristen Accardi [Fri, 24 Feb 2006 01:56:03 +0000 (17:56 -0800)]
[PATCH] acpiphp: add dock event handling
These patches add generic dock event handling to acpiphp. If there are
pci devices that need to be inserted/removed after the dock event, the
event notification will be handed down to the normal pci hotplug event
handler in acpiphp so that new bridges/devices can be enumerated.
Because some dock stations do not have pci bridges or pci devices that
need to be inserted after a dock, acpiphp will remain loaded to handle
dock events even if no hotpluggable pci slots are discovered.
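A sketch of the dispatch logic described above. All names here (`struct dock_station`, `handle_dock_event()`, `pci_hotplug_event()`, the event constants) are hypothetical and do not correspond to the actual acpiphp code: the dock handler always runs, and it hands the event to the PCI hotplug path only when the dock carries PCI bridges or devices.

```c
#include <stdbool.h>

/* Hypothetical event codes and state; not the real ACPI/acpiphp values. */
enum dock_event { DOCK_EVENT_DOCK, DOCK_EVENT_UNDOCK };

struct dock_station {
	bool docked;
	bool has_pci_devices;   /* dock exposes bridges/devices to enumerate */
	int hotplug_calls;      /* counts hand-offs to the PCI hotplug path */
};

/* Stand-in for the normal PCI hotplug event handler. */
static void pci_hotplug_event(struct dock_station *ds, enum dock_event ev)
{
	(void)ev;
	ds->hotplug_calls++;    /* real code would enumerate or remove devices */
}

void handle_dock_event(struct dock_station *ds, enum dock_event ev)
{
	ds->docked = (ev == DOCK_EVENT_DOCK);

	/* Only docks with PCI content need bus enumeration after the event. */
	if (ds->has_pci_devices)
		pci_hotplug_event(ds, ev);
}
```

A dock without PCI content still has its state tracked, which mirrors why acpiphp stays loaded even when no hotpluggable slots exist.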
You probably need to have the pci=assign-busses kernel parameter enabled
to use these patches, and you must not let ibm_acpi handle docking
notifications while using this patch.
This patch incorporates feedback provided by many.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kristen Accardi [Fri, 24 Feb 2006 01:55:58 +0000 (17:55 -0800)]
[PATCH] acpiphp: add new bus to acpi
If we add a new bridge with subordinate busses, we should make sure
that ACPI is notified, so that the PRT (if present) can be read and drivers
that have registered on this bus will be notified when it is started.
Also make sure to use the max reserved bus number as the starting point
for the bus scan.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Bernhard Kaindl [Sat, 18 Feb 2006 09:36:55 +0000 (01:36 -0800)]
[PATCH] PCI: PCI/Cardbus cards hidden, needs pci=assign-busses to fix
"In some cases, especially on modern laptops with a lot of PCI and cardbus
bridges, we're unable to assign correct secondary/subordinate bus numbers
to all cardbus bridges due to BIOS limitations unless we are using
"pci=assign-busses" boot option." -- Ivan Kokshaysky (from a patch comment)
Without it, inserted Cardbus cards are never seen by PCI, because the parent
PCI-PCI bridge of the Cardbus bridge does not pass and translate Type 1 PCI
configuration cycles correctly, and the system fails to find and
initialise the PCI devices in the system.
Reference: PCI-PCI Bridges: PCI Configuration Cycles and PCI Bus Numbering:
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node72.html
The reason for this is that:
``All PCI busses located behind a PCI-PCI bridge must reside between the
secondary bus number and the subordinate bus number (inclusive).''
"pci=assign-busses" makes pcibios_assign_all_busses return 1 and this
turns on PCI renumbering during PCI probing.
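The inclusive-range rule quoted above reduces to a one-line check. A minimal sketch with a hypothetical `struct p2p_bridge`; the bus numbers used below are illustrative only. It shows why a subordinate number that the BIOS set too low hides the CardBus bus: configuration cycles for a bus outside [secondary, subordinate] never get past the parent bridge.

```c
#include <stdbool.h>

/* Bus-number registers of a PCI-PCI bridge (hypothetical minimal model). */
struct p2p_bridge {
	unsigned char secondary;   /* bus directly behind the bridge */
	unsigned char subordinate; /* highest bus number behind the bridge */
};

/*
 * A configuration cycle for `bus` is passed on by the bridge (converted to
 * Type 0 if bus == secondary, forwarded as Type 1 otherwise) only when
 * secondary <= bus <= subordinate, inclusive on both ends.
 */
bool bridge_forwards_bus(const struct p2p_bridge *br, unsigned char bus)
{
	return bus >= br->secondary && bus <= br->subordinate;
}
```

For example, a parent bridge with secondary 2 and subordinate 2 never passes on cycles for a CardBus bus numbered 3; renumbering so that subordinate covers 3 makes the card visible.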
Alan suggested using DMI to set assign-busses automatically on problem systems.
The only question for me was where to put it. I put it into
pcibios_scan_root(), directly before the PCI bus is scanned, because that
function is called from the legacy, ACPI and NUMA code paths, so it can be
the one place for all systems and configurations which may need it.
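A user-space sketch of the table-driven approach: compare the machine's DMI vendor and product strings against a list and force bus renumbering before the root bus is scanned. The struct, the function names and the table strings are hypothetical placeholders; the kernel has its own DMI matching infrastructure, and real entries would use the exact strings each affected machine reports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: exact DMI strings to match. */
struct assign_busses_quirk {
	const char *sys_vendor;
	const char *product_name;
};

/* Placeholder entries only; not the real DMI strings of affected laptops. */
static const struct assign_busses_quirk quirks[] = {
	{ "EXAMPLE VENDOR", "EXAMPLE LAPTOP 1" },
	{ "EXAMPLE VENDOR", "EXAMPLE LAPTOP 2" },
};

static bool assign_all_busses; /* stands in for the pci=assign-busses flag */

/* Called once, directly before the root bus is scanned. */
void check_assign_busses_quirks(const char *vendor, const char *product)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (strcmp(vendor, quirks[i].sys_vendor) == 0 &&
		    strcmp(product, quirks[i].product_name) == 0) {
			assign_all_busses = true;
			return;
		}
	}
}

/* Plays the role of pcibios_assign_all_busses(): nonzero means renumber. */
int assign_all_busses_enabled(void)
{
	return assign_all_busses ? 1 : 0;
}
```

Doing the check in one function that every scan path calls is what lets a single table cover the legacy, ACPI and NUMA cases.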
AMD64 laptops are also affected and fixed by assign-busses. Because this
code is also included from arch/x86_64/pci/, the same placement also works
for x86_64 kernels; I only ifdef'ed the x86-only laptop in this example.
Systems known or assumed to be affected and fixed by it (found by googling):
* ASUS Z71V and L3s
* Samsung X20
* Compaq R3140us and all Compaq R3000 series laptops with TI1620 Controller,
also Compaq R4000 series (from a kernel.org bugreport)
* HP zv5000z (AMD64 3700+, known that fixup_parent_subordinate_busnr fixes it)
* HP zv5200z
* IBM ThinkPad 240
* An IBM ThinkPad (1.8 GHz Pentium M), debugged by Pavel Machek, which
gives the corresponding message that detects the possible problem.
* MSI S260 / Medion SIM 2100 MD 95600
The patch also expands the "try pci=assign-busses" warning so testers will
help us to update the DMI table.
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Grant Grundler [Fri, 17 Feb 2006 07:58:29 +0000 (23:58 -0800)]
[PATCH] PCI: fix problems with MSI-X on ia64
Use "unsigned long" when dealing with PCI resources.
The BAR Indicator Register (BIR) can be a 64-bit value
or the resource could be a 64-bit host physical address.
Enables ib_mthca and cciss drivers to use MSI-X on ia64 HW.
The problem showed up now because of new system firmware on one platform.
The symptom is either memory corruption or an MCA.
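The truncation hazard can be illustrated in user space. In the MSI-X capability, the Table Offset/BIR register packs the BAR Indicator into its low three bits and the table offset into the remaining bits; the table's address is the indicated BAR's base plus that offset. This sketch uses `uint64_t` instead of `unsigned long` so it behaves the same on 32- and 64-bit hosts (on ia64, `unsigned long` is 64 bits); the function names are hypothetical.

```c
#include <stdint.h>

/* Low 3 bits of the MSI-X Table Offset/BIR register select the BAR. */
#define MSIX_BIR_MASK 0x7u

unsigned int msix_bir(uint32_t table_reg)
{
	return table_reg & MSIX_BIR_MASK;
}

/* Correct: keep the full 64-bit host physical address. */
uint64_t msix_table_phys(uint64_t bar_start, uint32_t table_reg)
{
	return bar_start + (table_reg & ~MSIX_BIR_MASK);
}

/* Buggy pattern: squeezing the address into 32 bits drops the high bits. */
uint32_t msix_table_phys_truncated(uint64_t bar_start, uint32_t table_reg)
{
	return (uint32_t)(bar_start + (table_reg & ~MSIX_BIR_MASK));
}
```

With a BAR at 0x100000000 and a table offset of 0x2000, the correct address is 0x100002000, while the truncated variant yields 0x2000, which points at unrelated memory.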
The second part of this patch deals with "useless" code.
We walk through the steps to find the phys_addr and then
don't use the result. I suspect the intent was to zero
out the respective MSI-X entry but I'm not sure at the moment.
Delete the code inside the #if 0/#endif if it's really
not needed.
Signed-off-by: Grant Grundler <iod00d@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>