Andy Whitcroft [Fri, 8 Feb 2008 12:20:54 +0000 (04:20 -0800)]
update checkpatch.pl to version 0.13
This version brings a large number of fixes which have built up over
the Christmas period. Mostly these are fixes for false positives, both
through improvments to unary checks and possible type detection. It
also brings new checks for while location and CVS keywords. Of note:
- a number of fixes to unary detection
- detection of a number of new forms of types to improve type matching
- better inline handling
- recognision of '%' as an operator
Andy Whitcroft (28):
Version: 0.13
unary detection: maintain bracket state across lines
move to pre-sanitising the entire file
the text of a #error statement should be treated like it is in quotes
line sanitisation needs to target double backslash correctly
tighten comment guestimation for lines starting ' * '
debug: add a debug framework
prevent unclosed single quotes from spreading
add % as an operator
the text of a #warning statement should be treated like it is in quotes
possible matching applies in typedefs
single statement block checks must not trigger when two or more statements
possible types: local variables may also be const
treat inline as a type attribute to even when out of place
possible types: sparse annotations are valid indicators
possible types: beef up the possible type testing
check for hanging while statements on the wrong line
utf8 checks need to occur against the raw lines
function brace checks should use any whitespece matches
comments should take up space in the line when sanitised
remove debugging from if assignment checks
possible types -- ensure we detect all pointer casts
fix tests for function spacing in the presence of #define
clean up the UTF-8 error message to be clearer
test-lib: invert the status report, output success counts
detect and report CVS keywords
tests: break out tests
Add $Id$ to the CVS keyword checks
Benny Halevy (1):
checkpatch.pl: recognize the #elif preprocessor directive
Geert Uytterhoeven (1):
print the filenames of patches where available
Mauro Carvalho Chehab (1):
Fix missing \n in checkpatch.pl
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 8 Feb 2008 12:20:51 +0000 (04:20 -0800)]
udf: change maintainer
I've tried to contact Ben Fennema a few times but without success. Since I'm
currently probably closest to being an UDF maintainer, I guess it's fine to
also change the entry in MAINTAINERS.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <bfennema@falcon.csc.calpoly.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 8 Feb 2008 12:20:51 +0000 (04:20 -0800)]
udf: fix adding entry to a directory
When adding directory entry to a directory, we have to properly increase
length of the last extent. Handle this similarly as extending regular files -
make extents always have size multiple of block size (it will be truncated
down to proper size in udf_clear_inode()).
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 8 Feb 2008 12:20:50 +0000 (04:20 -0800)]
udf: cleanup directory offset handling
Position in directory returned by readdir is offset of directory entry divided
by four (don't ask me why). Make this conversion only when reading f_pos from
userspace / writing it there and internally work in bytes. It makes things
more easily readable and also fixes a bug (we forgot to divide length of the
entry by 4 when advancing f_pos in udf_add_entry()).
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Galbraith [Fri, 8 Feb 2008 12:20:49 +0000 (04:20 -0800)]
udf: avoid unnecessary synchronous writes
Fix udf_clear_inode() to request asynchronous writeout in icache reclaim
path.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:49 +0000 (04:20 -0800)]
udf: fix signedness issue
sparse generated:
fs/udf/namei.c:896:15: originally declared here
fs/udf/namei.c:1147:41: warning: incorrect type in argument 3 (different signedness)
fs/udf/namei.c:1147:41: expected int *offset
fs/udf/namei.c:1147:41: got unsigned int *<noident>
fs/udf/namei.c:1152:78: warning: incorrect type in argument 3 (different signedness)
fs/udf/namei.c:1152:78: expected int *offset
fs/udf/namei.c:1152:78: got unsigned int *<noident>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparse generated:
fs/udf/inode.c:324:41: warning: incorrect type in argument 4 (different signedness)
fs/udf/inode.c:324:41: expected long *<noident>
fs/udf/inode.c:324:41: got unsigned long *<noident>
inode_getblk always set 4th argument to uint32_t value
3rd parameter of map_bh is sector_t (which is unsigned long or u64)
so convert phys value to sector_t
fs/udf/inode.c:1818:47: warning: incorrect type in argument 3 (different signedness)
fs/udf/inode.c:1818:47: expected int *<noident>
fs/udf/inode.c:1818:47: got unsigned int *<noident>
fs/udf/inode.c:1826:46: warning: incorrect type in argument 3 (different signedness)
fs/udf/inode.c:1826:46: expected int *<noident>
fs/udf/inode.c:1826:46: got unsigned int *<noident>
udf_get_filelongad and udf_get_shortad are called always for uint32_t
values (struct extent_position->offset), so it's safe to convert offset
parameter to uint32_t
gcc warned:
fs/udf/inode.c: In function 'udf_get_block':
fs/udf/inode.c:299: warning: 'phys' may be used uninitialized in this function
initialize it to 0 (if someday someone will break inode_getblk we will catch it immediately)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:47 +0000 (04:20 -0800)]
udf: remove wrong prototype of udf_readdir
sparse generated:
fs/udf/dir.c:78:5: warning: symbol 'udf_readdir' was not declared. Should it be static?
there are 2 different prototypes of udf_readdir - remove them and move
code around to make it still compile
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 8 Feb 2008 12:20:47 +0000 (04:20 -0800)]
kill UDFFS_{DATE,VERSION}
Printing date and version of a driver makes sense if there's a maintainer
who's maintaining and using these, but printing ancient version information
only confuses users.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:46 +0000 (04:20 -0800)]
udf: improve readability of udf_load_partition
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:46 +0000 (04:20 -0800)]
udf: fix udf_debug macro
udf_debug should be enclosed with do { } while (0)
to be safely used in code like below:
if (something)
udf_debug();
else
anything;
(Otherwise compiler will not compile it with:
"error: expected expression before 'else'")
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:44 +0000 (04:20 -0800)]
udf: cache struct udf_inode_info
cache UDF_I(struct inode *) return values when there are
at least 2 uses in one function
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:42 +0000 (04:20 -0800)]
udf: remove UDF_I_* macros and open code them
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:41 +0000 (04:20 -0800)]
udf: convert byte order of constant instead of variable
convert byte order of constant instead of variable,
which can be done at compile time (vs run time)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:41 +0000 (04:20 -0800)]
udf: replace loops coded with goto to real loops
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:40 +0000 (04:20 -0800)]
udf: create common function for changing free space counter
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:39 +0000 (04:20 -0800)]
udf: create common function for tag checksumming
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:36 +0000 (04:20 -0800)]
udf: fix coding style
fix coding style errors found by checkpatch:
- assignments in if conditions
- braces {} around single statement blocks
- no spaces after commas
- printks without KERN_*
- lines longer than 80 characters
- spaces between "type *" and variable name
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:35 +0000 (04:20 -0800)]
udf: fix sparse warnings (shadowing & mismatch between declaration and definition)
fix sparse warnings:
fs/udf/super.c:1431:24: warning: symbol 'bh' shadows an earlier one
fs/udf/super.c:1347:21: originally declared here
fs/udf/super.c:472:6: warning: symbol 'udf_write_super' was not declared. Should it be static?
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Cc: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:34 +0000 (04:20 -0800)]
udf: move calculating of nr_groups into helper function
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:33 +0000 (04:20 -0800)]
udf: convert macros related to bitmaps to functions
convert UDF_SB_ALLOC_BITMAP macro to udf_sb_alloc_bitmap function
convert UDF_SB_FREE_BITMAP macro to udf_sb_free_bitmap function
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:33 +0000 (04:20 -0800)]
udf: check if udf_load_logicalvol failed
udf_load_logicalvol may fail eg in out of memory conditions - check it
and propagate error further
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:32 +0000 (04:20 -0800)]
udf: convert UDF_SB_ALLOC_PARTMAPS macro to udf_sb_alloc_partition_maps function
- convert UDF_SB_ALLOC_PARTMAPS macro to udf_sb_alloc_partition_maps function
- convert kmalloc + memset to kcalloc
- check if kcalloc failed (partially)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Cc: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
convert UDF_SB_LVIDIU macro to udf_sb_lvidiu function
rename some struct udf_sb_info fields:
- s_volident to s_volume_ident
- s_lastblock to s_last_block
- s_lvidbh to s_lvid_bh
- s_recordtime to s_record_time
- s_serialnum to s_serial_number;
- s_vat to s_vat_inode;
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Cc: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:28 +0000 (04:20 -0800)]
udf: fix coding style of super.c
fix coding style errors found by checkpatch:
- assignments in if conditions
- braces {} around single statement blocks
- no spaces after commas
- printks without KERN_*
- lines longer than 80 characters
before: total: 50 errors, 207 warnings, 1835 lines checked
after: total: 0 errors, 164 warnings, 1872 lines checked
all 164 warnings left are lines longer than 80 characters;
this file has too much indentation with really long expressions
to break all those lines now; will fix later
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Ben Fennema <bfennema@falcon.csc.calpoly.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already which is the original user of the simple attributes. This
patch merged the temporarily forked attributes in spufs back into the
main ones and allows to return errors.
[akpm@linux-foundation.org: build fix] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <stefano.brivio@polimi.it> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg KH <greg@kroah.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We shouldn't use WB_SYNC_ALL if the caller is asking for asynchronous
treatment.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Fri, 8 Feb 2008 12:20:22 +0000 (04:20 -0800)]
asm-*/posix_types.h: scrub __GLIBC__
Some arches (like alpha and ia64) already have a clean posix_types.h header.
This brings all the others in line by removing all references to __GLIBC__
(and some undocumented __USE_ALL).
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we create symlink on UFS2 filesystem under Linux, it looks wrong under
other OSes, because of max symlink length field was not initialized
properly, and data blocks were not used to save short symlink names.
[akpm@linux-foundation.org: add missing fs32_to_cpu()] Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Steven <stevenaaus@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Fri, 8 Feb 2008 12:20:14 +0000 (04:20 -0800)]
aio: partial write should not return error code
When an AIO write gets an error after writing some data (eg. ENOSPC), it
should return the amount written already, not the error. Just like write()
is supposed to.
This was found by the libaio test suite.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-By: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:13 +0000 (04:20 -0800)]
ext3: replace all adds to little endians variables with le*_add_cpu
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
expression_in_cpu_byteorder);
with:
leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
sparse didn't generate any new warning with this patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Fri, 8 Feb 2008 12:20:12 +0000 (04:20 -0800)]
byteorder: move le32_add_cpu & friends from OCFS2 to core
This patchset moves le*_add_cpu and be*_add_cpu functions from OCFS2 to core
header (1st), converts ext3 filesystem to this API (2nd) and replaces XFS
different named functions with new ones (3rd).
There are many places where these functions will be useful. Just look at:
grep -r 'cpu_to_[ble12346]*([ble12346]*_to_cpu.*[-+]' linux-src/ Patch for
ext3 is an example how conversions will probably look like.
Jan Kara [Fri, 8 Feb 2008 12:20:11 +0000 (04:20 -0800)]
Use pgoff_t instead of unsigned long
Convert variables containing page indexes to pgoff_t.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:09 +0000 (04:20 -0800)]
aoe: make error messages more specific
Andrew Morton pointed out that the "too many targets" message in patch 2 could
be printed for failing GFP_ATOMIC allocations. This patch makes the messages
more specific.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:07 +0000 (04:20 -0800)]
aoe: add module parameter for users who need more outstanding I/O
An AoE target provides an estimate of the number of outstanding commands that
the AoE initiator can send before getting a response. The aoe_maxout
parameter provides a way to set an even lower limit. It will not allow a user
to use more outstanding commands than the target permits. If a user discovers
a problem with a large setting, this parameter provides a way for us to work
with them to debug the problem. We expect to improve the dynamic window
sizing algorithm and drop this parameter. For the time being, it is a
debugging aid.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:06 +0000 (04:20 -0800)]
aoe: only install new AoE device once
An aoe driver user who had about 70 AoE targets found that he was hitting a
BUG in sysfs_create_file because the aoe driver was trying to tell the kernel
about an AoE device more than once. Each AoE device was reachable by several
local network interfaces, and multiple ATA device indentify responses were
returning from that single device.
This patch eliminates a race condition so that aoe always informs the block
layer of a new AoE device once in the presence of multiple incoming ATA device
identify responses.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:05 +0000 (04:20 -0800)]
aoe: dynamically allocate a capped number of skbs when necessary
What this Patch Does
Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
driver was reusing a small set of skbs that were allocated once and
were only used for outbound AoE commands.
The network layer cannot be allowed to put_page on the data that is
still associated with a bio we haven't returned to the block layer,
so the aoe driver (even before the patch under discussion) is still
the owner of skbs that have been handed to the network layer for
transmission. We need to keep track of these skbs so that we can
free them, but by tracking them, we can also easily re-use them.
The new patch was a response to the behavior of certain network
drivers. We cannot reuse an skb that the network driver still has
in its transmit ring. Network drivers can defer transmit ring
cleanup and then use the state in the skb to determine how many data
segments to clean up in its transmit ring. The tg3 driver is one
driver that behaves in this way.
When the network driver defers cleanup of its transmit ring, the aoe
driver can find itself in a situation where it would like to send an
AoE command, and the AoE target is ready for more work, but the
network driver still has all of the pre-allocated skbs. In that
case, the new patch just calls alloc_skb, as you'd expect.
We don't want to get carried away, though. We try not to do
excessive allocation in the write path, so we cap the number of skbs
we dynamically allocate.
Probably calling it a "dynamic pool" is misleading. We were already
trying to use a small fixed-size set of pre-allocated skbs before
this patch, and this patch just provides a little headroom (with a
ceiling, though) to accomodate network drivers that hang onto skbs,
by allocating when needed. The d->skbpool_hd list of allocated skbs
is necessary so that we can free them later.
We didn't notice the need for this headroom until AoE targets got
fast enough.
Alternatives
If the network layer never did a put_page on the pages in the bio's
we get from the block layer, then it would be possible for us to
hand skbs to the network layer and forget about them, allowing the
network layer to free skbs itself (and thereby calling our own
skb->destructor callback function if we needed that). In that case
we could get rid of the pre-allocated skbs and also the
d->skbpool_hd, instead just calling alloc_skb every time we wanted
to transmit a packet. The slab allocator would effectively maintain
the list of skbs.
Besides a loss of CPU cache locality, the main concern with that
approach the danger that it would increase the likelihood of
deadlock when VM is trying to free pages by writing dirty data from
the page cache through the aoe driver out to persistent storage on
an AoE device. Right now we have a situation where we have
pre-allocation that corresponds to how much we use, which seems
ideal.
Of course, there's still the separate issue of receiving the packets
that tell us that a write has successfully completed on the AoE
target. When memory is low and VM is using AoE to flush dirty data
to free up pages, it would be perfect if there were a way for us to
register a fast callback that could recognize write command
completion responses. But I don't think the current problems with
the receive side of the situation are a justification for
exacerbating the problem on the transmit side.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:03 +0000 (04:20 -0800)]
aoe: user can ask driver to forget previously detected devices
When an AoE device is detected, the kernel is informed, and a new block device
is created. If the device is unused, the block device corresponding to remote
device that is no longer available may be removed from the system by telling
the aoe driver to "flush" its list of devices.
Without this patch, software like GPFS and LVM may attempt to read from AoE
devices that were discovered earlier but are no longer present, blocking until
the I/O attempt times out.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed L. Cashin [Fri, 8 Feb 2008 12:20:00 +0000 (04:20 -0800)]
aoe: handle multiple network paths to AoE device
A remote AoE device is something can process ATA commands and is identified by
an AoE shelf number and an AoE slot number. Such a device might have more
than one network interface, and it might be reachable by more than one local
network interface. This patch tracks the available network paths available to
each AoE device, allowing them to be used more efficiently.
Andrew Morton asked about the call to msleep_interruptible in the revalidate
function. Yes, if a signal is pending, then msleep_interruptible will not
return 0. That means we will not loop but will call aoenet_xmit with a NULL
skb, which is a noop. If the system is too low on memory or the aoe driver is
too low on frames, then the user can hit control-C to interrupt the attempt to
do a revalidate. I have added a comment to the code summarizing that.
Andrew Morton asked whether the allocation performed inside addtgt could use a
more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev
lock has been locked with spin_lock_irqsave. It would be nice to allocate the
memory under fewer restrictions, but targets are only added when the device is
being discovered, and if the target can't be added right now, we can try again
in a minute when then next AoE config query broadcast goes out.
Andrew Morton pointed out that the "too many targets" message could be printed
for failing GFP_ATOMIC allocations. The last patch in this series makes the
messages more specific.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 8 Feb 2008 12:19:57 +0000 (04:19 -0800)]
constify tables in kernel/sysctl_check.c
Remains the question whether it is intended that many, perhaps even large,
tables are compiled in without ever having a chance to get used, i.e.
whether there shouldn't #ifdef CONFIG_xxx get added.
[akpm@linux-foundation.org: fix cut-n-paste error] Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Fri, 8 Feb 2008 12:19:49 +0000 (04:19 -0800)]
rewrite rd
This is a rewrite of the ramdisk block device driver.
The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache. It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
which had recently lead to data corruption. And in general it is completely
wrong for a block device driver to do this.
The new one is more like a regular block device driver. It has no idea about
vm/vfs stuff. It's backing store is similar to the buffer cache (a simple
radix-tree of pages), but it doesn't know anything about page cache (the pages
in the radix tree are not pagecache pages).
There is one slight downside -- direct block device access and filesystem
metadata access goes through an extra copy and gets stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it), so
under memory intensive situations, footprint should effectively be the same --
maybe even a slight advantage to the new driver because it can also reclaim
buffer heads.
The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.
text data bss dec hex filename
2837 849 384 4070 fe6 drivers/block/rd.o
3528 371 12 3911 f47 drivers/block/brd.o
Text is larger, but data and bss are smaller, making total size smaller.
A few other nice things about it:
- Similar structure and layout to the new loop device handlinag.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
to a per-ramdisk runtime changeable size (eg. with an ioctl).
- Can use highmem for the backing store.
[akpm@linux-foundation.org: fix build]
[byron.bbradley@gmail.com: make rd_size non-static] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Byron Bradley <byron.bbradley@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Feb 2008 12:19:31 +0000 (04:19 -0800)]
mn10300: add the MN10300/AM33 architecture to the kernel
Add architecture support for the MN10300/AM33 CPUs produced by MEI to the
kernel.
This patch also adds board support for the ASB2303 with the ASB2308 daughter
board, and the ASB2305. The only processor supported is the MN103E010, which
is an AM33v2 core plus on-chip devices.
[akpm@linux-foundation.org: nuke cvs control strings] Signed-off-by: Masakazu Urade <urade.masakazu@jp.panasonic.com> Signed-off-by: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Feb 2008 12:19:28 +0000 (04:19 -0800)]
aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.
Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case. Not
only that, but under such conditions A.OUT core dumps are not produced either.
To make this work, this patch also does the following:
(1) Makes the existence of the contents of linux/a.out.h contingent on
CONFIG_ARCH_SUPPORTS_AOUT.
(2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
core dumping code.
(3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
is then included only where needed. This means that this bit of arch
code will be stored in the appropriate A.OUT binfmt module rather than
the core kernel.
(4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
needed) and FRV.
This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.
[jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Feb 2008 12:19:27 +0000 (04:19 -0800)]
aout: mark arches that support A.OUT format
Mark arches that support A.OUT format by including the following in their
master Kconfig files:
config ARCH_SUPPORTS_AOUT
def_bool y
This should also be set if the arch provides compatibility A.OUT support for
an older arch, for instance x86_64 for i386 or sparc64 for sparc.
I've guessed at which arches don't, based on comments in the code, however I'm
sure that some of the ones I've marked as 'yes' actually should be 'no'.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Fri, 8 Feb 2008 12:19:24 +0000 (04:19 -0800)]
timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres
Function timekeeping_is_continuous() no longer checks flag
CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename
the function accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 8 Feb 2008 12:19:22 +0000 (04:19 -0800)]
Clean up the kill_something_info
This is the first step (of two) in removing the kill_pgrp_info.
All the users of this function are in kernel/signal.c, but all they need is to
call __kill_pgrp_info() with the tasklist_lock read-locked.
Fortunately, one of its users is the kill_something_info(), which already
needs this lock in one of its branches, so clean these branches up and call
the __kill_pgrp_info() directly.
Based on Oleg's view of how this function should look.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 8 Feb 2008 12:19:21 +0000 (04:19 -0800)]
Pidns: fix badly converted mqueues pid handling
When sending the pid namespaces patches I wrongly converted the tsk->tgid into
task_pid_vnr(tsk) in mqueue-s (the git id of this patch is b488893a390edfe027bae7a46e9af8083e740668).
The proper behavior is to get the task_tgid_vnr(tsk).
This seem to be the only mistake of that kind.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 8 Feb 2008 12:19:20 +0000 (04:19 -0800)]
Pidns: make full use of xxx_vnr() calls
Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace. After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.
Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:19 +0000 (04:19 -0800)]
ITIMER_REAL: convert to use struct pid
signal_struct->tsk points to the ->group_leader and thus we have the nasty
code in de_thread() which has to change it and restart ->real_timer if the
leader is changed.
Use "struct pid *leader_pid" instead. This also allows us to kill now
unneeded send_group_sig_info().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:19 +0000 (04:19 -0800)]
uglify while_each_pid_task() to make sure we don't count the execing pricess twice
There is a window when de_thread() switches the leader and drops
tasklist_lock. In that window do_each_pid_task(PIDTYPE_PID) finds both new
and old leaders.
The problem is pretty much theoretical and probably can be ignored. Currently
the only users of do_each_pid_task(PIDTYPE_PID) are send_sigio/send_sigurg, so
they can send the signal to the same process twice.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:18 +0000 (04:19 -0800)]
uglify kill_pid_info() to fix kill() vs exec() race
kill_pid_info()->pid_task() could be the old leader of the execing process.
In that case it is possible that the leader will be released before we take
siglock. This means that kill_pid_info() (and thus sys_kill()) can return a
false -ESRCH.
Change the code to retry when lock_task_sighand() fails. The endless loop is
not possible, __exit_signal() both clears ->sighand and does detach_pid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pid_vnr returns the user space pid with respect to the pid namespace the
struct pid was allocated in. What we want before we return a pid to user
space is the user space pid with respect to the pid namespace of current.
pid_vnr is a very nice optimization but because it isn't quite what we want
it is easy to use pid_vnr at times when we aren't certain the struct pid
was allocated in our pid namespace.
Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the
parent process reported in the core dumps and the parent process in
get_signal_to_deliver.
So unless the performance impact is huge having an interface that does what
we want instead of always what we want should be much more reliable and
much less error prone.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This modifies do_wait and eligible child to take a pair of enum pid_type
and struct pid *pid to precisely specify what set of processes are eligible
to be waited for, instead of the raw pid_t value from sys_wait4.
This fixes a bug in sys_waitid where you could not wait for children in
just process group 1.
This fixes a pid namespace crossing case in eligible_child. Allowing us to
wait for a processes in our current process group even if our current
process group == 0.
This allows the no child with this pid case to be optimized. This allows
us to optimize the pid membership test in eligible child to be optimized.
This even closes a theoretical pid wraparound race where in a threaded
parent if two threads are waiting for the same child and one thread picks
up the child and the pid numbers wrap around and generate another child
with that same pid before the other thread is scheduled (teribly insanely
unlikely) we could end up waiting on the second child with the same pid#
and not discover that the specific child we were waiting for has exited.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:13 +0000 (04:19 -0800)]
move the related code from exit_notify() to exit_signals()
The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress. In that case
nothing special is needed, just set PF_EXITING and return.
Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().
So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock. But this in fact fixes another
___pure___ theoretical minor race. __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.
Oleg Nesterov [Fri, 8 Feb 2008 12:19:12 +0000 (04:19 -0800)]
fix group stop with exit race
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly. Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.
However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already. In that case the group stop never
finishes properly.
Note: this is a minimal fix, we can add some optimizations later. Say we
can return quickly if thread_group_empty(). Also, we can move some signal
related code from exit_notify() to exit_signals().
Oleg Nesterov [Fri, 8 Feb 2008 12:19:11 +0000 (04:19 -0800)]
start the global /sbin/init with 0,0 special pids
As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.
Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().
This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:10 +0000 (04:19 -0800)]
move daemonized kernel threads into the swapper's session
Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().
Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:09 +0000 (04:19 -0800)]
teach set_special_pids() to use struct pid
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:09 +0000 (04:19 -0800)]
fix setsid() for sub-namespace /sbin/init
sys_setsid() still deals with pid_t's from the global namespace. This means
that the "session > 1" check can't help for sub-namespace init, setsid() can't
succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.
Remove the usage of task_struct->pid and convert the code to use "struct pid".
This also simplifies and speedups the code, saves one find_pid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:08 +0000 (04:19 -0800)]
sys_setpgid(): simplify pid/ns interaction
sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
versa. Use "struct pid" more consistently. Saves one find_vpid() and
eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the
code.
Also use the same_thread_group() helper.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:07 +0000 (04:19 -0800)]
wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT
The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
The exit_state was EXIT_ZOMBIE when the function was called, and another
thread can change it to EXIT_DEAD right after the check.
The second condition is not possible, detached non-traced threads were already
filtered out by eligible_child(), we didn't drop tasklist since then.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Feb 2008 12:19:07 +0000 (04:19 -0800)]
wait_task_continued/zombie: don't use task_pid_nr_ns() lockless
Surprise, the other two wait_task_*() functions also abuse the
task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
in wait_task_continued(). wait_task_zombie() doesn't have this problem,
but it is still better to cache pid_t rather than call task_pid_nr_ns()
three times on the saved pid_namespace.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>