Harvey Harrison [Tue, 29 Apr 2008 08:03:30 +0000 (01:03 -0700)]
drivers/block: use get_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Ed L. Cashin <ecashin@coraid.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Tue, 29 Apr 2008 08:03:30 +0000 (01:03 -0700)]
kernel: Move arches to use common unaligned access
Unaligned access is ok for the following arches:
cris, m68k, mn10300, powerpc, s390, x86
Arches that use the memmove implementation for native endian, and
the byteshifting for the opposite endianness.
h8300, m32r, xtensa
Packed struct for native endian, byteshifting for other endian:
alpha, blackfin, ia64, parisc, sparc, sparc64, mips, sh
m86knommu is generic_be for Coldfire, otherwise unaligned access is ok.
frv, arm chooses endianness based on compiler settings, uses the byteshifting
versions. Remove the unaligned trap handler from frv as it is now unused.
v850 is le, uses the byteshifting versions for both be and le.
Remove the now unused asm-generic implementation.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Tue, 29 Apr 2008 08:03:27 +0000 (01:03 -0700)]
kernel: add common infrastructure for unaligned access
Create a linux/unaligned directory similar in spirit to the linux/byteorder
folder to hold generic implementations collected from various arches.
Currently there are five implementations:
1) packed_struct.h: C-struct based, from asm-generic/unaligned.h
2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
4) memmove.h: taken from multiple implementations in tree
5) access_ok.h: taken from x86 and others, unaligned access is ok.
All of the new implementations checks for sizes not equal to 1,2,4,8
and will fail to link.
API additions:
get_unaligned_{le16|le32|le64|be16|be32|be64}(p) which is meant to replace
code of the form:
le16_to_cpu(get_unaligned((__le16 *)p));
put_unaligned_{le16|le32|le64|be16|be32|be64}(val, pointer) which is meant to
replace code of the form:
put_unaligned(cpu_to_le16(val), (__le16 *)p);
The headers that arches should include from their asm/unaligned.h:
access_ok.h : Wrappers of the byteswapping functions in asm/byteorder
Choose a particular implementation for little-endian access:
le_byteshift.h
le_memmove.h (arch must be LE)
le_struct.h (arch must be LE)
Choose a particular implementation for big-endian access:
be_byteshift.h
be_memmove.h (arch must be BE)
be_struct.h (arch must be BE)
After including as needed from the above, include unaligned/generic.h and
define your arch's get/put_unaligned as (for LE):
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:03:26 +0000 (01:03 -0700)]
sysv fs: remove superfluous check for __GNUC__ compiler
Since <linux/sysv_fs.h> isn't exported to userspace, there is little
point checking that this is a GNU-compatible compiler.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tpm: change Kconfig dependencies from PNPACPI to PNP
There is no "PNPACPI" driver interface as such. PNPACPI is an internal
backend of PNP, and drivers just use the generic PNP interface.
The drivers should depend on CONFIG_PNP, not CONFIG_PNPACPI.
tpm_nsc.c doesn't use PNP at all, so we can just remove the dependency
completely. It probably *should* use PNP to discover the device, but until it
does, there's no point in depending on PNP.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Kylene Jo Hall <kjhall@us.ibm.com> Cc: Marcel Selhorst <tpm@selhorst.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sam Ravnborg [Tue, 29 Apr 2008 08:03:23 +0000 (01:03 -0700)]
tpm: fix section mismatch warning
Fix following warning:
WARNING: vmlinux.o(.init.text+0x32804): Section mismatch in reference from the function init_nsc() to the function .devexit.text:tpm_nsc_remove()
The function tpm_nsc_remove() are used outside __exit, so remove the __exit
annotation to make sure the function is always avilable.
Note: Trying to compare this module with other users of platform_device gve me
the impression that this driver needs some work to match other users.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Kylene Hall <kjhall@us.ibm.com> Cc: Marcel Selhorst <tpm@selhorst.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parport_pc: wrap PNP probe code in #ifdef CONFIG_PNP
Wrap PNP probe code in #ifdef CONFIG_PNP. We already do the same for
CONFIG_PCI.
Without this change, we'll have unresolved references to pnp_get_resource()
function when CONFIG_PNP=n. (This is a new interface that's not in mainline
yet.)
Robert P. J. Day [Tue, 29 Apr 2008 08:03:20 +0000 (01:03 -0700)]
afs: use the shorter LIST_HEAD for brevity
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
edac: fix module initialization on several modules 2nd time
I implemented opstate_init() as a inline function in linux/edac.h.
added calling opstate_init() to:
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
I wrote a fixed patch of
edac-fix-module-initialization-on-several-modules.patch,
and tested building 2.6.25-rc7 with applying this. It was succeed.
I think the patch is now correct.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Tue, 29 Apr 2008 08:03:18 +0000 (01:03 -0700)]
edac: remove unneeded functions and add static accessor
Collection of patches, merged into one, from Adrian that do the following:
1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()
2) Remove unneeded function edac_device_find()
3) Added #if 0 around function edac_pci_find()
4) make the needlessly global edac_pci_generic_check() static
5) Removed function edac_check_mc_devices()
Doug Thompson modified Adrian's patches, to bettern represent
the direction of EDAC, and make them one patch.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:03:17 +0000 (01:03 -0700)]
edac: use the shorter LIST_HEAD for brevity
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Tyser [Tue, 29 Apr 2008 08:03:15 +0000 (01:03 -0700)]
edac: add e752x parameter for sysbus_parity selection
Add a module parameter "sysbus_parity" to allow forcing system bus parity
error checking on or off. Also add support to automatically disable system
bus parity errors for processors which do not support it.
If the sysbus_parity parameter is specified, sysbus parity detection will be
forced on or off. If it is not specified, the driver will attempt to look at
the CPU identifier string and determine if the CPU supports system bus parity.
A blacklist was used instead of a whitelist so that system bus parity would
be enabled by default and to minimize the chances of breaking things for those
people already using the driver which for some reason have a processor that
does not have a valid CPU identifier string.
[akpm@linux-foundation.org: coding-style fixes] Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 29 Apr 2008 08:03:13 +0000 (01:03 -0700)]
idr: create idr_layer_cache at boot time
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.
This change also enables us to eliminate the check every time idr_init() is
called.
[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.
This patch cleans up such pointless code.
Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printf("Drained the pool\n");
read(fd, &n, sizeof(n));
printf("Found more randomness\n");
return(0);
}
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Mackall [Tue, 29 Apr 2008 08:03:05 +0000 (01:03 -0700)]
random: make mixing interface byte-oriented
Switch add_entropy_words to a byte-oriented interface, eliminating numerous
casts and byte/word size rounding issues. This also reduces the overall
bit/byte/word confusion in this code.
We now mix a byte at a time into the word-based pool. This takes four times
as many iterations, but should be negligible compared to hashing overhead.
This also increases our pool churn, which adds some depth against some
theoretical failure modes.
The function name is changed to emphasize pool mixing and deemphasize entropy
(the samples mixed in may not contain any). extract is added to the core
function to make it clear that it extracts from the pool.
Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Mackall [Tue, 29 Apr 2008 08:03:03 +0000 (01:03 -0700)]
random: remove some prefetch logic
The urandom output pool (ie the fast path) fits in one cacheline, so
this is pretty unnecessary. Further, the output path has already
fetched the entire pool to hash it before calling in here.
(This was the only user of prefetch_range in the kernel, and it passed
in words rather than bytes!)
Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Mackall [Tue, 29 Apr 2008 08:03:00 +0000 (01:03 -0700)]
random: make backtracking attacks harder
At each extraction, we change (poolbits / 16) + 32 bits in the pool,
or 96 bits in the case of the secondary pools. Thus, a brute-force
backtracking attack on the pool state is less difficult than breaking
the hash. In certain cases, this difficulty may be is reduced to 2^64
iterations.
Instead, hash the entire pool in one go, then feedback the whole hash
(160 bits) in one go. This will make backtracking at least as hard as
inverting the hash.
Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:02:54 +0000 (01:02 -0700)]
generalize asm-generic/ioctl.h to allow overriding values
In the spirit of a number of other asm-generic header files,
generalize asm-generic/ioctl.h to allow arch-specific ioctl.h headers
to simply override _IOC_SIZEBITS and/or _IOC_DIRBITS before including
this header file, allowing a number of ioctl.h header files to be
shortened considerably.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:02:52 +0000 (01:02 -0700)]
nbd: delete superfluous test for __GNUC__
Since <linux/compiler.h> already tests for __GNUC__, there's no point in nbd.h
repeating that test.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Permit the use of partitions with network block devices (NBD).
A new parameter is introduced to define how many partition we want to be able
to manage per network block device. This parameter is "max_part".
For instance, to manage 63 partitions / loop device, we will do:
[on the server side]
# nbd-server 1234 /dev/sdb
[on the client side]
# modprobe nbd max_part=63
# ls -l /dev/nbd*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:14 /dev/nbd0
brw-rw---- 1 root disk 43, 64 2008-03-25 11:11 /dev/nbd1
brw-rw---- 1 root disk 43, 640 2008-03-25 11:11 /dev/nbd10
brw-rw---- 1 root disk 43, 704 2008-03-25 11:11 /dev/nbd11
brw-rw---- 1 root disk 43, 768 2008-03-25 11:11 /dev/nbd12
brw-rw---- 1 root disk 43, 832 2008-03-25 11:11 /dev/nbd13
brw-rw---- 1 root disk 43, 896 2008-03-25 11:11 /dev/nbd14
brw-rw---- 1 root disk 43, 960 2008-03-25 11:11 /dev/nbd15
brw-rw---- 1 root disk 43, 128 2008-03-25 11:11 /dev/nbd2
brw-rw---- 1 root disk 43, 192 2008-03-25 11:11 /dev/nbd3
brw-rw---- 1 root disk 43, 256 2008-03-25 11:11 /dev/nbd4
brw-rw---- 1 root disk 43, 320 2008-03-25 11:11 /dev/nbd5
brw-rw---- 1 root disk 43, 384 2008-03-25 11:11 /dev/nbd6
brw-rw---- 1 root disk 43, 448 2008-03-25 11:11 /dev/nbd7
brw-rw---- 1 root disk 43, 512 2008-03-25 11:11 /dev/nbd8
brw-rw---- 1 root disk 43, 576 2008-03-25 11:11 /dev/nbd9
# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 80418240KB
bs=1024, sz=80418240
-------NOTE, RFC: partition table is not automatically read.
The driver sets bdev->bd_invalidated to 1 to force the read of the partition
table of the device, but this is done only on an open of the device.
So we have to do a "touch /dev/nbdX" or something like that.
It can't be done from the nbd-client or nbd driver because at this
level we can't ask to read the partition table and to serve the request
at the same time (-> deadlock)
If someone has a better idea, I'm open to any suggestion.
-------NOTE, RFC
# fdisk -l /dev/nbd0
Disk /dev/nbd0: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/nbd0p1 * 1 9965 80043831 83 Linux
/dev/nbd0p2 9966 10011 369495 5 Extended
/dev/nbd0p5 9966 10011 369463+ 82 Linux swap / Solaris
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
brw-rw---- 1 root disk 43, 1 2008-03-25 11:16 /dev/nbd0p1
brw-rw---- 1 root disk 43, 2 2008-03-25 11:16 /dev/nbd0p2
brw-rw---- 1 root disk 43, 5 2008-03-25 11:16 /dev/nbd0p5
# mount /dev/nbd0p1 /mnt
# ls /mnt
bin dev initrd lost+found opt sbin sys var
boot etc initrd.img media proc selinux tmp vmlinuz
cdrom home lib mnt root srv usr
# umount /mnt
# nbd-client -d /dev/nbd0
# ls -l /dev/nbd0*
brw-rw---- 1 root disk 43, 0 2008-03-25 11:16 /dev/nbd0
-------NOTE
On "nbd-client -d", we can do an iocl(BLKRRPART) to update partition table:
as the size of the device is 0, we don't have to serve the partition manager
request (-> no deadlock).
-------NOTE
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows Network Block Device to be mounted locally (nbd-client to
nbd-server over 127.0.0.1).
It creates a kthread to avoid the deadlock described in NBD tools
documentation. So, if nbd-client hangs waiting for pages, the kblockd thread
can continue its work and free pages.
I have tested the patch to verify that it avoids the hang that always occurs
when writing to a localhost nbd connection. I have also tested to verify that
no performance degradation results from the additional thread and queue.
Patch originally from Laurent Vivier.
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Gardner [Tue, 29 Apr 2008 08:02:45 +0000 (01:02 -0700)]
edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off}
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk
Drive Services. CONFIG_EDD_OFF disables EDD while still compiling EDD into
the kernel. Default behavior can be forced using 'edd=on' or 'edd=off' as
a kernel parameter.
[akpm@linux-foundation.org: fix kernel-parameters.txt] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:44 +0000 (01:02 -0700)]
sysctl: add the ->permissions callback on the ctl_table_root
When reading from/writing to some table, a root, which this table came from,
may affect this table's permissions, depending on who is working with the
table.
The core hunk is at the bottom of this patch. All the rest is just pushing
the ctl_table_root argument up to the sysctl_perm() function.
This will be mostly (only?) used in the net sysctls.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:41 +0000 (01:02 -0700)]
sysctl: clean from unneeded extern and forward declarations
The do_sysctl_strategy isn't used outside kernel/sysctl.c, so this can be
static and without a prototype in header.
Besides, move this one and parse_table() above their callers and drop the
forward declarations of the latter call.
One more "besides" - fix two checkpatch warnings: space before a ( and an
extra space at the end of a line.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 29 Apr 2008 08:02:40 +0000 (01:02 -0700)]
sysctl: merge equal proc_sys_read and proc_sys_write
Many (most of) sysctls do not have a per-container sense. E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget and so on
and so forth. Besides, tuning then from inside a container is not even
secure. On the other hand, hiding them completely from the container's tasks
sometimes causes user-space to stop working.
When developing net sysctl, the common practice was to duplicate a table and
drop the write bits in table->mode, but this approach was not very elegant,
lead to excessive memory consumption and was not suitable in general.
Here's the alternative solution. To facilitate the per-container sysctls
ctl_table_root-s were introduced. Each root contains a list of
ctl_table_header-s that are visible to different namespaces. The idea of this
set is to add the permissions() callback on the ctl_table_root to allow ctl
root limit permissions to the same ctl_table-s.
The main user of this functionality is the net-namespaces code, but later this
will (should) be used by more and more namespaces, containers and control
groups.
Actually, this idea's core is in a single hunk in the third patch. First two
patches are cleanups for sysctl code, while the third one mostly extends the
arguments set of some sysctl functions.
This patch:
These ->read and ->write callbacks act in a very similar way, so merge these
paths to reduce the number of places to patch later and shrink the .text size
(a bit).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code. The original OOPS is described in the
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.
create_proc_entries is replaced in the entire kernel code as new method
is also simply better.
This patch:
The problem is the same as for de->proc_fops. Right now PDE becomes visible
without data set. So, the entry could be looked up without data. This, in
most cases, will simply OOPS.
proc_create_data call is created to address this issue. proc_create now
becomes a wrapper around it.
Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Karsten Keil <kkeil@suse.de> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Neil Brown <neilb@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Osterlund <petero2@telia.com> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: convert /proc/tty/ldiscs to seq_file interface
Note: THIS_MODULE and header addition aren't technically needed because
this code is not modular, but let's keep it anyway because people
can copy this code into modular code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.
P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
is long-standing crap, BTW, thus
a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
b) new such users should be rejected,
c) everyone is encouraged to convert his favourite ->read_proc user or
I'll do it, lazy bastards.
proc: switch /proc/scsi/device_info to seq_file interface
Note 1: 0644 should be used, but root bypasses permissions, so writing
to /proc/scsi/device_info still works.
Note 2: looks like scsi_dev_info_list is unprotected
Note 3: probably make proc whine about "unwriteable but with ->write hook"
entries. Probably.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: switch /proc/irda/irnet to seq_file interface
Probably interface misuse, because of the way iterating over hashbin is done.
However! Printing of socket number ("IrNET socket %d - ", i++") made conversion
to proper ->start/->next difficult enough to do blindly without hardware.
Said that, please apply.
Remove useless comment while I am it.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: switch /proc/bus/ecard/devices to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Yani Ioannou <yani.ioannou@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail. However, if NULL is passed as parent then create/remove
accept full path as a argument. This is arbitrary restriction -- all
infrastructure is in place.
proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.
Roland McGrath [Tue, 29 Apr 2008 08:01:38 +0000 (01:01 -0700)]
procfs: mem permission cleanup
This cleans up the permission checks done for /proc/PID/mem i/o calls. It
puts all the logic in a new function, check_mem_permission().
The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
magical expression multiple times. The new function does all that work in one
place, with clear comments.
The old code called security_ptrace() twice on successful checks, once in
MAY_PTRACE() and once in __ptrace_may_attach(). Now it's only called once,
and only if all other checks have succeeded.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Helsley [Tue, 29 Apr 2008 08:01:36 +0000 (01:01 -0700)]
procfs task exe symlink
The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA. Then the path to the file is reconstructed and
reported as the result.
Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.
That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs. So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped. This avoids pinning the mounted filesystem.
[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap] Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
Cc:"Eric W. Biederman" <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc: print more information when removing non-empty directories
This usually saves one recompile to insert similar printk like below. :)
Sample nastygram:
remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G M 2.6.25-rc1 #5
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:34 +0000 (01:01 -0700)]
keys: make key_serial() a function if CONFIG_KEYS=y
Make key_serial() an inline function rather than a macro if CONFIG_KEYS=y.
This prevents double evaluation of the key pointer and also provides better
type checking.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 29 Apr 2008 08:01:32 +0000 (01:01 -0700)]
keys: explicitly include required slab.h header file.
Since these two source files invoke kmalloc(), they should explicitly
include <linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maximum number of keys that each non-root user may have and the maximum
total number of bytes of data that each of those users may have stored in
their keys.
Also increase the quotas as a number of people have been complaining that it's
not big enough. I'm not sure that it's big enough now either, but on the
other hand, it can now be set in /etc/sysctl.conf.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:31 +0000 (01:01 -0700)]
keys: don't generate user and user session keyrings unless they're accessed
Don't generate the per-UID user and user session keyrings unless they're
explicitly accessed. This solves a problem during a login process whereby
set*uid() is called before the SELinux PAM module, resulting in the per-UID
keyrings having the wrong security labels.
This also cures the problem of multiple per-UID keyrings sometimes appearing
due to PAM modules (including pam_keyinit) setuiding and causing user_structs
to come into and go out of existence whilst the session keyring pins the user
keyring. This is achieved by first searching for extant per-UID keyrings
before inventing new ones.
The serial bound argument is also dropped from find_keyring_by_name() as it's
not currently made use of (setting it to 0 disables the feature).
Signed-off-by: David Howells <dhowells@redhat.com> Cc: <kwc@citi.umich.edu> Cc: <arunsr@cse.iitk.ac.in> Cc: <dwalsh@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
keys: allow clients to set key perms in key_create_or_update()
The key_create_or_update() function provided by the keyring code has a default
set of permissions that are always applied to the key when created. This
might not be desirable to all clients.
Here's a patch that adds a "perm" parameter to the function to address this,
which can be set to KEY_PERM_UNDEF to revert to the current behaviour.
Signed-off-by: Arun Raghavan <arunsr@cse.iitk.ac.in> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 29 Apr 2008 08:01:26 +0000 (01:01 -0700)]
keys: add keyctl function to get a security label
Add a keyctl() function to get the security label of a key.
The following is added to Documentation/keys.txt:
(*) Get the LSM security context attached to a key.
long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
size_t buflen)
This function returns a string that represents the LSM security context
attached to a key in the buffer provided.
Unless there's an error, it always returns the amount of data it could
produce, even if that's too big for the buffer, but it won't copy more
than requested to userspace. If the buffer pointer is NULL then no copy
will take place.
A NUL character is included at the end of the string if the buffer is
sufficiently big. This is included in the returned count. If no LSM is
in force then an empty string will be returned.
A process must have view permission on the key for this function to be
successful.
[akpm@linux-foundation.org: declare keyctl_get_security()] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>