* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
hrtimer: hook compat_sys_nanosleep up to high res timer code
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
Linus Torvalds [Thu, 18 Oct 2007 22:08:35 +0000 (15:08 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] kill ata_sg_is_last()
Update libata driver for bf548 atapi controller against the 2.6.24 tree.
libata-sff: Correct use of check_status()
drivers/ata: add support to Freescale 3.0Gbps SATA Controller
pata_acpi: fix build breakage if !CONFIG_PM
Linus Torvalds [Thu, 18 Oct 2007 21:51:02 +0000 (14:51 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] time: Move R4000 clockevent device code to separate configurable file
[MIPS] time: Delete dead cycles_per_jiffy, mips_timer_ack and null_timer_ack
[MIPS] IP32: Retire use of plat_timer_setup.
[MIPS] Jazz: Retire use of plat_timer_setup.
[MIPS] IP27: Convert to clock_event_device.
[MIPS] JMR3927: Convert to clock_event_device.
[MIPS] Always do the ARC64_TWIDDLE_PC thing.
Linus Torvalds [Thu, 18 Oct 2007 21:40:30 +0000 (14:40 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
[IPV6]: Fix again the fl6_sock_lookup() fixed locking
[NETFILTER]: nf_conntrack_tcp: fix connection reopening fix
[IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels
[IPV6]: Lost locking in fl6_sock_lookup
[IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list
[NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required
[NET]: Fix OOPS due to missing check in dev_parse_header().
[TCP]: Remove lost_retrans zero seqno special cases
[NET]: fix carrier-on bug?
[NET]: Fix uninitialised variable in ip_frag_reasm()
[IPSEC]: Rename mode to outer_mode and add inner_mode
[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
[IPSEC]: Use the top IPv4 route's peer instead of the bottom
[IPSEC]: Store afinfo pointer in xfrm_mode
[IPSEC]: Add missing BEET checks
[IPSEC]: Move type and mode map into xfrm_state.c
[IPSEC]: Fix length check in xfrm_parse_spi
[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
...
Linus Torvalds [Thu, 18 Oct 2007 21:39:44 +0000 (14:39 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC/64]: Consolidate of_register_driver
[SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex
[SPARC]: Support for new termios.
[SPARC64]: Check of_get_property() return in pci_determine_mem_io_space().
[SPARC64]: Fix boot failures due to bootmem.
[SPARC64]: Implement atomic backoff.
Shannon Nelson [Thu, 18 Oct 2007 10:07:15 +0000 (03:07 -0700)]
I/OAT: Add completion callback for async_tx interface use
The async_tx interface includes a completion callback. This adds support
for using that callback, including using interrupts on completion.
[akpm@linux-foundation.org: various fixes] Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:14 +0000 (03:07 -0700)]
I/OAT: Tighten descriptor setup performance
The change to the async_tx interface cost this driver some performance by
spreading the descriptor setup across several functions, including multiple
passes over the new descriptor chain. Here we bring the work back into one
primary function and only do one pass.
[akpm@linux-foundation.org: cleanups, uninline] Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:13 +0000 (03:07 -0700)]
I/OAT: clean up error handling and some print messages
Make better use of dev_err(), and catch an error where the transaction
creation might fail.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:13 +0000 (03:07 -0700)]
I/OAT: clean up of dca provider start and stop
Don't start ioat_dca if ioat_dma didn't start, and stop ioat_dca before
stopping ioat_dma. Since the ioat_dma side does the pci device work, this
keeps ioat_dca from trying to use a bad device reference.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:12 +0000 (03:07 -0700)]
I/OAT: cleanup pci issues
Reorder the pci release actions
Letting go of the resources in the right order helps get rid of
occasional kernel complaints.
Fix the pci_driver object name [Randy Dunlap]
Rename the struct pci_driver data so that false section mismatch
warnings won't be produced.
Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Corey Minyard [Thu, 18 Oct 2007 10:07:11 +0000 (03:07 -0700)]
IPMI: fix hotmod remove lock
The removal of proc entries was done holding a lock, which is no longer
allowed. There is no need for the lock, only a mutex is required, so switch
over to a mutex.
Corey Minyard [Thu, 18 Oct 2007 10:07:10 +0000 (03:07 -0700)]
IPMI: new NMI handling
Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
NMI. This adds config options to know if there is the ability to receive NMIs
and if there is an NMI post-processing call. Then it modifies the IPMI watchdog
to take advantage of this so that it can know if an NMI comes in.
It also adds testing that the IPMI NMI watchdog works.
Corey Minyard [Thu, 18 Oct 2007 10:07:08 +0000 (03:07 -0700)]
IPMI: remove bogus semaphore from watchdog
Lockdep was giving an error when loading the IPMI watchdog module. It turns
out that if you try to claim a lock in a parameter handling routine, lockdep
won't see that lock as "static" yet because the module is not yet on the
module list, so it will complain.
However, the semaphore in question is completely unnecessary. So just remove
it.
Corey Minyard [Thu, 18 Oct 2007 10:07:08 +0000 (03:07 -0700)]
IPMI: don't init irq until ready
Patrick found a race at startup. Interrupts were being enabled for the IPMI
interface before the driver was really ready to handle them. This could
result in an oops if something was pending on the interface at startup and
interrupts were already enabled (this technically shouldn't happen, but it
needs to be covered for in real life). So move the IRQ setup to the code that
starts the actual IPMI processing.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Patrick Schoeller <Patrick.Schoeller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:07:07 +0000 (03:07 -0700)]
Replace __attribute_pure__ with __pure
To be consistent with the use of attributes in the rest of the kernel,
replace all uses of __attribute_pure__ with __pure and delete the definition
of __attribute_pure__.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 18 Oct 2007 10:07:05 +0000 (03:07 -0700)]
fuse: add blksize field to fuse_attr
There are cases when the filesystem will be passed the buffer from a single
read or write call, namely:
1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
through the page cache, but go directly to the userspace fs
2) currently buffered writes are done with single page requests, but
if Nick's ->perform_write() patch goes in, it will be possible to
do larger write requests. But only if the original write() was
also bigger than a page.
In these cases the filesystem might want to give a hint to the app
about the optimal I/O size.
Allow the userspace filesystem to supply a blksize value to be returned by
stat() and friends. If the field is zero, it defaults to the old
PAGE_CACHE_SIZE value.
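A minimal sketch of the defaulting rule, assuming a hypothetical, simplified `struct sketch_fuse_attr` that models only the new field, and assuming 4 KiB pages for `SKETCH_PAGE_CACHE_SIZE` (a stand-in, not the kernel's PAGE_CACHE_SIZE macro):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_CACHE_SIZE 4096u

/* Hypothetical, simplified stand-in for the attributes returned by the
 * userspace filesystem; only the new blksize field is modelled. */
struct sketch_fuse_attr {
    uint32_t blksize;   /* 0 means "no hint supplied" */
};

/* Value that stat() and friends would report as st_blksize. */
static uint32_t sketch_stat_blksize(const struct sketch_fuse_attr *attr)
{
    return attr->blksize ? attr->blksize : SKETCH_PAGE_CACHE_SIZE;
}
```

A filesystem that leaves the field zeroed keeps the old behaviour, so existing userspace filesystems need no change.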
Miklos Szeredi [Thu, 18 Oct 2007 10:07:03 +0000 (03:07 -0700)]
fuse: add list of writable files to fuse_inode
Each WRITE request must carry a valid file descriptor. When a page is written
back from a memory mapping, the file through which the page was dirtied is not
available, so a new mechanism is needed to find a suitable file in
->writepage(s).
A list of fuse_files is added to fuse_inode. The file is removed from the
list in fuse_release().
This patch is in preparation for writable mmap support.
Miklos Szeredi [Thu, 18 Oct 2007 10:07:02 +0000 (03:07 -0700)]
fuse: support BSD locking semantics
It is trivial to add support for flock(2) semantics to the existing protocol,
by setting the lock owner field to the file pointer, and passing a new
FUSE_LK_FLOCK flag with the locking request.
Miklos Szeredi [Thu, 18 Oct 2007 10:07:00 +0000 (03:07 -0700)]
fuse: clean up open file passing in setattr
Clean up supplying open file to the setattr operation. In addition to being a
cleanup it prepares for the changes in the way the open file is passed to the
setattr method.
Miklos Szeredi [Thu, 18 Oct 2007 10:06:58 +0000 (03:06 -0700)]
fuse: fix race between getattr and write
Getattr and lookup operations can be running in parallel to attribute changing
operations, such as write and setattr.
This means that if, for example, getattr was slower than a write, the cached
size attribute could be set to a stale value.
To prevent this race, introduce a per-filesystem attribute version counter.
This counter is incremented whenever cached attributes are modified, and the
incremented value is stored in the inode.
Before storing new attributes in the cache, getattr and lookup check, using
the version number, whether the attributes have been modified during the
request's lifetime. If so, the returned attributes are not cached, because
they might be stale.
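The version check can be sketched in userspace C as follows. `struct sketch_conn`, `struct sketch_inode` and all helper names are hypothetical simplifications of the per-filesystem and per-inode state, and locking is omitted; a real implementation must read and update the counters under a lock. A getattr reply whose snapshot predates the inode's last modification is simply not cached:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem state. */
struct sketch_conn {
    uint64_t attr_version;      /* global attribute version counter */
};

/* Hypothetical per-inode state. */
struct sketch_inode {
    uint64_t attr_version;      /* counter value at last attribute change */
    uint64_t size;              /* cached size attribute */
};

/* Attribute-changing operations (write, setattr) bump the counter. */
static void sketch_modify_size(struct sketch_conn *fc, struct sketch_inode *fi,
                               uint64_t new_size)
{
    fi->size = new_size;
    fi->attr_version = ++fc->attr_version;
}

/* getattr/lookup take a snapshot before sending their request. */
static uint64_t sketch_get_attr_version(const struct sketch_conn *fc)
{
    return fc->attr_version;
}

/* Cache the reply only if nothing modified the inode since the snapshot. */
static bool sketch_store_attr(struct sketch_conn *fc, struct sketch_inode *fi,
                              uint64_t reply_size, uint64_t snapshot)
{
    if (fi->attr_version > snapshot)
        return false;           /* reply may be stale: do not cache it */
    fi->size = reply_size;
    fi->attr_version = ++fc->attr_version;  /* caching is itself a change */
    return true;
}
```

In the race from the bug report, the write bumps the inode's version after getattr has taken its snapshot, so the late getattr reply is dropped instead of overwriting the new size.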
Thanks to Jakub Bogusz for the bug report and test program.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jakub Bogusz <jakub.bogusz@gemius.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Thu, 18 Oct 2007 10:06:57 +0000 (03:06 -0700)]
ext3: fix setup_new_group_blocks locking
setup_new_group_blocks() manipulates the group descriptor block bh under
the block_bitmap bh's lock. It shouldn't matter since nobody but resize
should be touching these blocks, but it's worth fixing up.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Sato [Thu, 18 Oct 2007 10:06:56 +0000 (03:06 -0700)]
ext3: support large blocksize up to PAGESIZE
This patch set supports large block sizes (>4k, <=64k) in ext3 by just
enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext3 without some changes to the directory handling code. The
reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the
filesystem. The proposed solution is to represent a 64k rec_len with an
impossible value, such as rec_len = 0xffff.
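One way to implement that encoding, as a sketch with hypothetical helper names and with on-disk byte order ignored: a 64 KiB length is stored as 0xffff, which can never be a genuine rec_len because directory entry lengths are multiples of 4, and is mapped back to 65536 when read.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel stored on disk for a 64 KiB rec_len. 0xffff is not a
 * multiple of 4, so no real directory entry can have this length. */
#define SKETCH_MAX_REC_LEN 0xffffu

static uint16_t sketch_rec_len_to_disk(unsigned int len)
{
    assert(len <= (1u << 16));
    if (len == (1u << 16))
        return SKETCH_MAX_REC_LEN;  /* would otherwise truncate to 0 */
    return (uint16_t)len;
}

static unsigned int sketch_rec_len_from_disk(uint16_t dlen)
{
    if (dlen == SKETCH_MAX_REC_LEN)
        return 1u << 16;
    return dlen;
}
```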
The Patch-set consists of the following 2 patches.
[1/2] ext3: enlarge blocksize
- Allow blocksize up to pagesize
[2/2] ext3: fix rec_len overflow
- prevent rec_len from overflow with 64KB blocksize
Now, on a 64k-page ppc64 box running this patch set, we could create a
64k-blocksize ext3 filesystem and handle empty directory blocks.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 18 Oct 2007 10:06:53 +0000 (03:06 -0700)]
powerpc: lock bitops
Add non-trivial lock bitops implementation for powerpc.
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 18 Oct 2007 10:06:52 +0000 (03:06 -0700)]
mips: fix bitops
Documentation/atomic_ops.txt specifies that these primitives must contain a memory
barrier both before and after their memory operation. This is consistent with
the atomic ops implementation on mips.
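A portable sketch of that requirement using C11 atomics instead of MIPS assembly. `sketch_test_and_set_bit()` is a hypothetical name, and the two `atomic_thread_fence(memory_order_seq_cst)` calls stand in for the kernel's smp_mb() on either side of the read-modify-write:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Value-returning bitop with a full barrier before and after the RMW. */
static bool sketch_test_and_set_bit(unsigned int nr, _Atomic unsigned long *addr)
{
    _Atomic unsigned long *word = addr + nr / SKETCH_BITS_PER_LONG;
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
    unsigned long old;

    atomic_thread_fence(memory_order_seq_cst);  /* like smp_mb() before */
    old = atomic_fetch_or_explicit(word, mask, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* like smp_mb() after */
    return (old & mask) != 0;
}
```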
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 18 Oct 2007 10:06:51 +0000 (03:06 -0700)]
alpha: lock bitops
Alpha can avoid one mb when acquiring a lock with test_and_set_bit_lock.
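For contrast, a C11 sketch of lock-acquire/release bitops, using hypothetical names modelled on test_and_set_bit_lock() and clear_bit_unlock(). Acquiring a lock only requires that later accesses not move before the bit is set, so acquire ordering suffices and the leading full barrier of the value-returning variant is not needed, which matches the commit's point about saving one mb. Releasing needs release ordering before the bit is cleared:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Acquire: later loads/stores cannot be reordered before the RMW. */
static bool sketch_test_and_set_bit_lock(unsigned int nr, _Atomic unsigned long *addr)
{
    _Atomic unsigned long *word = addr + nr / SKETCH_BITS_PER_LONG;
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

    return (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask) != 0;
}

/* Release: earlier loads/stores complete before the bit is cleared. */
static void sketch_clear_bit_unlock(unsigned int nr, _Atomic unsigned long *addr)
{
    _Atomic unsigned long *word = addr + nr / SKETCH_BITS_PER_LONG;
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

    atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}

/* A trivial bit spinlock built on the two primitives. */
static void sketch_bit_spin_lock(unsigned int nr, _Atomic unsigned long *addr)
{
    while (sketch_test_and_set_bit_lock(nr, addr))
        ;   /* spin until the bit was previously clear */
}
```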
[bunk@kernel.org: alpha bitops.h must #include <asm/barrier.h>] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 18 Oct 2007 10:06:50 +0000 (03:06 -0700)]
alpha: fix bitops
Documentation/atomic_ops.txt specifies that these primitives must contain a memory
barrier both before and after their memory operation. This is consistent with
the atomic ops implementation on alpha.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Satyam Sharma [Thu, 18 Oct 2007 10:06:38 +0000 (03:06 -0700)]
x86 msr driver: Misc cpuinit annotations
msr_class_cpu_callback() can be marked __cpuinit, being the notifier callback
for a __cpuinitdata notifier_block. So can msr_device_create(), which is
called only from the newly-__cpuinit msr_class_cpu_callback() or from the
__init-marked msr_init().
Signed-off-by: Satyam Sharma <satyam@infradead.org> Cc: Andi Kleen <ak@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The return value of the present "do {} while" based stub definition of
register_hotcpu_notifier() cannot be checked. This makes the stub
asymmetric w.r.t. the real HOTPLUG_CPU=y implementation that is
int-returning. So let us redefine this to be consistent with the full
version. Also do the same for unregister_hotcpu_notifier().
We cannot define these as static inline functions due to an existing GCC
bug (#33172). So define as macros that return appropriately instead (int
'0' for the register_hotcpu_notifier case and void for
unregister_hotcpu_notifier).
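A sketch of such macro stubs, assuming GCC statement expressions (`({ ... })`, a GNU extension). `struct notifier_block` is reduced to a hypothetical one-field stand-in so the example compiles on its own, and `sketch_init()` is a hypothetical caller. The `(void)(nb)` keeps the argument evaluated and referenced:

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct notifier_block. */
struct notifier_block {
    int priority;
};

/* HOTPLUG_CPU=n stubs: register evaluates to int 0, unregister to void. */
#define register_hotcpu_notifier(nb)   ({ (void)(nb); 0; })
#define unregister_hotcpu_notifier(nb) ({ (void)(nb); })

static struct notifier_block sketch_nb;

static int sketch_init(void)
{
    int err = register_hotcpu_notifier(&sketch_nb);  /* now checkable */

    if (err)
        return err;
    unregister_hotcpu_notifier(&sketch_nb);
    return 0;
}
```

Because the register stub yields a value, callers can use the same error-handling code whether or not HOTPLUG_CPU is enabled.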
Michael Neuling [Thu, 18 Oct 2007 10:06:37 +0000 (03:06 -0700)]
powerpc: add scaled time accounting
This adds POWERPC specific hooks for scaled time accounting.
POWER6 includes a SPURR register. The SPURR is based off the PURR register
but is scaled based on CPU frequency and issue rates. This gives a more
accurate account of the instructions used per task. The PURR and timebase
will be constant relative to the wall clock, irrespective of the CPU
frequency.
This implementation reads the SPURR register in account_system_vtime, which
is only called on context switch and on hard and soft irq entry and exit.
The scaled user and system times are then estimated by splitting the SPURR
delta in the same ratio as the user and system times accounted by the PURR.
If the SPURR is not present, the PURR is read instead.
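A sketch of that split, using a hypothetical helper that takes the PURR-accounted user and system deltas and the SPURR delta for the same interval. When the SPURR is absent, the PURR delta would be passed as the scaled delta, so the scaled times equal the unscaled ones. The multiplication assumes the product fits in 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide a scaled (SPURR) delta between user and
 * system time in the same proportion as the PURR-accounted deltas. */
static void sketch_split_scaled(uint64_t purr_user, uint64_t purr_sys,
                                uint64_t spurr_delta,
                                uint64_t *scaled_user, uint64_t *scaled_sys)
{
    uint64_t total = purr_user + purr_sys;

    if (total == 0) {
        /* No PURR time accounted: nothing to apportion. */
        *scaled_user = 0;
        *scaled_sys = 0;
        return;
    }
    *scaled_user = spurr_delta * purr_user / total;
    /* Give the remainder to system time so rounding loses nothing. */
    *scaled_sys = spurr_delta - *scaled_user;
}
```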
An earlier implementation of this patch read the SPURR whenever the PURR
was read, which included the system call entry and exit path.
Unfortunately this showed a performance regression on lmbench runs, so it
was re-implemented.
I've included the lmbench results here when run bare metal on POWER6. The 1st
column is the unpatched results. The 2nd column is the results using the below
patch and the 3rd is the % diff of these results from the base. The 4th and
5th columns are the results and % difference from the base using the older
patch (SPURR read in syscall entry/exit path).
This moves the new items to the end of the taskstats struct as
requested by Balbir and yourself.
Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Neuling [Thu, 18 Oct 2007 10:06:34 +0000 (03:06 -0700)]
Add scaled time to taskstats based process accounting
This adds items to the taskstats struct to account for user and system
time based on scaling the CPU frequency and instruction issue rates.
Adds account_(user|system)_time_scaled callbacks which architectures
can use to account for time using this mechanism.
Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 18 Oct 2007 10:06:30 +0000 (03:06 -0700)]
Add missing newlines to some uses of dev_<level> messages
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Roland Dreier <rolandd@cisco.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Smart <James.Smart@Emulex.Com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Thu, 18 Oct 2007 10:06:28 +0000 (03:06 -0700)]
Char: rocket, remove potential leak in module_init
If (controller && !request_region), we leaked a tty driver struct. Fix it
by adding a deinit tail to the function and jumping to it with goto (from the
other failure paths too).
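The goto-based unwind pattern, as a self-contained userspace sketch. Every name here (`sketch_module_init()`, `sketch_alloc_driver()`, and the `sketch_region_busy` flag that simulates request_region() failing) is hypothetical and stands in for the driver's real tty-driver allocation and I/O region request:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tty driver struct allocated at init. */
struct sketch_driver {
    int nports;
};

static bool sketch_region_busy;          /* simulates request_region() failing */
static int sketch_live_drivers;          /* allocated and not yet freed */
static struct sketch_driver *sketch_drv; /* kept on success */

static struct sketch_driver *sketch_alloc_driver(void)
{
    struct sketch_driver *d = calloc(1, sizeof(*d));

    if (d)
        sketch_live_drivers++;
    return d;
}

static void sketch_put_driver(struct sketch_driver *d)
{
    free(d);
    sketch_live_drivers--;
}

static int sketch_module_init(bool have_controller)
{
    struct sketch_driver *drv;
    int ret;

    drv = sketch_alloc_driver();
    if (!drv)
        return -ENOMEM;

    if (have_controller && sketch_region_busy) {
        /* Returning here directly would leak drv. */
        ret = -EBUSY;
        goto err_put_driver;
    }

    /* ... further setup; its failure paths also jump to the tail ... */

    sketch_drv = drv;
    return 0;

err_put_driver:    /* shared deinit tail for every failure path */
    sketch_put_driver(drv);
    return ret;
}
```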
Jiri Slaby [Thu, 18 Oct 2007 10:06:26 +0000 (03:06 -0700)]
Char: rocket, fix dynamic_dev tty
- call register_device unconditionally (not only for PCI) so that ISA
devices also show up in /dev
- unregister devices on module removal
- don't set TTY_DRIVER_DYNAMIC_DEV twice (removed the one dependent on some
macro)
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ferenc Wagner <wferi@niif.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Thu, 18 Oct 2007 10:06:24 +0000 (03:06 -0700)]
Char: moxa, cleanup prints
- use dev_* where pdev is available (probe function)
- add some printks on fail paths
- add KERN_ macros otherwise
- remove useless verbose variable
- wrap lines to 80 cols at most
Jiri Slaby [Thu, 18 Oct 2007 10:06:20 +0000 (03:06 -0700)]
Char: cyclades, remove bottom half processing
The work done in the bottom half doesn't cost much CPU time (e.g. tty_hangup
itself schedules its own bottom half), so it's possible to do the work
directly in the ISR and hence save some .text.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:04 +0000 (03:06 -0700)]
whitespace fixes: cpuset
Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morgan [Thu, 18 Oct 2007 10:05:59 +0000 (03:05 -0700)]
V3 file capabilities: alter behavior of cap_setpcap
The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
can change the capabilities of another process, p2. This is not the
meaning that was intended for this capability at all, and this
implementation came about purely because, without filesystem capabilities,
there was no way to use capabilities without one process bestowing them on
another.
Since we now have filesystem support for capabilities, we can fix the
implementation of CAP_SETPCAP.
The most significant thing about this change is that, with it in effect, no
process can set the capabilities of another process.
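The capabilities of a program are set via the capability convolution
rules:
   pI(post-exec) = pI(pre-exec)
   pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
   pE(post-exec) = fE ? pP(post-exec) : 0
at exec() time. As such, the only influence the pre-exec() program can
have on the post-exec() program's capabilities is through the pI
capability set.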
The correct implementation for CAP_SETPCAP (and that enabled by this patch)
is that it can be used to add extra pI capabilities to the current process
- to be picked up by subsequent exec()s when the above convolution rules
are applied.
Here is how it works:
Let's say we have a process, p. It has capability sets, pE, pP and pI.
Generally, p can change the value of its own pI to pI' where
(pI' & ~pI) & ~pP = 0.
That is, the only new things in pI' that were not present in pI need to
be present in pP.
The role of CAP_SETPCAP is basically to permit changes to pI beyond
the above:
if (pE & CAP_SETPCAP) {
pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */
}
This capability is useful for things like login, which (say, via
pam_cap) might want to raise certain inheritable capabilities for use
by the children of the logged-in user's shell, but those capabilities
are not useful to or needed by the login program itself.
One such use might be to limit who can run ping. You set the
capabilities of the 'ping' program to be "= cap_net_raw+i", and then
only shells that have (pI & CAP_NET_RAW) will be able to run
it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
would have to also have (pP & CAP_NET_RAW) in order to raise this
capability and pass it on through the inheritable set.
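The pI rules above, together with the exec() convolution as commonly stated for file capabilities of this era (pI' = pI, pP' = (X & fP) | (pI' & fI), pE' = fE ? pP' : 0, with X the bounding set and fE a single bit), can be modelled with plain bitmasks. This is a hypothetical sketch: `struct sketch_task` and the helpers are not kernel code, and only the two capability bits used in the ping example are defined (8 and 13, as in <linux/capability.h>):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_cap_t;              /* one bit per capability */

#define SKETCH_CAP_SETPCAP  (1u << 8)
#define SKETCH_CAP_NET_RAW  (1u << 13)

struct sketch_task {
    sketch_cap_t pE, pP, pI;                /* effective, permitted, inheritable */
};

/* May task p replace its own inheritable set with new_pI? */
static bool sketch_may_set_pI(const struct sketch_task *p, sketch_cap_t new_pI)
{
    if (p->pE & SKETCH_CAP_SETPCAP)
        return true;                        /* any pI' is allowed */
    /* Otherwise every newly added bit must already be in pP. */
    return ((new_pI & ~p->pI) & ~p->pP) == 0;
}

/* exec() convolution; fE is a single on/off bit, X the bounding set. */
static void sketch_exec(struct sketch_task *p, sketch_cap_t fI,
                        sketch_cap_t fP, bool fE, sketch_cap_t X)
{
    /* pI is carried across exec() unchanged. */
    p->pP = (X & fP) | (p->pI & fI);
    p->pE = fE ? p->pP : 0;
}
```

With ping marked "= cap_net_raw+i" (fI = CAP_NET_RAW, fP = 0, fE off), only a shell whose pI holds CAP_NET_RAW ends up with it in pP after exec(); a task without CAP_SETPCAP can still add CAP_NET_RAW to pI if it already has it in pP.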
Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl: deprecate sys_sysctl in a user space visible fashion.
After adding checking to register_sysctl_table, I found a whole new set of
bugs missed by countless code reviews and testers, and I have finally lost
patience with the binary sysctl interface.
The binary sysctl interface has been sort of deprecated for years and
finding a user space program that uses the syscall is more difficult then
finding a needle in a haystack. Problems continue to crop up, with the in
kernel implementation. So since supporting something that no one uses is
silly, deprecate sys_sysctl with a sufficient grace period and notice that
the handful of user space applications that care can be fixed or replaced.
The /proc/sys sysctl interface that people use will continue to be
supported indefinitely.
This patch moves the tested warning about sysctls out of the sys_sysctl
path into a separate helper called from both implementations of
sys_sysctl, and it adds a proper entry to
Documentation/feature-removal-schedule.
This allows us to revisit this in a couple of years' time and actually kill
sys_sysctl.
[lethal@linux-sh.org: sysctl: Fix syscall disabled build] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl: for irda update sysctl_checks list of binary paths
It turns out that the net/irda code didn't register any of its binary paths
in the global sysctl.h header file, so I missed them completely when making an
authoritative list of binary sysctl paths in the kernel. So add them to the
list of valid binary sysctl paths.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Well, it turns out, after I dug into the problems a little more, that I was
returning a few false positives, so this patch updates my logic to remove them.
- Don't complain about 0 ctl_names in sysctl_check_binary_path
It is valid for someone to remove the sysctl binary interface
and still keep the same sysctl proc interface.
- Count ctl_names and procnames as matching if neither of them
exists.
- Only warn about missing min&max when the generic functions care.
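A simplified sketch of such a check with the first two refinements applied: entries with a zero ctl_name are accepted, and otherwise the (ctl_name, procname) pair must appear in a frozen list of known binary sysctls, with two absent procnames counting as a match (one reading of the second bullet). The directory hierarchy that a real check walks is ignored, and all names and list contents are hypothetical illustrations:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in a frozen list of known binary sysctls. */
struct sketch_binary_path {
    int ctl_name;
    const char *procname;   /* may be NULL */
};

/* Illustrative contents only. */
static const struct sketch_binary_path sketch_known[] = {
    { 1, "ostype" },
    { 2, "osrelease" },
};

static int sketch_names_match(const char *a, const char *b)
{
    if (!a && !b)
        return 1;           /* both absent: treat as matching */
    return a && b && strcmp(a, b) == 0;
}

/* Returns 0 if the entry is acceptable, -1 if it would be reported. */
static int sketch_check_entry(int ctl_name, const char *procname)
{
    size_t i;

    if (ctl_name == 0)
        return 0;           /* no binary interface: nothing to check */

    for (i = 0; i < sizeof(sketch_known) / sizeof(sketch_known[0]); i++)
        if (sketch_known[i].ctl_name == ctl_name &&
            sketch_names_match(sketch_known[i].procname, procname))
            return 0;

    return -1;              /* new or mismatched binary sysctl */
}
```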
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After going through the kernel's sysctl tables several times it has become
clear that code review and testing are just not effective in preventing
problematic sysctl tables from being used in the stable kernel. I certainly
can't seem to fix the problems as fast as they are introduced.
Therefore this patch adds sysctl_check_table which is called when a sysctl
table is registered and checks to see if we have a problematic sysctl table.
The biggest part of the code is the table of valid binary sysctl entries, but
since we have frozen our set of binary sysctls this table should not need to
change, and it makes it much easier to detect when someone unintentionally
adds a new binary sysctl value.
As best as I can determine, all of the several hundred errors spewed on
boot up are now legitimate.
[bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl: properly register the irda binary sysctl numbers
Grumble. These numbers should have been in sysctl.h from the beginning if we
ever expected anyone to use them. Oh well, put them there now so we can find
them and make maintenance easier.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It looks like we inadvertently killed the cad_pid binary sysctl support when
cad_pid was changed to be a struct pid. Since no one has complained, just
remove the binary path.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No one has bothered to set the strategy routine for the netfilter sysctls
that return jiffies to sysctl_jiffies.
So it appears the sys_sysctl path is unused and untested, and this patch
removes the binary sysctl numbers.
This fixes the netfilter oops in 2.6.23-rc2-mm2 for me.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of having a bunch of ifdefs in sysctl.c, move all of the pty sysctl
logic into drivers/char/pty.c.
As well as cleaning up the logic this prevents sysctl_check_table from
complaining that the root table has a NULL data pointer on something with
generic methods.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sysctl binary paths don't look as if they could even work: .data is not
filled in, all of the proc_handlers look at extra1, and there is no
strategy routine.
So just kill the binary paths.
In addition this patch removes the setting of extra1 on directories. It
doesn't look like the parport code ever examines it, and it's bad sysctl form.
[bunk@kernel.org: remove parport_device_num()] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl: remove the binary interface for aio-nr, aio-max-nr, acpi_video_flags
aio-nr, aio-max-nr, acpi_video_flags are unsigned long values which sysctl
does not handle properly with a 64bit kernel and a 32bit user space.
Since no one is likely to be using the binary sysctl values and the ASCII
interface still works, this patch just removes support for the binary sysctl
interface from the kernel.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>