Linus Torvalds [Wed, 5 Dec 2007 17:26:52 +0000 (09:26 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
VM/Security: add security hook to do_brk
Security: round mmap hint address above mmap_min_addr
security: protect from stack expantion into low vm addresses
Security: allow capable check to permit mmap or low vm space
SELinux: detect dead booleans
SELinux: do not clear f_op when removing entries
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[LRO]: fix lro_gen_skb() alignment
[TCP]: NAGLE_PUSH seems to be a wrong way around
[TCP]: Move prior_in_flight collect to more robust place
[TCP] FRTO: Use of existing funcs make code more obvious & robust
[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
[ROSE]: Trivial compilation CONFIG_INET=n case
[IPVS]: Fix sched registration race when checking for name collision.
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
Al Viro [Wed, 5 Dec 2007 08:46:47 +0000 (08:46 +0000)]
remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.
Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Wed, 5 Dec 2007 07:45:31 +0000 (23:45 -0800)]
VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low pages
available in userspace virtual memory and can thus be used to circumvent
the mmap_min_addr low memory protection. Add security checks in do_brk().
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 5 Dec 2007 07:45:28 +0000 (23:45 -0800)]
proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.
This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():
1) PDE leak
remove_proc_entry de_put
----------------- ------
[refcnt = 1]
if (atomic_read(&de->count) == 0)
if (atomic_dec_and_test(&de->count))
if (de->deleted)
/* also not taken! */
free_proc_entry(de);
else
de->deleted = 1;
[refcount=0, deleted=1]
Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.
Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.
Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 5 Dec 2007 07:45:27 +0000 (23:45 -0800)]
jbd: Fix assertion failure in fs/jbd/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Gardner [Wed, 5 Dec 2007 07:45:24 +0000 (23:45 -0800)]
gpio_cs5535: disable AUX on output
The AMD CS5535/CS5536 GPIO has two alternate output modes: AUX-1 and AUX-2.
When either AUX is enabled, the cs5535_gpio driver cannot control the
output.
Some BIOS code for the Geode processor enables AUX-1 for GPIO-1, which
configures it as the PC BEEP output.
This patch will disable AUX-1 and AUX-2 when the user enables output.
Signed-of-by: Ben Gardner <gardner.ben@gmail.com> Cc: Richard Knutsson <ricknu-0@student.ltu.se> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 5 Dec 2007 07:45:24 +0000 (23:45 -0800)]
Avoid potential NULL dereference in unregister_sysctl_table
register_sysctl_table() can return NULL sometimes, e.g. when kmalloc()
returns NULL or when sysctl check fails.
I've also noticed, that many (most?) code in the kernel doesn't check for
the return value from register_sysctl_table() and later simply calls the
unregister_sysctl_table() with potentially NULL argument.
This is unlikely on a common kernel configuration, but in case we're
dealing with modules and/or fault-injection support, there's a slight
possibility of an OOPS.
Changing all the users to check for return code from the registering does
not look like a good solution - there are too many code doing this and
failure in sysctl tables registration is not a good reason to abort module
loading (in most of the cases).
So I think, that we can just have this check in unregister_sysctl_table
just to avoid accidental OOPS-es (actually, the unregister_sysctl_table()
did exactly this, before the start_unregistering() appeared).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:23 +0000 (23:45 -0800)]
Blackfin SPI driver: reconfigure speed_hz and bits_per_word in each spi transfer
- reconfigure SPI baud from speed_hz of each spi transfer
- according to spi_transfer.bits_per_word to reprogram register and setup
correct SPI operation handlers
Signed-off-by: Bryan Wu <bryan.wu@analog.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:17 +0000 (23:45 -0800)]
spi: spi_bfin: change handling of communication parameters
Fix SPI driver to work with SPI flash ST M25P16 on bf548
Currently the SPI driver enables the SPI controller and sets the SPI baud
register for each SPI transfer. But they should never be changed within a SPI
message session, in which several SPI transfers are pumped.
This patch moves SPI setting to the begining of a message session, and
never disables SPI controller until an error occurs.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:15 +0000 (23:45 -0800)]
spi: spi_bfin uses platform device resources
Update spi driver to support multi-ports by using platform resources; tested
on STAMP537+SPI_MMC, other boards need more testing. Plus other minor
updates.
Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:13 +0000 (23:45 -0800)]
spi: spi_bfin cleanups, error handling
Cleanup and error handling
- add error handling in SPI bus driver with selecting clients
- use proper defines to access Blackfin MMRs
- remove useless SSYNCs
- cleaner use of portmux calls
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Pignat [Wed, 5 Dec 2007 07:45:11 +0000 (23:45 -0800)]
spi: use simplified spi_sync() calling convention
Given the patch which simplifies the spi_sync calling convention, this one
updates the callers of that routine which tried using it according to the
previous specification. (Most didn't.)
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Pignat [Wed, 5 Dec 2007 07:45:10 +0000 (23:45 -0800)]
spi: simplify spi_sync() calling convention
Simplify spi_sync calling convention, eliminating the need to check both
the return value AND the message->status. In consequence, this corrects
misbehaviours of spi_read and spi_write (which only checked the former) and
their callers.
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Wed, 5 Dec 2007 07:45:10 +0000 (23:45 -0800)]
spi: at25 driver is for EEPROM not FLASH
Add comment to at25 driver that it's for EEPROM chips, not FLASH
chips ... the AT25 series has both types of chip, and sometimes
they're even pin-compatible. The command sets are different, as
is the treatment of erasure. (FLASH needs explicit erasure, but
with EEPROM it's implicit.) Note that all vendors seem to have
this same confusion in their *25* series SPI memory parts.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes regression, introduced since 2.6.16. NextStep variant of
UFS as OpenStep uses directory block size equals to 1024. Without this
change, ufs_check_page fails in many cases.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Dave Bailey <dsbailey@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Kosina [Wed, 5 Dec 2007 07:45:05 +0000 (23:45 -0800)]
RTC: assure proper memory ordering with respect to RTC_DEV_BUSY flag
We must make sure that the RTC_DEV_BUSY flag has proper lock semantics,
i.e. that the RTC_DEV_BUSY stores clearing the flag don't get reordered
before the preceeding stores and loads and vice versa.
Spotted by Nick Piggin.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Brownell <david-b@pacbell.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we are complicating the code in copy_process, the clone ABI, and
if we fix the bugs sys_setsid itself, with an unnecessary open coded
version of sys_setsid.
So just simplify everything and don't special case the session and pgrp of
the initial process in a pid namespace.
Having this special case actually presents to user space the classic linux
startup conditions with session == pgrp == 0 for /sbin/init.
We already handle sending signals to processes in a child pid namespace.
We need to handle sending signals to processes in a parent pid namespace
for cases like SIGCHILD and SIGIO.
This makes nothing extra visible inside a pid namespace. So this extra
special case appears to have no redeeming merits.
Further removing this special case increases the flexibility of how we can
use pid namespaces, by not requiring the initial process in a pid namespace
to be a daemon.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer [Wed, 5 Dec 2007 07:45:02 +0000 (23:45 -0800)]
aio: only account I/O wait time in read_events if there are active requests
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle). The UML code sits in io_getevents waiting for
an event to be submitted and completed.
Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
lockdep: in_range() fix
Torsten Kaiser wrote:
| static inline int in_range(const void *start, const void *addr, const void *end)
| {
| return addr >= start && addr <= end;
| }
| This will return true, if addr is in the range of start (including)
| to end (including).
|
| But debug_check_no_locks_freed() seems does:
| const void *mem_to = mem_from + mem_len
| -> mem_to is the last byte of the freed range, that fits in_range
| lock_from = (void *)hlock->instance;
| -> first byte of the lock
| lock_to = (void *)(hlock->instance + 1);
| -> first byte of the next lock, not last byte of the lock that is being checked!
|
| The test is:
| if (!in_range(mem_from, lock_from, mem_to) &&
| !in_range(mem_from, lock_to, mem_to))
| continue;
| So it tests, if the first byte of the lock is in the range that is freed ->OK
| And if the first byte of the *next* lock is in the range that is freed
| -> Not OK.
We can also simplify in_range checks, we need only 2 comparisons, not 4.
If the lock is not in memory range, it should be either at the left of range
or at the right.
Steven Rostedt [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
futex: fix for futex_wait signal stack corruption
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Gallatin [Wed, 5 Dec 2007 10:31:42 +0000 (02:31 -0800)]
[LRO]: fix lro_gen_skb() alignment
Add a field to the lro_mgr struct so that drivers can specify how much
padding is required to align layer 3 headers when a packet is copied
into a freshly allocated skb by inet_lro.c:lro_gen_skb(). Without
padding, skbs generated by LRO will cause alignment warnings on
architectures which require strict alignment (seen on sparc64).
Myri10GE is updated to use this field.
Signed-off-by: Andrew Gallatin <gallatin@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 5 Dec 2007 10:21:35 +0000 (02:21 -0800)]
[TCP]: Move prior_in_flight collect to more robust place
The previous location is after sacktag processing, which affects
counters tcp_packets_in_flight depends on. This may manifest as
wrong behavior if new SACK blocks are present and all is clear
for call to tcp_cong_avoid, which in the case of
tcp_reno_cong_avoid bails out early because it thinks that
TCP is not limited by cwnd.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 5 Dec 2007 10:20:21 +0000 (02:20 -0800)]
[TCP] FRTO: Use of existing funcs make code more obvious & robust
Though there's little need for everything that tcp_may_send_now
does (actually, even the state had to be adjusted to pass some
checks FRTO does not want to occur), it's more robust to let it
make the decision if sending is allowed. State adjustments
needed:
- Make sure snd_cwnd limit is not hit in there
- Disable nagle (if necessary) through the frto_counter == 2
The result of check for frto_counter in argument to call for
tcp_enter_frto_loss can just be open coded, therefore there
isn't need to store the previous frto_counter past
tcp_may_send_now.
In addition, returns can then be combined.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Tue, 4 Dec 2007 08:45:06 +0000 (00:45 -0800)]
[IPVS]: Fix sched registration race when checking for name collision.
The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.
This is racy, since we can have a race window between the lock
being re-locked for writing.
The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Tue, 4 Dec 2007 08:43:24 +0000 (00:43 -0800)]
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.
I've looked at the register_ip_vs_scheduler() code and saw, that
the only reason to fail is the name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Paris [Tue, 4 Dec 2007 16:06:55 +0000 (11:06 -0500)]
VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low
pages available in userspace virtually memory and can thus be used to
circumvent the mmap_min_addr low memory protection. Add security checks
in do_brk().
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Mon, 26 Nov 2007 23:47:40 +0000 (18:47 -0500)]
Security: round mmap hint address above mmap_min_addr
If mmap_min_addr is set and a process attempts to mmap (not fixed) with a
non-null hint address less than mmap_min_addr the mapping will fail the
security checks. Since this is just a hint address this patch will round
such a hint address above mmap_min_addr.
gcj was found to try to be very frugal with vm usage and give hint addresses
in the 8k-32k range. Without this patch all such programs failed and with
the patch they happily get a higher address.
This patch is wrappad in CONFIG_SECURITY since mmap_min_addr doesn't exist
without it and there would be no security check possible no matter what. So
we should not bother compiling in this rounding if it is just a waste of
time.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Mon, 26 Nov 2007 23:47:46 +0000 (18:47 -0500)]
Security: allow capable check to permit mmap or low vm space
On a kernel with CONFIG_SECURITY but without an LSM which implements
security_file_mmap it is impossible for an application to mmap addresses
lower than mmap_min_addr. Based on a suggestion from a developer in the
openwall community this patch adds a check for CAP_SYS_RAWIO. It is
assumed that any process with this capability can harm the system a lot
more easily than writing some stuff on the zero page and then trying to
get the kernel to trip over itself. It also means that programs like X
on i686 which use vm86 emulation can work even with mmap_min_addr set.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Stephen Smalley [Mon, 26 Nov 2007 16:12:53 +0000 (11:12 -0500)]
SELinux: detect dead booleans
Instead of using f_op to detect dead booleans, check the inode index
against the number of booleans and check the dentry name against the
boolean name for that index on reads and writes. This prevents
incorrect use of a boolean file opened prior to a policy reload while
allowing valid use of it as long as it still corresponds to the same
boolean in the policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
Grant Likely [Sun, 2 Dec 2007 05:10:03 +0000 (22:10 -0700)]
gianfar: fix compile warning
Eliminate an uninitialized variable warning. The code is correct, but
a pointer to the automatic variable 'addr' is passed to dma_alloc_coherent.
Since addr has never been initialized, and the compiler doesn't know
what dma_alloc_coherent will do with it, it complains.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Olof Johansson [Tue, 4 Dec 2007 03:34:14 +0000 (21:34 -0600)]
pasemi_mac: Fix reuse of free'd skb
Turns out we're freeing the skb when we detect CRC error, but we're
not clearing out info->skb. We could either clear it and have the stack
reallocate it, or just leave it and the rx ring refill code will reuse
the one that was allocated.
Reusing a freed skb obviously caused some nasty crashes of various kind,
as reported by Brent Baude and David Woodhouse.
Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Prevent deadlock in sky2 recovery logic. sky2_down calls napi_synchronize
which gets stuck if napi was already disabled.
Fix by rearranging slightly and not calling napi_disable until after
both ports are stopped. The napi_disable probably is being overly
paranoid, but it is safe now.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jon Smirl [Mon, 3 Dec 2007 22:38:10 +0000 (22:38 +0000)]
Fix memory corruption in fec_mpc52xx
The mpc5200 fec driver is corrupting memory. This patch fixes two bugs
where the wrong skb was being referenced.
Signed-off-by: Jon Smirl <jonsmirl@gmail.com> Acked-by: Domen Puncer <domen.puncer@telargo.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
pata_amd/pata_via: de-couple programming of PIO/MWDMA and UDMA timings
* Don't program UDMA timings when programming PIO or MWDMA modes.
This has also a nice side-effect of fixing regression added by commit 681c80b5d96076f447e8101ac4325c82d8dce508 ("libata: correct handling of
SRST reset sequences") (->set_piomode method for PIO0 is called before
->cable_detect method which checks UDMA timings to get the cable type).
* Bump driver version.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Tested-by: "Thomas Lindroth" <thomas.lindroth@gmail.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mark Lord [Tue, 4 Dec 2007 19:07:52 +0000 (14:07 -0500)]
sata_mv: Warn about HPT RocketRAID BIOS treatment of "Legacy" drives
The Highpoint RocketRAID boards using Marvell 7042 chips
overwrite the 9th sector of attached drives at boot time,
when those drives are configured as "Legacy" (the default)
in the HighPoint BIOS.
This kills GRUB, and probably other stuff.
But it all happens *before* Linux is even loaded.
So, for now we'll log a WARNING when such boards are detected,
and advise users to configure BIOS "JBOD" volumes instead,
which don't appear to suffer from this problem.
Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Robert Hancock [Sun, 25 Nov 2007 22:59:36 +0000 (16:59 -0600)]
sata_nv: don't use legacy DMA in ADMA mode (v3)
We need to run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
in ADMA mode which is not allowed. Enforce this with BUG_ON() since data
corruption could potentially result if this happened. Also, fail any attempt to
try and issue NCQ commands with result taskfile requested, since the hardware
doesn't allow this.
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Linus Torvalds [Tue, 4 Dec 2007 17:37:39 +0000 (09:37 -0800)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] Make sure the restore psw masks are initialized.
[S390] Fix compile error on 31bit without preemption
[S390] dcssblk: prevent early access without own make_request function
[S390] cio: add missing reprobe loop end statement
[S390] cio: Issue SenseID per path.
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
sched: default to more agressive yield for SCHED_BATCH tasks
sched: fix crash in sys_sched_rr_get_interval()
Adrian Bunk [Tue, 4 Dec 2007 13:35:00 +0000 (14:35 +0100)]
MAINTAINERS: remove the MTRR entry
I haven't seen Richard doing MTRR related work for quite some time, and
the "X86 ARCHITECTURE" entry in MAINTAINERS already covers the people
currently responsible for this code.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don Zickus [Tue, 4 Dec 2007 16:19:07 +0000 (17:19 +0100)]
x86: add the word 'WARNING' in check_nmi_watchdog() output
Our automated test suite looks for keywords like error, fail, warning in
the boot log. In the case when the nmi watchdog is determined to be
stuck in check_nmi_watchdog(), none of those keywords are displayed.
This patch adds a keyword, "WARNING:", so it makes it easier to notice
when the nmi watchdog isn't working correctly. Also add a proper
KERN_WARNING mark to this printout.
Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Tue, 4 Dec 2007 16:04:39 +0000 (17:04 +0100)]
sched: default to more agressive yield for SCHED_BATCH tasks
do more agressive yield for SCHED_BATCH tuned tasks: they are all
about throughput anyway. This allows a gentler migration path for
any apps that relied on stronger yield.
arch/s390/kernel/built-in.o: In function `cleanup_io_leave_insn':
/space/kvm/arch/s390/kernel/entry.S:(.text+0xbfce): undefined reference to `preempt_schedule_irq'
This patch hides preempt_schedule_irq if CONFIG_PREEMPT is not set.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[S390] dcssblk: prevent early access without own make_request function
When loading a dcss segment with the dcssblk driver, sometimes the
following kind of message appears:
bio too big device dcssblk0 (8 > 0)
Buffer I/O error on device dcssblk0, logical block 172016
..
The fix is to move the disk registration after setting the
make_request function, to avoid calls into generic_make_request
for dcssblock without having the make_request function set up
properly.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cornelia Huck [Tue, 4 Dec 2007 15:09:01 +0000 (16:09 +0100)]
[S390] cio: Issue SenseID per path.
We may receive a unit check for every path when we issue a SenseID.
Unfortunately, the channel subsystem will try on a different path
every time if we use a lpm of 0xff, which will exhaust our retry
counter.
Therefore, revert SenseID to its previous per-path behaviour and
just leave out the suspend multipath reconnect.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>