[PATCH] select_bad_process(): kill a bogus PF_DEAD/TASK_DEAD check
The only usage of TASK_DEAD outside of the last schedule path is in
select_bad_process:
	for_each_task(p) {
		if (!p->mm)
			continue;
		...
		if (p->state == TASK_DEAD)
			continue;
		...
The TASK_DEAD state is set at the end of do_exit(). This means that p->mm
was already set to NULL by exit_mm(), so this task was already rejected
by the 'if (!p->mm)' check above.
Note also that the caller holds tasklist_lock, which means that p can't
pass exit_notify() and then set TASK_DEAD while p->mm != NULL.
Also, remove open-coded is_init().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I am not sure about this patch, I am asking Ingo to take a decision.
task_struct->state == EXIT_DEAD is a very special case. To avoid confusion
it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
live only in ->exit_state as documented in sched.h.
Note that this state is not visible to user-space, get_task_state() masks off
unsuitable states.
[PATCH] set EXIT_DEAD state in do_exit(), not in schedule()
schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD
is in fact a "random" value, we can use any bit except normal TASK_XXX values.
It is better to set this state in do_exit() along with PF_DEAD flag and remove
that check in schedule().
We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
can not change task's ->state: the 'state' argument of try_to_wake_up() can't
have EXIT_DEAD bit. And in case when try_to_wake_up() sees a stale value of
->state == TASK_RUNNING it will do nothing.
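The argument about try_to_wake_up() can be illustrated with a small user-space sketch. The struct toy_task type, the toy_try_to_wake_up() helper and the numeric state values are assumptions made for illustration, not the kernel's code; the point is only that a wakeup mask without the EXIT_DEAD bit can never match a task whose ->state is EXIT_DEAD, and that a task seen as TASK_RUNNING is left alone.

```c
#include <assert.h>

/* Illustrative state bits; the exact values are assumptions of this sketch. */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define EXIT_DEAD		32

struct toy_task {
	long state;
};

/*
 * Hypothetical stand-in for the state test in try_to_wake_up():
 * the task is woken only if its current state intersects the mask.
 */
static int toy_try_to_wake_up(struct toy_task *p, long state_mask)
{
	/* Callers never pass the EXIT_DEAD bit in the mask. */
	assert(!(state_mask & EXIT_DEAD));

	if (!(p->state & state_mask))
		return 0;		/* no match: ->state is left untouched */

	p->state = TASK_RUNNING;
	return 1;
}
```

With these definitions a task in EXIT_DEAD stays in EXIT_DEAD for any permitted mask, and a task whose state reads as TASK_RUNNING (0) never matches, so the call does nothing.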
Introduce the disable_irq_nosync_lockdep_irqsave() and
enable_irq_lockdep_irqrestore() APIs. These are needed for NE2000; basically
NE2000 calls disable_irq and enable_irq as locking against the IRQ handler,
but both in cases where interrupts are on and off. This means that lockdep
needs to track the old state of the virtual irq flags on disable_irq, and
restore these at enable_irq time.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] cramfs: make cramfs_uncompress_exit() return void
It always returns 0, so relying on it is useless. The only caller isn't
checking the return value. In general, un-, de-, -free functions should
return void.
[PATCH] Pass a lock expression to __cond_lock, like __acquire and __release
Currently, __acquire and __release take a lock expression, but __cond_lock
takes only a condition, not the lock acquired if the expression evaluates
to true. Change __cond_lock to accept a lock expression, and change all
the callers to pass in a lock expression.
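A rough user-space sketch of the idea follows. The toy_acquire, toy_release and toy_cond_lock macros stand in for __acquire, __release and __cond_lock; they are defined here as no-ops, which is an assumption of this sketch (a static checker would give them meaning). The struct toy_lock type and its helpers are hypothetical. What the sketch shows is the new calling convention: the conditional annotation receives the lock expression as well as the condition.

```c
#include <assert.h>

/*
 * Stand-ins for the lock annotations. Outside of a checker they compile
 * away; these no-op definitions are assumptions of this sketch.
 */
#define toy_acquire(x)		((void)0)
#define toy_release(x)		((void)0)
/* Takes the lock expression x in addition to the condition c. */
#define toy_cond_lock(x, c)	((c) ? (toy_acquire(x), 1) : 0)

/* Hypothetical lock type used only for this illustration. */
struct toy_lock {
	int held;
};

static int raw_toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return 0;
	l->held = 1;
	return 1;
}

/* The wrapper names the lock that is acquired when the trylock succeeds. */
#define toy_trylock(l)	toy_cond_lock(l, raw_toy_trylock(l))

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
	toy_release(l);
}
```

Because the lock expression is now an argument, a checker can tie the successful branch of toy_trylock(l) to the same object that toy_unlock(l) later releases.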
At the beginning of the routine, "copied" is set to 0, but it is no good
because in lines 805 and 812 it is set to other values. Finally, the
routine returns as if it copied 12 (=ENOMEM) bytes less than it actually
did.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jason Baron [Fri, 29 Sep 2006 09:01:01 +0000 (02:01 -0700)]
[PATCH] block_dev.c mutex_lock_nested() fix
In the case below we are locking the whole disk, not a partition. This
change simply brings the code in line with the piece above, which handles
the case where we are the 'first' opener and we are a partition.
Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Replace _spin_trylock with spin_trylock in the IRQ variants to use __cond_lock
spin_trylock_irq and spin_trylock_irqsave use _spin_trylock, which does not
use the __cond_lock wrapper annotation and thus does not affect the lock
context; change them to use spin_trylock instead, which does use
__cond_lock.
[PATCH] Make spinlock/rwlock annotations more accurate by using parameters, not types
The lock annotations used on spinlocks and rwlocks currently use
__{acquires,releases}(spinlock_t) and __{acquires,releases}(rwlock_t),
respectively. This loses the information of which lock actually got
acquired or released, and assumes a different type for the parameter of
__acquires and __releases than the rest of the kernel. While the current
implementations of __acquires and __releases throw away their argument,
this will not always remain the case. Change this to use the lock
parameter instead, to preserve this information and increase consistency in
usage of __acquires and __releases.
Alan Cox [Fri, 29 Sep 2006 09:00:58 +0000 (02:00 -0700)]
[PATCH] tty: Fix bits and note more bits to fix
If your driver implements "break on" and "break off" this ensures you won't
get multiple overlapping requests or requests in parallel. If your driver
has its own break handling then it's still your problem as the driver
author.
Break is also now serialized against writes from user space properly, but no
new guarantees are made at driver level about writes from the line discipline
itself (e.g. flow control or echo).
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Fri, 29 Sep 2006 09:00:57 +0000 (02:00 -0700)]
[PATCH] solaris emulation: incorrect tty locking
[akpm@osdl.org: build fix]
[akpm@osdl.org: warning fix] Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds a missing exit for the case where the file that should be parsed
couldn't be opened. Without it, the program crashes with a segfault, because
the file descriptor is accessed even if the file could not be opened.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ian Kent [Fri, 29 Sep 2006 09:00:54 +0000 (02:00 -0700)]
[PATCH] autofs4: pending flag not cleared on mount fail
During testing I've found that the mount pending flag can be left set at
exit from autofs4_lookup after a failed mount request. This shouldn't be
allowed to happen and causes incorrect error returns.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The check for an empty directory in the autofs4_follow_link method fails
occasionally due to old dentries. We had the same problem in
autofs4_revalidate ages ago. I thought we wouldn't need this in
autofs4_follow_link, silly me.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Why? ->ioprio was already copied in dup_task_struct(). I guess this is
needed to ensure that the child can't escape
sys_ioprio_set(IOPRIO_WHO_{PGRP,USER}), yes?
In that case we don't need ->siglock held, and the comment should be
updated.
Cal Peake [Fri, 29 Sep 2006 09:00:47 +0000 (02:00 -0700)]
[PATCH] kill extraneous printk in kernel_restart()
Get rid of an extraneous printk in kernel_restart().
Signed-off-by: Cal Peake <cp@absolutedigital.net> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Björn Steinbrink [Fri, 29 Sep 2006 09:00:46 +0000 (02:00 -0700)]
[PATCH] Fix ____call_usermodehelper errors being silently ignored
If ____call_usermodehelper fails, we're not interested in the child
process' exit value, but the real error, so let's stop wait_for_helper from
overwriting it in that case.
Issue discovered by Benedikt Böhm while working on a Linux-VServer usermode
helper.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Return better error codes if drivers/char/raw.c module init fails
Currently this module just returns 1 if anything on module init fails. Store
the error code of the different function calls and return their error on
problems.
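The pattern described here can be sketched in user space as follows. The step functions and the toy_raw_init() name are hypothetical stand-ins, not the actual calls in drivers/char/raw.c; the sketch only shows storing each step's error code, unwinding once, and returning the real error instead of a bare 1.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical setup steps; each returns 0 or a negative errno value. */
static int region_result, cdev_result;	/* results injected by the demo */
static int unregister_calls;

static int toy_register_region(void)	{ return region_result; }
static int toy_add_cdev(void)		{ return cdev_result; }
static void toy_unregister_region(void)	{ unregister_calls++; }

static int toy_raw_init(void)
{
	int ret;

	ret = toy_register_region();
	if (ret)
		goto error;

	ret = toy_add_cdev();
	if (ret)
		goto error_region;

	return 0;

error_region:
	toy_unregister_region();	/* undo the first step exactly once */
error:
	return ret;			/* the real error code, not a bare 1 */
}

/* Usage: a failing second step reports -ENOMEM and unwinds the first. */
static void toy_raw_init_demo(void)
{
	region_result = 0;
	cdev_result = -ENOMEM;
	unregister_calls = 0;
	assert(toy_raw_init() == -ENOMEM);
	assert(unregister_calls == 1);
}
```

Keeping a single unwind path per step is one way to avoid undoing the same registration twice on error.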
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed to not unregister twice on error ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland McGrath [Fri, 29 Sep 2006 09:00:45 +0000 (02:00 -0700)]
[PATCH] Use decimal for PTRACE_ATTACH and PTRACE_DETACH.
It is sure confusing that linux/ptrace.h has:
#define PTRACE_SINGLESTEP 9
#define PTRACE_ATTACH 0x10
#define PTRACE_DETACH 0x11
#define PTRACE_SYSCALL 24
All the low-numbered constants are in decimal, but the last two in hex.
It sure makes it likely that someone will look at this and think that
9, 10, 11 are used, and that 16 and 17 are not used.
How about we use the same notation for all the numbers [0,24] in the
same short list?
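As a sketch of the proposed notation, the same four constants are written below in decimal throughout, with compile-time checks that the decimal values equal the hex values quoted above. Only the four names from the excerpt are used; the checks are added for illustration.

```c
#include <assert.h>

#define PTRACE_SINGLESTEP	 9
#define PTRACE_ATTACH		16
#define PTRACE_DETACH		17
#define PTRACE_SYSCALL		24

/* The decimal spellings denote the same request numbers as the hex ones. */
static_assert(PTRACE_ATTACH == 0x10, "PTRACE_ATTACH must stay 0x10");
static_assert(PTRACE_DETACH == 0x11, "PTRACE_DETACH must stay 0x11");
```

Read this way, it is obvious that 10 and 11 are not taken by attach and detach, and that 16 and 17 are.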
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven [Fri, 29 Sep 2006 09:00:43 +0000 (02:00 -0700)]
[PATCH] tty: make termios_sem a mutex
[akpm@osdl.org: fix] Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Fri, 29 Sep 2006 09:00:40 +0000 (02:00 -0700)]
[PATCH] tty: lock ticogwinsz
Now that we lock the set ioctl, it's trivial to lock the get one so the data
copied is consistent. At the moment we have the BKL here, but this removes
the need for it and is a step in the right direction.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.
Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
gets rid of this redundant calculation. There is also another redundancy
pointed out by Martin Schwidefsky.
As a bonus, this cleanup makes it easy to remove wall_jiffies, since
wall_jiffies is now always synced with jiffies. (This patch does not actually
remove wall_jiffies. That would be another cleanup patch.)
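A minimal user-space sketch of the interface change, assuming simplified bodies: the toy_ prefixed functions and counters are stand-ins for do_timer(), update_times(), jiffies and wall_jiffies, and everything except the tick accounting is left out. The caller supplies the tick count, so nothing has to be recomputed from "jiffies - wall_jiffies".

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's jiffies and wall_jiffies. */
static unsigned long long toy_jiffies;
static unsigned long long toy_wall_jiffies;

/* The tick count is passed in instead of being derived from a difference. */
static void toy_update_times(unsigned long ticks)
{
	toy_wall_jiffies += ticks;
}

static void toy_do_timer(unsigned long ticks)
{
	toy_jiffies += ticks;
	toy_update_times(ticks);
}

/* Usage: a normal tick followed by a handler that accounts for 3 ticks. */
static void toy_do_timer_demo(void)
{
	toy_jiffies = 0;
	toy_wall_jiffies = 0;
	toy_do_timer(1);
	toy_do_timer(3);
	assert(toy_jiffies == 4);
	/* Both counters advance together, which is why one becomes redundant. */
	assert(toy_wall_jiffies == toy_jiffies);
}
```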
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] I/O Error attempting to read last partial block of a file in an ISO9660 file system
There was an I/O error that prevented reading the last partial block of
large files in an ISO9660 filesystem. The error was generated when a file
comprised more than one section and had a size that was not an exact
multiple of the filesystem block size. This patch removes the check (and
failure) for reading into the last partial block (and possibly beyond) for
multiple-section files.
It worked in my testing to prevent reading beyond the end of the section;
my first patch just incremented the sect_size block count for a partial
block and continued doing the check. But there is a comment in the source
code about reading beyond the end of the file to fill a page cache.
Failing to access beyond the section would prevent reading beyond the end
of the file.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] posix-timers: Fix the flags handling in posix_cpu_nsleep()
When a posix_cpu_nsleep() sleep is interrupted by a signal more than twice, it
incorrectly reports the sleep time remaining to the user. This happens
because, due to wrong flags handling, posix_cpu_nsleep() doesn't report back
to the user when it is called from the restart function.
This patch, which applies after previous one, moves the nanosleep() function
from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up the flags handling
appropriately.
Signed-off-by: Toyo Abe <toyoa@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode
The clock_nanosleep() function does not return the time remaining when the
sleep is interrupted by a signal.
This patch creates a new call out, compat_clock_nanosleep_restart(), which
handles returning the remaining time after a sleep is interrupted. This
patch revives clock_nanosleep_restart(). It is now accessed via the new
call out. The compat_clock_nanosleep_restart() is used for compatibility
access.
Since this is implemented in compatibility mode the normal path is
virtually unaffected - no real performance impact.
Signed-off-by: Toyo Abe <toyoa@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Kara [Fri, 29 Sep 2006 09:00:26 +0000 (02:00 -0700)]
[PATCH] dquot: add proper locking when using current->signal->tty
Dquot passes the tty to tty_write_message without locking.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory leaks can happen in the vc_resize() function in drivers/char/vt.c
because of the vc->vc_screenbuf variable overriding in vc_allocate(). The
kmemleak reported trace is as follows:
[PATCH] elf_fdpic_core_dump: don't take tasklist_lock
do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.
Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Akinobu Mita [Fri, 29 Sep 2006 09:00:22 +0000 (02:00 -0700)]
[PATCH] check return value of cpu_callback
Spawning ksoftirqd, migration, or watchdog, and calling init_timers_cpu()
may fail when memory is low. If that happens in initcalls, a kernel NULL
pointer dereference happens later. This patch makes the crash happen
immediately in such cases. That seems a bit better than getting a kernel
NULL pointer dereference later.
Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some code which is unneeded if CONFIG_PM=n.
* Make suspend/resume registration look like the rest of drivers:
#ifdef CONFIG_PM in struct pci_driver, prototypes, actual hooks.
* Drop CS46XX_ACPI_SUPPORT. It logically duplicated CONFIG_PM. It was
hardcoded to 1 approx forever (ALSA merge just moved driver to
sound/oss/).
* After previous point, sound/oss/cs46xxpm-24.h removed as being useless.
* As a side effect, the funkiness of passing off (unused) static inline
functions as suspend/resume hooks is removed too.
[PATCH] fix wrong error code on interrupted close syscalls
The problem is that close() syscalls can call a file system's flush
handler, which in turn might sleep interruptibly and ultimately pass back
an -ERESTARTSYS return value. This happens for files backed by an
interruptible NFS mount under nfs_file_flush() when a large file has just
been written and nfs_wait_bit_interruptible() detects that there is a
signal pending.
I have a test case where the "strace" command is used to attach to a
process sleeping in such a close(). Since the SIGSTOP is forced onto the
victim process (removing it from the thread's "blocked" mask in
force_sig_info()), the RPC wait is interrupted and the close() is
terminated early.
But the file table entry has already been cleared before the flush handler
was called. Thus, when the syscall is restarted, the file descriptor
appears closed and an EBADF error is returned (which is wrong). What's
worse, there is the hypothetical case where another thread of a
multi-threaded application might have reused the file descriptor, in which
case that file would be mistakenly closed.
The bottom line is that close() syscalls are not restartable, and thus
-ERESTARTSYS return values should be mapped to -EINTR. This is consistent
with the close(2) manual page. The fix is below.
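The mapping described above can be sketched as a small helper. The toy_close_retval() function is hypothetical, and ERESTARTSYS is defined locally because it is a kernel-internal code that user-space headers do not provide; the value used is an assumption of this sketch. This is an illustration of the rule, not the patch itself.

```c
#include <assert.h>
#include <errno.h>

/*
 * Kernel-internal restart code. It is not exported to user space, so the
 * value is defined here as an assumption of this sketch.
 */
#define ERESTARTSYS	512

/*
 * Hypothetical helper: translate the flush handler's result into what
 * close() should return. The file table entry is already cleared, so the
 * syscall must not be restarted.
 */
static int toy_close_retval(int flush_retval)
{
	/* Flush handlers return 0 or a negative error code. */
	assert(flush_retval <= 0);

	if (flush_retval == -ERESTARTSYS)
		return -EINTR;

	return flush_retval;
}
```

With this mapping an interrupted flush surfaces as EINTR, and the descriptor is never looked up a second time by a restarted call.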
Signed-off-by: Ernie Petrides <petrides@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul E. McKenney [Fri, 29 Sep 2006 09:00:11 +0000 (02:00 -0700)]
[PATCH] memory ordering in __kfifo primitives
Both __kfifo_put() and __kfifo_get() have header comments stating that if
there is but one concurrent reader and one concurrent writer, locking is not
necessary. This is almost the case, but a couple of memory barriers are
needed. Another option would be to change the header comments to remove the
bit about locking not being needed, and to change those callers who
currently don't use locking to add the required locking. The attachment
analyzes this approach, but the patch below seems simpler.
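To illustrate why ordering matters with one reader and one writer, here is a user-space sketch of a lockless single-producer, single-consumer byte FIFO. It deliberately swaps in C11 acquire/release atomics for kernel memory barriers, moves one byte at a time, and uses hypothetical names (struct toy_fifo, toy_fifo_put(), toy_fifo_get()); it is not the kfifo code.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_FIFO_SIZE	8u

static_assert((TOY_FIFO_SIZE & (TOY_FIFO_SIZE - 1)) == 0,
	      "size must be a power of two");

struct toy_fifo {
	unsigned char buf[TOY_FIFO_SIZE];
	atomic_uint in;		/* advanced only by the producer */
	atomic_uint out;	/* advanced only by the consumer */
};

/* Producer side: returns 1 if the byte was queued, 0 if the FIFO is full. */
static int toy_fifo_put(struct toy_fifo *f, unsigned char c)
{
	unsigned int in = atomic_load_explicit(&f->in, memory_order_relaxed);
	/* Acquire: the consumer must have finished with a slot before reuse. */
	unsigned int out = atomic_load_explicit(&f->out, memory_order_acquire);

	if (in - out == TOY_FIFO_SIZE)
		return 0;

	f->buf[in & (TOY_FIFO_SIZE - 1)] = c;
	/* Release: the data is written before the new index becomes visible. */
	atomic_store_explicit(&f->in, in + 1, memory_order_release);
	return 1;
}

/* Consumer side: returns 1 and stores a byte in *c, or 0 if empty. */
static int toy_fifo_get(struct toy_fifo *f, unsigned char *c)
{
	unsigned int out = atomic_load_explicit(&f->out, memory_order_relaxed);
	/* Acquire: observe the data written before 'in' was advanced. */
	unsigned int in = atomic_load_explicit(&f->in, memory_order_acquire);

	if (in == out)
		return 0;

	*c = f->buf[out & (TOY_FIFO_SIZE - 1)];
	/* Release: the slot is read before it is handed back to the producer. */
	atomic_store_explicit(&f->out, out + 1, memory_order_release);
	return 1;
}
```

Without the ordering on the index updates, the other side could see an advanced index before the corresponding buffer access has completed, which is the gap the header comments overlooked.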
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Stelian Pop <stelian@popies.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Jones [Fri, 29 Sep 2006 09:00:10 +0000 (02:00 -0700)]
[PATCH] lockdep: print kernel version
Let's do the same thing we do for oopses - print out the version in the
report. It's an extra line of output though. We could tack it on the end
of the INFO: lines, but that screws up Ingo's pretty output.
Signed-off-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] atiixp: ATI SB600 IDE support for various modes
Support SB600 SATA legacy IDE (DMA enable).
Signed-off-by: Anatoli Antonovitch <antonovi@ati.com> Cc: Jeff Garzik <jeff@garzik.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The code in __register_chrdev_region checks that if the driver wishing to
register has the same major as an existing driver the new minor range is
strictly less than the existing minor range. However, it does not also
check that the new minor range is strictly greater than the existing minor
range. That is, if driver X has registered with major=x and minor=0-3,
__register_chrdev_region will allow driver Y to register with major=x and
minor=1-4.
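The missing condition amounts to a two-sided interval overlap test. The sketch below uses a hypothetical struct minor_range and helper, not the data structures of __register_chrdev_region; it only shows that both ends have to be compared before two ranges on the same major can be declared disjoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a minor range: first minor plus a count. */
struct minor_range {
	unsigned int first;
	unsigned int count;
};

/* Two ranges on the same major collide if neither lies wholly before the other. */
static bool minor_ranges_overlap(struct minor_range a, struct minor_range b)
{
	unsigned int a_last, b_last;

	assert(a.count > 0 && b.count > 0);

	a_last = a.first + a.count - 1;
	b_last = b.first + b.count - 1;

	return a.first <= b_last && b.first <= a_last;
}
```

For the example in the text, driver X with minors 0-3 is {0, 4} and driver Y with minors 1-4 is {1, 4}; the helper reports an overlap for that pair, while {0, 4} and {4, 4} are disjoint.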
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel testing
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.
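As a sketch of what such a helper looks like when it checks the pid, as this version is said to do: struct toy_task is a hypothetical stand-in for the task structure, and the body is an assumption based on the description above rather than a copy of the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the task structure; only the pid matters here. */
struct toy_task {
	int pid;
};

/* One named predicate instead of open-coded ->pid == 1 tests. */
static inline int is_init(const struct toy_task *tsk)
{
	assert(tsk != NULL);
	return tsk->pid == 1;
}
```

Call sites then read `if (is_init(p))`, so the test can be redefined in one place once pid 1 is no longer unique across pid spaces.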
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: <lxc-devel@lists.sourceforge.net> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Biederman [Fri, 29 Sep 2006 09:00:06 +0000 (02:00 -0700)]
[PATCH] Fix conflict with the is_init identifier on parisc
This appears to be the only usage of is_init in the kernel besides the
usage in sched.h. On ia64 the same function is called in_init. So to
remove the conflict and make the kernel more consistent, rename is_init,
is_core, is_local and is_local_section to in_init, in_core, in_local and
in_local_section respectively.
Thanks to Adrian Bunk who spotted this, and to Matthew Wilcox
who suggested this fix.
Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixed a race on put_files_struct on exec with proc. Restoring files on
current on the error path may lead to proc having a pointer to an already
kfree-d files_struct.
The ->files changes in exit.c and kthread.c are safe, as exit_files() does
everything under the lock.
Found during OpenVZ stress testing.
[akpm@osdl.org: add export] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Fri, 29 Sep 2006 09:00:03 +0000 (02:00 -0700)]
[PATCH] tty locking on resize
The current kernel serializes console resizes but does not serialize the
resize against the tty structure updates. This means that while two
parallel resizes cannot mess up the console you can get incorrect results
reported.
Secondly while doing this I added vc_lock_resize() to lock and resize the
console. This leaves all knowledge of the console_sem in the vt/console
driver and kicks it out of the tty layer, which is good.
Thirdly while doing this I decided I couldn't stand "disallocate" any
longer so I switched it to "deallocate".
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Mason [Fri, 29 Sep 2006 09:00:03 +0000 (02:00 -0700)]
[PATCH] add -o flush for fat
Fat is commonly used on removable media. Mounting with -o flush tells the
FS to write things to disk as quickly as possible. It is like -o sync, but
much faster (and not as safe).
Signed-off-by: Chris Mason <mason@suse.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[assuming BSD security levels are deleted]
The only user of i_security, f_security, s_security fields is SELinux,
however, quite a few security modules are trying to get into kernel.
So, wrap them under CONFIG_SECURITY. Adding config option for each
security field is likely an overkill.
Following Stephen Smalley's suggestion, i_security initialization is
moved to security_inode_alloc() to not clutter core code with ifdefs
and make alloc_inode() codepath tiny little bit smaller and faster.
The user of (highly greppable) struct fown_struct::security field is
still to be found. I've checked every "fown_struct" and every "f_owner"
occurrence. Additionally, its removal doesn't break the i386 allmodconfig
build.
struct inode, struct file, struct super_block, struct fown_struct
become smaller.
P.S. Combined with two reiserfs inode shrinking patches sent to
linux-fsdevel, I can finally suck 12 reiserfs inodes into one page.
All suppliers of ->quota_read, ->quota_write (I've found ext2, ext3, UFS,
reiserfs) already have them properly ifdeffed. All callers of
->quota_read, ->quota_write are under CONFIG_QUOTA umbrella, so...
Chris Mason [Fri, 29 Sep 2006 08:59:56 +0000 (01:59 -0700)]
[PATCH] Fix reiserfs latencies caused by data=ordered
ReiserFS does periodic cleanup of old transactions in order to limit the
length of time a journal replay may take after a crash. Sometimes, writing
metadata from an old (already committed) transaction may require committing
a newer transaction, which also requires writing all data=ordered buffers.
This can cause very long stalls on journal_begin.
This patch makes sure new transactions will not need to be committed before
trying a periodic reclaim of an old transaction. It is low risk because if
a bad decision is made, it just means a slightly longer journal replay
after a crash.
Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adam Tlalka [Fri, 29 Sep 2006 08:59:53 +0000 (01:59 -0700)]
[PATCH] console utf-8 mode fixes
Fix utf-8 mode so alternate charset modes always work according to control
sequences interpreted in do_con_trol function preserving backward US-ASCII
and VT100 semigraphics compatibility.
Malformed utf-8 sequences are represented as sequences of replacement
glyphs, original codes or '?' as a last resort.
unicode-xterm, gnome-terminal, kconsole and other terminal emulators in
utf-8 mode respect acsc, enacs, rmacs sequences. Also I found that some
important system programs (from the Debian distro) use acsc in utf-8 mode -
dselect, aptitude, w3m for example.
Signed-off-by: Adam Tlalka <atlka@pg.gda.pl> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ucb1x00-ts: handle errors from input_register_device()
ucb1x00-ts: handle errors from input_register_device()
Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Jones [Fri, 29 Sep 2006 08:59:51 +0000 (01:59 -0700)]
[PATCH] single bit flip detector
In cases where we detect a single bit has been flipped, we spew the usual
slab corruption message, which users instantly think is a kernel bug. In a
lot of cases, single bit errors are down to bad memory, or other hardware
failure.
This patch adds an extra line to the slab debug messages in those cases, in
the hope that users will try memtest before they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
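The detection rule can be sketched in user space like this. The function name and signature are hypothetical and the logic is an assumption based on the description above: a line qualifies when exactly one byte differs from the poison value and that byte differs in exactly one bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if exactly one byte of the line differs from the poison
 * value and the difference is a single flipped bit.
 */
static bool line_has_single_bit_error(const unsigned char *line, size_t len,
				      unsigned char poison)
{
	size_t bad_count = 0;
	unsigned char error = 0;

	assert(line != NULL);

	for (size_t i = 0; i < len; i++) {
		if (line[i] != poison) {
			error = line[i] ^ poison;	/* bits that differ */
			bad_count++;
		}
	}

	/* One bad byte whose XOR with the poison is a power of two. */
	return bad_count == 1 && (error & (error - 1)) == 0;
}
```

In the dump shown above, the fifth byte is 6a where 6b is expected; 6a XOR 6b is 01, a single bit, so that line qualifies for the extra message.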
[akpm@osdl.org: cleanups] Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The comment and the following "while" look _very_ wrong. Place the { correctly
to hint to the unsuspecting that it's actually the end of the loop.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This code has suffered from broken core design and lack of developer
attention. Broken security modules are too dangerous to leave around. It
is time to remove this one.
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Davi Arnaut <davi.arnaut@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: James Morris <jmorris@namei.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Use valid_dma_direction() in include/asm-i386/dma-mapping.h
Now that the generic DMA code has a function to decide if a given DMA
mapping is valid, use it. This will catch cases where direction is not any
of the defined enum values but some random number outside the valid range.
The current implementation will only catch the defined but invalid case
DMA_NONE.
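The difference between the two checks can be sketched as follows. The enum values and the old_direction_check() helper are assumptions made for illustration; valid_dma_direction() is written here as a plain whitelist of the three usable directions, which matches the behaviour described above but is not copied from the kernel headers.

```c
#include <assert.h>

/* Illustrative direction values; the numbering is an assumption of this sketch. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Old style: only the defined-but-invalid value is rejected. */
static int old_direction_check(int direction)
{
	return direction != DMA_NONE;
}

/* New style: only the three usable directions are accepted. */
static int valid_dma_direction(int direction)
{
	return direction == DMA_BIDIRECTIONAL ||
	       direction == DMA_TO_DEVICE ||
	       direction == DMA_FROM_DEVICE;
}

/* Usage: 42 is not a defined direction; only the new check rejects it. */
static void dma_direction_demo(void)
{
	assert(old_direction_check(42));
	assert(!valid_dma_direction(42));
	assert(!old_direction_check(DMA_NONE));
	assert(!valid_dma_direction(DMA_NONE));
}
```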
Alan Cox [Fri, 29 Sep 2006 08:59:47 +0000 (01:59 -0700)]
[PATCH] There is no devfs, there has never been a devfs, we have always been at war with...
Jon Smirl noted a couple of tty driver functions now are quite misleadingly
named with the death of devfs. A quick grep found another case in the lp
driver.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I was looking for a way around an OOM-problem, and found a couple of
undocumented new features for tuning the OOM-score of individual processes.
Here's a small documentation patch for /proc/<pid>/oom_adj and
/proc/<pid>/oom_score.
Steven Rostedt [Fri, 29 Sep 2006 08:59:44 +0000 (01:59 -0700)]
[PATCH] clean up and remove some extra spinlocks from rtmutex
Oleg brought up some interesting points about grabbing the pi_lock for some
protections. In this discussion, I realized that there are some places
that the pi_lock is being grabbed when it really wasn't necessary. Also
this patch does a little bit of clean up.
This patch basically does three things:
1) renames the "boost" variable to "chain_walk", since it is also used in
the debugging case when nothing is going to be boosted. The new name better
describes what the test is going to do if it succeeds.
2) moves get_task_struct to just before the unlocking of the wait_lock.
This removes duplicate code, and makes it a little easier to read. The
owner won't go away while either the pi_lock or the wait_lock is held.
3) removes the pi_locking and owner blocked checking completely from the
debugging case. This is because grabbing the lock and doing the
check, then releasing the lock is just so full of races. It's just as
good to go ahead and call the pi_chain_walk function, since after
releasing the lock the owner can then block anyway, and we would have
missed that. For the debug case, we really do want to do the chain walk
to test for deadlocks anyway.
[oleg@tv-sign.ru: more of the same] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Esben Nielsen <nielsen.esben@googlemail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Beulich [Fri, 29 Sep 2006 08:59:42 +0000 (01:59 -0700)]
[PATCH] fix Intel RNG detection
Previously, since determination whether there was an Intel random number
generator was based on a single bit, on systems with a matching bridge
device but without a firmware hub, there was a 50% chance that the code
would incorrectly decide that the system had an RNG. This patch adds
detection of the firmware hub to better qualify the existence of an RNG.
There is one issue with the patch: I was unable to determine the LPC
equivalent for the PCI bridge 8086:2430 (the old code didn't care which
of the many devices provided by the ICH/ESB it used, so it chose the PCI
bridge device, but the FWH settings live in the LPC device, so the
device list needed to be changed).
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Sandeen [Fri, 29 Sep 2006 08:59:41 +0000 (01:59 -0700)]
[PATCH] mount udf UDF_PART_FLAG_READ_ONLY partitions with MS_RDONLY
There's a bug where a UDF_PART_FLAG_READ_ONLY udf partition gets mounted
read-write, and problems follow: files appear to be removable, but file
creation results in EIO or, worse, an oops.
EIO is coming from udf_new_block(), which returns EIO if the right flags
aren't set; only UDF_PART_FLAG_READ_ONLY is set in this case. We probably
should not have gotten this far...
Attached patch seems to fix it - and includes a printk to alert the user
that their "rw" mount request has been converted to "ro."
Here's the testcase I used:
[root@magnesium ~]# mkisofs -R -J -udf -o testiso /tmp/
...
Total translation table size: 0
Total rockridge attributes bytes: 342923
Total directory bytes: 382312
Path table size(bytes): 104
Max brk space used 103000
105059 extents written (205 MB)
[root@magnesium ~]# mount -o loop testiso /mnt/test/
[root@magnesium ~]# ls /mnt/test/fsfile
/mnt/test/fsfile
[root@magnesium ~]# rm /mnt/test/fsfile
[root@magnesium ~]# ls /mnt/test/fsfile
ls: /mnt/test/fsfile: No such file or directory
[root@magnesium ~]# touch /mnt/test/fsfile
touch: cannot touch `/mnt/test/fsfile': Input/output error
[root@magnesium tmp]# grep udf /proc/mounts
/dev/loop1 /mnt/test udf rw 0 0
Force readonly mounts of UDF partitions marked as read-only.
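The rw-to-ro conversion described above can be modelled in user space
roughly as follows. This is only a sketch of the idea, not the patch itself:
sketch_fixup_mount_flags() is a hypothetical name, and the SKETCH_* constants
are placeholder values standing in for MS_RDONLY and
UDF_PART_FLAG_READ_ONLY.

```c
#include <stdio.h>

/* Placeholder bit values for illustration; the real MS_RDONLY and
 * UDF_PART_FLAG_READ_ONLY definitions live in kernel headers. */
#define SKETCH_MS_RDONLY           0x01u
#define SKETCH_PART_FLAG_READ_ONLY 0x10u

/* If the partition is flagged read-only but the mount asked for rw,
 * warn and force the read-only mount flag. Other cases pass through. */
unsigned int sketch_fixup_mount_flags(unsigned int mount_flags,
                                      unsigned int part_flags)
{
    if ((part_flags & SKETCH_PART_FLAG_READ_ONLY) &&
        !(mount_flags & SKETCH_MS_RDONLY)) {
        fprintf(stderr, "sketch: read-only partition, forcing ro mount\n");
        mount_flags |= SKETCH_MS_RDONLY;
    }
    return mount_flags;
}
```

With this check in place, the "udf rw" line in the /proc/mounts output of the
testcase would be expected to read "ro" instead.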
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Olaf Hering [Fri, 29 Sep 2006 08:59:39 +0000 (01:59 -0700)]
[PATCH] ignore partition table on disks with AIX label
The on-disk data structures from AIX are not known, and neither is the
filesystem layout. There is a msdos partition signature at the end of
the first block, and the kernel recognizes 3 small (and overlapping)
partitions. But they are not usable. Maybe the firmware uses it to find
the bootloader for AIX, but AIX also boots if the first block is cleared.
This also fixes YaST, which compares the output from parted (and formerly
fdisk) with /proc/partitions. fdisk has recognized the AIX label for a
long time; SuSE has a patch for parted to handle the disk label as unknown.
dmesg will look like this:
sda: [AIX] unknown partition table
Tested on an IBM B50 with AIX V4.3.3.
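A rough user-space sketch of the check, for illustration. It assumes the AIX
label is identified by the four bytes 0xC9 0xC2 0xD4 0xC1 ("IBMA" in EBCDIC)
at the start of the first block; that magic value is not stated in this
message, and the sketch_* function names are hypothetical.

```c
#include <stddef.h>

/* Assumed AIX label magic at offset 0 of the first block. */
static const unsigned char sketch_aix_magic[4] = { 0xC9, 0xC2, 0xD4, 0xC1 };

/* Return 1 if the block starts with the assumed AIX label magic. */
int sketch_has_aix_label(const unsigned char *block, size_t len)
{
    size_t i;

    if (len < sizeof sketch_aix_magic)
        return 0;
    for (i = 0; i < sizeof sketch_aix_magic; i++)
        if (block[i] != sketch_aix_magic[i])
            return 0;
    return 1;
}

/* Return 1 if the block ends with the msdos signature 0x55 0xAA. */
int sketch_has_msdos_signature(const unsigned char *block, size_t len)
{
    return len >= 512 && block[510] == 0x55 && block[511] == 0xAA;
}

/* Return 1 if the msdos table should be ignored: the signature is
 * present, but the disk carries an AIX label. */
int sketch_should_ignore_table(const unsigned char *block, size_t len)
{
    return sketch_has_msdos_signature(block, len) &&
           sketch_has_aix_label(block, len);
}
```

The AIX check has to run before the msdos entries are parsed, since the
signature alone is not enough to trust the table on these disks.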
Signed-off-by: Olaf Hering <olh@suse.de> Cc: Albert Cahalan <acahalan@gmail.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] DMI: Decode and save OEM String information
This teaches dmi_decode() how to decode and save OEM Strings (type 11) DMI
information, which is currently discarded silently. Existing code using
DMI is not affected. Follows the "System Management BIOS (SMBIOS)
Specification" (http://www.dmtf.org/standards/smbios), and also the
userspace dmidecode.c code.
OEM Strings are the only safe way to identify some hardware, e.g., the
ThinkPad embedded controller used by the soon-to-be-submitted tp_smapi
driver. This will also let us eliminate the long whitelist in the mainline
hdaps driver (in a future patch).
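As an illustration of what decoding a type 11 record involves, here is a
small user-space sketch. It assumes the usual SMBIOS layout (type byte,
length byte, two-byte handle, then a string count at offset 4, followed by
NUL-terminated strings that end with an extra NUL). The sketch_* names are
hypothetical and this is not the kernel's dmi_decode() code.

```c
#include <stddef.h>

#define SKETCH_DMI_TYPE_OEM_STRINGS 11

/* Return the string count of an OEM Strings record, or 0 if the
 * record is too short or is not type 11. */
unsigned int sketch_oem_string_count(const unsigned char *rec, size_t len)
{
    if (len < 5 || rec[0] != SKETCH_DMI_TYPE_OEM_STRINGS || rec[1] < 5)
        return 0;
    return rec[4];
}

/* Return the idx-th (1-based) string that follows the formatted area,
 * or NULL if it does not exist or is not NUL-terminated within len. */
const char *sketch_dmi_string(const unsigned char *rec, size_t len,
                              unsigned int idx)
{
    size_t pos, end;

    if (len < 2 || idx == 0)
        return NULL;
    pos = rec[1];                     /* strings start after the formatted area */
    while (pos < len && rec[pos] != '\0') {
        end = pos;
        while (end < len && rec[end] != '\0')
            end++;
        if (end == len)
            return NULL;              /* unterminated string */
        if (--idx == 0)
            return (const char *)&rec[pos];
        pos = end + 1;
    }
    return NULL;                      /* hit the terminating empty string */
}
```

A driver that needs to identify hardware would walk strings 1 through the
count and compare each against the vendor string it expects.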
[PATCH] timer: add lock annotation to lock_timer_base
lock_timer_base acquires a lock and returns with that lock held. Add a
lock annotation to this function so that sparse can check callers for lock
pairing, and so that sparse will not complain about this function since it
intentionally uses the lock in this manner.
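A minimal sketch of what such an annotation looks like, written as a
stand-alone user-space example. The __acquires/__releases macros mirror the
kernel's sparse annotations and expand to nothing unless sparse defines
__CHECKER__; the sketch_* types and the integer "lock" are stand-ins, not
the real timer code.

```c
/* Sparse context annotations; no-ops for a normal compiler. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Toy stand-ins: an int flag plays the role of the spinlock. */
struct sketch_base  { int lock; };
struct sketch_timer { struct sketch_base *base; };

/* Takes the base lock and returns with it held, so the function is
 * annotated as acquiring timer->base->lock. */
struct sketch_base *sketch_lock_timer_base(struct sketch_timer *timer)
    __acquires(timer->base->lock)
{
    struct sketch_base *base = timer->base;

    base->lock = 1;
    return base;
}

/* The matching release, annotated so sparse can pair it with the lock. */
void sketch_unlock_timer_base(struct sketch_base *base)
    __releases(base->lock)
{
    base->lock = 0;
}
```

With the annotation, a caller that returns without the matching unlock is
what sparse would be expected to flag as a context imbalance.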