pilppa.com Git - linux-2.6-omap-h63xx.git/log
linux-2.6-omap-h63xx.git
16 years ago ext3: return -EIO not -ESTALE on directory traversal through deleted inode
Bryan Donlan [Thu, 2 Apr 2009 23:57:15 +0000 (16:57 -0700)]
ext3: return -EIO not -ESTALE on directory traversal through deleted inode

ext3_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly.  However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode.  This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.

The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.

This patch thus changes ext3_lookup to return -EIO if it receives -ESTALE
from ext3_iget(), as ext3 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.
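
The error translation described above can be sketched in user-space C
roughly as follows.  This is only an illustration: fake_iget() and
lookup_via_dirent() are hypothetical stand-ins, not the kernel functions,
and the sketch leaves out the ext*_error reporting that the patch also
adds.

```c
#include <errno.h>

/* Hypothetical stand-in for ext3_iget(): returns 0 on success or a
 * negative errno.  -ESTALE means the inode has been deleted. */
static int fake_iget(int inode_is_deleted)
{
	return inode_is_deleted ? -ESTALE : 0;
}

/* Hypothetical lookup path.  A directory entry that points at a deleted
 * inode is on-disk corruption, so report it as -EIO rather than letting
 * the NFS-oriented -ESTALE reach userspace. */
static int lookup_via_dirent(int inode_is_deleted)
{
	int err = fake_iget(inode_is_deleted);

	if (err == -ESTALE)
		return -EIO;
	return err;
}
```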

Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago ext3: use unsigned instead of int for type of blocksize in fs/ext3/namei.c
Wei Yongjun [Thu, 2 Apr 2009 23:57:14 +0000 (16:57 -0700)]
ext3: use unsigned instead of int for type of blocksize in fs/ext3/namei.c

Use unsigned instead of int for the parameter which carries a blocksize.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago jbd: fix oops in jbd_journal_init_inode() on corrupted fs
Jan Kara [Thu, 2 Apr 2009 23:57:13 +0000 (16:57 -0700)]
jbd: fix oops in jbd_journal_init_inode() on corrupted fs

On a 32-bit system with CONFIG_LBD, getblk() can fail because the provided
block number is too big.  Make JBD handle that gracefully.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <dmaciejak@fortinet.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago ext3: remove the BKL in ext3/ioctl.c
Cyrus Massoumi [Thu, 2 Apr 2009 23:57:12 +0000 (16:57 -0700)]
ext3: remove the BKL in ext3/ioctl.c

Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove
the BKL around ext3_ioctl().

Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago pnpbios: propagate kthread_run() error
Erik Ekman [Thu, 2 Apr 2009 23:57:09 +0000 (16:57 -0700)]
pnpbios: propagate kthread_run() error

- Error code from kthread_run() is now returned in pnpbios_thread_init()

- Remove variable which always was 0.

Signed-off-by: Erik Ekman <erik@kryo.se>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago pnpbios: fix warning if CONFIG_HOTPLUG=n
Erik Ekman [Thu, 2 Apr 2009 23:57:08 +0000 (16:57 -0700)]
pnpbios: fix warning if CONFIG_HOTPLUG=n

drivers/pnp/pnpbios/core.c: In function 'pnpbios_thread_init':
drivers/pnp/pnpbios/core.c:578: warning: unused variable 'task'

Signed-off-by: Erik Ekman <erik@kryo.se>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago spi-gpio: allow operation without CS signal
Michael Buesch [Thu, 2 Apr 2009 23:57:07 +0000 (16:57 -0700)]
spi-gpio: allow operation without CS signal

Change spi-gpio so that it is possible to drive SPI communications over
GPIO without the need for a chipselect signal.

This is useful in very small setups where there's only one slave device
on the bus.

This patch does not affect existing setups.

I use this for a tiny communication channel between an embedded device and
a microcontroller.  There are not enough GPIOs available for chipselect
and it's not needed anyway in this case.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago gpio: gpio_{request,free}() now required (feature removal)
David Brownell [Thu, 2 Apr 2009 23:57:06 +0000 (16:57 -0700)]
gpio: gpio_{request,free}() now required (feature removal)

We want to phase out the GPIO "autorequest" mechanism in gpiolib and
require all callers to use gpio_request().

 - Update feature-removal-schedule
 - Update the documentation now
 - Convert the relevant pr_warning() in gpiolib to a WARN()
   so folk using this mechanism get a noisy stack dump

Some drivers and board init code will probably need to change.
Implementations not using gpiolib will still be fine; they are already
required to implement gpio_{request,free}() stubs.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago gpiolib: allow GPIOs to be named
Daniel Silverstone [Thu, 2 Apr 2009 23:57:05 +0000 (16:57 -0700)]
gpiolib: allow GPIOs to be named

Allow GPIOs in GPIOLIB chips to be named.  This name is then used when the
GPIO is exported to sysfs, although it could be used elsewhere if deemed
useful.

Signed-off-by: Daniel Silverstone <dsilvers@simtec.co.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago rtc: add m41t62 support to rtc-m41t80 driver
Daniel Glockner [Thu, 2 Apr 2009 23:57:03 +0000 (16:57 -0700)]
rtc: add m41t62 support to rtc-m41t80 driver

Compared to the other supported chips, the m41t62 uses a different
register to set the square wave frequency.

Signed-off-by: Daniel Glockner <dg@emlix.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago rtc-v3020: add ability to access v3020 chip with GPIOs
Mike Rapoport [Thu, 2 Apr 2009 23:57:01 +0000 (16:57 -0700)]
rtc-v3020: add ability to access v3020 chip with GPIOs

The v3020 RTC can be connected to GPIOs as well as to a memory-like
interface.  Add the ability to use GPIO bit-banging for v3020 read/write
access.

[akpm@linux-foundation.org: fix off-by-one in error path]
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago initramfs: prevent initramfs printk message being split by messages from other code.
Simon Kitching [Thu, 2 Apr 2009 23:57:00 +0000 (16:57 -0700)]
initramfs: prevent initramfs printk message being split by messages from other code.

initramfs uses printk without a linefeed, then does some work, then uses
printk to finish the message off.  However if some other code does a
printk in between, then the messages get mixed together.  Better for each
message to be an independent line...

Example of problem that this fixes:

    checking if image is initramfs...<7>Switched to high resolution mode on CPU 1
    Switched to high resolution mode on CPU 0
    it is

Signed-off-by: Simon Kitching <skitching@apache.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago Simplify copy_thread()
Alexey Dobriyan [Thu, 2 Apr 2009 23:56:59 +0000 (16:56 -0700)]
Simplify copy_thread()

First argument unused since 2.3.11.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago memory_accessor: implement the new memory_accessor interfaces for SPI EEPROMs
David Brownell [Thu, 2 Apr 2009 23:56:58 +0000 (16:56 -0700)]
memory_accessor: implement the new memory_accessor interfaces for SPI EEPROMs

 - Define new setup() hook to export the accessor
 - Implement accessor methods

Moves some error checking out of the sysfs interface code into the layer
below it, which is now shared by both sysfs and memory access code.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago memory_accessor: implement the new memory_accessor interface for I2C EEPROM
Kevin Hilman [Thu, 2 Apr 2009 23:56:57 +0000 (16:56 -0700)]
memory_accessor: implement the new memory_accessor interface for I2C EEPROM

In the case of at24, the platform code registers a 'setup' callback with
the at24_platform_data.  When the at24 driver detects an EEPROM, it fills
out the read and write functions of the memory_accessor and calls the
setup callback passing the memory_accessor struct.  The platform code can
then use the read/write functions in the memory_accessor struct for
reading and writing the EEPROM.
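
The handshake described above might look roughly like this in a
self-contained user-space sketch.  All names here (struct accessor,
fake_probe(), board_setup() and the 16-byte fake_eeprom) are hypothetical
and simplified; the real setup callback and accessor signatures in the
kernel may differ.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Simplified accessor: a pair of read/write methods. */
struct accessor {
	ssize_t (*read)(struct accessor *a, char *buf, off_t off, size_t count);
	ssize_t (*write)(struct accessor *a, const char *buf, off_t off, size_t count);
};

/* "Driver" side: a fake 16-byte EEPROM held in memory. */
static char fake_eeprom[16];

/* Clamp a request to the device size; returns -1 for a bad offset. */
static ssize_t clamp_count(off_t off, size_t count)
{
	if (off < 0 || (size_t)off > sizeof(fake_eeprom))
		return -1;
	if (count > sizeof(fake_eeprom) - (size_t)off)
		count = sizeof(fake_eeprom) - (size_t)off;
	return (ssize_t)count;
}

static ssize_t fake_read(struct accessor *a, char *buf, off_t off, size_t count)
{
	ssize_t n = clamp_count(off, count);

	(void)a;
	if (n > 0)
		memcpy(buf, fake_eeprom + off, (size_t)n);
	return n;
}

static ssize_t fake_write(struct accessor *a, const char *buf, off_t off, size_t count)
{
	ssize_t n = clamp_count(off, count);

	(void)a;
	if (n > 0)
		memcpy(fake_eeprom + off, buf, (size_t)n);
	return n;
}

static struct accessor fake_accessor = { fake_read, fake_write };

/* "Driver" probe: once the device is found, hand the filled-in accessor
 * to whatever setup callback the platform code registered. */
static void fake_probe(void (*setup)(struct accessor *))
{
	if (setup)
		setup(&fake_accessor);
}

/* "Platform" side: remember the accessor for later reads and writes. */
static struct accessor *board_eeprom;

static void board_setup(struct accessor *a)
{
	board_eeprom = a;
}
```

After fake_probe(board_setup) has run, the platform side can call
board_eeprom->read() and board_eeprom->write() without knowing which bus
the EEPROM sits on.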

Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago memory_accessor: new interface for reading/writing persistent memory
Kevin Hilman [Thu, 2 Apr 2009 23:56:56 +0000 (16:56 -0700)]
memory_accessor: new interface for reading/writing persistent memory

Add an interface by which other kernel code can read/write persistent
memory such as I2C or SPI EEPROMs, or devices which provide NVRAM.  Use
cases include storage of board-specific configuration data like Ethernet
addresses and sensor calibrations.

Original idea, review and improvement suggestions by David Brownell.

Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago workqueue: add to_delayed_work() helper function
Jean Delvare [Thu, 2 Apr 2009 23:56:54 +0000 (16:56 -0700)]
workqueue: add to_delayed_work() helper function

It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in.  In particular, all
delayed works which want to rearm themselves will have to do that.  So it
would seem fair to offer a helper function for this operation.
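
As an illustration of the operation, here is a user-space sketch of
recovering the containing structure from a pointer to its embedded
member.  The two structures are simplified stand-ins (their fields are
made up for the example), and to_delayed_work_sketch() is a hypothetical
name, not the helper the patch adds.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct work_struct {
	int pending;
};

struct delayed_work {
	long expires;			/* placed first so the offset is non-zero */
	struct work_struct work;	/* the embedded work item */
};

/* Step back from the embedded member to the start of its container. */
static struct delayed_work *to_delayed_work_sketch(struct work_struct *work)
{
	return (struct delayed_work *)
		((char *)work - offsetof(struct delayed_work, work));
}
```

A work handler that only receives the struct work_struct pointer can use
such a helper to reach the delayed work and rearm it.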

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago uml: fix warnings in kernel_execve
Miklos Szeredi [Thu, 2 Apr 2009 23:56:53 +0000 (16:56 -0700)]
uml: fix warnings in kernel_execve

Fix the following warnings:

arch/um/kernel/syscall.c: In function 'kernel_execve':
arch/um/kernel/syscall.c:130: warning: passing argument 1 of 'um_execve' discards qualifiers from pointer target type
arch/um/kernel/syscall.c:130: warning: passing argument 2 of 'um_execve' discards qualifiers from pointer target type
arch/um/kernel/syscall.c:130: warning: passing argument 3 of 'um_execve' discards qualifiers from pointer target type

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago uml: fix link error from prefixing of i386 syscalls with ptregs_
Miklos Szeredi [Thu, 2 Apr 2009 23:56:51 +0000 (16:56 -0700)]
uml: fix link error from prefixing of i386 syscalls with ptregs_

Fix the following link error:

arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x11c): undefined reference to `ptregs_fork'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x140): undefined reference to `ptregs_execve'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2cc): undefined reference to `ptregs_iopl'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2d8): undefined reference to `ptregs_vm86old'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2f0): undefined reference to `ptregs_sigreturn'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2f4): undefined reference to `ptregs_clone'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3ac): undefined reference to `ptregs_vm86'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3c8): undefined reference to `ptregs_rt_sigreturn'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3fc): undefined reference to `ptregs_sigaltstack'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x40c): undefined reference to `ptregs_vfork'

This was introduced by commit 253f29a4, "x86: pass in pt_regs pointer
for syscalls that need it"

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago uml: fix compile error from net_device_ops conversion
Miklos Szeredi [Thu, 2 Apr 2009 23:56:49 +0000 (16:56 -0700)]
uml: fix compile error from net_device_ops conversion

Fix the following compile error:

arch/um/drivers/net_kern.c: In function 'uml_inetaddr_event':
arch/um/drivers/net_kern.c:760: error: 'struct net_device' has no member named 'open'

This was introduced by commit 8bb95b39, "uml: convert network device
to netdevice ops".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago floppy: provide a PNP device table in the module.
Scott James Remnant [Thu, 2 Apr 2009 23:56:47 +0000 (16:56 -0700)]
floppy: provide a PNP device table in the module.

The missing device table means that the floppy module is not auto-loaded,
even when the appropriate PNP device (0700) is found.

We don't actually use the table in the module, since the device doesn't
have a struct pnp_driver, but it's sufficient to cause an alias in the
module that udev/modprobe will use.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Philippe De Muyter <phdm@macqel.be>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago vfs: check bh->b_blocknr only if BH_Mapped is set
Nikanth Karthikesan [Thu, 2 Apr 2009 23:56:46 +0000 (16:56 -0700)]
vfs: check bh->b_blocknr only if BH_Mapped is set

Check bh->b_blocknr only if BH_Mapped is set.

akpm: I doubt if b_blocknr is ever uninitialised here, but it could
conceivably cause a problem if we're doing a lookup for block zero.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago mm: define a UNIQUE value for AS_UNEVICTABLE flag
Lee Schermerhorn [Thu, 2 Apr 2009 23:56:45 +0000 (16:56 -0700)]
mm: define a UNIQUE value for AS_UNEVICTABLE flag

A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next
available AS flag while the Unevictable LRU was under development.  The
Unevictable LRU was using the same flag and "no one" noticed.  Current
mainline, since 2.6.28, has the same value for two symbolic flag names.

So, define a unique flag value for AS_UNEVICTABLE--up close to the other
flags, [at the cost of an additional #ifdef] so we'll notice next time.
Note that #ifdef is not actually required, if we don't mind having the
unused flag value defined.

Replace #defines with an enum.
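
A minimal sketch of the enum approach, assuming a hypothetical base bit
FLAG_BASE and showing only the two flags named above; the real header
defines other flags and derives its base value differently.

```c
/* Hypothetical first free bit; the kernel derives this from other flags. */
#define FLAG_BASE 20

/* Listing the flags side by side as consecutive enumerators makes a
 * duplicated value easy to spot in review. */
enum mapping_flags_sketch {
	AS_MM_ALL_LOCKS = FLAG_BASE + 0,
	AS_UNEVICTABLE  = FLAG_BASE + 1,
};
```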

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago add fiemap.h to header-y
Eric Sandeen [Thu, 2 Apr 2009 23:56:44 +0000 (16:56 -0700)]
add fiemap.h to header-y

Include fiemap.h in header-y; it defines the interface for the
FS_IOC_FIEMAP file mapping ioctl.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago MAINTAINERS: add hvc_console
Michael Ellerman [Thu, 2 Apr 2009 23:56:43 +0000 (16:56 -0700)]
MAINTAINERS: add hvc_console

Add a MAINTAINERS entry for the hypervisor virtual console driver.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago mm: do_xip_mapping_read: fix length calculation
Martin Schwidefsky [Thu, 2 Apr 2009 23:56:42 +0000 (16:56 -0700)]
mm: do_xip_mapping_read: fix length calculation

The calculation of the value nr in do_xip_mapping_read is incorrect.  If
the copy requires more than one iteration of the do-while loop, the copied
variable will be non-zero.  The maximum length that may be passed to the
call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied, but the
check only compares against (nr > len).
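
The length problem can be reproduced in a small user-space sketch.  This
is not the kernel function: chunked_copy() and CHUNK are hypothetical,
with CHUNK standing in for the page size and memcpy() for copy_to_user().

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8	/* stand-in for the page size */

/* Copy len bytes starting at byte position pos of src into dst, one
 * CHUNK-sized piece at a time, the way a page-wise read loop would. */
static size_t chunked_copy(char *dst, const char *src, size_t pos, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		size_t offset = (pos + copied) % CHUNK;
		size_t nr = CHUNK - offset;	/* bytes left in this chunk */

		/* The buggy check was the equivalent of
		 *	if (nr > len) nr = len;
		 * which overruns dst on the second and later iterations. */
		if (nr > len - copied)
			nr = len - copied;

		memcpy(dst + copied, src + pos + copied, nr);
		copied += nr;
	}
	return copied;
}
```

With pos = 3 and len = 10, the first pass copies 5 bytes; the second pass
starts with nr = 8, which the buggy check would let through (8 is not
greater than 10), writing 3 bytes past the end of the caller's buffer.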

This bug is the cause for the heap corruption Carsten has been chasing
for so long:

*** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x200000b9b44]
/lib64/libc.so.6(cfree+0x8e)[0x200000bdade]
/bin/bash(free_buffered_stream+0x32)[0x80050e4e]
/bin/bash(close_buffered_stream+0x1c)[0x80050ea4]
/bin/bash(unset_bash_input+0x2a)[0x8001c366]
/bin/bash(make_child+0x1d4)[0x8004115c]
/bin/bash[0x8002fc3c]
/bin/bash(execute_command_internal+0x656)[0x8003048e]
/bin/bash(execute_command+0x5e)[0x80031e1e]
/bin/bash(execute_command_internal+0x79a)[0x800305d2]
/bin/bash(execute_command+0x5e)[0x80031e1e]
/bin/bash(reader_loop+0x270)[0x8001efe0]
/bin/bash(main+0x1328)[0x8001e960]
/lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8]
/bin/bash(clearerr+0x5e)[0x8001c092]

With this bug fix the commit 0e4a9b59282914fe057ab17027f55123964bc2e2
"ext2/xip: refuse to change xip flag during remount with busy inodes" can
be removed again.

Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago random: align rekey_work's timer
Anton Blanchard [Thu, 2 Apr 2009 23:56:39 +0000 (16:56 -0700)]
random: align rekey_work's timer

Align rekey_work. Even though it's infrequent, we may as well line it up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago mm: align vmstat_work's timer
Anton Blanchard [Thu, 2 Apr 2009 23:56:39 +0000 (16:56 -0700)]
mm: align vmstat_work's timer

Even though vmstat_work is marked deferrable, there are still benefits to
aligning it.  For certain applications we want to keep OS jitter as low as
possible and aligning timers and work so they occur together can reduce
their overall impact.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
Jeff Layton [Thu, 2 Apr 2009 23:56:37 +0000 (16:56 -0700)]
writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)

The dirtied_when value on an inode is supposed to represent the first time
that an inode has one of its pages dirtied.  This value is in units of
jiffies.  It's used in several places in the writeback code to determine
when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically.  If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age.  Once the gap
between dirtied_when and the time it is compared against reaches half
the range of the jiffies type, the logic of the time_*() macros inverts
and they return the opposite of what is needed.  On 32-bit architectures
that's just under 25 days (assuming HZ == 1000).

As the least-recently dirtied inode, it'll end up being the first one that
pdflush will try to write out.  sync_sb_inodes does this check:

    /* Was this inode dirtied after sync_sb_inodes was called? */
    if (time_after(inode->dirtied_when, start))
        break;

...but now dirtied_when appears to be in the future.  sync_sb_inodes bails
out without attempting to write any dirty inodes.  When this occurs,
pdflush will stop writing out inodes for this superblock.  Nothing can
unwedge it until jiffies moves out of the problematic window.

This patch fixes this problem by changing the checks against dirtied_when
to also check whether it appears to be in the future.  If it does, then we
consider the value to be far in the past.

This should shrink the problematic window of time to such a small period
(30s) as not to matter.
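
The idea can be sketched in user space with explicit 32-bit arithmetic.
The helpers below imitate the style of the time_*() macros but are
hypothetical names written for this example, and dirtied_after() only
illustrates the extra "appears to be in the future" check; it is not the
code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit tick comparisons: the sign of the wrapped difference decides
 * which value counts as later. */
static bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static bool time_before_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) >= 0;
}

/* True only if dirtied_when is after t AND does not appear to lie in the
 * future relative to now.  A value that looks like the future is treated
 * as far in the past. */
static bool dirtied_after(uint32_t dirtied_when, uint32_t t, uint32_t now)
{
	return time_after32(dirtied_when, t) &&
	       time_before_eq32(dirtied_when, now);
}
```

For an inode dirtied more than 2^31 ticks ago, time_after32() alone
reports that it was dirtied after the start of the sync, while
dirtied_after() reports that it was not.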

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago __tty_open(): use the correct type for saved_flags
Andrew Morton [Thu, 2 Apr 2009 23:56:36 +0000 (16:56 -0700)]
__tty_open(): use the correct type for saved_flags

filp->f_flags is unsigned, so use that type for the local copy.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago vfs: skip I_CLEAR state inodes
Wu Fengguang [Thu, 2 Apr 2009 23:56:34 +0000 (16:56 -0700)]
vfs: skip I_CLEAR state inodes

clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock.  So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.

So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().
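
A minimal sketch of why the extra flag matters.  The flag values and the
two helper names are made up for the illustration; only the flag names
come from the text above.

```c
#include <stdbool.h>

/* Illustrative bit values, not the kernel's. */
enum {
	I_WILL_FREE = 1 << 0,
	I_FREEING   = 1 << 1,
	I_CLEAR     = 1 << 2,
};

/* Old-style test: misses an inode that clear_inode() has already
 * switched from I_FREEING to I_CLEAR. */
static bool skip_inode_old(unsigned int state)
{
	return (state & (I_FREEING | I_WILL_FREE)) != 0;
}

/* Test with I_CLEAR added: also skips inodes being disposed of. */
static bool skip_inode_new(unsigned int state)
{
	return (state & (I_FREEING | I_CLEAR | I_WILL_FREE)) != 0;
}
```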

Masayoshi MIZUMA discovered the bug in drop_pagecache_sb(), and Jan Kara
pointed out that the other two cases need fixing as well.

Masayoshi MIZUMA has a nice panic flow:

=====================================================================
            [process A]               |        [process B]
 |                                    |
 |    prune_icache()                  | drop_pagecache()
 |      spin_lock(&inode_lock)        |   drop_pagecache_sb()
 |      inode->i_state |= I_FREEING;  |       |
 |      spin_unlock(&inode_lock)      |       V
 |          |                         |     spin_lock(&inode_lock)
 |          V                         |         |
 |      dispose_list()                |         |
 |        list_del()                  |         |
 |        clear_inode()               |         |
 |          inode->i_state = I_CLEAR  |         |
 |            |                       |         V
 |            |                       |      if (inode->i_state & (I_FREEING|I_WILL_FREE))
 |            |                       |              continue;           <==== NOT MATCH
 |            |                       |
 |            |                       | (DANGER from here on! Accessing disposing inode!)
 |            |                       |
 |            |                       |      __iget()
 |            |                       |        list_move() <===== PANIC on poisoned list !!
 V            V                       |
(time)
=====================================================================

Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago nommu: fix a number of issues with the per-MM VMA patch
David Howells [Thu, 2 Apr 2009 23:56:32 +0000 (16:56 -0700)]
nommu: fix a number of issues with the per-MM VMA patch

Fix a number of issues with the per-MM VMA patch:

 (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
     a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
     system.

 (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
     lest it overflow.

 (3) Move the allocation of the vm_area_struct slab back to fork.c.

 (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.

 (5) Use BUG_ON() rather than if () BUG().

 (6) Make the default validate_nommu_regions() a static inline rather than a
     #define.

 (7) Make free_page_series()'s objection to pages with a refcount != 1 more
     informative.

 (8) Adjust the __put_nommu_region() banner comment to indicate that the
     semaphore must be held for writing.

 (9) Limit the number of warnings about munmaps of non-mmapped regions.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago fb: nvidiafb recognizes geforcego 7300 chip as mobile
Sergey Senozhatsky [Thu, 2 Apr 2009 23:56:30 +0000 (16:56 -0700)]
fb: nvidiafb recognizes geforcego 7300 chip as mobile

nvidiafb recognizes geforcego 7300 chip as mobile

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago generic debug pagealloc: build fix
Akinobu Mita [Thu, 2 Apr 2009 23:56:30 +0000 (16:56 -0700)]
generic debug pagealloc: build fix

This fixes a build failure with generic debug pagealloc:

  mm/debug-pagealloc.c: In function 'set_page_poison':
  mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags'
  mm/debug-pagealloc.c: In function 'clear_page_poison':
  mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags'
  mm/debug-pagealloc.c: In function 'page_poison':
  mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags'
  mm/debug-pagealloc.c: At top level:
  mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages'
  include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here
  mm/debug-pagealloc.c: In function 'kernel_map_pages':
  mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function)

by fixing

 - debug_flags should be in struct page
 - define DEBUG_PAGEALLOC config option for all architectures

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago drm/radeon: load the right microcode on rs780
Alex Deucher [Mon, 30 Mar 2009 00:44:26 +0000 (20:44 -0400)]
drm/radeon: load the right microcode on rs780

Copy/paste error.  The RV670 microcode should work ok, so it's
not a show stopper.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years ago Merge branch 'drm-intel-next' of ../anholt-2.6 into drm-linus
Dave Airlie [Fri, 3 Apr 2009 00:27:21 +0000 (10:27 +1000)]
Merge branch 'drm-intel-next' of ../anholt-2.6 into drm-linus

16 years ago drm: remove unused "can_grow" parameter from drm_crtc_helper_initial_config
Jesse Barnes [Fri, 27 Mar 2009 20:05:19 +0000 (13:05 -0700)]
drm: remove unused "can_grow" parameter from drm_crtc_helper_initial_config

Clean up some leftovers from the X port.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years ago Merge current mainline tree into linux-omap tree
Tony Lindgren [Fri, 3 Apr 2009 00:10:56 +0000 (17:10 -0700)]
Merge current mainline tree into linux-omap tree

Merge branches 'master' and 'linus'

Conflicts:
arch/arm/configs/omap_3430sdp_defconfig
arch/arm/configs/rx51_defconfig
arch/arm/mach-omap2/Kconfig
arch/arm/mach-omap2/Makefile
arch/arm/mach-omap2/board-2430sdp.c
arch/arm/mach-omap2/board-3430sdp.c
arch/arm/mach-omap2/board-apollon.c
arch/arm/mach-omap2/board-h4.c
arch/arm/mach-omap2/board-ldp.c
arch/arm/mach-omap2/board-omap3beagle.c
arch/arm/mach-omap2/board-omap3pandora.c
arch/arm/mach-omap2/board-overo.c
arch/arm/mach-omap2/board-rx51-peripherals.c
arch/arm/mach-omap2/board-rx51.c
arch/arm/mach-omap2/mmc-twl4030.c
arch/arm/mach-omap2/mmc-twl4030.h
arch/arm/mach-omap2/usb-musb.c
arch/arm/plat-omap/dma.c
arch/arm/plat-omap/include/mach/board-nokia.h
arch/arm/plat-omap/include/mach/irqs.h
arch/arm/plat-omap/include/mach/usb.h
drivers/Makefile
drivers/i2c/chips/Kconfig
drivers/video/omap/omapfb_main.c
net/ipv4/netfilter/Makefile

16 years ago dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup
Guennadi Liakhovetski [Thu, 2 Apr 2009 09:36:58 +0000 (11:36 +0200)]
dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup

Add Start-of-Frame and End-of-Frame debugging to ipu_idmac.c.  In the
future it might also be needed for the actual video processing in
mx3-camera, at which point the ISRs will have to be moved to
mx3_camera.c, and the ipu_irq_map() and ipu_irq_unmap() functions will
have to be exported for that.

Also simplify a couple of pointer-dereferences.

Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
16 years ago glge: remove unused #include <version.h>
Huang Weiyi [Thu, 2 Apr 2009 05:33:55 +0000 (05:33 +0000)]
glge: remove unused #include <version.h>

Remove unused #include <version.h> in drivers/net/qlge/qlge_ethtool.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodnet: remove unused #include <version.h>
Huang Weiyi [Thu, 2 Apr 2009 05:33:50 +0000 (05:33 +0000)]
dnet: remove unused #include <version.h>

Remove unused #include <version.h> in drivers/net/dnet.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: miscounts due to tcp_fragment pcount reset
Ilpo Järvinen [Wed, 1 Apr 2009 23:18:20 +0000 (23:18 +0000)]
tcp: miscounts due to tcp_fragment pcount reset

It seems that a trivial reset of pcount to one was not sufficient
in tcp_retransmit_skb. Multiple counters experience a positive
miscount when skb's pcount gets lowered without the necessary
adjustments (which counters exactly depends on the skb's sacked
bits); at worst a packets_out miscount can crash at RTO if the write
queue is empty!

Triggering this requires an mss change, so bidir tcp, an mtu probe or
the like.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Uwe Bugla <uwe.bugla@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: add helper for counter tweaking due mid-wq change
Ilpo Järvinen [Wed, 1 Apr 2009 23:15:17 +0000 (23:15 +0000)]
tcp: add helper for counter tweaking due mid-wq change

We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call it from the
other places.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: fix for the 'invalid frame length' messages
Jan Dumon [Wed, 1 Apr 2009 22:59:07 +0000 (22:59 +0000)]
hso: fix for the 'invalid frame length' messages

Some devices cannot send very short usb transfers. To get around this the
firmware adds a known pattern and flags the driver that it should check for
this pattern on short transfers. This flag was not taken into account by
the driver.

Signed-off-by: Jan Dumon <j.dumon@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: fix for crash when unplugging the device
Jan Dumon [Wed, 1 Apr 2009 22:57:20 +0000 (22:57 +0000)]
hso: fix for crash when unplugging the device

Changed the order in which things are freed. This fixes an oops when
unplugging the device while network traffic is ongoing.

Signed-off-by: Jan Dumon <j.dumon@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodrm: fix EDID backward compat check
Jesse Barnes [Thu, 2 Apr 2009 21:56:24 +0000 (14:56 -0700)]
drm: fix EDID backward compat check

EDIDs should be backward compatible, so don't bail if we see a version
of 3 (which is out there now) and print a message if we see something
newer, but allow it to be parsed.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years agodrm: sync the mode validation for INTERLACE/DBLSCAN
yakui_zhao [Thu, 2 Apr 2009 03:52:12 +0000 (11:52 +0800)]
drm: sync the mode validation for INTERLACE/DBLSCAN

Check whether INTERLACE/DBLSCAN is supported by the output device. If
not, a mode with the INTERLACE/DBLSCAN flag set will be marked
as unsupported.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years agodrm: fix typo in edid vendor parsing.
Dave Airlie [Thu, 2 Apr 2009 23:10:33 +0000 (09:10 +1000)]
drm: fix typo in edid vendor parsing.

Should be,

    edid_vendor[2] = (edid->mfg_id[1] & 0x1f) +  '@';

Since the vendor ID has only two bytes, I am somewhat surprised that gcc
doesn't complain about this.
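
As a rough illustration of the parsing this fixes, the sketch below decodes
the two-byte EDID manufacturer ID into its three letters. The helper name
edid_vendor_decode and the standalone layout are assumptions made for this
example; only the mfg_id expression for the third letter is taken from the
message above.

```c
#include <stdint.h>

/*
 * Decode the two-byte EDID manufacturer ID into three letters.
 * Each letter is a 5-bit value where 1 means 'A', so adding '@' (0x40)
 * maps it to ASCII. Illustrative helper, not the kernel function.
 */
static void edid_vendor_decode(const uint8_t mfg_id[2], char out[4])
{
	/* First letter: bits 6..2 of byte 0. */
	out[0] = ((mfg_id[0] & 0x7c) >> 2) + '@';
	/* Second letter: low 2 bits of byte 0 and high 3 bits of byte 1. */
	out[1] = (((mfg_id[0] & 0x03) << 3) | ((mfg_id[1] & 0xe0) >> 5)) + '@';
	/* Third letter: low 5 bits of byte 1 (there is no third byte). */
	out[2] = (mfg_id[1] & 0x1f) + '@';
	out[3] = '\0';
}
```

With the bytes 0x10 0xac as input, the three letters come out as "DEL".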

Reported-by: Guo, Chaohong <chaohong.guo@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years agoDRM: drm_crtc_helper.h doesn't actually need i2c.h
Jean Delvare [Thu, 2 Apr 2009 09:52:24 +0000 (11:52 +0200)]
DRM: drm_crtc_helper.h doesn't actually need i2c.h

Remove an include that isn't actually needed to prevent needless
rebuilds.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years agodrm: fix missing inline function on 32-bit powerpc.
Dave Airlie [Tue, 31 Mar 2009 04:14:39 +0000 (15:14 +1100)]
drm: fix missing inline function on 32-bit powerpc.

The readq/writeq really need to be static inline on the arches which
don't provide them.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
16 years agoACPI: acpi_enforce_resource=strict by default
Luca Tettamanti [Sun, 29 Mar 2009 22:01:27 +0000 (00:01 +0200)]
ACPI: acpi_enforce_resource=strict by default

Enforce strict resource checking - disallowing access by native
drivers to IO ports and memory regions claimed by ACPI firmware.

The patch is mainly aimed to block native hwmon drivers from touching
monitoring chips that ACPI thinks it owns.

If this causes a regression, boot with "acpi_enforce_resources=lax"
which was the previous default.

http://bugzilla.kernel.org/show_bug.cgi?id=12376
http://bugzilla.kernel.org/show_bug.cgi?id=12541

Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
16 years ago[ARM] fix build-breaking 7a192ec commit
Russell King [Thu, 2 Apr 2009 22:23:43 +0000 (23:23 +0100)]
[ARM] fix build-breaking 7a192ec commit

The commit:

    platform driver: fix incorrect use of 'platform_bus_type' with 'struct device_driver'

contains this:

-static int __exit pxa2xx_flash_remove(struct device *dev)
+static int __exit pxa2xx_flash_remove(struct platform_device *dev)
...
-       .remove         = __exit_p(pxa2xx_flash_remove),
+       .remove         = __devexit_p(pxa2xx_flash_remove),

which leads to the following build error:

`pxa2xx_flash_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o

This is not the only instance of it in this patch - all __exit_p's
touched by this patch have been converted to __devexit_p's without
regard to the original function.

Let's revert this change and, if we are going to convert functions
to be __devexit/__devinit, let's have that as a _separate_ patch doing
just that change.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
16 years agoMerge branch 'smsc911x-armplatforms' of git://github.com/steveglen/linux-2.6
Russell King [Thu, 2 Apr 2009 22:22:11 +0000 (23:22 +0100)]
Merge branch 'smsc911x-armplatforms' of git://github.com/steveglen/linux-2.6

16 years agodrm: Use pgprot_writecombine in GEM GTT mapping to get the right bits for !PAT.
Jesse Barnes [Wed, 1 Apr 2009 01:22:31 +0000 (18:22 -0700)]
drm: Use pgprot_writecombine in GEM GTT mapping to get the right bits for !PAT.

Otherwise, the PAGE_CACHE_WC would end up getting us a UC-only mapping, and
the write performance of GTT maps dropped 10x.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: cleaned up unused var]
Signed-off-by: Eric Anholt <eric@anholt.net>
16 years agopowerpc/math-emu: Change types to work on ppc64
Kumar Gala [Thu, 2 Apr 2009 21:17:36 +0000 (16:17 -0500)]
powerpc/math-emu: Change types to work on ppc64

While normally we don't use the math emulation code on ppc64 it can be
useful for doing things like emulating the embedded FP instructions.

Since performance isn't critical in this scenario, it's easier to keep
the sizes of the various math-emu the same as on ppc32.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agoBtrfs: BUG to BUG_ON changes
Stoyan Gaydarov [Thu, 2 Apr 2009 21:05:11 +0000 (17:05 -0400)]
Btrfs: BUG to BUG_ON changes

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agofsl_pq_mdio: Fix compile failure
Segher Boessenkool [Thu, 2 Apr 2009 20:57:30 +0000 (13:57 -0700)]
fsl_pq_mdio: Fix compile failure

Add EXPORT_SYMBOL_GPL(fsl_pq_mdio_bus_name) for module builds

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoBtrfs: remove dead code
Dan Carpenter [Thu, 2 Apr 2009 20:46:06 +0000 (16:46 -0400)]
Btrfs: remove dead code

Remove an unneeded return statement and conditional

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: remove dead code
Dan Carpenter [Thu, 2 Apr 2009 20:46:06 +0000 (16:46 -0400)]
Btrfs: remove dead code

merge is always NULL at this point.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: fix typos in comments
Wu Fengguang [Thu, 2 Apr 2009 20:46:06 +0000 (16:46 -0400)]
Btrfs: fix typos in comments

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: remove unused ftrace include
Jim Owens [Thu, 2 Apr 2009 21:02:55 +0000 (17:02 -0400)]
Btrfs: remove unused ftrace include

Signed-off-by: jim owens <jowens@hp.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: fix __ucmpdi2 compile bug on 32 bit builds
Heiko Carstens [Fri, 3 Apr 2009 14:33:45 +0000 (10:33 -0400)]
Btrfs: fix __ucmpdi2 compile bug on 32 bit builds

We get this on 32 bit builds:

fs/built-in.o: In function `extent_fiemap':
(.text+0x1019f2): undefined reference to `__ucmpdi2'

Happens because of a switch statement with a 64 bit argument.
Convert this to an if statement to fix this.
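
The shape of that conversion can be sketched as below. The enum, the marker
values and the function name classify are hypothetical and chosen only for
this example; they are not the identifiers used in extent_fiemap. The point
is that a chain of equality tests on a 64 bit value replaces the switch.

```c
#include <stdint.h>

/* Hypothetical classification of a 64-bit value, for illustration only. */
enum kind { KIND_HOLE, KIND_INLINE, KIND_REGULAR };

#define MARK_HOLE   ((uint64_t)-1)
#define MARK_INLINE ((uint64_t)-2)

static enum kind classify(uint64_t start)
{
	/*
	 * Previously written as: switch (start) { case MARK_HOLE: ... }
	 * Plain equality tests are used here instead of a switch on the
	 * 64-bit argument.
	 */
	if (start == MARK_HOLE)
		return KIND_HOLE;
	if (start == MARK_INLINE)
		return KIND_INLINE;
	return KIND_REGULAR;
}
```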

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: free inode struct when btrfs_new_inode fails
Shen Feng [Thu, 2 Apr 2009 20:46:06 +0000 (16:46 -0400)]
Btrfs: free inode struct when btrfs_new_inode fails

btrfs_new_inode doesn't call iput to free the inode
when it fails.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: fix race in worker_loop
Amit Gud [Thu, 2 Apr 2009 21:01:27 +0000 (17:01 -0400)]
Btrfs: fix race in worker_loop

Need to check kthread_should_stop after schedule_timeout() before calling
schedule(). Without the check, threads can sleep with potentially no one to
wake them up, causing mount(2) to hang in btrfs_stop_workers waiting for
threads to stop.

Signed-off-by: Amit Gud <gud@ksu.edu>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: add flushoncommit mount option
Sage Weil [Thu, 2 Apr 2009 20:59:01 +0000 (16:59 -0400)]
Btrfs: add flushoncommit mount option

The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit.  This makes
the committed state a fully consistent view of the file system from the
application's perspective (i.e., it includes all completed file system
operations).  This was previously the behavior only when a snapshot is
created.

This is used by Ceph to ensure that completed writes make it to the
platter along with the metadata operations they are bound to (by
BTRFS_IOC_TRANS_{START,END}).

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: notreelog mount option
Sage Weil [Thu, 2 Apr 2009 20:49:40 +0000 (16:49 -0400)]
Btrfs: notreelog mount option

Add a 'notreelog' mount option to disable the tree log (used by fsync,
O_SYNC writes).  This is much slower, but the tree logging produces
inconsistent views into the FS for ceph.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: introduce btrfs_show_options
Eric Paris [Thu, 2 Apr 2009 20:46:06 +0000 (16:46 -0400)]
Btrfs: introduce btrfs_show_options

btrfs options can change at times other than mount, yet /proc/mounts shows the
options string used when the fs was mounted (an example would be when btrfs
determines that barriers aren't useful and turns them off.)  This patch
instead outputs the actual options in use by btrfs.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: rework allocation clustering
Chris Mason [Fri, 3 Apr 2009 13:47:43 +0000 (09:47 -0400)]
Btrfs: rework allocation clustering

Because btrfs is copy-on-write, we end up picking new locations for
blocks very often.  This makes it fairly difficult to maintain perfect
read patterns over time, but we can at least do some optimizations
for writes.

This is done today by remembering the last place we allocated and
trying to find a free space hole big enough to hold more than just one
allocation.  The end result is that we tend to write sequentially to
the drive.

This happens all the time for metadata and it happens for data
when mounted -o ssd.  But, the way we record it is fairly racy
and it tends to fragment the free space over time because we are trying
to allocate fairly large areas at once.

This commit gets rid of the races by adding a free space cluster object
with dedicated locking to make sure that only one process at a time
is out replacing the cluster.

The free space fragmentation is somewhat solved by allowing a cluster
to be comprised of smaller free space extents.  This part definitely
adds some CPU time to the cluster allocations, but it allows the allocator
to consume the small holes left behind by cow.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: Optimize locking in btrfs_next_leaf()
Chris Mason [Fri, 3 Apr 2009 14:14:18 +0000 (10:14 -0400)]
Btrfs: Optimize locking in btrfs_next_leaf()

btrfs_next_leaf was using blocking locks when it could have been using
faster spinning ones instead.  This adds a few extra checks around
the pieces that block and switches over to spinning locks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: break up btrfs_search_slot into smaller pieces
Chris Mason [Fri, 3 Apr 2009 14:14:18 +0000 (10:14 -0400)]
Btrfs: break up btrfs_search_slot into smaller pieces

btrfs_search_slot was doing too many things at once.  This breaks
it up into more reasonable units.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: kill the pinned_mutex
Josef Bacik [Fri, 3 Apr 2009 14:14:18 +0000 (10:14 -0400)]
Btrfs: kill the pinned_mutex

This patch removes the pinned_mutex.  The extent io map has an internal tree
lock that protects the tree itself, and since we only copy the extent io map
when we are committing the transaction we don't need it there.  We also don't
need it when caching the block group since searching through the tree is also
protected by the internal map spin lock.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
16 years agoBtrfs: kill the block group alloc mutex
Josef Bacik [Fri, 3 Apr 2009 14:14:18 +0000 (10:14 -0400)]
Btrfs: kill the block group alloc mutex

This patch removes the block group alloc mutex used to protect the free space
tree for allocations and replaces it with a spin lock which is used only to
protect the free space rb tree.  This means we only take the lock when we are
directly manipulating the tree, which makes us a touch faster with
multi-threaded workloads.

This patch also gets rid of btrfs_find_free_space and replaces it with
btrfs_find_space_for_alloc, which takes the number of bytes you want to
allocate, and empty_size, which is used to indicate how much free space should
be at the end of the allocation.

It will return an offset for the allocator to use.  If we don't end up using it
we _must_ call btrfs_add_free_space to put it back.  This is the tradeoff to
kill the alloc_mutex, since we need to make sure nobody else comes along and
takes our space.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
16 years agoBtrfs: clean up find_free_extent
Josef Bacik [Fri, 3 Apr 2009 14:14:19 +0000 (10:14 -0400)]
Btrfs: clean up find_free_extent

I've replaced the strange looping constructs with a list_for_each_entry on
space_info->block_groups.  If we have a hint we just jump into the loop with
the block group and start looking for space.  If we don't find anything we
start at the beginning and start looking.  We never come out of the loop with a
ref on the block_group _unless_ we found space to use, then we drop it after we
set the trans block_group.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
16 years agoBtrfs: free space cache cleanups
Josef Bacik [Fri, 3 Apr 2009 14:14:19 +0000 (10:14 -0400)]
Btrfs: free space cache cleanups

This patch cleans up the free space cache code a bit.  It better documents the
idiosyncrasies of tree_search_offset and makes the code make a bit more sense.
I took out the info allocation at the start of __btrfs_add_free_space and put it
where it makes more sense.  This was left over cruft from when alloc_mutex
existed.  So were all of the re-searches we do to make sure we inserted properly.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
16 years agoBtrfs: unplug in the async bio submission threads
Chris Mason [Fri, 3 Apr 2009 14:32:58 +0000 (10:32 -0400)]
Btrfs: unplug in the async bio submission threads

Btrfs pages being written get set to writeback, and then may go through
a number of steps before they hit the block layer.  This includes compression,
checksumming and async bio submission.

The end result is that someone who writes a page and then does
wait_on_page_writeback is likely to unplug the queue before the bio they
cared about got there.

We could fix this by marking bios sync, or by doing more frequent unplugs,
but this commit just changes the async bio submission code to unplug
after it has processed all the bios for a device.  The async bio submission
does a fair job of collecting bios, so this shouldn't be a huge problem
for reducing merging at the elevator.

For streaming O_DIRECT writes on a 5 drive array, it boosts performance
from 386MB/s to 460MB/s.

Thanks to Hisashi Hifumi for helping with this work.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoBtrfs: keep processing bios for a given bdev if our proc is batching
Chris Mason [Fri, 3 Apr 2009 14:27:10 +0000 (10:27 -0400)]
Btrfs: keep processing bios for a given bdev if our proc is batching

Btrfs uses async helper threads to submit write bios so the checksumming
helper threads don't block on the disk.

The submit bio threads may process bios for more than one block device,
so when they find one device congested they try to move on to other
devices instead of blocking in get_request_wait for one device.

This does a pretty good job of keeping multiple devices busy, but the
congested flag has a number of problems.  A congested device may still
give you a request, and other procs that aren't backing off the congested
device may starve you out.

This commit uses the io_context stored in current to decide if our process
has been made a batching process by the block layer.  If so, it keeps
sending IO down for at least one batch.  This helps make sure we do
a good amount of work each time we visit a bdev, and avoids large IO
stalls in multi-device workloads.

It's also very ugly.  A better solution is in the works with Jens Axboe.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
16 years agoACPI: simplify module_param namespace
Rusty Russell [Wed, 11 Mar 2009 22:37:19 +0000 (09:07 +1030)]
ACPI: simplify module_param namespace

Impact: cleanup

Rather than overriding MODULE_PARAM_PREFIX, build via acpi.o so
KBUILD_MODNAME is set to "acpi".

This is the logical way to do it, even though acpi cannot be a module
due to these config options being bool.  Those parts of ACPI which can
be modular are not built into the acpi "module".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Len Brown <len.brown@intel.com>
16 years agopowerpc/85xx: Re-add the device_type soc to socrates.dts
Wolfgang Grandegger [Thu, 2 Apr 2009 18:50:56 +0000 (20:50 +0200)]
powerpc/85xx: Re-add the device_type soc to socrates.dts

The device_type "soc" is still required for MPC85xx boards.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
16 years agodm: set queue ordered mode
Mikulas Patocka [Thu, 2 Apr 2009 18:55:39 +0000 (19:55 +0100)]
dm: set queue ordered mode

Set queue ordered mode.  It doesn't really matter what we set here
because we don't ever put any requests on the queue.  But we need to set
something other than QUEUE_ORDERED_NONE so that __generic_make_request
passes barrier requests to us.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: move wait queue declaration
Mikulas Patocka [Thu, 2 Apr 2009 18:55:39 +0000 (19:55 +0100)]
dm: move wait queue declaration

Move wait queue declaration and unplug to dm_wait_for_completion.

The purpose is to minimize duplicate code in the further patches.

The patch reorders functions a little bit. It doesn't change any
functionality. For proper non-deadlock operation, add_wait_queue must
happen before set_current_state(interruptible) and before the test for
!atomic_read(&md->pending).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: merge pushback and deferred bio lists
Mikulas Patocka [Thu, 2 Apr 2009 18:55:39 +0000 (19:55 +0100)]
dm: merge pushback and deferred bio lists

Merge pushback and deferred lists into one list - use deferred list
for both deferred and pushed-back bios.

This will be needed for proper support of barrier bios: it is impossible to
support ordering correctly with two lists because the requests on both lists
will be mixed up.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: allow uninterruptible wait for pending io
Mikulas Patocka [Thu, 2 Apr 2009 18:55:38 +0000 (19:55 +0100)]
dm: allow uninterruptible wait for pending io

Allow uninterruptible wait for pending IOs.

Add argument "interruptible" to dm_wait_for_completion that specifies
either interruptible or uninterruptible waiting.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: merge __flush_deferred_io into caller
Mikulas Patocka [Thu, 2 Apr 2009 18:55:38 +0000 (19:55 +0100)]
dm: merge __flush_deferred_io into caller

Merge __flush_deferred_io() into the only caller, dm_wq_work().

There's no need to have a function that has only one caller.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: move bio_io_error into __split_and_process_bio
Mikulas Patocka [Thu, 2 Apr 2009 18:55:38 +0000 (19:55 +0100)]
dm: move bio_io_error into __split_and_process_bio

Move the bio_io_error() calls directly into __split_and_process_bio().

This avoids some code duplication in later patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: rename __split_bio
Mikulas Patocka [Thu, 2 Apr 2009 18:55:37 +0000 (19:55 +0100)]
dm: rename __split_bio

Rename __split_bio() to __split_and_process_bio() because it not only splits
the bio to several parts, but also submits them to target drivers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: remove unnecessary struct dm_wq_req
Mikulas Patocka [Thu, 2 Apr 2009 18:55:37 +0000 (19:55 +0100)]
dm: remove unnecessary struct dm_wq_req

Remove struct dm_wq_req and move "work" directly into struct mapped_device.

In the revised implementation, the thread will do just one type of work
(processing the queue).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: remove unnecessary work queue context field
Mikulas Patocka [Thu, 2 Apr 2009 18:55:36 +0000 (19:55 +0100)]
dm: remove unnecessary work queue context field

Remove the context field from struct dm_wq_req because we will no longer
need it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: remove unnecessary work queue type field
Mikulas Patocka [Thu, 2 Apr 2009 18:55:36 +0000 (19:55 +0100)]
dm: remove unnecessary work queue type field

Remove "type" field from struct dm_wq_req because we no longer need it
to have more than one value.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: bio list add bio_list_add_head
Mikulas Patocka [Thu, 2 Apr 2009 18:55:36 +0000 (19:55 +0100)]
dm: bio list add bio_list_add_head

Introduce a function that adds a bio to the head of the list for
use by the patch that will support barriers.
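
A minimal user-space sketch of a head insert on a singly linked list with
head and tail pointers is shown below. The types item and item_list and the
function names are simplified stand-ins invented for this example, not the
kernel's bio list code.

```c
#include <stddef.h>

/* Simplified stand-ins for a bio and a bio list (hypothetical names). */
struct item {
	struct item *next;
	int id;
};

struct item_list {
	struct item *head;
	struct item *tail;
};

static void item_list_init(struct item_list *l)
{
	l->head = l->tail = NULL;
}

/* Usual case: append at the tail. */
static void item_list_add(struct item_list *l, struct item *it)
{
	it->next = NULL;
	if (l->tail)
		l->tail->next = it;
	else
		l->head = it;
	l->tail = it;
}

/* New case: push at the head so the item is processed first. */
static void item_list_add_head(struct item_list *l, struct item *it)
{
	it->next = l->head;
	l->head = it;
	if (!l->tail)
		l->tail = it;	/* list was empty, item is also the tail */
}
```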

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: persistent fix dtr cleanup
Jonathan Brassow [Thu, 2 Apr 2009 18:55:35 +0000 (19:55 +0100)]
dm snapshot: persistent fix dtr cleanup

The persistent exception store destructor does not properly
account for all conditions in which it can be called.  If it
is called after 'ctr' but before 'read_metadata' (e.g. if
something else in 'snapshot_ctr' fails) then it will attempt
to free areas of memory that haven't been allocated yet.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: move status to exception store
Jonathan Brassow [Thu, 2 Apr 2009 18:55:35 +0000 (19:55 +0100)]
dm snapshot: move status to exception store

Let the exception store types print out their status through
the new API, rather than having the snapshot code do it.

Adjust the buffer position to allow for the preceding DMEMIT in the
arguments to type->status().

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: move ctr parsing to exception store
Jonathan Brassow [Thu, 2 Apr 2009 18:55:34 +0000 (19:55 +0100)]
dm snapshot: move ctr parsing to exception store

First step of having the exception stores parse their own arguments -
generalizing the interface.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: use DMEMIT macro for status
Jonathan Brassow [Thu, 2 Apr 2009 18:55:34 +0000 (19:55 +0100)]
dm snapshot: use DMEMIT macro for status

Use DMEMIT in place of snprintf.  This makes it easier later when
other modules are helping to populate our status output.
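
The reason an emit-style helper composes better than bare snprintf calls can
be sketched in user space as follows. The struct status_buf and the function
emit() are hypothetical names for this example and this is not the kernel
macro itself; it only shows the pattern of appending at a tracked offset so
that several callers can add to the same buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical status buffer: 'sz' tracks how many bytes are used so far. */
struct status_buf {
	char *result;
	size_t maxlen;
	size_t sz;
};

/* Append formatted text; once the buffer is full, further output is dropped. */
static void emit(struct status_buf *b, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (b->sz >= b->maxlen)
		return;

	va_start(ap, fmt);
	n = vsnprintf(b->result + b->sz, b->maxlen - b->sz, fmt, ap);
	va_end(ap);

	if (n < 0)
		return;
	/* vsnprintf reports the untruncated length; clamp to what was stored. */
	if ((size_t)n >= b->maxlen - b->sz)
		n = (int)(b->maxlen - b->sz - 1);
	b->sz += (size_t)n;
}
```

Each caller only needs the shared status_buf, so a helper module can add its
own fields after the caller has written a prefix.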

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: remove dm_snap header
Jonathan Brassow [Thu, 2 Apr 2009 18:55:34 +0000 (19:55 +0100)]
dm snapshot: remove dm_snap header

Move some of the last bits from dm-snap.h into dm-snap.c where they
belong and remove dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: remove dm_snap header use
Jonathan Brassow [Thu, 2 Apr 2009 18:55:33 +0000 (19:55 +0100)]
dm snapshot: remove dm_snap header use

Move useful functions out of dm-snap.h and stop using dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm exception store: move cow pointer
Jonathan Brassow [Thu, 2 Apr 2009 18:55:33 +0000 (19:55 +0100)]
dm exception store: move cow pointer

Move COW device from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm exception store: move chunk_fields
Jonathan Brassow [Thu, 2 Apr 2009 18:55:32 +0000 (19:55 +0100)]
dm exception store: move chunk_fields

Move chunk fields from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm exception store: move dm_target pointer
Jonathan Brassow [Thu, 2 Apr 2009 18:55:32 +0000 (19:55 +0100)]
dm exception store: move dm_target pointer

Move target pointer from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm exception store: introduce registry
Jonathan Brassow [Thu, 2 Apr 2009 18:55:31 +0000 (19:55 +0100)]
dm exception store: introduce registry

Move exception stores into a registry.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm raid1: add is_remote_recovering hook for clusters
Jonathan Brassow [Thu, 2 Apr 2009 18:55:30 +0000 (19:55 +0100)]
dm raid1: add is_remote_recovering hook for clusters

The logging API needs an extra function to make cluster mirroring
possible.  This new function allows us to check whether a mirror
region is being recovered on another machine in the cluster.  This
helps us prevent simultaneous recovery I/O and process I/O to the
same locations on disk.

Cluster-aware log modules will implement this function.  Single
machine log modules will not.  So, there is no performance
penalty for single machine mirrors.
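
One way to picture the optional hook is a function pointer that single
machine logs leave unset. The sketch below is an assumption-laden
illustration: struct log_ops, struct dirty_log, example_cluster_hook and
region_is_remote_recovering are names made up for this example and the real
logging API has many more members.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down log interface. */
struct log_ops {
	/* Optional: returns nonzero if 'region' is being recovered elsewhere. */
	int (*is_remote_recovering)(void *ctx, unsigned long region);
};

struct dirty_log {
	const struct log_ops *ops;
	void *ctx;
};

/* Example cluster-aware hook: ctx points at the one region under recovery. */
static int example_cluster_hook(void *ctx, unsigned long region)
{
	const unsigned long *busy = ctx;

	return region == *busy;
}

/* Caller side: a missing hook means "never recovering remotely". */
static int region_is_remote_recovering(const struct dirty_log *log,
				       unsigned long region)
{
	if (!log->ops->is_remote_recovering)
		return 0;
	return log->ops->is_remote_recovering(log->ctx, region);
}
```

A log that leaves the pointer NULL costs only the single NULL test on the
caller side.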

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>