Linus Torvalds [Tue, 14 Oct 2008 23:34:11 +0000 (16:34 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (56 commits)
ocfs2: Make cached block reads the common case.
ocfs2: Kill the last naked wait_on_buffer() for cached reads.
ocfs2: Move ocfs2_bread() into dir.c
ocfs2: Simplify ocfs2_read_block()
ocfs2: Require an inode for ocfs2_read_block(s)().
ocfs2: Separate out sync reads from ocfs2_read_blocks()
ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().
ocfs2: Calculate EA hash only by its suffix.
ocfs2: Move trusted and user attribute support into xattr.c
ocfs2: Uninline ocfs2_xattr_name_hash()
ocfs2: Don't check for NULL before brelse()
ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache
ocfs2: Documentation update for user_xattr / nouser_xattr mount options
ocfs2: make la_debug_mutex static
ocfs2: Remove pointless !!
ocfs2: Add empty bucket support in xattr.
ocfs2/xattr.c: Fix a bug when inserting xattr.
ocfs2: Add xattr mount option in ocfs2_show_options()
ocfs2: Switch over to JBD2.
ocfs2: Add the 'inode64' mount option.
...
Bjorn Helgaas [Tue, 14 Oct 2008 23:01:59 +0000 (17:01 -0600)]
rtc-cmos: look for PNP RTC first, then for platform RTC
We shouldn't rely on "pnp_platform_devices" to tell us whether there
is a PNP RTC device.
I introduced "pnp_platform_devices", but I think it was a mistake.
All it tells us is whether we found any PNPBIOS or PNPACPI devices.
Many machines have some PNP devices, but do not describe the RTC
via PNP. On those machines, we need to do the platform driver probe
to find the RTC.
We should just register the PNP driver and see whether it claims anything.
If we don't find a PNP RTC, fall back to the platform driver probe.
This (in conjunction with the arch/x86/kernel/rtc.c patch to add
a platform RTC device when PNP doesn't have one) should resolve
these issues:
Linus Torvalds [Tue, 14 Oct 2008 19:31:14 +0000 (12:31 -0700)]
Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits)
svcrdma: Fix IRD/ORD polarity
svcrdma: Update svc_rdma_send_error to use DMA LKEY
svcrdma: Modify the RPC reply path to use FRMR when available
svcrdma: Modify the RPC recv path to use FRMR when available
svcrdma: Add support to svc_rdma_send to handle chained WR
svcrdma: Modify post recv path to use local dma key
svcrdma: Add a service to register a Fast Reg MR with the device
svcrdma: Query device for Fast Reg support during connection setup
svcrdma: Add FRMR get/put services
NLM: Remove unused argument from svc_addsock() function
NLM: Remove "proto" argument from lockd_up()
NLM: Always start both UDP and TCP listeners
lockd: Remove unused fields in the nlm_reboot structure
lockd: Add helper to sanity check incoming NOTIFY requests
lockd: change nlmclnt_grant() to take a "struct sockaddr *"
lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses
lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET
lockd: Support non-AF_INET addresses in nlm_lookup_host()
NLM: Convert nlm_lookup_host() to use a single argument
svcrdma: Add Fast Reg MR Data Types
...
Linus Torvalds [Tue, 14 Oct 2008 19:28:02 +0000 (12:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
raid, fastboot: hide RAID autodetect option if MD is compiled as a module
raid: make RAID autodetect default a KConfig option
warning: fix init do_mounts_md c
fastboot: make the RAID autostart code print a message just before waiting
fastboot: make the raid autodetect code wait for all devices to init
fastboot: Fix bootgraph.pl initcall name regexp
fastboot: fix issues and improve output of bootgraph.pl
Add a script to visualize the kernel boot process / time
- afa9b649 "fbcon: prevent cursor disappearance after switching to 512
character font"
- d850a2fa "vt/fbcon: fix background color on line feed"
- 7fe3915a "vt/fbcon: update scrl_erase_char after 256/512-glyph font
switch"
by request of Alan Cox. Quoth Alan:
"Unfortunately it's wrong and its been causing breakages because
various apps like ncurses expect our previous (and correct)
behaviour."
Alexander sent out a similar patch.
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Tested-by: Jan Engelhardt <jengelh@medozas.de> Cc: Alexander V. Lukyanov <lav@netis.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joel Becker [Fri, 10 Oct 2008 00:20:34 +0000 (17:20 -0700)]
ocfs2: Make cached block reads the common case.
ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
However, that's the common case. Let's flip it around and provide an
IGNORE_CACHE flag for the special users. This has the added benefit of
cleaning up the code some (ignore_cache takes on its special meaning
earlier in the loop).
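To make the flag flip concrete, here is a minimal userspace sketch. The toy cache, `read_blocks()`, `read_block()` and `READ_IGNORE_CACHE` are hypothetical stand-ins, not ocfs2's buffer-head code. A zero flags value means a cached read, only special callers opt out, and the single-block wrapper takes no flags at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define READ_IGNORE_CACHE 0x1u /* hypothetical flag: force a disk read */

/* Toy block "cache": remembers one block number and counts how many
 * times the disk was actually read.  Not ocfs2's data structures. */
struct toy_cache {
    bool valid;
    uint64_t cached_blkno;
    unsigned int disk_reads;
};

/* Cached read is the default (flags == 0); only special callers pass
 * READ_IGNORE_CACHE.  The flag is examined once, before the loop. */
void read_blocks(struct toy_cache *c, uint64_t blkno, unsigned int nr,
                 unsigned int flags)
{
    bool ignore_cache = flags & READ_IGNORE_CACHE;

    for (unsigned int i = 0; i < nr; i++) {
        uint64_t b = blkno + i;

        if (!ignore_cache && c->valid && c->cached_blkno == b)
            continue;          /* cache hit: no I/O */
        c->disk_reads++;       /* miss, or a forced read */
        c->valid = true;
        c->cached_blkno = b;
    }
}

/* Single-block convenience wrapper with no flags: always a cached read. */
void read_block(struct toy_cache *c, uint64_t blkno)
{
    read_blocks(c, blkno, 1, 0);
}
```

With this shape, the common call is just `read_block(&cache, blkno)`. Only a caller that must bypass the cache has to spell out the flag.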
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 10 Oct 2008 00:20:33 +0000 (17:20 -0700)]
ocfs2: Kill the last naked wait_on_buffer() for cached reads.
ocfs2's cached buffer I/O goes through ocfs2_read_block(s)(). dir.c had
a naked wait_on_buffer() to wait for some readahead, but it should
use ocfs2_read_block() instead.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 10 Oct 2008 00:20:31 +0000 (17:20 -0700)]
ocfs2: Simplify ocfs2_read_block()
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 10 Oct 2008 00:20:30 +0000 (17:20 -0700)]
ocfs2: Require an inode for ocfs2_read_block(s)().
Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode. Use it
unconditionally. Since it's there, we don't need to pass the
ocfs2_super either.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 10 Oct 2008 00:20:29 +0000 (17:20 -0700)]
ocfs2: Separate out sync reads from ocfs2_read_blocks()
The ocfs2_read_blocks() function currently handles sync reads, cached
reads, and sometimes-cached reads. We're going to add some
functionality to it, so first we should simplify it. The uncached,
synchronous reads are much easier to handle as a separate function, so we
introduce ocfs2_read_blocks_sync().
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Julia Lawall [Mon, 13 Oct 2008 19:59:04 +0000 (21:59 +0200)]
arch/m68k/mm/kmap.c: introduce missing kfree
Error handling code following a kmalloc should free the allocated data.
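The following is a minimal userspace illustration of the pattern the fix enforces. It assumes malloc()/free() as stand-ins for kmalloc()/kfree(). `struct mapping` and `mapping_create()` are hypothetical and are not taken from arch/m68k/mm/kmap.c. Once the first allocation succeeds, every error return must release it.

```c
#include <stdlib.h>

/* Hypothetical object allocated and then initialized field by field,
 * loosely modeled on the kmalloc-then-assign shape the match looks for. */
struct mapping {
    size_t size;
    char *buf;
};

/* Returns a fully set-up mapping, or NULL on failure.  Every early return
 * after the first allocation releases what was already allocated. */
struct mapping *mapping_create(size_t size)
{
    struct mapping *m = malloc(sizeof(*m));

    if (m == NULL)
        return NULL;

    m->size = size;          /* a field assignment, like "x->f = E" */
    if (size == 0) {         /* hypothetical validation failure */
        free(m);             /* the kind of free the buggy path lacked */
        return NULL;
    }

    m->buf = calloc(1, size);
    if (m->buf == NULL) {
        free(m);
        return NULL;
    }
    return m;
}

void mapping_destroy(struct mapping *m)
{
    if (m) {
        free(m->buf);
        free(m);
    }
}
```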
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@
(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
when != if (...) { <+...x...+> }
x->f = E
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
| arch/m68k/kernel/ints.c:433: error: redefinition of 'init_irq_proc'
| include/linux/interrupt.h:438: error: previous definition of 'init_irq_proc' was here
This was introduced by commit 6168a702ab0be181e5e57a0b2d0e7376f7a47f0b
("Declare init_irq_proc before we use it."), which replaced the #ifdef
protection of the init_irq_proc() call by a static inline dummy if
CONFIG_PROC_FS is not set.
Make init_irq_proc() depend on CONFIG_PROC_FS to fix this.
HP input: kill warnings due to suseconds_t differences
Kill compiler warnings related to printf() formats in the input drivers for
various HP9000 machines, which are shared between PA-RISC (suseconds_t is int)
and m68k (suseconds_t is long). As both are 32-bit, it's safe to cast to int.
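Here is a small sketch of the cast, using a hypothetical helper `format_usec()`. The real HP9000 input drivers print different messages. Because tv_usec of a normalized timeval is always in [0, 999999], it fits in an int, so one `%d` format works whether suseconds_t is int or long.

```c
#include <stdio.h>
#include <sys/time.h>

/* Format the microsecond part of a timeval without a format warning on
 * either architecture: cast suseconds_t (int or long) to int, then use
 * a plain "%d".  The zero padding is just for a fixed-width result. */
int format_usec(char *out, size_t len, const struct timeval *tv)
{
    return snprintf(out, len, "%06d", (int)tv->tv_usec);
}
```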
| include/linux/ssb/ssb.h: In function 'ssb_dma_mapping_error':
| include/linux/ssb/ssb.h:430: error: implicit declaration of function 'pci_dma_mapping_error'
| include/linux/ssb/ssb.h: In function 'ssb_dma_map_single':
| include/linux/ssb/ssb.h:444: error: implicit declaration of function 'pci_map_single'
| include/linux/ssb/ssb.h: In function 'ssb_dma_unmap_single':
| include/linux/ssb/ssb.h:458: error: implicit declaration of function 'pci_unmap_single'
| include/linux/ssb/ssb.h: In function 'ssb_dma_sync_single_for_cpu':
| include/linux/ssb/ssb.h:475: error: implicit declaration of function 'pci_dma_sync_single_for_cpu'
| include/linux/ssb/ssb.h: In function 'ssb_dma_sync_single_for_device':
| include/linux/ssb/ssb.h:493: error: implicit declaration of function 'pci_dma_sync_single_for_device'
or legacy drivers:
| drivers/net/hp100.c: In function 'pdl_map_data':
| drivers/net/hp100.c:291: error: implicit declaration of function 'pci_map_single'
| drivers/net/hp100.c: In function 'hp100_probe1':
| drivers/net/hp100.c:707: error: implicit declaration of function 'pci_alloc_consistent'
| drivers/net/hp100.c:782: error: implicit declaration of function 'pci_free_consistent'
| drivers/net/hp100.c: In function 'hp100_clean_txring':
| drivers/net/hp100.c:1614: error: implicit declaration of function 'pci_unmap_single'
and
| drivers/scsi/aic7xxx_old.c: In function 'aic7xxx_allocate_scb':
| drivers/scsi/aic7xxx_old.c:2573: error: implicit declaration of function 'pci_alloc_consistent'
| drivers/scsi/aic7xxx_old.c: In function 'aic7xxx_done':
| drivers/scsi/aic7xxx_old.c:2697: error: implicit declaration of function 'pci_unmap_single'
| drivers/scsi/aic7xxx_old.c: In function 'aic7xxx_handle_seqint':
| drivers/scsi/aic7xxx_old.c:4275: error: implicit declaration of function 'pci_map_single'
| drivers/scsi/aic7xxx_old.c: In function 'aic7xxx_free':
| drivers/scsi/aic7xxx_old.c:8460: error: implicit declaration of function 'pci_free_consistent'
rely on PCI DMA operations to be always available.
Add #include <asm-generic/pci-dma-compat.h> to <asm/pci.h> to make them happy.
| include/linux/ssb/ssb.h: In function 'ssb_dma_sync_single_range_for_cpu':
| include/linux/ssb/ssb.h:517: error: implicit declaration of function 'dma_sync_single_range_for_cpu'
| include/linux/ssb/ssb.h: In function 'ssb_dma_sync_single_range_for_device':
| include/linux/ssb/ssb.h:538: error: implicit declaration of function 'dma_sync_single_range_for_device'
Add the missing dma_sync_single_range_for_{cpu,device}(), and remove the
`inline' for the non-static function dma_sync_single_for_device().
The nvram and rtc-cmos drivers use the spinlock rtc_lock to protect against
concurrent accesses to the CMOS memory. As m68k doesn't support SMP or preempt
yet, the spinlock calls tend to get optimized away, but not for all
configurations, causing in some rare cases:
Currently Sun 3 support is the first platform option, as the Sun 3 MMU is
incompatible with standard Motorola MMUs. However, this means that
`allmodconfig' enables support for Sun 3, and thus disables support for all
other platforms.
Reverse the logic and move Sun 3 last, so `allmodconfig' enables all
platforms except for Sun 3, increasing compile-coverage.
Alan Cox [Tue, 14 Oct 2008 10:29:06 +0000 (11:29 +0100)]
8250: Fix lock warning (and possible crash)
Splitting the 8250 code back up to avoid a clash with the NR_IRQS removal
patch introduced a last-minute bug. Put back the additional lines needed
for the old lock init.
Signed-off-by: Alan Cox <alan@redhat.com>
[ Ingo also reports that this can cause a spontaneous reboot crash with
certain configs, and sends in an identical patch ] Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ron Mercer [Tue, 14 Oct 2008 05:55:59 +0000 (22:55 -0700)]
qlge: Fix page size ifdef test.
This ASIC does support all page sizes. For 4K and 8K page sizes, the TX
control block needs an external scatter-gather list. For page sizes
larger than 8K, the maximum number of frags is satisfied by the original
TX control block.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Cox [Tue, 14 Oct 2008 02:01:08 +0000 (19:01 -0700)]
net: Rationalise email address: Network Specific Parts
Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done, this seems a good point to send them in, where
they won't risk disrupting real changes.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/built-in.o: In function `phy_stop_interrupts':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:631: undefined reference to `free_irq'
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:646: undefined reference to `enable_irq'
drivers/built-in.o: In function `phy_start_interrupts':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:601: undefined reference to `request_irq'
drivers/built-in.o: In function `phy_interrupt':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:528: undefined reference to `disable_irq_nosync'
drivers/built-in.o: In function `phy_change':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:674: undefined reference to `enable_irq'
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:692: undefined reference to `disable_irq'
PHYLIB already depends on !S390; unfortunately, "select PHYLIB" in DSA
overrides that. So add a dependency on !S390 to DSA as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Tue, 14 Oct 2008 01:54:07 +0000 (18:54 -0700)]
netns: mib6 section fixlet
LD net/ipv6/ipv6.o
WARNING: net/ipv6/ipv6.o(.text+0xd8): Section mismatch in reference from the function inet6_net_init() to the function .init.text:ipv6_init_mibs()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Langer [Tue, 14 Oct 2008 01:49:38 +0000 (18:49 -0700)]
de2104x: wrong MAC address fix
The de2104x sometimes returns a wrong MAC address. The wrong one
matches the original one, but shifted by one byte. I found
this bug on an older Alpha EV5 CPU. More details are available in Gentoo
bug report #240718.
It seems the hardware is sometimes a little too slow for
immediate access. This patch solves the problem by introducing a small
udelay.
Signed-off-by: Martin Langer <martin-langer@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 14 Oct 2008 01:43:59 +0000 (18:43 -0700)]
pktgen: fix skb leak in case of failure
It seems that the skb is leaked when pskb_expand_head() fails,
unless something magic happens inside it.
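Here is a userspace analogy, not the pktgen code. realloc() has the same contract that matters here: a failed expansion leaves the original buffer allocated and still owned by the caller. `grow_or_drop()` is a hypothetical helper showing the free-on-failure step that avoids the leak.

```c
#include <stdlib.h>

/* Grow a buffer; on failure, free the original instead of losing it.
 * realloc() stands in for pskb_expand_head(): when it fails, the old
 * buffer is untouched and must still be released by the caller. */
char *grow_or_drop(char *buf, size_t new_size)
{
    char *bigger;

    if (new_size == 0) {      /* realloc(p, 0) semantics vary; avoid them */
        free(buf);
        return NULL;
    }

    bigger = realloc(buf, new_size);
    if (bigger == NULL) {
        free(buf);            /* without this, buf would leak */
        return NULL;
    }
    return bigger;
}
```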
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Tue, 14 Oct 2008 01:42:55 +0000 (18:42 -0700)]
mISDN/dsp_cmx.c: fix size checks
The checks for ensuring that the array indices are inside the range
were flipped.
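The commit does not show the exact checks, so this is only a hypothetical illustration of a correctly oriented bounds test. `BUF_LEN`, `index_ok()` and `store_sample()` are invented names, not dsp_cmx.c code. Valid indices are 0 through BUF_LEN - 1. An accept condition written the other way round would reject every valid index and accept every out-of-range one.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_LEN 8 /* hypothetical array length */

/* Accept only indices inside the array. */
static bool index_ok(size_t idx)
{
    return idx < BUF_LEN;
}

/* Store a value if the index is in range; report whether it was stored. */
bool store_sample(int *buf, size_t idx, int value)
{
    if (!index_ok(idx))
        return false;
    buf[idx] = value;
    return true;
}
```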
Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
CC drivers/net/enic/enic_main.o
drivers/net/enic/enic_main.c: In function 'enic_queue_wq_skb_tso':
drivers/net/enic/enic_main.c:576: error: implicit declaration of function 'csum_ipv6_magic'
make[3]: *** [drivers/net/enic/enic_main.o] Error 1
drivers/net/qlge/qlge_main.c: In function 'ql_tso':
drivers/net/qlge/qlge_main.c:1862: error: implicit declaration of function 'csum_ipv6_magic'
make[3]: *** [drivers/net/qlge/qlge_main.o] Error 1
drivers/net/jme.c: In function 'jme_tx_tso':
drivers/net/jme.c:1784: error: implicit declaration of function 'csum_ipv6_magic'
make[2]: *** [drivers/net/jme.o] Error 1
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tao Ma [Thu, 9 Oct 2008 15:06:14 +0000 (23:06 +0800)]
ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().
Following Christoph Hellwig's advice, we really don't need
a ->list handler to list each xattr. A map from index to
xattr prefix is enough. I also refactored the old list method
with reference to fs/xfs/linux-2.6/xfs_xattr.c and the
xattr list method in btrfs.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Fri, 19 Sep 2008 14:17:41 +0000 (22:17 +0800)]
ocfs2: Add empty bucket support in xattr.
As Mark mentioned, removing an empty xattr bucket may be
time-consuming, so this patch lets empty buckets exist
during xattr operations. The modifications are:
1. Remove the bucket and extent record deletion during
xattr delete.
2. In xattr set:
1) Don't clean the last entry, so that if the bucket is empty,
the hash value of the bucket is the hash value of the entry
that was deleted last.
2) During insert, if we meet an empty bucket, just use the
1st entry.
3. In the binary search of xattr buckets, use the bucket hash value (which
is stored in the 1st xattr entry) to find the right place.
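Here is a simplified model of point 3. Each bucket is reduced to the hash stored in its first entry plus a live-entry count. `struct bucket` and `find_bucket()` are hypothetical, not ocfs2's bucket search. An emptied bucket keeps the hash of its last deleted entry, so the buckets stay ordered and the binary search needs no special case for them.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the hash recorded in entry 0 matters for the search.  An empty
 * bucket still carries the hash of the entry deleted last. */
struct bucket {
    uint32_t first_hash; /* hash stored in the 1st xattr entry */
    int count;           /* live entries; 0 means an empty bucket */
};

/* Return the index of the last bucket whose first_hash <= name_hash,
 * or 0 if name_hash is below every bucket's hash.  Requires n >= 1 and
 * buckets sorted by first_hash. */
size_t find_bucket(const struct bucket *b, size_t n, uint32_t name_hash)
{
    size_t lo = 0, hi = n; /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (b[mid].first_hash <= name_hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first bucket whose hash is greater than name_hash. */
    return lo ? lo - 1 : 0;
}
```

Take buckets with hashes 10, 20 (empty) and 30. A name hash of 25 still lands in the empty middle bucket, where point 2.2) would then reuse its first entry.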
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Fri, 19 Sep 2008 14:16:34 +0000 (22:16 +0800)]
ocfs2/xattr.c: Fix a bug when inserting xattr.
During xattr insertion, we use a binary search
to find the right place, and "low" is set to it. But when
there is an xattr with the same name hash as the one being
inserted, low holds the wrong value. So set it to the right position.
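Here is a sketch of the corrected idea on a plain sorted array of hashes. `insert_pos()` is hypothetical; the real code works on xattr entries inside a bucket. A search that breaks out early on an equal hash must still leave `low` at a position that keeps the array sorted.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index at which name_hash can be inserted into hashes[0..n),
 * sorted ascending, so that the array stays sorted.  The equal-hash case
 * is handled explicitly instead of leaving "low" at a stale value. */
size_t insert_pos(const uint32_t *hashes, size_t n, uint32_t name_hash)
{
    size_t low = 0, high = n;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (hashes[mid] < name_hash) {
            low = mid + 1;
        } else if (hashes[mid] > name_hash) {
            high = mid;
        } else {
            /* Same hash already present: insert right after it. */
            low = mid + 1;
            break;
        }
    }
    return low;
}
```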
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 4 Sep 2008 03:03:41 +0000 (20:03 -0700)]
ocfs2: Switch over to JBD2.
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.
It's a pretty trivial change. Most functions are just renamed. The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.
Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem. It can even interact with JBD-based ocfs2 as long
as the journal is formatted for JBD.
We provide a compatibility option so that paranoid people can still use
JBD for the time being. This will go away shortly.
[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
ocfs2_truncate_for_delete(). --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 4 Sep 2008 03:03:40 +0000 (20:03 -0700)]
ocfs2: Add the 'inode64' mount option.
Now that ocfs2 limits inode numbers to 32 bits, add a mount option to
disable the limit. This parallels XFS. 64-bit systems can handle the
larger inode numbers.
[ Added description of inode64 mount option in ocfs2.txt. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 4 Sep 2008 03:03:39 +0000 (20:03 -0700)]
ocfs2: Limit inode allocation to 32bits.
ocfs2 inode numbers are block numbers. For any filesystem with fewer
than 2^32 blocks, this is not a problem. However, when ocfs2 starts
using JBD2, it will be able to support filesystems with more than 2^32
blocks. This would result in inode numbers that don't fit in 32 bits.
The problem is that stat(2) can't handle those numbers on 32-bit
machines. The simple solution is to have ocfs2 allocate all inodes
below that boundary.
The suballoc code is changed to honor an optional block limit. Only the
inode suballocator sets that limit - all other allocations stay unlimited.
The biggest trick is to grow the inode suballocator beneath that limit.
There's no point in allocating block groups that are above the limit,
then rejecting their elements later on. We want to prevent the inode
allocator from ever having block groups above the limit. This involves
a little gyration with the local alloc code. If the local alloc window
is above the limit, it signals the caller to try the global bitmap but
does not disable the local alloc file (which can be used for other
allocations).
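Here is a minimal sketch of the up-front group check. The names are hypothetical, and a block group is simplified to a start block and a length. A limit of 0 means unlimited, which matches the description that only the inode suballocator sets one.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_BLOCK_LIMIT    0ULL
#define INODE_BLOCK_LIMIT (1ULL << 32) /* inode numbers must fit in 32 bits */

/* Hypothetical block group: starts at blkno and spans nblocks blocks.
 * Since an ocfs2 inode number is a block number, any inode allocated
 * from a group that reaches past the limit would overflow 32 bits. */
struct block_group {
    uint64_t blkno;
    uint64_t nblocks;
};

/* Usable if no limit is set, or if the group's last block is still
 * below the limit.  Checking whole groups avoids allocating a group
 * only to reject its elements later. */
bool group_within_limit(const struct block_group *bg, uint64_t max_block)
{
    if (max_block == NO_BLOCK_LIMIT)
        return true;
    return bg->blkno + bg->nblocks <= max_block;
}
```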
[ Minor cleanup - removed an ML_NOTICE comment. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Fri, 29 Aug 2008 01:00:19 +0000 (09:00 +0800)]
ocfs2: Resolve deadlock in ocfs2_xattr_free_block.
In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
have a transaction open. This will deadlock the downconvert thread, so fix
it.
We can clean up how xattr blocks are removed while here: this patch also
moves the mechanism of releasing an xattr block (including its values, the
xattr tree, and the xattr block itself) into this function.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 1 Sep 2008 00:45:18 +0000 (08:45 +0800)]
ocfs2: bug-fix for journal extend in xattr.
In ocfs2_extend_trans(), when we can't extend the current
transaction, it commits the current transaction and restarts
a new one. So if the credits we allocated previously weren't
used (the block wasn't dirtied before our extend), we will not
have enough credits for future operations (JBD will complain
and BUG out). So check for this and re-extend the transaction.
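Here is a toy model of the credit accounting, not the JBD API. `struct handle`, `extend_trans()` and `extend_trans_safe()` are hypothetical. The model assumes a restart keeps only the newly requested credits, so the caller has to re-request the ones it reserved earlier but has not used. For simplicity, the second extend is assumed to succeed in place.

```c
#include <stdbool.h>

/* Credits reserved for the running transaction vs. credits consumed. */
struct handle {
    int reserved;
    int used;
};

/* Extend in place if possible; otherwise "commit and restart" with only
 * the newly requested credits, dropping the old unused ones. */
void extend_trans(struct handle *h, int nblocks, bool can_extend)
{
    if (can_extend) {
        h->reserved += nblocks;
    } else {
        h->reserved = nblocks; /* fresh transaction */
        h->used = 0;
    }
}

/* After extending, make sure the handle still covers what the caller
 * planned to use: the old unused credits plus the new request. */
void extend_trans_safe(struct handle *h, int nblocks, bool can_extend)
{
    int old_unused = h->reserved - h->used;
    int shortfall;

    extend_trans(h, nblocks, can_extend);
    shortfall = old_unused + nblocks - (h->reserved - h->used);
    if (shortfall > 0)
        extend_trans(h, shortfall, true); /* re-extend */
}
```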
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 22 Aug 2008 19:46:09 +0000 (12:46 -0700)]
ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()
The original get/put_extent_tree() functions held a reference on
et_root_bh. However, every single caller already has a safe reference,
making the get/put cycle irrelevant.
We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It
no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is
removed. Callers now have a simpler init+use pattern.
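Here is a sketch of the resulting init+use pattern with hypothetical types. `struct buffer` stands in for a buffer_head, and the two init functions only mirror the naming scheme described above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the on-disk structures. */
struct buffer { char data[64]; };

struct extent_tree {
    struct buffer *root_bh; /* borrowed: the caller already holds a ref */
    void *object;           /* object containing the extent list */
};

/* init, not get: no reference is taken, so there is nothing to put. */
void init_dinode_extent_tree(struct extent_tree *et, struct buffer *bh)
{
    et->root_bh = bh;
    et->object = bh->data;   /* for an inode, the object is the dinode */
}

void init_xattr_value_extent_tree(struct extent_tree *et,
                                  struct buffer *bh, void *value_root)
{
    et->root_bh = bh;
    et->object = value_root; /* an xattr value lives somewhere in bh */
}
```

A caller declares `struct extent_tree et;` on the stack, initializes it, and passes `&et` to tree operations. Since no reference was taken, there is no release call to forget.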
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2. Describing what those methods do is probably a
good idea.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 21 Aug 2008 02:36:33 +0000 (19:36 -0700)]
ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.
We now have three different kinds of extent trees in ocfs2: inode data
(dinode), extended attributes (xattr_tree), and extended attribute
values (xattr_value). There is a nice abstraction for them,
ocfs2_extent_tree, but it is hidden in alloc.c. All the calling
functions have to pick amongst a varied API and pass in type bits and
often extraneous pointers.
A better way is to make ocfs2_extent_tree a first-class object.
Everyone converts their object to an ocfs2_extent_tree via the
ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
tree calls to alloc.c.
This simplifies a lot of callers, improving readability. It also
provides an easy way to add additional extent tree types, as they only
need to be defined in alloc.c with an ocfs2_get_<new>_extent_tree()
function.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 21 Aug 2008 01:32:45 +0000 (18:32 -0700)]
ocfs2: Add an insertion check to ocfs2_extent_tree_operations.
A couple of places check an extent_tree for a valid inode. We move that
check out into a new eo_insert_check() operation. It can be called from
ocfs2_insert_extent() and elsewhere.
We also make the wrapper calls ocfs2_et_insert_check() and
ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to
provide useless operations for xattr types.
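Here is a minimal sketch of an operations table whose optional hooks may be NULL. The wrappers treat a missing hook as success. Struct and function names are hypothetical and only loosely follow the eo_/et_ naming in this log.

```c
#include <stddef.h>

struct extent_tree;

/* Operations table; any hook may be NULL. */
struct extent_tree_operations {
    int (*eo_insert_check)(struct extent_tree *et, unsigned int cpos);
    int (*eo_sanity_check)(struct extent_tree *et);
};

struct extent_tree {
    const struct extent_tree_operations *et_ops;
};

/* Wrappers: a missing hook means "nothing to check", so tree types
 * without such a check need not supply a dummy function. */
int et_insert_check(struct extent_tree *et, unsigned int cpos)
{
    if (et->et_ops->eo_insert_check)
        return et->et_ops->eo_insert_check(et, cpos);
    return 0;
}

int et_sanity_check(struct extent_tree *et)
{
    if (et->et_ops->eo_sanity_check)
        return et->et_ops->eo_sanity_check(et);
    return 0;
}

/* Example hook for an inode-backed tree: reject a hypothetical bad cpos. */
static int dinode_insert_check(struct extent_tree *et, unsigned int cpos)
{
    (void)et;
    return cpos == 0xffffffffu ? -1 : 0;
}

const struct extent_tree_operations dinode_ops = {
    .eo_insert_check = dinode_insert_check,
};

const struct extent_tree_operations xattr_ops = { NULL, NULL }; /* no hooks */
```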
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 21 Aug 2008 00:44:24 +0000 (17:44 -0700)]
ocfs2: Create specific get_extent_tree functions.
A caller knows what kind of extent tree they have. There's no reason
they have to call ocfs2_get_extent_tree() with a NULL when they could
just as easily call a specific function to their type of extent tree.
Introduce ocfs2_dinode_get_extent_tree(),
ocfs2_xattr_tree_get_extent_tree(), and
ocfs2_xattr_value_get_extent_tree(). They only take the necessary
arguments, calling into the underlying __ocfs2_get_extent_tree() to do
the real work.
__ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
without needing any switch-by-type logic.
ocfs2_get_extent_tree() is now a wrapper around the specific calls. It
exists because a couple of alloc.c functions can take et_type. This will
go away later.
Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
struct ocfs2_xattr_value_root* instead of void*. This gives us
typechecking where we didn't have it before.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Thu, 21 Aug 2008 00:09:42 +0000 (17:09 -0700)]
ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents().
ocfs2_num_free_extents() re-implements the logic of
ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not
allocate, let's use it in ocfs2_num_free_extents() to simplify the code.
The inode validation code in ocfs2_num_free_extents() is not needed.
All callers are passing in pre-validated inodes.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Aug 2008 23:57:27 +0000 (16:57 -0700)]
ocfs2: Make 'private' into 'object' on ocfs2_extent_tree.
The 'private' pointer was a way to store off xattr values, which don't
live at a set place in the bh. But the concept of "the object
containing the extent tree" is much more generic. For an inode it's the
struct ocfs2_dinode; for an xattr value it's the value. Let's save off
the 'object' at all times. If NULL is passed to
ocfs2_get_extent_tree(), 'object' is set to bh->b_data.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Aug 2008 23:48:35 +0000 (16:48 -0700)]
ocfs2: Make ocfs2_extent_tree get/put instead of alloc.
Rather than allocating a struct ocfs2_extent_tree, just put it on the
stack. Fill it with ocfs2_get_extent_tree() and drop it with
ocfs2_put_extent_tree(). Now callers don't have to handle ENOMEM, yet
they still safely hold a reference on root_bh.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Aug 2008 23:25:06 +0000 (16:25 -0700)]
ocfs2: Prefix the extent tree operations structure.
The ocfs2_extent_tree_operations structure gains a field prefix on its
members. The ->eo_sanity_check() operation gains a wrapper function for
completeness. All of the extent tree operation wrappers gain a
consistent name (ocfs2_et_*()).
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Tue, 19 Aug 2008 17:54:29 +0000 (10:54 -0700)]
ocfs2: fix printk format warnings
This patch fixes the following build warnings:
fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket':
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket':
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
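One way to silence warnings of this kind is to use the length modifier that matches each argument: `%ld` for long and `%zu` for size_t. The helper and message below are hypothetical, and the actual patch may have used casts instead. The kernel's printk() accepts the same `l` and `z` modifiers.

```c
#include <stddef.h>
#include <stdio.h>

/* Format specifiers matched to the argument types: long -> %ld,
 * size_t -> %zu.  Using %d for either triggers the warnings above. */
int format_bucket_msg(char *out, size_t len, long blkno, long offset,
                      size_t value_len)
{
    return snprintf(out, len, "blk %ld off %ld len %zu",
                    blkno, offset, value_len);
}
```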
Tiger Yang [Mon, 18 Aug 2008 09:11:46 +0000 (17:11 +0800)]
ocfs2: Add incompatible flag for extended attribute
This patch adds the s_incompat flag for extended attribute support. This
helps us ensure that older versions of ocfs2 or ocfs2-tools will not be able
to mount a volume with xattr support.
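Here is a sketch of the usual incompat-flag rule, using hypothetical bit values; the real constants are defined in ocfs2's on-disk format header. A build refuses to mount a volume if any on-disk incompat bit is missing from its supported mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, not the real ocfs2 constants. */
#define INCOMPAT_OLD_FEATURES 0x0007u
#define INCOMPAT_XATTR        0x0200u

/* Mount only if every incompat bit set on disk is understood.  An older
 * build without INCOMPAT_XATTR in its mask refuses an xattr volume. */
bool can_mount(uint32_t on_disk_incompat, uint32_t supported_incompat)
{
    return (on_disk_incompat & ~supported_incompat) == 0;
}
```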
Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:53 +0000 (17:38 +0800)]
ocfs2: Enable xattr set in index btree
While the previous patches added the ability to list/get xattrs in buckets
for ocfs2, this patch enables ocfs2 to store large numbers of EAs.
The original design doc was written by Mark Fasheh, and it can be found at
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to
make small modifications to it.
First, because the bucket size is 4K, a new field named xh_free_start is added
in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket.
It is used when we store new EA name/value. With this field, we can find the
place more quickly and what's more, we don't need to sort the name/value every
time to let the last entry indicate the next unused space. This makes the
insert operation more efficient for blocksizes smaller than 4k.
Because of the new xh_free_start, another field named xh_name_value_len is
also added in ocfs2_xattr_header. It records the total length of all the
name/values in the bucket. We need this so that we can check it and defragment
the bucket if there is not enough contiguous free space.
An xattr insertion looks like this:
1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA.
2. Check whether there is enough space in bucketA. If yes, insert it directly
and modify xh_free_start and xh_name_value_len accordingly. If not, check
xh_name_value_len to see whether we can store this by defragmenting the bucket.
If yes, defragment it and go on with the insertion.
3. If defragmentation doesn't work, check whether there is a new empty bucket in
the clusters within this extent record. If yes, init the new bucket and move
all the buckets after bucketA one by one to the next bucket. Move half of the
entries in bucketA to the next bucket and go on with the insertion.
4. If there is no new bucket, grow the extent tree.
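Steps 2 and 3 amount to comparing the space needed against two different measures of free space. This is a simplified model. `entries_end` is a hypothetical field, and `free_start` and `name_value_len` only mirror the roles of xh_free_start and xh_name_value_len described above.

```c
#include <stddef.h>

#define BUCKET_SIZE 4096 /* xattr buckets are 4K */

enum insert_action { INSERT_DIRECT, DEFRAG_THEN_INSERT, NEED_NEW_BUCKET };

/* Entries grow up from the start of the bucket; name/value pairs are
 * packed down from the end.  Assumes entries_end <= free_start. */
struct bucket_hdr {
    size_t entries_end;    /* offset just past header + entry array */
    size_t free_start;     /* lowest used name/value offset */
    size_t name_value_len; /* bytes of live name/value data */
};

/* Decide how to insert an entry needing entry_size bytes in the entry
 * array plus nv_size bytes of name/value data. */
enum insert_action choose_action(const struct bucket_hdr *h,
                                 size_t entry_size, size_t nv_size)
{
    size_t need = entry_size + nv_size;
    size_t contiguous = h->free_start - h->entries_end;
    size_t total_free = BUCKET_SIZE - h->entries_end - h->name_value_len;

    if (contiguous >= need)
        return INSERT_DIRECT;      /* fits in the gap right now */
    if (total_free >= need)
        return DEFRAG_THEN_INSERT; /* compact name/values, then insert */
    return NEED_NEW_BUCKET;        /* split into a new bucket or grow */
}
```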
As for xattr deletion, we will delete an xattr bucket when all its xattrs
are removed and move all the buckets after it to the previous one. When all
the xattr buckets in an extent record are freed, free this extent record
from ocfs2_xattr_tree.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:52 +0000 (17:38 +0800)]
ocfs2: Optionally limit extent size in ocfs2_insert_extent()
For xattr buckets, we want to limit the maximum size of a btree leaf,
otherwise we'll lose the benefits of hashing because we'll have to search
large leaves.
So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
record even if it is contiguous with an adjacent record.
Other btree types are not affected by this change.
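Here is a sketch of the merge rule under simplifying assumptions. Records carry only a logical start and a length. `max_leaf_clusters` is a hypothetical name for the new limit, and 0 stands for "no limit", the behaviour other tree types keep. Real ocfs2 merging also checks physical contiguity, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal extent record: logical start cluster and length in clusters. */
struct extent_rec {
    uint32_t cpos;
    uint32_t clusters;
};

/* Merge contiguous neighbours only while the result stays within the
 * limit, so hashed lookups never have to scan a huge leaf. */
bool can_merge(const struct extent_rec *left, const struct extent_rec *right,
               uint32_t max_leaf_clusters)
{
    if (left->cpos + left->clusters != right->cpos)
        return false;                 /* not logically contiguous */
    if (max_leaf_clusters == 0)
        return true;                  /* unlimited, as for other trees */
    return left->clusters + right->clusters <= max_leaf_clusters;
}
```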
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:51 +0000 (17:38 +0800)]
ocfs2: Add xattr lookup code xattr btrees
Add code to look up a given extended attribute in the xattr btree. Lookup
follows this general scheme:
1. Use ocfs2_xattr_get_rec to find the xattr extent record.
2. Find the xattr bucket within the extent which may contain this xattr.
3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need
to recalculate the block offset and name offset to find the right position of
the name/value.
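Steps 2 and 3 can be sketched as a binary search on hash values followed by a scan of one bucket. This is an in-memory model under stated assumptions, not ocfs2_xattr_block_get(): buckets within one extent record are assumed non-empty and ordered by the hash of their first entry, entries inside a bucket are assumed sorted by hash, and step 1 (finding the extent record) is omitted. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xe {
	uint32_t hash;
	const char *name;
};

struct bucket {
	const struct xe *entries; /* sorted by hash */
	size_t count;             /* assumed > 0 for every bucket */
};

/* Step 2: last bucket whose first hash <= name_hash. */
static size_t find_bucket(const struct bucket *b, size_t nr, uint32_t name_hash)
{
	size_t lo = 0, hi = nr; /* answer lies in [lo, hi) */

	assert(nr > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (b[mid].entries[0].hash <= name_hash)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}

/* Step 3: scan the chosen bucket for a matching hash and name. */
static const struct xe *xattr_lookup(const struct bucket *b, size_t nr,
				     uint32_t name_hash, const char *name)
{
	const struct bucket *bk;

	if (nr == 0)
		return NULL;
	bk = &b[find_bucket(b, nr, name_hash)];
	for (size_t i = 0; i < bk->count; i++) {
		if (bk->entries[i].hash > name_hash)
			break; /* sorted: no later entry can match */
		if (bk->entries[i].hash == name_hash &&
		    strcmp(bk->entries[i].name, name) == 0)
			return &bk->entries[i];
	}
	return NULL;
}
```

Comparing the name after the hash matters because different names can share a hash value.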
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:50 +0000 (17:38 +0800)]
ocfs2: Add xattr bucket iteration for large numbers of EAs
Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets.
Attributes are stored within a given bucket, depending on hash value.
After a discussion with Mark, we decided that the per-bucket index
(xe_entry[]) would only exist in the first block of a bucket. Likewise,
name/value pairs will not straddle more than one block. This allows the
majority of operations to work directly on the buffer heads in a leaf block.
This patch adds code to iterate over the buckets of an EA index tree. A new
abstraction, ocfs2_xattr_bucket, is added. It records the bhs in this bucket
and its ocfs2_xattr_header. This keeps the code neat and improves readability.
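A sketch of iterating over the buckets of one extent, assuming 4 KB buckets, a block size between 512 bytes and 4 KB, and a cluster size of at least 4 KB. struct bucket_view is a hypothetical stand-in that keeps block numbers where the real ocfs2_xattr_bucket keeps buffer heads.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_BYTES      4096u /* bucket size, per the text */
#define MAX_BUCKET_BLOCKS 8u    /* 4096 / 512 */

struct bucket_view {
	uint64_t blknos[MAX_BUCKET_BLOCKS];
	unsigned int nr_blocks;
};

typedef int (*bucket_fn)(const struct bucket_view *bucket, void *priv);

/* Call fn() for each bucket of an extent; stop early on a nonzero return. */
static int iterate_buckets(uint64_t start_blk, uint32_t clusters,
			   unsigned int cluster_bytes, unsigned int block_bytes,
			   bucket_fn fn, void *priv)
{
	unsigned int blocks_per_bucket;
	uint64_t nr_buckets;

	assert(block_bytes >= 512 && block_bytes <= BUCKET_BYTES);
	assert(cluster_bytes >= BUCKET_BYTES);

	blocks_per_bucket = BUCKET_BYTES / block_bytes;
	nr_buckets = (uint64_t)clusters * (cluster_bytes / BUCKET_BYTES);

	for (uint64_t i = 0; i < nr_buckets; i++) {
		struct bucket_view b = { .nr_blocks = blocks_per_bucket };
		int ret;

		for (unsigned int j = 0; j < blocks_per_bucket; j++)
			b.blknos[j] = start_blk + i * blocks_per_bucket + j;
		ret = fn(&b, priv);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count buckets and note where the last index lives. */
struct bucket_stats {
	uint64_t count;
	uint64_t last_index_blk; /* xe_entry[] is only in a bucket's first block */
};

static int collect_stats(const struct bucket_view *b, void *priv)
{
	struct bucket_stats *s = priv;

	s->count++;
	s->last_index_blk = b->blknos[0];
	return 0;
}
```

Because the index lives only in the first block, a callback that only inspects entries can read blknos[0] and ignore the rest of the bucket.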
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:49 +0000 (17:38 +0800)]
ocfs2: Add xattr index tree operations
When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
store large numbers of EAs. This patch adds a new type to
ocfs2_extent_tree_type and adds the implementation so that we can re-use the
b-tree code to handle the storage of many EAs.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tiger Yang [Mon, 18 Aug 2008 09:11:00 +0000 (17:11 +0800)]
ocfs2: Add extended attribute support
This patch implements storing extended attributes either in the inode or in a
single external block. We only store EAs in-inode when blocksize > 512 or that
inode block has free space for them. When an EA's value is larger than 80
bytes, we store the value via a b-tree outside the inode or block.
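A sketch of the placement rules above, not the patch's code. The 80-byte threshold comes from the text; the per-entry overhead and out-of-line root size are made-up constants, and the in-inode eligibility rule (including the blocksize condition) is assumed to be folded by the caller into inode_free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INLINE_VALUE_MAX 80 /* values larger than this go out of line */
#define ENTRY_OVERHEAD   16 /* hypothetical per-entry header size */
#define VALUE_ROOT_SIZE  16 /* hypothetical size of an out-of-line value root */

enum ea_home { EA_IN_INODE, EA_IN_BLOCK };

struct ea_placement {
	enum ea_home home;   /* where the entry and name live */
	bool value_in_btree; /* value stored via a b-tree outside */
	size_t local_bytes;  /* bytes the EA takes in its home */
};

/* inode_free: free in-inode EA space; 0 where in-inode EAs are not allowed. */
static struct ea_placement place_ea(size_t name_len, size_t value_len,
				    size_t inode_free)
{
	struct ea_placement p;

	assert(name_len > 0);
	p.value_in_btree = value_len > INLINE_VALUE_MAX;
	p.local_bytes = ENTRY_OVERHEAD + name_len +
			(p.value_in_btree ? VALUE_ROOT_SIZE : value_len);
	p.home = p.local_bytes <= inode_free ? EA_IN_INODE : EA_IN_BLOCK;
	return p;
}
```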
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tiger Yang [Mon, 18 Aug 2008 09:08:55 +0000 (17:08 +0800)]
ocfs2: reserve inline space for extended attribute
Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:48 +0000 (17:38 +0800)]
ocfs2: Add extent tree operation for xattr value btrees
Add some thin wrappers around ocfs2_insert_extent() for each of the 3
different btree types: ocfs2_inode_insert_extent(),
ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
last is for the xattr index btree, which will be used in a follow-up patch.
All the old callers in file.c etc. will call ocfs2_dinode_insert_extent(),
while the other two handle the xattr cases. Extent tree initialization is
also handled by these functions.
When storing an xattr value which is too large, we allocate some clusters
for it, and ocfs2_extent_list and ocfs2_extent_rec are used here as well. In
order to re-use the b-tree operation code, a new parameter named "private"
is added to ocfs2_extent_tree; it is used to indicate the root
ocfs2_extent_list. The reason is that we can no longer deduce the root from the
buffer_head. It may be in an inode, in an ocfs2_xattr_block or, even worse,
anywhere in an ocfs2_xattr_bucket.
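A minimal sketch of why the extra root pointer is needed, using hypothetical simplified structs rather than the real on-disk definitions: when a value root sits inside a bucket buffer, the buffer to journal and the extent list to edit are at different addresses, so the tree has to carry both.

```c
#include <assert.h>
#include <stdint.h>

struct extent_list {
	uint16_t l_count;
	uint16_t l_next_free_rec;
};

struct xattr_value_root {
	uint32_t xr_clusters;
	uint64_t xr_last_eb_blk;
	struct extent_list xr_list;
};

/* A bucket buffer: the value root sits somewhere after the entry headers. */
struct bucket_buf {
	uint8_t header_and_entries[64];
	struct xattr_value_root value_root;
};

/* Stand-in for ocfs2_extent_tree with the new "private" field. */
struct extent_tree {
	void *root_bh;               /* buffer that must be journaled */
	void *private;               /* object that actually holds the root */
	struct extent_list *root_el; /* the root extent list itself */
};

static void init_xattr_value_et(struct extent_tree *et, void *bh,
				struct xattr_value_root *xv)
{
	assert(et && bh && xv);
	et->root_bh = bh;
	et->private = xv;
	et->root_el = &xv->xr_list; /* not derivable from bh alone */
}
```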
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (66 commits)
ata: Add documentation for hard disk shock protection interface (v3)
ide: Implement disk shock protection support (v4)
ide-cd: fix printk format warning
piix: add Hercules EC-900 mini-notebook to ich_laptop short cable list
ide-atapi: assign taskfile flags per device type
ide-cd: move cdrom_info.dma to ide_drive_t.dma
ide: add ide_drive_t.dma flag
ide-cd: add a debug_mask module parameter
ide-cd: convert driver to new ide debugging macro (v3)
ide: move SFF DMA code to ide-dma-sff.c
ide: cleanup ide-dma.c
ide: cleanup ide_build_dmatable()
ide: remove needless includes from ide-dma.c
ide: switch to DMA-mapping API part #2
ide: make ide_dma_timeout() available also for CONFIG_BLK_DEV_IDEDMA_SFF=n
ide: make ide_dma_lost_irq() available also for CONFIG_BLK_DEV_IDEDMA_SFF=n
ide: __ide_dma_end() -> ide_dma_end()
pmac: remove needless pmac_ide_destroy_dmatable() wrapper
pmac: remove superfluous pmif == NULL checks
ide: Two fixes regarding memory allocation
...
Linus Torvalds [Mon, 13 Oct 2008 21:03:59 +0000 (14:03 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (313 commits)
V4L/DVB (9186): Added support for Prof 7300 DVB-S/S2 cards
V4L/DVB (9185): S2API: Ensure we have a reasonable ROLLOFF default
V4L/DVB (9184): cx24116: Change the default SNR units back to percentage by default.
V4L/DVB (9183): S2API: Return error if the caller provides 0 commands.
V4L/DVB (9182): S2API: Added support for DTV_HIERARCHY
V4L/DVB (9181): S2API: Add support for DTV_GUARD_INTERVAL and DTV_TRANSMISSION_MODE
V4L/DVB (9180): S2API: Added support for DTV_CODE_RATE_HP/LP
V4L/DVB (9179): S2API: frontend.h cleanup
V4L/DVB (9178): cx24116: Add module parameter to return SNR as ESNO.
V4L/DVB (9177): S2API: Change _8PSK / _16APSK to PSK_8 and APSK_16
V4L/DVB (9176): Add support for DvbWorld USB cards with STV0288 demodulator.
V4L/DVB (9175): Remove NULL pointer in stb6000 driver.
V4L/DVB (9174): Allow custom inittab for ST STV0288 demodulator.
V4L/DVB (9173): S2API: Remove the hardcoded command limit during validation
V4L/DVB (9172): S2API: Bugfix related to DVB-S / DVB-S2 tuning for the legacy API.
V4L/DVB (9171): S2API: Stop an OOPS if illegal commands are dumped in S2API.
V4L/DVB (9170): cx24116: Sanity checking to data input via S2API to the cx24116 demod.
V4L/DVB (9169): uvcvideo: Support two new Bison Electronics webcams.
V4L/DVB (9168): Add support for MSI TV@nywhere Plus remote
V4L/DVB: v4l2-dev: remove duplicated #include
...
Tao Ma [Mon, 18 Aug 2008 09:38:47 +0000 (17:38 +0800)]
ocfs2: Add helper function in uptodate.c for removing xattr clusters
The old uptodate code only handles removing one buffer_head from an
ocfs2 inode's buffer cache. With xattr clusters, we may need to remove
multiple buffer_heads at a time.
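A sketch of removing a whole range of blocks in one pass, assuming a hypothetical cache modeled as an array of block numbers (the real uptodate.c cache may be organized differently). For an xattr cluster range, start would be the first block and count the number of clusters times blocks per cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 64

/* Hypothetical model of an inode's metadata cache. */
struct block_cache {
	uint64_t blocks[CACHE_MAX];
	size_t nr;
};

/* Drop every cached block in [start, start + count) in one pass,
 * keeping the remaining blocks in their original order. */
static size_t cache_remove_range(struct block_cache *c, uint64_t start,
				 uint64_t count)
{
	size_t out = 0, removed = 0;

	assert(c->nr <= CACHE_MAX);
	for (size_t i = 0; i < c->nr; i++) {
		uint64_t b = c->blocks[i];

		if (b >= start && b - start < count) {
			removed++;
			continue;
		}
		c->blocks[out++] = b;
	}
	c->nr = out;
	return removed;
}
```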
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:46 +0000 (17:38 +0800)]
ocfs2: Add the basic xattr disk layout in ocfs2_fs.h
Ocfs2 uses a very flexible structure for storing extended attributes on
disk. Small numbers of attributes are stored directly in the inode block (up
to 256 bytes' worth). If that fills up, attributes are also stored in an
external block, linked to from the inode block. That block can in turn
expand to a btree, capable of storing large numbers of attributes.
Individual attribute values are stored inline if they're small enough
(currently about 80 bytes, though this can be changed), and are otherwise
expanded to a btree. The theoretical limit on the size of an individual
attribute is about the same as that of an inode, though the kernel's upper
bound on the size of an attribute's data is far smaller.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:45 +0000 (17:38 +0800)]
ocfs2: Make high level btree extend code generic
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation() now calls
ocfs2_do_cluster_allocation(), but the latter can be used for other
btree types as well.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:44 +0000 (17:38 +0800)]
ocfs2: Abstract ocfs2_extent_tree in b-tree operations.
The old extent tree operations assume that the ocfs2_extent_list in
ocfs2_dinode is the tree root. Since xattrs will also use ocfs2_extent_list
to store large values for an xattr entry, we refactor the tree operations
so that the xattr code can use them directly.
The refactoring includes 4 steps:
1. Abstract set/get of last_eb_blk and update_clusters, since they may
be stored in different locations for dinode and xattr.
2. Add a new structure named ocfs2_extent_tree to indicate the
extent tree the operation will work on.
3. Remove all uses of fe_bh and di; use root_bh and root_el in the
extent tree instead. All uses of fe_bh are now replaced with
et->root_bh, and el with root_el, accordingly.
4. Make ocfs2_lock_allocators generic. Currently it is limited to file
extend allocation, but the whole function is also useful when we want
to store large EAs.
Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
for anything other than truncating inode data btrees.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:43 +0000 (17:38 +0800)]
ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 18 Aug 2008 09:38:42 +0000 (17:38 +0800)]
ocfs2: Modify ocfs2_num_free_extents for future xattr usage.
ocfs2_num_free_extents() is used to find the number of free extent records
in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
use this for extended attribute trees in the future, so genericize the
interface to take a buffer head. A future patch will allow that buffer_head
to contain any structure rooting an ocfs2 btree.
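A sketch of the computation such a generic interface performs, with a simplified ocfs2_extent_list: the free count depends only on the extent list, not on the structure (inode, xattr block or bucket) that contains the root. The rightmost-leaf parameter and the -1 error return are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified ocfs2_extent_list header fields. */
struct extent_list {
	uint16_t l_tree_depth;    /* 0 means the records are leaves */
	uint16_t l_count;         /* capacity of the record array */
	uint16_t l_next_free_rec; /* records in use */
};

/* If the tree has depth, the caller passes the rightmost leaf. */
static int num_free_extents(const struct extent_list *root,
			    const struct extent_list *last_leaf)
{
	const struct extent_list *el;

	assert(root != NULL);
	el = root->l_tree_depth ? last_leaf : root;
	if (el == NULL || el->l_next_free_rec > el->l_count)
		return -1; /* missing leaf or corrupt list */
	return el->l_count - el->l_next_free_rec;
}
```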
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Wed, 30 Jul 2008 01:29:18 +0000 (18:29 -0700)]
ocfs2: track local alloc state via debugfs
A per-mount debugfs file, "local_alloc", is created which, when read,
exposes the live state of the node's local alloc file. Performance impact is
minimal, only a bit of memory overhead per mount point. Still, the code is
hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
local alloc performance problems on a live system.
Mark Fasheh [Tue, 29 Jul 2008 01:02:53 +0000 (18:02 -0700)]
ocfs2: throttle back local alloc when low on disk space
Ocfs2's local allocator disables itself for the duration of a mount point
when it has trouble allocating a large enough area from the primary bitmap.
That can cause performance problems, especially for disks which were only
temporarily full or fragmented. This patch allows the allocator to
shrink its window first, before being disabled. Later, it can also be
re-enabled so that any performance drop is minimized.
To do this, we allow the value of osb->local_alloc_bits to be shrunk when
needed. The default value is recorded in a mostly read-only variable so that
we can re-initialize when required.
Locking had to be updated so that we could protect changes to
local_alloc_bits. Mostly this involves protecting various local alloc values
with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
is used when the local allocator has shrunk but is not disabled. If the
available space dips below 1 megabyte, the local alloc file is disabled. In
either case, local alloc is re-enabled 30 seconds after the event, or when
an appropriate number of bits is seen in the primary bitmap.
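A sketch of the throttle, disable and re-enable cycle described above, with locking omitted. The halving policy is an assumption, as is reading "available space" as the size of the shrunken window; the 1 megabyte floor and 30 second delay come from the text. All names are hypothetical apart from the idea behind OCFS2_LA_THROTTLED.

```c
#include <assert.h>
#include <stdint.h>

#define LA_MIN_BYTES  (1u << 20) /* 1 megabyte */
#define LA_RETRY_SECS 30u

enum la_state { LA_ENABLED, LA_THROTTLED, LA_DISABLED };

struct la_ctx {
	enum la_state state;
	uint32_t default_bits; /* mostly read-only default window */
	uint32_t bits;         /* current window; may shrink */
	uint32_t cluster_bytes;
	uint64_t reenable_at;  /* seconds */
};

/* Called when the primary bitmap could not supply a window of `bits`. */
static void la_on_alloc_failure(struct la_ctx *la, uint64_t now)
{
	uint32_t half = la->bits / 2; /* assumed shrink policy: halve */

	assert(la->cluster_bytes > 0);
	if ((uint64_t)half * la->cluster_bytes < LA_MIN_BYTES) {
		la->state = LA_DISABLED;
		la->bits = 0;
	} else {
		la->state = LA_THROTTLED;
		la->bits = half;
	}
	la->reenable_at = now + LA_RETRY_SECS;
}

/* Restore the default window after the delay, or early if the primary
 * bitmap shows at least a default window's worth of free bits. */
static void la_maybe_reenable(struct la_ctx *la, uint64_t now,
			      uint32_t free_bits_seen)
{
	if (la->state == LA_ENABLED)
		return;
	if (now >= la->reenable_at || free_bits_seen >= la->default_bits) {
		la->state = LA_ENABLED;
		la->bits = la->default_bits;
	}
}
```

Keeping default_bits separate from bits is what lets the window be restored after shrinking, matching the "mostly read-only variable" in the text.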
Mark Fasheh [Mon, 28 Jul 2008 21:55:20 +0000 (14:55 -0700)]
ocfs2: Track local alloc bits internally
Do this instead of tracking absolute local alloc size. This avoids
needless re-calculation of bits from bytes in localalloc.c. Additionally,
the value is now in a more natural unit for internal file system bitmap
work.
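Since each local alloc bit stands for one cluster, converting between bytes and bits is a shift by log2 of the cluster size. These helpers are hypothetical, shown only to illustrate the unit change; with the window kept in bits, the conversion is needed only at the edges.

```c
#include <assert.h>
#include <stdint.h>

/* One local alloc bit per cluster; cluster_bits = log2(cluster size). */
static inline uint32_t la_bytes_to_bits(uint64_t bytes, unsigned int cluster_bits)
{
	assert(cluster_bits < 64);
	return (uint32_t)(bytes >> cluster_bits); /* partial clusters drop */
}

static inline uint64_t la_bits_to_bytes(uint32_t bits, unsigned int cluster_bits)
{
	assert(cluster_bits < 64);
	return (uint64_t)bits << cluster_bits;
}
```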