Herbert Xu [Thu, 8 Jan 2009 18:40:57 +0000 (10:40 -0800)]
ipv6: Add GRO support
This patch adds GRO support for IPv6. IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: David Vrabel <david.vrabel@csr.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to net_device_ops and internal net_device_stats
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to net_device_ops and use internal net_device_stats in bnep
device.
Note: no need for bnep_net_ioctl since if ioctl is not set, then
dev_ifsioc handles it by returning -EOPNOTSUPP
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Use the network_device_stats field in network_device.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Li Yang [Tue, 6 Jan 2009 22:08:10 +0000 (14:08 -0800)]
gianfar: ensure ECNTRL[R100] is cleared on link state change
When changing the link between 100Mbps and 1Gbps in SGMII mode it was
found out that the link would stop working. The issue is that ECNTRL[R100]
needs to be cleared when in 1Gbps mode. Older reference manuals didn't
require the explicitly clearing but has since been found it that it is
needed.
Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Tue, 6 Jan 2009 19:07:54 +0000 (11:07 -0800)]
can: omit unneeded skb_clone() calls
The AF_CAN core delivered always cloned sk_buffs to the AF_CAN
protocols, although this was _only_ needed by the can-raw protocol.
With this (additionally documented) change, the AF_CAN core calls the
callback functions of the registered AF_CAN protocols with the original
(uncloned) sk_buff pointer and let's the can-raw protocol do the
skb_clone() itself which omits all unneeded skb_clone() calls for other
AF_CAN protocols.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Fengguang [Tue, 6 Jan 2009 18:56:07 +0000 (10:56 -0800)]
dm9601: bring datasheet URL up to date
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Fengguang [Tue, 6 Jan 2009 18:55:10 +0000 (10:55 -0800)]
dm9601: handle corrupt mac address
Some cheap devices ship with dangling EEPROM pins!
They always return invalid address ff:ff:ff:ff:ff:ff.
Inherit the auto-generated address in this case,
so that these products can work with zero configuration.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 6 Jan 2009 18:50:09 +0000 (10:50 -0800)]
vlan: Add GRO interfaces
This patch adds GRO interfaces for hardware-assisted VLAN reception.
With this in place we're now at parity with LRO as far as the
interface is concerned. That is, you can now take any LRO driver
and convert it over to GRO.
As the CB memory clashes with GRO's use of CB, I've removed it
entirely by storing dev in skb->dev. This is OK because VLAN
gets called first thing in netif_receive_skb and skb->dev is
not used in between us calling netif_rx and netif_receive_skb
getting called.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 6 Jan 2009 18:49:34 +0000 (10:49 -0800)]
gro: Add internal interfaces for VLAN
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags. These interfaces are for
device drivers.
This patch rearranges things to provide a new set of interfaces
for VLANs. These interfaces are for internal use only. The
VLAN code itself can then provide a set of entry points for
device drivers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
These variables are only used with an interface that just dumps their
values into registers to be passed to the hypervisor. The arguments
to that interface are declared to be "unsigned long", so make these
variables match. The macros are only used with these variables, so make
them match as well.
This code is currently only built for 64bit powerpc, so the transformation
is really a noop. If the interface was ever ported to 32 bit, it would
almost certainly still use registers to pass the parameters and so
"unsigned long" would still be appropriate.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
ehea_plpar_hcall9() takes an "unsigned long" array to return its results,
so change the arrays we pass to it to match. This is currently only
64 bit code, so the transformation is actually a noop, but because
ehea_plpar_hcall9() copies the values of registers into the array,
if this was ported to a 32 bit hypervisor interface "unsigned long"
would probably still be the correct type.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Convert IRDA drivers to use already existing net_device_stats structure
in network device. This is a pre-cursor to conversion to net_device
ops. Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm snapshot: split out exception store implementations
Move the existing snapshot exception store implementations out into
separate files. Later patches will place these behind a new
interface in preparation for alternative implementations.
dm snapshot: separate out exception store interface
Pull structures that bridge the gap between snapshot and
exception store out of dm-snap.h and put them in a new
.h file - dm-exception-store.h. This file will define the
API for new exception stores.
Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
should be using it.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The same workqueue is used both for sending uevents and processing queued I/O.
Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
for the queued I/O to be processed. Use scheduled_work() for the asynchronous
uevents instead.
Milan Broz [Tue, 6 Jan 2009 03:05:12 +0000 (03:05 +0000)]
dm: add name and uuid to sysfs
Implement simple read-only sysfs entry for device-mapper block device.
This patch adds a simple sysfs directory named "dm" under block device
properties and implements
- name attribute (string containing mapped device name)
- uuid attribute (string containing UUID, or empty string if not set)
The kobject is embedded in mapped_device struct, so no additional
memory allocation is needed for initializing sysfs entry.
During the processing of sysfs attribute we need to lock mapped device
which is done by a new function dm_get_from_kobj, which returns the md
associated with kobject and increases the usage count.
Each 'show attribute' function is responsible for its own locking.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 6 Jan 2009 03:05:10 +0000 (03:05 +0000)]
dm table: rework reference counting
Rework table reference counting.
The existing code uses a reference counter. When the last reference is
dropped and the counter reaches zero, the table destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.
This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
same process.
* stale reference from some other kernel code keeps the table
constructed, which keeps some devices open, even after successful
return from "dmsetup remove". This can confuse lvm and prevent closing
of underlying devices or reusing device minor numbers.
The patch changes reference counting so that the table destructor can be
called only at predetermined places.
The table has always exactly one reference from either mapped_device->map
or hash_cell->new_map. After this patch, this reference is not counted
in table->holders. A pair of dm_create_table/dm_destroy_table functions
is used for table creation/destruction.
Temporary references from the other code increase table->holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.
When the table is about to be destroyed, we wait for table->holders to
reach 0. Then, we call the table destructor. We use active waiting with
msleep(1), because the situation happens rarely (to one user in 5 years)
and removing the device isn't performance-critical task: the user doesn't
care if it takes one tick more or not.
This way, the destructor is called only at specific points
(dm_table_destroy function) and the above problems associated with lazy
destruction can't happen.
Finally remove the temporary protection added to dm_any_congested().
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Andi Kleen [Tue, 6 Jan 2009 03:05:09 +0000 (03:05 +0000)]
dm: support barriers on simple devices
Implement barrier support for single device DM devices
This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.
NB. Any DM device might cease to support barriers if it gets
reconfigured so code must continue to allow for a possible
-EOPNOTSUPP on every barrier bio submitted. - agk
Kiyoshi Ueda [Tue, 6 Jan 2009 03:05:07 +0000 (03:05 +0000)]
dm request: extend target interface
This patch adds the following target interfaces for request-based dm.
map_rq : for mapping a request
rq_end_io : for finishing a request
busy : for avoiding performance regression from bio-based dm.
Target can tell dm core not to map requests now, and
that may help requests in the block layer queue to be
bigger by I/O merging.
In bio-based dm, this behavior is done by device
drivers managing the block layer queue.
But in request-based dm, dm core has to do that
since dm core manages the block layer queue.
Takahiro Yasui [Tue, 6 Jan 2009 03:04:59 +0000 (03:04 +0000)]
dm log: avoid reinitialising io_req on every operation
rw_header function updates three members of io_req data every time
when I/O is processed. bi_rw and notify.fn are never modified once
they get initialized, and so they can be set in advance.
header_to_disk() can also be pulled out of write_header() since only one
caller needs it and write_header() can be replaced by rw_header()
directly.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Change dm_unregister_target to return void and use BUG() for error
reporting.
dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.
This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.
This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Always increase the error count when I/O on a leg of a mirror fails.
The error count is used to decide whether to select an alternative
mirror leg. If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.
Fix it by increasing error_count before the handle_errors feature
checking.
Cc: stable@kernel.org Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Takahiro Yasui [Tue, 6 Jan 2009 03:04:56 +0000 (03:04 +0000)]
dm log: fix dm_io_client leak on error paths
In create_log_context function, dm_io_client_destroy function needs
to be called, when memory allocation of disk_header, sync_bits and
recovering_bits failed, but dm_io_client_destroy is not called.
Cc: stable@kernel.org Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Acked-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 6 Jan 2009 03:04:54 +0000 (03:04 +0000)]
dm snapshot: change yield to msleep
Change yield() to msleep(1). If the thread had realtime priority,
yield() doesn't really yield, so the yielding process would loop
indefinitely and cause machine lockup.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 6 Jan 2009 03:04:53 +0000 (03:04 +0000)]
dm table: drop reference at unbind
Move one dm_table_put() so that the last reference in the thread
gets dropped in __unbind().
This is required for a following patch,
dm-table-rework-reference-counting.patch, which will change the logic in
such a way that table destructor is called only at specific points in
the code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: convert to stop_machine_create/destroy.
stop_machine: introduce stop_machine_create/destroy.
parisc: fix module loading failure of large kernel modules
module: fix module loading failure of large kernel modules for parisc
module: fix warning of unused function when !CONFIG_PROC_FS
kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
remove CONFIG_KMOD
Linus Torvalds [Tue, 6 Jan 2009 03:03:11 +0000 (19:03 -0800)]
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
swiotlb: add missing __init annotations
Linus Torvalds [Tue, 6 Jan 2009 02:58:06 +0000 (18:58 -0800)]
Merge branch 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux
* 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux:
i2c-omap: fix type of irq handler function
i2c-s3c2410: Change IRQ to be plain integer.
i2c-s3c2410: Allow more than one i2c-s3c2410 adapter
i2c-s3c2410: Remove default platform data.
i2c-s3c2410: Use platform data for gpio configuration
i2c-s3c2410: Fixup style problems from checkpatch.pl
i2c-omap: Enable I2C wakeups for 34xx
i2c-omap: reprogram OCP_SYSCONFIG register after reset
i2c-omap: convert 'rev1' flag to generic 'rev' u8
i2c-omap: fix I2C timeouts due to recursive omap_i2c_{un,}idle()
i2c-omap: Clean-up i2c-omap
i2c-omap: Don't compile in OMAP15xx I2C ISR for non-OMAP15xx builds
i2c-omap: Mark init-only functions as __init
i2c-omap: Add support for omap34xx
i2c-omap: FIFO handling support and broken hw workaround for i2c-omap
i2c-omap: Add high-speed support to omap-i2c
i2c-omap: Close suspected race between omap_i2c_idle() and omap_i2c_isr()
i2c-omap: Do not use interruptible wait call in omap_i2c_xfer_msg
Fix up apparently-trivial conflict in drivers/i2c/busses/i2c-s3c2410.c
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits)
GFS2: Use DEFINE_SPINLOCK
GFS2: Fix use-after-free bug on umount (try #2)
Revert "GFS2: Fix use-after-free bug on umount"
GFS2: Streamline alloc calculations for writes
GFS2: Send useful information with uevent messages
GFS2: Fix use-after-free bug on umount
GFS2: Remove ancient, unused code
GFS2: Move four functions from super.c
GFS2: Fix bug in gfs2_lock_fs_check_clean()
GFS2: Send some sensible sysfs stuff
GFS2: Kill two daemons with one patch
GFS2: Move gfs2_recoverd into recovery.c
GFS2: Fix "truncate in progress" hang
GFS2: Clean up & move gfs2_quotad
GFS2: Add more detail to debugfs glock dumps
GFS2: Banish struct gfs2_rgrpd_host
GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
GFS2: Move rg_igeneration into struct gfs2_rgrpd
GFS2: Banish struct gfs2_dinode_host
GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
...
Linus Torvalds [Tue, 6 Jan 2009 02:47:12 +0000 (18:47 -0800)]
igb: fix anoying type mismatch warning on rx/tx queue sizing
When using "min()", the types of both sides should match. With the cpu
mask changes, the type of num_online_cpus() will now depend on config
options. Use "min_t()" with an explicit type instead.
And make the rx/tx case look the same too, just for sanity.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
qlge: Fix sparse warnings for tx ring indexes.
qlge: Fix sparse warning regarding rx buffer queues.
qlge: Fix sparse endian warning in ql_hw_csum_setup().
qlge: Fix sparse endian warning for inbound packet control block flags.
qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
myri10ge: print MAC and serial number on probe failure
pkt_sched: cls_u32: Fix locking in u32_change()
iucv: fix cpu hotplug
af_iucv: Free iucv path/socket in path_pending callback
af_iucv: avoid left over IUCV connections from failing connects
af_iucv: New error return codes for connect()
net/ehea: bitops work on unsigned longs
Revert "net: Fix for initial link state in 2.6.28"
tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
tcp: don't mask EOF and socket errors on nonblocking splice receive
dccp: Integrate the TFRC library with DCCP
dccp: Clean up ccid.c after integration of CCID plugins
dccp: Lockless integration of CCID congestion-control plugins
qeth: get rid of extra argument after printk to dev_* conversion
qeth: No large send using EDDP for HiperSockets.
...
Linus Torvalds [Tue, 6 Jan 2009 02:34:12 +0000 (18:34 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: ice1724 - Fix a typo in IEC958 PCM name
ASoC: fix davinci-sffsdr buglet
ALSA: sound/usb: Use negated usb_endpoint_xfer_control, etc
ALSA: hda - cxt5051 report jack state
ALSA: hda - add basic jack reporting functions to patch_conexant.c
ALSA: Use usb_set/get_intfdata
ASoC: Clean up kerneldoc warnings
ASoC: Fix pxa2xx-pcm checks for invalid DMA channels
LSA: hda - Add HP Acacia detection
ALSA: hda - fix name for ALC1200
ALSA: sound/usb: use USB API functions rather than constants
ASoC: TWL4030: DAPM based capture implementation
ASoC: TWL4030: Make the enum filter generic for twl4030
Linus Torvalds [Tue, 6 Jan 2009 02:33:38 +0000 (18:33 -0800)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix on resume, now preserves user policy min/max.
[CPUFREQ] Add Celeron Core support to p4-clockmod.
[CPUFREQ] add to speedstep-lib additional fsb values for core processors
[CPUFREQ] Disable sysfs ui for p4-clockmod.
[CPUFREQ] p4-clockmod: reduce noise
[CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
Linus Torvalds [Tue, 6 Jan 2009 02:32:43 +0000 (18:32 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits)
ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
ocfs2: use min_t in ocfs2_quota_read()
ocfs2: remove unneeded lvb casts
ocfs2: Add xattr support checking in init_security
ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle
ocfs2: calculate and reserve credits for xattr value in mknod
ocfs2/xattr: fix credits calculation during index create
ocfs2/xattr: Always updating ctime during xattr set.
ocfs2/xattr: Remove extend_trans call and add its credits from the beginning
ocfs2/dlm: Fix race during lockres mastery
ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating
ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()
ocfs2/dlm: Fix a race between migrate request and exit domain
ocfs2: One more hamming code optimization.
ocfs2: Another hamming code optimization.
ocfs2: Don't hand-code xor in ocfs2_hamming_encode().
ocfs2: Enable metadata checksums.
ocfs2: Validate superblock with checksum and ecc.
ocfs2: Checksum and ECC for directory blocks.
...
Linus Torvalds [Tue, 6 Jan 2009 02:32:06 +0000 (18:32 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
inotify: fix type errors in interfaces
fix breakage in reiserfs_new_inode()
fix the treatment of jfs special inodes
vfs: remove duplicate code in get_fs_type()
add a vfs_fsync helper
sys_execve and sys_uselib do not call into fsnotify
zero i_uid/i_gid on inode allocation
inode->i_op is never NULL
ntfs: don't NULL i_op
isofs check for NULL ->i_op in root directory is dead code
affs: do not zero ->i_op
kill suid bit only for regular files
vfs: lseek(fd, 0, SEEK_CUR) race condition
Nick Piggin [Tue, 6 Jan 2009 02:05:50 +0000 (03:05 +0100)]
mm lockless pagecache barrier fix
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.
This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.
Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
Assembly of find_get_pages,
before:
.L220:
movq (%rbx), %rax #* ivtmp.1162, tmp82
movq (%rax), %rdi #, prephitmp.1149
.L218:
testb $1, %dil #, prephitmp.1149
jne .L217 #,
testq %rdi, %rdi # prephitmp.1149
je .L203 #,
cmpq $-1, %rdi #, prephitmp.1149
je .L217 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L218 #,
after:
.L212:
movq (%rbx), %rax #* ivtmp.1109, tmp81
movq (%rax), %rdi #, ret
testb $1, %dil #, ret
jne .L211 #,
testq %rdi, %rdi # ret
je .L197 #,
cmpq $-1, %rdi #, ret
je .L211 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L212 #,
(notice the obvious infinite loop in the first example, if page->count remains 0)
Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ron Mercer [Tue, 6 Jan 2009 02:19:59 +0000 (18:19 -0800)]
qlge: Fix sparse warnings for tx ring indexes.
Warnings:
drivers/net/qlge/qlge_main.c:1474:34: warning: restricted degrades to integer
drivers/net/qlge/qlge_main.c:1475:36: warning: restricted degrades to integer
drivers/net/qlge/qlge_main.c:1592:51: warning: restricted degrades to integer
drivers/net/qlge/qlge_main.c:1941:20: warning: incorrect type in assignment (different base types)
drivers/net/qlge/qlge_main.c:1941:20: expected restricted unsigned int [usertype] tid
drivers/net/qlge/qlge_main.c:1941:20: got int [signed] index
drivers/net/qlge/qlge_main.c:1945:24: warning: incorrect type in assignment (different base types)
drivers/net/qlge/qlge_main.c:1945:24: expected restricted unsigned int [usertype] txq_idx
drivers/net/qlge/qlge_main.c:1945:24: got unsigned int [unsigned] [usertype] tx_ring_idx
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>