pilppa.com Git - linux-2.6-omap-h63xx.git/log

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/isdn/i4l/isdn_net.c
fs/cifs/connect.c

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: hold extra reference to bio in blk_rq_map_user_iov()
  relay: fix cpu offline problem
  Release old elevator on change elevator
  block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash
  block/md: fix md autodetection
  block: make add_partition() return pointer to hd_struct
  block: fix add_partition() error path

suspend: use WARN not WARN_ON to print the message

By using WARN(), kerneloops.org can collect which component is causing
the delay and make statistics about that. suspend_test_finish() is
currently the number 2 item but unless we can collect who's causing
it we're not going to be able to fix the hot topic ones..

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/profile.c: fix section mismatch warning
  function tracing: fix wrong pos computing when read buffer has been fulfilled
  tracing: fix mmiotrace resizing crash
  ring-buffer: no preempt for sched_clock()
  ring-buffer: buffer record on/off switch

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpuset: fix regression when failed to generate sched domains
  sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
  sched: fix kernel warning on /proc/sched_debug access
  sched: correct sched-rt-group.txt pathname in init/Kconfig

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
swiotlb: use coherent_dma_mask in alloc_coherent
MAINTAINERS: remove me as RAID maintainer

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
  Blackfin arch: fix a broken define in dma-mapping
  Blackfin arch: fix bug - Turn on DEBUG_DOUBLEFAULT, booting SMP kernel crash
  Blackfin arch: fix bug - shared lib function in L2 failed be called
  Blackfin arch: fix incorrect limit check for bf54x check_gpio
  Blackfin arch: fix bug - Cpufreq assumes clocks in kHz and not Hz.
  Blackfin arch: dont warn when running a kernel on the oldest supported silicon
  Blackfin arch: fix bug - kernel build with write back policy fails to be booted up
  Blackfin arch: fix bug - dmacopy test case fail on all platform
  Blackfin arch: Fix typo when adding CONFIG_DEBUG_VERBOSE
  Blackfin arch: don't copy bss when copying L1
  Blackfin arch: fix bug - Fail to boot jffs2 kernel for BF561 with SMP patch
  Blackfin arch: handle case of d_path() returning error in decode_address()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix resume of GPIO unsol event for STAC/IDT
  ALSA: hda - Add quirks for HP Pavilion DV models
  ALSA: hda - Fix GPIO initialization in patch_stac92hd71bxx()
  ALSA: hda - Check model type instead of SSID in patch_92hd71bxx()
  ALSA: sound/pci/pcxhr/pcxhr.c: introduce missing kfree and pci_disable_device
  ALSA: hda: STAC_VREF_EVENT value change
  ALSA: hda - Missing NULL check in hda_beep.c
  ALSA: hda - Add digital beep playback switch for STAC/IDT codecs

block: hold extra reference to bio in blk_rq_map_user_iov()

If the size passed in is OK but we end up mapping too many segments,
we call the unmap path directly like from IO completion. But from IO
completion we have an extra reference to the bio, so this error case
goes OOPS when it attempts to free and already free bio.

Fix it by getting an extra reference to the bio before calling the
unmap failure case.

Reported-by: Petr Vandrovec <vandrove@vc.cvut.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

relay: fix cpu offline problem

relay_open() will close allocated buffers when failed.
but if cpu offlined, some buffer will not be closed.
this patch fixed it.

and did cleanup for relay_reset() too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Release old elevator on change elevator

We should release old elevator when change to use a new one.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash

We run into system boot failure with kernel 2.6.28-rc. We found it on a
couple of machines, including T61 notebook, nehalem machine, and another
HPC NX6325 notebook.  All the machines use FedoraCore 8 or FedoraCore 9.
With kernel prior to 2.6.28-rc, system boot doesn't fail.

I debug it and locate the root cause. Pls. see
http://bugzilla.kernel.org/show_bug.cgi?id=11899
https://bugzilla.redhat.com/show_bug.cgi?id=471517

As a matter of fact, there are 2 bugs.

1)root=/dev/sda1, system boot randomly fails. Mostly, boot for 5 times
and fails once. nash has a bug. Some of its functions misuse return
value 0.  Sometimes, 0 means timeout and no uevent available. Sometimes,
0 means nash gets an uevent, but the uevent isn't block-related (for
exmaple, usb). If by coincidence, kernel tells nash that uevents are
available, but kernel also set timeout, nash might stops collecting
other uevents in queue if current uevent isn't block-related.  I work
out a patch for nash to fix it.
http://bugzilla.kernel.org/attachment.cgi?id=18858

2) root=LABEL=/, system always can't boot. initrd init reports
switchroot fails. Here is an executation branch of nash when booting:
    (1) nash read /sys/block/sda/dev; Assume major is 8 (on my desktop)
    (2) nash query /proc/devices with the major number; It found line
"8 sd";
    (3) nash use 'sd' to search its own probe table to find device (DISK)
type for the device and add it to its own list;
    (4) Later on, it probes all devices in its list to get filesystem
labels; scsi register "8 sd" always.

When major is 259, nash fails to find the device(DISK) type. I enables
CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling kernel, so 259 is picked up
for device /dev/sda1, which causes nash to fail to find device (DISK)
type.

To fixing issue 2), I create a patch for nash and another patch for
kernel.

http://bugzilla.kernel.org/attachment.cgi?id=18859
http://bugzilla.kernel.org/attachment.cgi?id=18837

Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
block device in proc/devices.

With 2 patches on nash and 1 patch on kernel, I boot my machines for
dozens of times without failure.

Signed-off-by Zhang Yanmin <yanmin.zhang@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block/md: fix md autodetection

Block ext devt conversion missed md_autodetect_dev() call in
rescan_partitions() leaving md autodetect unable to see partitions.
Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make add_partition() return pointer to hd_struct

Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure. This change will be used to fix md
autodetection bug.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix add_partition() error path

Partition stats structure was not freed on devt allocation failure
path. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Merge branches 'topic/fix/hda' and 'topic/fix/misc' into for-linus

ALSA: hda - Fix resume of GPIO unsol event for STAC/IDT

Use cached write for setting the GPIO unsolicited event mask to be
restored properly at resume.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Add quirks for HP Pavilion DV models

Added the quirk entries for HP Pavilion DV5 and DV7 with model=hp-m4.

Reference: Novell bnc#445321, bnc#445161
https://bugzilla.novell.com/show_bug.cgi?id=445321
https://bugzilla.novell.com/show_bug.cgi?id=445161

Signed-off-by: Takashi Iwai <tiwai@suse.de>

Blackfin arch: fix a broken define in dma-mapping

dma_mapping_error is an actual function, so fix broken define with a
real inline stub

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

Blackfin arch: fix bug - Turn on DEBUG_DOUBLEFAULT, booting SMP kernel crash

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>

ALSA: hda - Fix GPIO initialization in patch_stac92hd71bxx()

Fixed the GPIO mask and co initialization in patch_stac92hd71bxx()
so that the gpio_maks for HP_M4 model is set properly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

kernel/profile.c: fix section mismatch warning

Impact: fix section mismatch warning in kernel/profile.c

Here, profile_nop function has been called from a non-init function
create_hash_tables(void). Which generetes a section mismatch warning.
Previously, create_hash_tables(void) was a init function. So, removing
__init from create_hash_tables(void) requires profile_nop to be
non-init.

This patch makes profile_nop function inline and fixes the
following warning:

WARNING: vmlinux.o(.text+0x6ebb6): Section mismatch in reference from
the function create_hash_tables() to the function
.init.text:profile_nop()
The function create_hash_tables() references
the function __init profile_nop().
This is often because create_hash_tables lacks a __init
annotation or the annotation of profile_nop is wrong.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpuset: fix regression when failed to generate sched domains

Impact: properly rebuild sched-domains on kmalloc() failure

When cpuset failed to generate sched domains due to kmalloc()
failure, the scheduler should fallback to the single partition
'fallback_doms' and rebuild sched domains, but now it only
destroys but not rebuilds sched domains.

The regression was introduced by:

| commit dfb512ec4834116124da61d6c1ee10fd0aa32bd6
| Author: Max Krasnyansky <maxk@qualcomm.com>
| Date: Fri Aug 29 13:11:41 2008 -0700
|
| sched: arch_reinit_sched_domains() must destroy domains to force rebuild

After the above commit, partition_sched_domains(0, NULL, NULL) will
only destroy sched domains and partition_sched_domains(1, NULL, NULL)
will create the default sched domain.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  prevent cifs_writepages() from skipping unwritten pages
  Fixed parsing of mount options when doing DFS submount
  [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
  [CIFS] Fix build break
  cifs: reinstate sharing of tree connections
  [CIFS] minor cleanup to cifs_mount
  cifs: reinstate sharing of SMB sessions sans races
  cifs: disable sharing session and tcon and add new TCP sharing code
  [CIFS] clean up server protocol handling
  [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
  [CIFS] Fix cifs reconnection flags
  [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error

prevent cifs_writepages() from skipping unwritten pages

Fixes a data corruption under heavy stress in which pages could be left
dirty after all open instances of a inode have been closed.

In order to write contiguous pages whenever possible, cifs_writepages()
asks pagevec_lookup_tag() for more pages than it may write at one time.
Normally, it then resets index just past the last page written before calling
pagevec_lookup_tag() again.

If cifs_writepages() can't write the first page returned, it wasn't resetting
index, and the next call to pagevec_lookup_tag() resulted in skipping all of
the pages it previously returned, even though cifs_writepages() did nothing
with them. This can result in data loss when the file descriptor is about
to be closed.

This patch ensures that index gets set back to the next returned page so
that none get skipped.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Shirish S Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

Fixed parsing of mount options when doing DFS submount

Since these hit the same routines, and are relatively small, it is easier to review
them as one patch.

Fixed incorrect handling of the last option in some cases
Fixed prefixpath handling convert path_consumed into host depended string length (in bytes)
Use non default separator if it is provided in the original mount options

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

Remove -mno-spe flags as they dont belong

For some unknown reason at Steven Rostedt added in disabling of the SPE
instruction generation for e500 based PPC cores in commit
6ec562328fda585be2d7f472cfac99d3b44d362a.

We are removing it because:

1. It generates e500 kernels that don't work
2. its not the correct set of flags to do this
3. we handle this in the arch/powerpc/Makefile already
4. its unknown in talking to Steven why he did this

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Tested-and-Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd

* 'for-linus' of git://git.o-hand.com/linux-mfd:
mfd: Correct WM8350 I2C return code usage
mfd: fix event masking for da9030

[CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch

set tcon->ses earlier

If the inital tree connect fails, we'll end up calling cifs_put_smb_ses
with a NULL pointer. Fix it by setting the tcon->ses earlier.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  rtc: rtc-sun4v fixes, revised
  sparc: Fix tty compile warnings.
  sparc: struct device - replace bus_id with dev_name(), dev_set_name()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  rtnetlink: propagate error from dev_change_flags in do_setlink()
  isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply
  Phonet: refuse to send bigger than MTU packets
  e1000e: fix IPMI traffic
  e1000e: fix warn_on reload after phy_id error
  phy: fix phy address bug
  e100: fix dma error in direction for mapping
  igb: use dev_printk instead of printk
  qla3xxx: Cleanup: Fix link print statements.
  igb: Use device_set_wakeup_enable
  e1000: Use device_set_wakeup_enable
  e1000e: Use device_set_wakeup_enable
  via-velocity: enable perfect filtering for multicast packets
  phy: Add support for Marvell 88E1118 PHY
  mlx4_en: Pause parameters per port
  phylib: fix premature freeing of struct mii_bus
  atl1: Do not enumerate options unsupported by chip
  atl1e: fix broken multicast by removing unnecessary crc inversion
  gianfar: Fix DMA unmap invocations
  net/ucc_geth: Fix oops in uec_get_ethtool_stats()
  ...

sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers

Impact: fix potential NULL dereference

Contrary to ad474caca3e2a0550b7ce0706527ad5ab389a4d4 changelog, other
acct_group_xxx() helpers can be called after exit_notify() by timer tick.
Thanks to Roland for pointing out this. Somehow I missed this simple fact
when I read the original patch, and I am afraid I confused Frank during
the discussion. Sorry.

Fortunately, these helpers work with current, we can check ->exit_state
to ensure that ->signal can't go away under us.

Also, add the comment and compiler barrier to account_group_exec_runtime(),
to make sure we load ->signal only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

net: sctp should update its inuse counter

This patch is a preparation to namespace conversion of /proc/net/protocols

In order to have relevant information for SCTP protocols, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: af_unix should update its inuse counter

This patch is a preparation to namespace conversion of /proc/net/protocols

In order to have relevant information for UNIX protocol, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

swiotlb: use coherent_dma_mask in alloc_coherent

Impact: fix DMA buffer allocation coherency bug in certain configs

This patch fixes swiotlb to use dev->coherent_dma_mask in
swiotlb_alloc_coherent().

coherent_dma_mask is a subset of dma_mask (equal to it most of
the time), enumerating the address range that a given device
is able to DMA to/from in a cache-coherent way.

But currently, swiotlb uses dev->dma_mask in alloc_coherent()
implicitly via address_needs_mapping(), but alloc_coherent is really
supposed to use coherent_dma_mask.

This bug could break drivers that uses smaller coherent_dma_mask than
dma_mask (though the current code works for the majority that use the
same mask for coherent_dma_mask and dma_mask).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

net: af_unix can make unix_nr_socks visbile in /proc

Currently, /proc/net/protocols displays socket counts only for TCP/TCPv6
protocols

We can provide unix_nr_socks for free here, this counter being
already maintained in af_unix

Before patch :

# grep UNIX /proc/net/protocols
UNIX 428 -1 -1 NI 0 yes kernel

After patch :

# grep UNIX /proc/net/protocols
UNIX 428 98 -1 NI 0 yes kernel

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: propagate error from dev_change_flags in do_setlink()

Unlike ifconfig, iproute doesn't report an error when setting
an interface up fails:

(example: put wireless network mac80211 interface into repeater mode
with iwconfig but do not set a peer MAC address, it should fail with
-ENOLINK)

without patch:
# ip link set wlan0 up ; echo $?
0
#

with patch:
# ip link set wlan0 up ; echo $?
RTNETLINK answers: Link has been severed
2
#

Propagate the return value from dev_change_flags() to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice chelsio: Convert directly reference of netdev->priv

Several netdev share one adapter here.
We use netdev->ml_priv of the netdevs point to the first netdev's priv.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply

commit a144ea4b7a13087081ab5402fa9ad0bcfd249e67 [IPV4]: annotate struct in_ifaddr

Missed this extra byteswap as the isdn inlines hide the htonl inside
put_u32 which causes an extra byteswap on little-endian arches.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ematch: simpler tcf_em_unregister()

Simply delete ops from list and let list debugging do the job.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Cleanup of af_unix

This is a pure cleanup of net/unix/af_unix.c to meet current code
style standards

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: Tidy up setsockopt calls

This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
     (since waiting for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
   dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid

* wrong usage (type);
* changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: Mechanism to resolve CCID dependencies

This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.

The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html#ccid_dependencies

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: VIRTIO_NET_F_MSG_RXBUF (imprive rcv buffer allocation)

If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.

The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.

Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.

This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.

Based on a patch from Herbert Xu.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: hook up the set-tso ethtool op

Seems like an oversight that we have set-tx-csum and set-sg hooked
up, but not set-tso.

Also leads to the strange situation that if you e.g. disable tx-csum,
then tso doesn't get disabled.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Recycle some more rx buffer pages

Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.

A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>

[CIFS] Fix build break

Signed-off-by: Steve French <sfrench@us.ibm.com>

net: use %pF for /proc/net/ptype

Technically, patch changes format for modules, but I think nobody cares.

-86dd :ipv6:ipv6_rcv+0x0
+86dd ipv6_rcv+0x0/0x400 [ipv6]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: refuse to send bigger than MTU packets

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make sure struct dst_entry refcount is aligned on 64 bytes

As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.

We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.

for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0

As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)

Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.

"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rcu: documents rculist_nulls

Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
end-of-list can help in some RCU algos.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls

RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.

This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.

Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.

__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)

Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Use hlist_nulls in UDP RCU code

This is a straightforward patch, using hlist_nulls infrastructure.

RCUification already done on UDP two weeks ago.

Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.

Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rcu: Introduce hlist_nulls variant of hlist

hlist uses NULL value to finish a chain.

hlist_nulls variant use the low order bit set to 1 to signal an end-of-list marker.

This allows to store many different end markers, so that some RCU lockless
algos (used in TCP/UDP stack for example) can save some memory barriers in
fast paths.

Two new files are added :

include/linux/list_nulls.h
  - mimics hlist part of include/linux/list.h, derived to hlist_nulls variant

include/linux/rculist_nulls.h
  - mimics hlist part of include/linux/rculist.h, derived to hlist_nulls variant

   Only four helpers are declared for the moment :

     hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
     hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()

prefetches() were removed, since an end of list is not anymore NULL value.
prefetches() could trigger useless (and possibly dangerous) memory transactions.

Example of use (extracted from __udp4_lib_lookup())

struct sock *sk, *result;
        struct hlist_nulls_node *node;
        unsigned short hnum = ntohs(dport);
        unsigned int hash = udp_hashfn(net, hnum);
        struct udp_hslot *hslot = &udptable->hash[hash];
        int score, badness;

        rcu_read_lock();
begin:
        result = NULL;
        badness = -1;
        sk_nulls_for_each_rcu(sk, node, &hslot->head) {
                score = compute_score(sk, net, saddr, hnum, sport,
                                      daddr, dport, dif);
                if (score > badness) {
                        result = sk;
                        badness = score;
                }
        }
        /*
         * if the nulls value we got at the end of this lookup is
         * not the expected one, we must restart lookup.
         * We probably met an item that was moved to another chain.
         */
        if (get_nulls_value(node) != hash)
                goto begin;

        if (result) {
                if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
                        result = NULL;
                else if (unlikely(compute_score(result, net, saddr, hnum, sport,
                                  daddr, dport, dif) < badness)) {
                        sock_put(result);
                        goto begin;
                }
        }
        rcu_read_unlock();
        return result;

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

TPROXY: implemented IP_RECVORIGDSTADDR socket option

In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.

This patch adds an IP_RECVORIGDSTADDR sockopt that enables
a IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix ARP behavior with many mac-vlans

Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans.  My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan.  I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a replay will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only create check neigh_event_ns if we are !dont_send?

The attached patch makes it work much better for me.  The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request.  The old code
*would* create a stale entry even if we are not going to respond.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cifs: reinstate sharing of tree connections

Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

e1000e: enable ECC correction on 82571 silicon

This change enables ECC correction for the packet buffer on all 82571
silicon.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: fix IPMI traffic

Some users reported that they have machines with BMCs enabled that cannot
receive IPMI traffic after e1000e is loaded.
http://marc.info/?l=e1000-devel&m=121909039127414&w=2
http://marc.info/?l=e1000-devel&m=121365543823387&w=2

This fixes the issue if they load with the new parameter = 0 by disabling
crc stripping, but leaves the performance feature on for most users.
Based on work done by Hong Zhang.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: fix warn_on reload after phy_id error

If the driver fails to initialize the first time due to the failure in the
phy_id check the kernel triggers a warn_on on the second try to load the
driver because the driver did not free the msi/x resources in the first
load because of the previous failure in phy_id check.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: make mdio-gpio work without OF (v4)

make mdio-gpio work with non OpenFirmware gpio implementation.

Aditional changes to mdio-gpio:
- use gpio_request() and gpio_free()
- place irq[] array in struct mdio_gpio_info
- add module description, author and license
- add note about compiling this driver as module
- rename mdc and mdio function (were ugly names)
- change MII to MDIO in bus name
- add __init __exit to module (un)loading functions
- probe fails if no phys added to the bus
- kzalloc bitbang with sizeof(*bitbang)

Changes since v3:
- keep bus naming "%x" to be compatible with existing drivers.

Changes since v2:
- more #ifdefs reduction
- platform driver will be registered on OF platforms also
- unified platform and OF bus_id to phy%i

Changes since v1:
- removed NO_IRQ
- reduced #idefs

Laurent, please test this driver under OF.

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: rename mdio-ofgpio to mdio-gpio

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>

unitialized return value in mm/mlock.c: __mlock_vma_pages_range()

Fix an unitialized return value when compiling on parisc (with CONFIG_UNEVICTABLE_LRU=y):
mm/mlock.c: In function `__mlock_vma_pages_range':
mm/mlock.c:165: warning: `ret' might be used uninitialized in this function

Signed-off-by: Helge Deller <deller@gmx.de>
[ It isn't ever really used uninitialized, since no caller should ever
  call this function with an empty range.  But the compiler is correct
  that from a local analysis standpoint that is impossible to see, and
  fixing the warning is appropriate.  ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

stop_machine: fix race with return value (fixes Bug #11989)

Bug #11989: Suspend failure on NForce4-based boards due to chanes in
stop_machine

We should not access active.fnret outside the lock; in theory the next
stop_machine could overwrite it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix broken ownership of /proc/sys/ files

D'oh...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-and-tested-by: Peter Palfrader <peter@palfrader.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm9000: Fix build error.

Reported by Stephen Rothwell:

drivers/net/dm9000.c:1450: error: expected ')' before ';' token
drivers/net/dm9000.c:1455: error: expected ';' before '}' token

Signed-off-by: David S. Miller <davem@davemloft.net>

mfd: Correct WM8350 I2C return code usage

The vendor BSP used for the WM8350 development provided an I2C driver
which incorrectly returned zero on succesful sends rather than the
number of transmitted bytes, an error which was then propagated into the
WM8350 I2C accessors.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: fix event masking for da9030

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

acpi: fix oops in acpi_system_wakeup_device_seq_show

Commit 0794469da3f7b2093575cbdfc1108308dd3641ce: ("ACPI: struct device -
replace bus_id with dev_name(), dev_set_name()") introduced a bug by
testing 'dev_name(ldev)' instead of 'ldev->bus' for NULL when printing
out the bus information.

So if ldev->bus was NULL, we'd oops.

Reported-and-tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

phy: fix phy address bug

PHYID returns 0xffff and not 0xffffffff when not found and in some
case(at91sam9263) 0x0. Maybe this patch could be useful.

Signed-off-by: Giulio Benetti <giulio.benetti@micronovasrl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e100: fix dma error in direction for mapping

The e100 driver triggers BUG_ON(buf->direction != dir)
by doing pci_map_single(..., PCI_DMA_BIDIRECTIONAL)
and pci_dma_sync_single_for_device(..., PCI_DMA_TODEVICE).

Changing the DMA direction, especially with dmabounce will result
in unexpected behaviour.

Reported-by: Anders Grafstrom <grfstrm@users.sourceforge.net>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: use dev_printk instead of printk

Use dev_printk() instead of printk() to give a little more context
and use consistent format.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qla3xxx: Cleanup: Fix link print statements.

Removed debug print statements and improved conditionals around informational statements.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Use device_set_wakeup_enable

Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
igb_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: Use device_set_wakeup_enable

Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: Use device_set_wakeup_enable

Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

via-velocity: enable perfect filtering for multicast packets

Signed-off-by: Joey Zhuo <joeyzhuo@via.com.tw>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pegasus: minor resource shrinkage

Make pegasus driver not allocate a workqueue until the driver
is bound to some device, which will need that workqueue if
the device is brought up. This conserves resources when the
driver is linked but there's no pegasus device connected.

Also shrink the runtime footprint a smidgeon by moving some
init-only code into its proper section, and move an obnoxious
(frequent and meaningless) message to be debug-only.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}()

netif_carrier_off() is sufficient to stop Tx into the driver. Stopping the Tx
queues is redundant and unnecessary. By the same token, netif_carrier_on()
will be sufficient to re-enable Tx, so waking the queues is unnecessary.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

function tracing: fix wrong pos computing when read buffer has been fulfilled

Impact: make output of available_filter_functions complete

phenomenon:

The first value of dyn_ftrace_total_info is not equal with
`cat available_filter_functions | wc -l`, but they should be equal.

root cause:

When printing functions with seq_printf in t_show, if the read buffer
is just overflowed by current function record, then this function
won't be printed to user space through read buffer, it will
just be dropped. So we can't see this function printing.

So, every time the last function to fill the read buffer, if overflowed,
will be dropped.

This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than read buffer.

fix:

Through checking return value of seq_printf, if less than 0, we know
this function doesn't be printed. Then we decrease position to force
this function to be printed next time, in next read buffer.

Another little fix is to show correct allocating pages count.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

MAINTAINERS: remove me as RAID maintainer

Neil has been the maintainer of the RAID/MD code for a long time,
remove me as a co-maintainer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix kernel warning on /proc/sched_debug access

Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:

> [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
> [  775.663303] caller is native_sched_clock+0x3a/0x80
> [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c354 #1
> [  775.663322] Call Trace:
> [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
> [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
> [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
> [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
> [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
> [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
> [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
> [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
> [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
> [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
> ...

This breakage was caused by me via:

  7cbaef9: sched: optimize sched_clock() a bit

Change the calls to cpu_clock().

Reported-by: Luis Henriques <henrix@sapo.pt>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: don't grab devices with no input
  HID: fix radio-mr800 hidquirks
  HID: fix kworld fm700 radio hidquirks
  HID: fix start/stop cycle in usbhid driver
  HID: use single threaded work queue for hid_compat
  HID: map macbook keys for "Expose" and "Dashboard"
  HID: support for new unibody macbooks
  HID: fix locking in hidraw_open()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: ensure correct logging in do_io_probe
  pcmcia: add another pata/ide ID
  pcmcia: add braces in error path
  pcmcia: struct device - replace bus_id with dev_name(), dev_set_name()
  pcmcia: setup resource information for pseudo multifunction devices.
  pcmcia: fix indentation & braces disagreement - add braces

phy: Add support for Marvell 88E1118 PHY

This patch will add support for the Marvell 88E1118 PHY which supports gigabit ethernet among other things.

Signed-off-by: Ron Madrid <ron_madrid@sbcglobal.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Pause parameters per port

Before the change the driver reported the same pause parameters
for all the ports, even only one of them was modified.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 2.6.28-rc5

Fix inotify watch removal/umount races

Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e.  that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

LIS3LV02Dx: remove unused #include <version.h>

The file(s) below do not use LINUX_VERSION_CODE nor KERNEL_VERSION.
drivers/hwmon/lis3lv02d.c

This patch removes the said #include <version.h>.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'sh/for-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  serial: sh-sci: Reorder the SCxTDR write after the TDxE clear.
  sh: __copy_user function can corrupt the stack in case of exception
  sh: Fixed the TMU0 reload value on resume
  sh: Don't factor in PAGE_OFFSET for valid_phys_addr_range() check.
  sh: early printk port type fix
  i2c: fix i2c-sh_mobile rx underrun
  sh: Provide a sane valid_phys_addr_range() to prevent TLB reset with PMB.
  usb: r8a66597-hcd: fix wrong data access in SuperH on-chip USB
  fix sci type for SH7723
  serial: sh-sci: fix cannot work SH7723 SCIFA
  sh: Handle fixmap TLB eviction more coherently.

Merge branch 'doc-subdirs' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs

* 'doc-subdirs' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs:
Create/use more directory structure in the Documentation/ tree.

Add 'pr_fmt()' format modifier to pr_xyz macros.

A common reason for device drivers to implement their own printk macros
is the lack of a printk prefix with the standard pr_xyz macros.
Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the
format string.

The most common use of the pr_fmt macro would be to add the name of the
device driver to all pr_xyz messages in a source file.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: restrict RDMA usage

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  V4L/DVB (9624): CVE-2008-5033: fix OOPS on tvaudio when controlling bass/treble
  V4L/DVB (9623): tvaudio: Improve debug msg by printing something more human
  V4L/DVB (9622): tvaudio: Improve comments and remove a unneeded prototype
  V4L/DVB (9621): Avoid writing outside shadow.bytes[] array
  V4L/DVB (9620): tvaudio: use a direct reference for chip description
  V4L/DVB (9619): tvaudio: update initial comments
  V4L/DVB (9618): tvaudio: add additional logic to avoid OOPS
  V4L/DVB (9617): tvtime: remove generic_checkmode callback
  V4L/DVB (9616): tvaudio: cleanup - group all callbacks together
  V4L/DVB (9615): tvaudio: instead of using a magic number, use ARRAY_SIZE
  V4L/DVB (9613): tvaudio: fix a memory leak

Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] fix s390x_newuname
  [S390] dasd: log sense for fatal errors
  [S390] cpu topology: fix locking
  [S390] cio: Fix refcount after moving devices.
  [S390] ftrace: fix kernel stack backchain walking
  [S390] ftrace: disable tracing on idle psw
  [S390] lockdep: fix compile bug
  [S390] kvm_s390: Fix oops in virtio device detection with "mem="
  [S390] sclp: emit error message if assign storage fails
  [S390] Fix range for add_active_range() in setup_memory()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] dpt_i2o: fix transferred data length for scsi_set_resid()
  [SCSI] scsi_error regression: Fix idempotent command handling
  [SCSI] zfcp: Fix hexdump data in s390dbf traces
  [SCSI] zfcp: fix erp timeout cleanup for port open requests
  [SCSI] zfcp: Wait for port scan to complete when setting adapter online
  [SCSI] zfcp: Fix cast warning
  [SCSI] zfcp: Fix request list handling in error path
  [SCSI] zfcp: fix mempool usage for status_read requests
  [SCSI] zfcp: fix req_list_locking.
  [SCSI] zfcp: Dont clear reference from SCSI device to unit
  [SCSI] qla2xxx: Update version number to 8.02.01-k9.
  [SCSI] qla2xxx: Return a FAILED status when abort mailbox-command fails.
  [SCSI] qla2xxx: Do not honour max_vports from firmware for 2G ISPs and below.
  [SCSI] qla2xxx: Use pci_disable_rom() to manipulate PCI config space.
  [SCSI] qla2xxx: Correct Atmel flash-part handling.
  [SCSI] megaraid: fix mega_internal_command oops

Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"

This reverts commit e51af6630848406fc97adbd71443818cdcda297b, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.

The better fix is commit cbda1ba898647aeb4ee770b803c922f595e97731
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>