pilppa.com Git - linux-2.6-omap-h63xx.git/log

[SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation

iscsi offload (bnx2i and qla4xx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behabior as before
where it allcoates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] mpt fusion : Adding FAULT Reset polling work

When the firmware is in Fault state it will be identifed only when the next time
the driver access the IOC state.
This patch includes a polling function in the driver which will be executed in
regular interval to check the status of the firmware and if it is in Fault
state, then the firmware will be reset by the driver.

Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] mpt fusion : Setting intial period to 0xFF instead of 0xA

The initial period is set to 0xFF

Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] mpt fusion : Updated copyright statment with 2008 included

Updating copyright statement to include the year 2008

Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] mpt fusion: Driver version upgrade to 3.04.07

Updating driver version to 3.04.07 from 3.04.06

Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Fix sparse warning by providing new entry in dbf

drivers/s390/scsi/zfcp_dbf.c:692:2: warning: context imbalance in
'zfcp_rec_dbf_event_thread' - different lock contexts for basic block

Replace the parameter indicating if the lock is held with a new entry
function that only acquires the lock. This makes the lock handling
more visible and removes the sparse warning.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: remove some __attribute__ ((packed))

There is no need to pack data structures which describe the
contents of records in the new recovery trace.

lcrash currently depends on the binary format for the other traces,
removing the packed attribute from all traces would break trace
debugging with lcrash.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Refine trace levels of some recovery related events.

This change better spreads trace levels of recovery related events.
There was an overlap of traces for some recovery triggers and the
processing of recovery actions.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Add information about interrupt to trace.

Store the index of the buffer in the inbound queue used to report
request completion in trace record for request coompletion.
This piece of information allows to better compare qdio and zfcp traces.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Rename sbal_curr to sbal_last.

sbal_last is more appropriate, because it matches sbal_first.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Rename sbal_last.

sbal_last is confusing, as it is not the last one actually used,
but just a limit. sbal_limit is a better name.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Remove field sbal_last from trace record.

This field is not needed, because it designates an index with a fix offset
from sbal_first. It's name is confusing anyway.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Remove some sparse warnings

Remove some sparse warnings by telling sparse that zfcp_req_create
acquires the lock for the request queue.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Fix fsf_status_read return code handling

If allocation of a status buffer failed the function incorrectly
returned 0 instead of -ENOMEM.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Fix mempool pointer for GID_PN request allocation

When allocating memory for GID_PN nameserver requests, the allocation
function stores the pointer to the mempool, but then overwrites the
pointer via memset. Later, the wrong function to free the memory will
be called, since this is based on the stored pointer.

Fix this by first initializing the struct and then storing the pointer.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall

Processing of an unsolicted status request can lead to a locking race
of the request_queue's queue_lock during the recreation of the
used up status read request while still in interrupt context
of the response handler.

Detaching the 'refill' of the long running status read requests from
the handler to a scheduled work is solving this issue.

In addition, each refill-run is trying to re-establish the full amount
of status read requests, which might have failed in earlier runs.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] mpt fusion: make struct mpt_proc_root_dir static

This patch makes the needlessly global struct mpt_proc_root_dir static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: "Prakash, Sathya" <Sathya.Prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] make use of the residue value

USB sometimes doesn't return an error but instead returns a residue
value indicating part (or all) of the command wasn't completed. So if
the driver _done() error processing indicates the command was fully
processed, subtract off the residue so that this USB error gets
propagated.

Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] SCSI: remove dev->power.power_state from mesh driver

power.power_state is scheduled for removal. This patch (as1055)
removes all uses of that field from the SCSI mesh driver.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Paul Mackerras <paulus@au.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aacraid: linit.c make aac_show_serial_number static

drivers/scsi/aacraid/linit.c:865:9: warning: symbol 'aac_show_serial_number' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Mark Salyzyn <Mark_Salyzyn@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aacraid: Add Power Management cards to documentation

Update the documented list of products supported by the aacraid driver.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: sysfs attributes for fabric and channel latencies

The latency information is provided on a SCSI device level (LUN)
which can be found at the following location

/sys/class/scsi_device/<H:C:T:L>/device/cmd_latency
/sys/class/scsi_device/<H:C:T:L>/device/read_latency
/sys/class/scsi_device/<H:C:T:L>/device/write_latency

Each sysfs attribute provides the available data: min, max and sum for
fabric and channel latency and the number of requests processed.

An overrun of the variables is neither detected nor treated. The file
has to be read twice to make a meaningful statement, because only the
differences of the values between the two reads can be used. A reset
of the values can be achieved by writing to the attribute.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: Track fabric and channel latencies provided by FCP adapter

Add the infrastructure to retrieve the fabric and channel latencies
from FSF commands for each SCSI command that has been processed. For
each unit, the sum, min, max and number of requests is tracked.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: Remove hardware handler infrastructure from dm

This patch just removes infrastructure that provided support for hardware
handlers in the dm layer as it is not needed anymore.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: Remove hardware handlers from dm

This patch removes the 3 hardware handlers that currently exist
under dm as the functionality is moved to SCSI layer in the earlier
patches.

[jejb: removed more makefile hunks and rejection fixes]
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: Remove dm_pg_init_complete

This patch just removes the dm layer's path initialization completion
routine. This is separated from the other patch(scsi_dh: Use SCSI
device handler in dm-multipath) Just to make that patch more readable.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: Add a single threaded workqueue for initializing paths

Before this patch set (SCSI hardware handlers), initialization of a
path was done asynchronously. Doing that requires a workqueue in each
device/hardware handler module and leads to unneccessary complication
in the device handler code, making it difficult to read the code and
follow the state diagram.

Moving that workqueue to this level makes the device handler code simpler.
Hence, the workqueue is moved to dm level.

A new workqueue is added instead of adding it to the existing workqueue
(kmpathd) for the following reasons:
1. Device activation has to happen faster, stacking them along
   with the other workqueue might lead to unnecessary delay
   in the activation of the path.
2. The effect could be felt the other way too. i.e the current
   events that are handled by the existing workqueue might get
   a delayed response.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: Use SCSI device handler in dm-multipath

This patch converts dm-mpath to use scsi device handlers instead of
dm's hardware handlers.

This patch does not add any new functionality. Old behaviors remain and
userspace tools work as is except that arguments supplied with hardware
handler are ignored.

One behavioral exception is: Activation of a path is synchronous in this
patch, opposed to the older behavior of being asynchronous (changed in
patch 07: scsi_dh: Add a single threaded workqueue for initializing a path)

Note: There is no need to get a reference for the device handler module
(as it was done in the dm hardware handler case) here as the reference
is held when the device was first found. Instead we check and make sure
that support for the specified device is present at table load time.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: add EMC Clariion device handler

This adds support for EMC Clariions. This patch has the features that
currently exists in mainline and advanced features from Ed's patches.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: add hp sw device handler

This patch provides the device handler to support the older hp boxes
which cannot be upgraded.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: add lsi rdac device handler

This patch provides the device handler to support the LSI RDAC SCSI
based storage devices.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_dh: add infrastructure for SCSI Device Handlers

Some of the storage devices (that can be accessed through multiple paths),
do need some special handling for
1. Activating the passive path of the storage access.
2. Decode and handle the special sense codes returned by the devices.
3. Handle the I/Os being sent to the passive path, especially
during the device probe time.
when accessed through multiple paths.

As of today this special device handling is done at the dm-multipath
layer using dm-handlers. That works well for (1); for (2) to be handled
at dm layer, scsi sense information need to be exported from SCSI to dm-layer,
which is not very attractive; (3) cannot be done at all at the dm layer.

Device handler has been moved to SCSI mainly to handle (2) and (3) properly.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Linux 2.6.26-rc5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
  tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
  tcp: Increment OUTRSTS in tcp_send_active_reset()
  raw: Raw socket leak.
  lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
  USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
  ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
  libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
  ipw2200: expire and use oldest BSS on adhoc create
  airo warning fix
  b43legacy: Fix controller restart crash
  sctp: Fix ECN markings for IPv6
  sctp: Flush the queue only once during fast retransmit.
  sctp: Start T3-RTX timer when fast retransmitting lowest TSN
  sctp: Correctly implement Fast Recovery cwnd manipulations.
  sctp: Move sctp_v4_dst_saddr out of loop
  sctp: retran_path update bug fix
  tcp: fix skb vs fack_count out-of-sync condition
  sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
  xfrm: xfrm_algo: correct usage of RIPEMD-160
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: switch /proc/led to seq_file
sparc64: IO accessors fix

l2tp: Fix possible oops if transmitting or receiving when tunnel goes down

Some problems have been experienced in the field which cause an oops
in the pppol2tp driver if L2TP tunnels fail while passing data.

The pppol2tp driver uses private data that is referenced via the
sk->sk_user_data of its UDP and PPPoL2TP sockets. This patch makes
sure that the driver uses sock_hold() when it holds a reference to the
sk pointer. This affects its sendmsg(), recvmsg(), getname(),
[gs]etsockopt() and ioctl() handlers.

Tested by ISP where problem was seen. System has been up 10 days with
no oops since running this patch. Without the patch, an oops would
occur every 1-2 days.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.

skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.

This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.

Based on idea and initial patch from Evgeniy Polyakov.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Increment OUTRSTS in tcp_send_active_reset()

TCP "resets sent" counter is not incremented when a TCP Reset is
sent via tcp_send_active_reset().

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

raw: Raw socket leak.

The program below just leaks the raw kernel socket

int main() {
        int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        inet_aton("127.0.0.1", &addr.sin_addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(2048);
        sendto(fd,  "a", 1, MSG_MORE, &addr, sizeof(addr));
        return 0;
}

Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

lt2p: Fix possible WARN_ON from socket code when UDP socket is closed

If an L2TP daemon closes a tunnel socket while packets are queued in
the tunnel's reorder queue, a kernel warning is logged because the
socket is closed while skbs are still referencing it. The fix is to
purge the queue in the socket's release handler.

WARNING: at include/net/sock.h:351 udp_lib_unhash+0x41/0x68()
Pid: 12998, comm: openl2tpd Not tainted 2.6.25 #8
[<c0423c58>] warn_on_slowpath+0x41/0x51
[<c05d33a7>] udp_lib_unhash+0x41/0x68
[<c059424d>] sk_common_release+0x23/0x90
[<c05d16be>] udp_lib_close+0x8/0xa
[<c05d8684>] inet_release+0x42/0x48
[<c0592599>] sock_release+0x14/0x60
[<c059299f>] sock_close+0x29/0x30
[<c046ef52>] __fput+0xad/0x15b
[<c046f1d9>] fput+0x17/0x19
[<c046c8c4>] filp_close+0x50/0x5a
[<c046da06>] sys_close+0x69/0x9f
[<c04048ce>] syscall_call+0x7/0xb

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6

USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g

Enable the Philips CPWUA054/00 in p54usb.

Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable

This fixes a context assertion in ssb that makes b44 print
out warnings on resume.

This fixes the following kernel oops:
http://www.kerneloops.org/oops.php?number=12732
http://www.kerneloops.org/oops.php?number=11410

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT

The size was two small by two bytes.

Signed-off-by: Holger Schurig
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ipw2200: expire and use oldest BSS on adhoc create

If there are no networks on the free list, expire the oldest one when
creating a new adhoc network. Because ipw2200 and the ieee80211 stack
don't actually cull old networks and place them back on the free list
unless they are needed for new probe responses, over time the free list
would become empty and creating an adhoc network would fail due to the !
list_empty(...) check.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

airo warning fix

WARNING: space prohibited between function name and open parenthesis '('
#22: FILE: drivers/net/wireless/airo.c:2907:
+ while ((IN4500 (ai, COMMAND) & COMMAND_BUSY) && (delay < 10000)) {

total: 0 errors, 1 warnings, 8 lines checked

./patches/wireless-airo-waitbusy-wont-delay.patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Dan Williams <dcbw@redhat.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

b43legacy: Fix controller restart crash

This fixes a kernel crash on rmmod, in the case where the controller
was restarted before doing the rmmod.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

sctp: Fix ECN markings for IPv6

Commit e9df2e8fd8fbc95c57dbd1d33dada66c4627b44c ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Flush the queue only once during fast retransmit.

When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Start T3-RTX timer when fast retransmitting lowest TSN

When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Correctly implement Fast Recovery cwnd manipulations.

Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Move sctp_v4_dst_saddr out of loop

There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: retran_path update bug fix

If the current retran_path is the only active one, it should
update it to the the next inactive one.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix

tcp: fix skb vs fack_count out-of-sync condition

This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
  1) DSACK in the middle of previous SACK block must be generated.
  2) In order to take that particular branch, part or all of the
     DSACKed segment must already be SACKed so that we have that
     in cache in the first place.
  3) The new info must be top enough so that fackets_out will be
     updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.

It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.

Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).

This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.

This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.

Make use of local_irq_save and local_irq_restore rather then the
deprecated save_and_cli and restore_flags calls.

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: xfrm_algo: correct usage of RIPEMD-160

This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).

Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV6]: Do not change protocol for UDPv6 sockets with pending sent data.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: inet_sk(sk)->cork.opt leak

IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: Do not change protocol for raw IPv6 sockets.

It is not allowed to change underlying protocol for
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6] NETNS: Handle ancillary data in appropriate namespace.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: Check outgoing interface even if source address is unspecified.

The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified. If the ipi6_ifindex is the not-exist interface,
it should be fail.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: Fix the data length of get destination options with short length

If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: Fix the return value of get destination options with NULL data pointer

If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);

This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6] ADDRCONF: Allow longer lifetime on 64bit archs.

- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
  by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
  helper functions: addrconf_timeout_fixup() and
  addrconf_finite_timeout().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6] TUNNEL6: Fix incoming packet length check for inter-protocol tunnel.

I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6] ADDRCONF: Check range of prefix length

As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).

Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.

[Use unsigned int instead to generate better code - yoshfuji]

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6] UDP: Possible dst leak in udpv6_sendmsg.

ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.

Bug report and initial patch by Denis V. Lunev <den@openvz.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>

[SCTP]: Fix NULL dereference of asoc.

Commit 7cbca67c073263c179f605bdbbdc565ab29d801d ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6

tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp))

It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.

I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).

This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fix uart_set_ldisc() function type

Commit 64e9159f5d2c4edf5fa6425031e556f8fddaf7e6 ("serial_core:
uart_set_ldisc infrastructure") introduced the ability for low-level
serial drivers to be informed when the tty ldisc changes.

However, the actual tty-layer function that does this callback for
serial devices was declared with the wrong type, having a spurious and
unused 'ldisc' argument.

This fixed the resulting compiler warning by just removing it.

Acked-by: Blithering Idiot <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

netfilter: nf_conntrack_ipv6: fix inconsistent lock state in nf_ct_frag6_gather()

[   63.531438] =================================
[   63.531520] [ INFO: inconsistent lock state ]
[   63.531520] 2.6.26-rc4 #7
[   63.531520] ---------------------------------
[   63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   63.531520]  (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[   63.531520] {softirq-on-W} state was registered at:
[   63.531520]   [<c0143bba>] __lock_acquire+0x3aa/0x1080
[   63.531520]   [<c0144906>] lock_acquire+0x76/0xa0
[   63.531520]   [<c07a8f0b>] _spin_lock+0x2b/0x40
[   63.531520]   [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...

According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.

Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: xt_connlimit: fix accouning when receive RST packet in ESTABLISHED state

In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4

Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  [ALSA] hda - COMPAL IFL90/JFL-92 laptop quirk
  [ALSA] hda - Fix resume of auto-config mode with Realtek codecs
  [ALSA] hda - Fix model for LG LS75 laptop
  [ALSA] hda - Fix mic input on HP2133
  [ALSA] ac97 - Fix ASUS A9T laptop output

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
  suspend-vs-iommu: prevent suspend if we could not resume
  x86: section mismatch fix
  x86: fix Xorg crash with xf86MapVidMem error
  x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
  x86: fix bad pmd ffff810000207xxx(9090909090909090)
  x86: ioremap fix failing nesting check
  x86: fix broken math-emu with lazy allocation of fpu area
  x86: enable preemption in delay
  x86: disable preemption in native_smp_prepare_cpus
  x86: fix APIC warning on 32bit v2

Smack: fuse mount hang fix

The d_instantiate hook for Smack can hang on the root inode of a
filesystem if the file system code has not really done all the set-up.
Fuse is known to encounter this problem.

This change detects an attempt to instantiate a root inode and addresses
it early in the processing, before any attempt is made to do something
that might hang.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-sff: Fix oops reported in kerneloops.org for pnp devices with no ctl
  libata: kill unused constants
  sata_mv: PHY_MODE4 cleanups
  [libata] ata_piix: more acer short cable quirks
  [libata] ACPI: Properly handle bay devices in dock stations

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] qla2xxx: Update version number to 8.02.01-k4.
  [SCSI] qla2xxx: Correct handling of AENs postings for vports.
  [SCSI] qla2xxx: Revert "qla2xxx: Use proper HA during asynchronous event handling."
  [SCSI] ibmvscsi: Non SCSI error status fixup
  [SCSI] fusion mpt: fix target missing after resetting external raid
  [SCSI] fix intermittent oops in scsi_bus_uevent
  [SCSI] qla2xxx: Update version number to 8.02.01-k3.
  [SCSI] qla2xxx: Revert "qla2xxx: Validate mid-layer 'underflow' during check-condition handling."
  [SCSI] qla2xxx: Disable local-interrupts while polling for RISC status.
  [SCSI] qla2xxx: Extend the 'fw_dump' SYSFS node the ability to initiate a firmware dump.
  [SCSI] qla2xxx: Don't depend on mailbox return values while enabling FCE tracing.
  [SCSI] qla2xxx: Convert vport_sem to a mutex
  [SCSI] qla2xxx: firmware semaphore to mutex
  [SCSI] qla2xxx: Correct locking within MSI-X interrupt handlers.
  [SCSI] qla2xxx: Display driver version at module init-time.
  [SCSI] qla2xxx: Return correct port_type to FC-transport for Vports.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.26

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.26:
  sh: Add defconfig for RSK7203.
  sh: Update SE7206 defconfig.
  sh: Disable 4KSTACKS on nommu.
  sh: fix miscompilation of ip_fast_csum with gcc >= 4.3
  sh: module.c use kernel unaligned helpers
  sh/kernel/cpu/irq/intc-sh5.c build fix

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdbts: Use HW breakpoints with CONFIG_DEBUG_RODATA
kgdb: use common ascii helpers and put_unaligned_be32 helper

bogus format in ip6mr

ptrdiff_t is %t..., not %Z...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

celleb_scc_pciex endianness misannotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mpc52xx_gpio iomem annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

s2io iomem annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cifs endianness fixes

__le16 fields used as host-endian.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Steve French <smfrench@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

isp1760-if iomem annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cdc-wdm endianness fixes

* wMaxPacketSize is le16; copying it to a field of local structure and then
using that field as host-endian (size of object to be allocated) is broken.
* bMaxPacketSize0 is 8-bit; feeding it to le16_to_cpu() is bogus and since the
result is used as host-endian, it's not even misspelled cpu_to_le16().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

usb/c67x00 endianness annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ibmaem endianness annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack

Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

It's a side effect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org

suspend-vs-iommu: prevent suspend if we could not resume

iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume. Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: section mismatch fix

Fix this:

WARNING: vmlinux.o(.text+0x114bb): Section mismatch in reference from
the function nopat() to the function .cpuinit.text:pat_disable()
The function nopat() references
the function __cpuinit pat_disable().
This is often because nopat lacks a __cpuinit
annotation or the annotation of pat_disable is wrong.

Reported-by: "Fabio Comolli" <fabio.comolli@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix Xorg crash with xf86MapVidMem error

Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest

Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix bad pmd ffff810000207xxx(9090909090909090)

OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages:
mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090).

Initialization's cleanup_highmap was leaving alignment filler
behind in the pmd for MODULES_VADDR: when vmalloc's guard page
would occupy a new page table, it's not allocated, and then
module unload's vfree hits the bad 9090 pmd entry left over.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: ioremap fix failing nesting check

Mika Kukkonen noticed that the nesting check in early_iounmap() is not
actually done.

Reported-by: Mika Kukkonen <mikukkon@srv1-m700-lanp.koti>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: torvalds@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: mikukkon@iki.fi
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix broken math-emu with lazy allocation of fpu area

Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.

math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: enable preemption in delay

The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!

Using ftrace I found the cause of the latency.

   pcscd-2995  3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
   pcscd-2995  3dN.2 52360301us : idle_cpu (irq_exit)
   pcscd-2995  3dN.2 52360301us : rcu_irq_exit (irq_exit)
   pcscd-2995  3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
)
   pcscd-2995  3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)

Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!

At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.

   pcscd-2995  3dN.2 52360836us : rcu_irq_exit (irq_exit)
   pcscd-2995  3.N.. 52361265us : preempt_schedule (__delay)

Looking at the x86 delay, I found my problem.

In git commit 35d5d08a085c56f153458c3f5d8ce24123617faf, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP.  Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).

Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.

[
  Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
    and to place the rep_nop between preempt_enabled/disable.
]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: akpm@osdl.org
Cc: Clark Williams <clark.williams@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>