Mike Christie [Wed, 21 May 2008 20:54:17 +0000 (15:54 -0500)]
[SCSI] libiscsi: fix cmds_max setting
Drivers expect that the cmds_max value they pass to the iscsi layer
is the max scsi commands + mgmt tasks. This patch implements that
and fixes some checks for nr cmd limits.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:16 +0000 (15:54 -0500)]
[SCSI] iscsi class: Add session initiatorname and ifacename sysfs attrs.
This adds two new attrs used for creating initiator ports and
binding sessions to hardware.
The session level initiatorname:
Since bnx2i does a scsi_host per host device, we need to add the
iface initiator port settings on the session, so we can create
multiple initiator ports (each with different inames) per device/scsi_host.
The current iname reflects that qla4xxx can have one iname per hba, and we are
allocating a host per session for software. The iname on the host will
remain so we can export and set the hba level qla4xxx setting.
The ifacename attr:
To bind a session to a some peice of hardware in userspace we maintain
some mappings, but during boot or iscsid restart (iscsid contains the user
space part of the driver) we need to be able to figure out which of those
host mappings abstractions maps to certain sessions. This patch adds
a ifacename attr, which userspace can set to id the host side of the
endpoint across pivot_roots and iscsid restarts.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:14 +0000 (15:54 -0500)]
[SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup
This hooks iser into the iscsi endpoint code. Previously it handled the
lookup and allocation. This has been made generic so bnx2i and iser can
share it. It also allows us to pass iser the leading conn's ep, so we
know the ib_deivce being used and can set it as the scsi_host's parent.
And that allows scsi-ml to set the dma_mask based on those values.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:13 +0000 (15:54 -0500)]
[SCSI] iscsi class: add endpoint class
Add sysfs representation for the endpoint, so userspace can match the
host and session to the endpoint. This will allow us to set the host's
parent correctly at host creation time.
The next patches will convert tcp and iser, and fix iser's dma_mask
bug.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:12 +0000 (15:54 -0500)]
[SCSI] iscsi class: user device_for_each_child instead of duplicating session list
Currently we duplicate the list of sessions, because we were using the
test for if a session was on the host list to indicate if the session
was bound or unbound. We can instead use the target_id and fix up
the class so that drivers like bnx2i do not have to manage the target id
space.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:06 +0000 (15:54 -0500)]
[SCSI] libiscsi: merge iscsi_mgmt_task and iscsi_cmd_task
There is no need to have the mgmt and cmd tasks separate
structs. It used to save a lot of memory when we overprealocated
memory for tasks, but the next patches will set up the
driver so in the future they can use a mempool or some other
common scsi command allocator and common tagging.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:05 +0000 (15:54 -0500)]
[SCSI] libiscsi: modify libiscsi so it supports offloaded data paths
This patch modifies libiscsi, so drivers like bnx2i and iser can execute
a command from queuecommand/send_pdu instead of having to be queued to
be run in a workq.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently to get a ctask from the session cmd array, you have to
know to use the itt modifier. To make this easier on LLDs and
so in the future we can easilly kill the session array and use
the host shared map instead, this patch adds a nice wrapper
to strip the itt into a session->cmds index and return a ctask.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:03 +0000 (15:54 -0500)]
[SCSI] iser: fix handling of scsi cmnds during recovery.
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.
This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.
Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:02 +0000 (15:54 -0500)]
[SCSI] iscsi: modify iscsi printk so it can take driver data pointers
Some drivers want to be able to just pass in the driver data pointers
to the iscsi objects. To enable this we need the iscsi printk macro
to cast the object.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:54:01 +0000 (15:54 -0500)]
[SCSI] iscsi: remove session/conn_data_size from iscsi_transport
This removes the session and conn data_size fields from the iscsi_transport.
Just pass in the value like with host allocation. This patch also makes
it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Mike Christie [Wed, 21 May 2008 20:53:56 +0000 (15:53 -0500)]
[SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation
iscsi offload (bnx2i and qla4xx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behabior as before
where it allcoates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Prakash, Sathya [Tue, 20 May 2008 19:32:18 +0000 (01:02 +0530)]
[SCSI] mpt fusion : Adding FAULT Reset polling work
When the firmware is in Fault state it will be identifed only when the next time
the driver access the IOC state.
This patch includes a polling function in the driver which will be executed in
regular interval to check the status of the firmware and if it is in Fault
state, then the firmware will be reset by the driver.
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Christof Schmitt [Mon, 19 May 2008 10:17:47 +0000 (12:17 +0200)]
[SCSI] zfcp: Fix sparse warning by providing new entry in dbf
drivers/s390/scsi/zfcp_dbf.c:692:2: warning: context imbalance in
'zfcp_rec_dbf_event_thread' - different lock contexts for basic block
Replace the parameter indicating if the lock is held with a new entry
function that only acquires the lock. This makes the lock handling
more visible and removes the sparse warning.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Martin Peschke [Mon, 19 May 2008 10:17:46 +0000 (12:17 +0200)]
[SCSI] zfcp: remove some __attribute__ ((packed))
There is no need to pack data structures which describe the
contents of records in the new recovery trace.
lcrash currently depends on the binary format for the other traces,
removing the packed attribute from all traces would break trace
debugging with lcrash.
Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Martin Peschke [Mon, 19 May 2008 10:17:45 +0000 (12:17 +0200)]
[SCSI] zfcp: Refine trace levels of some recovery related events.
This change better spreads trace levels of recovery related events.
There was an overlap of traces for some recovery triggers and the
processing of recovery actions.
Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Martin Peschke [Mon, 19 May 2008 10:17:44 +0000 (12:17 +0200)]
[SCSI] zfcp: Add information about interrupt to trace.
Store the index of the buffer in the inbound queue used to report
request completion in trace record for request coompletion.
This piece of information allows to better compare qdio and zfcp traces.
Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Christof Schmitt [Mon, 19 May 2008 10:17:38 +0000 (12:17 +0200)]
[SCSI] zfcp: Fix mempool pointer for GID_PN request allocation
When allocating memory for GID_PN nameserver requests, the allocation
function stores the pointer to the mempool, but then overwrites the
pointer via memset. Later, the wrong function to free the memory will
be called, since this is based on the stored pointer.
Fix this by first initializing the struct and then storing the pointer.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Swen Schillig [Mon, 19 May 2008 10:17:37 +0000 (12:17 +0200)]
[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall
Processing of an unsolicted status request can lead to a locking race
of the request_queue's queue_lock during the recreation of the
used up status read request while still in interrupt context
of the response handler.
Detaching the 'refill' of the long running status read requests from
the handler to a scheduled work is solving this issue.
In addition, each refill-run is trying to re-establish the full amount
of status read requests, which might have failed in earlier runs.
James Bottomley [Sun, 9 Mar 2008 00:24:17 +0000 (18:24 -0600)]
[SCSI] make use of the residue value
USB sometimes doesn't return an error but instead returns a residue
value indicating part (or all) of the command wasn't completed. So if
the driver _done() error processing indicates the command was fully
processed, subtract off the residue so that this USB error gets
propagated.
Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Alan Stern [Fri, 7 Mar 2008 16:20:25 +0000 (11:20 -0500)]
[SCSI] SCSI: remove dev->power.power_state from mesh driver
power.power_state is scheduled for removal. This patch (as1055)
removes all uses of that field from the SCSI mesh driver.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul Mackerras <paulus@au.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Harvey Harrison [Thu, 8 May 2008 19:08:53 +0000 (12:08 -0700)]
[SCSI] aacraid: linit.c make aac_show_serial_number static
drivers/scsi/aacraid/linit.c:865:9: warning: symbol 'aac_show_serial_number' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Mark Salyzyn <Mark_Salyzyn@adaptec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Each sysfs attribute provides the available data: min, max and sum for
fabric and channel latency and the number of requests processed.
An overrun of the variables is neither detected nor treated. The file
has to be read twice to make a meaningful statement, because only the
differences of the values between the two reads can be used. A reset
of the values can be achieved by writing to the attribute.
[SCSI] zfcp: Track fabric and channel latencies provided by FCP adapter
Add the infrastructure to retrieve the fabric and channel latencies
from FSF commands for each SCSI command that has been processed. For
each unit, the sum, min, max and number of requests is tracked.
This patch just removes the dm layer's path initialization completion
routine. This is separated from the other patch(scsi_dh: Use SCSI
device handler in dm-multipath) Just to make that patch more readable.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] scsi_dh: Add a single threaded workqueue for initializing paths
Before this patch set (SCSI hardware handlers), initialization of a
path was done asynchronously. Doing that requires a workqueue in each
device/hardware handler module and leads to unneccessary complication
in the device handler code, making it difficult to read the code and
follow the state diagram.
Moving that workqueue to this level makes the device handler code simpler.
Hence, the workqueue is moved to dm level.
A new workqueue is added instead of adding it to the existing workqueue
(kmpathd) for the following reasons:
1. Device activation has to happen faster, stacking them along
with the other workqueue might lead to unnecessary delay
in the activation of the path.
2. The effect could be felt the other way too. i.e the current
events that are handled by the existing workqueue might get
a delayed response.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] scsi_dh: Use SCSI device handler in dm-multipath
This patch converts dm-mpath to use scsi device handlers instead of
dm's hardware handlers.
This patch does not add any new functionality. Old behaviors remain and
userspace tools work as is except that arguments supplied with hardware
handler are ignored.
One behavioral exception is: Activation of a path is synchronous in this
patch, opposed to the older behavior of being asynchronous (changed in
patch 07: scsi_dh: Add a single threaded workqueue for initializing a path)
Note: There is no need to get a reference for the device handler module
(as it was done in the dm hardware handler case) here as the reference
is held when the device was first found. Instead we check and make sure
that support for the specified device is present at table load time.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] scsi_dh: add infrastructure for SCSI Device Handlers
Some of the storage devices (that can be accessed through multiple paths),
do need some special handling for
1. Activating the passive path of the storage access.
2. Decode and handle the special sense codes returned by the devices.
3. Handle the I/Os being sent to the passive path, especially
during the device probe time.
when accessed through multiple paths.
As of today this special device handling is done at the dm-multipath
layer using dm-handlers. That works well for (1); for (2) to be handled
at dm layer, scsi sense information need to be exported from SCSI to dm-layer,
which is not very attractive; (3) cannot be done at all at the dm layer.
Device handler has been moved to SCSI mainly to handle (2) and (3) properly.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
tcp: Increment OUTRSTS in tcp_send_active_reset()
raw: Raw socket leak.
lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
ipw2200: expire and use oldest BSS on adhoc create
airo warning fix
b43legacy: Fix controller restart crash
sctp: Fix ECN markings for IPv6
sctp: Flush the queue only once during fast retransmit.
sctp: Start T3-RTX timer when fast retransmitting lowest TSN
sctp: Correctly implement Fast Recovery cwnd manipulations.
sctp: Move sctp_v4_dst_saddr out of loop
sctp: retran_path update bug fix
tcp: fix skb vs fack_count out-of-sync condition
sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
xfrm: xfrm_algo: correct usage of RIPEMD-160
...
James Chapman [Wed, 4 Jun 2008 22:54:07 +0000 (15:54 -0700)]
l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
Some problems have been experienced in the field which cause an oops
in the pppol2tp driver if L2TP tunnels fail while passing data.
The pppol2tp driver uses private data that is referenced via the
sk->sk_user_data of its UDP and PPPoL2TP sockets. This patch makes
sure that the driver uses sock_hold() when it holds a reference to the
sk pointer. This affects its sendmsg(), recvmsg(), getname(),
[gs]etsockopt() and ioctl() handlers.
Tested by ISP where problem was seen. System has been up 10 days with
no oops since running this patch. Without the patch, an oops would
occur every 1-2 days.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.
This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.
Based on idea and initial patch from Evgeniy Polyakov.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.
Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 4 Jun 2008 22:07:32 +0000 (15:07 -0700)]
lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
If an L2TP daemon closes a tunnel socket while packets are queued in
the tunnel's reorder queue, a kernel warning is logged because the
socket is closed while skbs are still referencing it. The fix is to
purge the queue in the socket's release handler.
Dan Williams [Thu, 29 May 2008 18:38:28 +0000 (14:38 -0400)]
ipw2200: expire and use oldest BSS on adhoc create
If there are no networks on the free list, expire the oldest one when
creating a new adhoc network. Because ipw2200 and the ieee80211 stack
don't actually cull old networks and place them back on the free list
unless they are needed for new probe responses, over time the free list
would become empty and creating an adhoc network would fail due to the !
list_empty(...) check.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Andrew Morton [Wed, 28 May 2008 19:40:39 +0000 (12:40 -0700)]
airo warning fix
WARNING: space prohibited between function name and open parenthesis '('
#22: FILE: drivers/net/wireless/airo.c:2907:
+ while ((IN4500 (ai, COMMAND) & COMMAND_BUSY) && (delay < 10000)) {
total: 0 errors, 1 warnings, 8 lines checked
./patches/wireless-airo-waitbusy-wont-delay.patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Dan Williams <dcbw@redhat.com> Cc: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vlad Yasevich [Wed, 4 Jun 2008 19:40:15 +0000 (12:40 -0700)]
sctp: Fix ECN markings for IPv6
Commit e9df2e8fd8fbc95c57dbd1d33dada66c4627b44c ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2008 19:39:36 +0000 (12:39 -0700)]
sctp: Flush the queue only once during fast retransmit.
When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2008 19:39:11 +0000 (12:39 -0700)]
sctp: Start T3-RTX timer when fast retransmitting lowest TSN
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2008 19:38:43 +0000 (12:38 -0700)]
sctp: Correctly implement Fast Recovery cwnd manipulations.
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gui Jianfeng [Wed, 4 Jun 2008 19:38:07 +0000 (12:38 -0700)]
sctp: Move sctp_v4_dst_saddr out of loop
There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gui Jianfeng [Wed, 4 Jun 2008 19:37:33 +0000 (12:37 -0700)]
sctp: retran_path update bug fix
If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 4 Jun 2008 19:07:44 +0000 (12:07 -0700)]
tcp: fix skb vs fack_count out-of-sync condition
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
1) DSACK in the middle of previous SACK block must be generated.
2) In order to take that particular branch, part or all of the
DSACKed segment must already be SACKed so that we have that
in cache in the first place.
3) The new info must be top enough so that fackets_out will be
updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.
It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.
Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).
This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.
This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).
Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 4 Jun 2008 11:49:07 +0000 (15:49 +0400)]
[IPV6]: inet_sk(sk)->cork.opt leak
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
[IPV6]: Check outgoing interface even if source address is unspecified.
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified. If the ipi6_ifindex is the not-exist interface,
it should be fail.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.
Yang Hongyang [Wed, 28 May 2008 08:27:28 +0000 (16:27 +0800)]
[IPV6]: Fix the data length of get destination options with short length
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Yang Hongyang [Wed, 28 May 2008 08:23:47 +0000 (16:23 +0800)]
[IPV6]: Fix the return value of get destination options with NULL data pointer
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);
This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
[IPV6] ADDRCONF: Allow longer lifetime on 64bit archs.
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
helper functions: addrconf_timeout_fixup() and
addrconf_finite_timeout().
Colin [Mon, 26 May 2008 16:04:43 +0000 (00:04 +0800)]
[IPV6] TUNNEL6: Fix incoming packet length check for inter-protocol tunnel.
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.
Thomas Graf [Wed, 28 May 2008 14:54:22 +0000 (16:54 +0200)]
[IPV6] ADDRCONF: Check range of prefix length
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).
Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.
[Use unsigned int instead to generate better code - yoshfuji]
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Commit 7cbca67c073263c179f605bdbbdc565ab29d801d ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Ilpo Järvinen [Wed, 4 Jun 2008 18:34:22 +0000 (11:34 -0700)]
tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp))
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.
I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).
This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 4 Jun 2008 17:35:03 +0000 (10:35 -0700)]
Fix uart_set_ldisc() function type
Commit 64e9159f5d2c4edf5fa6425031e556f8fddaf7e6 ("serial_core:
uart_set_ldisc infrastructure") introduced the ability for low-level
serial drivers to be informed when the tty ldisc changes.
However, the actual tty-layer function that does this callback for
serial devices was declared with the wrong type, having a spurious and
unused 'ldisc' argument.
This fixed the resulting compiler warning by just removing it.
According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dong Wei [Wed, 4 Jun 2008 16:57:51 +0000 (09:57 -0700)]
netfilter: xt_connlimit: fix accouning when receive RST packet in ESTABLISHED state
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4
Signed-off-by: Dong Wei <dwei.zh@gmail.com> Acked-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 4 Jun 2008 16:15:51 +0000 (09:15 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
suspend-vs-iommu: prevent suspend if we could not resume
x86: section mismatch fix
x86: fix Xorg crash with xf86MapVidMem error
x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
x86: fix bad pmd ffff810000207xxx(9090909090909090)
x86: ioremap fix failing nesting check
x86: fix broken math-emu with lazy allocation of fpu area
x86: enable preemption in delay
x86: disable preemption in native_smp_prepare_cpus
x86: fix APIC warning on 32bit v2
Casey Schaufler [Mon, 2 Jun 2008 17:04:32 +0000 (10:04 -0700)]
Smack: fuse mount hang fix
The d_instantiate hook for Smack can hang on the root inode of a
filesystem if the file system code has not really done all the set-up.
Fuse is known to encounter this problem.
This change detects an attempt to instantiate a root inode and addresses
it early in the processing, before any attempt is made to do something
that might hang.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 4 Jun 2008 15:36:56 +0000 (08:36 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-sff: Fix oops reported in kerneloops.org for pnp devices with no ctl
libata: kill unused constants
sata_mv: PHY_MODE4 cleanups
[libata] ata_piix: more acer short cable quirks
[libata] ACPI: Properly handle bay devices in dock stations