Ilpo Järvinen [Thu, 22 Feb 2007 07:02:30 +0000 (23:02 -0800)]
[TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh
In case a latency spike causes more than one RTO, the later should not
cause the already reduced ssthresh to propagate into the prior_ssthresh
since FRTO declares all such RTOs spurious at once or none of them. In
treating of ssthresh, we mimic what tcp_enter_loss() does.
The previous state (in frto_counter) must be available until we have
checked it in tcp_enter_frto(), and also ACK information flag in
process_frto().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 22 Feb 2007 07:01:36 +0000 (23:01 -0800)]
[TCP] FRTO: Comment cleanup & improvement
Moved comments out from the body of process_frto() to the head
(preferred way; see Documentation/CodingStyle). Bonus: it's much
easier to read in this compacted form.
FRTO algorithm and implementation is described in greater detail.
For interested reader, more information is available in RFC4138.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[PARPORT] SUNBPP: Fix OOPS when debugging is enabled.
[SPARC] openprom: Switch to ref counting PCI API
Andrew Morton [Wed, 25 Apr 2007 20:01:21 +0000 (13:01 -0700)]
packet: fix error handling
The packet driver is assuming (reasonably) that the (undocumented)
request.errors is an errno. But it is in fact some mysterious bitfield. When
things go wrong we return weird positive numbers to the VFS as pointers and it
goes oops.
Thanks to William Heimbigner for reporting and diagnosis.
(It doesn't oops, but this driver still doesn't work for William)
Cc: William Heimbigner <icxcnika@mar.tar.cc> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
which resulted in infinite recursion and stack overflow.
The bug is present in all kernel versions since the feature appeared.
The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of the resulting
oops is seen here:
http://lkml.org/lkml/2007/4/15/41
Neil correctly diagnosed the situation for how this can happen: if two
concurrent requests with the exact same sector number (due to direct IO
or aliasing between MD and the raw device access), the alias handling
will add the request to the sortlist, but next_rq remains NULL.
Read the more complete analysis at:
http://lkml.org/lkml/2007/4/25/57
This looks like it requires md to trigger, even though it should
potentially be possible to due with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).
The fix is to move the ->next_rq update to when we add a request to the
rbtree. Then we remove the possibility for a request to exist in the
rbtree code, but not have ->next_rq correctly updated.
A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.
Note: We allow RH2 by default because it is harmless.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Tue, 24 Apr 2007 16:51:03 +0000 (12:51 -0400)]
drivers/net/hamradio/baycom_ser_fdx build fix
sparc64:
drivers/net/hamradio/baycom_ser_fdx.c: In function `ser12_open':
drivers/net/hamradio/baycom_ser_fdx.c:417: error: `NR_IRQS' undeclared (first us
e in this function)
drivers/net/hamradio/baycom_ser_fdx.c:417: error: (Each undeclared identifier is
reported only once
drivers/net/hamradio/baycom_ser_fdx.c:417: error: for each function it appears i
n.)
Cc: Folkert van Heusden <folkert@vanheusden.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dan Williams [Tue, 24 Apr 2007 14:20:06 +0000 (10:20 -0400)]
usb-net/pegasus: fix pegasus carrier detection
Broken by 4a1728a28a193aa388900714bbb1f375e08a6d8e which switched the
return semantics of read_mii_word() but didn't fix usage of
read_mii_word() to conform to the new semantics.
Setting carrier to off based on the NO_CARRIER flag is also incorrect as
that flag only triggers on TX failure and therefore isn't correct when
no frames are being transmitted. Since there is already a 2*HZ MII
carrier check going on, defer to that.
Add a TRUST_LINK_STATUS feature flag for adapters where the LINK_STATUS
flag is actually correct, and use that rather than the NO_CARRIER flag.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Neil Horman [Fri, 20 Apr 2007 13:54:58 +0000 (09:54 -0400)]
sis900: Allocate rx replacement buffer before rx operation
The sis900 driver appears to have a bug in which the receive routine
passes the skbuff holding the received frame to the network stack before
refilling the buffer in the rx ring. If a new skbuff cannot be allocated, the
driver simply leaves a hole in the rx ring, which causes the driver to stop
receiving frames and become non-recoverable without an rmmod/insmod according to
reporters. This patch reverses that order, attempting to allocate a replacement
buffer first, and receiving the new frame only if one can be allocated. If no
skbuff can be allocated, the current skbuf in the rx ring is recycled, dropping
the current frame, but keeping the NIC operational.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
The following patch fixes a kernel bug in depca_platform_probe().
We don't use a dynamic pointer for pldev->dev.platform_data, so it seems
that the correct way to proceed if platform_device_add(pldev) fails is
to explicitly set the pldev->dev.platform_data pointer to NULL, before
calling the platform_device_put(pldev), or it will be kfree'ed by
platform_device_release().
8250: fix possible deadlock between serial8250_handle_port() and serial8250_interrupt()
Commit 40b36daa introduced possibility that serial8250_backup_timeout() ->
serial8250_handle_port() locks port.lock without disabling irqs, thus
allowing deadlock against interrupt handler (port.lock is acquired in
serial8250_interrupt()).
Spotted by lockdep.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Mon, 23 Apr 2007 21:41:17 +0000 (14:41 -0700)]
reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock. As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.
This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.
Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit 34f5a39899f3f3e815da64f48ddb72942d86c366 restricted reading
of the tainted value. The attached patch changes this back to a
write-only check and restores the read behaviour of older versions.
Andrew Morton [Mon, 23 Apr 2007 21:41:13 +0000 (14:41 -0700)]
acpi-thermal: fix mod_timer() interval
Use relative time, not absolute. Discovered by Jung-Ik (John) Lee
<jilee@google.com>.
Cc: Jung-Ik (John) Lee <jilee@google.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the
primary fid associated with the dentry and destroys the v9fs_fid struct
after removing the file. If another process called v9fs_fid_lookup on the
same dentry, it may wait undefinitely for the fid's lock (as the struct is
freed).
This patch changes v9fs_remove to use a cloned fid, so the primary fid is
not locked and freed.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@hera.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan Richter [Mon, 23 Apr 2007 21:41:10 +0000 (14:41 -0700)]
ieee1394: update MAINTAINERS database
- update Ben's address
- replace Ben's contact by mine as raw1394's 2nd contact
- eth1394's and pcilynx's maintenance doesn't really differ from that
of other parts of the stack like video1394
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Ben Collins <ben.collins@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to. If we replace the page in the radix tree then we may have to
shift the count to another zone.
Suggested-by: Ethan Solomita <solo@google.com> Eventually-typed-in-by: Christoph Lameter <clameter@sgi.com> Cc: Martin Bligh <mbligh@mbligh.org> Cc: <stable@kernel.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Taskstats fix the structure members alignment issue
We broke the the alignment of members of taskstats to the 8 byte boundary
with the CSA patches. In the current kernel, the taskstats structure is
not suitable for use by 32 bit applications in a 64 bit kernel.
On x86_64
Offsets of taskstats' members (64 bit kernel, 64 bit application)
This is one way to solve the problem without re-arranging structure members
is to pack the structure. The patch adds an __attribute__((aligned(8))) to
the taskstats structure members so that 32 bit applications using taskstats
can work with a 64 bit kernel.
Using __attribute__((packed)) would break the 64 bit alignment of members.
The fix was tested on x86_64. After the fix, we got
Offsets of taskstats' members (64 bit kernel, 64 bit application)
fix OOM killing processes wrongly thought MPOL_BIND
I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
constrained_alloc().
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix possible NULL pointer access in 8250 serial driver
I encountered the following kernel panic. The cause of this problem was
NULL pointer access in check_modem_status() in 8250.c. I confirmed this
problem is fixed by the attached patch, but I don't know this is the
correct fix.
Fix the possible NULL pointer access in check_modem_status() in 8250.c. The
check_modem_status() would access 'info' member of uart_port structure, but it
is not initialized before uart_open() is called. The check_modem_status() can
be called through /proc/tty/driver/serial before uart_open() is called.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 24 Apr 2007 04:36:13 +0000 (21:36 -0700)]
oom: kill all threads that share mm with killed task
oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.
noreplacement is dangerous on modern systems because it will not replace the
context switch FNSAVE with SSE aware FXSAVE. But other places in the kernel still assume
SSE and do FXSAVE and the CPU will then access FXSAVE information with
FNSAVE and cause corruption.
Easiest way to avoid this is to remove the option. It was mostly for paranoia
reasons anyways and alternative()s have been stable for some time.
Thanks to Jeremy F. for reporting and helping debug it.
Joachim Deguara [Tue, 24 Apr 2007 11:05:36 +0000 (13:05 +0200)]
[PATCH] x86-64: make GART PTEs uncacheable
This patches fixes the silent data corruption problems being seen using the
GART iommu where 4kB of data where incorrect (seen mostly on Nvidia CK804
systems). This fix, to mark the memory regin the GART PTEs reside on as
uncacheable, also brings the code in line with the AGP specification.
Signed-off-by: Joachim Deguara <joachim.deguara@amd.com> Signed-off-by: Andi Kleen <ak@suse.de>
Patrick McHardy [Tue, 24 Apr 2007 05:39:02 +0000 (22:39 -0700)]
[XFRM]: beet: fix pseudo header length value
draft-nikander-esp-beet-mode-07.txt is not entirely clear on how the length
value of the pseudo header should be calculated, it states "The Header Length
field contains the length of the pseudo header, IPv4 options, and padding in
8 octets units.", but also states "Length in octets (Header Len + 1) * 8".
draft-nikander-esp-beet-mode-08-pre1.txt [1] clarifies this, the header length
should not include the first 8 byte.
This change affects backwards compatibility, but option encapsulation didn't
work until very recently anyway.
Change to defer congestion control initialization.
If setsockopt() was used to change TCP_CONGESTION before
connection is established, then protocols that use sequence numbers
to keep track of one RTT interval (vegas, illinois, ...) get confused.
Change the init hook to be called after handshake.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
ide/Kconfig: add missing range check for IDE_MAX_HWIFS
hpt366: fix kernel oops with HPT302N
ide/pci/delkin_cb.c: add new PCI ID
Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
requests over TCP"
The assumption made in xprt_transmit() that the condition
"req->rq_bytes_sent == 0 and request is on the receive list"
should imply that we're dealing with a retransmission is false.
Firstly, it may simply happen that the socket send queue was full
at the time the request was initially sent through xprt_transmit().
Secondly, doing this for each request that was retransmitted implies
that we disconnect and reconnect for _every_ request that happened to
be retransmitted irrespective of whether or not a disconnection has
already occurred.
Fix is to move this logic into the call_status request timeout handler.
NFS: Fix the 'desynchronized value of nfs_i.ncommit' error
Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().
Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.
Dave Jones [Fri, 20 Apr 2007 19:58:00 +0000 (15:58 -0400)]
Longhaul - Revert ACPI C3 on Longhaul ver. 2
Support for Longhaul ver. 2 broke driver for VIA C3 Eden 600MHz with
Samuel 2 core. Processor is not able to switch frequency anymore. I
don't know much about this issue at the moment, but until (if ever) I
will know why, this part should be reversed.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a 10-15% performance regression for sequential writes on TCQ/NCQ
enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
reported by Valerie Clement <valerie.clement@bull.net> and the Intel
testing folks. The regression is because of CFQ's now more aggressive
queue control, limiting the depth available to the device.
This patches fixes that regression by allowing a greater depth when only
one queue is busy. It has been tested to not impact sync-vs-async
workloads too much - we still do a lot better than 2.6.20.
Dave Johnson [Wed, 18 Apr 2007 14:39:41 +0000 (10:39 -0400)]
[MIPS] Fix wrong checksum for split TCP packets on 64-bit MIPS
I've traced down an off-by-one TCP checksum calculation error under
the following conditions:
1) The TCP code needs to split a full-sized packet due to a reduced
MSS (typically due to the addition of TCP options mid-stream like
SACK).
_AND_
2) The checksum of the 2nd fragment is larger than the checksum of the
original packet. After subtraction this results in a checksum for
the 1st fragment with bits 16..31 set to 1. (this is ok)
_AND_
3) The checksum of the 1st fragment's TCP header plus the previously
32bit checksum of the 1st fragment DOES NOT cause a 32bit overflow
when added together. This results in a checksum of the TCP header
plus TCP data that still has the upper 16 bits as 1's.
_THEN_
4) The TCP+data checksum is added to the checksum of the pseudo IP
header with csum_tcpudp_nofold() incorrectly (the bug).
The problem is the checksum of the TCP+data is passed to
csum_tcpudp_nofold() as an 32bit unsigned value, however the assembly
code acts on it as if it is a 64bit unsigned value.
This causes an incorrect 32->64bit extension if the sum has bit 31
set. The resulting checksum is off by one.
This problems is data and TCP header dependent due to #2 and #3
above so it doesn't occur on every TCP packet split.
Signed-off-by: Dave Johnson <djohnson+linux-mips@sw.starentnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
With commit 63dc68a8cf60cb110b147dab1704d990808b39e2, kernel can not
handle BUG() and BUG_ON() properly since get_user() returns false for
kernel code. Use __get_user() to skip unnecessary access_ok(). This
patch also make BRK_BUG code encoded in the TNE instruction.
[MIPS] Retry {save,restore}_fp_context if failed in atomic context.
The save_fp_context()/restore_fp_context() might sleep on accessing
user stack and therefore might lose FPU ownership in middle of them.
If these function failed due to "in_atomic" test in do_page_fault,
touch the sigcontext area in non-atomic context and retry these
save/restore operation.
This is a replacement of a (broken) fix which was titled "Allow CpU
exception in kernel partially".
The commit 4d40bff7110e9e1a97ff8c01bdd6350e9867cc10 ("Allow CpU
exception in kernel partially") was broken. The commit was to fix
theoretical problem but broke usual case. Revert it for now.
Mark Mason [Fri, 13 Apr 2007 17:32:25 +0000 (10:32 -0700)]
[MIPS] Add missing silicon revisions for BCM112x
Recent versions of the BCM112X processors aren't recognized by Linux
(preventing Linux from booting on those processors). This patch adds
support for those that are missing.
Signed-off-by: Mark Mason <mason@broadcom.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
but it's likely that the commit itself is not really introducing the
bug, but just allowing an unrelated problem to rear its ugly head (ie
one current working theory is that the code exposes us to a hardware
race condition by decreasing the amount of time we spend in each NAPI
poll cycle).
We'll revert it until root cause is known. Intel has a repeatable
reproduction on two different machines and bus traces of the hardware
doing something bad.
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: David S. Miller <davem@davemloft.net> Cc: Greg KH <gregkh@suse.de> Cc: Dave Jones <davej@redhat.com> Cc: Auke Kok <auke-jan.h.kok@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Thu, 19 Apr 2007 10:09:52 +0000 (11:09 +0100)]
pata_sis: Fix oops on boot
A small number of SiS setups require special handling (not many judging
by how long this dumb bug survived). A couple of Fedora 7 devel testers
hit an Oops on pata_sis loading which is caused by terminal confusion
between chipset as 'the chipset we have found' and chipset as 'array
iterator'
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Paul Mackerras [Thu, 19 Apr 2007 20:05:52 +0000 (13:05 -0700)]
[PPP]: Fix skbuff.c:BUG due incorrect logic in process_input_packet()
From: Paul Mackerras <paulus@samba.org>
This fixes:
Subject: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6
process_input_packet() treats the case where the first byte is 0xff
(PPP_ALLSTATIONS) but the second byte is 0x03 (PPP_UI) as indicating a
packet with a PPP protocol number of 0xff. Arguably that's wrong
since PPP protocol 0xff is reserved, and the RFC does envision the
possibility of receiving frames where the control field has values
other than 0x03.
Signed-off-by: David S. Miller <davem@davemloft.net>
The Yukon EC Ultra chips have transmit settings for store and
forward and PCI buffering. By setting these appropriately, normal
performance goes from 750Mbytes/sec to 940Mbytes/sec (non-jumbo).
It is also possible to do Jumbo mode, but it means turning off
TSO and checksum offload so the performance gets worse. There isn't
enough buffering for checksum offload to work.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
This device is having all sorts of problems that lead to data corruption
and system instability. It gets receive status and data out of order,
it generates descriptor and TSO errors, etc.
Until the problems are resolved, it should not be used by anyone
who cares about there system.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
The basic structure of "normal" UDP/IP/Ethernet
frames (that actually work):
- It starts with the Ethernet header (dest MAC, src MAC, etc.)
- The next part is occupied by the IP header (version info, length of
packet, id=0, fragment offset=0, checksum, from / to address, etc.)
- Then comes the UDP header (src / dest port, length, checksum)
- Actual payload
- Ethernet checksum
Now what's different for IP fragment:
- The IP header has id set to some value (same for all fragments),
offset is set appropriately (i.e. 0 for first fragment, following
according to size of other fragments), size is the length of the frame.
- UDP header is unchanged. I.e. length is according to full UDP
datagram, not just the part within the actual frame! But this is only
true within the first frame: all following frames don't have a valid
UDP-header at all.
The spidernet silicon seems to be quite intelligent: It's able to
compute (IP / UDP / Ethernet) checksums on the fly and tests if frames
are conforming to RFC -- at least conforming to RFC on complete frames.
But IP fragments are different as explained above:
I.e. for IP fragments containing part of a UDP datagram it sees
incompatible length in the headers for IP and UDP in the first frame
and, thus, skips this frame. But the content *is* correct for IP
fragments. For all following frames it finds (most probably) no valid
UDP header at all. But this *is* also correct for IP fragments.
The Linux IP-stack seems to be clever in this point. It expects the
spidernet to calculate the checksum (since the module claims to be able
to do so) and marks the skb's for "normal" frames accordingly
(ip_summed set to CHECKSUM_HW).
But for the IP fragments it does not expect the driver to be capable to
handle the frames appropriately. Thus all checksums are allready
computed. This is also flaged within the skb (ip_summed set to
CHECKSUM_NONE).
Unfortunately the spidernet driver ignores that hints. It tries to send
the IP fragments of UDP datagrams as normal UDP/IP frames. Since they
have different structure the silicon detects them the be not
"well-formed" and skips them.
The following one-liner against 2.6.21-rc2 changes this behavior. If the
IP-stack claims to have done the checksumming, the driver should not
try to checksum (and analyze) the frame but send it as is.
Signed-off-by: Norbert Eicker <n.eicker@fz-juelich.de> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Divy Le Ray [Tue, 17 Apr 2007 18:06:30 +0000 (11:06 -0700)]
cxgb3 - Fix low memory conditions
Reuse the incoming skb when a clientless abort req is recieved.
The release of RDMA connections HW resources might be deferred in
low memory situations.
Ensure that no further activity is passed up to the RDMA driver
for these connections.
Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Avi Kivity [Wed, 18 Apr 2007 08:18:18 +0000 (11:18 +0300)]
KVM: Fix off-by-one when writing to a nonpae guest pde
Nonpae guest pdes are shadowed by two pae ptes, so we double the offset
twice: once to account for the pte size difference, and once because we
need to shadow pdes for a single guest pde.
But when writing to the upper guest pde we also need to truncate the
lower bits, otherwise the multiply shifts these bits into the pde index
and causes an access to the wrong shadow pde. If we're at the end of the
page (accessing the very last guest pde) we can even overflow into the
next host page and oops.
[NETLINK]: Don't attach callback to a going-away netlink socket
There is a race between netlink_dump_start() and netlink_release()
that can lead to the situation when a netlink socket with non-zero
callback is freed.
spin_lock(&nlk->cb_lock);
if (nlk->cb) { /* false */
...
}
spin_unlock(&nlk->cb_lock);
spin_lock(&nlk->cb_lock);
if (nlk->cb) { /* false */
...
}
nlk->cb = cb;
spin_unlock(&nlk->cb_lock);
...
sock_orphan(sk);
/*
* proceed with releasing
* the socket
*/
The proposal it to make sock_orphan before detaching the callback
in netlink_release() and to check for the sock to be SOCK_DEAD in
netlink_dump_start() before setting a new callback.
Signed-off-by: Denis Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Olaf Kirch [Wed, 18 Apr 2007 22:07:22 +0000 (15:07 -0700)]
[IrDA]: Correctly handling socket error
This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.
This is against the latest net-2.6 and should be considered for -stable
inclusion.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[SCTP]: Do not interleave non-fragments when in partial delivery
The way partial delivery is currently implemnted, it is possible to
intereleave a message (either from another steram, or unordered) that
is not part of partial delivery process. The only way to this is for
a message to not be a fragment and be 'in order' or unorderd for a
given stream. This will result in bypassing the reassembly/ordering
queues where things live duing partial delivery, and the
message will be delivered to the socket in the middle of partial delivery.
This is a two-fold problem, in that:
1. the app now must check the stream-id and flags which it may not
be doing.
2. this clearing partial delivery state from the association and results
in ulp hanging.
This patch is a band-aid over a much bigger problem in that we
don't do stream interleave.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[BRIDGE]: Unaligned access when comparing ethernet addresses
[SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation.
[SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message
[NET]: Set a separate lockdep class for neighbour table's proxy_queue
[NET]: Fix UDP checksum issue in net poll mode.
[KEY]: Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx.
[NET]: Get rid of alloc_skb_from_cache