It may take a little staring to notice, but pte can actually fall off the
end of the pte page in this iteration, which makes life difficult for
kmap_atomic() and the users not expecting it to BUG(). Of course, we're
somewhat lucky in that arithmetic elsewhere in the function guarantees that
at least one iteration is made, lest this force larger rearrangements to be
made. This issue and patch also apply to non-mm mainline and, with trivial
adjustments, to at least two related kernels.
Discovered during internal testing at Oracle.
Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kirill Korotaev [Wed, 25 May 2005 02:29:47 +0000 (19:29 -0700)]
[PATCH] sigkill priority fix
If SIGKILL does not have priority, we cannot kill a task instantly, before it
does some unexpected work. This can be critical, but we were unable to
reproduce it easily until Heiko Carstens <Heiko.Carstens@de.ibm.com>
reported the problem on LKML.
Dominik Hackl [Wed, 25 May 2005 02:29:46 +0000 (19:29 -0700)]
[PATCH] voyager_smp.c static inline fix
This patch fixes a compile bug by moving a static inline function to the
right place. The body of a static inline function has to be defined
before the function is used.
Signed-off-by: Dominik Hackl <dominik@hackl.dhs.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
John W. Linville [Wed, 18 May 2005 17:41:33 +0000 (13:41 -0400)]
[PATCH] tulip: add return to ULI526X clause in tulip_mdio_write
The 'if' clause for ULI526X in tulip_mdio_write allows for
spin_unlock_irqrestore to be called twice for tp->mii_lock. I believe
this is caused by the unintentional omission of a return at the end
of that clause. This patch adds that return.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
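The control-flow bug above can be illustrated with a small sketch. It assumes a mock lock that simply counts releases; mock_lock, mdio_write_sketch and the is_uli526x flag are hypothetical names, not the tulip driver's code. Without the early return, the chip-specific clause and the common tail would both release the lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for tp->mii_lock: counts releases so that a
 * double unlock becomes observable. */
struct mock_lock {
    int held;
    int unlock_count;
};

static void mock_lock_acquire(struct mock_lock *l)
{
    l->held = 1;
}

static void mock_lock_release(struct mock_lock *l)
{
    l->held = 0;
    l->unlock_count++;
}

/* Sketch of the shape of tulip_mdio_write(): the ULI526X clause drops
 * the lock itself, so it must return instead of falling through to the
 * common unlock at the end. */
void mdio_write_sketch(struct mock_lock *l, bool is_uli526x)
{
    mock_lock_acquire(l);
    if (is_uli526x) {
        /* ... chip-specific register access would go here ... */
        mock_lock_release(l);
        return; /* the return the patch adds */
    }
    /* ... generic MDIO bit-banging would go here ... */
    mock_lock_release(l);
}
```

Each call now releases the lock exactly once, whichever path it takes.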
Kay Sievers [Mon, 23 May 2005 22:50:26 +0000 (15:50 -0700)]
[PATCH] driver core: restore event order for device_add()
As a result of the split of the kobject-registration and the
corresponding hotplug event, the order of events for device_add() has
changed. This restores the old order, because the new one confused some
userspace applications.
Herbert Xu [Mon, 23 May 2005 20:11:07 +0000 (13:11 -0700)]
[IPV6]: Fix xfrm tunnel oops with large packets
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 23 May 2005 19:36:25 +0000 (12:36 -0700)]
[CRYPTO]: Only reschedule if !in_atomic()
The netlink gfp_any() problem made me double-check the uses of in_softirq()
in crypto/*. It seems to me that we should be checking in_atomic() instead
of in_softirq() in crypto_yield. Otherwise people calling the crypto ops
with spin locks held or preemption disabled will get burnt, right?
Signed-off-by: David S. Miller <davem@davemloft.net>
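Why in_softirq() is the wrong test can be seen with a simplified preempt-count model. The layout below (low byte for preempt_disable() nesting, next byte for softirq/bh nesting) is an assumption loosely following the 2.6-era scheme, and the MODEL_* names and helpers are hypothetical. With a spinlock held, only the preempt byte is non-zero, so the old test wrongly allows a reschedule.

```c
#include <stdbool.h>

/* Simplified preempt-count model (assumed layout, loosely after the
 * 2.6-era scheme): the low byte counts preempt_disable() nesting, as
 * taken by spin_lock(); the next byte counts softirq/bh disabling. */
#define MODEL_PREEMPT_OFFSET 0x001u
#define MODEL_SOFTIRQ_OFFSET 0x100u
#define MODEL_SOFTIRQ_MASK   0xff00u

static bool model_in_softirq(unsigned int count)
{
    return (count & MODEL_SOFTIRQ_MASK) != 0;
}

static bool model_in_atomic(unsigned int count)
{
    /* Any non-zero count means we must not sleep. */
    return count != 0;
}

/* Old crypto_yield-style test: misses the "spinlock held" case. */
bool may_resched_old(unsigned int count)
{
    return !model_in_softirq(count);
}

/* Test proposed in the text: refuse whenever the context is atomic. */
bool may_resched_new(unsigned int count)
{
    return !model_in_atomic(count);
}
```

Both tests agree in softirq context; they differ only when preemption is disabled without bottom halves being disabled, which is exactly the case the mail worries about.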
David S. Miller [Mon, 23 May 2005 19:03:06 +0000 (12:03 -0700)]
[TCP]: Fix stretch ACK performance killer when doing ucopy.
When we are doing ucopy, we try to defer the ACK generation to
cleanup_rbuf(). This works very well most of the time, but if the
ucopy prequeue is large, this ACKing behavior kills performance.
With TSO, it is possible for the prequeue to grow so large that by the
time the ACK is sent and gets back to the sender, most of the window
has emptied of data and performance suffers significantly.
This behavior does help in some cases, so we should think about
re-enabling this trick in the future, using some kind of limit in
order to avoid the bug case.
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware sync of the timebase on SMP G5s uses a black magic
incantation to the i2c clock chip that was inspired by what Darwin
does.
However, this was an earlier version of Darwin that was ... buggy!
Heh. This causes the latest models to break when starting SMP,
so it's worth fixing.
Here's a new version of the incantation based on careful transcription
of the said incantations as found in the latest version of Apple's
temple.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The latest speedbumped Apple G5 models have a "bug" in the Open Firmware
device tree that lacks the proper interrupt routing information for the
northbridge i2c controller. Apple's driver silently falls back into a
sub-optimal "polled" mode (heh, maybe they didn't even notice the bug
because of that :), but our driver didn't check properly and crashed :(
This patch fixes our driver to not crash, and adds code to the
prom_init() OF trampoline code that detects the "bug" and adds the
missing information back for this chipset revision. This fixes booting
and thermal control on these models.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] fix for __generic_file_aio_read() to return 0 on EOF
I came across the following problem while running ltp-aiodio testcases from
ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the tests with
EXT3 as well as JFS filesystems.
One or two fsx-linux testcases were hung after some time. These testcases
were hanging at wait_for_all_aios().
Debugging shows that there were some iocbs which were not getting completed
even though the last retry for them returned -EIOCBQUEUED. Also, all such
pending iocbs were READ operations.
Further debugging revealed that all such iocbs hit EOF in the DIO layer.
To be more precise, the "pos" from which they were trying to read was
greater than the "size" of the file. So the generic_file_direct_IO
returned 0.
This happens rarely, as __generic_file_aio_read() already checks whether
"pos" < "size" before calling the direct IO routine.
But for READ, we take inode->i_sem only in the DIO layer. So it
is possible that some other process changes the size of the file before
we take the i_sem. In such a case (when "pos" > "size"),
__generic_file_aio_read() would return -EIOCBQUEUED even though
no I/O requests were submitted by the DIO layer. This would cause the AIO
layer to expect an aio_complete() for the iocb, which does not happen. And
thus the test hangs forever, waiting for an I/O completion when no
requests were submitted at all.
The following patch makes __generic_file_aio_read() return 0 (instead of
returning -EIOCBQUEUED), on getting 0 from generic_file_direct_IO(), so
that the AIO layer does the aio_complete().
Testing:
I have tested the patch on an SMP machine (with two Pentium 4 (HT) CPUs) running
linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the
fsx-linux tests hung. Also the aio-stress tests ran without any problem.
Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
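The return-value rule described above can be sketched as a pure function. This is an illustration of the idea, not the actual patch: aio_read_result_sketch and its parameters are hypothetical, and SKETCH_EIOCBQUEUED is an assumed stand-in for the kernel-internal EIOCBQUEUED code, which userspace errno.h does not define.

```c
/* Assumed value of the kernel-internal EIOCBQUEUED code; it is not
 * available from userspace errno.h, so the sketch defines its own. */
#define SKETCH_EIOCBQUEUED 529

/* dio_ret: what generic_file_direct_IO() returned.
 * is_sync: whether the caller waits synchronously (is_sync_kiocb()).
 * Only report "queued" when I/O was really submitted (a positive
 * count). A 0 return means EOF and must be passed back unchanged so the
 * AIO layer completes the iocb itself. */
long aio_read_result_sketch(long dio_ret, int is_sync)
{
    if (dio_ret > 0 && !is_sync)
        return -SKETCH_EIOCBQUEUED;
    return dio_ret;
}
```

Errors (negative values) and EOF (0) flow straight back to the caller; only a successful asynchronous submission turns into the "queued" code.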
This patch fixes a bug introduced by Al Viro's patch: [patch 136/174]
reiserfs endianness: clone struct reiserfs_key
The problem is MAX_KEY and MAX_IN_CORE_KEY defined in this patch do not
look equal from reiserfs comp_key's point of view. This caused reiserfs'
sanity check to complain.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Samuel Thibault [Sat, 21 May 2005 15:50:15 +0000 (17:50 +0200)]
[PATCH] spin_unlock_bh() and preempt_check_resched()
In _spin_unlock_bh(lock):
do { \
_raw_spin_unlock(lock); \
preempt_enable(); \
local_bh_enable(); \
__release(lock); \
} while (0)
there is no reason for using preempt_enable() instead of a simple
preempt_enable_no_resched().
Since we know bottom halves are disabled, preempt_schedule() will always
return at once (preempt_count!=0), and hence preempt_check_resched() is
useless here...
This fixes it by using "preempt_enable_no_resched()" instead of the
"preempt_enable()", and thus avoids the useless preempt_check_resched()
just before re-enabling bottom halves.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
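The argument can be checked with a toy model in which one counter holds both preempt-disable and bh-disable nesting, and every reschedule check is recorded. This is an assumption-laden model, not kernel code: cpu_model, MODEL_BH_OFFSET and the model_* helpers are hypothetical, and local_bh_enable() is modeled as ending with a reschedule check.

```c
/* Toy model (an assumption, not kernel code): one counter holds both
 * preempt-disable nesting and bh-disable nesting. A reschedule check
 * can only do something once the whole count has dropped to zero. */
#define MODEL_BH_OFFSET 0x100u

struct cpu_model {
    unsigned int count;
    unsigned int checks;        /* reschedule checks performed */
    unsigned int useful_checks; /* checks made with count == 0 */
};

static void check_resched(struct cpu_model *c)
{
    c->checks++;
    if (c->count == 0)
        c->useful_checks++;
}

static void model_preempt_enable(struct cpu_model *c)
{
    c->count -= 1;
    check_resched(c); /* futile while bh is still disabled */
}

static void model_preempt_enable_no_resched(struct cpu_model *c)
{
    c->count -= 1;
}

static void model_local_bh_enable(struct cpu_model *c)
{
    c->count -= MODEL_BH_OFFSET;
    check_resched(c); /* the check that can actually matter */
}

/* State right after spin_lock_bh(): bh and preemption disabled. */
void model_lock_bh(struct cpu_model *c)
{
    c->count += MODEL_BH_OFFSET + 1;
}

void unlock_bh_old(struct cpu_model *c)
{
    model_preempt_enable(c);
    model_local_bh_enable(c);
}

void unlock_bh_new(struct cpu_model *c)
{
    model_preempt_enable_no_resched(c);
    model_local_bh_enable(c);
}
```

In the model, both unlock variants end with the same single useful check; the old one merely adds a check that runs while the count is still non-zero.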
David Woodhouse [Sat, 21 May 2005 14:52:23 +0000 (15:52 +0100)]
When we detect that a 16550 was in fact part of a NatSemi SuperIO chip
with high-speed mode enabled, we switch it to high-speed mode so that
baud_base becomes 921600. However, we also need to multiply the baud
divisor by 8 at the same time, in case it's already in use as a console.
Signed-off-by: David Woodhouse Acked-by: Tom Rini Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
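The factor of 8 follows from the usual 16550 arithmetic. The divisor for a line rate is baud_base / baud, and the common baud_base of 115200 (a 1.8432 MHz clock divided by 16) becomes 921600 = 8 × 115200 in high-speed mode. A sketch with hypothetical helper names:

```c
/* Divisor a 16550 needs for a target rate: baud_base / baud. */
unsigned int uart_divisor(unsigned int baud_base, unsigned int baud)
{
    return baud_base / baud;
}

/* Keep an already-programmed line rate (e.g. a console) unchanged when
 * baud_base jumps from 115200 to 921600. */
unsigned int rescale_divisor_for_highspeed(unsigned int old_divisor)
{
    return old_divisor * 8; /* 921600 / 115200 == 8 */
}
```

For a 9600 baud console, the divisor goes from 12 to 96. Without the rescale, the console would suddenly run at 8 × 9600.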
Pierre Ossman [Sat, 21 May 2005 09:27:02 +0000 (10:27 +0100)]
[PATCH] MMC: Proper MMC command classes support
Defines for the different command classes as defined in the MMC and SD
specifications.
Removes the check for high command classes and instead checks that the
command classes needed are present.
The previous solution killed forward compatibility for no apparent gain.
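The new check can be sketched as a mask test. Unknown higher classes are ignored, and only the classes the driver needs must be present. The bit positions follow the MMC/SD class numbering (class 0 basic, class 2 block read, class 4 block write), but the CCC_* macro names and card_usable() are hypothetical.

```c
#include <stdbool.h>

/* Command class bits, numbered as in the MMC/SD class scheme; the
 * macro names are made up for this sketch. */
#define CCC_BASIC       (1u << 0)
#define CCC_BLOCK_READ  (1u << 2)
#define CCC_BLOCK_WRITE (1u << 4)

/* Accept the card if every needed class is advertised. Extra classes
 * a future card might report are simply ignored. */
bool card_usable(unsigned int card_ccc, unsigned int needed)
{
    return (card_ccc & needed) == needed;
}
```

A card that advertises an unknown class 11 is still accepted, while one that lacks block write is not.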
Andi Kleen [Fri, 20 May 2005 21:27:58 +0000 (14:27 -0700)]
[PATCH] x86_64: Fix 32bit system call restart
The test case at
http://cvs.sourceforge.net/viewcvs.py/posixtest/posixtestsuite/conformance/interfaces/clock_nanosleep/1-5.c
fails if it runs as a 32bit process on x86_64 machines.
The root cause is that the 32bit process fails to restart the syscall after it
is interrupted by a signal.
The syscall number of sys_restart_syscall in table sys_call_table is
__NR_restart_syscall (219) while it's __NR_ia32_restart_syscall
(0) in ia32_sys_call_table. When regs->rax==(unsigned
long)-ERESTART_RESTARTBLOCK, function do_signal doesn't distinguish if
the process is 64bit or 32bit, and always sets restart syscall number
as __NR_restart_syscall (219).
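The fix therefore has to choose the restart number according to the ABI of the interrupted task. Here is a minimal sketch using the two numbers quoted above; restart_syscall_nr() and its is_ia32 flag are hypothetical stand-ins for whatever per-task test do_signal uses.

```c
#include <stdbool.h>

/* Numbers quoted in the changelog text. */
#define NR_RESTART_SYSCALL_64   219 /* __NR_restart_syscall */
#define NR_RESTART_SYSCALL_IA32 0   /* __NR_ia32_restart_syscall */

/* Pick the restart number from the interrupted task's syscall table. */
long restart_syscall_nr(bool is_ia32)
{
    return is_ia32 ? NR_RESTART_SYSCALL_IA32 : NR_RESTART_SYSCALL_64;
}
```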
Andi Kleen [Fri, 20 May 2005 21:27:56 +0000 (14:27 -0700)]
[PATCH] x86_64: Don't allow accesses below register frame in ptrace
There was an "off by one quad word" error in there. I don't think it is
exploitable because it will only store into an unused area, but it is better
to plug it.
Andi Kleen [Fri, 20 May 2005 21:27:55 +0000 (14:27 -0700)]
[PATCH] x86_64: 386/x86-64 Further AMD dual core fixes
- Remove duplicated ifdef
- Make core_id match what Intel uses
- Initialize phys_proc_id correctly for non DC case
- Handle non power of two core numbers.
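Handling core counts that are not powers of two mostly comes down to rounding the number of APIC-ID bits up. The following is a minimal sketch under the assumption that the core number sits in the low bits of the APIC ID and the package ID in the bits above them. split_apicid() and core_bits() are hypothetical, not the kernel's code.

```c
/* Bits needed to number `cores` cores: ceil(log2(cores)). Rounding up
 * is what lets e.g. 3 cores occupy a 2-bit field. */
static unsigned int core_bits(unsigned int cores)
{
    unsigned int bits = 0;
    while ((1u << bits) < cores)
        bits++;
    return bits;
}

/* Assumed layout: low bits = core within package, rest = package. */
void split_apicid(unsigned int apicid, unsigned int cores,
                  unsigned int *phys_proc_id, unsigned int *core_id)
{
    unsigned int bits = core_bits(cores);
    *phys_proc_id = apicid >> bits;
    *core_id = apicid & ((1u << bits) - 1);
}
```

With a single core, the field has zero bits, so core_id is 0 and phys_proc_id is the whole APIC ID, which corresponds to the non-dual-core case in the list above.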
Paul Jackson [Fri, 20 May 2005 20:59:15 +0000 (13:59 -0700)]
[PATCH] cpusets+hotplug+preempt broken
This patch removes the entwining of cpusets and hotplug code in the "No
more Mr. Nice Guy" case of sched.c move_task_off_dead_cpu().
Since the hotplug code is holding a spinlock at this point, we cannot take
the cpuset semaphore, cpuset_sem, as would seem to be required either to
update the task's cpuset, or to scan up the nested cpuset chain, looking for
the nearest cpuset ancestor that still has some CPUs that are online. So
we just punt and blast the task's cpus_allowed with all bits allowed.
This reverts these lines of code to what they were before the cpuset patch.
And it updates the cpuset Doc file, to match.
The one known alternative to this that seems to work came from Dinakar
Guniguntala, and required the hotplug code to take the cpuset_sem semaphore
much earlier in its processing. So far as we know, the increased locking
entanglement between cpusets and hotplug that this alternative approach
brings is not worth it in this case.
Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Nathan Lynch <ntl@pobox.com> Acked-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Porter [Fri, 20 May 2005 20:59:14 +0000 (13:59 -0700)]
[PATCH] ppc32: fix CONFIG_TASK_SIZE handling on 44x
This patch fixes CONFIG_TASK_SIZE handling on 44x. Currently head_44x.S
hardcodes 0x80000000, which breaks if user chooses to change TASK_SIZE
(e.g. for 3G user-space). Tested on Ocotea in 3G/1G configuration.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kumar Gala [Fri, 20 May 2005 20:59:13 +0000 (13:59 -0700)]
[PATCH] ppc32: Fix platform device initialization of 8250 serial ports
Initialization of 8250 serial ports that are platform devices requires that
an empty entry exist at the end of the array of plat_serial8250_port. Without
an empty entry we can get some pretty random behavior.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
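The empty entry is a sentinel. The registration code walks the array until it finds an entry whose flags are zero, so without that terminator it reads whatever memory follows the array. The sketch below uses a cut-down, hypothetical struct in place of plat_serial8250_port.

```c
#include <stddef.h>

/* Cut-down, hypothetical version of a plat_serial8250_port entry. */
struct port_sketch {
    unsigned long iobase;
    unsigned int irq;
    unsigned int flags;
};

/* Walk the table until the all-zero terminator (flags == 0). */
size_t count_ports(const struct port_sketch *p)
{
    size_t n = 0;
    while (p[n].flags != 0)
        n++;
    return n;
}

static const struct port_sketch example_ports[] = {
    { 0x3f8, 4, 1 },
    { 0x2f8, 3, 1 },
    { 0 }, /* the empty terminating entry the fix adds */
};
```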
Jeff Dike [Fri, 20 May 2005 20:59:12 +0000 (13:59 -0700)]
[PATCH] uml: Change printf to printk in console driver
From: Al Viro - we have error messages with KERN_ERR in them, so they
should be printk-ed rather than printf-ed.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:12 +0000 (13:59 -0700)]
[PATCH] uml: fixrange_init 3-level page table support
From: Al Viro - add three-level page table support to fixrange_init.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:11 +0000 (13:59 -0700)]
[PATCH] uml: Remove ubd-mmap support
Finally rip out the ubd-mmap code, which turned out to be broken by design.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:10 +0000 (13:59 -0700)]
[PATCH] uml: Export clear_user_*
From: Oleg Drokin: This patch is needed to support kernel modules that want to
use clear_user() (which is an exported symbol on all other architectures).
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:09 +0000 (13:59 -0700)]
[PATCH] uml: multicast driver cleanup
Byte-swapping of the port and IP address passed in to the multicast driver by
the user used to happen in different places, which was a bug in itself. The
port also was swapped before being printk-ed, which led to a misleading
message. This patch moves the port swapping to the same place as the IP
address swapping. It also cleans up the error paths of mcast_open.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
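The cleanup amounts to keeping host-order values for messages and swapping port and address together, in one place, for the socket address. Here is a minimal sketch: mcast_cfg, mcast_cfg_init() and mcast_describe() are hypothetical, and the port and address in the usage check are just example values.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical config holder: host order for messages, network order
 * for the socket address. */
struct mcast_cfg {
    uint16_t port_host;
    uint16_t port_net;
    uint32_t addr_net;
};

void mcast_cfg_init(struct mcast_cfg *c, uint16_t port, const char *ip)
{
    c->port_host = port;
    /* Port and address are converted side by side, exactly once. */
    c->port_net = htons(port);
    c->addr_net = inet_addr(ip);
}

/* Messages use the unswapped value, so the printed port is correct. */
int mcast_describe(const struct mcast_cfg *c, char *buf, size_t len)
{
    return snprintf(buf, len, "mcast port %u", (unsigned int)c->port_host);
}
```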
Jeff Dike [Fri, 20 May 2005 20:59:08 +0000 (13:59 -0700)]
[PATCH] uml: Delay loop cleanups
This patch cleans up the delay implementations a bit, makes the loops
unoptimizable, and exports __udelay and __const_udelay.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:08 +0000 (13:59 -0700)]
[PATCH] uml: Page fault fixes
Any access to a PROT_NONE page should segfault the process. A JVM seems to do
this on purpose. Also, Al noticed some bogus code, which is now deleted.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 20 May 2005 20:59:07 +0000 (13:59 -0700)]
[PATCH] uml: small fixes left over from rc4
Some changes that I sent in didn't make 2.6.12-rc4 for some reason. This
adds them back. We have:
- an x86_64 definition of TOP_ADDR
- a reimplementation of the x86_64 csum_partial_copy_from_user
- some syntax fixes in arch/um/kernel/ptrace.c
- removal of a CFLAGS definition in the x86_64 Makefile
- some include changes in the x86_64 ptrace.c and user-offsets.h
- a syntax fix in elf-x86_64.h
Also moved an include in the i386 and x86_64 Makefiles to make the symlinks
work, and some small fixes from Al Viro.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Peter Osterlund [Fri, 20 May 2005 20:59:06 +0000 (13:59 -0700)]
[PATCH] packet driver permission checking fix
If you tried to open a packet device first in read-only mode and then a
second time in read-write mode, the second open succeeded even though the
device was not correctly set up for writing. If you then tried to write
data to the device, the writes would fail with I/O errors.
This patch prevents that problem by making the second open fail with
-EBUSY.
Signed-off-by: Peter Osterlund <petero2@telia.com> Cc: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
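The new open-time rule can be sketched with hypothetical per-device state. A second open may only ask for write access if the first open already set the device up for writing. pkt_dev_sketch and pkt_open_sketch() are illustrative names, not the pktcdvd driver's code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state for the sketch. */
struct pkt_dev_sketch {
    int open_count;
    bool write_ready; /* did the first open set up for writing? */
};

int pkt_open_sketch(struct pkt_dev_sketch *d, bool want_write)
{
    if (d->open_count > 0) {
        /* Already open: a writer can only join if the device was
         * prepared for writing by the first open. */
        if (want_write && !d->write_ready)
            return -EBUSY;
    } else {
        d->write_ready = want_write;
    }
    d->open_count++;
    return 0;
}
```

Read-only then read-write now fails up front with -EBUSY instead of failing later with I/O errors, while read-write followed by read-only still works.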
James Bottomley [Fri, 20 May 2005 02:30:13 +0000 (21:30 -0500)]
[SCSI] aic7xxx: fix U160 mode
The new period/dt setting routines don't get the coupling of these
parameters correct. This means that Domain Validation never gets DT
set, and thus the drive gets restricted to U80.
Fix this by restoring the couplings in the set routines.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] aic7xxx: make correct use of slave_alloc/destroy and remove the per device timer
The allocation of all of our components should be done in slave alloc.
Currently it's rather fancifully refcounted in the queuecommand
callback. This patch moves allocation and destroy to their correct
places in slave_alloc/slave_destroy. Now we can guarantee that
everywhere a device is requested, it's actually been allocated, so don't
check for this anymore.
Additionally, the per device busy timer was the only source of potential
use after free. It's been deleted because Linux does the correct thing
with busy returns, so there's no need to implement a separate timer in
the driver.
Finally, implement code that forces all the device parameters to zero
(i.e. async and narrow) in the slave alloc, inform the spi class of the
bios recorded maximums and wait until slave configure before trying
anything more adventurous.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This should finish the spurious queue removal from aic7xxx (there are
other queues that are probably unnecessary, but at least the major and
obviously unnecessary ones are done with).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The aic7xxx driver has two spurious queues in its Linux glue code: the
busyq, which queues incoming commands to the driver, and the completeq,
which queues finished commands before sending them back to the mid-layer.
This patch just removes the busyq and makes the aic finally return the
correct status to get the mid-layer to manage its queueing, so a command
is either committed to the sequencer or returned to the midlayer for
requeue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is similar to the previous sym2 problem. For Domain Validation to
work we can't allow any period setting to turn wide on if it was
previously off.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] implement parameter limits in the SPI transport class
There's a basic need not to have parameters go under or over certain
values when doing domain validation. The basic ones are
max_offset, max_width and min_period.
This patch makes the transport class take and enforce these three
limits. Currently they can be set by the user, although they could
obviously be read from the HBA's on-board NVRAM area during
slave_configure (if it has one).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
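Enforcing the three limits is a clamp. Offset and width have ceilings. The period is a factor where smaller means faster, so min_period acts as a floor. The struct and function names below are hypothetical, not the SPI transport class API.

```c
/* Hypothetical containers for the three limits and the requested
 * transfer parameters. */
struct spi_limits_sketch {
    int max_offset;
    int max_width;
    int min_period;
};

struct spi_params_sketch {
    int offset;
    int width;
    int period;
};

void spi_apply_limits(struct spi_params_sketch *p,
                      const struct spi_limits_sketch *lim)
{
    if (p->offset > lim->max_offset)
        p->offset = lim->max_offset;
    if (p->width > lim->max_width)
        p->width = lim->max_width;
    /* Smaller period factor = faster, so never go below the floor. */
    if (p->period < lim->min_period)
        p->period = lim->min_period;
}
```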
David S. Miller [Fri, 20 May 2005 18:40:32 +0000 (11:40 -0700)]
[SPARC64]: Fix bad performance side effect of strbuf timeout changes.
The recent change to add a timeout to strbuf flushing had
a negative performance impact. The udelay()'s are too long,
and they were done in the wrong order wrt. the register read
checks. Fix both, and things are happy again.
There are more possible improvements in this area. In fact,
PCI streaming buffer flushing seems to be part of the bottleneck
in network receive performance on my SunBlade1000 box.
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Mackerras [Fri, 20 May 2005 06:45:58 +0000 (16:45 +1000)]
[PATCH] ppc32: Fix uninitialized variable in set_preferred_console
This fixes an uninitialized variable warning in arch/ppc/kernel/setup.c,
and this time gcc is actually right, there is a path that could result
in offset being uninitialized. Zero is a sane default in this instance.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Fri, 20 May 2005 06:50:55 +0000 (16:50 +1000)]
[PATCH] ppc32: Fix __copy_tofrom_user return value
Recently the __copy_tofrom_user routine was modified to avoid doing
prefetches past the end of the source array. However, in doing so we
introduced a bug in that it now returns the wrong value for the number
of bytes not copied when a fault is encountered. This fixes it to
return the correct number.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
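The contract being restored is the usual one for __copy_tofrom_user-style helpers: return the number of bytes not copied, so 0 means success. Here is a minimal C sketch that simulates a fault at a chosen offset. copy_sketch() and its fault_at parameter are hypothetical; the real routine is assembly.

```c
#include <stddef.h>

/* Copy n bytes, pretending a fault hits at byte offset fault_at
 * (pass a value >= n for "no fault"). Returns bytes NOT copied. */
size_t copy_sketch(char *dst, const char *src, size_t n, size_t fault_at)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (i == fault_at)
            return n - i; /* everything from the fault onward */
        dst[i] = src[i];
    }
    return 0;
}
```

A fault at offset 5 of an 8-byte copy leaves 3 bytes uncopied, and the caller relies on exactly that count.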
Paul Mackerras [Fri, 20 May 2005 06:57:22 +0000 (16:57 +1000)]
[PATCH] ppc32: don't call progress functions after boot
On ppc32, the platform code can supply a "progress" function that is
used to show progress through the boot. These functions are usually
in an init section and so can't be called after the init pages are
freed. Now that the cpu bringup code can be called after the system
is booted (for hotplug cpu) we can get the situation where the
progress function can be called after boot. The simple fix is to set
the progress function pointer to NULL when the init pages are freed,
and that is what this patch does (note that all callers already check
whether the function pointer is NULL before trying to call it).
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
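The fix relies on the callers' existing NULL check. This is a minimal sketch of that pattern: progress_fn, report_progress() and free_init_sections() are hypothetical names standing in for the ppc32 platform hook and the point where init pages are freed.

```c
#include <stddef.h>

static int progress_calls;

/* Stand-in for a progress routine that would live in init text. */
static void boot_progress(const char *msg, unsigned int code)
{
    (void)msg;
    (void)code;
    progress_calls++;
}

static void (*progress_fn)(const char *, unsigned int) = boot_progress;

void report_progress(const char *msg, unsigned int code)
{
    if (progress_fn) /* callers already test for NULL */
        progress_fn(msg, code);
}

/* Once init pages are gone, forget the hook so later cpu bringup
 * (e.g. hotplug) never jumps into freed memory. */
void free_init_sections(void)
{
    progress_fn = NULL;
}
```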
Linus Torvalds [Fri, 20 May 2005 05:43:37 +0000 (22:43 -0700)]
Fix get_unmapped_area sanity tests
As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.
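The sanity checks that now apply to every candidate address, whether or not MAP_FIXED chose it, can be sketched as follows. The task size and page size are assumptions (a 3G/1G split with 4 KiB pages), and check_unmapped_area() is a hypothetical helper rather than the kernel function.

```c
#include <errno.h>

#define SKETCH_TASK_SIZE 0xC0000000UL /* assumed 3G/1G split */
#define SKETCH_PAGE_MASK (~0xFFFUL)   /* assumed 4 KiB pages */

/* Returns 0 if [addr, addr + len) is acceptable, else -errno. Written
 * as "addr > TASK_SIZE - len" so that addr + len cannot overflow. */
int check_unmapped_area(unsigned long addr, unsigned long len)
{
    if (len > SKETCH_TASK_SIZE)
        return -ENOMEM;
    if (addr > SKETCH_TASK_SIZE - len)
        return -ENOMEM;
    if (addr & ~SKETCH_PAGE_MASK)
        return -EINVAL;
    return 0;
}
```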
In netlink_broadcast() we're sending shared skb's to netlink listeners
when possible (saves some copying). This is OK, since we hold the only
other reference to the skb.
However, this implies that we must drop our reference on the skb, before
allowing a receiving socket to disappear. Otherwise, the socket buffer
accounting is disrupted.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETLINK]: Move broadcast skb_orphan to the skb_get path.
Cloned packets don't need the orphan call.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122)
What's happening is that:
1) The skb is sent to socket 1.
2) Someone does a recvmsg on socket 1 and drops the ref on the skb.
Note that the rmalloc is not returned at this point since the
skb is still referenced.
3) The same skb is now sent to socket 2.
This version of the fix resurrects the skb_orphan call that was moved
out, last time we had 'shared-skb troubles'. It is practically a no-op
in the common case, but still prevents the possible race with recvmsg.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 19 May 2005 19:39:04 +0000 (12:39 -0700)]
[IPSEC]: Fixed alg_key_len usage in attach_one_algo
The variable alg_key_len is in bits and not bytes. The function
attach_one_algo is currently using it as if it were in bytes.
This causes it to read memory which may not be there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
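Because alg_key_len counts bits, any byte-oriented copy has to convert first. Rounding up covers key lengths that are not a multiple of 8. key_bytes() is a hypothetical helper name.

```c
/* alg_key_len counts bits; round up to whole bytes before using it
 * as a length for memcpy()/allocation. */
unsigned int key_bytes(unsigned int alg_key_len_bits)
{
    return (alg_key_len_bits + 7) / 8;
}
```

A 128-bit key is 16 bytes. Treating 128 as a byte count would read 112 bytes past the end of the key.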
Linus Torvalds [Wed, 18 May 2005 22:39:33 +0000 (15:39 -0700)]
[PATCH] prevent NULL mmap in topdown model
Prevent the topdown allocator from allocating mmap areas all the way
down to address zero.
We still allow a MAP_FIXED mapping of page 0 (needed for various things,
ranging from Wine and DOSEMU to people who want to allow speculative
loads off a NULL pointer).
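The topdown restriction can be pictured as a floor under the search. In this hypothetical sketch, topdown_place() tries to put a len-byte area just below top and refuses if that would reach the first page. MAP_FIXED requests are assumed to bypass this path entirely, matching the text. The floor value and function name are assumptions, not the kernel's code.

```c
#include <stdbool.h>

#define SKETCH_MMAP_FLOOR 4096UL /* assumed: keep page 0 unmapped */

/* Place a len-byte area ending at `top`. Fails instead of handing out
 * an area that starts below the floor (i.e. in page 0). */
bool topdown_place(unsigned long top, unsigned long len, unsigned long *out)
{
    if (len > top || top - len < SKETCH_MMAP_FLOOR)
        return false;
    *out = top - len;
    return true;
}
```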
Herbert Xu [Thu, 19 May 2005 05:52:33 +0000 (22:52 -0700)]
[IPV4/IPV6] Ensure all frag_list members have NULL sk
Having frag_list members which hold wmem of an sk leads to nightmares
with partially cloned frag skb's. The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter. Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.
So let's go the other way and make this an invariant:
For any skb on a frag_list, skb->sk must be NULL.
That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.
The above invariant is actually violated in the following patch
for a short duration inside ip_fragment. This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Evgeniy Polyakov [Thu, 19 May 2005 05:51:45 +0000 (22:51 -0700)]
[XFRM]: skb_cow_data() does not set proper owner for new skbs.
It looks like skb_cow_data() does not set
proper owner for newly created skb.
If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there
might be a situation when we require recreating skb and
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
associated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.
If the above "analysis" is correct, then the attached patch fixes that.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 May 2005 05:49:26 +0000 (22:49 -0700)]
[TG3]: Set minimal hw interrupt mitigation.
Even though we do software interrupt mitigation
via NAPI, it still helps to have some minimal
hw assisted mitigation.
This helps, particularly, on systems where register
I/O overhead is much greater than the CPU horsepower.
For example, it helps on NUMA systems. In such cases
the PIO overhead to disable interrupts for NAPI accounts
for the majority of the packet processing cost. The
CPU is fast enough such that only a single packet is
processed by each NAPI poll call.
Thanks to Michael Chan for reviewing this patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 May 2005 05:46:34 +0000 (22:46 -0700)]
[TG3]: Add tagged status support.
When supported, use the TAGGED interrupt processing support
the chip provides. In this mode, instead of an "on/off" binary
semaphore, an incrementing tag scheme is used to ACK interrupts.
All MSI supporting chips support TAGGED mode, so the tg3_msi()
interrupt handler uses it unconditionally. This invariant is
verified when MSI support is tested.
Since we can invoke tg3_poll() multiple times per interrupt under
high packet load, we fetch a new copy of the tag value in the
status block right before we actually do the work.
Also, because the tagged status tells the chip exactly which
work we have processed, we can make two optimizations:
1) tg3_restart_ints() need not check tg3_has_work()
2) the tg3_timer() need not poke the chip 10 times per
second to keep from losing interrupt events
Based upon valuable feedback from Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Tweedie [Wed, 18 May 2005 15:47:17 +0000 (11:47 -0400)]
[PATCH] Avoid console spam with ext3 aborted journal.
Avoid console spam with ext3 aborted journal.
ext3 usually reports error conditions that it detects in its environment.
But when its journal gets aborted due to such errors, it can sometimes
continue to report that condition forever, spamming the console to such
an extent that the initial cause of the journal abort can be lost.
When the journal aborts, we put the filesystem into readonly mode. Most
subsequent filesystem operations will get rejected immediately by checks
for MS_RDONLY either in the filesystem or in the VFS. But some paths do
not have such checks --- for example, if we continue to write to a file
handle that was opened before the fs went readonly. (We only check for
the ROFS condition when the file is first opened.) In these cases, we
can continue to generate log errors similar to
EXT3-fs error (device $DEV) in start_transaction: Journal has aborted
for each subsequent write.
There is really no point in generating these errors after the initial
error has been fully reported. Specifically, if we're starting a
completely new filesystem operation, and the filesystem is *already*
readonly (i.e. the ext3 layer has already detected and handled the
underlying jbd abort), and we see an EROFS error, then there is simply
no point in reporting it again.
Signed-off-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
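The suppression rule in the last paragraph can be written as a small predicate. should_report_error() and its parameters are hypothetical names. The sketch only encodes the three conditions the text lists: a new operation, an already-readonly filesystem, and an EROFS error.

```c
#include <errno.h>
#include <stdbool.h>

/* Decide whether an error seen when starting an operation is worth
 * logging. The abort was already reported when the fs went readonly,
 * so a fresh EROFS at that point is just noise. */
bool should_report_error(int err, bool starting_new_op, bool fs_readonly)
{
    if (starting_new_op && fs_readonly && err == -EROFS)
        return false;
    return err != 0;
}
```

Any other error, or an EROFS seen before the filesystem was marked readonly, is still reported.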
Stephen Tweedie [Wed, 18 May 2005 15:22:31 +0000 (11:22 -0400)]
[PATCH] Fix filp being passed through raw ioctl handler
Don't pass meaningless file handles to block device ioctls.
The recent raw IO ioctl-passthrough fix started passing the raw file
handle into the block device ioctl handler. That's unlikely to be
useful, as the file handle is actually open on a character-mode raw
device, not a block device, so dereferencing it is not going to yield
useful results to a block device ioctl handler.
Previously we just passed NULL; also not a value that can usefully
be dereferenced, but at least if it does happen, we'll oops instead of
silently pretending that the file is a block device, so NULL is the more
defensive option here. This patch reverts to that behaviour.
Noticed by Al Viro.
Signed-off-by: Stephen Tweedie <sct@redhat.com> Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Thu, 12 May 2005 19:06:27 +0000 (12:06 -0700)]
[PATCH] Driver Core: remove driver model detach_state
The driver model has a "detach_state" mechanism that:
- Has never been used by any in-kernel driver;
- Is superfluous, since driver remove() methods can do the same thing;
- Became buggy when the suspend() parameter changed semantics and type;
- Could self-deadlock when called from certain suspend contexts;
- Is effectively wasted documentation, object code, and headspace.
This removes that "detach_state" mechanism, for a net code shrink as well
as a per-device saving in the driver model and sysfs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Brownell [Mon, 9 May 2005 15:07:00 +0000 (08:07 -0700)]
[PATCH] Driver Core: pm diagnostics update, check for errors
This patch includes various tweaks in the messaging that appears during
system pm state transitions:
* Warn about certain illegal calls in the device tree, like resuming
a child before its parent or suspending a parent before its child. This could
happen easily enough through sysfs, or in some cases when drivers
use device_pm_set_parent().
* Be more consistent about dev_dbg() tracing ... do it for resume() and
shutdown() too, and never if the driver doesn't have that method.
* Say which type of system sleep state is being entered.
Except for the warnings, these only affect debug messaging.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Scott Murray [Mon, 9 May 2005 21:36:27 +0000 (17:36 -0400)]
[PATCH] PCI Hotplug: remove pci_visit_dev
If my CPCI hotplug update patch is applied, then there are no longer any
in-tree users of the pci_visit_dev API, and it and its related code can be
removed.
Signed-off-by: Scott Murray <scottm@somanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>