[PATCH] kprobes: fix namespace problem and sparc64 build
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
clean up the namespace.
Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add explicit disabling of 440GP IRQ compatibility mode when configuring
440GX interrupt controller. This helps when board firmware for some reason
uses this compatibility mode and leaves it enabled. It breaks 440GX
interrupt code because it assumes native 440GX IRQ mode. People seem to
be continually bitten by this.
john stultz [Wed, 6 Jul 2005 01:54:44 +0000 (18:54 -0700)]
[PATCH] ppc32: stop misusing NTP's time_offset value
As part of my timeofday rework, I've been looking at the NTP code and I
noticed that the PPC architecture is apparently misusing the NTP's
time_offset (it is a terrible name!) value as some form of timezone offset.
This could cause problems when time_offset is changed by the NTP code. This
patch changes the PPC code so it uses a more clearly named local variable:
timezone_offset.
Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Tom Rini <trini@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ppc32: add Freescale MPC885ADS board support
This patch adds the Freescale MPC86xADS board support. The supported
devices are SMC UART and 10Mbit ethernet on SCC1.
The manual for the board says that it "is compatible with the MPC8xxFADS
for software point of view". That's why this patch extends FADS instead of
introducing a new platform.
FEC is not supported as the "combined FCC/FEC ethernet driver" by
Pantelis Antoniou should replace the current FEC driver.
Signed-off-by: Gennadiy Kurtsman <gkurtsman@ru.mvista.com> Signed-off-by: Andrei Konovalov <akonovalov@ru.mvista.com> Acked-by: Tom Rini <trini@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David S. Miller [Tue, 5 Jul 2005 22:43:58 +0000 (15:43 -0700)]
[TCP]: Never TSO defer under periods of congestion.
Congestion window recovery after loss depends upon the fact
that if we have a full MSS sized frame at the head of the
send queue, we will send it. TSO deferral can defeat the
ACK clocking necessary to exit cleanly from recovery.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:24:38 +0000 (15:24 -0700)]
[TCP]: Move to new TSO segmenting scheme.
Make TSO segment transmit size decisions at send time, not earlier.
The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.
This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.
Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.
A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
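As a rough illustration of the size-goal idea (a sketch under stated assumptions, not the code from this patch): the helper names and the frame budget parameter below are hypothetical. The only property taken from the description above is that the goal is always a whole multiple of the MSS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a frame budget down to a whole number of
 * MSS-sized segments, never returning less than one MSS.
 */
static uint32_t xmit_size_goal(uint32_t mss, uint32_t max_frame)
{
    assert(mss > 0);

    uint32_t segs = max_frame / mss;
    if (segs == 0)
        segs = 1;               /* always allow at least one full segment */
    return segs * mss;
}

/* Number of MSS-sized segments needed to carry len bytes (rounded up). */
static uint32_t segs_for_len(uint32_t len, uint32_t mss)
{
    assert(mss > 0);
    return (len + mss - 1) / mss;
}
```

With an MSS of 1460 and a 65535-byte budget, this yields a goal of 44 segments; the transmit path can still emit fewer segments than that when the windows are tight.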
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:19:54 +0000 (15:19 -0700)]
[TCP]: Break out tcp_snd_test() into its constituent parts.
tcp_snd_test() does several different things; use inline
functions to express this more clearly.
1) It initializes the TSO count of SKB, if necessary.
2) It performs the Nagle test.
3) It makes sure the congestion window is adhered to.
4) It makes sure SKB fits into the send window.
This cleanup also sets things up so that values such as the
available packets in the congestion window do not need
to be calculated multiple times by packet sending loops
such as tcp_write_xmit().
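The four steps listed above could be sketched as separate helpers along these lines. This is a simplified model, not the kernel code: the struct layouts, field names, and the reduced Nagle rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the SKB and the socket. */
struct seg  { uint32_t seq, len, pcount; bool is_tail; };
struct conn { uint32_t mss, snd_una, snd_wnd, snd_cwnd, in_flight; bool nodelay; };

/* 1) Initialise the TSO segment count if it has not been set yet. */
static uint32_t init_tso_segs(struct seg *s, uint32_t mss)
{
    assert(mss > 0);
    if (s->pcount == 0)
        s->pcount = (s->len + mss - 1) / mss;
    return s->pcount;
}

/* 2) Simplified Nagle rule: hold back a small tail segment while data is unacked. */
static bool nagle_test(const struct conn *c, const struct seg *s)
{
    if (c->nodelay || !s->is_tail || s->len >= c->mss)
        return true;
    return c->in_flight == 0;
}

/* 3) Congestion window: how many more packets may be in flight. */
static uint32_t cwnd_test(const struct conn *c)
{
    return c->in_flight < c->snd_cwnd ? c->snd_cwnd - c->in_flight : 0;
}

/* 4) Send window: the segment must end inside the advertised window. */
static bool snd_wnd_test(const struct conn *c, const struct seg *s)
{
    return (uint32_t)(s->seq + s->len - c->snd_una) <= c->snd_wnd;
}

/* Combined test: returns the cwnd quota, or 0 if the segment must wait. */
static uint32_t snd_test(const struct conn *c, struct seg *s)
{
    init_tso_segs(s, c->mss);
    if (!nagle_test(c, s))
        return 0;
    uint32_t quota = cwnd_test(c);
    if (quota == 0 || !snd_wnd_test(c, s))
        return 0;
    return quota;
}
```

Splitting the test this way lets a sending loop compute the congestion window quota once and reuse it across several segments.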
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:19:38 +0000 (15:19 -0700)]
[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue. This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.
However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.
Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.
As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:51 +0000 (15:18 -0700)]
[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it. tcp_write_xmit() does the call for us,
so the call here can simply be removed.
Also, tcp_write_xmit() can be marked static.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:34 +0000 (15:18 -0700)]
[TCP]: Add missing skb_header_release() call to tcp_fragment().
When we add any new packet to the TCP socket write queue,
we must call skb_header_release() on it in order for the
TSO sharing checks in the drivers to work.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:18:03 +0000 (15:18 -0700)]
[TCP]: Move send test logic out of net/tcp.h
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.
Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al. We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:17:45 +0000 (15:17 -0700)]
[TCP]: Fix quick-ack decrementing with TSO.
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set. It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.
When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.
Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration. That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.
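A minimal sketch of decrementing the quick-ack counter by the number of packets in a TSO frame follows. The struct, the function name, and the ATO floor value are assumptions for illustration; the kernel keeps this state in the TCP socket.

```c
#include <assert.h>
#include <stddef.h>

#define ATO_MIN_ASSUMED 40u     /* hypothetical floor for the ACK timeout */

/* Hypothetical stand-in for the tp->ack state. */
struct ack_state {
    unsigned int quick;         /* quick ACKs still to be sent */
    unsigned int ato;           /* current ACK timeout estimate */
};

/* Consume pkts quick-ack credits; deflate the ATO when the counter hits zero. */
static void dec_quickack(struct ack_state *a, unsigned int pkts)
{
    assert(a != NULL);

    if (a->quick == 0)
        return;                 /* not in quick-ack mode */
    if (pkts >= a->quick) {
        a->quick = 0;
        a->ato = ATO_MIN_ASSUMED;
    } else {
        a->quick -= pkts;
    }
}
```

A TSO frame carrying three segments would call this with `pkts` set to 3 rather than calling it once.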
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Jul 2005 22:17:25 +0000 (15:17 -0700)]
[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.
This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.
So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom(). This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.
Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.
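To see why aligning the header length removes the tailroom, consider the following sketch. It assumes a 64-byte alignment and an allocator that rounds the total request up to that alignment; the names are hypothetical stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_BYTES_ASSUMED ((size_t)64)    /* assumed alignment unit */

/* Round x up to the next multiple of the alignment unit. */
static size_t data_align(size_t x)
{
    return (x + CACHE_BYTES_ASSUMED - 1) & ~(CACHE_BYTES_ASSUMED - 1);
}

/*
 * Model of an allocation of `size` data bytes plus `reserve` bytes of
 * headroom: the allocator rounds the total up, and whatever is left
 * after the headroom is available as tailroom.
 */
static size_t tailroom(size_t reserve, size_t size)
{
    assert(reserve <= data_align(size + reserve));
    return data_align(size + reserve) - reserve;
}
```

With an unaligned 100-byte header and a zero data size, 28 bytes of tailroom remain; reserving `data_align(100)` bytes instead leaves none, so all user data has to go into the page array.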
Signed-off-by: David S. Miller <davem@davemloft.net>
I suspect "#define __ARGS(x) ()" was deprecated before I was born.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Olsson [Tue, 5 Jul 2005 22:02:40 +0000 (15:02 -0700)]
[IPV4]: More broken memory allocation fixes for fib_trie
Below is a patch to preallocate memory when resizing the trie (inflate/halve).
If preallocation fails, it just skips the resize of this tnode for the time being.
The oops we got when killing bgpd (with full routing) is now gone.
Patrick's memory patch is also used.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Jul 2005 21:55:24 +0000 (14:55 -0700)]
[NET]: Hashed spinlocks in net/ipv4/route.c
- Locking abstraction
- Spinlocks moved out of rt hash table: less memory (50%) used by rt
  hash table. It's a win even on UP.
- Sizing of spinlocks table depends on NR_CPUS
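The hashed-lock idea can be sketched in user space as below. This is an illustration only: it uses C11 `atomic_flag` as a stand-in spinlock, and the table size is an assumed constant, whereas the patch derives it from NR_CPUS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LOCK_TABLE_SIZE 256u    /* assumed; must be a power of two */

/* One small table of locks shared by all hash buckets. */
static atomic_flag lock_table[LOCK_TABLE_SIZE];

static void lock_table_init(void)
{
    for (size_t i = 0; i < LOCK_TABLE_SIZE; i++)
        atomic_flag_clear(&lock_table[i]);
}

/* Many buckets map onto one lock; the mask works because the size is 2^n. */
static size_t lock_index(size_t bucket)
{
    return bucket & (LOCK_TABLE_SIZE - 1);
}

static void bucket_lock(size_t bucket)
{
    atomic_flag *l = &lock_table[lock_index(bucket)];
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;                       /* spin until the flag is released */
}

static void bucket_unlock(size_t bucket)
{
    assert(lock_index(bucket) < LOCK_TABLE_SIZE);
    atomic_flag_clear_explicit(&lock_table[lock_index(bucket)],
                               memory_order_release);
}
```

Because each bucket no longer embeds its own lock, the hash table itself shrinks; the cost is that unrelated buckets sharing a lock can contend with each other.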
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:44:55 +0000 (14:44 -0700)]
[IPV4]: Handle large allocations in fib_trie
Inflating a node a couple of times makes it exceed the 128k kmalloc limit.
Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash.
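A small sketch of the size decision follows, assuming 4096-byte pages. The helper names are hypothetical; the kernel has its own routine for computing a page order.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED ((size_t)4096)    /* assumed page size */

/* True when a request is too large for the small-object allocator path. */
static int use_page_allocator(size_t size)
{
    return size > PAGE_SIZE_ASSUMED;
}

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int page_order(size_t size)
{
    assert(size > 0);

    unsigned int order = 0;
    size_t span = PAGE_SIZE_ASSUMED;
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under these assumptions a 128k node needs order 5, and the next inflate to 256k needs order 6, which is past the 128k kmalloc limit mentioned above.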
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 5 Jul 2005 21:15:53 +0000 (14:15 -0700)]
[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation
Current behaviour is to not report an error if a rate
estimator is created together with a qdisc and the
configuration of the rate estimator is bogus. This leads
to unexpected behaviour because the user is not notified.
New behaviour is to report the error and let the whole
qdisc creation operation fail so the user is able to fix
his mistake.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 5 Jul 2005 21:13:41 +0000 (14:13 -0700)]
[NET]: Reduce size of sk_buff by 4 bytes
Reduce local_df to a bit field and ip_summed to a 2-bit
field, thus saving 13 bits. Move bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both 32-bit and
64-bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:08:57 +0000 (14:08 -0700)]
[NET]: Remove redundant code in net/core/filter.c
skb_header_pointer handles linear and non-linear data, so there is no need
to handle linear data again.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 5 Jul 2005 21:08:10 +0000 (14:08 -0700)]
[NET]: Fix signedness issues in net/core/filter.c
This is the code to load packet data into a register:
    k = fentry->k;
    if (k < 0) {
        ...
    } else {
        u32 _tmp, *p;
        p = skb_header_pointer(skb, k, 4, &_tmp);
        if (p != NULL) {
            A = ntohl(*p);
            continue;
        }
    }
skb_header_pointer checks if the requested data is within the
linear area:
    int hlen = skb_headlen(skb);
    if (offset + len <= hlen)
        return skb->data + offset;
When offset is within [INT_MAX-len+1..INT_MAX] the addition will
result in a negative number which is <= hlen.
I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.
This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.
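One overflow-safe way to write such a bounds check is to rearrange it so that no addition takes place. The sketch below is illustrative and is not taken from the patch; the function names are hypothetical, and the explicit `offset >= 0` guard is an addition of this sketch (the patch description leaves negative offsets to the caller).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when offset + len would not fit in an int (computed in a wider type). */
static bool sum_overflows_int(int offset, int len)
{
    return (long long)offset + len > INT_MAX;
}

/*
 * Rearranged check: for a non-negative offset, hlen - offset cannot
 * overflow, so large positive offsets are rejected instead of wrapping.
 */
static bool in_linear_area(int offset, int len, int hlen)
{
    assert(len >= 0 && hlen >= 0);
    return offset >= 0 && hlen - offset >= len;
}
```

For an offset of INT_MAX - 2 and a length of 4, the original sum cannot be represented in an int, while the rearranged comparison simply reports that the range lies outside the linear area.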
Thanks to Thomas Vögtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Jul 2005 20:26:04 +0000 (13:26 -0700)]
[SPARC64]: Do proper DMA IRQ syncing on Tomatillo
This was the main impetus behind adding the PCI IRQ shim.
In order to properly order DMA writes wrt. interrupts, you have to
write to a PCI controller register, then poll for that bit clearing.
There is one bit for each interrupt source, and setting this register
bit tells Tomatillo to drain all pending DMA from that device.
Furthermore, Tomatillos with a revision less than 4 require us to do a
block store due to some memory transaction ordering issues it has on
JBUS.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Jul 2005 20:24:38 +0000 (13:24 -0700)]
[SPARC64]: Add support for IRQ pre-handlers.
This allows a PCI controller to shim into IRQ delivery
so that DMA queues can be drained, if necessary.
If some bus specific code needs to run before an IRQ
handler is invoked, the bus driver simply needs to set up
the function pointer in bucket->irq_info->pre_handler and
the two args bucket->irq_info->pre_handler_arg[12].
The Schizo PCI driver is converted over to use a pre-handler
for the DMA write-sync processing it needs when a device
is behind a PCI->PCI bus deeper than the top-level APB
bridges.
While we're here, clean up all of the action allocation
and handling. Now, we allocate the irqaction as part of
the bucket->irq_info area. There is an array of 4 irqaction
(for PCI irq sharing) and a bitmask saying which entries
are active.
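The pre-handler hook and the fixed action array with an active bitmask might look roughly like this. It is a simplified sketch: the type names, callback signatures, and demo callbacks are assumptions, not the sparc64 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 4           /* four slots, as in the description above */

struct irq_info;
typedef void (*pre_handler_fn)(struct irq_info *info, void *arg1, void *arg2);
typedef void (*irq_handler_fn)(int irq, void *dev_id);

struct irq_action { irq_handler_fn handler; void *dev_id; };

/* Hypothetical per-bucket info: optional pre-handler plus embedded actions. */
struct irq_info {
    pre_handler_fn pre_handler;
    void *pre_handler_arg1, *pre_handler_arg2;
    struct irq_action action[MAX_ACTIONS];
    unsigned int active_mask;   /* bit i set means action[i] is in use */
};

/* Claim the first free slot; returns its index, or -1 if all are taken. */
static int add_action(struct irq_info *info, irq_handler_fn h, void *dev_id)
{
    assert(info != NULL && h != NULL);
    for (int i = 0; i < MAX_ACTIONS; i++) {
        if (!(info->active_mask & (1u << i))) {
            info->action[i].handler = h;
            info->action[i].dev_id = dev_id;
            info->active_mask |= 1u << i;
            return i;
        }
    }
    return -1;
}

/* Run the bus-specific pre-handler once, then every active handler. */
static void deliver_irq(struct irq_info *info, int irq)
{
    if (info->pre_handler)
        info->pre_handler(info, info->pre_handler_arg1, info->pre_handler_arg2);
    for (int i = 0; i < MAX_ACTIONS; i++)
        if (info->active_mask & (1u << i))
            info->action[i].handler(irq, info->action[i].dev_id);
}

/* Demo callbacks: each bumps the counter it is handed. */
static void count_pre_handler(struct irq_info *info, void *arg1, void *arg2)
{
    (void)info; (void)arg2;
    ++*(int *)arg1;
}

static void count_handler(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
}
```

Since the slots live inside the info structure, registering a handler in this model needs no allocation, which matches the simplification of request_irq() and free_irq() described in this change.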
The bucket->irq_info is allocated at build_irq() time, not
at request_irq() time. This simplifies request_irq() and
free_irq() tremendously.
The SMP dynamic IRQ retargeting code got removed in this
change too. It has been disabled for a few months now, and we
can resurrect it in the future if we want.
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch adds some ioctls to include/linux/compat_ioctl.h
to allow using ppdev from the 32 bit user space on sparc64.
This patch also adds the PPDEV option in the sparc64 menu, near Parallel
printer support in the 'General machine setup' submenu.
All those ioctls seem to be compatible, since (correct me if I'm wrong)
they don't use the 'long' type. See include/linux/ppdev.h.
The application I used to test the new ioctls only used the following:
PPEXCL
PPCLAIM
PPNEGOT
PPGETMODES
PPRCONTROL
PPWCONTROL
PPDATADIR
PPWDATA
PPRDATA
But I believe that the other ioctls will work fine.
Signed-off-by: David S. Miller <davem@davemloft.net>
[PATCH] ARM: 2784/1: Fix the block cache flush operation range
Patch from Catalin Marinas
The range for the ARMv6 block cache operations is inclusive but the
kernel doesn't re-calculate the end address, causing a page fault when
used (this only happens with support for cache aliasing, otherwise the
blk_flush_kern_dcache_page() is not called). This patch subtracts
L1_CACHE_BYTES from the end address.
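The arithmetic behind the fix can be sketched as follows, assuming a 32-byte cache line. The function name is hypothetical, and the real change is made in the ARMv6 cache code rather than in C.

```c
#include <assert.h>
#include <stdint.h>

#define L1_CACHE_BYTES_ASSUMED 32u  /* assumed cache line size */

/*
 * Convert the exclusive end of [start, start + size) into the inclusive
 * end address expected by a block operation: the start of the last
 * cache line inside the range.
 */
static uintptr_t block_op_end(uintptr_t start, uintptr_t size)
{
    assert(size >= L1_CACHE_BYTES_ASSUMED);
    return start + size - L1_CACHE_BYTES_ASSUMED;
}
```

For a 4096-byte page at 0x1000, the inclusive end is 0x1fe0, which still lies inside the page, whereas 0x2000 would touch the following page.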
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Sun, 3 Jul 2005 16:44:40 +0000 (17:44 +0100)]
[PATCH] ARM: 2785/1: S3C24XX - serial calls request_irq() with IRQs disabled
Patch from Ben Dooks
The request_irq() function is called by s3c24xx uart driver with
the local IRQs disabled. The request_irq() function can allocate
memory via kmalloc(), and this may sleep causing a warning about
sleeping in an invalid context.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rob Punkunus [Sun, 3 Jul 2005 15:37:18 +0000 (17:37 +0200)]
[PATCH] amd74xx: support MCP55 device IDs
From: Rob Punkunus <rpunkunus@nvidia.com>
Rob Punkunus recently submitted a patch to enable support for MCP51/MCP55 in
the amd74xx driver. This patch was whitespace-corrupted and didn't apply to
2.6.12 since MCP51 support was merged in the 2.6.12-rc series.
Gentoo would like to support this hardware for our upcoming release media, so
I fixed the patch, and here it is :)
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Hannes Reinecke [Tue, 28 Jun 2005 12:57:10 +0000 (14:57 +0200)]
[PATCH] PCI: Remove newline from pci MODALIAS variable
The PCI core sends out a hotplug event variable MODALIAS with a trailing
newline. This is inconsistent with all other event variables and breaks
some hotplug tools. This patch removes the said newline.
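As an illustration of the intended result, the sketch below formats a shortened MODALIAS string without a trailing newline. The function names are hypothetical, and the format is cut down to the vendor and device fields for brevity; it is not the full string the PCI core emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write a reduced MODALIAS variable into buf; note: no "\n" at the end. */
static int format_modalias(char *buf, size_t len,
                           unsigned int vendor, unsigned int device)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "MODALIAS=pci:v%08Xd%08X", vendor, device);
}

/* Helper for checking that an event variable has no trailing newline. */
static bool ends_with_newline(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && s[n - 1] == '\n';
}
```

A hotplug tool that splits on '=' and compares the value verbatim would then see the same form as for every other event variable.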
The dynamic pci id logic has been bothering me for a while, and now that
I started to look into how to move some of this to the driver core, I
thought it was time to clean it all up.
It ends up making the code smaller, and easier to follow, and fixes a
few bugs at the same time (dynamic ids were not being matched
everywhere, and so could be missed on some call paths for new devices,
the semaphore did not need to be grabbed when adding a new id and calling the
driver core, etc.)
I also renamed the function pci_match_device() to pci_match_id() as
that's what it really does.
Ivan Kokshaysky [Wed, 15 Jun 2005 14:59:27 +0000 (18:59 +0400)]
[PATCH] PCI: pci_assign_unassigned_resources() on x86
- Add sanity check for io[port,mem]_resource in setup-bus.c. These
resources look "free" because they have no parents, but obviously
we must not touch them.
- In i386.c:pci_allocate_bus_resources(), if a bridge resource cannot be
allocated for some reason, then clear its flags. This prevents any child
allocations in this range, so the setup-bus code will work with a clean
resource sub-tree.
- i386.c:pcibios_enable_resources() doesn't enable bridges, as it checks
only resources 0-5, which looks like a clear bug to me. I suspect it
might break hotplug as well in some cases.
From: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With the number of PCI bus resources increased to 8, we can
handle the subtractive decode PCI-PCI bridge like a normal
bridge, taking into account standard PCI-PCI bridge windows
(resources 0-2). This helps to avoid problems with peer-to-peer DMA
behind such bridges, poor performance for MMIO ranges outside bridge
windows and prefetchable vs. non-prefetchable memory issues.
To reflect the fact that such bridges do forward all addresses to
the secondary bus (transparency), remaining bus resources 3-7 are
linked to resources 0-4 of the primary bus. These resources will be
used as fallback by resource management code if allocation from
standard bridge windows fails for some reason.
[PATCH] PCI: Increase the number of PCI bus resources
This patch increases the number of resource pointers in the
pci_bus structure. This is needed to store >4 resource ranges
for host bridges and transparent PCI bridges. With this change,
all PCI buses will have more resource pointers, but most PCI
buses will only use the first 3 or 4, the remaining being NULL.
The PCI core already deals with this correctly.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ivan Kokshaysky [Fri, 1 Jul 2005 12:46:26 +0000 (16:46 +0400)]
[PATCH] alpha smp fix (part #2)
This fixes the bug that caused BUG_ON(!irqs_disabled()) to trigger in
run_posix_cpu_timers() on alpha/smp. We didn't disable interrupts
properly before calling smp_percpu_timer_interrupt().
We *do* disable interrupts everywhere except this unfortunate
smp_percpu_timer_interrupt(). Fixed thus.