Eric Dumazet [Mon, 17 Nov 2008 10:41:00 +0000 (02:41 -0800)]
net: sctp should update its inuse counter
This patch is a preparation to namespace conversion of /proc/net/protocols
In order to have relevant information for SCTP protocols, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Nov 2008 10:38:49 +0000 (02:38 -0800)]
net: af_unix should update its inuse counter
This patch is a preparation to namespace conversion of /proc/net/protocols
In order to have relevant information for UNIX protocol, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Nov 2008 06:56:55 +0000 (22:56 -0800)]
dccp: Tidy up setsockopt calls
This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.
Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.
The second switch-case statement now only has those statements which need
locking and which make use of `val'.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Nov 2008 06:55:08 +0000 (22:55 -0800)]
dccp: Deprecate Ack Ratio sysctl
This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.
The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.
With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.
Thanks to Tomasz Grobelny for discussion leading up to this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Nov 2008 06:53:48 +0000 (22:53 -0800)]
dccp: Feature negotiation for minimum-checksum-coverage
This provides feature negotiation for server minimum checksum coverage
which so far has been missing.
Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.
Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Nov 2008 06:51:23 +0000 (22:51 -0800)]
dccp: Deprecate old setsockopt framework
The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.
This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid
* wrong usage (type);
* changing values while the connection is in progress.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Nov 2008 06:49:52 +0000 (22:49 -0800)]
dccp: Mechanism to resolve CCID dependencies
This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.
The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html#ccid_dependencies
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.
The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.
Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.
This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.
Based on a patch from Herbert Xu.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv) Signed-off-by: David S. Miller <davem@davemloft.net>
Mark McLoughlin [Mon, 17 Nov 2008 06:40:36 +0000 (22:40 -0800)]
virtio_net: hook up the set-tso ethtool op
Seems like an oversight that we have set-tx-csum and set-sg hooked
up, but not set-tso.
Also leads to the strange situation that if you e.g. disable tx-csum,
then tso doesn't get disabled.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark McLoughlin [Mon, 17 Nov 2008 06:39:18 +0000 (22:39 -0800)]
virtio_net: Recycle some more rx buffer pages
Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.
A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv) Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Nov 2008 03:46:36 +0000 (19:46 -0800)]
net: make sure struct dst_entry refcount is aligned on 64 bytes
As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.
We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.
for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0
As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)
Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.
"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Nov 2008 03:41:14 +0000 (19:41 -0800)]
rcu: documents rculist_nulls
Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
end-of-list can help in some RCU algos.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Nov 2008 03:40:17 +0000 (19:40 -0800)]
net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.
This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.
Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.
__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)
Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 17 Nov 2008 03:37:55 +0000 (19:37 -0800)]
rcu: Introduce hlist_nulls variant of hlist
hlist uses NULL value to finish a chain.
hlist_nulls variant use the low order bit set to 1 to signal an end-of-list marker.
This allows to store many different end markers, so that some RCU lockless
algos (used in TCP/UDP stack for example) can save some memory barriers in
fast paths.
Two new files are added :
include/linux/list_nulls.h
- mimics hlist part of include/linux/list.h, derived to hlist_nulls variant
include/linux/rculist_nulls.h
- mimics hlist part of include/linux/rculist.h, derived to hlist_nulls variant
Only four helpers are declared for the moment :
hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()
prefetches() were removed, since an end of list is not anymore NULL value.
prefetches() could trigger useless (and possibly dangerous) memory transactions.
Example of use (extracted from __udp4_lib_lookup())
struct sock *sk, *result;
struct hlist_nulls_node *node;
unsigned short hnum = ntohs(dport);
unsigned int hash = udp_hashfn(net, hnum);
struct udp_hslot *hslot = &udptable->hash[hash];
int score, badness;
rcu_read_lock();
begin:
result = NULL;
badness = -1;
sk_nulls_for_each_rcu(sk, node, &hslot->head) {
score = compute_score(sk, net, saddr, hnum, sport,
daddr, dport, dif);
if (score > badness) {
result = sk;
badness = score;
}
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != hash)
goto begin;
if (result) {
if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
result = NULL;
else if (unlikely(compute_score(result, net, saddr, hnum, sport,
daddr, dport, dif) < badness)) {
sock_put(result);
goto begin;
}
}
rcu_read_unlock();
return result;
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.
This patch adds an IP_RECVORIGDSTADDR sockopt that enables
a IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Greear [Mon, 17 Nov 2008 03:19:38 +0000 (19:19 -0800)]
ipv4: Fix ARP behavior with many mac-vlans
Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans. My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan. I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a replay will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only create check neigh_event_ns if we are !dont_send?
The attached patch makes it work much better for me. The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request. The old code
*would* create a stale entry even if we are not going to respond.
Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 14 Nov 2008 06:54:36 +0000 (06:54 +0000)]
e1000e: enable ECC correction on 82571 silicon
This change enables ECC correction for the packet buffer on all 82571
silicon.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paulius Zaleckas [Fri, 14 Nov 2008 00:24:34 +0000 (00:24 +0000)]
phylib: make mdio-gpio work without OF (v4)
make mdio-gpio work with non OpenFirmware gpio implementation.
Aditional changes to mdio-gpio:
- use gpio_request() and gpio_free()
- place irq[] array in struct mdio_gpio_info
- add module description, author and license
- add note about compiling this driver as module
- rename mdc and mdio function (were ugly names)
- change MII to MDIO in bus name
- add __init __exit to module (un)loading functions
- probe fails if no phys added to the bus
- kzalloc bitbang with sizeof(*bitbang)
Changes since v3:
- keep bus naming "%x" to be compatible with existing drivers.
Changes since v2:
- more #ifdefs reduction
- platform driver will be registered on OF platforms also
- unified platform and OF bus_id to phy%i
Changes since v1:
- removed NO_IRQ
- reduced #idefs
Laurent, please test this driver under OF.
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt> Signed-off-by: David S. Miller <davem@davemloft.net>
David Brownell [Sun, 16 Nov 2008 08:36:08 +0000 (00:36 -0800)]
pegasus: minor resource shrinkage
Make pegasus driver not allocate a workqueue until the driver
is bound to some device, which will need that workqueue if
the device is brought up. This conserves resources when the
driver is linked but there's no pegasus device connected.
Also shrink the runtime footprint a smidgeon by moving some
init-only code into its proper section, and move an obnoxious
(frequent and meaningless) message to be debug-only.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
PJ Waskiewicz [Fri, 7 Nov 2008 12:16:08 +0000 (12:16 +0000)]
ixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}()
netif_carrier_off() is sufficient to stop Tx into the driver. Stopping the Tx
queues is redundant and unnecessary. By the same token, netif_carrier_on()
will be sufficient to re-enable Tx, so waking the queues is unnecessary.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 14 Nov 2008 08:53:54 +0000 (00:53 -0800)]
net: speedup dst_release()
During tbench/oprofile sessions, I found that dst_release() was in third position.
CPU: Core 2, speed 2999.68 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
483726 9.0185 __copy_user_zeroing_intel
191466 3.5697 __copy_user_intel
185475 3.4580 dst_release
175114 3.2648 ip_queue_xmit
153447 2.8608 tcp_sendmsg
108775 2.0280 tcp_recvmsg
102659 1.9140 sysenter_past_esp
101450 1.8914 tcp_current_mss
95067 1.7724 __copy_from_user_ll
86531 1.6133 tcp_transmit_skb
Of course, all CPUS fight on the dst_entry associated with 127.0.0.1
Instead of first checking the refcount value, then decrement it,
we use atomic_dec_return() to help CPU to make the right memory transaction
(ie getting the cache line in exclusive mode)
dst_release() is now at the fifth position, and tbench a litle bit faster ;)
CPU: Core 2, speed 3000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
647107 8.8072 __copy_user_zeroing_intel
258840 3.5229 ip_queue_xmit
258302 3.5155 __copy_user_intel
209629 2.8531 tcp_sendmsg
165632 2.2543 dst_release
149232 2.0311 tcp_current_mss
147821 2.0119 tcp_recvmsg
137893 1.8767 sysenter_past_esp
127473 1.7349 __copy_from_user_ll
121308 1.6510 ip_finish_output
118510 1.6129 tcp_transmit_skb
109295 1.4875 tcp_v4_rcv
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Fri, 14 Nov 2008 06:56:30 +0000 (22:56 -0800)]
pkt_sched: Remove qdisc->ops->requeue() etc.
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.
The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Fri, 14 Nov 2008 06:41:29 +0000 (22:41 -0800)]
isdn: use %pI4, remove get_{u8/u16/u32} and put_{u8/u16/u32} inlines
They would have been better named as get_be16, put_be16, etc.
as they were hiding an endian shift inside.
They don't add much over explicitly coding the byteshifting
and gcc sometimes has a problem with builtin_constant_p inside
inline functions, so it may do a better job of byteswapping
at compile time rather than runtime.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Thu, 13 Nov 2008 07:39:10 +0000 (23:39 -0800)]
netdevice: safe convert to netdev_priv() #part-4
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Thu, 13 Nov 2008 07:38:36 +0000 (23:38 -0800)]
netdevice: safe convert to netdev_priv() #part-3
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Thu, 13 Nov 2008 07:38:14 +0000 (23:38 -0800)]
netdevice: safe convert to netdev_priv() #part-2
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Thu, 13 Nov 2008 07:37:49 +0000 (23:37 -0800)]
netdevice: safe convert to netdev_priv() #part-1
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaud Ebalard [Thu, 13 Nov 2008 07:28:15 +0000 (23:28 -0800)]
net: Remove unused parameter of xfrm_gen_index()
In commit 2518c7c2b3d7f0a6b302b4efe17c911f8dd4049f ("[XFRM]: Hash
policies when non-prefixed."), the last use of xfrm_gen_policy() first
argument was removed, but the argument was left behind in the
prototype.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 13 Nov 2008 00:03:05 +0000 (16:03 -0800)]
bnx2: Update version to 1.8.2.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 13 Nov 2008 00:02:45 +0000 (16:02 -0800)]
bnx2: Reorganize timeout constants.
Move all related timeout constants to the same location. BNX2
prefix is also added to make them more consistent.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 13 Nov 2008 00:02:20 +0000 (16:02 -0800)]
bnx2: Set rx buffer water marks based on MTU.
The default rx buffer water marks for XOFF/XON are for 1500 MTU. At
larger MTUs, these water marks need to be adjusted for effective
flow control.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 13 Nov 2008 00:01:41 +0000 (16:01 -0800)]
bnx2: Restrict WoL support.
On some quad-port cards that cannot support WoL on all ports due
to excessive power consumption, the driver needs to restrict WoL
on some ports by checking VAUX_PRESET bit.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 13 Nov 2008 00:01:12 +0000 (16:01 -0800)]
bnx2: Add PCI ID for 5716S.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 12 Nov 2008 08:48:44 +0000 (00:48 -0800)]
dccp: Resolve dependencies of features on choice of CCID
This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).
For Send Ack Vector, the situation is as follows:
* since CCID2 mandates the use of Ack Vectors, there is no point in allowing
endpoints which use CCID2 to disable Ack Vector features such a connection;
* a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);
* for all other CCIDs, the use of (Send) Ack Vector is optional and thus
negotiable. However, this implies that the code negotiating the use of Ack
Vectors also supports it (i.e. is able to supply and to either parse or
ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
Vector support), the use of Ack Vectors is here disabled, with a comment
in the source code.
An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.
Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.
The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.
The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 12 Nov 2008 08:47:26 +0000 (00:47 -0800)]
dccp: Query supported CCIDs
This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
- a test function for internal use which is used to validate CCID requests
made by the user;
- a copy function so that the list can be used for feature-negotiation;
- documented getsockopt() support so that the user can query capabilities.
The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).
Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
will not fail if the peer requests to use CCID3 instead of CCID2.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 12 Nov 2008 08:43:40 +0000 (00:43 -0800)]
dccp: Registration routines for changing feature values
Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.
These are internal-only routines and therefore start with `__feat_register'.
It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 12 Nov 2008 08:42:58 +0000 (00:42 -0800)]
dccp: Limit feature negotiation to connection setup phase
This patch limits feature (capability) negotation to the connection setup phase:
1. Although it is theoretically possible to perform feature negotiation at any
time (and RFC 4340 supports this), in practice this is prohibitively complex,
as it requires to put traffic on hold for each new negotiation.
2. As a byproduct of restricting feature negotiation to connection setup, the
feature-negotiation retransmit timer is no longer required. This part is now
mapped onto the protocol-level retransmission.
Details indicating why timers are no longer needed can be found on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html
This patch disables anytime negotiation, subsequent patches work out full
feature negotiation support for connection setup.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 11 Nov 2008 18:53:50 +0000 (10:53 -0800)]
Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context
nohz: disable tick_nohz_kick_tick() for now
irq: call __irq_enter() before calling the tick_idle_check
x86: HPET: enter hpet_interrupt_handler with interrupts disabled
x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
x86: HPET: convert WARN_ON to WARN_ON_ONCE
Linus Torvalds [Tue, 11 Nov 2008 18:52:25 +0000 (10:52 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: release buddies on yield
fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock
sched: clean up debug info
Linus Torvalds [Tue, 11 Nov 2008 18:51:50 +0000 (10:51 -0800)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: prevent infinite looping on time stamping
ftrace: disable tracing on resize
ftrace: fix breakage in bin_fmt results
ftrace: ftrace.txt version update
ftrace: update txt document
Linus Torvalds [Tue, 11 Nov 2008 17:32:58 +0000 (09:32 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
[XFS] XFS: Check for valid transaction headers in recovery
[XFS] handle memory allocation failures during log initialisation
[XFS] Account for allocated blocks when expanding directories
[XFS] Wait for all I/O on truncate to zero file size
[XFS] Fix use-after-free with log and quotas
Linus Torvalds [Tue, 11 Nov 2008 17:31:32 +0000 (09:31 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (21 commits)
ocfs2: Check search result in ocfs2_xattr_block_get()
ocfs2: fix printk related build warnings in xattr.c
ocfs2: truncate outstanding block after direct io failure
ocfs2/xattr: Proper hash collision handle in bucket division
ocfs2: return 0 in page_mkwrite to let VFS retry.
ocfs2: Set journal descriptor to NULL after journal shutdown
ocfs2: Fix check of return value of ocfs2_start_trans() in xattr.c.
ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails
ocfs2: Fix checking of return value of new_inode()
ocfs2: Fix check of return value of ocfs2_start_trans()
ocfs2: Fix some typos in xattr annotations.
ocfs2: Remove unused ocfs2_restore_xattr_block().
ocfs2: Don't repeat ocfs2_xattr_block_find()
ocfs2: Specify appropriate journal access for new xattr buckets.
ocfs2: Check errors from ocfs2_xattr_update_xattr_search()
ocfs2: Don't return -EFAULT from a corrupt xattr entry.
ocfs2: Check xattr block signatures properly.
ocfs2: add handler_map array bounds checking
ocfs2: remove duplicate definition in xattr
ocfs2: fix function declaration and definition in xattr
...
Linus Torvalds [Tue, 11 Nov 2008 17:25:21 +0000 (09:25 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (35 commits)
V4L/DVB (9516): cx18: Move DVB buffer transfer handling from irq handler to work_queue
V4L/DVB (9557): gspca: Small changes for the sensor HV7131B in zc3xx.
V4L/DVB (9556): gspca: Bad init sequence for sensor HV7131B in zc3xx.
V4L/DVB (9549): gspca: Fix a typo in one of gspca chips name.
V4L/DVB (9515): cx18: Use correct Mailbox IRQ Ack values and misc IRQ handling cleanup
V4L/DVB (9493): kconfig patch
V4L/DVB (9527): af9015: fix compile warnings
V4L/DVB (9524): af9013: fix bug in status reading
V4L/DVB (9511): cx18: Mark CX18_CPU_DE_RELEASE_MDL as a slow API call
V4L/DVB (9510): cx18: Fix write retries for registers that always change - part 2.
V4L/DVB (9506): ivtv/cx18: fix test whether modules should be loaded or not.
V4L/DVB (9499): cx88-mpeg: final fix for analogue only compilation + de-alloc fix
V4L/DVB (9496): cx88-blackbird: bugfix: cx88-blackbird-mpeg-users
V4L/DVB (9495): cx88-blackbird: bugfix: cx88-blackbird-poll-fix
V4L/DVB (9494): anysee: initialize anysee_usb_mutex statically
V4L/DVB (9492): unplug oops from dvb_frontend_init...
V4L/DVB (9486): ivtv/ivtvfb: no longer experimental
V4L/DVB (9485): ivtv: remove incorrect V4L1 & tvaudio dependency
V4L/DVB (9482): Documentation, especially regarding audio and informational links
V4L/DVB (9475): cx18: Disable write retries for registers that always change - part 1.
...
Linus Torvalds [Tue, 11 Nov 2008 17:22:24 +0000 (09:22 -0800)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: Move legacy breadcrumb out of the reserved status page area
drm/i915: Filter pci devices based on PCI_CLASS_DISPLAY_VGA
drm/radeon: map registers at load time
drm: Remove infrastructure for supporting i915's vblank swapping.
i915: Remove racy delayed vblank swap ioctl.
i915: Don't whine when pci_enable_msi() fails.
i915: Don't attempt to short-circuit object_wait_rendering by checking domains.
i915: Clean up sarea pointers on leavevt
i915: Save/restore MCHBAR_RENDER_STANDBY on GM965/GM45
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
dsa: fix master interface allmulti/promisc handling
dsa: fix skb->pkt_type when mac address of slave interface differs
net: fix setting of skb->tail in skb_recycle_check()
net: fix /proc/net/snmp as memory corruptor
mac80211: fix a buffer overrun in station debug code
netfilter: payload_len is be16, add size of struct rather than size of pointer
ipv6: fix ip6_mr_init error path
[4/4] dca: fixup initialization dependency
[3/4] I/OAT: fix async_tx.callback checking
[2/4] I/OAT: fix dma_pin_iovec_pages() error handling
[1/4] I/OAT: fix channel resources free for not allocated channels
ssb: Fix DMA-API compilation for non-PCI systems
SSB: hide empty sub menu
vlan: Fix typos in proc output string
[netdrvr] usb/hso: Cleanup rfkill error handling
sfc: Correct address of gPXE boot configuration in EEPROM
el3_common_init() should be __devinit, not __init
hso: rfkill type should be WWAN
mlx4_en: Start port error flow bug fix
af_key: mark policy as dead before destroying
Andy Walls [Wed, 5 Nov 2008 03:49:14 +0000 (00:49 -0300)]
V4L/DVB (9516): cx18: Move DVB buffer transfer handling from irq handler to work_queue
cx18: Move DVB buffer transfer handling from irq handler to work_queue thread.
In order to properly lock the epu2cpu mailbox for driver to CX23418 commands,
the DVB/TS buffer handling needs to be moved from the IRQ handler and IRQ
context to a work queue. This work_queue implmentation is strikingly similar
to the ivtv implementation - for better or worse.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Andy Walls [Wed, 5 Nov 2008 01:02:23 +0000 (22:02 -0300)]
V4L/DVB (9515): cx18: Use correct Mailbox IRQ Ack values and misc IRQ handling cleanup
cx18: Use correct Mailbox IRQ Ack values and misc IRQ handling cleanup.
The SCB field definitions for Ack IRQ's for mailboxes were inconsistent with
the bitmasks being loaded into those SCB fields and the SW2 Ack IRQ handling
logic. Renamed fields in SCB to make things consistent and did misc IRQ
handling cleanups: removing legacy ivtv dma_reg_lock, HPU IRQ flags, etc.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Frederic CAND [Wed, 29 Oct 2008 17:37:49 +0000 (14:37 -0300)]
V4L/DVB (9493): kconfig patch
Ok I made a patch that converts gspca kconfig file to a more standard=
one, with tabs + 2 white spaces, so that if a warning is added it still
compiles
please find it attached
Andy Walls [Sat, 1 Nov 2008 04:07:36 +0000 (01:07 -0300)]
V4L/DVB (9511): cx18: Mark CX18_CPU_DE_RELEASE_MDL as a slow API call
cx18: Mark CX18_CPU_DE_RELEASE_MDL as a slow API call.
Give the encoder time to complete the MDL release before destroying the
encoder internal task. This avoids an encoder lockup on the next digital
capture and error messages about buffers being returned for an inactive
encoder task handle.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Andy Walls [Fri, 31 Oct 2008 23:49:12 +0000 (20:49 -0300)]
V4L/DVB (9510): cx18: Fix write retries for registers that always change - part 2.
cx18: Fix write retries for registers that always change - part 2.
Some registers, especially interrupt related ones, will never read
back the value just written. Modified interrupt register readback
checks to make sure the intended effect was achieved.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
anysee_usb_mutex is initialized at every time the anysee device is probed.
If the second anysee device is probed while anysee_usb_mutex is locked by
the first anysee device, the mutex is broken.
This patch fixes by initialize anysee_usb_mutex statically rather
than initialize at probe time.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L/DVB (9492): unplug oops from dvb_frontend_init...
When inadvertently hot-unplugging a WT-220U USB DVB-T receiver with
2.6.24, I was met with an oops [1]. The problem is relevant to
2.6.25/26-rc also.
dvb_frontend_init() was called either from re-creation of the kdvb-fe0
thread - seems unlikely, or someone called
dvb_frontend_reinitialise(), causing this path in the thread - really
unlikely, as I can't find any call-site for it.
Either way, quite a number of drivers call dvb_usb_generic_rw() [2]
without checking the validity of the relevant member in the
dvb_usb_device struct - which had changed. Having dvb_usb_generic_rw()
sanity-check and fail (rather than loading from 0x120) seems
reasonable defensive programming [3], in light of it being called in
this way.
The problem with this, is that drivers don't check the return code of
the init call [4]. Does it make sense to cook a patch which allows the
failure to be propagated back up, or am I missing something else?
ivtv used tvaudio in the past and at the time tvaudio required V4L1.
Since tvaudio is no longer dependent on V4L1 and since ivtv actually
no longer uses tvaudio at all, this is no removed from Kconfig.
Without this patch ivtv won't be build if V4L1 is disabled.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Andy Walls [Sun, 26 Oct 2008 02:27:06 +0000 (23:27 -0300)]
V4L/DVB (9475): cx18: Disable write retries for registers that always change - part 1.
cx18: Disable write retries for registers that always change - part 1.
Interrupt related registers will likely not read back the value we just wrote.
Disable retries for these registers for now to avoid accidently discarding
interrupts. More intelligent read back verification criteria are needed for
these and other registers (e.g. GPIO line registers), which will be addressed in
subsequent changes.
Signed-off-by: Andy Walls <awalls@radix.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jean Delvare [Fri, 24 Oct 2008 18:08:28 +0000 (15:08 -0300)]
V4L/DVB (9372): Minor fixes to the saa7110 driver
* Apparently the author of the saa7110 driver was confused by the
number of outputs returned by DECODER_GET_CAPABILITIES. Of course a
decoder chip has no analog ouputs, but it must have at least one
digital output.
* Fix an off-by-one error when checking the input value of
DECODER_SET_INPUT.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Rafael Diniz [Sat, 25 Oct 2008 02:07:57 +0000 (23:07 -0300)]
V4L/DVB (9368): VBI fix for cx88 cards
The attached patch fix VBI support cx88 card.
I'm running a capture for hours, getting the closed caption from it[1], and
it's working perfect - the output is the same of a bttv card.
Please apply this patch as soon as possible.
[1] - using zvbi-ntsc-cc of zvbi project.
Signed-off-by: Rafael Diniz <diniz@wimobilis.com.br> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L/DVB (9357): cx88-dvb: Fix Oops in case i2c bus failed to register
There already is an report at kernel bugzilla about this issue:
http://bugzilla.kernel.org/show_bug.cgi?id=9455
When enabling extra checks for the i2c-bus of cx88 based cards by
loading i2c_algo_bit with bit_test=1 this may trigger an oops
when loading cx88_dvb.
This is caused by the extra check code that detects that the
sda-line is stuck high and thus does not register the i2c-bus.
cx88-dvb however does not check if the i2c-bus is valid and just
uses core->i2c_adap to attach dvb frontend modules.
This leads to an oops at the first call to i2c_transfer:
Impact: driver could possibly stomp on resources outside of its scope
{mchehab@redhat.com: I got two versions of the same patch (identical,
except for whitespacing). One authored by Andy Burns and another
authored by Suresh Siddha. Due to that, I'm applying the one that has
less CodingStyle errors. I'm also adding both comments and the SOB's for
both patches, since they are both interesting}
BAR base is located in the middle of the 4K page and the hardcoded
size argument makes the request span two pages causing the conflict.
Fix the hard coded size argument in ioremap().
Andy Burns commented:
I have already sent this patch on the linux-dvb list, but it didn't get
much attention, so re-sending direct, I hope you all don't mind.
While attempting to run mythtv in a xen domU, I encountered problems
loading the driver for my saa7134 card, with an error from ioremap().
This error was due to the driver allocating an incorrectly sized mmio
area, which was trapped by xen's permission checks, but this would go
un-noticed on a kernel without xen.
My card has a 1K sized mmio area, I've had information that other cards
have 2K areas, perhaps others have different sizes, yet the driver
always attempts to map 4K. I realise that the granularity of mapping is
the page size, which typically would be 4K, but unless the card's base
address happens to fall on a 4K boundary (mine does not) then the
base+4K will end up spanning two pages, and this is when the error
occurs under xen.
My patch uses the pci_resource_len macro to determine the size required
for the user's particular card, instead of the hardcoded 4K value. I've
tested with a couple of printk() inside ioremap() that the start address
and size do get rounded to the closest page boundary.
With this patch I am able to successfully load the saa7134 driver and
run mythtv under xen with my card, subject to correct pollirq settings
in case of shared IRQ, I am still seeing occasional DMA panics, which I
think are related to swiotlb handling by dom0/domU, usually the panic
occurs when changing mux, once tuned to a mux, 12 hour continuous
recordings are possible without errors.
Jonathan Corbet [Fri, 17 Oct 2008 23:19:29 +0000 (20:19 -0300)]
V4L/DVB (9355): de-BKL cafe_ccic.c
Remove lock_kernel() call from cafe_ccic.c
Commit d56dc61265d2527a63ab5b0f03199a43cd89ca36 added lock_kernel()
calls to cafe_ccic.c. But that driver was written with proper locking
and does not need the BKL, so take it back out.
Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
As reported by David Ellingsworth:
> I'm not sure if it matters or not, but the ibmcam driver in the
> Mauro's linux-2.6 git tree in the for_linus branch is currently
> broken.
uvd is equal to NULL during most of ibmcam_probe. Due to that, an OOPS is
generated at dev_info. This patch replaces uvd->dev->dev to dev->dev
inside this routine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Reviewed-by: David Ellingsworth <david@identd.dyndns.org>