pilppa.com Git - linux-2.6-omap-h63xx.git/log

netfilter: netns nat: fix ipt_MASQUERADE in netns

First, allow entry in notifier hook.
Second, start conntrack cleanup in netns to which netdevice belongs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: PPTP conntracking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: GRE conntracking in netns

* make keymap list per-netns
* per-netns keymal lock (not strictly necessary)
* flush keymap at netns stop and module unload.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: H323 conntracking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: SIP conntracking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: final netns tweaks

Add init_net checks to not remove kmem_caches twice and so on.

Refactor functions to split code which should be executed only for
init_net into one place.

ip_ct_attach and ip_ct_destroy assignments remain separate, because
they're separate stages in setup and teardown.

NOTE: NOTRACK code is in for-every-net part. It will be made per-netns
after we decidce how to do it correctly.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns conntrack accounting

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_log_invalid sysctl

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_checksum sysctl

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_count sysctl

Note, sysctl table is always duplicated, this is simpler and less
special-cased.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns /proc/net/stat/nf_conntrack, /proc/net/stat/ip_conntrack

Show correct conntrack count, while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns statistics

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns event cache

Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.

Not so sure anymore.

[Patrick: pointers will be different, flushing can only be done while
inactive though and thus it needs to be per netns]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: pass conntrack to nf_conntrack_event_cache() not skb

This is cleaner, we already know conntrack to which event is relevant.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: cleanup after L3 and L4 proto unregister in every netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: unregister helper in every netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netns: export netns list

Conntrack code will use it for
a) removing expectations and helpers when corresponding module is removed, and
b) removing conntracks when L3 protocol conntrack module is removed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns /proc/net/ip_conntrack, /proc/net/stat/ip_conntrack, /proc/net/ip_conntrack_expect

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns /proc/net/nf_conntrack_expect

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: pass netns pointer to L4 protocol's ->error hook

Again, it's deducible from skb, but we're going to use it for
nf_conntrack_checksum and statistics, so just pass it from upper layer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: pass netns pointer to nf_conntrack_in()

It's deducible from skb->dev or skb->dst->dev, but we know netns at
the moment of call, so pass it down and use for finding and creating
conntracks.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns unconfirmed list

What is confirmed connection in one netns can very well be unconfirmed
in another one.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns expectations

Make per-netns a) expectation hash and b) expectations count.

Expectations always belongs to netns to which it's master conntrack belong.
This is natural and doesn't bloat expectation.

Proc files and leaf users are stubbed to init_net, this is temporary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns: fix {ip,6}_route_me_harder() in netns

Take netns from skb->dst->dev. It should be safe because, they are called
from LOCAL_OUT hook where dst is valid (though, I'm not exactly sure about
IPVS and queueing packets to userspace).

[Patrick: its safe everywhere since they already expect skb->dst to be set]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns conntrack hash

* make per-netns conntrack hash

  Other solution is to add ->ct_net pointer to tuplehashes and still has one
  hash, I tried that it's ugly and requires more code deep down in protocol
  modules et al.

* propagate netns pointer to where needed, e. g. to conntrack iterators.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: per-netns conntrack count

Sysctls and proc files are stubbed to init_net's one. This is temporary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: add ->ct_net -- pointer from conntrack to netns

Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.

->ct_net is write-once field.

Every conntrack in system has ->ct_net initialized, no exceptions.

->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.

Right now every conntrack is created in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns nf_conntrack: add netns boilerplate

One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns: ip6t_REJECT in netns for real

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns: ip6table_mangle in netns for real

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns: ip6table_raw in netns for real

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: netns: remove nf_*_net() wrappers

Now that dev_net() exists, the usefullness of them is even less. Also they're
a big problem in resolving circular header dependencies necessary for
NOTRACK-in-netns patch. See below.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: implement NFPROTO_UNSPEC as a wildcard for extensions

When a match or target is looked up using xt_find_{match,target},
Xtables will also search the NFPROTO_UNSPEC module list. This allows
for protocol-independent extensions (like xt_time) to be reused from
other components (e.g. arptables, ebtables).

Extensions that take different codepaths depending on match->family
or target->family of course cannot use NFPROTO_UNSPEC within the
registration structure (e.g. xt_pkttype).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: x_tables: use NFPROTO_* in extensions

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: Introduce NFPROTO_* constants

The netfilter subsystem only supports a handful of protocols (much
less than PF_*) and even non-PF protocols like ARP and
pseudo-protocols like PF_BRIDGE. By creating NFPROTO_*, we can earn a
few memory savings on arrays that previously were always PF_MAX-sized
and keep the pseudo-protocols to ourselves.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: xt_recent: IPv6 support

This updates xt_recent to support the IPv6 address family.
The new /proc/net/xt_recent directory must be used for this.
The old proc interface can also be configured out.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: rename ipt_recent to xt_recent

Like with other modules (such as ipt_state), ipt_recent.h is changed
to forward definitions to (IOW include) xt_recent.h, and xt_recent.c
is changed to use the new constant names.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: Use unsigned types for hooknum and pf vars

and (try to) consistently use u_int8_t for the L3 family.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netns: make uplitev6 mib per/namespace

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: make udpv6 mib per/namespace

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: add stub functions for per/namespace mibs allocation

The content of init_ipv6_mibs/cleanup_ipv6_mibs will be moved to new
calls one by one next.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: allow per device ipv6 snmp statistics in non-initial namespace

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: register global ipv6 mibs statistics in each namespace

Unused net variable will become used very soon.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: separate seq_ops for global & per/device ipv6 statistics

idev has been stored on seq->private. NULL has been stored for global
statistics.

The situation is changed with net namespace. We need to store pointer to
struct net and the only place is seq->private. So, we'll have for
/proc/net/dev_snmp6/* and for /proc/net/snmp6 pointers of two different
types stored in the same field.

This effectively requires to separate seq_ops of these files.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: consolidate ipv6 sock_stat code at the beginning of net/ipv6/proc.c

Simple, comsolidate sockstat6 staff in one place, at the beginning of
the file. Right now sockstat6_seq_open/sockstat6_seq_fops looks like an
intrusion in the middle of snmp6 code.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: register /proc/net/dev_snmp6/* in each ns

Do the same for /proc/net/snmp6.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: move /proc/net/dev_snmp6 to struct net

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: cleanup messy initializer

I'm quite sure that if I give this function in its old format
for you to inspect, you start to wonder what is the type of
demanded or if it's a global variable.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: kill pointless urg_mode

It all started from me noticing that this urgent check in
tcp_clean_rtx_queue is unnecessarily inside the loop. Then
I took a longer look to it and found out that the users of
urg_mode can trivially do without, well almost, there was
one gotcha.

Bonus: those funny people who use urg with >= 2^31 write_seq -
snd_una could now rejoice too (that's the only purpose for the
between being there, otherwise a simple compare would have done
the thing). Not that I assume that the rest of the tcp code
happily lives with such mind-boggling numbers :-). Alas, it
turned out to be impossible to set wmem to such numbers anyway,
yes I really tried a big sendfile after setting some wmem but
nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
So I hacked a bit variable to long and found out that it seems
to work... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: packet split receive api

Add some packet-split receive hooks.

For one this allows to do NUMA node affine page allocs. Later on these
hooks will be extended to do emergency reserve allocations for
fragments.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wrap sk->sk_backlog_rcv()

Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: initialize ip6_route sysctl vars in ip6_route_net_init()

This makes that ip6_route_net_init() does all of the route init code.
There used to be a race between ip6_route_net_init() and ip6_net_init()
and someone relying on the combined result was left out cold.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: clean up ip6_route_net_init() error handling

ip6_route_net_init() error handling looked less than solid, fix 'er up.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Don't lookup the socket if there's a socket attached to the skb

Use the socket cached in the skb if it's present.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Add udplib_lookup_skb() helpers

To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet_hashtables: Add inet_lookup_skb helpers

To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Respect SO_RCVLOWAT in tcp_poll().

Based upon a report by Vito Caputo.

Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Simplify dev_requeue_skb and dequeue_skb

qdisc->requeue was planned to universally replace all requeuing code,
but at the top level we never requeue more than one skb, so qdisc->
gso_skb is enough for this. qdisc->requeue would be used on the lower
levels only for one level deep requeuing (like in sch_hfsc) after
finishing all the changes.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Fix handling of gso skbs on requeuing

Jay Cliburn noticed and diagnosed a bug triggered in
dev_gso_skb_destructor() after last change from qdisc->gso_skb
to qdisc->requeue list. Since gso_segmented skbs can't be queued
to another list this patch brings back qdisc->gso_skb for them.

Reported-by: Jay Cliburn <jcliburn@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)

Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: pipe end-point protocol documentation

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: implement GPRS virtual interface over PEP socket

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: receive pipe control requests as out-of-band data

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: Pipe End Point for Phonet Pipes protocol

This protocol provides some connection handling and negotiated
congestion control. Nokia cellular modems use it for bulk transfers.
It provides packet boundaries (hence SOCK_SEQPACKET). Congestion
control is per packet rather per byte, so we do not re-use the
generic socket memory accounting.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: connected sockets glue

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: modules auto-loading support

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdrv: Fix unregister_netdev typos

Found during the (partial) unregister_netdevice audit that we didn't
have to have :)

It looks like a couple of Sun NIC drivers had unregister_netdevice
when they really meant unregister_netdev.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: correctly save sctp_adaptation from parameter.

The INIT perameter carries the adapatation value in network-byte
order. We need to store it in host byte order as expected
by data types and the user API.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: enable cookie-echo retransmission transport switch

This patch enables cookie-echo retransmission transport switch
feature. If COOKIE-ECHO retransmission happens, it will be sent
to the address other than the one last sent to.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix the SNMP counter of SCTP_MIB_OUTOFBLUES

RFC3873 defined SCTP_MIB_OUTOFBLUES:

sctpOutOfBlues OBJECT-TYPE
   SYNTAX         Counter32
   MAX-ACCESS     read-only
   STATUS         current
   DESCRIPTION
        "The number of out of the blue packets received by the host.
        An out of the blue packet is an SCTP packet correctly formed,
        including the proper checksum, but for which the receiver was
        unable to identify an appropriate association."
   REFERENCE
        "Section 8.4 in RFC2960 deals with the Out-Of-The-Blue
         (OOTB) packet definition and procedures."

But OOTB packet INIT, INIT-ACK and SHUTDOWN-ACK(COOKIE-WAIT or
COOKIE-ECHOED state) are not counted by SCTP_MIB_OUTOFBLUES.

Case 1(INIT):

Endpoint A               Endpoint B
(CLOSED)                 (CLOSED)

INIT     ---------->
          <----------    ABORT

Case 2(INIT-ACK):

Endpoint A               Endpoint B
(CLOSED)                 (CLOSED)

INIT-ACK  ---------->
           <----------   ABORT

Case 3(SHUTDOWN-ACK):

Endpoint A               Endpoint B
(CLOSED)                 (CLOSED)

          <----------    INIT
SHUTDOWN-ACK  ---------->
           <----------   SHUTDOWN-COMPLETE

Case 4(SHUTDOWN-ACK):

Endpoint A               Endpoint B
(CLOSED)                 (COOKIE-ECHOED)

SHUTDOWN-ACK  ---------->
           <----------   SHUTDOWN-COMPLETE

This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix to start T5-shutdown-guard timer while enter SHUTDOWN-SENT state

RFC 4960: Section 9.2
The sender of the SHUTDOWN MAY also start an overall guard timer
'T5-shutdown-guard' to bound the overall time for the shutdown
sequence. At the expiration of this timer, the sender SHOULD abort
the association by sending an ABORT chunk. If the 'T5-shutdown-
guard' timer is used, it SHOULD be set to the recommended value of 5
times 'RTO.Max'.

The timer 'T5-shutdown-guard' is used to counter the overall time
for shutdown sequence, and it's start by the sender of the SHUTDOWN.
So timer 'T5-shutdown-guard' should be start when we send the first
SHUTDOWN chunk and enter the SHUTDOWN-SENT state, not start when we
receipt of the SHUTDOWN primitive and enter SHUTDOWN-PENDING state.

If 'T5-shutdown-guard' timer is start at SHUTDOWN-PENDING state, the
association may be ABORT while data is still transmitting.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: try harder to figure out address family when checking wildcards

sctp_is_any() function that is used to check for wildcard addresses
only looks at the address itself to determine the address family.
This function is used in the API to check the address passed in from
the user. If the user simply zerroes out the sockaddr_storage and
pass that in, we'll end up failing. So, let's try harder to determine
the address family by also checking the socket if it's possible.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: reduce memory footprint of sctp_chunk structure

sctp_chunks should be put on a diet.  This is some of the low hanging
fruit that we can strip out.  Changes all the __s8/__u8 flags to
bitfields.  Saves 12 bytes per chunk.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Retransmit list is ineligable for missing indications

Chunks placed on the retransmit list are marked as inelegible
for fast retrasnmission. Since missing indications determine
when fast reransmission is done, there is not point in calling
sctp_mark_missing() on the retransmit list since those chunks
will not be marked.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Optimize SFR-CACC transport list walking during SACK processing

There is a possibility of walking the transport list twice during
SACK processing when doing SFR-CACC algorithm. We can restructure
the code to only do this once.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Only mark chunks as missing when there are gaps

Frist small step in optimizing SACK processing. Do not call
sctp_mark_missing() when there are no gaps reported and thus
not missing chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

udp: Export UDP socket lookup function

The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Port redirection support for TCP

Current TCP code relies on the local port of the listening socket
being the same as the destination address of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port address.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible

Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Handle TCP SYN+ACK/ACK/RST transparency

The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.

This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
route lookup for those replies. Transparent replies are enabled if the
listening socket has the transparent socket flag set.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Conditionally enable transparent flow flag when connecting

Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make inet_sock.h independent of route.h

inet_iif() in inet_sock.h requires route.h. Since users of inet_iif()
usually require other route.h functionality anyway this patch moves
inet_iif() to route.h.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is set

Setting IP_TRANSPARENT is not really useful without allowing non-local
binds for the socket. To make user-space code simpler we allow these
binds even if IP_TRANSPARENT is set but IP_FREEBIND is not.

Signed-off-by: Tóth László Attila <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Implement IP_TRANSPARENT socket option

This patch introduces the IP_TRANSPARENT socket option: enabling that
will make the IPv4 routing omit the non-local source address check on
output. Setting IP_TRANSPARENT requires NET_ADMIN capability.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Loosen source address check on IPv4 output

ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: BUG instead of corrupting memory in pskb_expand_head

If the caller of pskb_expand_head specifies a negative nhead
we'll silently overwrite other people's memory. This patch
makes it BUG instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipsec: Put dumpers on the dump list

Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:

As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.

I've also changed the order of addition on new states to prevent
a never-ending dump.

Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/wireless/ath9k/core.c
drivers/net/wireless/ath9k/main.c
net/core/dev.c

af_key: Free dumping state on socket close

Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
dumping is on-going.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: almost identical frag hashing funcs combined

$ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c
--- reassembly.c:ip6qhashfn()
+++ netfilter/nf_conntrack_reasm.c:ip6qhashfn()
@@ -1,5 +1,5 @@
-static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr,
-        struct in6_addr *daddr)
+static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr,
+        const struct in6_addr *daddr)
{
u32 a, b, c;

@@ -9,7 +9,7 @@

a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
- c += ip6_frags.rnd;
+ c += nf_frags.rnd;
__jhash_mix(a, b, c);

a += (__force u32)saddr->s6_addr32[3];

And codiff xx.o.old xx.o.new:

net/ipv6/netfilter/nf_conntrack_reasm.c:
  ip6qhashfn         | -512
  nf_hashfn          |   +6
  nf_ct_frag6_gather |  +36
3 functions changed, 42 bytes added, 512 bytes removed, diff: -470
net/ipv6/reassembly.c:
  ip6qhashfn    | -512
  ip6_hashfn    |   +7
  ipv6_frag_rcv |  +89
3 functions changed, 96 bytes added, 512 bytes removed, diff: -416

net/ipv6/reassembly.c:
  inet6_hash_frag | +510
1 function changed, 510 bytes added, diff: +510

Total: -376

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep

ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
be initialized) when dst_alloc() is called from ip6_dst_blackhole().
Otherwise, it results in the following (xfrm_larval_drop is now set to
1 by default):

[   78.697642] Unable to handle kernel paging request for data at address 0x0000004c
[   78.703449] Faulting instruction address: 0xc0097f54
[   78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
[   78.792791] PowerMac
[   78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
[   78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430
[   78.809997] REGS: eef19ad0 TRAP: 0300   Not tainted  (2.6.27-rc5)
[   78.815743] MSR: 00001032 <ME,IR,DR>  CR: 22242482  XER: 20000000
[   78.821550] DAR: 0000004c, DSISR: 40000000
[   78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000
[   78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
[   78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
[   78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
[   78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
[   78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94
[   78.862581] LR [c0334a28] dst_alloc+0x40/0xc4
[   78.868451] Call Trace:
[   78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
[   78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4
[   78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc
[   78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88
[   78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78
[   78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4
[   78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0
[   78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210
[   78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38
[   78.927295] --- Exception: c01 at 0xfe2d730
[   78.927297]     LR = 0xfe2d71c
[   78.939019] Instruction dump:
[   78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
[   78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050
[   78.956464] ---[ end trace 05fa1ed7972487a1 ]---

As commented by Benjamin Thery, the bug was introduced by
f2fc6a54585a1be6669613a31fbaba2ecbadcd36, while adding network
namespaces support to ipv6 routes.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mv643xx_eth: hook up skb recycling

This gives a nice increase in the maximum loss-free packet forwarding
rate in routing workloads.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb_recycle_check() to enable netdriver skb recycling

This patch adds skb_recycle_check(), which can be used by a network
driver after transmitting an skb to check whether this skb can be
recycled as a receive buffer.

skb_recycle_check() checks that the skb is not shared or cloned, and
that it is linear and its head portion large enough (as determined by
the driver) to be recycled as a receive buffer. If these conditions
are met, it does any necessary reference count dropping and cleans
up the skbuff as if it just came from __alloc_skb().

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: NULL pointer dereferrence in tcp_v6_send_ack

The following actions are possible:
tcp_v6_rcv
  skb->dev = NULL;
  tcp_v6_do_rcv
    tcp_v6_hnd_req
      tcp_check_req
        req->rsk_ops->send_ack == tcp_v6_send_ack

So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace
from dst entry.

Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding
in IPv4 code.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6

tcp: Fix NULL dereference in tcp_4_send_ack()

Fix NULL dereference in tcp_4_send_ack().

As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:

BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0
IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250

Stack:  ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
Call Trace:
<IRQ>  [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22
[<ffffffff8049bce5>] tcp_check_req+0x108/0x14c
[<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c
[<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec
[<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272
[<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c
[<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701
[<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a
[<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c
[<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2
[<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e
[<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5
[<ffffffff80462faa>] netif_receive_skb+0x293/0x303
[<ffffffff80465a9b>] process_backlog+0x80/0xd0
[<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4
[<ffffffff8046560e>] net_rx_action+0xb9/0x17f
[<ffffffff80234cc5>] __do_softirq+0xa3/0x164
[<ffffffff8020c52c>] call_softirq+0x1c/0x28
<EOI>  [<ffffffff8020de1c>] do_softirq+0x34/0x72
[<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50
[<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14
[<ffffffff804599cd>] release_sock+0xb8/0xc1
[<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c
[<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38
[<ffffffff8045751f>] sys_connect+0x68/0x8e
[<ffffffff80291818>] ? fd_install+0x5f/0x68
[<ffffffff80457784>] ? sock_map_fd+0x55/0x62
[<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80

Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
RIP  [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
RSP <ffffffff80762b78>
CR2: 00000000000004d0

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

phonet: Protect if_phonet.h against multiple inclusions.

From: Remi Denis-Courmont <remi.denis-courmont@nokia.com>

Signed-off-by: David S. Miller <davem@davemloft.net>

ath5k: Add support for AR2417 v2

* Add support for AR2417 (include pci id) since my previous patch doesn't sit on top of base.c/ath5k.h anymore.
* Update module version to 0.6.0

Changes-Licensed-under: ISC
Signed-Off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ath5k: Fix SREV reporting after SREV updates

* Fix srev reporting during attach

Changes-Licensed-under: ISC
Signed-Off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>