[NETFILTER]: ip_tables: per-netns FILTER/MANGLE/RAW tables for real
Commit 9335f047fe61587ec82ff12fbb1220bcfdd32006 aka
"[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
added per-netns _view_ of iptables rules. They were shown to user, but
ignored by filtering code. Now that it's possible to at least ping loopback,
per-netns tables can affect filtering decisions.
netns is taken in case of
PRE_ROUTING, LOCAL_IN -- from in device,
POST_ROUTING, LOCAL_OUT -- from out device,
FORWARD -- from in device which should be equal to out device's netns.
This code is relatively new, so BUG_ON was plugged.
Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
(overwhelming majority), b) consolidate code in one place -- similar
changes will be done in ipv6 and arp netfilter code.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Thu, 20 Mar 2008 14:15:45 +0000 (15:15 +0100)]
[NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.
Alexey Dobriyan [Thu, 20 Mar 2008 14:15:43 +0000 (15:15 +0100)]
[NETFILTER]: nf_conntrack: less hairy ifdefs around proc and sysctl
Patch splits creation of /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack
and net.netfilter hierarchy into their own functions with dummy ones
if PROC_FS or SYSCTL is not set. Also, remove dead "ret = 0" write
while I'm at it.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net>
[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().
This patches adds a call to increment IPSTATS_MIB_OUTFORWDATAGRAMS
when forwarding the packet in ip6_mr_forward() in the IPv6 multicast
routing module (net/ipv6/ip6mr.c).
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:32:45 +0000 (22:32 -0700)]
[NETNS][DCCPV6]: Actually create ctl socket on each net and use it.
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v6_ctl_send_reset.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:32:25 +0000 (22:32 -0700)]
[NETNS][DCCPV6]: Move the dccp_v6_ctl_sk on the struct net.
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:32:02 +0000 (22:32 -0700)]
[NETNS][DCCPV6]: Add dummy per-net operations.
They will be responsible for ctl socket initialization, but
currently they are void.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:31:32 +0000 (22:31 -0700)]
[NETNS][DCCPV6]: Don't pass NULL to ip6_dst_lookup.
This call uses the sock to get the net to lookup the routing
in. With CONFIG_NET_NS this code will OOPS, since the sk ptr
is NULL.
After looking inside the ip6_dst_lookup and drawing the analogy
with respective ipv6 code, it seems, that the dccp ctl socket
is a good candidate for the first argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:31:05 +0000 (22:31 -0700)]
[NETNS][DCCPV4]: Enable DCCPv4 in net namespaces.
This enables sockets creation with IPPROTO_DCCP and enables
the ip level to pass DCCP packets to the DCCP level.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:30:43 +0000 (22:30 -0700)]
[NETNS][DCCPV4]: Make per-net socket lookup.
The inet_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:30:19 +0000 (22:30 -0700)]
[NETNS][DCCPV4]: Use proper net to route the reset packet.
The dccp_v4_route_skb used in dccp_v4_ctl_send_reset, currently
works with init_net's routing tables - fix it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:29:59 +0000 (22:29 -0700)]
[NETNS][DCCPV4]: Actually create ctl socket on each net and use it.
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v4_ctl_send_reset.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:29:37 +0000 (22:29 -0700)]
[NETNS][DCCPV4]: Move the dccp_v4_ctl_sk on the struct net.
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:29:13 +0000 (22:29 -0700)]
[NETNS][DCCPV4]: Add dummy per-net operations.
They will be responsible for ctl socket initialization, but
currently they are void.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 14 Apr 2008 05:28:42 +0000 (22:28 -0700)]
[NETNS]: Add an empty netns_dccp structure on struct net.
According to the overall struct net design, it will be
filled with DCCP-related members.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/atm/horizon.c has unusually large number
of static inline functions - 36.
I looked through them. Most of them seems to be small enough,
but a few are big, others are using udelay or busy loop,
and as such are better not be inlined.
This patch removes "inline" from these static functions
(regardless of number of callsites - gcc nowadays auto-inlines
statics with one callsite).
Size difference for 32bit x86:
text data bss dec hex filename
8201 180 6 8387 20c3 linux-2.6-ALLYES/drivers/atm/horizon.o
7840 180 6 8026 1f5a linux-2.6.inline-ALLYES/drivers/atm/horizon.o
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sk_reuse is declared as "unsigned char", but is set as type valbool in net/core/sock.c.
There is no other place in net/ where sk->sk_reuse is set to a value > 1, so the test
"sk_reuse > 1" can not be true.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Allan Stephens [Mon, 14 Apr 2008 04:35:11 +0000 (21:35 -0700)]
[TIPC]: Improve socket time conversions
This patch modifies TIPC's socket code to use standard kernel
routines to handle time conversions between jiffies and ms.
This ensures proper operation even when HZ isn't 1000.
Acknowledgements to Eric Sesterhenn <snakebyte@gmx.de> for
identifying this issue and proposing a solution.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sun, 13 Apr 2008 02:07:52 +0000 (19:07 -0700)]
LSM: Make the Labeled IPsec hooks more stack friendly
The xfrm_get_policy() and xfrm_add_pol_expire() put some rather large structs
on the stack to work around the LSM API. This patch attempts to fix that
problem by changing the LSM API to require only the relevant "security"
pointers instead of the entire SPD entry; we do this for all of the
security_xfrm_policy*() functions to keep things consistent.
Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sun, 13 Apr 2008 02:06:42 +0000 (19:06 -0700)]
NetLabel: Allow passing the LSM domain as a shared pointer
Smack doesn't have the need to create a private copy of the LSM "domain" when
setting NetLabel security attributes like SELinux, however, the current
NetLabel code requires a private copy of the LSM "domain". This patches fixes
that by letting the LSM determine how it wants to pass the domain value.
* NETLBL_SECATTR_DOMAIN_CPY
The current behavior, NetLabel assumes that the domain value is a copy and
frees it when done
* NETLBL_SECATTR_DOMAIN
New, Smack-friendly behavior, NetLabel assumes that the domain value is a
reference to a string managed by the LSM and does not free it when done
Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[SCTP]: Remove an unused parameter from sctp_cmd_hb_timer_update
The 'asoc' parameter to sctp_cmd_hb_timer_update() is unused, and
we can remove it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert P. J. Day [Sun, 13 Apr 2008 01:54:24 +0000 (18:54 -0700)]
[SCTP]: "list_for_each()" -> "list_for_each_entry()" where appropriate.
Replacing (almost) all invocations of list_for_each() with
list_for_each_entry() tightens up the code and allows for the deletion
of numerous list iterator variables that are no longer necessary.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Sun, 13 Apr 2008 01:53:48 +0000 (18:53 -0700)]
[SCTP]: Correct /proc/net/assocs formatting error
Recently I posted a patch to add some informational items to
/proc/net/sctp/assocs. All the information is correct, but because
of how the seqfile show operation is laid out, some of the formatting
is backwards. This patch corrects that formatting, so that the new
information appears at the end of each line, rather than in the middle.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Haley [Fri, 11 Apr 2008 04:38:24 +0000 (00:38 -0400)]
[IPv6]: Change IPv6 unspecified destination address to ::1 for raw and un-connected sockets
This patch fixes a difference between IPv4 and IPv6 when sending packets
to the unspecified address (either 0.0.0.0 or ::) when using raw or
un-connected UDP sockets. There are two cases where IPv6 either fails
to send anything, or sends with the destination address set to ::. For
example:
--> ping -c1 0.0.0.0
PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms
[IPV6] MROUTE: Adjust IPV6 multicast routing module to use mroute6 header declarations.
- This patch adjusts IPv6 multicast routing module, net/ipv6/ip6mr.c,
to use mroute6 header definitions instead of mroute.
(MFC6_LINES instead of MFC_LINES, MAXMIFS instead of MAXVIFS, mifi_t
instead of vifi_t.)
- In addition, inclusion of some headers was removed as it is not needed.
Wang Chen [Mon, 7 Apr 2008 01:42:07 +0000 (09:42 +0800)]
[IPV6]: Check length of optval provided by user in setsockopt().
Check length of setsockopt's optval, which provided by user, before copy it
from user space.
For POSIX compliant, return -EINVAL for setsockopt of short lengths.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
[IPV6]: Sparse: Reuse previous delaration where appropriate.
| net/ipv6/ipv6_sockglue.c:162:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ipv6_sockglue.c:175:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ip6mr.c:1241:10: warning: symbol 'ret' shadows an earlier one
| net/ipv6/ip6mr.c:1163:6: originally declared here
IPV4: use xor rather than multiple ands for route compare
The comparison in ip_route_input is a hot path, by recoding the C
"and" as bit operations, fewer conditional branches get generated
so the code should be faster. Maybe someday Gcc will be smart
enough to do this?
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid unneeded test in the case where object to be freed
has to be a leaf. Don't need to use the generic tnode_free()
function, instead just setup leaf to be freed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[Syncookies]: Add support for TCP options via timestamps.
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.
Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.
The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.
changes since v1:
don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
and have ipv6/syncookies.c use it.
Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Use vmalloc rather than alloc_pages to avoid wasting memory.
The problem is that tnode structure has a power of 2 sized array,
plus a header. So the current code wastes almost half the memory
allocated because it always needs the next bigger size to hold
that small header.
This is similar to an earlier patch by Eric, but instead of a list
and lock, I used a workqueue to handle the fact that vfree can't
be done in interrupt context.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[IPV6]: Remove unused declarations in include/net/ip6_route.h.
1) Standlaone ip6_null_entry is no longer needed as it is replaced by
the ip6_null_entry member of ipv6 (instance of struct netns_ipv6) in
struct net (as a result of Network Namespaces patches).
2) These 3 methods from this same header are not defined anywhere:
ip6_rt_addr_add(), ip6_rt_addr_del(), rt6_sndmsg()
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This BUG_ON is not needed, since all (debug) checks are also done
in smp_call_function() which gets called by this function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ursula Braun <braunu@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert P. J. Day [Thu, 10 Apr 2008 09:11:24 +0000 (02:11 -0700)]
af_iucv: Use non-deprecated __RW_LOCK_UNLOCKED macro.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Ursula Braun <braunu@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 10 Apr 2008 09:02:28 +0000 (02:02 -0700)]
[SKFILTER]: Add SKF_ADF_NLATTR instruction
SKF_ADF_NLATTR searches for a netlink attribute, which avoids manually
parsing and walking attributes. It takes the offset at which to start
searching in the 'A' register and the attribute type in the 'X' register
and returns the offset in the 'A' register. When the attribute is not
found it returns zero.
A top-level attribute can be located using a filter like this
(example for nfnetlink, using struct nfgenmsg):
Some minor style cleanups:
* Move __KERNEL__ definitions to one place in filter.h
* Use const for sk_filter_len
* Line wrapping
* Put EXPORT_SYMBOL next to function definition
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 8 Apr 2008 15:56:52 +0000 (17:56 +0200)]
mac80211: fix key vs. sta locking problems
Up to now, key manipulation is supposed to run under RTNL to
avoid concurrent manipulations and also allow the set_key()
hardware callback to sleep. This is not feasible because STA
structs are rcu-protected and thus a lot of operations there
cannot take the RTNL. Also, key references are rcu-protected
so we cannot do things atomically.
This patch changes key locking completely:
* key operations are now atomic
* hardware crypto offload is enabled and disabled from
a workqueue, due to that key freeing is also delayed
* debugfs code is also run from a workqueue
* keys reference STAs (and vice versa!) so during STA
unlink the STAs key reference is removed but not the
keys STA reference, to avoid races key todo work is
run before STA destruction.
* fewer STA operations now need the RTNL which was
required due to key operations
This fixes the locking problems lockdep pointed out and also
makes things more light-weight because the rtnl isn't required
as much.
Note that the key todo lock/key mutex are global locks, this
is not required, of course, they could be per-hardware instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Tue, 8 Apr 2008 11:08:20 +0000 (13:08 +0200)]
mac80211: fix sta-info pinning
When a STA is supposed to be unlinked but is pinned, it still needs
to be unlinked from all structures. Only at the end of the unlink
process should we check for pin status and invalidate the callers
reference if it is pinned. Move the pin status check down.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
These two symbols are used only in ifdeffed function. Move them to that
section too.
net/mac80211/sta_info.c:387: warning: `__sta_info_pin' defined but not used
net/mac80211/sta_info.c:397: warning: `__sta_info_unpin' defined but not used
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Michael Wu <flamingice@sourmilk.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ron Rindjunsky [Mon, 7 Apr 2008 17:16:56 +0000 (10:16 -0700)]
mac80211: BA session debug prints changes
This patch contains next issues:
1 - prevents "stop BA session" multiple warnings
2 - adds debug print to stop Rx BA session flow
3 - adds EOL in one debug print
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mohamed Abbas [Fri, 4 Apr 2008 23:59:58 +0000 (16:59 -0700)]
mac80211: notify mac from low level driver (iwlwifi)
Add new API to MAC80211 to allow low level driver to
notify MAC with driver status.
Signed-off-by: Mohamed Abbas <mabbas@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 4 Apr 2008 21:33:37 +0000 (23:33 +0200)]
mac80211: make debugfs files root-only
Unfortunately, debugfs can be made to access invalid memory by
open()ing a file and then waiting until the corresponding debugfs
file has been removed (and, probably, the underlying object.)
That could be exploited by any user if the user is able to open
debugfs files and can cause networking devices, STA entries or
similar to disappear which is quite easy to do.
Hence, all debugfs files should be root-only.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 4 Apr 2008 08:41:56 +0000 (10:41 +0200)]
iwlwifi: honour regulatory restrictions in scan code
When doing firmware-assisted scanning, iwlwifi drivers do not
honour the regulatory control code that might disable channels
that are enabled in the EEPROM, for example when the user is
visiting another country and adjusted the regulatory domain
accordingly. This patch fixes that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
When drivers receive change notification they may do work that
will enable the changes to take effect. For example, if new association
the device needs to be programmed with this information.
Give the driver chance to make the changes before notifying the
upper layer - thus preventing race condition where upper layer
attempts to utilize state that may not be configured yet.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mohamed Abbas [Thu, 3 Apr 2008 23:05:24 +0000 (16:05 -0700)]
iwlwifi: fix rfkill memory error
Do not free reference to device twice. After rfkill registration succeeds
we only need to call rfkill_unregister() and not rfkill_free().
Also add some debugging.
Signed-off-by: Mohamed Abbas <mabbas@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>