James Morris [Tue, 31 Oct 2006 08:43:44 +0000 (00:43 -0800)]
[IPV6]: fix flowlabel seqfile handling
There's a bug in the seqfile show operation for flowlabel objects, where
each hash chain is traversed cumulatively for each element. The following
function is called for each element of each chain:
Thus, objects can appear mutliple times when reading
/proc/net/ip6_flowlabel, as the above is called for each element in the
chain.
The solution is to remove the while() loop from the above, and traverse
each chain exactly once, per the patch below. This also removes the
ip6fl_fl_seq_show() function, which does nothing else.
Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Tue, 31 Oct 2006 02:56:06 +0000 (18:56 -0800)]
[IPV6]: return EINVAL for invalid address with flowlabel lease request
Currently, when an application requests a lease for a flowlabel via the
IPV6_FLOWLABEL_MGR socket option, no error is returned if an invalid type
of destination address is supplied as part of the request, leading to a
silent failure. This patch ensures that EINVAL is returned to the
application in this case.
Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 31 Oct 2006 02:55:11 +0000 (18:55 -0800)]
[SCTP]: Remove temporary associations from backlog and hash.
Every time SCTP creates a temporary association, the stack hashes it,
puts it on a list of endpoint associations and increments the backlog.
However, the lifetime of a temporary association is the processing time
of a current packet and it's destroyed after that. In fact, we don't
really want anyone else finding this association. There is no reason to
do this extra work.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 31 Oct 2006 02:54:32 +0000 (18:54 -0800)]
[SCTP]: Correctly set IP id for SCTP traffic
Make SCTP 1-1 style and peeled-off associations behave like TCP when
setting IP id. In both cases, we set the inet_sk(sk)->daddr and initialize
inet_sk(sk)->id to a random value.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Mon, 30 Oct 2006 23:22:15 +0000 (15:22 -0800)]
[NetLabel]: protect the CIPSOv4 socket option from setsockopt()
This patch makes two changes to protect applications from either removing or
tampering with the CIPSOv4 IP option on a socket. The first is the requirement
that applications have the CAP_NET_RAW capability to set an IPOPT_CIPSO option
on a socket; this prevents untrusted applications from setting their own
CIPSOv4 security attributes on the packets they send. The second change is to
SELinux and it prevents applications from setting any IPv4 options when there
is an IPOPT_CIPSO option already present on the socket; this prevents
applications from removing CIPSOv4 security attributes from the packets they
send.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes bug in iptables modules refcounting on compat error way.
As we are getting modules in check_compat_entry_size_and_hooks(), in case of
later error, we should put them all in translate_compat_table(), not in the
compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now.
Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Vasily Averin <vvs@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Josefsson [Mon, 30 Oct 2006 23:13:58 +0000 (15:13 -0800)]
[NETFILTER]: nf_conntrack: add missing unlock in get_next_corpse()
Add missing unlock in get_next_corpse() in nf_conntrack. It was missed
during the removal of listhelp.h . Also remove an unneeded use of
nf_ct_tuplehash_to_ctrack() in the same function.
Should be applied before 2.6.19 is released.
Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Mon, 30 Oct 2006 23:13:28 +0000 (15:13 -0800)]
[NETFILTER]: ip_tables: compat error way cleanup
This patch adds forgotten compat_flush_offset() call to error way of
translate_compat_table(). May lead to table corruption on the next
compat_do_replace().
Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Mishin [Mon, 30 Oct 2006 23:12:55 +0000 (15:12 -0800)]
[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables
There is a number of issues in parsing user-provided table in
translate_table(). Malicious user with CAP_NET_ADMIN may crash system by
passing special-crafted table to the *_tables.
The first issue is that mark_source_chains() function is called before entry
content checks. In case of standard target, mark_source_chains() function
uses t->verdict field in order to determine new position. But the check, that
this field leads no further, than the table end, is in check_entry(), which
is called later, than mark_source_chains().
The second issue, that there is no check that target_offset points inside
entry. If so, *_ITERATE_MATCH macro will follow further, than the entry
ends. As a result, we'll have oops or memory disclosure.
And the third issue, that there is no check that the target is completely
inside entry. Results are the same, as in previous issue.
Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Mon, 30 Oct 2006 23:08:42 +0000 (15:08 -0800)]
[IPV6]: fix lockup via /proc/net/ip6_flowlabel
There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where,
after finding a flowlabel, the code will loop forever not finding any
further flowlabels, first traversing the rest of the hash bucket then just
looping.
This patch fixes the problem by breaking after the hash bucket has been
traversed.
Note that this bug can cause lockups and oopses, and is trivially invoked
by an unpriveleged user.
Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 30 Oct 2006 07:46:42 +0000 (23:46 -0800)]
[SCTP]: Always linearise packet on input
I was looking at a RHEL5 bug report involving Xen and SCTP
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212550).
It turns out that SCTP wasn't written to handle skb fragments at
all. The absence of any calls to skb_may_pull is testament to
that.
It just so happens that Xen creates fragmented packets more often
than other scenarios (header & data split when going from domU to
dom0). That's what caused this bug to show up.
Until someone has the time sits down and audits the entire net/sctp
directory, here is a conservative and safe solution that simply
linearises all packets on input.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 30 Oct 2006 00:03:30 +0000 (16:03 -0800)]
[DCCP]: fix printk format warnings
Fix printk format warnings:
build2.out:net/dccp/ccids/ccid2.c:355: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:360: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:482: warning: long long unsigned int format, u64 arg (arg 5)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 4)
build2.out:net/dccp/ccids/ccid2.c:674: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:720: warning: long long unsigned int format, u64 arg (arg 3)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 29 Oct 2006 23:59:41 +0000 (15:59 -0800)]
[NET]: Fix segmentation of linear packets
skb_segment fails to segment linear packets correctly because it
tries to write all linear parts of the original skb into each
segment. This will always panic as each segment only contains
enough space for one MSS.
This was not detected earlier because linear packets should be
rare for GSO. In fact it still remains to be seen what exactly
created the linear packets that triggered this bug. Basically
the only time this should happen is if someone enables GSO
emulation on an interface that does not support SG.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Oct 2006 22:29:47 +0000 (15:29 -0700)]
[XFRM] xfrm_user: Fix unaligned accesses.
Use memcpy() to move xfrm_address_t objects in and out
of netlink messages. The vast majority of xfrm_user was
doing this properly, except for copy_from_user_state()
and copy_to_user_state().
Signed-off-by: David S. Miller <davem@davemloft.net>
Albert Cahalan [Mon, 30 Oct 2006 03:26:17 +0000 (22:26 -0500)]
[PATCH] fix i386 regparm=3 RT signal handlers on x86_64
The recent change to make x86_64 support i386 binaries compiled
with -mregparm=3 only covered signal handlers without SA_SIGINFO.
(the 3-arg "real-time" ones) This is useful for klibc at least.
Signed-off-by: Albert Cahalan <acahalan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Neil Brown [Mon, 30 Oct 2006 06:46:45 +0000 (22:46 -0800)]
[PATCH] sunrpc: fix refcounting problems in rpc servers
A recent patch fixed a problem which would occur when the refcount on an
auth_domain reached zero. This problem has not been reported in practice
despite existing in two major kernel releases because the refcount can
never reach zero.
This patch fixes the problems that stop the refcount reaching zero.
1/ We were adding to the refcount when inserting in the hash table,
but only removing from the hashtable when the refcount reached zero.
Obviously it never would. So don't count the implied reference of
being in the hash table.
2/ There are two paths on which a socket can be destroyed. One called
svcauth_unix_info_release(). The other didn't. So when the other was
taken, we can lose a reference to an ip_map which in-turn holds a
reference to an auth_domain
So unify the exit paths into svc_sock_put. This highlights the fact
that svc_delete_socket has slightly odd semantics - it does not drop
a reference but probably should. Fixing this need a bit more
thought and testing.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
akpm@osdl.org [Mon, 30 Oct 2006 06:46:42 +0000 (22:46 -0800)]
[PATCH] uml: fix compilation options for USER_OBJS
From: Jeff Dike <jdike@addtoit.com>, Paolo Giarrusso <blaisorblade@yahoo.it>
Make sure that when compiling USER_OBJS the correct compilation options are
passed; since they are compiled with USER_CFLAGS which is derived from
CFLAGS, make sure it is a recursively evaluated variable, so that changes
to CFLAGS done afterwards the inclusion of arch/$(ARCH)/Makefile are
reflected in USER_CFLAGS.
For instance, without this patch userspace objects are never compiled with
debug info active.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Fix "Remove the use of _syscallX macros in UML"
Fix commit 5f4c6bc1f369f20807a8e753c2308d1629478c61: it spits out warnings
about missing syscall prototype (it is in <unistd.h>) and it does not
recognize that two uses of _syscallX are to be resolved against kernel
headers in the source tree, not against _syscallX; they in fact do not
compile and would not work anyway.
If _syscallX macros will be removed from the kernel tree altogether, the
only reasonable solution for that piece of code is switching to open-coded
inline assembly (it's remapping the whole executable from memory, except
the page containing this code).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Mon, 30 Oct 2006 06:46:40 +0000 (22:46 -0800)]
[PATCH] docbook: make a filesystems book
Make a filesystems DocBook book/file by moving all filesystems info from
kernel-api.tmpl. Will also merge journal-api.tmpl into it soon (with
permission from Roger Gammans). Localizes filesystem info and reduces size
of the huge (produced) kernel-api output files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Mon, 30 Oct 2006 06:46:40 +0000 (22:46 -0800)]
[PATCH] MTD: fix last kernel-doc warning
Fix the last current kernel-doc warning:
Warning(/var/linsrc/linux-2619-rc3g5//include/linux/mtd/nand.h:416): No description found for parameter 'write_page'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When gigaset_initbcs() is called, cs->dev is not initialized yet. If
dev_alloc_skb() failed in this function, NULL poinster dereference will
happen at dev_warn().
Cc: Kai Germaschewski <kai.germaschewski@gmx.de> Cc: Hansjoerg Lipp <hjlipp@web.de> Cc: Tilman Schmidt <tilman@imap.cc> Acked-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Mon, 30 Oct 2006 06:46:34 +0000 (22:46 -0800)]
[PATCH] ndiswrapper: don't set the module->taints flags
For ndiswrapper, don't set the module->taints flags, just set the kernel
global tainted flag. This should allow ndiswrapper to continue to use GPL
symbols.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Florin Malita <fmalita@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe [Mon, 30 Oct 2006 18:54:23 +0000 (19:54 +0100)]
[PATCH] CFQ: bad locking in changed_ioprio()
When the ioprio code recently got juggled a bit, a bug was introduced.
changed_ioprio() is no longer called with interrupts disabled, so using
plain spin_lock() on the queue_lock is a bug.
Jens Axboe [Mon, 30 Oct 2006 18:07:48 +0000 (19:07 +0100)]
[PATCH] CFQ: use irq safe locking in cfq_cic_link()
If cfq_set_request() is called for a new process AND a non-fs io
request (so that __GFP_WAIT may not be set), cfq_cic_link() may
use spin_lock_irq() and spin_unlock_irq() with interrupts already
disabled.
Fix is to always use irq safe locking in cfq_cic_link()
Acked-By: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As per Guennadi Liakhovetski, the mac address change support code breaks
some normal uses (_without_ any address changes), and until it's all
sorted out, we're better off without it.
Says Francois:
"Go revert it.
Despite what I claimed, I can not find a third-party confirmation by
email that it works elsewhere.
It would probably be enough to remove the call to
__rtl8169_set_mac_addr() in rtl8169_hw_start() though."
Linus Torvalds [Mon, 30 Oct 2006 01:25:48 +0000 (17:25 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3914/1: [Jornada7xx] - Typo Fix in cpu-sa1110.c (b != B)
[ARM] 3913/1: n2100: fix IRQ routing for second ethernet port
[ARM] Add KBUILD_IMAGE target support
[ARM] Fix suspend oops caused by PXA2xx PCMCIA driver
[ARM] Fix i2c-pxa slave mode support
[ARM] 3900/1: Fix VFP Division by Zero exception handling.
[ARM] 3899/1: Fix the normalization of the denormal double precision number.
[ARM] 3909/1: Disable UWIND_INFO for ARM (again)
[ARM] Add __must_check to uaccess functions
[ARM] Add realview SMP default configuration
[ARM] Fix SMP irqflags support
Oleg Nesterov [Sun, 29 Oct 2006 15:57:16 +0000 (18:57 +0300)]
[PATCH] taskstats: fix sk_buff size calculation
prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()),
but then it does genlmsg_put()->nlmsg_put(). This means we forget to
reserve a room for 'struct nlmsghdr'.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sun, 29 Oct 2006 13:45:58 +0000 (16:45 +0300)]
[PATCH] taskstats: fix sk_buff leak
'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send
potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer
we own sk_buff. This means we should always do kfree_skb() on failure.
[ Thomas acked and pointed out missing return value in original version ]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stefan Richter [Sun, 29 Oct 2006 18:52:49 +0000 (19:52 +0100)]
ieee1394: ohci1394: revert fail on error in suspend
Some errors during preparation for suspended state can be skipped with a
warning instead of a failure of the whole suspend transition, notably an
error in pci_set_power_state.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
[ARM] 3913/1: n2100: fix IRQ routing for second ethernet port
The second ethernet port on the Thecus n2100 was incorrectly assigned
to XINT1 instead of the correct XINT3 (PCI INTB instead of INTD), which
caused that port to be non-functional.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As reported by Martin J. Bligh <mbligh@google.com>, we let through some
non-slab bits to slab allocation through __get_vm_area_node when doing a
vmalloc.
I haven't been able to reproduce this, although I understand why it
happens: vmalloc allocates memory with
GFP_KERNEL | __GFP_HIGHMEM
and commit 52fd24ca1db3a741f144bbc229beefe044202cac resulted in the same
flags are passed down to cache_alloc_refill, causing the BUG. The
following patch fixes it.
Note that when calling kmalloc_node, I am masking off __GFP_HIGHMEM with
GFP_LEVEL_MASK, whereas __vmalloc_area_node does the same with
~(__GFP_HIGHMEM | __GFP_ZERO).
IMHO, using GFP_LEVEL_MASK is preferable, but either should fix this
problem.
Signed-off-by: Giridhar Pemmasani (pgiri@yahoo.com) Cc: Martin J. Bligh <mbligh@google.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Russell King [Sun, 29 Oct 2006 12:51:05 +0000 (12:51 +0000)]
[ARM] Add KBUILD_IMAGE target support
Add support for KBUILD_IMAGE on ARM. This takes the usual target
specifiers (zImage/Image/etc) in the same way that powerpc does
(iow, without the arch/arm/boot prefix).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 28 Oct 2006 21:42:56 +0000 (22:42 +0100)]
[ARM] Fix suspend oops caused by PXA2xx PCMCIA driver
The PXA2xx PCMCIA driver was registering a device_driver with the
platform_bus_type. Unfortunately, this causes data outside the
device_driver structure to be dereferenced as if it were a
platform_driver structure, causing an oops. Convert the PXA2xx
core driver to use the proper platform_driver structure.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Sat, 28 Oct 2006 18:38:39 +0000 (11:38 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] cio: Make ccw_device_register() static.
[S390] Improve AP bus device removal.
[S390] uaccess error handling.
[S390] cio: css_probe_device() must be called enabled.
[S390] Initialize interval value to 0.
[S390] sys_getcpu compat wrapper.
Mel Gorman [Sat, 28 Oct 2006 17:38:59 +0000 (10:38 -0700)]
[PATCH] Calculation fix for memory holes beyong the end of physical memory
absent_pages_in_range() made the assumption that users of the
arch-independent zone-sizing API would not care about holes beyound the end
of physical memory. This was not the case and was "fixed" in a patch
called "Account for holes that are outside the range of physical memory".
However, when given a range that started before a hole in "real" memory and
ended beyond the end of memory, it would get the result wrong. The bug is
in mainline but a patch is below.
It has been tested successfully on a number of machines and architectures.
Additional credit to Keith Mannthey for discovering the problem, helping
identify the correct fix and confirming it Worked For Him.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: keith mannthey <kmannth@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Stern [Sat, 28 Oct 2006 17:38:58 +0000 (10:38 -0700)]
[PATCH] workqueue: update kerneldoc
This patch (as812) changes the kerneldoc comments explaining the return
values from queue_work(), queue_delayed_work(), and
queue_delayed_work_on(). The updated comments explain more accurately the
meaning of the return code and avoid suggesting that a 0 value means the
routine was unsuccessful.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jim Houston [Sat, 28 Oct 2006 17:38:56 +0000 (10:38 -0700)]
[PATCH] time_adjust cleared before use
I notice that the code which implements adjtime clears the time_adjust
value before using it. The attached patch makes the obvious fix.
Acked-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Jim Houston <jim.houston@ccur.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Sat, 28 Oct 2006 17:38:55 +0000 (10:38 -0700)]
[PATCH] move SYS_HYPERVISOR inside the Generic Driver menu
Put SYS_HYPERVISOR inside the Generic Driver Config menu where it should
be. Otherwise xconfig displays it as a dangling (lost) menu item under
Device Drivers, all by itself (when all options are displayed).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: <holzheu@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sat, 28 Oct 2006 17:38:54 +0000 (10:38 -0700)]
[PATCH] fill_tgid: cleanup delays accounting
fill_tgid() should skip not only an already exited group leader. If the
task has ->exit_state != 0 it already did exit_notify(), so it also did
fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it
to avoid a double accounting.
This patch doesn't close the race completely, but it cleanups the code.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sat, 28 Oct 2006 17:38:54 +0000 (10:38 -0700)]
[PATCH] taskstats: don't use tasklist_lock
Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe.
->siglock allows us to traverse subthread without tasklist.
Q: delay accounting looks wrong to me. If sub-thread has already called
taskstats_exit_send() but didn't call release_task(self) yet it will be
accounted twice. The window is big. No?
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sat, 28 Oct 2006 17:38:53 +0000 (10:38 -0700)]
[PATCH] taskstats: kill ->taskstats_lock in favor of ->siglock
signal_struct is (mostly) protected by ->sighand->siglock, I think we don't
need ->taskstats_lock to protect ->stats. This also allows us to simplify the
locking in fill_tgid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sat, 28 Oct 2006 17:38:51 +0000 (10:38 -0700)]
[PATCH] taskstats_tgid_free: fix usage
taskstats_tgid_free() is called on copy_process's error path. This is wrong.
IF (clone_flags & CLONE_THREAD)
We should not clear ->signal->taskstats, current uses it,
it probably has a valid accumulated info.
ELSE
taskstats_tgid_init() set ->signal->taskstats = NULL,
there is nothing to free.
Move the callsite to __exit_signal(). We don't need any locking, entire
thread group is exiting, nobody should have a reference to soon to be
released ->signal.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possible that current != tsk. Probably it was supposed
to be 'tsk->group_leader->start_time. But why we are reading
group_leader's start_time ? This accounting is per thread,
not per procees, I changed this to 'tsk->start_time.
Please corect me.
tsk->parent never == NULL, and it is unsafe to dereference it.
Both the task and it's parent may exit after the caller unlocks
tasklist_lock, the memory could be unmapped (DEBUG_SLAB).
(And we should use ->real_parent->tgid in fact).
Q: I don't understand the 'if (thread_group_leader(tsk))' check.
Why it is needed ?
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Holzheu [Sat, 28 Oct 2006 17:38:47 +0000 (10:38 -0700)]
[PATCH] strstrip remove last blank fix
strstrip() does not remove the last blank from strings which only consist
of blanks.
Example:
char string[] = " ";
strstrip(string);
results in " ", but should produce an empty string!
The following patch solves this problem:
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by Joern Engel <joern@wh.fh-wedel.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Rothwell [Sat, 28 Oct 2006 17:38:46 +0000 (10:38 -0700)]
[PATCH] Constify compat_get_bitmap argument
This means we can call it when the bitmap we want to fetch is declared
const.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sat, 28 Oct 2006 17:38:46 +0000 (10:38 -0700)]
[PATCH] VFS: Fix an error in unused dentry counting
With Vasily Averin <vvs@sw.ru>
Fix an error in unused dentry counting in shrink_dcache_for_umount_subtree()
in which the count is modified without the dcache_lock held.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Vasily Averin <vvs@sw.ru> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# If prune_dcache finds a dentry that it cannot free, it leaves it where it
# is (at the tail of the list) and exits, on the assumption that some other
# thread will be removing that dentry soon.
However as far as I see this comment is not correct: when we cannot take
s_umount rw_semaphore (for example because it was taken in do_remount) this
dentry is already extracted from dentry_unused list and we do not add it
into the list again. Therefore dentry will not be found by prune_dcache()
and shrink_dcache_sb() and will leave in memory very long time until the
partition will be unmounted.
The patch adds this dentry into tail of the dentry_unused list.
Signed-off-by: Vasily Averin <vvs@sw.ru> Cc: Neil Brown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 28 Oct 2006 17:38:43 +0000 (10:38 -0700)]
[PATCH] hugetlb: fix absurd HugePages_Rsvd
If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative". Reinstate my
preliminary i_size check before attempting to allocate the page (though this
only fixes the most obvious case: more work will be needed here).
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 28 Oct 2006 17:38:43 +0000 (10:38 -0700)]
[PATCH] hugetlb: fix prio_tree unit
hugetlb_vmtruncate_list was misconverted to prio_tree: its prio_tree is in
units of PAGE_SIZE (PAGE_CACHE_SIZE) like any other, not HPAGE_SIZE (whereas
its radix_tree is kept in units of HPAGE_SIZE, otherwise slots would be
absurdly sparse).
At first I thought the error benign, just calling __unmap_hugepage_range on
more vmas than necessary; but on 32-bit machines, when the prio_tree is
searched correctly, it happens to ensure the v_offset calculation won't
overflow. As it stood, when truncating at or beyond 4GB, it was liable to
discard pages COWed from lower offsets; or even to clear pmd entries of
preceding vmas, triggering exit_mmap's BUG_ON(nr_ptes).
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 28 Oct 2006 17:38:41 +0000 (10:38 -0700)]
[PATCH] hugetlb: fix size=4G parsing
On 32-bit machines, mount -t hugetlbfs -o size=4G gave a 0GB filesystem,
size=5G gave a 1GB filesystem etc: there's no point in masking size with
HPAGE_MASK just before shifting its lower bits away, and since HPAGE_MASK is a
UL, that removed all the higher bits of the unsigned long long size.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Sat, 28 Oct 2006 17:38:40 +0000 (10:38 -0700)]
[PATCH] cciss: fix printk format warning
Fix printk format warnings:
drivers/block/cciss.c:2000: warning: long long int format, long unsigned int arg (arg 2)
drivers/block/cciss.c:2035: warning: long long int format, long unsigned int arg (arg 2)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] __vmalloc with GFP_ATOMIC causes 'sleeping from invalid context'
If __vmalloc is called to allocate memory with GFP_ATOMIC in atomic
context, the chain of calls results in __get_vm_area_node allocating memory
for vm_struct with GFP_KERNEL, causing the 'sleeping from invalid context'
warning. This patch fixes it by passing the gfp flags along so
__get_vm_area_node allocates memory for vm_struct with the same flags.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pavel Emelianov [Sat, 28 Oct 2006 17:38:33 +0000 (10:38 -0700)]
[PATCH] Fix potential OOPs in blkdev_open()
blkdev_open() calls bc_acquire() to get a struct block_device. Since
bc_acquire() may return NULL when system is out of memory an appropriate
check is required.
Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yasunori Goto [Sat, 28 Oct 2006 17:38:32 +0000 (10:38 -0700)]
[PATCH] memory hotplug: __GFP_NOWARN is better for __kmalloc_section_memmap()
Add __GFP_NOWARN flag to calling of __alloc_pages() in
__kmalloc_section_memmap(). It can reduce noisy failure message.
In ia64, section size is 1 GB, this means that order 8 pages are necessary
for each section's memmap. It is often very hard requirement under heavy
memory pressure as you know. So, __alloc_pages() gives up allocation and
shows many noisy stack traces which means no page for each sections.
(Current my environment shows 32 times of stack trace....)
But, __kmalloc_section_memmap() calls vmalloc() after failure of it, and it
can succeed allocation of memmap. So, its stack trace warning becomes just
noisy. I suppose it shouldn't be shown.
Randy Dunlap [Sat, 28 Oct 2006 17:38:32 +0000 (10:38 -0700)]
[PATCH] md: fix printk format warnings, seen on powerpc64:
drivers/md/raid1.c:1479: warning: long long unsigned int format, long unsigned int arg (arg 4)
drivers/md/raid10.c:1475: warning: long long unsigned int format, long unsigned int arg (arg 4)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Sat, 28 Oct 2006 17:38:31 +0000 (10:38 -0700)]
[PATCH] md: fix up maintenance of ->degraded in multipath
A recent fix which made sure ->degraded was initialised properly exposed a
second bug - ->degraded wasn't been updated when drives failed or were
hot-added.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Sat, 28 Oct 2006 17:38:30 +0000 (10:38 -0700)]
[PATCH] md: simplify checking of available size when resizing an array
When "mdadm --grow --size=xxx" is used to resize an array (use more or less of
each device), we check the new siza against the available space in each
device.
We already have that number recorded in rdev->size, so calculating it is
pointless (and wrong in one obscure case).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bibo,mao [Sat, 28 Oct 2006 17:38:29 +0000 (10:38 -0700)]
[PATCH] fix efi_memory_present_wrapper()
efi_memory_present_wrapper() parameter start/end is physical address, but
function memory_present parameter is PFN, this patch converts physical
address to PFN.
Signed-off-by: bibo, mao <bibo.mao@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Sandeen [Sat, 28 Oct 2006 17:38:28 +0000 (10:38 -0700)]
[PATCH] jbd2: journal_dirty_data re-check for unmapped buffers
When running several fsx's and other filesystem stress tests, we found
cases where an unmapped buffer was still being sent to submit_bh by the
ext3 dirty data journaling code.
I saw this happen in two ways, both related to another thread doing a
truncate which would unmap the buffer in question.
Either we would get into journal_dirty_data with a bh which was already
unmapped (although journal_dirty_data_fn had checked for this earlier, the
state was not locked at that point), or it would get unmapped in the middle
of journal_dirty_data when we dropped locks to call sync_dirty_buffer.
By re-checking for mapped state after we've acquired the bh state lock, we
should avoid these races. If we find a buffer which is no longer mapped,
we essentially ignore it, because journal_unmap_buffer has already decided
that this buffer can go away.
I've also added tracepoints in these two cases, and made a couple other
tracepoint changes that I found useful in debugging this.
Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Sandeen [Sat, 28 Oct 2006 17:38:27 +0000 (10:38 -0700)]
[PATCH] jbd: journal_dirty_data re-check for unmapped buffers
When running several fsx's and other filesystem stress tests, we found
cases where an unmapped buffer was still being sent to submit_bh by the
ext3 dirty data journaling code.
I saw this happen in two ways, both related to another thread doing a
truncate which would unmap the buffer in question.
Either we would get into journal_dirty_data with a bh which was already
unmapped (although journal_dirty_data_fn had checked for this earlier, the
state was not locked at that point), or it would get unmapped in the middle
of journal_dirty_data when we dropped locks to call sync_dirty_buffer.
By re-checking for mapped state after we've acquired the bh state lock, we
should avoid these races. If we find a buffer which is no longer mapped,
we essentially ignore it, because journal_unmap_buffer has already decided
that this buffer can go away.
I've also added tracepoints in these two cases, and made a couple other
tracepoint changes that I found useful in debugging this.
Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Sat, 28 Oct 2006 17:38:26 +0000 (10:38 -0700)]
[PATCH] ext4: fix printk format warnings
fs/ext4/resize.c:72: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:76: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:81: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:85: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:89: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:89: warning: long long unsigned int format, __u64 arg (arg 5)
fs/ext4/resize.c:93: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:93: warning: long long unsigned int format, __u64 arg (arg 5)
fs/ext4/resize.c:98: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:103: warning: long long unsigned int format, __u64 arg (arg 4)
fs/ext4/resize.c:109: warning: long long unsigned int format, __u64 arg (arg 4)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Martin Bligh [Sat, 28 Oct 2006 17:38:25 +0000 (10:38 -0700)]
[PATCH] Use min of two prio settings in calculating distress for reclaim
If try_to_free_pages / balance_pgdat are called with a gfp_mask specifying
GFP_IO and/or GFP_FS, they will reclaim the requisite number of pages, and the
reset prev_priority to DEF_PRIORITY (or to some other high (ie: unurgent)
value).
However, another reclaimer without those gfp_mask flags set (say, GFP_NOIO)
may still be struggling to reclaim pages. The concurrent overwrite of
zone->prev_priority will cause this GFP_NOIO thread to unexpectedly cease
deactivating mapped pages, thus causing reclaim difficulties.
Fix this is to key the distress calculation not off zone->prev_priority, but
also take into account the local caller's priority by using
min(zone->prev_priority, sc->priority)
Signed-off-by: Martin J. Bligh <mbligh@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Martin Bligh [Sat, 28 Oct 2006 17:38:24 +0000 (10:38 -0700)]
[PATCH] vmscan: Fix temp_priority race
The temp_priority field in zone is racy, as we can walk through a reclaim
path, and just before we copy it into prev_priority, it can be overwritten
(say with DEF_PRIORITY) by another reclaimer.
The same bug is contained in both try_to_free_pages and balance_pgdat, but
it is fixed slightly differently. In balance_pgdat, we keep a separate
priority record per zone in a local array. In try_to_free_pages there is
no need to do this, as the priority level is the same for all zones that we
reclaim from.
Impact of this bug is that temp_priority is copied into prev_priority, and
setting this artificially high causes reclaimers to set distress
artificially low. They then fail to reclaim mapped pages, when they are,
in fact, under severe memory pressure (their priority may be as low as 0).
This causes the OOM killer to fire incorrectly.
From: Andrew Morton <akpm@osdl.org>
__zone_reclaim() isn't modifying zone->prev_priority. But zone->prev_priority
is used in the decision whether or not to bring mapped pages onto the inactive
list. Hence there's a risk here that __zone_reclaim() will fail because
zone->prev_priority ir large (ie: low urgency) and lots of mapped pages end up
stuck on the active list.
Fix that up by decreasing (ie making more urgent) zone->prev_priority as
__zone_reclaim() scans the zone's pages.
This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created. It should be
possible to remove that now, and to just start out at DEF_PRIORITY?
Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Russell King [Wed, 25 Oct 2006 12:59:16 +0000 (13:59 +0100)]
[ARM] Fix SMP irqflags support
The IRQ changes a while back broke the build for SMP machines.
Fix up the SMP code to use set_irq_regs/get_irq_regs as
appropriate. Also, fix a warning in arch/arm/kernel/time.c
where 'regs' becomes unused for SMP builds.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
PCI: Remove quirk_via_abnormal_poweroff
PCI: reset pci device state to unknown state for resume
PCI: x86-64: mmconfig missing printk levels
PCI: fix pci_fixup_video as it blows up on sparc64
acpiphp: fix latch status