Pekka Enberg [Thu, 30 Jun 2005 09:59:05 +0000 (02:59 -0700)]
[PATCH] freevxfs: minor cleanups
This patch addresses the following minor issues:
- Typo in printk
- Redundant casts
- Use C99 struct initializers instead of memset
- Parenthesis around return value
- Use inline instead of __inline__
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Lan [Thu, 30 Jun 2005 09:59:03 +0000 (02:59 -0700)]
[PATCH] Improper initrd failure message at boot time
On system boot up, there was an failure reported to boot.msg:
<5>Trying to move old root to /initrd ... failed
According to initrd(4) man page, step #7 of BOOT-UP OPERATION
is described as below:
7. If the normal root file has directory /initrd, device
/dev/ram0 is moved from / to /initrd. Otherwise if
directory /initrd does not exist device /dev/ram0 is
unmounted.
We got service calls from customers concerning about this failure message
at boot time. Many systems do not have /initrd and thus the message can be
changed in the case of non-existing /initrd so that it does not sound like
a failure of the system.
Signed-off-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Zankel [Thu, 30 Jun 2005 09:58:59 +0000 (02:58 -0700)]
[PATCH] xtensa: Removed local copy of zlib and fixed O= support
Removed an unnecessary local copy of zlib (sorry for the add'l traffic).
Fixed 'O=' support (thanks to Jan Dittmer for pointing it out). Some minor
clean-ups in the make files.
Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Paris [Thu, 30 Jun 2005 09:58:51 +0000 (02:58 -0700)]
[PATCH] selinux_sb_copy_data() should not require a whole page
Currently selinux_sb_copy_data requires an entire page be allocated to
*orig when the function is called. This "requirement" is based on the fact
that we call copy_page(in_save, nosec_save) and in_save = orig when the
data is not FS_BINARY_MOUNTDATA. This means that if a caller were to call
do_kern_mount with only about 10 bytes of options, they would get passed
here and then we would corrupt PAGE_SIZE - 10 bytes of memory (with all
zeros.)
Currently it appears all in kernel FS's use one page of data so this has
not been a problem. An out of kernel FS did just what is described above
and it would almost always panic shortly after they tried to mount. From
looking else where in the kernel it is obvious that this string of data
must always be null terminated. (See example in do_mount where it always
zeros the last byte.) Thus I suggest we use strcpy in place of copy_page.
In this way we make sure the amount we copy is always less than or equal to
the amount we received and since do_mount is zeroing the last byte this
should be safe for all.
Signed-off-by: Eric Paris <eparis@parisplace.org> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Acked-by: James Morris <jmorris@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kylene Jo Hall [Thu, 30 Jun 2005 09:58:50 +0000 (02:58 -0700)]
[PATCH] tpm: fix bug introduced by the /proc/misc
In fixing the /proc/misc problem that was reported last week where the tpm
module name was being obfuscated in /proc/misc I introduced a bug in the
module unloading code. This patch fixes the problem.
Signed-off-by: Kylene Hall <kjhall@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Wed, 29 Jun 2005 22:53:06 +0000 (18:53 -0400)]
[PATCH] reiserfs: enable attrs by default if saf
The following patch enables attrs by default if the reiserfs_attrs_cleared
bit is set in the superblock. This allows chattr-type attrs to be used
without any further action by the user.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Wed, 29 Jun 2005 22:52:28 +0000 (18:52 -0400)]
[PATCH] reiserfs: Check if attrs are enabled for attr ioctls
ReiserFS currently will allow the user to set/get attrs for files
regardless if they are enabled. The patch checks to see if they are
enabled, and returns -NOTTY if they are not.
ext[23] doesn't need this check because attrs are always enabled.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tony Lindgren [Wed, 29 Jun 2005 18:59:48 +0000 (19:59 +0100)]
[PATCH] ARM: 2771/1: Dynamic Tick support for OMAP, take 4
Patch from Tony Lindgren
This patch adds support for Dynamic Tick Timer for OMAP.
This patch is an updated version of ARM patch 2642/1 to
make it work with the already integrated generic ARM
dyntick support.
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 29 Jun 2005 17:45:19 +0000 (18:45 +0100)]
[PATCH] Serial: Split 8250 port table (part 2)
Remove legacy ISA serial ports for Accent, Boca, Fourport, Hub6 and MCA
from the architecture specific serial.h include.
The only ports which remain in asm-*/serial.h are the platform specific
entries. These should really be converted by platform maintainers to
use a platform device, such as can be found in
arch/arm/mach-footbridge/isa.c
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 29 Jun 2005 17:41:51 +0000 (18:41 +0100)]
[PATCH] Serial: Disable OX950 transmitter for flow control
Disable the transmitter whenever we want to prevent characters
being transmitted by flow control. However, if we run out of
characters to send and want to only disable the TX interrupt,
allow that scenario.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Wed, 29 Jun 2005 17:10:54 +0000 (18:10 +0100)]
[PATCH] ARM: 2723/2: remove __udivdi3 and __umoddi3 from the kernel
Patch from Nicolas Pitre
Those are big, slow and generally not recommended for kernel code.
They are even not present on i386. So it should be concluded that
one could as well get away with do_div() alone.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Prakash Punnoor [Wed, 29 Jun 2005 12:13:54 +0000 (14:13 +0200)]
[PATCH] Don't fill up log with atxp1 vcore messages change message
I am using the atxp1 module to change vcore on my NForce2 via userspace
daemon (see punnoor.de).
Currently the atxp1 module will write to the log on every vcore change,
thus filling up my log - which I don't want. I am no kernel coder, but
I guess, this one-liner will change this behaviour in a wanted way, ie
output will be made for debug purposes only.
Hugh Dickins [Wed, 29 Jun 2005 14:15:40 +0000 (15:15 +0100)]
[PATCH] Fix get_request nastiness
get_request is now expected to be holding on to queue_lock, with interrupts
disabled, when it returns NULL; but one path forgot that, causing all kinds
of nastiness under swap load - badness backtraces, strange failures, BUGs.
Catalin Marinas [Wed, 29 Jun 2005 14:34:39 +0000 (15:34 +0100)]
[PATCH] ARM: 2769/1: cpu_init() stack setup fix
Patch from Catalin Marinas
The compiler allocates r14 for the stk variable in the __asm__ directive.
This is a shadowed register and gets changed when the mode is changed,
causing random values in the SP register. The patch adds a clobber for
the r14 register.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 29 Jun 2005 14:15:54 +0000 (15:15 +0100)]
[PATCH] ARM: Convert ARM timer implementations to use readl/writel
Convert ARMs timer implementations to use readl/writel instead of accessing
the registers via a struct.
People have recently asked if accessing timers via a structure is the
"right way" and its not the Linux way. So fix this code to conform to
"The Linux Way"(tm).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Ben Dooks [Wed, 29 Jun 2005 10:09:15 +0000 (11:09 +0100)]
[PATCH] ARM: 2764/1: S3C24XX - Common PM functions for Simtec boards
Patch from Ben Dooks
All current S3C24XX implementations from Simtec share the same
requirements for suspend/resume information.
This patch moves the save code out of the mach-bast.c file,
and into it's own so it can be shared by all the current
Simtec S3C24XX implementations.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 29 Jun 2005 08:42:38 +0000 (09:42 +0100)]
[PATCH] Serial: Adjust serial locking
This patch changes the way serial ports are locked when getting modem
status. This change is necessary because we will need to atomically
read the modem status and take action depending on the CTS status.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Something reverted most of the arch/sparc/Kconfig changes, leaving
arch/sparc/ unconfigurable. This patch re-removes the parts made redundant
by drivers/Kconfig in addition to a mysterious, spurious second instance of
source "mm/Kconfig". cvs strikes again?
Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- uv_ratio : allow a ratio of saturation between u and v. If you
have a ratio of 40 and a saturation of 100, usat will be 80 and
vstat 120. Useful to correct a bad color balance.
- full_luma_range : provide a better contrast in using the full
range 0-253 of values instead of 16-253.
- coring : to have a better black level.
- radio range is now defined on tuner-core.c. Cleaning up.
*tuner-core.c:
- some tuner_info msgs will be generated only if insmod opt
tuner_debug enabled.
- Implemented tuner-core support for VIDIO_S_TUNER to allow
changing mono/stereo mode
- Remove unneeded config options.
- I2C_CLIENT_MULTI option removed.
- support for Philips FMD12ME hybrid tuner
- allow to initialize with another tuner
- Move PHILIPS_FMD initialization code to set_type function,
* tda8290:
- Fix dumb error in tda8290 tunning.
- Radio tuner uses high-precision step instead of 62.5 KHz.
*tea5767.c:
- tuner_info msgs will be generated only if insmod tuner option
tuner_debug enabled.
- some cleanups for better reading.
- Radio tuner uses high-precision step instead of 62.5 KHz.
- Changing radio mode stereo/mono for tea5767 working.
*tuner-simple.c:
- TNF9533-D/IF UHF fixup.
- Radio tuners now uses high-precision step instead of 62.5 KHz.
*mt20xx.c:
- Radio tuner uses high-precision step instead of 62.5 KHz.
*tda9887.c:
- tab and blank spaces corrections.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br> Signed-off-by: Gerd Knorr <kraxel@bytesex.org> Signed-off-by: Nickolay V Shmyrev <nshmyrev@yandex.ru> Signed-off-by: Hartmut Hackmann <hartmut.hackmann@t-online.de> Signed-off-by: Michael Krufky <mkrufky@m1k.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Wed, 29 Jun 2005 03:45:18 +0000 (20:45 -0700)]
[PATCH] irqpoll
Anyone reporting a stuck IRQ should try these options. Its effectiveness
varies we've found in the Fedora case. Quite a few systems with misdescribed
IRQ routing just work when you use irqpoll. It also fixes up the VIA systems
although thats now fixed with the VIA quirk (which we could just make default
as its what Redmond OS does but Linus didn't like it historically).
A small number of systems have jammed IRQ sources or misdescribes that cause
an IRQ that we have no handler registered anywhere for. In those cases it
doesn't help.
Signed-off-by: Alan Cox <number6@the-village.bc.nu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently in ext3 block reservation code, the global filesystem reservation
tree lock (rsv_block) is hold during the process of searching for a space
to make a new reservation window, including while scaning the block bitmap
to verify if the avalible window has a free block. Holding the lock during
bitmap scan is unnecessary and could possibly cause scalability issue and
latency issues.
This patch tries to address this by dropping the lock before scan the
bitmap. Before that we need to reserve the open window in case someone
else is targetting at the same window. Question was should we reserve the
whole free reservable space or just the window size we need. Reserve the
whole free reservable space will possibly force other threads which
intended to do block allocation nearby move to another block group(cause
bad layout). In this patch, we just reserve the desired size before drop
the lock and scan the block bitmap. This patch fixed a ext3 reservation
latency issue seen on a cvs check out test. Patch is tested with many fsx,
tiobench, dbench and untar a kernel test.
Signed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Wed, 29 Jun 2005 03:45:15 +0000 (20:45 -0700)]
[PATCH] blk: light iocontext ops
get_io_context needlessly turned off interrupts and checked for racing io
context creations. Both of which aren't needed, because the io context can
only be created while in process context of the current process.
Also, split the function in 2. A light version, current_io_context does not
elevate the reference count specifically, but can be used when in process
context, because the process holds a reference itself.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Wed, 29 Jun 2005 03:45:13 +0000 (20:45 -0700)]
[PATCH] blk: __make_request efficiency
In the case where the request is not able to be merged by the elevator, don't
retake the lock and retry the merge mechanism after allocating a new request.
Instead assume that the chance of a merge remains slim, and now that we've
done most of the work allocating a request we may as well just go with it.
Also be rid of the GFP_ATOMIC allocation: we've got working mempools for the
block layer now, so let's save atomic memory for things like networking.
Lastly, in get_request_wait, do an initial get_request call before going into
the waitqueue. This is reported to help efficiency.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The return value of "match_int" is checked 27 out of 28 times
In lib/parser.c
142 /**
143 * match_int: - scan a decimal representation of an integer from a substring_t
144 * @s: substring_t to be scanned
145 * @result: resulting integer on success
146 *
147 * Description: Attempts to parse the &substring_t @s as a decimal integer. On
148 * success, sets @result to the integer represented by the string and returns 0.
149 * Returns either -ENOMEM or -EINVAL on failure.
150 */
151 int match_int(substring_t *s, int *result)
152 {
153 return match_number(s, result, 0);
154 }
GOTO Masanori [Wed, 29 Jun 2005 03:45:05 +0000 (20:45 -0700)]
[PATCH] headers: include linux/types.h for usb_ch9.h
This patch for usb_ch9.h includes linux/types.h instead of asm/types.h so that
__le16 and so on is explicitly defined. It also cleans up non standard //
comment.
GOTO Masanori [Wed, 29 Jun 2005 03:45:03 +0000 (20:45 -0700)]
[PATCH] headers: enable ppc64 ___arch__swab16 and ___arch__swab32
This patch cleans up asm-ppc64/byteorder.h to enable ___arch__swab16 and
___arch__swab32 which are marked TODO currently. It removes ___arch__swab64
because ppc64 does not have short instruction combinations for swab64, the
recent gcc generates enough smart code that is equivalent to hand assembled
code under my tests.
Signed-off-by: GOTO Masanori <gotom@debian.or.jp> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sébastien Dugu [Wed, 29 Jun 2005 03:44:59 +0000 (20:44 -0700)]
[PATCH] aio-retry-fix: fix aio retry work queueing
In the case of buffered AIO, in the aio retry path (aio_run_iocb), when the
retry method returns EIOCBRETRY the kicked iocb is added to the context run
list but is never queued onto the work queue. The request therefore is
never completed.
This patch fixes that by adding the appropriate call to aio_queue_work in
aio_run_aiocb so that subsequent retries will be handled by the aio worker
thread.
Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Acked-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like it sneaked back with the NFS ACL merge..
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Wed, 29 Jun 2005 03:44:54 +0000 (20:44 -0700)]
[PATCH] swabb.h warning fixes
In file included from drivers/media/dvb/ttpci/av7110_hw.c:38:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_v4l.c:36:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_av.c:37:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
drivers/isdn/icn/icn.c:719:4: warning: #warning TODO test headroom or use skb->nb to flag ACK
In file included from drivers/media/dvb/ttpci/av7110_ca.c:39:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110.c:41:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
Does declaring a function to return a const value actually mean something to
gcc?
Dunno. Kill it and replace sone `__inline__'s with `inline' too.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Wed, 29 Jun 2005 03:44:53 +0000 (20:44 -0700)]
[PATCH] hisax warning fixes
drivers/isdn/hisax/hfc4s8s_l1.c:317: warning: type qualifiers ignored on function return type
drivers/isdn/hisax/hfc4s8s_l1.c:329: warning: type qualifiers ignored on function return type
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid race occurs when some process have open file descriptor for class
device attributes and already firmware allocated memory are freed. Don't
allow negative loading timeout.
Signed-off-by: Stanislaw W. Gruszka <stf_xl@wp.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 29 Jun 2005 03:44:47 +0000 (20:44 -0700)]
[PATCH] ITIMER_REAL: fix possible deadlock and race
As Steven Rostedt pointed out, there are 2 problems with ITIMER_REAL
timers.
1. do_setitimer() does not call del_timer_sync() in case
when the timer is not pending (it_real_value() returns 0).
This is wrong, the timer may still be running, and it can
rearm itself.
2. It calls del_timer_sync() with tsk->sighand->siglock held.
This is deadlockable, because timer's handler needs this
lock too.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patrick McHardy [Tue, 28 Jun 2005 23:04:44 +0000 (16:04 -0700)]
[NETFILTER]: Fix connection tracking bug in 2.6.12
In 2.6.12 we started dropping the conntrack reference when a packet
leaves the IP layer. This broke connection tracking on a bridge,
because bridge-netfilter defers calling some NF_IP_* hooks to the bridge
layer for locally generated packets going out a bridge, where the
conntrack reference is no longer available. This patch keeps the
reference in this case as a temporary solution, long term we will
remove the defered hook calling. No attempt is made to drop the
reference in the bridge-code when it is no longer needed, tc actions
could already have sent the packet anywhere.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 28 Jun 2005 22:58:50 +0000 (15:58 -0700)]
[NET]: Add missing include to linux/netdevice.h
linux/etherdevice.h can't be included standalone at the moment, which
is required in order to sort the header files in the recommended
alphabetic order. This patch fixes that and is needed to build spider_net.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 28 Jun 2005 22:40:02 +0000 (15:40 -0700)]
[IPVS]: Close race conditions on ip_vs_conn_tab list modification
In an smp system, it is possible for an connection timer to expire, calling
ip_vs_conn_expire while the connection table is being flushed, before
ct_write_lock_bh is acquired.
Since the list iterator loop in ip_vs_con_flush releases and re-acquires the
spinlock (even though it doesn't re-enable softirqs), it is possible for the
expiration function to modify the connection list, while it is being traversed
in ip_vs_conn_flush.
The result is that the next pointer gets set to NULL, and subsequently
dereferenced, resulting in an oops.
Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: JulianAnastasov Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 28 Jun 2005 22:25:31 +0000 (15:25 -0700)]
[NET]: Remove gratuitous use of skb->tail in network drivers.
Many drivers use skb->tail unnecessarily.
In these situations, the code roughly looks like:
dev = dev_alloc_skb(...);
[optional] skb_reserve(skb, ...);
... skb->tail ...
But even if the skb_reserve() happens, skb->data equals
skb->tail. So it doesn't make any sense to use anything
other than skb->data in these cases.
Another case was the s2io.c driver directly mucking with
the skb->data and skb->tail pointers. It really just wanted
to do an skb_reserve(), so that's what the code was changed
to do instead.
Another reason I'm making this change as it allows some SKB
cleanups I have planned simpler to merge. In those cleanups,
skb->head, skb->tail, and skb->end pointers are removed, and
replaced with skb->head_room and skb->tail_room integers.
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jgarzik@pobox.com>
Ingo Molnar [Tue, 28 Jun 2005 14:40:42 +0000 (16:40 +0200)]
[PATCH] Tweak idle thread setup semantics
This patch tweaks idle thread setup semantics a bit: instead of setting
NEED_RESCHED in init_idle(), we do an explicit schedule() before calling
into cpu_idle().
This patch, while having no negative side-effects, enables wider use of
cond_resched()s. (which might happen in the stock kernel too, but it's
particulary important for voluntary-preempt)
Currently we cap request allocations at q->nr_requests, but we allow a
batching io context to allocate up to 32 more (default setting). This
can flood the queue with request allocations, with only a few batching
processes. The real fix would be to limit the number of batchers, but
as that isn't currently tracked, I suggest we just cap the maximum
number of allocated requests to eg 50% over the limit.
This was observed in real life, users typically see this as vmstat bo
numbers going off the wall with seconds of no queueing afterwards.
Behaviour this bursty is not beneficial.