sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly
On my tulsa x86-64 machine, kernel 2.6.25-rc5 couldn't boot randomly.
Basically, function __enable_runtime forgets to reset rt_rq->rt_throttled
to 0. When every cpu is up, per-cpu migration_thread is created and it runs
very fast, sometimes to mark the corresponding rt_rq->rt_throttled to 1 very
quickly. After all cpus are up, with below calling chain:
_enable_runtime is called against every rt_rq again, so rt_rq->rt_time is
reset to 0, but rt_rq->rt_throttled might be still 1. Later on function
do_sched_rt_period_timer couldn't reset it, and all RT tasks couldn't be
scheduled to run on that cpu. here is RT task migration_thread which is
woken up when a task is migrated to another cpu.
Max Krasnyansky [Fri, 29 Aug 2008 20:11:41 +0000 (13:11 -0700)]
sched: arch_reinit_sched_domains() must destroy domains to force rebuild
What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.
What this means is that doing
echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take affect untill something changes
in the cpuset setup (ie new sets created or deleted).
This patch fixes restore correct behaviour where domains must be rebuilt in
order to enable MC powersaving flags.
Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on dual-core Core2 laptop. Lockdep is happy and things are working
as expected.
Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
reason for the problem is that signal_struct maintains it's own utime and
stime (of exited tasks), these are not updated using the new task_utime()
routine, hence sig->utime can go backwards and cause the same problem
to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
fixes the problem
TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.
Peter Zijlstra [Mon, 1 Sep 2008 14:44:23 +0000 (16:44 +0200)]
sched_clock: fix NOHZ interaction
If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
PCI: fix pbus_size_mem() resource alignment for CardBus controllers
Commit 884525655d07fdee9245716b998ecdc45cdd8007 ("PCI: clean up resource
alignment management") changed the resource handling to mark how a
resource was aligned on a per-resource basis.
Thus, instead of looking at the resource number to determine whether it
was a bridge resource or a regular resource (they have different
alignment rules), we should just ask the resource for its alignment
directly.
The reason this broke only cardbus resources was that for the other
types of resources, the old way of deciding alignment actually still
happened to work. But CardBus bridge resources had been changed by
commit 934b7024f0ed29003c95cef447d92737ab86dc4f ("Fix cardbus resource
allocation") to look more like regular resources than PCI bridge
resources from an alignment handling standpoint.
Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When calibration against PIT fails, the warning that we print is misleading.
In a virtualized environment the VM may get descheduled while calibration
or, the check in PIT calibration may fail due to other virtualization
overheads.
The warning message explicitly assumes that calibration failed due to SMI's
which may not be the case. Change that to something proper.
Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: Fix for getting CPU number in power_save_ppc32_restore()
powerpc: Fix build error with 64K pages and !hugetlbfs
powerpc: Work around gcc's -fno-omit-frame-pointer bug
powerpc: Make sure _etext is after all kernel text
powerpc: Only make kernel text pages of linear mapping executable
powerpc: Fix uninitialised variable in VSX alignment code
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
bnx2x: Accessing un-mapped page
ath9k: Fix TX control flag use for no ACK and RTS/CTS
ath9k: Fix TX status reporting
iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove
iwlwifi: call apm stop on exit
iwlwifi: fix Tx cmd memory allocation failure handling
iwlwifi: fix rx_chain computation
iwlwifi: fix station mimo power save values
iwlwifi: remove false rxon if rx chain changes
iwlwifi: fix hidden ssid discovery in passive channels
iwlwifi: W/A for the TSF correction in IBSS
netxen: Remove workaround for chipset quirk
pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info
ixgbe: initialize interrupt throttle rate
net/usb/pegasus: avoid hundreds of diagnostics
tipc: Don't use structure names which easily globally conflict.
Eric Paris [Wed, 3 Sep 2008 15:49:47 +0000 (11:49 -0400)]
SELinux: memory leak in security_context_to_sid_core
Fix a bug and a philosophical decision about who handles errors.
security_context_to_sid_core() was leaking a context in the common case.
This was causing problems on fedora systems which recently have started
making extensive use of this function.
In discussion it was decided that if string_to_context_struct() had an
error it was its own responsibility to clean up any mess it created
along the way.
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
The allocated RX buffer size was 64 bytes bigger than the PCI mapped
size with no good reason. If the packet was actually using the buffer up
to its limit and if the last 64 bytes of the buffer crossed 4KB boundary
then an unmapped PCI page was accessed. The fix is to use only one
parameter for the buffer size - there is no need to differentiate
between the buffer size and the PCI mapping size since the extra 64
bytes can actually be used by the FW to align the Ethernet payload to
64 bytes.
Also updating the driver version and date
Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove
This patch sets STATUS_EXIT_PENDING on pci_remove. Otherwise
iwl4965_down may fail to uninitialize the driver.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch calls apm stop on exit and suspend. Without this patch
hardware consumes power even after driver is removed or suspended.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch "iwlwifi: do not use GFP_DMA in iwl_tx_queue_init" removes
GFP_DMA from allocation tx command buffers. GFP_DMA allows allocation
only for memory under 16M which causes allocation problems
suspend/resume flows.
Using kmalloc is temporal solution and some consistent/coherent
allocation schema will be more correct. Since iwlwifi hardware
supports 64bit address this solution should work on x86 (32 and
64bit) for now.
This patch fixes memory freeing problem in the previous patch.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Ian Schram <ischram@telenet.be> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tomas Winkler [Wed, 3 Sep 2008 03:18:47 +0000 (11:18 +0800)]
iwlwifi: fix rx_chain computation
This patch fixes rx_chain computation. The code that adjusts number of
rx chains to number supported by HW was missing. Miss configuration
causes firmware error. Note: iwlwifi supports HW with up to 3 RX
chains (2x2, 2x3, 1x2, and 3x3 MIMO). This patch also simplifies the
whole RX chain computation.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ron Rindjunsky [Wed, 3 Sep 2008 03:18:46 +0000 (11:18 +0800)]
iwlwifi: fix station mimo power save values
This patch fixes the wrong use MIMO power save values. Our TX was
configured with our MIMO power save values instead of peer's MIMO power
save values, this may affect connectivity. The peer STA/AP may not sense
our traffic at all as it doesn't have all RX chains opened.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mohamed Abbas [Wed, 3 Sep 2008 03:18:44 +0000 (11:18 +0800)]
iwlwifi: remove false rxon if rx chain changes
Rx chain might change during power save transitions but it doesn't
require sending Full-ROXN command to the firmware. Full-RXON requires
reconnection to an AP and thus affects user experience. The patch
avoids the Full-RXON by removing the rx_chain modification check in
iwl_full_rxon_required function.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ron Rindjunsky [Wed, 3 Sep 2008 03:18:43 +0000 (11:18 +0800)]
iwlwifi: fix hidden ssid discovery in passive channels
This enables sending of direct probes on passive channels, as long as
traffic was detected on that channel. This enables connectivity to
hidden/non broadcasting SSIDs APs on passive channels. Note 5000 HW
declares all 5.2 spectrum as passive.
Signed-off-by: Cahill Ben <ben.m.cahill@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is a W/A for the TSF sync issue in IBSS merging. HW is not
capable to sync TSF (it's constantly little behind). This creates
constant IBSS merging upon reception of each beacon, adding and removing
station which in turn creates above 50% packet loss and thus dramatically
degrade the throughput. The W/A simply stops the driver from declaring it
has a reliable TSF value and thus eliminates IBSS merging.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Split up PIT part of TSC calibration from native_calibrate_tsc
The TSC calibration function is still very complicated, but this makes
it at least a little bit less so by moving the PIT part out into a
helper function of its own.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
Dhananjay Phadke [Thu, 28 Aug 2008 04:57:30 +0000 (21:57 -0700)]
netxen: Remove workaround for chipset quirk
Remove chipset-specific quirk workaround; the workaround caused
unrecoverable DMA lockups when the driver was loaded following a
PXE boot.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Signed-off-by: Michael Brown <mbrown@fensystems.co.uk> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The following patch adds it back. Without this the default value of 0
causes the performance of this card to be awful. Restoring these to the
default values yields much better performance.
This regression has been around since 2.6.25.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: stable@kernel.org [2.6.25 and later] Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Kumar Gala [Tue, 26 Aug 2008 02:08:56 +0000 (12:08 +1000)]
powerpc: Fix for getting CPU number in power_save_ppc32_restore()
The calculation to get TI_CPU based off of SPRG3 was just plain wrong,
meaning that we were getting garbage for the CPU number on 6xx/G3/G4
based SMP boxes in this code.
Just offset off the stack pointer (to get to thread_info) like all the
other references to TI_CPU do.
This was pointed out by Chen Gong <G.Chen@freescale.com>
[paulus@samba.org - use rlwinm r12,r11,... instead of
rlwinm r12,r1,...; tophys()]
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc: Fix build error with 64K pages and !hugetlbfs
HAVE_ARCH_UNMAPPED_AREA and HAVE_ARCH_UNMAPPED_AREA_TOPDOWN must
be defined whenever CONFIG_PPC_MM_SLICES is enabled, not just when
CONFIG_HUGETLB_PAGE is. They used to be always defined together but
this is no longer the case since 3a8247cc2c856930f34eafce33f6a039227ee175
("powerpc: Only demote individual slices rather than whole process").
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Tony Breeds [Tue, 2 Sep 2008 06:50:38 +0000 (16:50 +1000)]
powerpc: Work around gcc's -fno-omit-frame-pointer bug
This bug is causing random crashes
(http://bugzilla.kernel.org/show_bug.cgi?id=11414).
-fno-omit-frame-pointer is only needed on powerpc when -pg is also
supplied, and there is a gcc bug that causes incorrect code generation
on 32-bit powerpc when -fno-omit-frame-pointer is used---it uses stack
locations below the stack pointer, which is not allowed by the ABI
because those locations can and sometimes do get corrupted by an
interrupt.
This ensures that CONFIG_FRAME_POINTER is only selected by ftrace.
When CONFIG_FTRACE is enabled we also pass -mno-sched-epilog to work
around the gcc codegen bug.
Patch based on work by:
Andreas Schwab <schwab@suse.de>
Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc: Make sure _etext is after all kernel text
This makes core_kernel_text() (and therefore kernel_text_address())
return the correct result. Currently all the __devinit routines (at
least) will not be considered to be kernel text.
This is just a quick fix for 2.6.27 - hopefully we will be able to fix
this better in 2.6.28.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Sat, 30 Aug 2008 01:26:27 +0000 (11:26 +1000)]
powerpc: Only make kernel text pages of linear mapping executable
Commit bc033b63bbfeb6c4b4eb0a1d083c650e4a0d2af8 ("powerpc/mm: Fix
attribute confusion with htab_bolt_mapping()") moved the check for
whether we should make pages of the linear mapping executable from
htab_bolt_mapping into its callers, including htab_initialize.
A side-effect of this is that the decision is now made once for
each contiguous section in the LMB array rather than for each page
individually. This can often mean that the whole of the linear
mapping ends up being executable.
This reverts to the previous behaviour, where individual pages are
checked for being part of the kernel text or not, by moving the check
back down into htab_bolt_mapping.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Neuling [Thu, 28 Aug 2008 04:57:39 +0000 (14:57 +1000)]
powerpc: Fix uninitialised variable in VSX alignment code
This fixes an uninitialised variable in the VSX alignment code. It can
cause warnings from GCC (noticed with gcc-4.1.1). Gcc is actually
correct in this instance, and this bug could cause the alignment
interrupt handler to send a SIGSEGV to the process on a legitimate
access.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
David S. Miller [Wed, 3 Sep 2008 06:38:32 +0000 (23:38 -0700)]
tipc: Don't use structure names which easily globally conflict.
Andrew Morton reported a build failure on sparc32, because TIPC
uses names like "struct node" and there is a like named data
structure defined in linux/node.h
This just regexp replaces "struct node*" to "struct tipc_node*"
to avoid this and any future similar problems.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
ipsec: Fix deadlock in xfrm_state management.
ipv: Re-enable IP when MTU > 68
net/xfrm: Use an IS_ERR test rather than a NULL test
ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message.
ath9k: Incorrect key used when group and pairwise ciphers are different.
rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON
mac80211: Fix debugfs union misuse and pointer corruption
wireless/libertas/if_cs.c: fix memory leaks
orinoco: Multicast to the specified addresses
iwlwifi: fix 64bit platform firmware loading
iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE)
iwlwifi: workaround interrupt handling no some platforms
iwlwifi: do not use GFP_DMA in iwl_tx_queue_init
net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS
net: Unbreak userspace usage of linux/mroute.h
pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock()
ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0
Thomas Gleixner [Tue, 2 Sep 2008 22:54:47 +0000 (00:54 +0200)]
[x86] Fix TSC calibration issues
Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f59
"x86: merge tsc calibration".
The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.
Larrys system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.
On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.
According to Alok also virtualized systems suffer from the PIT
calibration method.
The solution is to use a more wreckage aware aproach than the current
either/or decision.
1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency
2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.
3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration
4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.
5) Set a clear preference to PIT based calibration when the result
makes sense.
The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.
Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Wed, 3 Sep 2008 03:14:15 +0000 (20:14 -0700)]
ipsec: Fix deadlock in xfrm_state management.
Ever since commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held. If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.
Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.
Signed-off-by: David S. Miller <davem@davemloft.net>
spin_lock_irqsave(&r->lock, flags);
...
if (r->entropy_count > r->poolinfo->POOLBITS)
r->entropy_count = r->poolinfo->POOLBITS;
so there is a time window in which this BUG_ON():
static size_t account(struct entropy_store *r, size_t nbytes, int min,
int reserved)
{
unsigned long flags;
BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);
/* Hold lock while accounting */
spin_lock_irqsave(&r->lock, flags);
can trigger.
We could fix this by moving the assertion inside the lock, but it seems
safer and saner to revert to the old behaviour wherein
entropy_store.entropy_count at no time exceeds
entropy_store.poolinfo->POOLBITS.
John Kacur [Tue, 2 Sep 2008 21:36:13 +0000 (14:36 -0700)]
pm_qos_requirement might sleep
Make PM_QOS and CPU_IDLE play nicer when run with the RT-Preempt kernel.
The purpose of the patch is to remove the spin_lock around the read in the
function pm_qos_requirement - since spinlocks can sleep in -rt and this
function is called from idle.
CPU_IDLE polls the target_value's of some of the pm_qos parameters from
the idle loop causing sleeping locking warnings. Changing the
target_value to an atomic avoids this issue.
Remove the spinlock in pm_qos_requirement by making target_value an atomic
type.
Signed-off-by: mark gross <mgross@linux.intel.com> Signed-off-by: John Kacur <jkacur@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update rtc-cmos shutdown handling to leave RTC alarms active, resolving
http://bugzilla.kernel.org/show_bug.cgi?id=11411 on several boards. There
are still some systems where the ACPI event handling doesn't cooperate.
(Possibly related to bugid 11312, reporting the spontaneous disabling of
RTC events.)
Bug 11411 reported that changes to work around some ACPI event issues
broke wake-from-S5 handling, as used for DVR applications. (They like to
power off, then wake later to record programs.)
[yakui.zhao@intel.com: add shutdown for PNP devices]
[dbrownell@users.sourceforge.net: update comments] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Christie [Tue, 2 Sep 2008 21:36:07 +0000 (14:36 -0700)]
ibft: fix target info parsing in ibft module
I got this patch through Red Hat's bugzilla from the bug submitter and
patch creator. I have just fixed it up so it applies without fuzz to
upstream kernels.
Original patch and description from Shyam kumar Iyer:
The issue [ibft module not displaying targets with short names] is because
of an offset calculatation error in the iscsi_ibft.c code. Due to this
error directory structure for the target in /sys/firmware/ibft does not
get created and so the initiator is unable to connect to the target.
Note that this bug surfaced only with an name that had a short section at
the end. eg: "iqn.1984-05.com.dell:dell". It did not surface when the
iqn's had a longer section at the end. eg:
"iqn.2001-04.com.example:storage.disk2.sys1.xyz"
So, the eot_offset was calculated such that an extra 48 bytes i.e. the
size of the ibft_header which has already been accounted was subtracted
twice.
This was not evident with longer iqn names because they would overshoot
the total ibft length more than 48 bytes and thus would escape the bug.
Signed-off-by: Shyam Kumar Iyer <shyam_iyer@dell.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Konrad Rzeszutek <konrad@virtualiron.com> Cc: Peter Jones <pjones@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 2 Sep 2008 21:36:03 +0000 (14:36 -0700)]
hp-wmi: add proper hotkey support
It turns out that event 0x4 merely indcates that a hotkey has been
pressed, not which one. A further query is required in order to determine
the actual keypress. The following patch adds support for that along with
the known keycodes.
Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 2 Sep 2008 21:36:00 +0000 (14:36 -0700)]
hp-wmi: update to match current rfkill semantics
hp-wmi currently changes the RFKill state by altering the struct members
rather than using the dedicated interface, meaning that update events
won't be pushed to userspace. This patch fixes that, along with fixing
the declared type of the WWAN kill switch. It also ensures that rfkill
interfaces are only registered for hardware that exists.
Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Ivo van Doorn <ivdoorn@gmail.com> Cc: Dave Young <hidave.darkstar@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update Documentation/filesystems/proc.txt: it describes the file
auto_msgmni intoduced to enable/disable msgmni automatic recomputing upon
memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27). Also
added a description for msgmni (this filex is only listed in
Documentation/sysctl/kernel.txt).
mm: size of quicklists shouldn't be proportional to the number of CPUs
Quicklists store pages for each CPU as caches. (Each CPU can cache
node_free_pages/16 pages)
It is used for page table cache. exit() will increase the cache size,
while fork() consumes it.
So for example if an apache-style application runs (one parent and many
child model), one CPU process will fork() while another CPU will process
the middleware work and exit().
At that time, the CPU on which the parent runs doesn't have page table
cache at all. Others (on which children runs) have maximum caches.
- My machine spec is
number of numa node: 2
number of cpus: 8 (4CPU x2 node)
total mem: 8GB (4GB x2 node)
free mem: about 5GB
- Then, 4.7GB x 16% ~= 880MB.
So, Quicklist can use 800MB.
So, if following spec machine run that program
CPUs: 64 (8cpu x 8node)
Mem: 1TB (128GB x8node)
Then, quicklist can waste 300GB (= 1TB x 30%). It is too large.
So, I don't like cache policies which is proportional to # of cpus.
My patch changes the number of caches
from:
per-cpu-cache-amount = memory_on_node / 16
to
per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Tested-by: David Miller <davem@davemloft.net> Acked-by: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Tue, 2 Sep 2008 21:35:52 +0000 (14:35 -0700)]
devcgroup: fix race against rmdir()
During the use of a dev_cgroup, we should guarantee the corresponding
cgroup won't be deleted (i.e. via rmdir). This can be done through
css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup
under rcu_read_lock.
And also remove checking NULL dev_cgroup, it won't be NULL since a task
always belongs to a cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 2 Sep 2008 21:35:51 +0000 (14:35 -0700)]
cirrusfb: check_par fixes
1. Check if virtual resolution fits into memory.
Otherwise, Linux hangs during panning.
2. When selected use all available memory to
maximize yres_virtual to speed up panning
(previously also xres_virtual was increased).
3. Simplify memory restriction calculations.
Signed-off-by: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits
We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits. As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.
Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().
Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed. Kill
the now unneeded exit_child_reaper().
Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.
pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing
zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.
Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.
Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.
Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).
David Brownell [Tue, 2 Sep 2008 21:35:46 +0000 (14:35 -0700)]
mmc: at91_mci: don't use coherent dma buffers
At91_mci is abusing dma_free_coherent(), which may not be called with IRQs
disabled. I saw "mkfs.ext3" on an MMC card objecting voluminously as each
write completed:
WARNING: at arch/arm/mm/consistent.c:368 dma_free_coherent+0x2c/0x224()
[<c002726c>] (dump_stack+0x0/0x14) from [<c00387d4>] (warn_on_slowpath+0x4c/0x68)
[<c0038788>] (warn_on_slowpath+0x0/0x68) from [<c0028768>] (dma_free_coherent+0x2c/0x224)
r6:00008008 r5:ffc06000 r4:00000000
[<c002873c>] (dma_free_coherent+0x0/0x224) from [<c01918ac>] (at91_mci_irq+0x374/0x420)
[<c0191538>] (at91_mci_irq+0x0/0x420) from [<c0065d9c>] (handle_IRQ_event+0x2c/0x6c)
...
This bug has been around for a LONG time. The MM warning is from late
2005, but the driver merged a year later ... so I'm puzzled why nobody
noticed this before now.
The fix involves noting that this buffer shouldn't be DMA-coherent; it's
just used for normal DMA writes. So replace it with standard kmalloc()
buffering and DMA mapping calls.
This is the quickie fix. A better one would not rely on allocating large
bounce buffers. (Note that dma_alloc_coherent could have failed too, but
that case was ignored... kmalloc is a bit more likely to fail though.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Pierre Ossman <drzeus-mmc@drzeus.cx> Cc: Andrew Victor <linux@maxim.org.za> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Newton [Tue, 2 Sep 2008 21:35:44 +0000 (14:35 -0700)]
8250: improve workaround for UARTs that don't re-assert THRE correctly
Recent changes to tighten the check for UARTs that don't correctly
re-assert THRE (01c194d9278efc15d4785ff205643e9c0bdcef53: "serial 8250:
tighten test for using backup timer") caused problems when such a UART was
opened for the second time - the bug could only successfully be detected
at first initialization. For users of this version of this particular
UART IP it is fatal.
This patch stores the information about the bug in the bugs field of the
port structure when the port is first started up so subsequent opens can
check this bit even if the test for the bug fails.
David Brownell: "My own exposure to this is that the UART on DaVinci
hardware, which TI allegedly derived from its original 16550 logic, has
periodically gone from working to unusable with the mainline 8250.c ...
and back and forth a bunch. Currently it's "unusable", a regression from
some previous versions. With this patch from Will, it's usable."
Signed-off-by: Will Newton <will.newton@gmail.com> Acked-by: Alex Williamson <alex.williamson@hp.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Brownell <david-b@pacbell.net> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: vmlinux.o(.data+0x1f5c0): Section mismatch in reference from the variable contig_page_data to the variable .init.data:bootmem_node_data
The variable contig_page_data references
the variable __initdata bootmem_node_data
If the reference is valid then annotate the
variable with __init* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Johannes Weiner <hannes@saeurebad.de> Cc: Sean MacLennan <smaclennan@pikatech.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jbd: fix race between free buffer and commit transaction
was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.
I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
sometimes got EIO through this test.
The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.
: background_writeout()
->write_cache_pages()
->ext3_ordered_writepage()
walk_page_buffers() -> take a bh ref
block_write_full_page() -> unlock_page
: <- end_page_writeback
: <- race! (dio write->try_to_release_page fails)
walk_page_buffers() ->release a bh ref
ext3_ordered_writepage holds bh ref and does unlock_page remaining
taking a bh ref, so this causes the race and failure of
try_to_release_page.
To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Litke [Tue, 2 Sep 2008 21:35:38 +0000 (14:35 -0700)]
mm: make setup_zone_migrate_reserve() aware of overlapping nodes
I have gotten to the root cause of the hugetlb badness I reported back on
August 15th. My system has the following memory topology (note the
overlapping node):
setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
candidates, it happily continues the scan into 0x8000000-0x44000000. When
a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
the wrong zone. Oops.
setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.
Signed-off-by: Adam Litke <agl@us.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/xfrm: Use an IS_ERR test rather than a NULL test
In case of error, the function xfrm_bundle_create returns an ERR
pointer, but never returns a NULL pointer. So a NULL test that comes
after an IS_ERR test should be deleted.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = xfrm_bundle_create(...)
... when != x = E
* if (x != NULL)
S1 else S2
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message.
This patch addresses an issue with the locking order. ath_rx_flush_tid()
uses spin_lock/unlock_bh when IRQs are disabled in sta_notify by mac80211.
As node clean up is still pending with ath9k and this problematic portion
of the code is expected to change anyway, thinking of a proper fix may not
be worthwhile. So having this interim fix helps the users to get rid of the
kernel warning message.
ath9k: Incorrect key used when group and pairwise ciphers are different.
Updating sc_keytype multiple times when groupwise and pairwise
ciphers are different results in incorrect pairwise key type
assumed for TX control and normal ping fails. This works fine
for cases where both groupwise and pairwise ciphers are same.
Also use mac80211 provided enums for key length calculation.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON
A "Set" to a sign-bit in an "&" operation causes a compiler warning.
Make calculations unsigned.
[ The warning was masked by the old definition of BUILD_BUG_ON() ]
Also remove __builtin_constant_p from FIELD_CHECK since BUILD_BUG_ON
no longer permits non-const values.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> CC: Ingo Molnar <mingo@elte.hu> CC: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jouni Malinen [Thu, 28 Aug 2008 12:12:06 +0000 (15:12 +0300)]
mac80211: Fix debugfs union misuse and pointer corruption
debugfs union in struct ieee80211_sub_if_data is misused by including a
common default_key dentry as a union member. This ends occupying the same
memory area with the first dentry in other union members (structures;
usually drop_unencrypted). Consequently, debugfs operations on
default_key symlinks and drop_unencrypted entry are using the same
dentry pointer even though they are supposed to be separate ones. This
can lead to removing entries incorrectly or potentially leaving
something behind since one of the dentry pointers gets lost.
Fix this by moving the default_key dentry to a new struct
(common_debugfs) that contains dentries (more to be added in future)
that are shared by all vif types. The debugfs union must only be used
for vif type-specific entries to avoid this type of pointer corruption.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adrian Bunk [Wed, 27 Aug 2008 22:05:08 +0000 (01:05 +0300)]
wireless/libertas/if_cs.c: fix memory leaks
The leak in if_cs_prog_helper() is obvious.
It looks a bit as if not freeing "fw" in if_cs_prog_real() was done
intentionally, but I'm not seeing why it shouldn't be freed.
Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Holger Schurig <hs4233@mail.mn-solutions.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
David Kilroy [Sat, 23 Aug 2008 18:03:34 +0000 (19:03 +0100)]
orinoco: Multicast to the specified addresses
When multicasting the driver sets the number of group addresses using
the count from the previous set multicast command. In general this means
you have to set the multicast addresses twice to get the behaviour you
want.
If we were multicasting, and reduce the number of addresses we are
multicasting to, then the driver would write uninitialised data from the
stack into the group addresses to multicast to.
Only write the multicast addresses we have specifically set.
Signed-off-by: David Kilroy <kilroyd@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mohamed Abbas [Thu, 28 Aug 2008 09:25:05 +0000 (17:25 +0800)]
iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE)
The patch fixes CSR_GP_CNTRL_REG_FLAG_INIT_DONE was set instead of
cleared which disabled moving device to D0U state.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tomas Winkler [Thu, 28 Aug 2008 09:25:04 +0000 (17:25 +0800)]
iwlwifi: workaround interrupt handling no some platforms
This patch adds workaround for an interrupt related hardware bug on
some platforms. (Apparently these platforms boot-up w/ INTX_DISABLED
set. -- JWL)
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS
Current setup with hal and NetworkManager will fail to work
without newest hal version with this config option disabled.
Although this will solve itself by time, at the moment it is
dishonest to say that we don't know any software that uses it,
if there are many many people relying on old hal versions.
Signed-off-by: Florian Mickler <florian@mickler.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
hwif_to_node() incorrectly assumes that hwif->dev always belongs to
a PCI device. This results in ide-cs oopsing in init_irq() after
commit c56c5648a3bd15ff14c50f284b261140cd5b5472 accidentally fixed
device tree registration for ide-cs. Fix it by using dev_to_node().
Thanks to Martin Michlmayr and Larry Finger for help with debugging
the issue.
Reported-by: Martin Michlmayr <tbm@cyrius.com> Tested-by: Martin Michlmayr <tbm@cyrius.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
Fix problem with waiting while holding rcu read lock in md/bitmap.c
Remove invalidate_partition call from do_md_stop.
Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
nfsd: fix buffer overrun decoding NFSv4 acl
sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
nfsd: fix compound state allocation error handling
svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet
Oleg Nesterov [Sat, 30 Aug 2008 17:08:40 +0000 (21:08 +0400)]
softlockup: minor cleanup, don't check task->state twice
The recent commit 16d9679f33caf7e683471647d1472bfe133d858 changed
check_hung_task() to filter out the TASK_KILLABLE tasks. We can
move this check to the caller which has to test t->state anyway.
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: restore original behavior of /proc/partition when there's no partition
remove blk_register_filter and blk_unregister_filter in gendisk