Steven Rostedt [Thu, 12 Feb 2009 19:16:46 +0000 (14:16 -0500)]
sched: do not account for NMIs
Impact: avoid corruption in system time accounting
Martin Schwidefsky told me that there was an issue with NMIs and
system accounting. The problem is that the accounting code is
not reentrant, and if an NMI goes off after an interrupt it can
corrupt the accounting.
For now, the best we can do is to treat NMIs like SMIs and they
are not accounted for.
This patch changes nmi_enter to not call __irq_enter and to do
the preempt-count and tracing calls directly.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 12 Feb 2009 15:53:37 +0000 (10:53 -0500)]
preempt-count: force hardirq-count to max of 10
To add a bit in the preempt_count to be set when in NMI context, we
found that some archs did not have enough bits to spare. This is
due to the hardirq_count being a mask that can hold NR_IRQS.
Some archs allow for over 16000 IRQs, and that would require a mask
of 14 bits. The softirq mask is 8 bits and the preempt disable mask
is also 8 bits. The PREEMPT_ACTIVE bit is bit 30, and bit 31 would
make the preempt_count (which is type int) a negative number.
A negative preempt_count is a sign of failure.
Add them up (14+8+8+1+1) and you get 32 bits. No room for the NMI bit.
But the hardirq_count is to track the number of nested IRQs, not
the number of total IRQs. This originally took the paranoid approach
of setting the max nesting to NR_IRQS. But when we have archs with
over 1000 IRQs, it is not practical to think they will ever all
nest on a single CPU. Not to mention that this would most definitely
cause a stack overflow.
This patch sets a max of 10 bits to be used for IRQ nesting.
I did a 'git grep HARDIRQ' to examine all users of HARDIRQ_BITS and
HARDIRQ_MASK, and found that making it a max of 10 would not hurt
anyone. I did find that the m68k expected it to be 8 bits, so
I allow for the archs to set the number to be less than 10.
I removed the setting of HARDIRQ_BITS from the archs that set it
to more than 10. This includes ALPHA, ia64 and avr32.
This will always allow room for the NMI bit, and if we need to allow
for NMI nesting, we have 4 bits to play with.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
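The bit budget above can be checked with a small userspace sketch. This is only a model: the macro names echo the kernel's, but placing the NMI bit directly above the hardirq field is an assumption here, and the helper functions are made up for illustration.

```c
#include <stdint.h>

/* Illustrative userspace model of the preempt_count layout described in
 * the commit message. This is not the kernel header. */
#define PREEMPT_BITS    8
#define SOFTIRQ_BITS    8
#define HARDIRQ_BITS   10   /* the new maximum set by this patch */
#define NMI_BITS        1

#define PREEMPT_SHIFT   0
#define SOFTIRQ_SHIFT   (PREEMPT_SHIFT + PREEMPT_BITS)   /* 8  */
#define HARDIRQ_SHIFT   (SOFTIRQ_SHIFT + SOFTIRQ_BITS)   /* 16 */
#define NMI_SHIFT       (HARDIRQ_SHIFT + HARDIRQ_BITS)   /* 26, assumed position */
#define PREEMPT_ACTIVE_SHIFT 30  /* bit 31 would make the int negative */

#define FIELD_MASK(bits, shift) ((((uint32_t)1 << (bits)) - 1) << (shift))

/* Mask covering the 10-bit hardirq nesting counter. */
uint32_t hardirq_mask(void) { return FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT); }

/* Mask covering the single NMI bit. */
uint32_t nmi_mask(void) { return FIELD_MASK(NMI_BITS, NMI_SHIFT); }

/* Bits from the NMI bit up to, but not including, PREEMPT_ACTIVE. */
int bits_available_for_nmi(void) { return PREEMPT_ACTIVE_SHIFT - NMI_SHIFT; }
```

With 8+8+10 bits used below it, the NMI bit lands on bit 26, and bits 26 through 29 are the 4 bits left to play with for NMI nesting.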
Steven Rostedt [Tue, 10 Feb 2009 18:07:13 +0000 (13:07 -0500)]
tracing, x86: fix fixup section to return to original code
Impact: fix to prevent a kernel crash on fault
If for some reason the pointer to the parent function on the
stack takes a fault, the fix up code will not return back to
the original faulting code. This can lead to unpredictable
results and perhaps even a kernel panic.
A fault should not happen, but if it does, we should simply
disable the tracer, warn, and continue running the kernel.
It should not lead to a kernel crash.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Hugh Dickins [Mon, 9 Feb 2009 19:20:50 +0000 (19:20 +0000)]
profiling: fix broken profiling regression
Impact: fix broken /proc/profile on UP machines
Commit c309b917cab55799ea489d7b5f1b77025d9f8462 "cpumask: convert
kernel/profile.c" broke profiling. prof_cpu_mask was previously
initialized to CPU_MASK_ALL, but left uninitialized in that commit.
We need to copy cpu_possible_mask (cpu_online_mask is not enough).
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: Storage: Update unusual_devs entry for Datafab KECF-USB
USB: Correct Makefile to make isp1760 buildable
USB: option: New mobile broadband modems to be supported
USB: two more usb ids for ti_usb_3410_5052
USB: ftdi_sio: unlock_kernel() on error in set_serial_info()
USB: usb-storage: add Pentax to the bad-vendor list
USB: ftdi_sio: add support for the NDI Polaris system
USB: usb-serial: fix the aircable_init failure path
USB: usb-storage: remove WARN from last-sector hacks
Revert USB: option: add Pantech cards
USB: cdc-acm.c: remove duplicate lines for MTK gps support
USB: fsl_qe_udc: Fix stalled TX requests bug
USB: fsl_qe_udc: Fix muram corruption by disabled endpoints
USB: fsl_qe_udc: Fix disconnects reporting during bus reset
USB: fsl_qe_udc: Fix QE USB controller initialization
USB: fsl_qe_udc: Fix recursive locking bug in ch9getstatus()
USB: fsl_qe_udc: Fix oops on QE UDC probe failure
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
Staging: panel: fix lcd panel driver build failure
Staging: android: fix up units in timed_gpio
Staging: android: ram_console: Disable ECC when early init is enabled and validate buffer size
Staging: at76_usb: Add support for OQO Model 01+
Staging: at76_usb: fix bugs introduced by "Staging: at76_usb: cleanup dma on stack issues"
Revert Staging: at76_usb: update drivers/staging/at76_usb w/ mac80211 port
Linus Torvalds [Mon, 9 Feb 2009 21:58:22 +0000 (13:58 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table
[CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.
Chris Mason [Mon, 9 Feb 2009 21:22:03 +0000 (16:22 -0500)]
Btrfs: don't use spin_is_contended
Btrfs was using spin_is_contended to see if it should drop locks before
doing extent allocations during btrfs_search_slot. The idea was to avoid
expensive searches in the tree unless the lock was actually contended.
But, spin_is_contended is specific to the ticket spinlocks on x86, so this
is causing compile errors everywhere else.
In practice, the contention could easily appear some time after we started
doing the extent allocation, and it makes more sense to always drop the lock
instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The "no DMA on stack" conversion was incomplete with respect to
updating the arguments passed to usb_control_msg. The value 40 is
hardcoded as it was prior to conversion.
The driver can now load firmware, but is not fully functional.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Wed, 4 Feb 2009 20:48:03 +0000 (15:48 -0500)]
USB: usb-storage: add Pentax to the bad-vendor list
This patch (as1202) adds Pentax to usb-storage's list of bad vendors
whose devices always need the CAPACITY_HEURISTICS flag. This is in
addition to the existing entries: Nokia, Nikon, and Motorola.
Dave Young [Sun, 1 Feb 2009 10:54:54 +0000 (18:54 +0800)]
USB: usb-serial: fix the aircable_init failure path
The failure path of aircable_init is wrong, fix the order of (goto) labels.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Naranjo Manuel Francisco <naranjo.manuel@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Mon, 2 Feb 2009 14:51:01 +0000 (09:51 -0500)]
USB: usb-storage: remove WARN from last-sector hacks
This patch (as1201) removes the WARN() from the last-sector hacks in
usb-storage, thereby making the code match the version now in
.27-stable and .28-stable. The WARN() isn't needed, since there is no
longer any intention of assuming that all storage devices have an even
number of sectors, and it annoys users for no good reason.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Anton Vorontsov [Thu, 25 Dec 2008 14:15:14 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix stalled TX requests bug
While disabling an endpoint the driver nukes any pending requests,
thus completing them with -ESHUTDOWN status. But the driver doesn't
clear the tx_req, which means that the next TX request (after
ep_enable) might get stalled, since the driver won't queue the new
requests.
This patch fixes a bug I'm observing with ethernet gadget while
playing with ifconfig usb0 up/down (the up/down sequence disables
and enables `in' and `out' endpoints).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Anton Vorontsov [Thu, 25 Dec 2008 14:15:11 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix muram corruption by disabled endpoints
Before freeing an endpoint's muram memory, we should stop all activity
of the endpoint, otherwise the QE UDC controller might do nasty things
with the muram memory that doesn't belong to that endpoint anymore.
The qe_ep_reset() effectively flushes the hardware fifos, finishes all
late transactions and thus prevents the corruption.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Anton Vorontsov [Thu, 25 Dec 2008 14:15:09 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix disconnects reporting during bus reset
Freescale QE UDC controllers can't report the "port change" states,
so the only way to handle disconnects is to process bus reset
interrupts. The bus reset can take some time, that is, a few irqs.
Gadgets may print the disconnection events, and this causes a few
repetitive messages in the kernel log.
This patch fixes the issue by using the usb_state machine: if the
usb controller has already been reset, just quit the reset irq
early.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Anton Vorontsov [Thu, 25 Dec 2008 14:15:07 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix QE USB controller initialization
qe_udc_reg_init() leaves the USB controller enabled before muram memory
is initialized. Sometimes the uninitialized muram memory confuses the
controller, and it starts sending the busy interrupts.
Fix this by disabling the controller; it will be enabled later by
the gadget driver, at bind time.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Anton Vorontsov [Thu, 25 Dec 2008 14:15:05 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix recursive locking bug in ch9getstatus()
The call chain is this:
qe_udc_irq() <- grabs the udc->lock spinlock
 rx_irq()
  qe_ep0_rx()
   ep0_setup_handle()
    setup_received_handle()
     ch9getstatus()
      qe_ep_queue() <- tries to grab the udc->lock again
It seems unsafe to temporarily drop the lock in ch9getstatus(),
so to fix that bug a lock-less __qe_ep_queue() function is
implemented and used by ch9getstatus().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
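The locked/lock-less split can be sketched in userspace as follows. This is a minimal model, not the driver code: every model_* name is hypothetical, and the "lock" is a plain flag that reports -EDEADLK where a real non-recursive spinlock would simply hang.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the UDC state: one lock, one request counter. */
struct model_udc {
    bool lock_held;
    int queued;
};

/* Non-recursive lock: taking it twice is reported instead of hanging. */
int model_lock(struct model_udc *udc)
{
    if (udc->lock_held)
        return -EDEADLK;   /* a real spinlock would deadlock here */
    udc->lock_held = true;
    return 0;
}

void model_unlock(struct model_udc *udc) { udc->lock_held = false; }

/* Lock-less variant: the caller must already hold the lock. */
int model_ep_queue_nolock(struct model_udc *udc)
{
    udc->queued++;
    return 0;
}

/* Public entry point: takes the lock, then calls the lock-less variant. */
int model_ep_queue(struct model_udc *udc)
{
    int ret = model_lock(udc);
    if (ret)
        return ret;
    ret = model_ep_queue_nolock(udc);
    model_unlock(udc);
    return ret;
}

/* IRQ path: holds the lock for its whole duration, like qe_udc_irq(). */
int model_irq_getstatus(struct model_udc *udc, bool use_lockless)
{
    int ret = model_lock(udc);
    if (ret)
        return ret;
    ret = use_lockless ? model_ep_queue_nolock(udc) : model_ep_queue(udc);
    model_unlock(udc);
    return ret;
}
```

Calling model_irq_getstatus() with use_lockless false reproduces the recursive acquisition; with true, the request is queued under the lock the IRQ path already holds.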
Anton Vorontsov [Thu, 25 Dec 2008 14:15:02 +0000 (17:15 +0300)]
USB: fsl_qe_udc: Fix oops on QE UDC probe failure
In case of probing errors the driver kfrees the udc_controller, but it
doesn't set the pointer to NULL.
When usb_gadget_register_driver is called, it checks for udc_controller
!= NULL, the check passes and the driver accesses nonexistent memory.
Fix this by setting udc_controller to NULL in case of errors.
While at it, also implement irq_of_parse_and_map()'s failure and cleanup
cases.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
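The free-then-NULL pattern looks roughly like this in a userspace sketch. The model_* names are hypothetical, malloc/free stand in for the kernel allocator, and a non-positive irq stands in for any probing error.

```c
#include <errno.h>
#include <stdlib.h>

struct model_udc {
    int irq;
};

/* Global pointer that the register path checks, like udc_controller. */
struct model_udc *model_udc_controller;

int model_probe(int irq)
{
    model_udc_controller = calloc(1, sizeof(*model_udc_controller));
    if (!model_udc_controller)
        return -ENOMEM;

    if (irq <= 0) {                      /* stands in for any probe error */
        free(model_udc_controller);
        model_udc_controller = NULL;     /* the fix: no dangling pointer */
        return -EINVAL;
    }

    model_udc_controller->irq = irq;
    return 0;
}

int model_register_driver(void)
{
    /* Without the NULL assignment above, this check would pass after a
     * failed probe and the next line would touch freed memory. */
    if (!model_udc_controller)
        return -ENODEV;
    return model_udc_controller->irq;
}
```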
J. Bruce Fields [Wed, 4 Feb 2009 22:35:38 +0000 (17:35 -0500)]
lockd: fix regression in lockd's handling of blocked locks
If a client requests a blocking lock, is denied, then requests it again,
then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
set, because we've already queued a block and don't need the locks code
to do it again.
But that means vfs_lock_file() will return -EAGAIN instead of
FILE_LOCK_DENIED. So we still need to translate that -EAGAIN return
into a nlm_lck_blocked error in this case, and put ourselves back on
lockd's block list.
The bug was introduced by bde74e4bc64415b1 "locks: add special return
value for asynchronous locks".
Thanks to Frank van Maarseveen for the report; his original test
case was essentially
for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done
Tested-by: Frank van Maarseveen <frankvm@frankvm.com>
Reported-by: Frank van Maarseveen <frankvm@frankvm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Linus Torvalds [Mon, 9 Feb 2009 16:52:28 +0000 (08:52 -0800)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: select framebuffer support automatically
drm/i915: add get_vblank_counter function for GM45
drm/i915: capture last_vblank count at IRQ uninstall time too
drm/i915: Unlock mutex on i915_gem_fault() error path
drm/i915: Quiet the message on get/setparam ioctl with an unknown value.
drm/i915: skip LVDS initialization on Apple Mac Mini
drm/i915: sync SDVO code with stable userland modesetting driver
drm/i915: Unref the object after failing to set tiling mode.
drm/i915: add fence register management to execbuf
drm/i915: Return error from i915_gem_object_get_fence_reg() when failing.
drm/i915: Set up an MTRR covering the GTT at driver load.
drm/i915: Skip SDVO/HDMI init when the chipset tells us it's not present.
drm/i915: Suppress GEM teardown on X Server exit in KMS mode.
drm/radeon: fix ioremap conflict with AGP mappings
i915: fix unneeded locking in i915 LVDS get modes code.
Architectures other than mips and x86 are not using ticket spinlocks.
Therefore, the contention on the lock is meaningless, since there is
nobody known to be waiting on it (arguably /fairly/ unfair locks).
tracing/function-graph-tracer: handle the leaf functions from trace_pipe
When one cats the trace file, the leaf functions are printed without brackets:
function();
whereas in the trace_pipe file we'll see the following:
function() {
}
This is because the ring_buffer handling is not the same between those two files.
On the trace file, when an entry is printed, the iterator is advanced and we can then
check the next entry.
There is no iterator with trace_pipe: the current entry to print has been peeked
but not consumed. So checking the next entry will still return the current one as long as
we don't consume it.
This patch introduces a new return value for the output callbacks to ask the tracing
core not to consume the current entry after printing it.
We need it because we have to consume the current entry ourselves to check
the next one.
Now trace_pipe handles leaf functions correctly.
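The leaf detection can be sketched with a flat array standing in for the ring buffer. This is a simplified model with hypothetical names: it matches an entry to its return by function name only, and it reports how many entries the printer consumed so the caller knows not to consume the current one again.

```c
#include <stdio.h>
#include <string.h>

enum model_type { MODEL_ENTRY, MODEL_RETURN };

struct model_ent {
    enum model_type type;
    const char *func;
};

/* Formats the entry at buf[pos] into out and returns the number of
 * entries consumed. The caller guarantees pos < len. */
size_t model_print(const struct model_ent *buf, size_t len, size_t pos,
                   char *out, size_t outsz)
{
    const struct model_ent *cur = &buf[pos];

    if (cur->type == MODEL_RETURN) {
        snprintf(out, outsz, "}");
        return 1;
    }

    /* Leaf: the very next entry is the return of the same function.
     * Seeing it requires looking past the current entry, which is why
     * the printer must consume the current entry itself. */
    if (pos + 1 < len && buf[pos + 1].type == MODEL_RETURN &&
        strcmp(buf[pos + 1].func, cur->func) == 0) {
        snprintf(out, outsz, "%s();", cur->func);
        return 2;
    }

    snprintf(out, outsz, "%s() {", cur->func);
    return 1;
}
```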
Ingo Molnar [Mon, 9 Feb 2009 11:06:54 +0000 (12:06 +0100)]
tracing/blktrace: move the tracing file to kernel/trace, fix
Impact: build fix
The BLK_DEV_IO_TRACE entry used to be in block/Kconfig - which
file itself was dependent on CONFIG_BLOCK. But now the entry is
in kernel/trace/Kconfig - which is present even on !CONFIG_BLOCK.
tracing/function-graph-tracer: drop the kernel_text_address check
When the function graph tracer picks a return address, it ensures this address
is really a kernel text one by calling __kernel_text_address().
Actually this path has never been taken. Its role was more likely to debug the tracer
at the beginning of its development, but this check is wasteful since it is called
for every traced function.
Hugh Dickins [Sun, 8 Feb 2009 20:56:58 +0000 (20:56 +0000)]
mm: fix error case in mlock downgrade reversion
Commit 27421e211a39784694b597dbf35848b88363c248, Manually revert
"mlock: downgrade mmap sem while populating mlocked regions", has
introduced its own regression: __mlock_vma_pages_range() may report
an error (for example, -EFAULT from trying to lock down pages from
beyond EOF), but mlock_vma_pages_range() must hide that from its
callers as before.
radeonfb: Fix resume from D3Cold on some platforms
For historical reasons, this driver used its own saving/restoring
of the PCI config space, and used the state of it on resume as
an indication as to whether it needed to re-POST the chip or not.
This method breaks with the later core changes since the core will
have restored things for us.
This patch fixes it by removing that custom code, using standard
core methods to save/restore state, and testing for the need to
re-POST by comparing the content of a few key PLL registers.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aty128fb: Properly save PCI state before changing PCI PM level
This fixes aty128fb to properly save the PCI config space -before- it
potentially switches the PM state of the chip. This avoids a
warning with the new PM core and is the right thing to do anyway.
I also replaced the hand-coded switch to D2 with a call to the
generic pci_set_power_state() and removed the code that switches it
back to D0 since the generic code is doing that for us nowadays.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atyfb: Properly save PCI state before changing PCI PM level
This fixes atyfb to properly save the PCI config space -before- it
potentially switches the PM state of the chip. This avoids a
warning with the new PM core and is the right thing to do anyway.
I also slightly cleaned up the code that checks whether we are
running on a PowerMac to do a runtime check instead of a compile
check only, and replaced a deprecated number with the proper
symbolic constant.
Finally, I removed the useless switch to D0 from resume since
the core does it for us.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cornelia Huck [Tue, 20 Jan 2009 14:31:31 +0000 (15:31 +0100)]
async: Rename _special -> _domain for clarity.
Rename the async_*_special() functions to async_*_domain(), which
describes the purpose of these functions much better.
[Broke up long lines to silence checkpatch]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cornelia Huck [Mon, 19 Jan 2009 12:45:28 +0000 (13:45 +0100)]
async: Fix running list handling.
async_schedule() should pass in async_running as the running
list, and run_one_entry() should put the entry to be run on
the provided running list instead of always on the generic one.
Reported-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Ingo Molnar [Thu, 5 Feb 2009 15:03:34 +0000 (16:03 +0100)]
drm/i915: select framebuffer support automatically
Migration helper.
The i915 driver recently added a 'depends on FB' rule to its
Kconfig entry - which silently turns off DRM_I915 if someone
has a working config but no CONFIG_FB selected, and upgrades
to the latest upstream kernel.
So change it to "select FB", which auto-selects framebuffer
support. This way the driver keeps working, regardless of
whether FB was enabled before or not.
Kconfig selects of interactive options can be problematic for
dependencies and can cause build breakages - but in this case
it's safe because it's a leaf entry with no dependencies of its
own.
( There is some minor circular dependency fallout as FB_I810
and FB_INTEL also used 'depends on FB' constructs - update
those to "select FB" too. )
Jesse Barnes [Fri, 6 Feb 2009 18:22:41 +0000 (10:22 -0800)]
drm/i915: add get_vblank_counter function for GM45
As discussed in the long thread about vblank related timeouts, it turns out
GM45 has different frame count registers than previous chips. This patch
adds support for them, which prevents us from waiting on really stale
sequence values in drm_wait_vblank (which rather than returning immediately
ends up timing out or getting interrupted).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Fri, 6 Feb 2009 21:04:49 +0000 (13:04 -0800)]
drm/i915: capture last_vblank count at IRQ uninstall time too
In dc1336ff4fe08ae7cfe8301bfd7f0b2cfd31d20a (set vblank enable flag correctly
across IRQ uninstall), we made sure drivers that uninstall their interrupt
handler set the vblank enabled flag correctly, so that when interrupts are
re-enabled, vblank interrupts & counts work as expected. However I missed the
last_vblank field: it needs to be updated as well, otherwise, at the next
drm_update_vblank_count we'll end up comparing a current count to a stale
one (the last one captured by the disable function), which may trigger the
wraparound handling, leading to a jumpy counter and hangs in drm_wait_vblank.
The jumpy counter can prevent the DRM_WAIT_ON from returning success if the
difference between the current count and the requested count is greater than
2^23, leading to timeouts or hangs, if the ioctl is restarted in a loop (as
is the case in libdrm < 2.4.4).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Michel Dänzer <michel@daenzer.net>
Tested-by: Timo Aaltonen <tjaalton@cc.hut.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
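The effect of a stale last_vblank can be worked through with a small model. This is a sketch, not the DRM code: the model_* names are hypothetical, the 24-bit hardware counter limit is an assumption, and the wait condition only mirrors the 2^23 comparison described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_VBLANK_COUNT 0xffffffu  /* assumed 24-bit hw frame counter */

struct model_vblank {
    uint32_t last;   /* hardware count captured at the previous update */
    uint32_t count;  /* software vblank counter */
};

/* Adds the hardware delta to the software counter, treating cur < last
 * as a hardware counter wraparound. Returns the increment applied. */
uint32_t model_update_vblank_count(struct model_vblank *vb, uint32_t cur)
{
    uint32_t diff = cur - vb->last;

    if (cur < vb->last)
        diff += MODEL_MAX_VBLANK_COUNT;

    vb->count += diff;
    vb->last = cur;
    return diff;
}

/* The wait succeeds only when current is at most 2^23 past requested. */
bool model_wait_done(uint32_t current, uint32_t requested)
{
    return (current - requested) <= ((uint32_t)1 << 23);
}
```

If last still holds 1000 from the disable path and the hardware counter reads 5 after the IRQ is reinstalled, the update sees a bogus wraparound and adds 16776220 to the counter. A waiter asking for sequence 1001 is then more than 2^23 behind the counter and never sees success.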
Eric Anholt [Tue, 3 Feb 2009 20:10:21 +0000 (12:10 -0800)]
drm/i915: Quiet the message on get/setparam ioctl with an unknown value.
Getting an unknown get/setparam used to be more significant back when they
didn't change much. However, now that we're in the git world we're using
them instead of a monotonic version number to signal feature availability,
so clients ask about unknown params on older kernels more often.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Paul Collins [Wed, 4 Feb 2009 10:05:41 +0000 (23:05 +1300)]
drm/i915: skip LVDS initialization on Apple Mac Mini
The Apple Mac Mini falsely reports LVDS. Use DMI to check whether we
are running on a Mac Mini, and skip LVDS initialization if that proves
to be the case.
Signed-off-by: Paul Collins <paul@ondioline.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Tue, 27 Jan 2009 01:10:45 +0000 (17:10 -0800)]
drm/i915: add fence register management to execbuf
Adds code to set up fence registers at execbuf time on pre-965 chips as
necessary. Also fixes up a few bugs in the pre-965 tile register support
(get_order != ffs). The number of fences available to the kernel defaults
to the hw limit minus 3 (for legacy X front/back/depth), but a new parameter
allows userspace to override that as needed.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Eric Anholt [Tue, 27 Jan 2009 18:33:49 +0000 (10:33 -0800)]
drm/i915: Return error from i915_gem_object_get_fence_reg() when failing.
Previously, the caller would continue along without knowing that the
function failed, resulting in potential mis-rendering. Right now vm_fault
just returns SIGBUS in that case, and we may need to disable signal handling
to avoid that happening.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Eric Anholt [Fri, 23 Jan 2009 20:57:47 +0000 (12:57 -0800)]
drm/i915: Set up an MTRR covering the GTT at driver load.
We'd love to just be using PAT, but even on chips with PAT it gets disabled
sometimes due to an erratum. It would probably be better to have pat_enabled
exported and only bother with this when !pat_enabled.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Steven Rostedt [Fri, 6 Feb 2009 06:14:26 +0000 (01:14 -0500)]
ftrace: change function graph tracer to use new in_nmi
The function graph tracer piggy backed onto the dynamic ftracer
to use the in_nmi custom code for dynamic tracing. The problem
was (as Andrew Morton pointed out) it really only wanted to bail
out if the context of the current CPU was in NMI context. But the
dynamic ftrace in_nmi custom code was true if _any_ CPU happened
to be in NMI context.
Now that we have a generic in_nmi interface, this patch changes
the function graph code to use it instead of the dynamic ftrace
custom code.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
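The difference between "some CPU is in an NMI" and "this CPU is in an NMI" can be sketched like this. Both the shared counter and the per-CPU array are hypothetical models, with a fixed CPU count chosen only for illustration.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Shared counter: bumped by whichever CPU takes an NMI. */
int model_nmi_running;

/* Per-CPU context state: true only while that CPU is inside an NMI. */
bool model_cpu_in_nmi[MODEL_NR_CPUS];

void model_nmi_enter(int cpu)
{
    model_nmi_running++;
    model_cpu_in_nmi[cpu] = true;
}

void model_nmi_exit(int cpu)
{
    model_cpu_in_nmi[cpu] = false;
    model_nmi_running--;
}

/* What the dynamic ftrace custom check effectively answered. */
bool model_any_cpu_in_nmi(void) { return model_nmi_running > 0; }

/* What the function graph tracer actually wants to know. */
bool model_this_cpu_in_nmi(int cpu) { return model_cpu_in_nmi[cpu]; }
```

While CPU 1 is in an NMI, the shared check is true even when asked on behalf of CPU 0, so a tracer on CPU 0 would bail out for no reason.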
Steven Rostedt [Fri, 6 Feb 2009 05:51:37 +0000 (00:51 -0500)]
nmi: add generic nmi tracking state
This code adds an in_nmi() macro that uses the current task's preempt count
to track when it is in NMI context. Other parts of the kernel can
use this to determine if the context is in NMI context or not.
This code was inspired by the -rt patch in_nmi version that was
written by Peter Zijlstra, who borrowed that code from
Mathieu Desnoyers.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
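A preempt-count based in_nmi() can be sketched in userspace as below. The model_* names are hypothetical, a single global stands in for the current task's preempt count, and the choice of bit 26 for the NMI bit is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NMI_SHIFT   26   /* assumed position of the NMI bit */
#define MODEL_NMI_OFFSET  ((uint32_t)1 << MODEL_NMI_SHIFT)
#define MODEL_NMI_MASK    MODEL_NMI_OFFSET

/* Stands in for the current task's preempt count. */
uint32_t model_preempt_count;

/* True while the NMI bit is set in the count. */
bool model_in_nmi(void)
{
    return (model_preempt_count & MODEL_NMI_MASK) != 0;
}

/* NMI entry sets the bit by adding its offset; exit removes it again. */
void model_nmi_enter(void) { model_preempt_count += MODEL_NMI_OFFSET; }
void model_nmi_exit(void)  { model_preempt_count -= MODEL_NMI_OFFSET; }
```

Because the state lives in the count of whatever is running on this CPU, the answer is about the current context only.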
Steven Rostedt [Fri, 6 Feb 2009 03:30:07 +0000 (22:30 -0500)]
ftrace, x86: rename in_nmi variable
Impact: clean up
The in_nmi variable in x86 arch ftrace.c is a misnomer.
Andrew Morton pointed out that the in_nmi variable is incremented
by all CPUS. It can be set when another CPU is running an NMI.
Since this is actually intentional, the fix is to rename it to
what it really is: "nmi_running"
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 6 Feb 2009 00:54:51 +0000 (19:54 -0500)]
ring-buffer: allow tracing_off to be used in core kernel code
tracing_off() is the fastest way to stop recording to the ring buffers.
This may be used in places like panic and die, just before the
ftrace_dump is called.
This patch adds the appropriate CPP conditionals to make it a stub
function when the ring buffer is not configured in.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
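The stub-under-CPP-conditional pattern looks roughly like this. MODEL_RING_BUFFER is a hypothetical stand-in for the ring buffer's config symbol, and the model_* functions are made up for the sketch; with the symbol undefined, callers compile against an empty inline function.

```c
#include <stdbool.h>

/* MODEL_RING_BUFFER is a hypothetical config symbol. Leave it undefined
 * to get the stub variant below. */
#ifdef MODEL_RING_BUFFER
bool model_recording = true;

/* Real version: stop recording to the ring buffers. */
void model_tracing_off(void) { model_recording = false; }
bool model_tracing_is_stub(void) { return false; }
#else
/* Stub version: compiles away, so core code can call it unconditionally. */
static inline void model_tracing_off(void) { }
static inline bool model_tracing_is_stub(void) { return true; }
#endif

/* A core-kernel style caller, e.g. a panic path, needs no #ifdef. */
void model_panic(void)
{
    model_tracing_off();
    /* ... dump the trace buffers here ... */
}
```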
Steven Rostedt [Thu, 5 Feb 2009 23:43:07 +0000 (18:43 -0500)]
ring-buffer: add NMI protection for spinlocks
Impact: prevent deadlock in NMI
The ring buffers are not yet totally lockless with writing to
the buffer. When a writer crosses a page, it grabs a per cpu spinlock
to protect against a reader. The spinlocks taken by a writer are not
to protect against other writers, since a writer can only write to
its own per cpu buffer. The spinlocks protect against readers that
can touch any cpu buffer. The writers are made to be reentrant
with the spinlocks disabling interrupts.
The problem arises when an NMI writes to the buffer, and that write
crosses a page boundary. If it grabs a spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant, which protects against this issue. But if a
user of the ring buffer becomes reentrant (which the ring buffers
do allow) and the NMI also writes to the ring buffer, then
we risk a deadlock.
This patch moves the ftrace_nmi_enter called by nmi_enter() to the
ring buffer code. It replaces the current ftrace_nmi_enter that is
used by arch specific code to arch_ftrace_nmi_enter and updates
the Kconfig to handle it.
When an NMI is called, it will set a per cpu variable in the ring buffer
code and will clear it when the NMI exits. If a write to the ring buffer
crosses page boundaries inside an NMI, a trylock is used on the spin
lock instead. If the spinlock fails to be acquired, then the entry
is discarded.
This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
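The trylock-in-NMI idea can be sketched with a C11 atomic flag standing in for the per cpu spinlock. All model_* names are hypothetical, and a single global flag replaces the per cpu NMI variable to keep the sketch short.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag model_reader_lock = ATOMIC_FLAG_INIT;

/* Set on NMI entry and cleared on exit (per cpu in the real code). */
bool model_in_nmi;

/* Number of entries dropped because the lock was busy inside an NMI. */
unsigned long model_discarded;

bool model_trylock(void)
{
    return !atomic_flag_test_and_set(&model_reader_lock);
}

void model_unlock(void) { atomic_flag_clear(&model_reader_lock); }

/* Called when a write crosses a page boundary.
 * Returns false if the entry had to be discarded. */
bool model_cross_page(void)
{
    if (model_in_nmi) {
        /* Spinning here could deadlock against the context the NMI
         * interrupted, so try once and give up on failure. */
        if (!model_trylock()) {
            model_discarded++;
            return false;
        }
    } else {
        while (!model_trylock())
            ;   /* normal context: spin until the lock is free */
    }

    /* ... move the writer to the next page ... */
    model_unlock();
    return true;
}
```

Dropping an entry is the price of never blocking in NMI context.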
Steven Rostedt [Sun, 8 Feb 2009 00:38:43 +0000 (19:38 -0500)]
trace: remove deprecated entry->cpu
Impact: fix to prevent developers from using entry->cpu
With the new ring buffer infrastructure, the cpu for the entry is
implicit with which CPU buffer it is on.
The original code used to record the current cpu into the generic
entry header, which can be retrieved by entry->cpu. When the
ring buffer was introduced, the users were converted to use the
cpu number of whichever cpu ring buffer was in use (this was passed
to the tracers by the iterator: iter->cpu).
Unfortunately, the cpu item in the entry structure was never removed.
This allowed for developers to use it instead of the proper iter->cpu,
unknowingly, using an uninitialized variable. This was not the fault
of the developers, since it would seem like the logical place to
retrieve the cpu identifier.
This patch removes the cpu item from the entry structure and fixes
all the users that should have been using iter->cpu.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Linus Torvalds [Sat, 7 Feb 2009 18:46:30 +0000 (10:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI PM: make the PM core more careful with drivers using the new PM framework
PCI PM: Read power state from device after trying to change it on resume
PCI PM: Do not disable and enable bridges during suspend-resume
PCI: PCIe portdrv: Simplify suspend and resume
PCI PM: Fix saving of device state in pci_legacy_suspend
PCI PM: Check if the state has been saved before trying to restore it
PCI PM: Fix handling of devices without drivers
PCI: return error on failure to read PCI ROMs
PCI: properly clean up ASPM link state on device remove
Rusty Russell [Sat, 7 Feb 2009 07:45:56 +0000 (18:15 +1030)]
module: remove over-zealous check in __module_get()
Impact: fix spurious BUG_ON() triggered under load
module_refcount() isn't reliable outside stop_machine(), as demonstrated
by Karsten Keil <kkeil@suse.de>: networking can trigger it under load
(an inc on one cpu and dec on another while module_refcount() is tallying
can give false results, for example).
Almost no one should be using __module_get, but that's another issue.
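The false tally can be reproduced deterministically in a small model. This is a sketch with hypothetical names: the per-cpu counters are a plain array, and a callback invoked between reads stands in for the concurrent get and put.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Per-cpu reference counters; the true refcount is their sum. */
int model_ref[MODEL_NR_CPUS];

/* Sums the per-cpu counters. The optional `between` hook runs after each
 * cpu is read, modelling activity that races with the tally. */
int model_refcount(void (*between)(int cpu))
{
    int total = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        total += model_ref[cpu];
        if (between)
            between(cpu);
    }
    return total;
}

/* After cpu 0 has been read: a get on cpu 0 and a put on cpu 1.
 * The true total does not change. */
void model_get_then_put(int cpu)
{
    if (cpu == 0) {
        model_ref[0]++;
        model_ref[1]--;
    }
}
```

Starting from one reference held on cpu 1, the racing tally reads 0 from cpu 0, misses the increment that lands there afterwards, then reads 0 from cpu 1 and reports zero references although one is held throughout.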
Linus Torvalds [Sat, 7 Feb 2009 16:30:20 +0000 (08:30 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (30 commits)
ACPI: Kconfig text - Fix the ACPI_CONTAINER module name according to the real module name.
eeepc-laptop: fix oops when changing backlight brightness during eeepc-laptop init
ACPICA: Fix table entry truncation calculation
ACPI: Enable bit 11 in _PDC to advertise hw coord
ACPI: struct device - replace bus_id with dev_name(), dev_set_name()
ACPI: add missing KERN_* constants to printks
ACPI: dock: Don't eval _STA on every show_docked sysfs read
ACPI: disable ACPI cleanly when bad RSDP found
ACPI: delete CPU_IDLE=n code
ACPI: cpufreq: Remove deprecated /proc/acpi/processor/../performance proc entries
ACPI: make some IO ports off-limits to AML
ACPICA: add debug dump of BIOS _OSI strings
ACPI: proc_dir_entry 'video/VGA' already registered
ACPI: Skip the first two elements in the _BCL package
ACPI: remove BM_RLD access from idle entry path
ACPI: remove locking from PM1x_STS register reads
eeepc-laptop: use netlink interface
eeepc-laptop: Implement rfkill hotplugging in eeepc-laptop
eeepc-laptop: Check return values from rfkill_register
eeepc-laptop: Add support for extended hotkeys
...
Darren Salt [Sat, 7 Feb 2009 06:02:07 +0000 (01:02 -0500)]
eeepc-laptop: fix oops when changing backlight brightness during eeepc-laptop init
I got the following oops while changing the backlight brightness during
startup. When it happens, it prevents use of the hotkeys, Fn-Fx, and the
lid button.
It's a clear use-before-init, as I verified by testing with an
appropriately-placed "else printk".
Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Myron Stowe [Fri, 30 Jan 2009 22:44:53 +0000 (15:44 -0700)]
ACPICA: Fix table entry truncation calculation
During early boot, ACPI RSDT/XSDT table entries are gathered into the
'initial_tables[]' array. This array is currently statically defined (see
./drivers/acpi/tables.c). When there are more table entries than can be
held in the 'initial_tables[]' array, the message "Truncating N table
entries!" is output. As currently implemented, this message will always
erroneously calculate N as 0.
This patch fixes the calculation that determines how many table entries
will be missing (truncated).
This modification may be used under either the GPL or the BSD-style
license used for Intel ACPI CA code.
Signed-off-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
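One plausible way such a message always ends up reporting 0 is sketched below. This is an assumption for illustration only: the expressions in the actual patch are not shown here, and every model_* name is hypothetical. The idea is that once a fixed-size array is full, capacity minus stored count is zero by construction, whereas the number of missing entries is the requested count minus the stored count.

```c
#define MODEL_CAPACITY 4   /* hypothetical size of the static array */

struct model_table_list {
    unsigned int size;    /* capacity of the array */
    unsigned int count;   /* entries actually stored */
};

/* Stores as many of the table_count entries as fit. */
void model_gather(struct model_table_list *list, unsigned int table_count)
{
    list->size = MODEL_CAPACITY;
    list->count = 0;
    for (unsigned int i = 0; i < table_count && list->count < list->size; i++)
        list->count++;
}

/* Assumed buggy form: when the array is full, size == count, so this is 0. */
unsigned int model_truncated_wrong(const struct model_table_list *list)
{
    return list->size - list->count;
}

/* Entries that did not fit: what was requested minus what was stored. */
unsigned int model_truncated(unsigned int table_count,
                             const struct model_table_list *list)
{
    return table_count - list->count;
}
```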
Bit 11 in Intel PDC definitions is meant for OS capability to handle
hardware coordination of P-states. In Linux we have always supported
hardware coordination of P-states. Just let the BIOSes know that we
support it, by setting this bit.
Some BIOSes use this bit to choose between hardware or software coordination
and without this change below, BIOSes switch to software coordination, which
is not very optimal in terms of power consumption and extra wakeups from idle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Frank Seidel [Wed, 4 Feb 2009 16:03:07 +0000 (17:03 +0100)]
ACPI: add missing KERN_* constants to printks
According to the kerneljanitors todo list, all printk calls (beginning
a new line) should have a corresponding KERN_* constant.
Those are the missing pieces here for the acpi subsystem.
Signed-off-by: Frank Seidel <frank@f-seidel.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Holger Macht [Tue, 20 Jan 2009 11:18:24 +0000 (12:18 +0100)]
ACPI: dock: Don't eval _STA on every show_docked sysfs read
Some devices trigger a DEVICE_CHECK on every evaluation of _STA. This
can also be seen in commit 8b59560a3baf2e7c24e0fb92ea5d09eca92805db
(ACPI: dock: avoid check _STA method). If an undock is processed, the
dock driver sends a uevent and userspace might read the show_docked
property in sysfs. This causes an evaluation of _STA of the particular
device which causes the dock driver to immediately dock again.
In any case, evaluation of _STA (show_docked) does not necessarily mean
that we are docked, so check with the internal device structure.
http://bugzilla.kernel.org/show_bug.cgi?id=12360
Signed-off-by: Holger Macht <hmacht@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>