Steven Rostedt [Wed, 11 Mar 2009 23:52:30 +0000 (19:52 -0400)]
tracing: fix trace_wait to know to wait on all cpus or just one
Impact: fix to task live locking on reading trace_pipe on one CPU
The same code is used for both trace_pipe (all CPUS) and the per_cpu
trace_pipe file. When there is no data to read, it will check for
signals and wait on the trace wait queue.
The problem happens with the per_cpu wait. The trace_wait code checks
all CPUs. Thus, if there's data in another CPU buffer, then it will
exit the wait, without checking for signals or waiting on the wait queue.
It would then try to read the empty buffer, and since that will just
return nothing, then it will try to wait again. Unfortunately, that will
again fail due to there still being data in the other buffers. This
ends up with a live lock for the task.
This patch fixes the trace_wait to be aware that the iterator may only
be waiting on a single buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 11 Mar 2009 18:33:00 +0000 (14:33 -0400)]
tracing: expand the ring buffers when an event is activated
To save memory, the tracer ring buffers are set to a minimum.
The activating of a trace expands the ring buffer size. This patch
adds this expanding, when an event is activated.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 11 Mar 2009 17:42:01 +0000 (13:42 -0400)]
tracing: keep ring buffer to minimum size till used
Impact: less memory impact on systems not using tracer
When the kernel boots up that has tracing configured, it allocates
the default size of the ring buffer. This currently happens to be
1.4Megs per possible CPU. This is quite a bit of wasted memory if
the system is never using the tracer.
The current solution is to keep the ring buffers to a minimum size
until the user uses them. Once a tracer is piped into the current_tracer
the ring buffer will be expanded to the default size. If the user
changes the size of the ring buffer, it will take the size given
by the user immediately.
If the user adds a "ftrace=" to the kernel command line, then the ring
buffers will be set to the default size on initialization.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 10 Mar 2009 16:41:38 +0000 (12:41 -0400)]
tracing: flip the TP_printk and TP_fast_assign in the TRACE_EVENT macro
Impact: clean up
In trying to stay consistant with the C style format in the TRACE_EVENT
macro, it makes more sense to do the printk after the assigning of
the variables.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 10 Mar 2009 16:04:02 +0000 (12:04 -0400)]
tracing: add back the available_events file
The event directory files type and available_types were no longer
needed with the new TRACE_EVENT_FORMAT macros, they were deleted.
But by accident the available_events file was also removed.
This patch brings it back.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 10 Mar 2009 15:32:40 +0000 (11:32 -0400)]
tracing: do not allow modifying the ftrace events via the event files
Impact: fix to prevent crash on calling NULL function pointer
The ftrace internal records have their format exported via the event
system under the ftrace subsystem. These are only for exporting the
format to allow binary readers to be able to parse them in a binary
output.
The ftrace subsystem events can only be enabled via the ftrace tracers
and do not have a registering function. The event files expect the
event record to have registering function and will call it directly.
Passing in a ftrace subsystem event will cause the kernel to crash
because it will execute a NULL pointer.
This patch prevents the ftrace subsystem from being viewable to the
event enabling files.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 10 Mar 2009 04:15:34 +0000 (00:15 -0400)]
tracing: remove obsolete TRACE_EVENT_FORMAT macro
Impact: clean up
The TRACE_EVENT_FORMAT macro is no longer used by trace points
and only the DECLARE_TRACE, TRACE_FORMAT or TRACE_EVENT macros should
be used by them. Although the TRACE_EVENT_FORMAT macro is still used
by the internal tracing utility, it should not be used in core
kernel code.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 10 Mar 2009 03:23:30 +0000 (23:23 -0400)]
tracing: convert irq trace points to new macros
Impact: enhancement
Converted the two irq trace point macros. The entry macro copies
the name of the irq handler, thus it is better to simply use the
TRACE_FORMAT macro which uses the trace_printk.
The return of the handler does not need to record the name, thus
the faster C style handler is more approriate.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Mon, 9 Mar 2009 21:14:30 +0000 (17:14 -0400)]
tracing: new format for specialized trace points
Impact: clean up and enhancement
The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
ability to save data as well as to print the record out. Working with
Ingo Molnar, we came up with a new format that is much more pleasing to
the eye of C developers. This new macro is more C style than the old
macro, and is more obvious to what it does.
Here's the example. The only updated macro in this patch is the
sched_switch trace point.
The above method is hard to read and requires two format fields.
The new method:
/*
* Tracepoint for task switches, performed by the scheduler:
*
* (NOTE: the 'rq' argument is not used by generic trace events,
* but used by the latency tracer plugin. )
*/
TRACE_EVENT(sched_switch,
This macro is called TRACE_EVENT, it is broken up into 5 parts:
TP_PROTO: the proto type of the trace point
TP_ARGS: the arguments of the trace point
TP_STRUCT_entry: the structure layout of the entry in the ring buffer
TP_printk: the printk format
TP_fast_assign: the method used to write the entry into the ring buffer
The structure is the definition of how the event will be saved in the
ring buffer. The printk is used by the internal tracing in case of
an oops, and the kernel needs to print out the format of the record
to the console. This the TP_printk gives a means to show the records
in a human readable format. It is also used to print out the data
from the trace file.
The TP_fast_assign is executed directly. It is basically like a C function,
where the __entry is the handle to the record.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 6 Mar 2009 15:50:53 +0000 (10:50 -0500)]
tracing: typecast sizeof and offsetof to unsigned int
Impact: fix compiler warnings
On x86_64 sizeof and offsetof are treated as long, where as on x86_32
they are int. This patch typecasts them to unsigned int to avoid
one arch giving warnings while the other does not.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()
Impact: faster and lighter tracing
Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.
Some changes result of this:
- Previously, trace_bprintk depended of a single tracer and couldn't
work without. This tracer has been dropped and the whole implementation
of trace_printk() (like the module formats management) is now integrated
in the tracing core (comes with CONFIG_TRACING), though we keep the file
trace_printk (previously trace_bprintk.c) where we can find the module
management. Thus we don't overflow trace.c
- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
constants, and fix 'const' qualifiers warnings. But this is all transparent for
developers.
- etc...
V2:
- Rebase against last changes
- Fix mispell on the changelog
V3:
- Rebase against last changes (moving trace_printk() to kernel.h)
Lai Jiangshan [Fri, 6 Mar 2009 16:21:48 +0000 (17:21 +0100)]
tracing: add trace_bprintk()
Impact: add a generic printk() for tracing, like trace_printk()
trace_bprintk() uses the infrastructure to record events on ring_buffer.
[ fweisbec@gmail.com: ported to latest -tip, made it work if
!CONFIG_MODULES, never free the format strings from modules
because we can't keep track of them and conditionnaly create
the ftrace format strings section (reported by Steven Rostedt) ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Fri, 6 Mar 2009 16:21:47 +0000 (17:21 +0100)]
tracing: infrastructure for supporting binary record
Impact: save on memory for tracing
Current tracers are typically using a struct(like struct ftrace_entry,
struct ctx_switch_entry, struct special_entr etc...)to record a binary
event. These structs can only record a their own kind of events.
A new kind of tracer need a new struct and a lot of code too handle it.
So we need a generic binary record for events. This infrastructure
is for this purpose.
[fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
tracing as reported by Steven Rostedt]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
vsprintf: unify the format decoding layer for its 3 users
An new optimization is making its way to ftrace. Its purpose is to
make trace_printk() consuming less memory and be faster.
Written by Lai Jiangshan, the approach is to delay the formatting
job from tracing time to output time.
Currently, a call to trace_printk() will format the whole string and
insert it into the ring buffer. Then you can read it on /debug/tracing/trace
file.
The new implementation stores the address of the format string and
the binary parameters into the ring buffer, making the packet more compact
and faster to insert.
Later, when the user exports the traces, the format string is retrieved
with the binary parameters and the formatting job is eventually done.
The new implementation rewrites a lot of format decoding bits from
vsnprintf() function, making now 3 differents functions to maintain
in their duplicated parts of printf format decoding bits.
Suggested by Ingo Molnar, this patch tries to factorize the most
possible common bits from these functions.
The real common part between them is the format decoding. Although
they do somewhat similar jobs, their way to export or import the parameters
is very different. Thus, only the decoding layer is extracted, unless you see
other parts that could be worth factorized.
Changes in V2:
- Address a suggestion from Linus to group the format_decode() parameters inside
a structure.
Changes in v3:
- Address other cleanups suggested by Ingo and Linus such as passing the
printf_spec struct to the format helpers: pointer()/number()/string()
Note that this struct is passed by copy and not by address. This is to
avoid side effects because these functions often change these values and the
changes shoudn't be persistant when a callee helper returns.
It would be too risky.
- Various cleanups (code alignement, switch/case instead of if/else fountains).
- Fix a bug that printed the first format specifier following a %p
Changes in v4:
- drop unapropriate const qualifier loss while casting fmt to a char *
(thanks to Vegard Nossum for having pointed this out).
Steven Rostedt [Fri, 6 Mar 2009 02:35:29 +0000 (21:35 -0500)]
tracing: add format files for ftrace default entries
Impact: allow user apps to read binary format of basic ftrace entries
Currently, only defined raw events export their formats so a binary
reader can parse them. There's no reason that the default ftrace entries
can't export their formats.
This patch adds a subsystem called "ftrace" in the events directory
that includes the ftrace entries for basic ftrace recorded items.
These only have three files in the events directory:
type : printf
available_types : printf
format : format for the event entry
Steven Rostedt [Thu, 5 Mar 2009 16:45:43 +0000 (11:45 -0500)]
tracing: move print of event format to separate file
Impact: clean up
Move the macro that creates the event format file to a separate header.
This will allow the default ftrace events to use this same macro
to create the formats to read those events.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 5 Mar 2009 15:35:56 +0000 (10:35 -0500)]
tracing: add tracing_on/tracing_off to kernel.h
Impact: cleanup
The functions tracing_start/tracing_stop have been moved to kernel.h.
These are not the functions a developer most likely wants to use
when they want to insert a place to stop tracing and restart it from
user space.
tracing_start/tracing_stop was created to work with things like
suspend to ram, where even calling smp_processor_id() can crash the
system. The tracing_start/tracing_stop was used to stop the tracer from
doing anything. These are still light weight functions, but add a bit
more overhead to be able to stop the tracers. They also have no interface
back to userland. That is, if the kernel calls tracing_stop, userland
can not start tracing.
What a developer most likely wants to use is tracing_on/tracing_off.
These are very light weight functions (simply sets or clears a bit).
These functions just stop recording into the ring buffer. The tracers
don't even know that this happens except that they would receive NULL
from the ring_buffer_lock_reserve function.
Also, there's a way for the user land to enable or disable this bit.
In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
or echo "1" (same as tracing_on()) into this file. This becomes handy when
a kernel developer is debugging and wants tracing to turn off when it
hits an anomaly. Then the developer can examine the trace, and restart
tracing if they want to try again (echo 1 > tracing_on).
This patch moves the prototypes for tracing_on/tracing_off to kernel.h
and comments their use, so that a kernel developer will know how
to use them.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
tracing/function-graph-tracer: use the more lightweight local clock
Impact: decrease hangs risks with the graph tracer on slow systems
Since the function graph tracer can spend too much time on timer
interrupts, it's better now to use the more lightweight local
clock. Anyway, the function graph traces are more reliable on a
per cpu trace.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 5 Mar 2009 09:24:48 +0000 (10:24 +0100)]
tracing: rename ftrace_printk() => trace_printk()
Impact: cleanup
Use a more generic name - this also allows the prototype to move
to kernel.h and be generally available to kernel developers who
want to do some quick tracing.
Steven Rostedt [Thu, 5 Mar 2009 03:15:30 +0000 (22:15 -0500)]
tracing: have latency tracers set the latency format
The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup)
are pretty useless with the default output format. This patch makes them
automatically enable the latency format when they are selected. They
also record the state of the latency option, and if it was not enabled
when selected, they disable it on reset.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 5 Mar 2009 02:57:29 +0000 (21:57 -0500)]
tracing: consolidate print_lat_fmt and print_trace_fmt
Impact: clean up
Both print_lat_fmt and print_trace_fmt do pretty much the same thing
except for one different function call. This patch consolidates the
two functions and adds an if statement to perform the difference.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 5 Mar 2009 02:42:04 +0000 (21:42 -0500)]
tracing: remove extra latency_trace method from trace structure
Impact: clean up
The trace and latency_trace function pointers are identical for
every tracer but the function tracer. The differences in the function
tracer are trivial (latency output puts paranthesis around parent).
This patch removes the latency_trace pointer and all prints will
now just use the trace output function pointer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 5 Mar 2009 01:34:24 +0000 (20:34 -0500)]
tracing: add latency output format option
With the removal of the latency_trace file, we lost the ability
to see some of the finer details in a trace. Like the state of
interrupts enabled, the preempt count, need resched, and if we
are in an interrupt handler, softirq handler or not.
This patch simply creates an option to bring back the old format.
This also removes the warning about an unused variable that held
the latency_trace file operations.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 5 Mar 2009 00:10:05 +0000 (19:10 -0500)]
tracing: do not return EFAULT if read copied anything
Impact: fix trace read to conform to standards
Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention
that a userspace read should not return -EFAULT if it succeeded in
copying anything. It should only return -EFAULT if it failed to copy
at all.
This patch modifies the check of copy_from_user and updates the return
code appropriately.
I also used H. Peter Anvin's short cut rule to just test ret == count.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 4 Mar 2009 04:52:42 +0000 (23:52 -0500)]
ring-buffer: fix timestamp in partial ring_buffer_page_read
If a partial ring_buffer_page_read happens, then some of the
incremental timestamps may be lost. This patch writes the
recent timestamp into the page that is passed back to the caller.
A partial ring_buffer_page_read is where the full page would not
be written back to the user, and instead, just part of the page
is copied to the user. A full page would be a page swap with the
ring buffer and the timestamps would be correct.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
added a new field to the iterator called cpu_file. This was a handle
to differentiate between the per cpu trace output files and the
all cpu "trace" file. The all cpu "trace" file required setting this
to TRACE_PIPE_ALL_CPU.
The problem is that the ftrace_dump sets up its own iterator but was
not updated to handle this change. The result was only CPU 0 printing
out on crash and a lot of "<0>"'s also being printed.
Reported-by: Thomas Gleixner <tglx@linuxtronix.de> Tested-by: Darren Hart <dvhtc@us.ibm.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 2 Dec 2008 03:20:19 +0000 (22:20 -0500)]
tracing: add binary buffer files for use with splice
Impact: new feature
This patch creates a directory of files that correspond to the
per CPU ring buffers. These are binary files and are made to
be used with splice. This is the fastest way to extract data from
the ftrace ring buffers.
Thanks to Jiaying Zhang for pushing me to get this code fixed,
and to Eduard - Gabriel Munteanu for his splice code that helped
me debug my code.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 4 Mar 2009 00:51:40 +0000 (19:51 -0500)]
ring-buffer: make ring_buffer_read_page read from start on partial page
Impact: dont leave holes in read buffer page
The ring_buffer_read_page swaps a given page with the reader page
of the ring buffer, if certain conditions are set:
1) requested length is big enough to hold entire page data
2) a writer is not currently on the page
3) the page is not partially consumed.
Instead of swapping with the supplied page. It copies the data to
the supplied page instead. But currently the data is copied in the
same offset as the source page. This causes a hole at the start
of the reader page. This complicates the use of this function.
Instead, it should copy the data at the beginning of the function
and update the index fields accordingly.
Other small clean ups are also done in this patch.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 3 Mar 2009 18:53:07 +0000 (13:53 -0500)]
ring-buffer: replace sizeof of event header with offsetof
Impact: fix to possible alignment problems on some archs.
Some arch compilers include an NULL char array in the sizeof field.
Since the ring_buffer_event type includes one of these, it is better
to use the "offsetof" instead, to avoid strange bugs on these archs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 3 Mar 2009 05:27:49 +0000 (00:27 -0500)]
ring-buffer: fix ring_buffer_read_page
The ring_buffer_read_page was broken if it were to only copy part
of the page. This patch fixes that up as well as adds a parameter
to allow a length field, in order to only copy part of the buffer page.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Linus Torvalds [Tue, 3 Mar 2009 22:33:20 +0000 (14:33 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: don't allow setuid to succeed if the user does not have rt bandwidth
sched_rt: don't start timer when rt bandwidth disabled
Linus Torvalds [Tue, 3 Mar 2009 22:32:55 +0000 (14:32 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: oprofile: don't set counter width from cpuid on Core2
x86: fix init_memory_mapping() to handle small ranges
Linus Torvalds [Tue, 3 Mar 2009 22:32:37 +0000 (14:32 -0800)]
Merge branch 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86 mmiotrace: fix race with release_kmmio_fault_page()
x86 mmiotrace: improve handling of secondary faults
x86 mmiotrace: split set_page_presence()
x86 mmiotrace: fix save/restore page table state
x86 mmiotrace: WARN_ONCE if dis/arming a page fails
x86: add far read test to testmmiotrace
x86: count errors in testmmiotrace.ko
Linus Torvalds [Tue, 3 Mar 2009 22:32:04 +0000 (14:32 -0800)]
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Teach RCU that idle task is not quiscent state at boot
Russell King [Tue, 3 Mar 2009 13:43:47 +0000 (13:43 +0000)]
[ARM] fix lots of ARM __devexit sillyness
`iop_adma_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv_xor_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv64xxx_i2c_unmap_regs' referenced in section `.devinit.text' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv64xxx_i2c_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`orion_nand_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`pxafb_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Catalin Marinas [Tue, 3 Mar 2009 10:44:12 +0000 (11:44 +0100)]
[ARM] 5417/1: Set the correct cacheid for ARMv6 CPUs with ARMv7 style MMU
The cacheid_init() function assumes that if cpu_architecture() returns
7, the caches are VIPT_NONALIASING. The cpu_architecture() function
returns the version of the supported MMU features (e.g. TEX remapping)
but it doesn't make any assumptions about the cache type. The patch adds
the checking of the Cache Type Register for the ARMv7 format.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Seth Forshee [Mon, 2 Mar 2009 21:39:36 +0000 (22:39 +0100)]
[ARM] 5416/1: Use unused address in v6_early_abort
The target of the strex instruction to clear the exlusive monitor
is currently the top of the stack. If the store succeeeds this
corrupts r0 in pt_regs. Use the next stack location instead of
the current one to prevent any chance of corrupting an in-use
address.
Signed-off-by: Seth Forshee <seth.forshee@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Yinghai Lu [Tue, 3 Mar 2009 07:36:13 +0000 (23:36 -0800)]
x86: fix init_memory_mapping() to handle small ranges
Impact: fix failed EFI bootup in certain circumstances
Ying Huang found init_memory_mapping() has problem with small ranges
less than 2M when he tried to direct map the EFI runtime code out of
max_low_pfn_mapped.
It turns out we never considered that case and didn't check the range...
Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Brian Maly <bmaly@redhat.com>
LKML-Reference: <49ACDDED.1060508@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Tue, 3 Mar 2009 00:23:33 +0000 (16:23 -0800)]
Revert "menu: fix embedded menu snafu"
This reverts commit 155b25bcc28631a5b5230191aa3f56c40dfffa3f, which was
totally wrong - the "embedded" options still exists (very much so) even
on non-embedded platforms.
It's just that we don't bother with actually asking about them when
we're not embedded, we just take their default values (which is usually
'y' - the options add features that may not be worth it in a constrained
environment).
Noticed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 3 Mar 2009 00:11:36 +0000 (16:11 -0800)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: Fix use-before-null-check in i915_irq_emit().
drm: Avoid client deadlocks when the master disappears.
drm: Wake up all lock waiters when the master disappears.
drm: Don't return ERESTARTSYS to user-space.
drm: Avoid client deadlocks when the master disappears.
This is done by
1) Wake up lock waiters when we close the master file descriptor.
Not when the master structure is removed, since the latter
requires the waiters themselves to release the refcount on the
master structure -> Deadlock.
2) Send a SIGTERM to all clients waiting for the lock.
Normally these clients will get a SIGPIPE when the X server dies,
but clients may also spin trying to grab the DRM lock, without
getting any sort of notification.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@linux.ie>
Randy Dunlap [Mon, 2 Mar 2009 22:14:06 +0000 (14:14 -0800)]
menu: fix embedded menu snafu
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED
menu. Fix by moving it to just after all of the EMBEDDED menu symbols.
Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so
that this EMBEDDED block is clearer.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 2 Mar 2009 23:48:00 +0000 (15:48 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci: Add NO_BUSY_IRQ quirk for Marvell CAFE host chip
sdhci: Add quirk for controllers with no end-of-busy IRQ
Linus Torvalds [Mon, 2 Mar 2009 23:47:19 +0000 (15:47 -0800)]
Merge branch 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Add probe_mask default for Toshiba laptop with ALC268
ALSA: hda - Add quirk for new HP xw series
ALSA: hda - Fix digital mic on dell-m4-1 and dell-m4-3
Linus Torvalds [Mon, 2 Mar 2009 23:47:01 +0000 (15:47 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
fix warning in io_mapping_map_wc()
x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
zaurus: add usb id for motomagx phones
usbnet: make usbnet_get_link() fall back to ethtool_op_get_link()
veth: Fix carrier detect
cdc_ether: add usb id for Ericsson F3507g
r8169: read MAC address from EEPROM on init (2nd attempt)
tcp: fix retrans_out leaks
net headers: export dcbnl.h
net headers: cleanup dcbnl.h
netpoll: Add drop checks to all entry points
gianfar: Do right check on num_txbdfree
pkt_sched: sch_drr: Fix oops in drr_change_class.
b44: Disable device on shutdown
b44: Unconditionally enable interrupt routing on reset
net: fix hp-plus build error
libertas: fix misuse of netdev_priv() and dev->ml_priv
ipv6: don't use tw net when accounting for recycled tw
asix: new device ids
tcp_scalable: Update malformed & dead url
netfilter: xt_recent: fix proc-file addition/removal of IPv4 addresses
netxen: handle pci bar 0 mapping failure
...
Linus Torvalds [Mon, 2 Mar 2009 23:43:03 +0000 (15:43 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - touchpad driver miss-recognising logitech mice
Input: synaptics - ensure we reset the device on resume
Input: usbtouchscreen - fix eGalax HID ignoring
Input: ambakmi - fix timeout handling in amba_kmi_write()
Input: pxa930_trkball - fix write timeout handling
Input: struct device - replace bus_id with dev_name(), dev_set_name()
Input: bf54x-keys - fix debounce time validation
Input: spitzkbd - mark probe function as __devinit
Input: omap-keypad - mark probe function as __devinit
Input: corgi_ts - mark probe function as __devinit
Input: corgikbd - mark probe function as __devinit
Input: uvc - the button on the camera is KEY_CAMERA
Input: psmouse - make MOUSE_PS2_LIFEBOOK depend on X86
Input: atkbd - make forced_release_keys[] static
Input: usbtouchscreen - allow reporting calibrated data
Linus Torvalds [Mon, 2 Mar 2009 23:42:26 +0000 (15:42 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: don't call jbd2_journal_force_commit_nested without journal
ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2
ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze()
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] mpt: fix disable lsi sas to use msi as default
[SCSI] fix ABORTED_COMMAND looping forever problem
[SCSI] sd: revive sd_index_lock
[SCSI] cxgb3i: update the driver version to 1.0.1
[SCSI] cxgb3i: Fix spelling errors in documentation
[SCSI] cxgb3i: added missing include in cxgb3i_ddp.h
[SCSI] cxgb3i: Outgoing pdus need to observe skb's MAX_SKB_FRAGS
[SCSI] cxgb3i: added per-task data to track transmit progress
[SCSI] cxgb3i: transmit work-request fixes
[SCSI] hptiop: Add new PCI device ID
Roland McGrath [Sat, 28 Feb 2009 07:25:54 +0000 (23:25 -0800)]
x86-64: seccomp: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table. The fix is simple: test TS_COMPAT
instead of TIF_IA32. Here is an example exploit:
/* test case for seccomp circumvention on x86-64
There are two failure modes: compile with -m64 or compile with -m32.
The -m64 case is the worst one, because it does "chmod 777 ." (could
be any chmod call). The -m32 case demonstrates it was able to do
stat(), which can glean information but not harm anything directly.
A buggy kernel will let the test do something, print, and exit 1; a
fixed kernel will make it exit with SIGKILL before it does anything.
*/
Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Sat, 28 Feb 2009 03:03:24 +0000 (19:03 -0800)]
x86-64: syscall-audit: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases, audit_syscall_entry() will use the wrong system
call number table and the wrong system call argument registers. This
could be used to circumvent a syscall audit configuration that filters
based on the syscall numbers or argument details.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current definition of CALLER_ADDRx isn't suitable for all platforms.
E.g. for ARM __builtin_return_address(N) doesn't work for N > 0 and
AFAIK for powerpc there are no frame pointers needed to have a working
__builtin_return_address. This patch allows defining the CALLER_ADDRx
macros in <asm/ftrace.h> and let these take precedence.
Because now <asm/ftrace.h> is included unconditionally in
<linux/ftrace.h> all archs that don't already had this include get an
empty one for free.
Signed-off-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Andres Salomon [Mon, 2 Mar 2009 20:48:20 +0000 (21:48 +0100)]
sdhci: Add NO_BUSY_IRQ quirk for Marvell CAFE host chip
As described here: http://lkml.org/lkml/2009/2/20/265
The CAFE chip is broken due to commit e809517f6fa5803a5a1cd5602.
Anton added a quirk here: http://lkml.org/lkml/2009/2/20/279 that fixes
CAFE's problem. This adds the quirk for CAFE.
Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>