Rusty Russell [Tue, 17 Mar 2009 21:52:30 +0000 (08:22 +1030)]
x86, uv: fix cpumask iterator in uv_bau_init()
Impact: fix boot crash on UV systems
Commit 76ba0ecda0de9accea9a91cb6dbde46782110e1c "cpumask: use
cpumask_var_t in uv_flush_tlb_others" used cur_cpu as an iterator;
it was supposed to be zero for the code below it.
Suresh Siddha [Tue, 17 Mar 2009 18:16:54 +0000 (10:16 -0800)]
x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths
Impact: optimize APIC IPI related barriers
Uncached MMIO accesses for xapic are inherently serializing and hence
we don't need explicit barriers for xapic IPI paths.
x2apic MSR writes/reads don't have serializing semantics and hence need
a serializing instruction or mfence, to make all the previous memory
stores globally visisble before the x2apic msr write for IPI.
Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "steiner@sgi.com" <steiner@sgi.com> Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha [Tue, 17 Mar 2009 00:05:05 +0000 (17:05 -0700)]
x86, dmar: use atomic allocations for QI and Intr-remapping init
Impact: invalid use of GFP_KERNEL in interrupt context
Queued invalidation and interrupt-remapping will get initialized with
interrupts disabled (while enabling interrupt-remapping). So use
GFP_ATOMIC instead of GFP_KERNEL for memory alloacations.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Suresh Siddha [Tue, 17 Mar 2009 00:05:04 +0000 (17:05 -0700)]
x86: fix broken irq migration logic while cleaning up multiple vectors
Impact: fix spurious IRQs
During irq migration, we send a low priority interrupt to the previous
irq destination. This happens in non interrupt-remapping case after interrupt
starts arriving at new destination and in interrupt-remapping case after
modifying and flushing the interrupt-remapping table entry caches.
This low priority irq cleanup handler can cleanup multiple vectors, as
multiple irq's can be migrated at almost the same time. While
there will be multiple invocations of irq cleanup handler (one cleanup
IPI for each irq migration), first invocation of the cleanup handler
can potentially cleanup more than one vector (as the first invocation can
see the requests for more than vector cleanup). When we cleanup multiple
vectors during the first invocation of the smp_irq_move_cleanup_interrupt(),
other vectors that are to be cleanedup can still be pending in the local
cpu's IRR (as smp_irq_move_cleanup_interrupt() runs with interrupts disabled).
When we are ready to unhook a vector corresponding to an irq, check if that
vector is registered in the local cpu's IRR. If so skip that cleanup and
do a self IPI with the cleanup vector, so that we give a chance to
service the pending vector interrupt and then cleanup that vector
allocation once we execute the lowest priority handler.
This fixes spurious interrupts seen when migrating multiple vectors
at the same time.
[ This is apparently possible even on conventional xapic, although to
the best of our knowledge it has never been seen. The stable
maintainers may wish to consider this one for -stable. ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: stable@kernel.org
Suresh Siddha [Tue, 17 Mar 2009 00:05:03 +0000 (17:05 -0700)]
x86, ioapic: Fix non atomic allocation with interrupts disabled
Impact: fix possible race
save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
called with interrupts disabled. Fix this by splitting this into two different
function. Allocation part save_IO_APIC_setup() now happens before
disabling interrupts.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Suresh Siddha [Tue, 17 Mar 2009 00:05:01 +0000 (17:05 -0700)]
x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping
Impact: simplification
In the current code, for level triggered migration, we need to modify the
io-apic RTE with the update vector information, along with modifying interrupt
remapping table entry(IRTE) with vector and destination. This is to ensure that
remote IRR bit inthe IOAPIC RTE gets cleared when the cpu does EOI.
With this patch, for level triggered, we eliminate the io-apic RTE modification
(with the updated vector information), by using a virtual vector (io-apic pin
number). Real vector that is used for interrupting cpu will be coming from
the interrupt-remapping table entry. Trigger mode in the IRTE will always be
edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
So a level triggered interrupt will appear as an edge to the local apic
cpu but still as level to the IO-APIC.
With this change, level irq migration can be done by simply modifying
the interrupt-remapping table entry with out changing the io-apic RTE.
And as the interrupt appears as edge at the cpu, in addition to do the
local apic EOI, we need to do IO-APIC directed EOI to clear the remote
IRR bit in the IO-APIC RTE.
This simplies the irq migration in the presence of interrupt-remapping.
Idea-by: Rajesh Sankaran <rajesh.sankaran@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Suresh Siddha [Tue, 17 Mar 2009 00:04:59 +0000 (17:04 -0700)]
x86, x2apic: use virtual wire A mode in disable_IO_APIC() with interrupt-remapping
Impact: make kexec work with x2apic
disable_IO_APIC() gets called during crashdump aswell, which configures the
IO-APIC/LAPIC so that legacy interrupts can be delivered for the kexec'd kernel.
In the presence of interrupt-remapping, we need to change the
interrupt-remapping configuration aswell as modifying IO-APIC for virtual wire
B mode.
To keep things simple during the crash, use virtual wire A mode
(for which we don't need to touch io-apic and interrupt-remapping tables).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Suresh Siddha [Tue, 17 Mar 2009 00:04:57 +0000 (17:04 -0700)]
x86, dmar: start with sane state while enabling dma and interrupt-remapping
Impact: cleanup/sanitization
Start from a sane state while enabling dma and interrupt-remapping, by
clearing the previous recorded faults and disabling previously
enabled queued invalidation and interrupt-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Suresh Siddha [Tue, 17 Mar 2009 00:04:53 +0000 (17:04 -0700)]
x86, x2apic: fix lock ordering during IRQ migration
Impact: fix potential deadlock on x2apic
fix "hard-safe -> hard-unsafe lock order detected" with irq_2_ir_lock
On x2apic enabled system:
[ INFO: hard-safe -> hard-unsafe lock order detected ] 2.6.27-03151-g4480f15b #1
------------------------------------------------------
swapper/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(irq_2_ir_lock){--..}, at: [<ffffffff8038ebc0>] get_irte+0x2f/0x95
and this task is already holding:
(&irq_desc_lock_class){+...}, at: [<ffffffff802649ed>] setup_irq+0x67/0x281
which would create a new lock dependency:
(&irq_desc_lock_class){+...} -> (irq_2_ir_lock){--..}
but this new dependency connects a hard-irq-safe lock:
(&irq_desc_lock_class){+...}
... which became hard-irq-safe at:
[<ffffffffffffffff>] 0xffffffffffffffff
H. Peter Anvin [Tue, 17 Mar 2009 22:26:06 +0000 (15:26 -0700)]
x86, setup: move 32-bit code to .text32
Impact: cleanup
The setup code is mostly 16-bit code, but there is a small stub of
32-bit code at the end. Move the 32-bit code to a separate segment,
.text32, to avoid scrambling the disassembly.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
x86/brk: put the brk reservations in their own section
Impact: disambiguate real .bss variables from .brk storage
Add a .brk section after the .bss section. This has no effect
on the final vmlinux, but it more clearly distinguishes the space
taken by actual .bss symbols, and the variable space reserved
by .brk users.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
x86/brk: make the brk reservation symbols inaccessible from C
Impact: bulletproofing, clarification
The brk reservation symbols are just there to document the amount
of space reserved by brk users in the final vmlinux file. Their
addresses are irrelevent, and using their addresses will cause
certain havok. Name them ".brk.NAME", which is a valid asm symbol
but C can't reference it; it also highlights their special
role in the symbol table.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
H. Peter Anvin [Tue, 17 Mar 2009 18:38:23 +0000 (11:38 -0700)]
x86-32: tighten the bound on additional memory to map
Impact: Tighten bound to avoid masking errors
The definition of MAPPING_BEYOND_END was excessive; this has a nasty
tendency to mask bugs. We have learned over time that this kind of
bug hiding can cause some very strange errors. Therefore, tighten the
bound to only need to map the actual kernel area.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org>
ALLOCATOR_SLOP is a vestigial remain from when we used the
bootmem allocator to allocate the kernel's linear memory mapping.
Now we directly reserve pages from the e820 mapping, and no
longer require secondary structures to keep track of allocated
pages.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
x86-32: make sure we map enough to fit linear map pagetables
Impact: crash fix
head_32.S needs to map the kernel itself, and enough space so
that mm/init.c can allocate space from the e820 allocator
for the linear map of low memory.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Linus Torvalds [Tue, 17 Mar 2009 17:02:35 +0000 (10:02 -0700)]
Avoid 64-bit "switch()" statements on 32-bit architectures
Commit ee6f779b9e0851e2f7da292a9f58e0095edf615a ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable. In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.
That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements. To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares. At which point we get
link failures, because we really don't want to support that kind of
crazy code.
Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.
Masami Hiramatsu [Mon, 16 Mar 2009 22:57:22 +0000 (18:57 -0400)]
prevent boosting kprobes on exception address
Don't boost at the addresses which are listed on exception tables,
because major page fault will occur on those addresses. In that case,
kprobes can not ensure that when instruction buffer can be freed since
some processes will sleep on the buffer.
Linus Torvalds [Tue, 17 Mar 2009 15:13:17 +0000 (08:13 -0700)]
Fast TSC calibration: calculate proper frequency error bounds
In order for ntpd to correctly synchronize the clocks, the frequency of
the system clock must not be off by more than 500 ppm (or, put another
way, 1:2000), or ntpd will end up giving up on trying to synchronize
properly, and ends up reseting the clock in jumps instead.
The fast TSC PIT calibration sometimes failed this test - it was
assuming that the PIT reads always took about one microsecond each (2us
for the two reads to get a 16-bit timer), and that calibrating TSC to
the PIT over 15ms should thus be sufficient to get much closer than
500ppm (max 2us error on both sides giving 4us over 15ms: a 270 ppm
error value).
However, that assumption does not always hold: apparently some hardware
is either very much slower at reading the PIT registers, or there was
other noise causing at least one machine to get 700+ ppm errors.
So instead of using a fixed 15ms timing loop, this changes the fast PIT
calibration to read the TSC delta over the individual PIT timer reads,
and use the result to calculate the error bars on the PIT read timing
properly. We then successfully calibrate the TSC only if the maximum
error bars fall below 500ppm.
In the process, we also relax the timing to allow up to 25ms for the
calibration, although it can happen much faster depending on hardware.
Reported-and-tested-by: Jesper Krogh <jesper@krogh.cc> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 17 Mar 2009 14:58:26 +0000 (07:58 -0700)]
Fix potential fast PIT TSC calibration startup glitch
During bootup, when we reprogram the PIT (programmable interval timer)
to start counting down from 0xffff in order to use it for the fast TSC
calibration, we should also make sure to delay a bit afterwards to allow
the PIT hardware to actually start counting with the new value.
That will happens at the next CLK pulse (1.193182 MHz), so the easiest
way to do that is to just wait at least one microsecond after
programming the new PIT counter value. We do that by just reading the
counter value back once - which will take about 2us on PC hardware.
Reported-and-tested-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86, paravirt: prevent gcc from generating the wrong addressing mode
Impact: fix crash on VMI (VMware)
When we generate a call sequence for calling a paravirtualized
function, we presume that the generated code is "call *0xXXXXX",
which is a 6 byte opcode; this is larger than a normal
direct call, and so we can patch a direct call over it.
At the moment, however we give gcc enough rope to hang us by
putting the address in a register and generating a two byte
indirect-via-register call. Prevent this by explicitly
dereferencing the function pointer and passing it into the
asm as a constant.
This prevents crashes in VMI, as it cannot handle unpatchable
callsites.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com>
LKML-Reference: <49BEEDC2.2070809@goop.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Linus Torvalds [Mon, 16 Mar 2009 19:49:12 +0000 (12:49 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
acpi-wmi: unsigned cannot be less than 0
thinkpad-acpi: fix module autoloading for older models
acer-wmi: Unmark as 'experimental'
acpi-wmi: Unmark as 'experimental'
acer-wmi: double free in acer_rfkill_exit()
platform/x86: depends instead of select for laptop platform drivers
asus-laptop: use select instead of depends on
eeepc-laptop: restore acpi_generate_proc_event()
asus-laptop: restore acpi_generate_proc_event()
acpi: check for pxm_to_node_map overflow
ACPI: remove doubled status checking
ACPI suspend: Blacklist Toshiba Satellite L300 that requires to set SCI_EN directly on resume
Revert "ACPI: make some IO ports off-limits to AML"
suspend: switch the Asus Pundit P1-AH2 to old ACPI sleep ordering
When a table is being replaced, it waits for I/O to complete
before destroying the mempool, but the endio function doesn't
call mempool_free() until after completing the bio.
Fix it by swapping the order of those two operations.
The same problem occurs in dm.c with md referenced after dec_pending.
Again, we swap the order.
Cc: stable@kernel.org Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Huang Ying [Mon, 16 Mar 2009 17:44:33 +0000 (17:44 +0000)]
dm crypt: fix kcryptd_async_done parameter
In the async encryption-complete function (kcryptd_async_done), the
crypto_async_request passed in may be different from the one passed to
crypto_ablkcipher_encrypt/decrypt. Only crypto_async_request->data is
guaranteed to be same as the one passed in. The current
kcryptd_async_done uses the passed-in crypto_async_request directly
which may cause the AES-NI-based AES algorithm implementation to panic.
This patch fixes this bug by only using crypto_async_request->data,
which points to dm_crypt_request, the crypto_async_request passed in.
The original data (convert_context) is gotten from dm_crypt_request.
[mbroz@redhat.com: reworked] Cc: stable@kernel.org Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 16 Mar 2009 17:44:30 +0000 (17:44 +0000)]
dm io: respect BIO_MAX_PAGES limit
dm-io calls bio_get_nr_vecs to get the maximum number of pages to use
for a given device. It allocates one additional bio_vec to use
internally but failed to respect BIO_MAX_PAGES, so fix this.
This was the likely cause of:
https://bugzilla.redhat.com/show_bug.cgi?id=173153
Milan Broz [Mon, 16 Mar 2009 16:56:01 +0000 (16:56 +0000)]
dm ioctl: validate name length when renaming
When renaming a mapped device validate the length of the new name.
The rename ioctl accepted any correctly-terminated string enclosed
within the data passed from userspace. The other ioctls enforce a
size limit of DM_NAME_LEN. If the name is changed and becomes longer
than that, the device can no longer be addressed by name.
Fix it by properly checking for device name length (including
terminating zero).
Cc: stable@kernel.org Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Jonathan Brassow <jbrassow@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Rusty Russell [Sun, 15 Mar 2009 22:35:07 +0000 (09:05 +1030)]
linux.conf.au 2009: Tuz
Impact: help prevent extinction of species
The Tasmanian Devil is a shy iconic Australian creature named for its
spine-chilling screech. It is threatened with extinction due to a
scientifically interesting but horrific transmissible facial cancer.
This one is standing in for Tux for one release using the far less-known
Devil Facial Tux Disguise.
Save The Tasmanian Devil http://tassiedevil.com.au
Signed-off-by: Linux.conf.au Hobart Team <contact@marchsouth.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Le [Mon, 16 Mar 2009 06:44:31 +0000 (14:44 +0800)]
filp->f_pos not correctly updated in proc_task_readdir
filp->f_pos only get updated at the end of the function. Thus d_off of those
dirents who are in the middle will be 0, and this will cause a problem in
glibc's readdir implementation, specifically endless loop. Because when overflow
occurs, f_pos will be set to next dirent to read, however it will be 0, unless
the next one is the last one. So it will start over again and again.
There is a sample program in man 2 gendents. This is the output of the program
running on a multithread program's task dir before this patch is applied:
Hidetoshi Seto [Mon, 16 Mar 2009 08:07:33 +0000 (17:07 +0900)]
x86, mce: remove incorrect __cpuinit for intel_init_cmci()
Impact: Bug fix on UP
Referring commit cc3ca22063784076bd240fda87217387a8f2ae92,
Peter removed __cpuinit annotations for mce_cpu_features()
and its successor functions, which caused troubles on UP
configurations.
However the intel_init_cmci() was introduced after that and
it also has __cpuinit annotation even though it is called from
mce_cpu_features(). Remove the annotation from that function
too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Roel Kluin [Wed, 4 Mar 2009 19:55:30 +0000 (11:55 -0800)]
acpi-wmi: unsigned cannot be less than 0
include/linux/pci-acpi.h:74:
typedef u32 acpi_status;
result is unsigned, so an error returned by acpi_bus_register_driver()
will not be noticed.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
thinkpad-acpi: fix module autoloading for older models
Looking at the source, there seems to be a missing * to match my DMI
string. I mean for newer IBM and Lenovo's laptops you match either one
of the following:
MODULE_ALIAS("dmi:bvnIBM:*:svnIBM:*:pvrThinkPad*:rvnIBM:*");
MODULE_ALIAS("dmi:bvnLENOVO:*:svnLENOVO:*:pvrThinkPad*:rvnLENOVO:*");
While for older Thinkpads, you do this (for instance):
IBM_BIOS_MODULE_ALIAS("1[0,3,6,8,A-G,I,K,M-P,S,T]");
with IBM_BIOS_MODULE_ALIAS being MODULE_ALIAS("dmi:bvnIBM:bvr" __type "ET??WW")
Note there's no * terminating the string. As result, udev doesn't load
anything because modprobe cannot find anything matching this (my
machine actually):
Signed-off-by: Mathieu Chouquet-Stringer <mchouque@free.fr> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Len Brown <len.brown@intel.com>
Dan Carpenter [Sat, 14 Feb 2009 09:53:48 +0000 (09:53 +0000)]
acer-wmi: double free in acer_rfkill_exit()
This is acer_rfkill_exit() from drivers/platform/x86/acer-wmi.c.
The code frees wireless_rfkill->data again instead of
bluetooth_rfkill->data.
This was found using a code checker (http://repo.or.cz/w/smatch.git/).
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk> Signed-off-by: Len Brown <len.brown@intel.com>
Cyrill Gorcunov [Wed, 4 Mar 2009 19:55:29 +0000 (11:55 -0800)]
acpi: check for pxm_to_node_map overflow
It is hardly (if ever) possible but in case of broken _PXM entry we could
reach out of pxm_to_node_map array bounds in acpi_map_pxm_to_node() call.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Jiri Slaby [Wed, 4 Mar 2009 19:55:27 +0000 (11:55 -0800)]
ACPI: remove doubled status checking
There was a misplaced status test (two consequent tests without a
statement in between) in acpi_bus_init for ages. Remove it, since the
function which should be checked (acpi_os_initialize1) has BUG_ONs on
failure paths.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 25 Feb 2009 23:00:18 +0000 (18:00 -0500)]
Revert "ACPI: make some IO ports off-limits to AML"
This reverts commit 5ec5d38a1c8af255ffc481c81eef13e9155524b3.
because it caused spurious dmesg warmings.
We'll implement the check for off-limit ports
in a more clever way in the future.
françois romieu [Sun, 15 Mar 2009 01:10:50 +0000 (01:10 +0000)]
r8169: revert "r8169: read MAC address from EEPROM on init (2nd attempt)"
It fails on the following systems:
- RTL8169sc/8110sc (XID 18000000)
reported by Tim Durack <tdurack@gmail.com> (x86)
- RTL8169sb/8110sb (XID 10000000)
reported by Mikael Pettersson <mikpe@it.uu.se> (ARM)
The patch appeared to work on x86 for the following systems:
RTL8169sb/8110sb 10000000 PCI (EXT)
RTL8110s 04000000 PCI (EXT)
RTL8102e 24a00000 PCI-E (LOM)
RTL8168c/8111c 3c2000c0 PCI-E (LOM)
RTL8168b/8111b 38000000 PCI-E (LOM)
RTL8168b/8111b 38000000 PCI-E (EXT)
The patch exposes two problems:
1) while not completely wrong, mac addresses are not read correctly
from the EEPROM
2) the MAC address registers are not correctly set
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 15 Mar 2009 20:34:56 +0000 (13:34 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (23 commits)
[ARM] Fix virtual to physical translation macro corner cases
[ARM] update mach-types
[ARM] 5421/1: ftrace: fix crash due to tracing of __naked functions
MX1 fix include
[ARM] 5419/1: ep93xx: fix build warnings about struct i2c_board_info
[ARM] 5418/1: restore lr before leaving mcount
ARM: OMAP: board-omap3beagle: set i2c-3 to 100kHz
ARM: OMAP: Allow I2C bus driver to be compiled as a module
ARM: OMAP: sched_clock() corrected
ARM: OMAP: Fix compile error if pm.h is included
[ARM] orion5x: pass dram mbus data to xor driver
[ARM] S3C64XX: Fix s3c64xx_setrate_clksrc
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/irq.c
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/s3c6400-clock.c
[ARM] S3C64XX: Fix USB host clock mux list
[ARM] S3C64XX: Fix name of USB host clock.
[ARM] S3C64XX: Rename IRQ_UHOST to IRQ_USBH
[ARM] S3C64XX: Do gpiolib configuration earlier
[ARM] S3C64XX: Staticise s3c64xx_init_irq_eint()
[ARM] SMDK6410: Declare iodesc table static
...
Alexander Duyck [Sun, 15 Mar 2009 05:26:40 +0000 (22:26 -0700)]
igb: remove ASPM L0s workaround
The L0s workaround should be moved into a pci quirk and so it is not
necessary in the driver. This update removes the L0s workaround from the
igb driver.
This was the second half of the PCI quirk patch that Matthew Wilcox did
not pick up when he picked up the quirk patch.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yinghai Lu [Mon, 9 Mar 2009 08:15:57 +0000 (01:15 -0700)]
x86: put initial_pg_tables into .bss
Impact: makes vmlinux section information more useful
Don't use ram after _end blindly for pagetables. aka init pages is before _end
put those pg table into .bss
[Adapted to use brk segment - Jeremy]
v2: keep initial page table up to 512M only.
v4: put initial page tables just before _end
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add RESERVE_BRK(name, size) macro to reserve space in the brk
area. This should be a conservative (ie, larger) estimate of
how much space might possibly be required from the brk area.
Any unused space will be freed, so there's no real downside
on making the reservation too large (within limits).
The name should be unique within a given file, and somewhat
descriptive.
The C definition of RESERVE_BRK() ends up being more complex than
one would expect to work around a cluster of gcc infelicities:
The first attempt was to simply try putting __section(.brk_reservation)
on a variable. This doesn't work because it ends up making it a
@progbits section, which gets actual space allocated in the vmlinux
executable.
The second attempt was to emit the space into a section using asm,
but gcc doesn't allow arguments to be passed to file-level asm()
statements, making it hard to pass in the size.
The final attempt is to wrap the asm() in a function to allow
it to have arguments, and put the function itself into the
.discard section, which vmlinux*.lds drops entirely from the
emitted vmlinux.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Thu, 12 Mar 2009 23:04:42 +0000 (16:04 -0700)]
x86-32: compute initial mapping size more accurately
Impact: simplification
We only need to map the kernel in head_32.S, not the whole of
lowmem. We use 512MB as a reasonable (but arbitrary) limit on
the maximum size of the kernel image.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86-32: use brk segment for allocating initial kernel pagetable
Impact: use new interface instead of previous ad hoc implementation
Rather than having special purpose init_pg_table_start/end variables
to delimit the kernel pagetable built by head_32.S, just use the brk
mechanism to extend the bss for the new pagetable.
This patch removes init_pg_table_start/end and pg0, defines __brk_base
(which is page-aligned and immediately follows _end), initializes
the brk region to start there, and uses it for the 32-bit pagetable.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Sun, 15 Mar 2009 00:19:51 +0000 (17:19 -0700)]
x86: move brk initialization out of #ifdef CONFIG_BLK_DEV_INITRD
Impact: build fix
The brk initialization functions were incorrectly located inside
an #ifdef CONFIG_VLK_DEV_INITRD block, causing the obvious build failure in
minimal configurations.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
x86: add brk allocation for very, very early allocations
Impact: new interface
Add a brk()-like allocator which effectively extends the bss in order
to allow very early code to do dynamic allocations. This is better than
using statically allocated arrays for data in subsystems which may never
get used.
The space for brk allocations is in the bss ELF segment, so that the
space is mapped properly by the code which maps the kernel, and so
that bootloaders keep the space free rather than putting a ramdisk or
something into it.
The bss itself, delimited by __bss_stop, ends before the brk area
(__brk_base to __brk_limit). The kernel text, data and bss is reserved
up to __bss_stop.
Any brk-allocated data is reserved separately just before the kernel
pagetable is built, as that code allocates from unreserved spaces
in the e820 map, potentially allocating from any unused brk memory.
Ultimately any unused memory in the brk area is used in the general
kernel memory pool.
Initially the brk space is set to 1MB, which is probably much larger
than any user needs (the largest current user is i386 head_32.S's code
to build the pagetables to map the kernel, which can get fairly large
with a big kernel image and no PSE support). So long as the system
has sufficient memory for the bootloader to reserve the kernel+1MB brk,
there are no bad effects resulting from an over-large brk.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86: make section delimiter symbols part of their section
Impact: cleanup
Move the symbols delimiting a section part of the section
(section relative) rather than absolute. This avoids any
unexpected gaps between the section-start symbol and the first
data in the section, which could be caused by implicit
alignment of the section data. It also makes the general
form of vmlinux_64.lds.S consistent with vmlinux_32.lds.S.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Grant Likely [Mon, 9 Mar 2009 12:42:24 +0000 (13:42 +0100)]
Fix Xilinx SystemACE driver to handle empty CF slot
The SystemACE driver does not handle an empty CF slot gracefully. An
empty CF slot ends up hanging the system. This patch adds a check for
the CF state and stops trying to process requests if the slot is empty.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
un'ichi Nomura [Mon, 9 Mar 2009 09:40:52 +0000 (10:40 +0100)]
block: Add gfp_mask parameter to bio_integrity_clone()
Stricter gfp_mask might be required for clone allocation.
For example, request-based dm may clone bio in interrupt context
so it has to use GFP_ATOMIC.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Sat, 14 Mar 2009 19:02:21 +0000 (12:02 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Mark Eins: Fix configuration.
MIPS: Fix TIF_32BIT undefined problem when seccomp is disabled
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (31 commits)
[SCSI] qla2xxx: Update version number to 8.03.00-k4.
[SCSI] qla2xxx: Correct overwrite of pre-assigned init-control-block structure size.
[SCSI] qla2xxx: Correct truncation in return-code status checking.
[SCSI] qla2xxx: Correct vport delete bug.
[SCSI] qla2xxx: Use correct value for max vport in LOOP topology.
[SCSI] qla2xxx: Correct address range checking for option-rom updates.
[SCSI] fcoe: Change fcoe receive thread nice value from 19 (lowest priority) to -20
[SCSI] fcoe: fix handling of pending queue, prevent out of order frames (v3)
[SCSI] fcoe: Out of order tx frames was causing several check condition SCSI status
[SCSI] fcoe: fix kfree(skb)
[SCSI] fcoe: ETH_P_8021Q is already in if_ether and fcoe is not using it anyway
[SCSI] libfc: do not change the fh_rx_id of a recevied frame
[SCSI] fcoe: Correct fcoe_transports initialization vs. registration
[SCSI] fcoe: Use setup_timer() and mod_timer()
[SCSI] libfc, fcoe: Remove unnecessary cast by removing inline wrapper
[SCSI] libfc, fcoe: Cleanup function formatting and minor typos
[SCSI] libfc, fcoe: Fix kerneldoc comments
[SCSI] libfc: Cleanup libfc_function_template comments
[SCSI] libfc: check for err when recv and state is incorrect
[SCSI] libfc: rename rp to rdata in fc_disc_new_target()
...
Linus Torvalds [Sat, 14 Mar 2009 19:00:42 +0000 (12:00 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix: add workaround for Samsung DB-P70
libata: Keep shadow last_ctl up to date during resets
sata_mv: fix MSI irq race condition
Linus Torvalds [Sat, 14 Mar 2009 19:00:18 +0000 (12:00 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix the fix to Bugzilla #11061, when IPv6 isn't defined...
SUNRPC: xprt_connect() don't abort the task if the transport isn't bound
SUNRPC: Fix an Oops due to socket not set up yet...
Bug 11061, NFS mounts dropped
NFS: Handle -ESTALE error in access()
NLM: Fix GRANT callback address comparison when IPv6 is enabled
NLM: Shrink the IPv4-only version of nlm_cmp_addr()
NFSv3: Fix posix ACL code
NFS: Fix misparsing of nfsv4 fs_locations attribute (take 2)
SUNRPC: Tighten up the task locking rules in __rpc_execute()
Linus Torvalds [Sat, 14 Mar 2009 18:59:22 +0000 (11:59 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: Use xs->bucket to set xattr value outside
ocfs2: Fix a bug found by sparse check.
ocfs2: tweak to get the maximum inline data size with xattr
ocfs2: reserve xattr block for new directory with inline data
Linus Torvalds [Sat, 14 Mar 2009 18:59:05 +0000 (11:59 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
V4L/DVB (10978): Report tuning algorith correctly
V4L/DVB (10977): STB6100 init fix, the call to stb6100_set_bandwidth needs an argument
V4L/DVB (10976): Bug fix: For legacy applications stv0899 performs search only first time after insmod.
V4L/DVB (10975): Bug: Use signed types, Offsets and range can be negative
V4L/DVB (10974): Use Diseqc 3/3 mode to send data
V4L/DVB (10972): zl10353: i2c_gate_ctrl bug fix
V4L/DVB (10834): zoran: auto-select bt866 for AverMedia 6 Eyes
V4L/DVB (10832): tvaudio: Avoid breakage with tda9874a
V4L/DVB (10789): m5602-s5k4aa: Split up the initial sensor probe in chunks.
Tyler Hicks [Fri, 13 Mar 2009 20:51:59 +0000 (13:51 -0700)]
eCryptfs: don't encrypt file key with filename key
eCryptfs has file encryption keys (FEK), file encryption key encryption
keys (FEKEK), and filename encryption keys (FNEK). The per-file FEK is
encrypted with one or more FEKEKs and stored in the header of the
encrypted file. I noticed that the FEK is also being encrypted by the
FNEK. This is a problem if a user wants to use a different FNEK than
their FEKEK, as their file contents will still be accessible with the
FNEK.
This is a minimalistic patch which prevents the FNEKs signatures from
being copied to the inode signatures list. Ultimately, it keeps the FEK
from being encrypted with a FNEK.
Johannes Weiner [Fri, 13 Mar 2009 20:51:58 +0000 (13:51 -0700)]
nommu: ramfs: don't leak pages when adding to page cache fails
When a ramfs nommu mapping is expanded, contiguous pages are allocated
and added to the pagecache. The caller's reference is then passed on
by moving whole pagevecs to the file lru list.
If the page cache adding fails, make sure that the error path also
moves the pagevec contents which might still contain up to PAGEVEC_SIZE
successfully added pages, of which we would leak references otherwise.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Howells <dhowells@redhat.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enrik Berkhan [Fri, 13 Mar 2009 20:51:56 +0000 (13:51 -0700)]
nommu: ramfs: pages allocated to an inode's pagecache may get wrongly discarded
The pages attached to a ramfs inode's pagecache by truncation from nothing
- as done by SYSV SHM for example - may get discarded under memory
pressure.
The problem is that the pages are not marked dirty. Anything that creates
data in an MMU-based ramfs will cause the pages holding that data will
cause the set_page_dirty() aop to be called.
For the NOMMU-based mmap, set_page_dirty() may be called by write(), but
it won't be called by page-writing faults on writable mmaps, and it isn't
called by ramfs_nommu_expand_for_mapping() when a file is being truncated
from nothing to allocate a contiguous run.
The solution is to mark the pages dirty at the point of allocation by the
truncation code.
Signed-off-by: Enrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove flash size check which made sense only for ancient
boards with 1MB flash. The check is based on values read
from specific locations and fails with firmware size changes.
This prevents driver from getting right mac addresses.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* the 64bit variant now also initializes the padlock unit.
* ->c_early_init() is executed again from ->c_init()
* the 64bit fixups made into 32bit path.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: herbert@gondor.apana.org.au
LKML-Reference: <1237029843-28076-2-git-send-email-sebastian@breakpoint.cc> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Fri, 13 Mar 2009 19:46:07 +0000 (12:46 -0700)]
x86: fix get_mtrr() warning about smp_processor_id() with CONFIG_PREEMPT=y
Impact: fix debug warning
Jaswinder noticed that there is a warning about smp_processor_id()
in get_mtrr().
Fix it by wrapping the printout into a get/put_cpu() pair.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49BAB7FF.4030107@kernel.org>
[ changed to get/put_cpu(), cleaned up surrounding code a it. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>