Stefan Schmidt [Thu, 10 Jul 2008 13:32:54 +0000 (14:32 +0100)]
[ARM] 5169/1: Defconfig for the EZX machines
This defconfig enables all currently available features. It also builds one
zImage which runs on all machines.
Signed-off-by: Antonio Ospite <ao2@openezx.org> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Stefan Schmidt [Wed, 9 Jul 2008 07:08:17 +0000 (08:08 +0100)]
[ARM] 5162/1: Common code for the Motorola EZX GSM phones
Common code for the different EZX GSM phones. Functions to control framebuffer,
backlight power, OHCI and UART init.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Stefan Schmidt [Wed, 9 Jul 2008 07:08:50 +0000 (08:08 +0100)]
[ARM] 5161/1: Maintainer entries for the Motorola EZX GSM mobile phones
Maintainer entries for the Motorola EZX GSM mobile phones.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Marek Vašut [Mon, 7 Jul 2008 16:31:58 +0000 (17:31 +0100)]
[ARM] 5155/1: PalmTX battery monitor
This patch adds a battery monitoring driver for PalmTX.
It can read voltage from the battery and temperature.
It also monitors charging/discharging status.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Marek Vašut [Mon, 7 Jul 2008 16:25:46 +0000 (17:25 +0100)]
[ARM] 5153/1: Add support for PalmTX handheld computer
PalmTX is a PXA27x-based device with wifi, bluetooth,
touchscreen, sdio slot, irda, keypad, nand flash,
pxa framebuffer, serial and usb gadget interface.
Supported by this patch is pxafb, touchscreen, irda,
keypad and sdio slot.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Mon, 30 Jun 2008 17:11:55 +0000 (18:11 +0100)]
[ARM] 5138/1: magician: set pwm-backlight .id = -1
There will always be only one pwm-backlight on this device.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Mon, 30 Jun 2008 17:11:35 +0000 (18:11 +0100)]
[ARM] 5137/1: magician: MACH_MAGICIAN doesn't need to depend on ARCH_PXA
It is only defined inside an "if ARCH_PXA ... endif" block, so the
"depends on" is not needed.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Thu, 26 Jun 2008 20:04:31 +0000 (21:04 +0100)]
[ARM] 5126/1: magician: remove superfluous mtd includes
These were only needed for hardcoded flash partition tables, which were
never submitted. It is better to have the bootloader pass the partition
table to the kernel instead.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Thu, 26 Jun 2008 20:03:54 +0000 (21:03 +0100)]
[ARM] 5125/1: magician: move gpio pin configuration into __initdata section
The pin configuration array is only used during board init.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5095/2: pcm990: switch from pxa_gpio_mode to pxa2xx_mfp_config
pxa_gpio_mode() is deprecated, use the new pxa2xx_mfp_config() function to
configure GPIOs in pcm990 platform code. Convert "array, ARRAY_SIZE(array)"
to "ARRAY_AND_SIZE(array)" while at it.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
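The "array, ARRAY_SIZE(array)" to "ARRAY_AND_SIZE(array)" conversion mentioned above can be illustrated with a small userspace sketch. The macro definitions here are stand-ins written for this example, and sum_pins(), example_pins and sum_example_pins() are hypothetical names that only play the role of a function taking a pointer plus an element count; this is not the pcm990 platform code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the helpers named in the commit message. */
#define ARRAY_SIZE(x)     (sizeof(x) / sizeof((x)[0]))
#define ARRAY_AND_SIZE(x) (x), ARRAY_SIZE(x)

/* Hypothetical consumer taking "array, count", like a pin-config call. */
unsigned long sum_pins(const unsigned long *pins, size_t num)
{
	unsigned long total = 0;

	assert(pins != NULL || num == 0);
	for (size_t i = 0; i < num; i++)
		total += pins[i];
	return total;
}

/* Hypothetical pin table; the values are for illustration only. */
const unsigned long example_pins[] = { 16, 17, 46 };

unsigned long sum_example_pins(void)
{
	/* Expands to sum_pins((example_pins), ARRAY_SIZE(example_pins)). */
	return sum_pins(ARRAY_AND_SIZE(example_pins));
}
```

The single macro argument keeps the array and its length from drifting apart when a table is renamed.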
[ARM] 5143/1: pxa: further cleanup PXA Kconfig by removing one
unnecessary menu level
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5142/1: pxa: move zaurus declarations to proper place
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5088/3: pxa2xx: add pxa2xx_set_spi_info to register pxa2xx-spi platform devices
Add a function to dynamically allocate and register pxa2xx-spi platform
devices, to be used by PXA2xx and PXA3xx based systems. Switch pcm027 and
lubbock to use it.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mark Brown [Tue, 10 Jun 2008 09:48:25 +0000 (10:48 +0100)]
[ARM] 5084/1: zylonite: Register AC97 device
The Zylonite has an AC97 subsystem on it so register the AC97 controller
device.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mark Brown [Tue, 10 Jun 2008 11:30:05 +0000 (12:30 +0100)]
[ARM] 5085/2: PXA: Move AC97 over to the new central device declaration model
As well as moving all the device declarations to a single one in devices.c
this causes all platforms to register the I/O and interrupt resources for
the AC97 controller.
Cc: eric miao <eric.miao@marvell.com> Cc: Mike Rapoport <mike@compulab.co.il> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Cc: Jürgen Schindele <linux@schindele.name> Cc: Juergen Beisert <jbe@pengutronix.de> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Sun, 22 Jun 2008 22:36:39 +0000 (23:36 +0100)]
[ARM] 5120/1: pxa: correct platform driver names for PXA25x and PXA27x UDC drivers
The pxa2xx_udc.c driver is renamed to pxa25x_udc.c (the platform
driver name changes from pxa2xx-udc to pxa25x-udc) and the
platform driver name of pxa27x_udc.c is fixed to pxa27x-udc.
pxa_device_udc in devices.c is split into pxa25x and pxa27x flavors
and the pxa27x_device_udc is enabled in pxa27x.c.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Nicolas Pitre <nico@cam.org> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Including from Ian Molton:
Fixes for mistakes left over from the PXA2{5,7}X UDC split.
Signed-off-by: Ian Molton <spyro@f2s.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 30 Jun 2008 18:47:59 +0000 (19:47 +0100)]
[ARM] pxa: allow clk aliases
We need to support more than one name+device for a struct clk for a
small number of peripherals. We do this by re-using struct clk alias
to another struct clk - IOW, if we find that the entry we're using is
an alias, we return the aliased entry not the one we found.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
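The alias lookup described above can be sketched as follows. This is a simplified userspace model under stated assumptions: struct clk_entry, its fields and clk_lookup() are hypothetical names for this example and do not reproduce the kernel's struct clk or its lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified clock table entry. */
struct clk_entry {
	const char *name;           /* clock name */
	const char *dev;            /* device name, or NULL to match any device */
	struct clk_entry *alias_of; /* non-NULL marks this entry as an alias */
};

/*
 * Find the entry matching name+device. If the match is an alias,
 * return the entry it points at instead of the alias itself.
 */
struct clk_entry *clk_lookup(struct clk_entry *table, size_t n,
			     const char *dev, const char *name)
{
	assert(table != NULL && name != NULL);
	for (size_t i = 0; i < n; i++) {
		struct clk_entry *c = &table[i];

		if (strcmp(c->name, name) != 0)
			continue;
		if (c->dev && (!dev || strcmp(c->dev, dev) != 0))
			continue;
		return c->alias_of ? c->alias_of : c;
	}
	return NULL;
}
```

With this shape, a second name+device pair costs one extra table entry and no duplicated clock state.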
Stefan Schmidt [Fri, 6 Jun 2008 09:12:37 +0000 (10:12 +0100)]
[ARM] 5079/1: Warn people when using pxa2xx-gpio.h
Warn people when using pxa2xx-gpio.h as it is only here for backwards
compatibility. The new mfp-pxa2[57]x.h and the relevant API should be used
instead.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Mon, 30 Jun 2008 17:08:11 +0000 (18:08 +0100)]
[ARM] 5135/1: pxa: drop superfluous asm/arch/pxa2xx-gpio.h includes
Both i2c-pxa.c and irq.c still include pxa2xx-gpio.h although it is not
needed anymore.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jaya Kumar [Sun, 22 Jun 2008 03:27:28 +0000 (04:27 +0100)]
[ARM] 5118/1: pxafb: add exit and remove handlers
This patch adds exit and remove handlers to pxafb so that it can be loaded
and unloaded as a module.
Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com> Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
eric miao [Mon, 26 May 2008 02:28:09 +0000 (03:28 +0100)]
[ARM] 5063/1: pxa: add clk support for pxa2xx I2S
Signed-off-by: Eric Miao <eric.miao@marvell.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 5094/1: pcm990: Add framebuffer and backlight support
PCM990 boards can be assembled with either a Sharp STN or a NEC TFT LCD. This
patch adds support for these displays and for the backlight, using the pwm_bl
driver.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Thu, 22 May 2008 13:20:01 +0000 (14:20 +0100)]
[ARM] 5045/1: magician: use the pwm_bl driver for the LCD backlight
magician has a GPIO that modifies the brightness level in addition to
the PWM duty value. This patch makes use of the pwm_bl notify callback
to present userspace with a single brightness scale.
This gets rid of the pxa_set_cken calls and direct PWM register access.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Robert Jarzmik [Tue, 10 Jun 2008 22:02:31 +0000 (23:02 +0100)]
[ARM] 5087/1: Get the PWM layer to handle clock enable/disable properly.
Allow pwm_enable()/pwm_disable() to be called as many times
as the driver wants (and not even count them).
The PWM model is different from things like the clock API
where we need enable counting, because PWMs have one
exclusive user per PWM whereas the clock API can have
multiple users of the same clock.
Acked-by: eric miao <eric.miao@marvell.com> Signed-off-by: Robert Jarzmik <rjarzmik@free.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
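One way to get the behaviour described above is to track the clock state with a flag in the PWM object instead of a counter. The sketch below is a userspace model only: fake_clk, fake_pwm and the fake_* functions are hypothetical names, and the reference-counted clock is a stand-in for the clock API, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted clock. */
struct fake_clk {
	int enable_count;
};

void fake_clk_enable(struct fake_clk *clk)
{
	clk->enable_count++;
}

void fake_clk_disable(struct fake_clk *clk)
{
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

/* Hypothetical PWM object with one exclusive user. */
struct fake_pwm {
	struct fake_clk *clk;
	int clk_enabled; /* 0 or 1: a flag, not a counter */
};

/* May be called any number of times; enables the clock at most once. */
void fake_pwm_enable(struct fake_pwm *pwm)
{
	assert(pwm != NULL);
	if (!pwm->clk_enabled) {
		fake_clk_enable(pwm->clk);
		pwm->clk_enabled = 1;
	}
}

/* May be called any number of times; disables the clock at most once. */
void fake_pwm_disable(struct fake_pwm *pwm)
{
	assert(pwm != NULL);
	if (pwm->clk_enabled) {
		fake_clk_disable(pwm->clk);
		pwm->clk_enabled = 0;
	}
}
```

Repeated enable calls leave the underlying clock count at one, so a single disable is enough to release it.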
[ARM] 5078/1: pxa-pwm: Add missing MODULE_LICENSE to be able to build the driver
as a module
Without a GPL-compatible license this driver cannot be built as a module,
because the platform_driver_* API is only exported to GPL modules.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Philipp Zabel [Mon, 30 Jun 2008 17:09:03 +0000 (18:09 +0100)]
[ARM] 5136/1: pxa: fix PWM device order for pxa27x
Currently PWM0/2 (pxa27x_device_pwm0 at 0x40b00000 and 0x40b00010)
are registered as pwm_id 0 and 1, and PWM1/3 (pxa27x_device_pwm1 at
0x40c00000 and 0x40c00010) are registered as pwm_id 2 and 3.
This patch corrects the pwm_ids to match the documented register names.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
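The corrected numbering can be written as a small lookup, assuming the addresses in the message are listed in the same order as the PWM names (PWM0 at 0x40b00000, PWM2 at 0x40b00010, PWM1 at 0x40c00000, PWM3 at 0x40c00010). pwm_base_for_id() is a hypothetical helper for this example and is not part of the patch.

```c
#include <assert.h>

/*
 * Map a pwm_id to its register base so that id N names PWMN.
 * Bit 0 of the id selects the register block, bit 1 the second
 * channel (offset 0x10) within that block.
 */
unsigned long pwm_base_for_id(unsigned int pwm_id)
{
	assert(pwm_id < 4);

	unsigned long block = (pwm_id & 1) ? 0x40c00000UL : 0x40b00000UL;
	unsigned long offset = (pwm_id & 2) ? 0x10UL : 0x0UL;

	return block + offset;
}
```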
Linus Torvalds [Wed, 25 Jun 2008 01:12:33 +0000 (18:12 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
[IA64] Handle count==0 in sn2_ptc_proc_write()
[IA64] Fix boot failure on ia64/sn2
Linus Torvalds [Wed, 25 Jun 2008 01:09:06 +0000 (18:09 -0700)]
Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Remove now unused structs from kvm_para.h
x86: KVM guest: Use the paravirt clocksource structs and functions
KVM: Make kvm host use the paravirt clocksource structs
x86: Make xen use the paravirt clocksource structs and functions
x86: Add structs and functions for paravirt clocksource
KVM: VMX: Fix host msr corruption with preemption enabled
KVM: ioapic: fix lost interrupt when changing a device's irq
KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
KVM: close timer injection race window in __vcpu_run
KVM: Fix race between timer migration and vcpu migration
Jie Luo [Tue, 24 Jun 2008 17:38:31 +0000 (10:38 -0700)]
enable bus mastering on i915 at resume time
On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.
Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerd Hoffmann [Tue, 3 Jun 2008 14:17:32 +0000 (16:17 +0200)]
x86: KVM guest: Use the paravirt clocksource structs and functions
This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.
The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup. kvmclock must take that into account when registering the
physical address within the host.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Gerd Hoffmann [Tue, 3 Jun 2008 14:17:29 +0000 (16:17 +0200)]
x86: Add structs and functions for paravirt clocksource
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).
It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code. They are enabled using
CONFIG_PARAVIRT_CLOCK.
Subsequent patches of this series will put the code in use.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, the indirect blocks that point to the new data block must
diverge from the existing tree either at the inode or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Julia Lawall [Tue, 24 Jun 2008 08:22:05 +0000 (10:22 +0200)]
[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory. Thus a NULL test or
memset after calls to these functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tony Luck <tony.luck@intel.com>
Cliff Wickman [Tue, 24 Jun 2008 17:20:06 +0000 (10:20 -0700)]
[IA64] Handle count==0 in sn2_ptc_proc_write()
The fix applied in e0c6d97c65e0784aade7e97b9411f245a6c543e7
"security hole in sn2_ptc_proc_write" didn't take into account
the case where count==0 (which results in a buffer underrun
when adding the trailing '\0'). Thanks to Andi Kleen for
pointing this out.
Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
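The count==0 case can be shown with a userspace sketch of a write handler that terminates its buffer at index count - 1. Everything here is hypothetical (OPT_BUF_SIZE, parse_write(), the -1 error value); it models the pattern described in the message rather than sn2_ptc_proc_write() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPT_BUF_SIZE 64 /* hypothetical buffer size */

/*
 * Copy count bytes into out and replace the last one (typically a
 * newline) with '\0'. Returns count on success, -1 on a bad length.
 */
long parse_write(const char *src, size_t count, char out[OPT_BUF_SIZE])
{
	assert(src != NULL && out != NULL);

	/* Without the count == 0 test, out[count - 1] would index out[-1]. */
	if (count == 0 || count > OPT_BUF_SIZE)
		return -1;

	memcpy(out, src, count);
	out[count - 1] = '\0'; /* safe: count is at least 1 here */
	return (long)count;
}
```

Checking only the upper bound is not enough, because the terminator is written relative to count rather than at a fixed position.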
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used. xen-unstable has now officially dropped
non-PAE support. Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well drop it altogether.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avi Kivity [Tue, 24 Jun 2008 08:48:49 +0000 (11:48 +0300)]
KVM: VMX: Fix host msr corruption with preemption enabled
Switching msrs can occur either synchronously as a result of calls to
the msr management functions (usually in response to the guest touching
virtualized msrs), or asynchronously when preempting a kvm thread that has
guest state loaded. If we're unlucky enough to have the two at the same
time, host msrs are corrupted and the machine goes kaput on the next syscall.
Most easily triggered by Windows Server 2008, as it does a lot of msr
switching during bootup.
Avi Kivity [Tue, 17 Jun 2008 22:36:36 +0000 (15:36 -0700)]
KVM: ioapic: fix lost interrupt when changing a device's irq
The ioapic acknowledge path translates interrupt vectors to irqs. It
currently uses a first match algorithm, stopping when it finds the first
redirection table entry containing the vector. That fails however if the
guest changes the irq to a different line, leaving the old redirection table
entry in place (though masked). Result is interrupts not making it to the
guest.
Fix by always scanning the entire redirection table.
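A minimal sketch of the full-table scan, assuming a simplified redirection table: struct redir_entry, NUM_PINS and ack_vector() are hypothetical names for this example and do not mirror the KVM ioapic structures.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PINS 24 /* hypothetical table size */

/* Hypothetical, simplified redirection table entry. */
struct redir_entry {
	unsigned char vector;
	int masked;
	int ack_pending;
};

/*
 * Acknowledge every entry carrying the vector, including stale masked
 * ones, instead of stopping at the first match. Returns the number of
 * matching entries.
 */
int ack_vector(struct redir_entry table[NUM_PINS], unsigned char vector)
{
	int matches = 0;

	assert(table != NULL);
	for (int i = 0; i < NUM_PINS; i++) {
		if (table[i].vector != vector)
			continue;
		table[i].ack_pending = 0;
		matches++;
		/* No break: a later entry may be the live one. */
	}
	return matches;
}
```

A first-match loop would stop at the stale masked entry and never reach the line the guest moved the irq to.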
Avi Kivity [Thu, 12 Jun 2008 13:54:41 +0000 (16:54 +0300)]
KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM has a heuristic to unshadow guest pagetables when userspace accesses
them, on the assumption that most guests do not allow userspace to access
pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
it also oopses.
This never triggers on ordinary guests since sane OSes will clear the
pagetables before assigning them to userspace, which will trigger the flood
heuristic, unshadowing the pagetables before the first userspace access. One
particular guest, though (Xenner) will run the kernel in userspace, triggering
the oops. Since the heuristic is incorrect in this case, we can simply
remove it.
Marcelo Tosatti [Wed, 11 Jun 2008 23:32:40 +0000 (20:32 -0300)]
KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
guests properly. It will instantiate two 2MB sptes pointing to the same
physical 2MB page when a guest large pte update is trapped.
Instead of duplicating code to handle this, disallow directory level
updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
emulating one guest 4MB pte can be correctly created by the page fault
handling path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
rmap_next() does not work correctly after rmap_remove(), as it expects
the rmap chains not to change during iteration. Fix (for now) by restarting
iteration from the beginning.
Marcelo Tosatti [Fri, 6 Jun 2008 19:37:36 +0000 (16:37 -0300)]
KVM: close timer injection race window in __vcpu_run
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.
It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.
For now introduce a new vcpu requests bit to cancel guest entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 6 Jun 2008 19:37:35 +0000 (16:37 -0300)]
KVM: Fix race between timer migration and vcpu migration
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().
If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.
Fix it by cancelling guest entry if any vcpu request is pending. This
has the side effect of nicely consolidating vcpu->requests checks.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Linus Torvalds [Mon, 23 Jun 2008 23:25:11 +0000 (16:25 -0700)]
Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: nfs_updatepage(): don't mark page as dirty if an error occurred
NFS: Fix filehandle size comparisons in the mount code
NFS: Reduce the NFS mount code stack usage.
Linus Torvalds [Mon, 23 Jun 2008 19:48:50 +0000 (12:48 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: refactor wait_for_completion_timeout()
sched: fix wait_for_completion_timeout() spurious failure under heavy load
sched: rt: dont stop the period timer when there are tasks wanting to run
Linus Torvalds [Mon, 23 Jun 2008 19:48:17 +0000 (12:48 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: don't drop NX bit
xen: mask unwanted pte bits in __supported_pte_mask
xen: Use wmb instead of rmb in xen_evtchn_do_upcall().
x86: fix NULL pointer deref in __switch_to
Nick Piggin [Mon, 23 Jun 2008 12:30:30 +0000 (14:30 +0200)]
mm: fix race in COW logic
There is a race in the COW logic. It contains a shortcut to avoid the
COW and reuse the page if we have the sole reference on the page,
however it is possible to have two racing do_wp_page()ers with one
causing the other to mistakenly believe it is safe to take the shortcut
when it is not. This could lead to data corruption.
Process 1 and process 2 each have a wp pte of the same anon page (i.e.
one forked the other). The page's mapcount is 2. Then they both
attempt to write to it around the same time...
write private key into page
read from page
ptep_clear_flush()
set_pte_at(pte of new_page)
Fix this by moving the page_remove_rmap of the old page after the pte
clear and flush. Potentially the entire branch could be moved down
here, but in order to stay consistent, I won't (should probably move all
the *_mm_counter stuff with one patch).
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"This broke vmware 6.0.4.
Jun 22 14:53:03.845: vmx| NOT_IMPLEMENTED
/build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774"
and the reason seems to be that there's an old bug in how we handle
FOLL_ANON on VM_SHARED areas in get_user_pages(), but since it only
triggered if the whole page table was missing, nobody had apparently hit
it before.
The recent changes to 'follow_page()' made the FOLL_ANON logic trigger
not just for whole missing page tables, but for individual pages as
well, and exposed this problem.
This fixes it by making the test for when FOLL_ANON is used more
careful, and also makes the code easier to read and understand by moving
the logic to a separate inline function.
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eli Cohen [Mon, 23 Jun 2008 16:29:58 +0000 (09:29 -0700)]
IB/mthca: Clear ICM pages before handing to FW
Current memfree FW has a bug which, in some cases, assumes that ICM
pages passed to it are cleared. This patch uses __GFP_ZERO to
allocate all ICM pages passed to the FW. Once firmware with a fix is
released, we can make the workaround conditional on firmware version.
This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html
Cc: <stable@kernel.org> Signed-off-by: Eli Cohen <eli@mellanox.co.il>
[ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
ICM memory and memset()ing it to 0. - Roland ]
Thomas Gleixner [Mon, 23 Jun 2008 09:21:58 +0000 (11:21 +0200)]
futexes: fix fault handling in futex_lock_pi
This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.
David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revealed that the problem happened because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.
The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent state in case it faults and returns -EFAULT to user
space. User space has no way to fixup that state.
When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.
After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:
- the task which acquired the rtmutex
- the pending owner of the pi_state which did not get the rtmutex
Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.
One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.
Reported-by: David Holmes <david.holmes@sun.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Holmes <david.holmes@sun.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 73 insertions(+), 20 deletions(-)
Linus Torvalds [Sun, 22 Jun 2008 19:23:15 +0000 (12:23 -0700)]
Fix performance regression on lmbench select benchmark
Christian Borntraeger reported that reinstating cond_resched() with
CONFIG_PREEMPT caused a performance regression on lmbench:
For example select file 500:
23 microseconds
32 microseconds
and that's really because we totally unnecessarily do the cond_resched()
in the innermost loop of select(), which is just silly.
This moves it out from the innermost loop (which only ever loops over the
bits in a single "unsigned long" anyway), which makes the performance
regression go away.
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
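The effect of hoisting the reschedule point can be sketched in userspace. scan_words() and its yield_calls counter are hypothetical; the counter increment merely stands in for cond_resched(), and the loop is a simplified model of the structure described above, not the select() implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Count set bits across an array of words. The reschedule stand-in
 * runs once per word, not once per bit.
 */
unsigned long scan_words(const unsigned long *words, size_t nwords,
			 unsigned long *yield_calls)
{
	unsigned long ready = 0;

	assert(words != NULL && yield_calls != NULL);
	for (size_t w = 0; w < nwords; w++) {
		unsigned long bits = words[w];

		/* Innermost loop: only the bits of one unsigned long. */
		for (size_t b = 0; b < BITS_PER_WORD; b++) {
			if (bits & (1UL << b))
				ready++;
		}
		(*yield_calls)++; /* stands in for cond_resched() */
	}
	return ready;
}
```

In this model the number of reschedule checks drops from one per bit to one per word.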
The zonelist patches caused the loop that checks for available
objects in permitted zones to not terminate immediately. One object
per zone per allocation may be allocated and then abandoned.
Break the loop when we have successfully allocated one object.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
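The loop fix can be illustrated with a simplified model in which each zone holds a count of free objects. struct zone_stub and alloc_from_zones() are hypothetical names for this example; the real allocator walks a zonelist and is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone with some free objects. */
struct zone_stub {
	int free_objects;
};

/*
 * Take one object from the first zone that has any. Returns the index
 * of the zone used, or -1 if every zone is empty.
 */
int alloc_from_zones(struct zone_stub *zones, size_t n)
{
	assert(zones != NULL);
	for (size_t i = 0; i < n; i++) {
		if (zones[i].free_objects > 0) {
			zones[i].free_objects--;
			return (int)i; /* stop after the first success */
		}
	}
	return -1;
}
```

Without the early return, the loop would take one object from every remaining zone and use only the last.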
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netns: Don't receive new packets in a dead network namespace.
sctp: Make sure N * sizeof(union sctp_addr) does not overflow.
pppoe: warning fix
ipv6: Drop packets for loopback address from outside of the box.
ipv6: Remove options header when setsockopt's optlen is 0
mac80211: detect driver tx bugs