Linus Torvalds [Tue, 9 Jan 2007 17:35:16 +0000 (09:35 -0800)]
Merge branch 'merge' of master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix bugs in the hypervisor call stats code
[POWERPC] Fix corruption in hcall9
[POWERPC] iSeries: fix setup initcall
[POWERPC] iSeries: fix viopath initialisation
[POWERPC] iSeries: fix lpevents initialisation
[POWERPC] iSeries: fix proc/iSeries initialisation
[POWERPC] iSeries: fix mf proc initialisation
[POWERPC] disable PReP and EFIKA during make oldconfig
[POWERPC] Fix mpc52xx serial driver to work for arch/ppc again
[POWERPC] Don't include powerpc/sysdev/rom.o for arch/ppc builds
[POWERPC] Fix mpc52xx fdt to use correct device_type for sound devices
[POWERPC] 52xx: Don't use device_initcall to probe of_platform_bus
[POWERPC] Add legacy iSeries to ppc64_defconfig
[POWERPC] Update ppc64_defconfig
[POWERPC] Fix manual assembly WARN_ON() in enter_rtas().
[POWERPC] Avoid calling get_irq_server() with a real, not virtual irq.
[POWERPC] Fix unbalanced uses of of_node_put
[POWERPC] Fix bogus BUG_ON() in in hugetlb_get_unmapped_area()
Linus Torvalds [Tue, 9 Jan 2007 17:34:20 +0000 (09:34 -0800)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] locking problem with __cpcmd.
[S390] don't call handle_mm_fault() if in an atomic context.
[S390] Fix vmalloc area size calculation.
[S390] Fix cpu hotplug (missing 'online' attribute).
[S390] cio: use barrier() in stsch_reset.
[S390] memory detection misses 128k.
Mark M. Hoffman [Tue, 9 Jan 2007 03:11:29 +0000 (22:11 -0500)]
[PATCH] i2c/pci: fix sis96x smbus quirk once and for all
The sis96x SMBus PCI device depends on two different quirks to run
in a specific order. Apart from being fragile, this was found to
actually break on (at least) recent FC4, FC5, and FC6 kernels. This
patch fixes the quirks so that they work without relying on the
compiler and/or linker to put them in any specific order.
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Adrian Bunk <bunk@stusta.de> Cc: Greg K-H <greg@kroah.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changeset 740b5706b9c4b3767f597b3ea76654c6f2a800b2 moved the protecting
spinlock from __cpcmd to cpcmd. Therefore vmcp can no longer use __cpcmd,
instead we have to use cpcmd.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 9 Jan 2007 09:18:50 +0000 (10:18 +0100)]
[S390] don't call handle_mm_fault() if in an atomic context.
There are several places in the futex code where a spin_lock is held
and still uaccesses happen. Deadlocks are avoided by increasing the
preempt count. The pagefault handler will then not take any locks
but will immediately search the fixup tables.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 9 Jan 2007 09:18:47 +0000 (10:18 +0100)]
[S390] Fix vmalloc area size calculation.
setup_memory_end() uses VMALLOC_END instead of VMALLOC_END_INIT to
calculate the maximum supported size of physical memory. Since
VMALLOC_END is zero, this will cause a crash on 31 bit systems.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 9 Jan 2007 09:18:44 +0000 (10:18 +0100)]
[S390] Fix cpu hotplug (missing 'online' attribute).
72486f1f8f0a2bc828b9d30cf4690cf2dd6807fc inverts the logic if an
'online' attribute in /sys/devices/system/cpu/cpuX should appear.
So we end up with no hotpluggable cpus at all...
Set the hotpluggable value to one to make sure the online
attribute appears again.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 9 Jan 2007 09:18:41 +0000 (10:18 +0100)]
[S390] cio: use barrier() in stsch_reset.
Use barrier() in stsch_reset() instead of duplicating the stsch()
inline assembly and adding "memory" to the clobberlist.
Pointed out by Chuck Ebbert.
Real fix would be to add a fixup section to the stsch() and extend the
basic program check handler so it searches the exception tables in case
of a program check.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hongjie Yang [Tue, 9 Jan 2007 09:18:36 +0000 (10:18 +0100)]
[S390] memory detection misses 128k.
Fix a memory leak problem in the memory detection routines. A memory leak
of 128k occurs when we have a contiguous memory with mixed access-mode
(read or write) ranges.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Anton Blanchard [Mon, 8 Jan 2007 15:43:02 +0000 (02:43 +1100)]
[POWERPC] Fix bugs in the hypervisor call stats code
There were a few issues with the HCALL_STATS code:
- PURR cpu feature checks were backwards
- We iterated one entry off the end of the hcall_stats array
- Remove dead update_hcall_stats() function prototype
I noticed one thing while debugging, and that is we call H_ENTER (to set
up the MMU hashtable in early init) before we have done the cpu fixups.
This means we will execute the PURR SPR reads even on a CPU that isnt
capable of it. I wonder if we can move the CPU feature fixups earlier.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Anton Blanchard [Mon, 8 Jan 2007 15:37:16 +0000 (02:37 +1100)]
[POWERPC] Fix corruption in hcall9
It looks to me like we are corrupting r12 in the hcall9 function.
Although we have r0 free we cant use offsets against it, so save
away r12 in there instead. r12 holds the ninth return value from
the hypervisor call, so without this fix, the caller will see the
wrong value for the ninth element in the array that gets the return
values.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
This proc file should only be created if we are running on legacy
iSeries. Since we can now run the same kernel on legacy iSeries and
other machines, we currently get the /proc/iSeries directory and the
files in it on non-iSeries machines, and accessing them causes an oops
in some cases. This and the following patches make sure that these
files are not created on non-iSeries machines, thus avoiding the oops.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Olaf Hering [Wed, 3 Jan 2007 17:33:56 +0000 (18:33 +0100)]
[POWERPC] disable PReP and EFIKA during make oldconfig
New boards should not be enabled per default.
Disable EFIKA and PReP per default.
Anyone who really needes the new code can enable it during make oldconfig.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
Grant Likely [Tue, 2 Jan 2007 22:44:44 +0000 (15:44 -0700)]
[POWERPC] Fix mpc52xx fdt to use correct device_type for sound devices
This corrects the documented interface for mpc52xx device trees.
Sound devices should be using 'sound' for the device_type field, not
the type of sound interface.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Sylvain Munaut [Tue, 2 Jan 2007 22:29:53 +0000 (23:29 +0100)]
[POWERPC] 52xx: Don't use device_initcall to probe of_platform_bus
Using device_initcall makes it happen for every platform that
compiles this file in. This is really bad, for obvious reasons.
Instead, we use the .init field of the machine description. If
the platform needs the hook to do something specific it can provides
its own function and call mpc52xx_declare_of_platform_devices from
there. If not, the mpc52xx_declare_of_platform_devices function can
directly be used as the init hook.
Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
Enabled new netfilter stuff corresponding to what was enabled before
under different names, and turned on the gxt4500 video driver;
otherwise just took the defaults.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Woodhouse [Mon, 1 Jan 2007 18:45:34 +0000 (18:45 +0000)]
[POWERPC] Fix manual assembly WARN_ON() in enter_rtas().
When we switched over to the generic BUG mechanism we forgot to change
the assembly code which open-codes a WARN_ON() in enter_rtas(), so the
bug table got corrupted.
This patch provides an EMIT_BUG_ENTRY macro for use in assembly code,
and uses it in entry_64.S. Tested with CONFIG_DEBUG_BUGVERBOSE on ppc64
but not without -- I tried to turn it off but it wouldn't go away; I
suspect Aunt Tillie probably needed it.
This version gets __FILE__ and __LINE__ right in the assembly version --
rather than saying include/asm-powerpc/bug.h line 21 every time which is
a little suboptimal.
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Nathan Lynch [Tue, 2 Jan 2007 22:37:06 +0000 (16:37 -0600)]
[POWERPC] Fix unbalanced uses of of_node_put
The (maple|pasemi)_init_IRQ functions call of_node_put(root) once more
than they should, causing the refcount of the root node to underflow,
which triggers the WARN_ON in kref_get.
Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson [Thu, 21 Dec 2006 22:23:03 +0000 (09:23 +1100)]
[POWERPC] Fix bogus BUG_ON() in in hugetlb_get_unmapped_area()
The powerpc specific version of hugetlb_get_unmapped_area() makes some
unwarranted assumptions about what checks have been made to its
parameters by its callers. This will lead to a BUG_ON() if a 32-bit
process attempts to make a hugepage mapping which extends above
TASK_SIZE (4GB).
I'm not sure if these assumptions came about because they were valid
with earlier versions of the get_unmapped_area() path, or if it was
always broken. Nonetheless this patch fixes the logic, and removes
the crash.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Mon, 8 Jan 2007 23:04:46 +0000 (15:04 -0800)]
Revert "[PATCH] x86-64: Try multiple timer variants in check_timer"
This reverts commit b026872601976f666bae77b609dc490d1834bf77, which has
been linked to several problem reports with IO-APIC and the timer.
Machines either don't boot because the timer doesn't happen, or we get
double timer interrupts because we end up double-routing the timer irq
through multiple interfaces.
Patches to fix this cleanup exist (and have been confirmed to work fine
at least for some of the affected cases) and we'll revisit it for
2.6.21, but this late in the -rc series we're better off just reverting
the incomplete commit that caused the problems.
Suggested-by: Adrian Bunk <bunk@stusta.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Yinghai Lu <yinghai.lu@amd.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Vitaly Wool [Thu, 28 Dec 2006 14:14:05 +0000 (17:14 +0300)]
[MIPS] PNX8550: Fix system timer support
the patch inlined below restores proper time accounting for PNX8550-based
boards. It also gets rid of #ifdef in the generic code which becomes
unnecessary then.
It's functionally identical to the previous patch with the same name but
it has minor comments from Atsushi and Sergei taken into account.
Atsushi Nemoto [Sun, 17 Dec 2006 15:38:21 +0000 (00:38 +0900)]
[MIPS] TX49: Fix use of CDEX build_store_reg()
The commit a923660d786a53e78834b19062f7af2535f7f8ad accidently
prevents TX49 from using CDEX. Use build_dst_pref() only if prefetch
for store was really available.
Davy Chan [Fri, 5 Jan 2007 05:56:46 +0000 (13:56 +0800)]
[MIPS] pnx8550: Fix write_config_byte() PCI config space accessor
There's a serious typo in the function:
arch/mips/pci/ops-pnx8550.c:write_config_byte()
The parameter passed to the function config_access() is PCI_CMD_CONFIG_READ
instead of PCI_CMD_CONFIG_WRITE. This renders any attempts to write
a single byte to the PCI configuration registers useless.
This problem does not exist for write_config_word() nor write_config_dword().
This problem has been there since kernel v2.6.17 and is still there
as of kernel v2.6.19.1.
Atsushi Nemoto [Tue, 12 Dec 2006 16:22:06 +0000 (01:22 +0900)]
[MIPS] csum_partial and copy in parallel
Implement optimized asm version of csum_partial_copy_nocheck,
csum_partial_copy_from_user and csum_and_copy_to_user which can do
calculate and copy in parallel, based on memcpy.S.
Russell King [Mon, 8 Jan 2007 19:49:12 +0000 (19:49 +0000)]
[ARM] Provide basic printk_clock() implementation
Current sched_clock() implementations on ARM cause unbootable kernels
with PRINTK_TIME support enabled. To avoid this, provide a basic
printk_clock() implementation which avoids sched_clock() being called
before the page tables have been set up.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 30 Dec 2006 23:17:40 +0000 (23:17 +0000)]
[ARM] Resolve fuse and direct-IO failures due to missing cache flushes
fuse does not work on ARM due to cache incoherency issues - fuse wants
to use get_user_pages() to copy data from the current process into
kernel space. However, since this accesses userspace via the kernel
mapping, the kernel mapping can be out of date wrt data written to
userspace.
This can lead to unpredictable behaviour (in the case of fuse) or data
corruption for direct-IO.
This resolves debian bug #402876
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 30 Dec 2006 22:24:19 +0000 (22:24 +0000)]
[ARM] pass vma for flush_anon_page()
Since get_user_pages() may be used with processes other than the
current process and calls flush_anon_page(), flush_anon_page() has to
cope in some way with non-current processes.
It may not be appropriate, or even desirable to flush a region of
virtual memory cache in the current process when that is different to
the process that we want the flush to occur for.
Therefore, pass the vma into flush_anon_page() so that the architecture
can work out whether the 'vmaddr' is for the current process or not.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 8 Jan 2007 16:42:51 +0000 (16:42 +0000)]
[ARM] Fix potential MMCI bug
The MMCI driver might end up aborting the initial command and leaving
the data part of the command sequence still in place. Avoid this
problem by ensuring that any data sequence is properly cleared out
when a command completes.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If the kernel attempts to execute a CP1 or CP2 instruction and it
aborts, and a FP emulator is not loaded, we try to return as if to
a user context, instead of the proper kernel context. Since the
fault came from kernel mode, we must use the kernel return paths.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Hugh Dickins reports that it causes random failures on x86 with SuSE
10.2, and points out
"Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE,
sure to place the ET_DYN from time to time just where the comment
says it's trying to avoid? I assume that somehow results in the error
reported."
(where the comment in question is the existing comment in the source
code about mmap/brk clashes).
Suggested-by: Hugh Dickins <hugh@veritas.com> Acked-by: Marcus Meissner <meissner@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: asix: Fix AX88772 device PHY selection
USB: usblp.c - add Kyocera Mita FS 820 to list of "quirky" printers
sisusb_con warning fixes
USB: Fixed bug in endpoint release function.
USB: small update to Documentation/usb/acm.txt
USB storage: fix ipod ejecting issue
USB Storage: unusual_devs: add supertop drives
USB: omap_udc build fixes (sync with linux-omap)
USB: funsoft is borken on sparc
USB: fix interaction between different interfaces in an "Option" usb device
UHCI: support device_may_wakeup
UHCI: make test for ASUS motherboard more specific
Linus Torvalds [Sat, 6 Jan 2007 08:09:14 +0000 (00:09 -0800)]
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c/m41t00: Do not forget to write year
i2c-mv64xxx: Fix random oops at boot
i2c: Migration aids for i2c_adapter.dev removal
i2c-pnx: Add entry to MAINTAINERS
i2c-pnx: Fix interrupt handler, get rid of EARLY config option
Looks like this is the problem, which point Al Viro some time ago:
ufs's get_block callback allocates 16k of disk at a time, and links that
entire 16k into the file's metadata. But because get_block is called for only
a single buffer_head (a 2k buffer_head in this case?) we are only able to tell
the VFS that this 2k is buffer_new().
So when ufs_getfrag_block() is later called to map some more data in the file,
and when that data resides within the remaining 14k of this fragment,
ufs_getfrag_block() will incorrectly return a !buffer_new() buffer_head.
I don't see _right_ way to do nullification of whole block, if use inode
page cache, some pages may be outside of inode limits (inode size), and
will be lost; if use blockdev page cache it is possible to zero real data,
if later inode page cache will be used.
The simpliest way, as can I see usage of block device page cache, but not only
mark dirty, but also sync it during "nullification". I use my simple tests
collection, which I used for check that create,open,write,read,close works on
ufs, and I see that this patch makes ufs code 18% slower then before.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 6 Jan 2007 00:37:03 +0000 (16:37 -0800)]
[PATCH] fix OOM killing of swapoff
These days, if you swapoff when there isn't enough memory, OOM killer gives
"BUG: scheduling while atomic" and the machine hangs: badness() needs to do
its PF_SWAPOFF return after the task_unlock (tasklist_lock is also held
here, so p isn't going to be freed: PF_SWAPOFF might get turned off at any
moment, but that doesn't really matter).
[PATCH] fix the toshiba_acpi write_lcd return value
write_lcd() in toshiba_acpi returns 0 on success since the big ACPI patch
merged in 2.6.20-rc2. It should return count.
Signed-off-by: Matthijs van Otterdijk <thotter@gmail.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both process_zones() and drain_node_pages() check for populated zones
before touching pagesets. However, __drain_pages does not do so,
This may result in a NULL pointer dereference for pagesets in unpopulated
zones if a NUMA setup is combined with cpu hotplug.
Initially the unpopulated zone has the pcp pointers pointing to the boot
pagesets. Since the zone is not populated the boot pageset pointers will
not be changed during page allocator and slab bootstrap.
If a cpu is later brought down (first call to __drain_pages()) then the pcp
pointers for cpus in unpopulated zones are set to NULL since __drain_pages
does not first check for an unpopulated zone.
If the cpu is then brought up again then we call process_zones() which will
ignore the unpopulated zone. So the pageset pointers will still be NULL.
If the cpu is then again brought down then __drain_pages will attempt to
drain pages by following the NULL pageset pointer for unpopulated zones.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan [Sat, 6 Jan 2007 00:37:01 +0000 (16:37 -0800)]
[PATCH] hpt37x: Two important bug fixes
The HPT37x driver very carefully handles DMA completions and the needed
fixups are done on pci registers 0x50 and 0x52. This is unfortunate
because the actual registers are 0x50 and 0x54. Fixing this offset cures
the second channel problems reported.
Secondly there are some problems with the HPT370 and certain ATA drives.
The filter code however only filters ATAPI devices due to a reversed type
check.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:53 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: Replace atomic allocations by preallocated objects
The mmu sometimes needs memory for reverse mapping and parent pte chains.
however, we can't allocate from within the mmu because of the atomic context.
So, move the allocations to a central place that can be executed before the
main mmu machinery, where we can bail out on failure before any damage is
done.
(error handling is deffered for now, but the basic structure is there)
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In fork() (or when we protect a page that is no longer a page table), we can
experience floods of writes to a page, which have to be emulated. This is
expensive.
So, if we detect such a flood, zap the page so subsequent writes can proceed
natively.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:47 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: Remove invlpg interception
Since we write protect shadowed guest page tables, there is no need to trap
page invalidations (the guest will always change the mapping before issuing
the invlpg instruction).
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:45 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: If emulating an instruction fails, try unprotecting the page
A page table may have been recycled into a regular page, and so any
instruction can be executed on it. Unprotect the page and let the cpu do its
thing.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:44 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: Let the walker extract the target page gfn from the pte
This fixes a problem where set_pte_common() looked for shadowed pages based on
the page directory gfn (a huge page) instead of the actual gfn being mapped.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:43 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: Write protect guest pages when a shadow is created for them
When we cache a guest page table into a shadow page table, we need to prevent
further access to that page by the guest, as that would render the cache
incoherent.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:43 +0000 (16:36 -0800)]
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avi Kivity [Sat, 6 Jan 2007 00:36:40 +0000 (16:36 -0800)]
[PATCH] KVM: MU: Special treatment for shadow pae root pages
Since we're not going to cache the pae-mode shadow root pages, allocate a
single pae shadow that will hold the four lower-level pages, which will act as
roots.
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>