Stefan Richter [Fri, 25 Jan 2008 17:57:41 +0000 (18:57 +0100)]
firewire: enforce access order between generation and node ID, fix "giving up on config rom"
fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.
Fixes an inability to recognize devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.
Verified to fix 'giving up on config rom' issues on multiple system and
drive combinations that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Stefan Richter [Thu, 24 Jan 2008 00:53:51 +0000 (01:53 +0100)]
firewire: fw-cdev: use device generation, not card generation
We have to use the fw_device.generation here, not the fw_card.generation,
because the generation must never be newer than the node ID when we emit
a transaction. This cannot be guaranteed with fw_card.generation.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with subsequent memory barriers patch to fix 'giving
up on config rom' issues on multiple system and drive combinations that
were previously affected.
Stefan Richter [Thu, 24 Jan 2008 00:53:19 +0000 (01:53 +0100)]
firewire: fw-sbp2: use device generation, not card generation
There was a small window where a login or reconnect job could use an
already updated card generation with an outdated node ID. We have to
use the fw_device.generation here, not the fw_card.generation, because
the generation must never be newer than the node ID when we emit a
transaction. This cannot be guaranteed with fw_card.generation.
Furthermore, the target's and initiator's node IDs can be obtained from
fw_device and fw_card. Dereferencing their underlying topology objects
is not necessary.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with subsequent memory barriers patch to fix 'giving
up on config rom' issues on multiple system and drive combinations that
were previously affected.
Stefan Richter [Sun, 20 Jan 2008 00:25:31 +0000 (01:25 +0100)]
firewire: fw-sbp2: try to increase reconnect_hold (speed up reconnection)
Ask the target to grant a 4-second reconnection window after bus reset
instead of the standard (and minimum) 1-second window. This accelerates
reconnection if there is more than one target on the bus: if a login
and inquiry to one target blocks the fw-sbp2 workqueue for more than 1s
after bus reset, we can still reconnect to the other target.
Before that, fw-sbp2's reconnect attempts would be rejected with "error
status: 0:9" (function rejected), and fw-sbp2 would finally re-login.
All those futile reconnect attempts cost extra time until the target
which needs re-login is ready for I/O again.
The reconnect timeout field in the login ORB doesn't have to be honored
by the target though. I found that we could get up to
- allegedly 32768s from an old OXFW911 firmware
- 256s from LSI bridges
- 4s from OXUF922 and OXFW912 bridges,
- 2s from TI bridges,
- only the standard 1s from Initio and Prolific bridges and from
Apple OpenFirmware in target mode.
We just try to get 4 seconds which already covers the case of a few
HDDs on the same bus quite nicely.
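A sketch of the arithmetic behind the 4-second request. It assumes the SBP-2 encodings in which the login ORB's reconnect field asks for 2^reconnect seconds and the login response's reconnect_hold field grants reconnect_hold + 1 seconds; both should be verified against the SBP-2 specification. The helper names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helpers. Assumed encodings (check against SBP-2):
 *   login ORB:      4-bit reconnect field requests 2^reconnect seconds
 *   login response: reconnect_hold grants reconnect_hold + 1 seconds
 */
static unsigned int login_orb_reconnect_field(unsigned int seconds)
{
	unsigned int exp = 0;

	assert(seconds >= 1);
	/* Largest exponent whose 2^exp window does not exceed the request. */
	while (exp < 15 && (2u << exp) <= seconds)
		exp++;
	return exp;
}

static unsigned int granted_reconnect_seconds(unsigned int reconnect_hold)
{
	return reconnect_hold + 1;
}
```

Under these assumptions, a 4-second request is encoded as 2, and a target that answers with reconnect_hold = 0 grants only the standard 1 second.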
A minor drawback occurs in the following (rare and impractical) border
case:
- two initiators are there, initiator 1 holds an exclusive login to
a target,
- initiator 1 goes off the bus,
- target refuses login attempts from initiator 2 until reconnect_hold
seconds after bus reset.
An alternative approach to the issue at hand would be to parallelize
fw-sbp2's reconnect and login work.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jwilson@redhat.com>
Stefan Richter [Sun, 20 Jan 2008 00:24:26 +0000 (01:24 +0100)]
firewire: fw-sbp2: skip unnecessary logout
Don't attempt to send a logout ORB if the target was already unplugged
or had its link switched off. If two targets are attached, this
enhances the chance to quickly reconnect to the remaining target when
one target is plugged out.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jwilson@redhat.com>
David Moore [Sun, 6 Jan 2008 22:21:41 +0000 (17:21 -0500)]
firewire: fw-ohci: Dynamically allocate buffers for DMA descriptors
Previously, the fw-ohci driver used fixed-length buffers for storing
descriptors for isochronous receive DMA programs. If an application
(such as libdc1394) generated a DMA program that was too large, fw-ohci
would reach the limit of its fixed-sized buffer and return an error to
userspace.
This patch replaces the fixed-length ring-buffer with a linked-list of
page-sized buffers. Additional buffers can be dynamically allocated and
appended to the list when necessary. For a particular context, buffers
are kept around after use and reused as necessary, so there is no
allocation taking place after the DMA program is generated for the first
time.
In addition, the buffers it uses are coherent for DMA so there is no
syncing required before and after writes. This syncing wasn't properly
done in the previous version of the code.
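A simplified model of the buffer-list idea. It assumes 4096-byte pages and uses calloc() where the driver would use coherent DMA memory. struct desc_buffer, struct desc_context and the helpers are hypothetical and much smaller than fw-ohci's real context code. Buffers are appended only when the list runs out, and rewinding reuses them, so a second program of the same size allocates nothing.

```c
#include <assert.h>
#include <stdlib.h>

#define BUFFER_SIZE 4096 /* assumption: one page per buffer */

/* Hypothetical model, not fw-ohci's actual structures. */
struct desc_buffer {
	struct desc_buffer *next;
	size_t used; /* bytes handed out from this buffer */
	unsigned char data[BUFFER_SIZE];
};

struct desc_context {
	struct desc_buffer *head;    /* first buffer in the list */
	struct desc_buffer *current; /* buffer currently being filled */
	size_t allocations;          /* buffers ever allocated */
};

/* Return room for 'size' bytes; allocate only if no spare buffer follows. */
static void *context_get_space(struct desc_context *ctx, size_t size)
{
	assert(size <= BUFFER_SIZE);
	if (ctx->current == NULL || ctx->current->used + size > BUFFER_SIZE) {
		struct desc_buffer *next =
			ctx->current ? ctx->current->next : ctx->head;

		if (next == NULL) {
			next = calloc(1, sizeof(*next));
			if (next == NULL)
				return NULL;
			ctx->allocations++;
			if (ctx->current)
				ctx->current->next = next;
			else
				ctx->head = next;
		}
		next->used = 0; /* reuse an old buffer from the start */
		ctx->current = next;
	}
	void *p = ctx->current->data + ctx->current->used;
	ctx->current->used += size;
	return p;
}

/* Start the next program at the first buffer, keeping the list. */
static void context_rewind(struct desc_context *ctx)
{
	ctx->current = NULL;
}

static void context_free(struct desc_context *ctx)
{
	struct desc_buffer *b = ctx->head;

	while (b) {
		struct desc_buffer *n = b->next;
		free(b);
		b = n;
	}
	ctx->head = ctx->current = NULL;
}
```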
-
This is the fourth version of my patch that replaces a fixed-length
buffer for DMA descriptors with a dynamically allocated linked-list of
buffers.
As we discovered with the last attempt, new context programs are
sometimes queued from interrupt context, making it unacceptable to call
tasklet_disable() from context_get_descriptors().
This version of the patch uses ohci->lock for all locking needs instead
of tasklet_disable/enable. There is a new requirement that
context_get_descriptors() be called while holding ohci->lock. It was
already held for the AT context, so adding the requirement for the iso
context did not seem particularly onerous. In addition, this has the
side benefit of allowing iso queue to be safely called from concurrent
user-space threads, which previously was not safe.
Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
-
Fixes the following issues:
- Isochronous reception stopped prematurely if an application used a
larger buffer. (Reproduced with coriander.)
- Isochronous reception stopped after one or a few frames on VT630x
in OHCI 1.0 mode. (Fixes reception in coriander, but dvgrab still
doesn't work with these chips.)
Patch update: struct member alignment, whitespace nits
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
The firewire-ohci driver so far lacked the ability to resume cycle
master duty after a "cycle too long" condition, a capability added to
ohci1394 in Linux 2.6.18 by commit 57fdb58fa5a140bdd52cf4c4ffc30df73676f0a5.
This patch ports that change to fw-ohci.
The "cycle too long" condition has been seen in practice
- with IIDC cameras if a mode with packets too large for a speed is
chosen,
- sporadically when capturing DV on a VIA VT6306 card with ohci1394/
ieee1394/ raw1394/ dvgrab 2.
https://bugzilla.redhat.com/show_bug.cgi?id=415841#c7
(This does not fix Fedora bug 415841.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
David Moore [Wed, 19 Dec 2007 20:26:38 +0000 (15:26 -0500)]
firewire: fw-ohci: Bug fixes for packet-per-buffer support
This patch corrects a number of bugs in the current OHCI 1.0
packet-per-buffer support:
1. Correctly deal with payloads that cross a page boundary. The
previous version would not split the descriptor at such a boundary,
potentially corrupting unrelated memory.
2. Allow user-space to specify multiple packets per struct
fw_cdev_iso_packet in the same way that dual-buffer allows. This is
signaled by header_length being a multiple of header_size. This
multiple determines the number of packets. The payload size allocated
per packet is determined by dividing the total payload size by the
number of packets.
3. Make sync support work properly for packet-per-buffer.
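A sketch of the arithmetic in items 1 and 2, assuming 4 KiB pages; the helper names are hypothetical. A payload that would cross a page boundary is cut at the boundary, and header_length / header_size gives the number of packets, each of which receives an equal share of the payload.

```c
#include <assert.h>

#define PAGE_SIZE_MODEL 4096u /* assumption: 4 KiB pages */

/* Item 2: number of packets described by one iso packet request. */
static unsigned int packet_count(unsigned int header_length,
				 unsigned int header_size)
{
	assert(header_size > 0 && header_length % header_size == 0);
	return header_length / header_size;
}

/* Item 2: payload bytes allotted to each of those packets. */
static unsigned int payload_per_packet(unsigned int payload_length,
				       unsigned int packets)
{
	assert(packets > 0);
	return payload_length / packets;
}

/* Item 1: bytes that fit before the next page boundary; any remainder
 * needs its own descriptor. */
static unsigned int bytes_before_page_boundary(unsigned int offset,
					       unsigned int length)
{
	unsigned int room = PAGE_SIZE_MODEL - offset % PAGE_SIZE_MODEL;

	return length < room ? length : room;
}
```

For example, header_length = 32 with header_size = 8 describes 4 packets, and a 4096-byte payload then gives each packet 1024 bytes.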
I have tested this patch with libdc1394 by forcing my OHCI 1.1
controller to use the packet-per-buffer support instead of dual-buffer.
I would greatly appreciate testing by those who have DV devices and
other types of iso streamers to make sure I didn't cause any
regressions.
Stefan, with this patch, I'm hoping that libdc1394 will work with all
your OHCI 1.0 controllers now.
The one bit of future work that remains for packet-per-buffer support is
the automatic compaction of short payloads that I discussed with
Kristian.
Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
David Moore [Wed, 19 Dec 2007 08:09:18 +0000 (03:09 -0500)]
firewire: fw-ohci: Fix for dualbuffer three-or-more buffers
This patch fixes the problem where different OHCI 1.1 controllers behave
differently when a received iso packet straddles three or more buffers
when using the dual-buffer receive mode. Two changes are made in order
to handle this situation:
1. The packet sync DMA descriptor is given a non-zero header length and
non-zero payload length. This is because zero-payload descriptors are
not discussed in the OHCI 1.1 specs and their behavior is thus
undefined. Instead we use a header size just large enough for a single
header and a payload length of 4 bytes for this first descriptor.
2. As we process received packets in the context's tasklet, read the
packet length out of the headers. Keep track of the running total of
the packet length as "excess_bytes", so we can ignore any descriptors
where no packet starts or ends. These descriptors may not have had
their first_res_count or second_res_count fields updated by the
controller so we cannot rely on those values.
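A simplified model of the excess_bytes bookkeeping in item 2. It assumes packets are stored back to back in fixed-size payload buffers, ignores the separate header buffer and the real descriptor layout, and uses a hypothetical function name. excess holds the bytes of the current packet still to come at the start of each buffer; while that exceeds a whole buffer, the buffer contains neither a packet start nor a packet end and can be skipped without consulting its res_count fields.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: packets of pkt_len[] bytes are stored back to back,
 * from offset 0, across n_bufs payload buffers of buf_size bytes each.
 * Returns how many buffers lie entirely inside a single packet.
 */
static size_t count_skippable_buffers(const size_t *pkt_len, size_t n_pkts,
				      size_t buf_size, size_t n_bufs)
{
	size_t excess = 0;  /* bytes of the current packet still to come */
	size_t next = 0;    /* index of the next packet to begin */
	size_t skipped = 0;

	assert(buf_size > 0);
	for (size_t i = 0; i < n_bufs; i++) {
		if (excess > buf_size) {
			/* The current packet spans this entire buffer. */
			excess -= buf_size;
			skipped++;
			continue;
		}
		/* A packet ends here, or the buffer begins on a packet boundary. */
		size_t pos = excess;
		while (pos < buf_size && next < n_pkts)
			pos += pkt_len[next++];
		excess = pos > buf_size ? pos - buf_size : 0;
	}
	return skipped;
}
```

With 100-byte buffers and packets of 250 and 50 bytes, only the middle buffer lies entirely inside the first packet, so one buffer is skipped.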
The main drawback of this patch is that the excess_bytes value might get
"out of sync" with the packet descriptors if something strange happens
to the DMA program. I'm not sure if such a thing could ever happen, but I
would appreciate any suggestions for making it more robust.
Also, the packet-per-buffer support may need a similar fix to deal with
issue 1, but I haven't done any work on that yet.
Stefan, I'm hoping that with this patch, all your OHCI 1.1 controllers
will work properly with an unmodified version of libdc1394.
Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sun, 16 Dec 2007 19:53:13 +0000 (20:53 +0100)]
ieee1394: ohci1394: don't schedule IT tasklets on IR events
Bug noted by Pieter Palmers: Isochronous transmit tasklets were
scheduled on isochronous receive events, in addition to the proper
isochronous receive tasklets.
Stefan Richter [Sun, 16 Dec 2007 16:31:26 +0000 (17:31 +0100)]
ieee1394: sbp2: raise default transfer size limit
This patch speeds up sbp2 a little bit --- but more importantly, it
brings the behavior of sbp2 and fw-sbp2 closer to each other. Like
fw-sbp2, sbp2 now does not limit the size of single transfers to 255
sectors anymore, unless told so by a blacklist flag or by module load
parameters.
Only very old bridge chips have been known to need the 255 sectors
limit, and we have got one such chip in our hardwired blacklist. There
certainly is a danger that more bridges need that limit; but I prefer to
have this issue present in both fw-sbp2 and sbp2 rather than just one of
them.
An OXUF922 with a 400GB 7200RPM disk on an S400 controller is sped up by
this patch from 22.9 to 23.5 MB/s according to hdparm. The same effect
could be achieved before by setting a higher max_sectors module
parameter. On buses which use 1394b beta mode, sbp2 and fw-sbp2 will
now achieve virtually the same bandwidth. Fw-sbp2 only remains faster
on 1394a buses due to fw-core's gap count optimization.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Nick Piggin [Wed, 5 Dec 2007 07:15:53 +0000 (18:15 +1100)]
ieee1394: nopage
Convert ieee1394 from nopage to fault.
Remove redundant vma range checks (correct resource range check is retained).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits)
x86: fix nodemap_size according to nodeid bits
x86: fix overlap between pagetable with bss section
x86: add PCI IDs to k8topology_64.c
x86: fix early_ioremap pagetable ops
x86: use the same pgd_list for PAE and 64-bit
x86: defer cr3 reload when doing pud_clear()
x86: early boot debugging via FireWire (ohci1394_dma=early)
x86: don't special-case pmd allocations as much
x86: shrink some ifdefs in fault.c
x86: ignore spurious faults
x86: remove nx_enabled from fault.c
x86: unify fault_32|64.c
x86: unify fault_32|64.c with ifdefs
x86: unify fault_32|64.c by ifdef'd function bodies
x86: arch/x86/mm/init_32.c printk fixes
x86: arch/x86/mm/init_32.c cleanup
x86: arch/x86/mm/init_64.c printk fixes
x86: unify ioremap
x86: fixes some bugs about EFI memory map handling
x86: use reboot_type on EFI 32
...
Both the old e1000 driver and the new e1000e driver can drive some
PCI-Express e1000 cards, and we should avoid ambiguity about which
driver will pick up the support for those cards when both drivers are
enabled.
This solves the problem by having the old driver support those cards if
the new driver isn't configured, but otherwise ceding support for PCI
Express versions of the e1000 chipset to the newer driver. This allows
both legacy configurations where only the old driver is active (and
handles all chips it knows about) and the new configuration with the
new driver handling the more modern PCIe variants.
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 30 Jan 2008 13:26:10 +0000 (00:26 +1100)]
Make !NETFILTER_ADVANCED enable IP6_NF_MATCH_IPV6HEADER
We want IPV6HEADER matching for the non-advanced default netfilter
configuration, since it's part of the standard netfilter setup of at
least some distributions (eg Fedora).
Otherwise NETFILTER_ADVANCED loses much of its point, since even
non-advanced users would have to enable all the advanced options just to
get a working IPv6 netfilter setup.
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PAE mode requires that we reload cr3 in order to guarantee that
changes to the pgd will be noticed by the processor. This means that
in principle pud_clear needs to reload cr3 every time. However,
because reloading cr3 implies a tlb flush, we want to avoid it where
possible.
pud_clear() is only used in a couple of places:
- in free_pmd_range(), when pulling down a range of process address space, and
- huge_pmd_unshare()
In both cases, the calling code will do a TLB flush anyway, so
there's no need to do it within pud_clear().
In free_pmd_range(), the pud_clear is immediately followed by
pmd_free_tlb(); we can hook that to make the mmu_gather do an
unconditional full flush to make sure cr3 gets reloaded.
In huge_pmd_unshare, it is followed by flush_tlb_range, which always
results in a full cr3-reload tlb flush.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bernhard Kaindl [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide whether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues in ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
parameter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
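A userspace sketch of matching the boot parameter. It assumes the option is written exactly as ohci1394_dma=early on a space-separated command line, and the function name is hypothetical; in the kernel the parsing goes through the early_param mechanism rather than a hand-written scan like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: true only for a whole-word "ohci1394_dma=early". */
static bool want_ohci1394_dma_early(const char *cmdline)
{
	static const char key[] = "ohci1394_dma=";
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		/* The key must begin a word: start of line or after a space. */
		bool at_word_start = (p == cmdline || p[-1] == ' ');
		const char *val = p + strlen(key);
		size_t len = strcspn(val, " ");

		if (at_word_start && len == 5 && strncmp(val, "early", 5) == 0)
			return true;
		p = val; /* keep searching after this occurrence */
	}
	return false;
}
```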
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit. Even memory dumps of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so it does not matter if the machine has crashed early.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
gdb to talk to kgdb using remote memory reads and writes over FireWire.
A version of the gdb stub for FireWire is able to read all global data
from a system which is running a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it.
Another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In x86 PAE mode, stop treating pmds as a special case. Previously
they were always allocated and freed with the pgd. This modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.
This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.
There is a complicating wart, however. When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3. Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid wherever possible.
This patch simply avoids reloading cr3 unless the update is to the
current pagetable. Later patches will optimise this further.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When changing a kernel page from RO->RW, it's OK to leave stale TLB
entries around, since doing a global flush is expensive and they pose
no security problem. They can, however, generate a spurious fault,
which we should catch and simply return from (which will have the
side-effect of reloading the TLB to the current PTE).
This can occur when running under Xen, because it frequently changes
kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
doing a global TLB flush after changing page permissions.
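A simplified model of the spurious-fault test, using hypothetical flag values rather than the x86 PTE and error-code encodings. A fault counts as spurious when the current PTE already permits the access that faulted, which is what a stale TLB entry left over from an RO->RW change looks like; simply returning lets the CPU refill the TLB from the current PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for the model, not the x86 encodings. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_NX      0x4u

#define FAULT_WRITE 0x1u /* the faulting access was a write */
#define FAULT_INSTR 0x2u /* the faulting access was an instruction fetch */

static bool fault_is_spurious(unsigned int fault_flags, unsigned int pte)
{
	if (!(pte & PTE_PRESENT))
		return false;
	if ((fault_flags & FAULT_WRITE) && !(pte & PTE_WRITE))
		return false;
	if ((fault_flags & FAULT_INSTR) && (pte & PTE_NX))
		return false;
	/* The current PTE allows the access: the TLB entry was stale. */
	return true;
}
```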
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: remove nx_enabled from fault.c
On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always
return early. The test is sufficient on PAE as __supported_pte_mask
is updated in the same places as nx_enabled in init_32.c which also
takes disable_nx into account.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: make ioremap() UC by default
Yes! A mere 120 c_p_a() fixing and rewriting patches later,
we are now confident that we can enable UC by default for
ioremap(), on x86 too.
Every other architecture was doing this already. Doing so
makes Linux more robust against MTRR mixups (which might go
unnoticed if BIOS writers test other OSs only - where PAT
might override bad MTRR defaults).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: cpa cleanup the 64-bit alias math
Clean up the address calculations, which are necessary to identify the
high/low alias mappings of the kernel on 64 bit machines. Instead of
calling __pa/__va back and forth, calculate the physical address once
and base the other calculations on it. Add understandable constants so
we can use the already available within() helper. Also add comments,
which help mere mortals to understand what this code does.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa: rename global_flush_tlb() to cpa_flush_all()
The function name global_flush_tlb() suggests something different from
what the function really does. Rename it to cpa_flush_all(), which is an
understandable counterpart to cpa_flush_range().
There is no global visibility of the old API anymore.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa: implement clflush optimization
Use clflush on CPUs which support this.
clflush is only used when the page attribute operation has been
successful. On CPUs which do not support clflush and in the case of
error the old fashioned global_flush_tlb() is called.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa move the flush into set and clear functions
To avoid the modification of the flush code for the clflush
implementation, move the flush into the set and clear functions and
provide helper functions for the debugging code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: add testcases for RODATA and NX protections/attributes
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which led to one page fewer being marked read-only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: fix pageattr-selftest
In Ingo's testing, he found a bug in the CPA selftest code. What would
happen is that the test would call change_page_attr_addr on a range of
memory, part of which was read only, part of which was writable. The
only thing the test wanted to change was the global bit...
What actually happened was that the selftest would take the permissions
of the first page, and the change_page_attr_addr call would then
set the permissions of the entire range to this first page. In the
rodata section case, this resulted in pages after the .rodata becoming
read only... which made the kernel rather unhappy in many interesting
ways.
This is just another example of how dangerous the cpa API is (was); this
patch changes the test to use the incremental clear/set APIs
instead, and it changes the clear/set implementation to work one page
at a time.
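A model of the selftest bug and its fix, using hypothetical protection bits. Deriving one protection value from the first page and applying it to the whole range turns a writable page read-only, while clearing the bit page by page leaves every other attribute alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protection bits for the model. */
#define ATTR_RW     0x1u
#define ATTR_GLOBAL 0x2u

/* The bug: one protection taken from the first page is applied to all pages. */
static void change_range_from_first(unsigned int *prot, size_t n,
				    unsigned int clear)
{
	assert(n > 0);
	unsigned int newprot = prot[0] & ~clear;

	for (size_t i = 0; i < n; i++)
		prot[i] = newprot;
}

/* The fix: clear the bit one page at a time, keeping each page's other bits. */
static void clear_range_per_page(unsigned int *prot, size_t n,
				 unsigned int clear)
{
	for (size_t i = 0; i < n; i++)
		prot[i] &= ~clear;
}
```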
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: cpa: move flush to cpa
The set_memory_* and set_pages_* families of APIs currently require the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: make various pageattr.c functions static
change_page_attr_addr is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywhere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: fix the missing BIOS area check in page_is_ram
page_is_ram has had a FIXME for ages, which reminds us to sanity check the
BIOS area between 640k and 1M, which is sometimes falsely reported as
RAM in the e820 tables.
Implement the sanity check. Move the BIOS range defines from
pageattr.c into e820.h to avoid duplicate defines.
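A sketch of the sanity check, assuming 4 KiB pages and a tiny e820-like table. The constants 0xa0000 (640 KiB) and 0x100000 (1 MiB) follow the range named above, while the identifiers, including this within()-style helper, are hypothetical. The BIOS area is rejected before the table is consulted, so a table that wrongly reports it as RAM is overridden.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT_MODEL 12           /* assumption: 4 KiB pages */
#define BIOS_AREA_START  0x000a0000UL /* 640 KiB */
#define BIOS_AREA_END    0x00100000UL /* 1 MiB */

/* Hypothetical, heavily simplified e820 entry: [start, end). */
struct e820_entry_model {
	unsigned long start, end;
	bool is_ram;
};

static bool within(unsigned long addr, unsigned long start, unsigned long end)
{
	return addr >= start && addr < end;
}

static bool page_is_ram_model(unsigned long pfn,
			      const struct e820_entry_model *map, int n)
{
	unsigned long addr = pfn << PAGE_SHIFT_MODEL;

	/* Never trust the table for the legacy BIOS area. */
	if (within(addr, BIOS_AREA_START, BIOS_AREA_END))
		return false;
	for (int i = 0; i < n; i++)
		if (map[i].is_ram && within(addr, map[i].start, map[i].end))
			return true;
	return false;
}
```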
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: deprecate change_page_attr() for drivers
With the introduction of the new API, no driver or non-archcore code needs
to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of CPA
(it's a horrible API after all).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: a new API for drivers/etc to control cache and other page attributes
Right now, if drivers or other code want to change, say, a cache attribute of a
page, the only API they have is change_page_attr(). c-p-a is a really bad API
for this, because it forces the caller to know *ALL* the attributes he wants
for the page, not just the 1 thing he wants to change. So code that wants to
set a page uncachable, needs to be aware of the NX status as well etc etc etc.
This patch introduces a set of new APIs for this, set_pages_<attr> and
set_memory_<attr>, that offer a logical change to the user, and leave all
attributes not implied by the requested logical change alone.
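A sketch of the difference between the two API styles, using hypothetical attribute bits rather than real pgprot values. The c-p-a style replaces the whole attribute set, so a caller that only meant to disable caching drops NX unless it also passes NX in; the set/clear style changes only the requested bits.

```c
#include <assert.h>

/* Hypothetical attribute bits, not real pgprot values. */
#define ATTR_RW  0x1u
#define ATTR_PCD 0x2u /* caching disabled */
#define ATTR_NX  0x4u

/* c-p-a style: the caller must supply the complete new attribute set. */
static unsigned int cpa_style(unsigned int old, unsigned int full_new)
{
	(void)old; /* whatever the caller did not pass in is lost */
	return full_new;
}

/* set/clear style: only the requested bits change. */
static unsigned int change_attr(unsigned int old, unsigned int set,
				unsigned int clear)
{
	assert((set & clear) == 0);
	return (old & ~clear) | set;
}
```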
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: make c_p_a unconditional in ioremap
Make c_p_a unconditional for ioremap and iounmap. This ensures
complete consistency of the flags which are handed to
ioremap_page_range and the real flags in the mappings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: introduce max_pfn_mapped
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several
files which have #ifdef'ed defines which map either to end_pfn_map or
max_low_pfn. Replace this by a universal define and clean up all the
other instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: fix ioremap pgprot inconsistency
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: fix ioremap pgprot inconsistency
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not. Also make the mappings global (which is a NOP currently on 32bit,
although CPUs from PPRO+ onwards support it, but that's a separate
fix.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: turn the check_exec function into function that
What the check_exec() function really is trying to do is enforce certain
bits in the pgprot that are required by the x86 architecture, but that
callers might not be aware of (such as NX bit exclusion of the BIOS
area for BIOS based PCI access; it's not uncommon to ioremap the BIOS
region for various purposes and normally ioremap() memory has the NX bit
set).
This patch turns the check_exec() function into static_protections()
which also is now used to make sure the kernel text area remains non-NX
and that the .rodata section remains read-only. If the architecture
ends up requiring more such mandatory prot settings for specific areas,
this is now a reasonable place to add these.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Huang, Ying [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: ioremap_nocache fix
This patch fixes a bug in ioremap_nocache(). ioremap_nocache() calls
__ioremap() with flags != 0 to do the real work, which calls
change_page_attr_addr() if phys_addr + size - 1 < (end_pfn_map << PAGE_SHIFT).
But some pages between 0 and end_pfn_map << PAGE_SHIFT are not mapped by
the identity map, which makes change_page_attr_addr() fail.
This patch is based on the latest x86 git and has been tested on the x86_64 platform.
Huang, Ying [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: fix NX bit handling in change_page_attr()
This patch fixes a bug in change_page_attr/change_page_attr_addr on
Intel i386/x86_64 CPUs. After changing a page attribute to be
executable with these functions, the page remains non-executable on
Intel i386/x86_64 CPUs, because on these CPUs a page is executable only
if the "NX" bits of all three levels of page tables are cleared (with PAE
enabled; refer to section 4.13.2 of the Intel 64 and IA-32 Architectures
Software Developer's Manual). So the bug is fixed by clearing the "NX"
bit of the PMD when splitting the huge
PMD.
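A model of the rule described above, using a hypothetical NX flag value and plain integers for page-table entries. The effective NX state is the OR of the NX bits along the page-table walk, so when a huge PMD is split, its NX bit has to move down into the new PTEs and be cleared in the PMD; otherwise clearing NX on a single PTE has no effect.

```c
#include <assert.h>
#include <stdbool.h>

#define ENTRY_NX 0x1u /* hypothetical NX flag for the model */

/* Executable only if no level of the walk sets NX. */
static bool page_executable(const unsigned int *levels, int n)
{
	unsigned int combined = 0;

	for (int i = 0; i < n; i++)
		combined |= levels[i];
	return !(combined & ENTRY_NX);
}

/* Split a huge PMD: the PTEs inherit its protection, and the new PMD
 * entry has NX cleared so that the PTEs alone decide executability. */
static void split_huge_pmd_model(unsigned int huge_pmd, unsigned int *new_pmd,
				 unsigned int *ptes, int nptes)
{
	assert(nptes > 0);
	for (int i = 0; i < nptes; i++)
		ptes[i] = huge_pmd;
	*new_pmd = huge_pmd & ~ENTRY_NX;
}
```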