Paul Mackerras [Wed, 8 Oct 2008 14:03:29 +0000 (14:03 +0000)]
powerpc: Sync RPA note in zImage with kernel's RPA note
Commit 9b09c6d909dfd8de96b99b9b9c808b94b0a71614 ("powerpc: Change the
default link address for pSeries zImage kernels") changed the
real-base value in the CHRP note added by the addnote program from
12MB to 32MB to give more space for Open Firmware to load the zImage.
(The real-base value says where we want OF to position itself in
memory.) However, this change was ineffective on most pSeries
machines, because the RPA note added by addnote has the "ignore me"
flag set to 1. This was intended to tell OF to ignore just the RPA
note, but has the side effect of also making OF ignore the CHRP note
(at least on most pSeries machines).
To solve this we have to set the "ignore me" flag to 0 in the RPA
note. (We can't just omit the RPA note because that is equivalent to
having an RPA note with default values, and the default values are not
what we want.) However, then we have to make sure the values in the
zImage's RPA note match up with the values that the kernel supplies
later in prom_init.c with either the ibm,client-architecture-support
call or the process-elf-header call in prom_send_capabilities().
So this sets the "ignore me" flag in the RPA note in addnote to 0, and
adjusts the RPA note values in addnote.c and in prom_init.c to be
consistent with each other and with the values in ibm_architecture_vec.
However, since the wrapper is independent of the kernel, this doesn't
ensure that the notes will stay consistent. To ensure that, this adds
code to addnote.c so that it can extract the kernel's RPA note from
the kernel binary and put that in the zImage. To that end, we put the
kernel's fake ELF header (which contains the kernel's RPA note) into
its own section, and arrange for wrapper to pull out that section with
objcopy and pass it to addnote, which then extracts the RPA note from
it and transfers it to the zImage.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant Likely [Wed, 8 Oct 2008 05:05:29 +0000 (05:05 +0000)]
powerpc/of-bindings: Don't support linux,<modalias> "compatible" values
Compatible property values in the form linux,<modalias> is not documented
anywhere and using it leaks Linux implementation details into the device
tree data (which is bad). Remove support for compatible values of this
form.
If any platforms exist which depended on this code (and I don't know of
any), then they can be fixed up by adding legacy translations to the
lookup table in this file.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Josh Poimboeuf [Tue, 7 Oct 2008 06:10:03 +0000 (06:10 +0000)]
powerpc: Fix error path in kernel_thread function
The powerpc 32-bit and 64-bit kernel_thread functions don't properly
propagate errors being returned by the clone syscall. (In the case of
error, the syscall exit code returns a positive errno in r3 and sets
the CR0[SO] bit.)
This patch fixes that by negating r3 if CR0[SO] is set after the syscall.
Signed-off-by: Josh Poimboeuf <jpoimboe@us.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roel Kluin [Mon, 6 Oct 2008 22:38:33 +0000 (22:38 +0000)]
powerpc/cell/oprofile: Fix test on overlay_tbl_offset in vma_map
Offset is unsigned and when an address isn't found in the vma map
vma_map_lookup() returns the vma physical address + 0x10000000.
vma_map_lookup used to return 0xffffffff on a failed lookup, but
a change was made to return the vma physical address + 0x10000000
There are two callers of vam_map_lookup: one of them correctly
deals with this new return value, but the other (below) did not.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Acked-by: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Vorontsov [Mon, 6 Oct 2008 07:26:54 +0000 (07:26 +0000)]
powerpc: Fix no interrupt handling in pata_of_platform
When no interrupt is specified the pata_of_platform fills the irq_res
resource with -1, which is wrong to do for two reasons:
1. By definition, 'no irq' should be IRQ 0, not some negative integer;
2. pata_platform checks for irq_res.start > 0, but since irq_res.start
is unsigned type, the check will be true for `-1'.
Reported-by: Steven A. Falco <sfalco@harris.com> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nathan Fontenot [Wed, 1 Oct 2008 09:44:02 +0000 (09:44 +0000)]
powerpc: Oops in pseries_lmb_remove()
Testing hotplug memory remove has revealed that we can oops in
pseries_lmb_remove(). The incorrect shift causes a NULL pointer
dereference in the page_zone() inline routine.
I have only been able to reproduce the oops on kernels with large pages
enabled.
Tested on Power5 and Power6 with and without large pages enabled.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Fix sysfs pci mmap on 32-bit machines with 64-bit PCI
When manipulating 64-bit PCI addresses, the code would lose the
top 32-bit in a couple of places when shifting a pfn due to missing
type casting from the 32-bit pfn to a 64-bit resource before the
shift.
This breaks using newer X servers for example on 440 machines
with the PCI bus above 32-bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Honor O_NONBLOCK flag when reading RTAS log
rtas_log_read() doesn't check file flags for O_NONBLOCK and blocks
non-blocking readers of /proc/ppc64/rtas/error_log when there is
no data available. This fixes it.
Also rtas_log_read() returns now with ENODATA to prevent suspending of
process in wait_event_interruptible() when logging facility was
switched off and log is already empty.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Johannes Berg [Wed, 24 Sep 2008 04:29:08 +0000 (04:29 +0000)]
powerpc: Enforce sane MAX_ORDER
powerpc uses CONFIG_FORCE_MAX_ZONEORDER, and some things depend on it
being at least 10 when 64k pages are not configured (notably the dart
iommu code with CONFIG_PM). The defaults are fine, but when going from a
64K pages config to one without 64K pages, MAX_ORDER stays at 9 which is
too low for 4K pages.
This patch makes the Kconfig enforce at least the defaults.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Timur Tabi <timur@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit 14cf11af6cf608eb8c23e989ddb17a715ddce109 ("powerpc: Merge enough to
start building in arch/powerpc.") unwired /proc/ppc_htab, and commit 917f0af9e5a9ceecf9e72537fabb501254ba321d ("powerpc: Remove arch/ppc and
include/asm-ppc") removed the rest of the /proc/ppc_htab support, but there are
still a few references left. Kill them for good.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roland Dreier [Mon, 15 Sep 2008 10:43:35 +0000 (10:43 +0000)]
powerpc: Avoid integer overflow in page_is_ram()
Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns
instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow
for addresses above 4G on 32-bit kernels. However arch/powerpc's
page_is_ram() is missing the same fix -- it computes a physical address
by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page
above 4G.
In particular this causes pages above 4G to be mapped with the wrong
caching attribute; for example many ppc440-based SoCs have PCI space
above 4G, and mmap()ing MMIO space may end up with a mapping that has
caching enabled.
Fix this by working with the pfn and avoiding the conversion to
physical address that causes the overflow. This patch compares the
pfn to max_pfn, which is a semantic change from the old code -- that
code compared the physical address to high_memory, which corresponds
to max_low_pfn. However, I think that was is another bug, since
highmem pages are still RAM.
Reported-by: vb <vb@vsbe.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Victor Gallardo [Thu, 2 Oct 2008 06:29:06 +0000 (23:29 -0700)]
powerpc/44x: Add AMCC Arches eval board support
The Arches Evaluation board is based on the AMCC 460GT SoC chip.
This board is a dual processor board with each processor providing
independent resources for Rapid IO, Gigabit Ethernet, and serial
communications. Each 460GT has it's own 512MB DDR2 memory, 32MB NOR FLASH,
UART, EEPROM and temperature sensor, along with a shared debug port.
The two 460GT's will communicate with each other via shared memory,
Gigabit Ethernet and x1 PCI-Express.
Signed-off-by: Victor Gallardo <vgallardo@amcc.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Kumar Gala [Wed, 4 Jun 2008 07:59:29 +0000 (02:59 -0500)]
powerpc/math-emu: Use kernel generic math-emu code
The math emulation code is centered around a set of generic macros that
provide the core of the emulation that are shared by the various
architectures and other projects (like glibc). Each arch implements its
own sfp-machine.h to specific various arch specific details.
For historic reasons that are now lost the powerpc math-emu code had
its own version of the common headers. This moves us to using the
kernel generic version and thus getting fixes when those are updated.
Also cleaned up exception/error reporting from the FP emulation functions.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The PowerPC 405EZ SoC has some differences in the interrupt layout and
handling for the MAL. The SERR, TXDE, and RXDE interrupts are OR'd into
a single interrupt. Also, due to the possibility for interrupt coalescing,
the TXEOB and RXEOB interrupts require an interrupt bit to be cleared in
the ICINTSTAT SDR.
This sets the proper MAL feature bits for 405EZ boards, and adds a common
shared handler for SERR, TXDE, and RXDE. The defines for the ICINTSTAT DCR
are added to the proper header file as well.
This has been adapted from code originally written by Stefan Roese.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
There are some PowerPC SoCs that do odd things with the MAL handling. In
order to accommodate them, we need to introduce a feature mechanism that is
similar to the existing emac_has_feature function.
This adds a feature variable to the mal_instance structure, and adds a
mal_has_feature function. Two features are defined and are guarded
by Kconfig options that are selected by the affected platforms.
MAL_FTR_CLEAR_ICINSTAT is used for platforms that need to clear the
interrupt bits in the ICINTSTAT SDR for txeob/rxeob. This is common
on MAL implementations that have interrupt coalescing.
MAL_FTR_COMMON_ERR_INT is used for platforms that have SERR, TXDE,
and RXDE OR'd into a single interrupt bit.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
ibm_newemac: Allow the "no flow control" EMAC feature to work
Some PowerPC 40x chips have errata that force us not to use the integrated
flow control. We have the feature defined, but it currently can't be used
because it is never added to EMAC_FTRS_POSSIBLE.
This adds a Kconfig option for affected platforms to select and puts the
feature in the EMAC_FTRS_POSSIBLE list. This is set for PowerPC 405EZ
platforms as well.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Support for the SBC610 VPX Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
Fixup to correctly reconfigure USB, provided by an NEC uPD720101, after
device is reset. This requires a set of chip specific registers in the
devices configuration space to be correctly written, enabling all ports
and switching the device to use an external 48-MHz Oscillator.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Becky Bruce [Wed, 24 Sep 2008 21:53:34 +0000 (16:53 -0500)]
powerpc: Drop redundant machine type print in show_cpuinfo
For many of the embedded boards, "model" and "Machine" are printing
the same thing; remove the redundant code and allow the generic
show_cpuinfo to print the model information.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Becky Bruce [Wed, 24 Sep 2008 16:01:24 +0000 (11:01 -0500)]
POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical
This rearranges a bit of code, and adds support for
36-bit physical addressing for configs that use a
hashed page table. The 36b physical support is not
enabled by default on any config - it must be
explicitly enabled via the config system.
This patch *only* expands the page table code to accomodate
large physical addresses on 32-bit systems and enables the
PHYS_64BIT config option for 86xx. It does *not*
allow you to boot a board with more than about 3.5GB of
RAM - for that, SWIOTLB support is also required (and
coming soon).
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Kumar Gala [Thu, 31 Jul 2008 13:41:10 +0000 (08:41 -0500)]
powerpc/mm: Implement _PAGE_SPECIAL & pte_special() for 32-bit
Implement _PAGE_SPECIAL and pte_special() for 32-bit powerpc. This bit will
be used by the fast get_user_pages() to differenciate PTEs that correspond
to a valid struct page from special mappings that don't such as IO mappings
obtained via io_remap_pfn_ranges().
We currently only implement this on sub-arch that support SMP or will so
in the future (6xx, 44x, FSL-BookE) and not (8xx, 40x).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Wed, 16 Jul 2008 20:54:21 +0000 (15:54 -0500)]
powerpc: Fixes for CONFIG_PTE_64BIT for SMP support
There are some minor issues with support 64-bit PTEs on a 32-bit processor
when dealing with SMP.
* We need to order the stores in set_pte_at to make sure the flag word
is set second.
* Change pte_clear to use pte_update so only the flag word is cleared
* Added a WARN_ON to set_pte_at to ensure the pte isn't present for
the 64-bit pte/SMP case (to ensure our assumption of this fact).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Becky Bruce <becky.bruce@freescale.com>
Becky Bruce [Fri, 12 Sep 2008 10:34:46 +0000 (10:34 +0000)]
powerpc: Merge 32 and 64-bit dma code
We essentially adopt the 64-bit dma code, with some changes to support
32-bit systems, including HIGHMEM. dma functions on 32-bit are now
invoked via accessor functions which call the correct op for a device based
on archdata dma_ops. If there is no archdata dma_ops, this defaults
to dma_direct_ops.
In addition, the dma_map/unmap_page functions are added to dma_ops
because we can't just fall back on map/unmap_single when HIGHMEM is
enabled. In the case of dma_direct_*, we stop using map/unmap_single
and just use the page version - this saves a lot of ugly
ifdeffing. We leave map/unmap_single in the dma_ops definition,
though, because they are needed by the iommu code, which does not
implement map/unmap_page. Ideally, going forward, we will completely
eliminate map/unmap_single and just have map/unmap_page, if it's
workable for 64-bit.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Kumar Gala [Mon, 22 Sep 2008 21:41:31 +0000 (14:41 -0700)]
powerpc: convert CONFIG_PPC_MERGE to CONFIG_PPC for legacy io checks
Now that arch/ppc is dead CONFIG_PPC_MERGE is always defined for all
powerpc platforms and we want to get rid of CONFIG_PPC_MERGE use
CONFIG_PPC instead.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Peter Korsgaard [Tue, 23 Sep 2008 15:35:38 +0000 (17:35 +0200)]
powerpc: gpio driver for mpc8349/8572/8610 and compatible
Structured similar to the existing QE GPIO support.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Becky Bruce [Fri, 12 Sep 2008 15:42:56 +0000 (10:42 -0500)]
cpm_uart: Pass actual dev ptr to dma_* in ucc and cpm_uart serial
We're currently passing NULL, and really shouldn't be.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Acked-By: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Timur Tabi [Wed, 6 Aug 2008 16:48:25 +0000 (11:48 -0500)]
powerpc: add SSI-to-DMA properties to Freescale MPC8610 HPCD device tree
Add the fsl,playback-dma and fsl,capture-dma properties to the Freescale
MPC8610 HPCD device tree. These properties connect the SSI nodes to the
DMA nodes for the DMA channels that the SSI should use. Also update the
ssi.txt documentation.
These properties will be needed when the ASoC V2 version of the Freescale
MPC8610 device drivers are merged into the mainline.
Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Kumar Gala [Fri, 27 Jun 2008 14:39:00 +0000 (09:39 -0500)]
math-emu: Add support for reporting exact invalid exception
Some architectures (like powerpc) provide status information on the exact
type of invalid exception. This is pretty straight forward as we already
report invalid exceptions via FP_SET_EXCEPTION.
We add new flags (FP_EX_INVALID_*) the architecture code can define if it
wants the exact invalid exception reported.
We had to split out the INF/INF and 0/0 cases for divide to allow reporting
the two invalid forms properly.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: David S. Miller <davem@davemloft.net>
Kumar Gala [Fri, 27 Jun 2008 14:33:59 +0000 (09:33 -0500)]
math-emu: Fix compiler warnings
Fix warnings of the form:
arch/powerpc/math-emu/fsubs.c:15: warning: 'R_f1' may be used uninitialized in this function
arch/powerpc/math-emu/fsubs.c:15: warning: 'R_f0' may be used uninitialized in this function
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This fixes a build warning when PHYS_64BIT is enabled, and removes an
unnecessary cast to phys_addr_t (the variable being cast is already
a phys_addr_t)
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Martin Langer [Sun, 7 Sep 2008 07:51:32 +0000 (17:51 +1000)]
powerpc: Fix major revision number for Freescale cores
Some 74xx cores by Freescale are using the configuration field instead
of the major revision field for their revision number. This corrects
the wrong behaviour for those ppc cores including my one.
There is a reference document at Freecale. It describes the PVR
register. This is based on that pdf. You can find the document at:
David Gibson [Fri, 5 Sep 2008 01:49:54 +0000 (11:49 +1000)]
powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages
There is a small bug in the handling of 16G hugepages recently added
to the kernel. This doesn't cause a crash or other user-visible
problems, but it does mean that more levels of pagetable are allocated
than makes sense for 16G pages. The hugepage pagetables for the 16G
pages are allocated much lower in the pagetable tree than they should
be, with the intervening levels allocated with full pmd and pud pages
which will only ever have one entry filled in.
This corrects this problem, at the same time cleaning up the handling
of which level 64k versus 16M hugepage pagetables are allocated at.
The new way of formatting the tests should be more robust against
changes in pagetable structure, or any newly added hugepage sizes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc: Make the irq reverse mapping radix tree lockless
The radix trees used by interrupt controllers for their irq reverse
mapping (currently only the XICS found on pSeries) have a complex
locking scheme dating back to before the advent of the lockless radix
tree.
This takes advantage of the lockless radix tree and of the fact that
the items of the tree are pointers to a static array (irq_map)
elements which can never go under us to simplify the locking.
Concurrency between readers and writers is handled by the intrinsic
properties of the lockless radix tree. Concurrency between writers is
handled with a global mutex.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc: Separate the irq radix tree insertion and lookup
irq_radix_revmap() currently serves 2 purposes, irq mapping lookup
and insertion which happen in interrupt and process context respectively.
Separate the function into its 2 components, one for lookup only and one
for insertion only.
Fix the only user of the revmap tree (XICS) to use the new functions.
Also, move the insertion into the radix tree of those irqs that were
requested before it was initialized at said tree initialization.
Mutual exclusion between the tree initialization and readers/writers is
handled via a state variable (revmap_trees_allocated) set to 1 when the tree
has been initialized and set to 2 after the already requested irqs have been
inserted in the tree by the init path. This state is checked before any reader
or writer access just like we used to check for tree.gfp_mask != 0 before.
Finally, now that we're not any longer inserting nodes into the radix-tree
in interrupt context, turn the GFP_ATOMIC allocations into GFP_KERNEL ones.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Becky Bruce [Wed, 3 Sep 2008 15:37:53 +0000 (01:37 +1000)]
powerpc: Rename PTE_SIZE to HPTE_SIZE
It's the size of the hardware PTE; make that clear in the name.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Sat, 30 Aug 2008 01:43:47 +0000 (11:43 +1000)]
powerpc: Make the 64-bit kernel as a position-independent executable
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set. This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations. (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)
The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run. In fact we call it twice; once before calling prom_init, and again
when starting the main kernel. This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.
This also changes __va and __pa to use an equivalent definition that is
simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).
With this, relocatable kernels still copy themselves down to physical
address 0 and run there.
Paul Mackerras [Sat, 30 Aug 2008 01:41:12 +0000 (11:41 +1000)]
powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit
Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
generates 5 instructions where LOAD_REG_ADDR can do it in one,
and will generate R_PPC64_ADDR16_* relocations in the output when
we get to making the kernel as a position-independent executable,
which we'd rather not have to handle. This changes various bits
of assembly code to use LOAD_REG_ADDR when we need to get the
address of a symbol, or to use suitable position-independent code
for cases where we can't access the TOC for various reasons, or
if we're not running at the address we were linked at.
It also cleans up a few minor things; there's no reason to save and
restore SRR0/1 around RTAS calls, __mmu_off can get the return
address from LR more conveniently than the caller can supply it in
R4 (and we already assume elsewhere that EA == RA if the MMU is on
in early boot), and enable_64b_mode was using 5 instructions where
2 would do.
Paul Mackerras [Sat, 30 Aug 2008 01:40:24 +0000 (11:40 +1000)]
powerpc: Make it possible to move the interrupt handlers away from the kernel
This changes the way that the exception prologs transfer control to
the handlers in 64-bit kernels with the aim of making it possible to
have the prologs separate from the main body of the kernel. Now,
instead of computing the address of the handler by taking the top
32 bits of the paca address (to get the 0xc0000000........ part) and
ORing in something in the bottom 16 bits, we get the base address of
the kernel by doing a load from the paca and add an offset.
This also replaces an mfmsr and an ori to compute the MSR value for
the handler with a load from the paca. That makes it unnecessary to
have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
mode.
We can no longer use a direct branches in the exception prolog code,
which means that the SLB miss handlers can't branch directly to
.slb_miss_realmode any more. Instead we have to compute the address
and do an indirect branch. This is conditional on CONFIG_RELOCATABLE;
for non-relocatable kernels we use a direct branch as before. (A later
change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
Since the secondary CPUs on pSeries start execution in the first 0x100
bytes of real memory and then have to get to wherever the kernel is,
we can't use a direct branch to get there. Instead this changes
__secondary_hold_spinloop from a flag to a function pointer. When it
is set to a non-NULL value, the secondary CPUs jump to the function
pointed to by that value.
Finally this eliminates one code difference between 32-bit and 64-bit
by making __secondary_hold be the text address of the secondary CPU
spinloop rather than a function descriptor for it.
Paul Mackerras [Sat, 30 Aug 2008 01:39:26 +0000 (11:39 +1000)]
powerpc: Rearrange head_64.S to move interrupt handler code to the beginning
This rearranges head_64.S so that we have all the first-level exception
prologs together starting at 0x100, followed by all the second-level
handlers that are invoked from the first-level prologs, followed by
other code. This doesn't make any functional change but will make
following changes for relocatable kernel support easier.
Chandru [Fri, 29 Aug 2008 14:28:16 +0000 (00:28 +1000)]
powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels
Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.). Each of these regions have
their own sizes and are currently added under 'linux,usable-memory'
property under each memory@xxx node of the device tree.
The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size). If one or more or part of the above mentioned
regions lie under one of the lmb from ibm,dynamic-memory property,
there is a need to identify those regions within the given lmb.
This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools. Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions. The number of cells in the count value is given by
the #size-cells property of the root node.
Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Nathan Fontenot [Mon, 25 Aug 2008 19:33:34 +0000 (05:33 +1000)]
powerpc: Check rc of notifier chain for memory remove
The return code from invocation of the notifier for
pSeries_reconfig_chain during update of the device tree is not
checked. This causes writes to /proc/ppc64/ofdt to update memory
properties (i.e. ibm,dyamic-reconfiguration-memory) to always
return success, instead of the result of the notifier chain.
This happens specifically when we remove/add memory from the
device tree on machines using memory specified in the
ibm,dynamic-reconfiguration-memory property of the device tree.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark Nelson [Fri, 22 Aug 2008 04:39:00 +0000 (14:39 +1000)]
powerpc: New copy_4K_page()
This new copy_4K_page() function was originally tuned for the best
performance on the Cell processor, but after testing on more 64bit
powerpc chips it was found that with a small modification it either
matched the performance offered by the current mainline version or
bettered it by a small amount.
It was found that on a Cell-based QS22 blade the amount of system
time measured when compiling a 2.6.26 pseries_defconfig decreased
by 4%. Using the same test, a 4-way 970MP machine saw a decrease of
2% in system time. No noticeable change was seen on Power4, Power5
or Power6.
The 4096 byte page is copied in thirty-two 128 byte strides. An
initial setup loop executes dcbt instructions for the whole source
page and dcbz instructions for the whole destination page. To do
this, the cache line size is retrieved from ppc64_caches.
A new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, (introduced in the
previous patch) is used to make the modification to this new copy
routine - on Power4, 970 and Cell the feature bit is set so the
setup loop is executed, but on all other 64bit chips the setup
loop is nop'ed out.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark Nelson [Fri, 22 Aug 2008 04:36:19 +0000 (14:36 +1000)]
powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ
Add a new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, to be added to the
64bit powerpc chips that benefit from having dcbt and dcbz
instructions used in their memory copy routines.
This will be used in a subsequent patch that updates copy_4K_page().
The new bit is added to Cell, PPC970 and Power4 because they show
better performance with the new copy_4K_page() when dcbt and dcbz
instructions are used.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
This is happening because of an improper usage of strcmp() in the
e820 parsing code. The strcmp() always returns !0 and never resets the
value for e820.nr_map and returns an incorrect user-defined map.
Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6:
UBIFS: make minimum fanout 3
UBIFS: fix division by zero
UBIFS: amend f_fsid
UBIFS: fill f_fsid
UBIFS: improve statfs reporting even more
UBIFS: introduce LEB overhead
UBIFS: add forgotten gc_idx_lebs component
UBIFS: fix assertion
UBIFS: improve statfs reporting
UBIFS: remove incorrect index space check
UBIFS: push empty flash hack down
UBIFS: do not update min_idx_lebs in stafs
UBIFS: allow for racing between GC and TNC
UBIFS: always read hashed-key nodes under TNC mutex
UBIFS: fix zero-length truncations
James Bottomley [Thu, 4 Sep 2008 01:43:36 +0000 (20:43 -0500)]
lib: Correct printk %pF to work on all architectures
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Tony Luck <tony.luck@intel.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Snook [Tue, 9 Sep 2008 07:26:57 +0000 (03:26 -0400)]
MAINTAINERS: add Atheros maintainer for atlx
Jie Yang at Atheros is getting more directly involved with upstream work
on the atl* drivers. This patch changes the ATL1 entry to ATLX (atl2
support posted to netdev today) and adds him as a maintainer.
update Documentation/filesystems/Locking for 2.6.27 changes
In the 2.6.27 circle ->fasync lost the BKL, and the last remaining
->open variant that takes the BKL is also gone. ->get_sb and ->kill_sb
didn't have BKL forever, so updated the entries while we're at that.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[S390] cio: allow offline processing for disconnected devices
When disconnected ccw devices are removed, the device has to be set
offline, otherwise there will be side effects including a reference
count imbalance. This patch modifies ccw_device_offline to work for
devices in disconnecte/not operational state. ccw_device_offline is
called by cio for devices which are online during device removal.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ssch() has two classes of return codes:
- condition codes (0-3) which need to be translated to Linux
error codes
- Linux error codes (-EIO on exceptions) which should be passed
to the caller (instead of erronously being handled like
condition code 3)
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jarod Wilson [Tue, 9 Sep 2008 10:38:56 +0000 (12:38 +0200)]
[S390] CVE-2008-1514: prevent ptrace padding area read/write in 31-bit mode
When running a 31-bit ptrace, on either an s390 or s390x kernel,
reads and writes into a padding area in struct user_regs_struct32
will result in a kernel panic.
This is also known as CVE-2008-1514.
Test case available here:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/user-area-padding.c?cvsroot=systemtap
Steps to reproduce:
1) wget the above
2) gcc -o user-area-padding-31bit user-area-padding.c -Wall -ggdb2 -D_GNU_SOURCE -m31
3) ./user-area-padding-31bit
<panic>
Test status
-----------
Without patch, both s390 and s390x kernels panic. With patch, the test case,
as well as the gdb testsuite, pass without incident, padding area reads
returning zero, writes ignored.
Nb: original version returned -EINVAL on write attempts, which broke the
gdb test and made the test case slightly unhappy, Jan Kratochvil suggested
the change to return 0 on write attempts.
Signed-off-by: Jarod Wilson <jarod@redhat.com> Tested-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: pm_standby low-power ram bug fix
avr32: Fix lockup after Java stack underflow in user mode
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: Fix rare boot build breakage
powerpc/spufs: Fix possible scheduling of a context to multiple SPEs
powerpc/spufs: Fix race for a free SPU
powerpc/spufs: Fix multiple get_spu_context()
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: arch_reinit_sched_domains() must destroy domains to force rebuild
sched, cpuset: rework sched domains and CPU hotplug handling (v4)
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ahci: RAID mode SATA patch for Intel Ibex Peak DeviceIDs
pata_sil680: remove duplicate pcim_enable_device
libata-sff: kill spurious WARN_ON() in ata_hsm_move()
sata_nv: disable hardreset for generic
ahci: disable PMP for marvell ahcis
sata_mv: add RocketRaid 1720 PCI ID to driver
ahci, pata_marvell: play nicely together