pilppa.com Git - linux-2.6-omap-h63xx.git/log

powerpc: Make the 64-bit kernel as a position-independent executable

This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set.  This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations.  (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)

The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run.  In fact we call it twice; once before calling prom_init, and again
when starting the main kernel.  This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.

This also changes __va and __pa to use an equivalent definition that is
simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).

With this, relocatable kernels still copy themselves down to physical
address 0 and run there.

Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit

Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
generates 5 instructions where LOAD_REG_ADDR can do it in one,
and will generate R_PPC64_ADDR16_* relocations in the output when
we get to making the kernel as a position-independent executable,
which we'd rather not have to handle. This changes various bits
of assembly code to use LOAD_REG_ADDR when we need to get the
address of a symbol, or to use suitable position-independent code
for cases where we can't access the TOC for various reasons, or
if we're not running at the address we were linked at.

It also cleans up a few minor things; there's no reason to save and
restore SRR0/1 around RTAS calls, __mmu_off can get the return
address from LR more conveniently than the caller can supply it in
R4 (and we already assume elsewhere that EA == RA if the MMU is on
in early boot), and enable_64b_mode was using 5 instructions where
2 would do.

Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Make it possible to move the interrupt handlers away from the kernel

This changes the way that the exception prologs transfer control to
the handlers in 64-bit kernels with the aim of making it possible to
have the prologs separate from the main body of the kernel.  Now,
instead of computing the address of the handler by taking the top
32 bits of the paca address (to get the 0xc0000000........ part) and
ORing in something in the bottom 16 bits, we get the base address of
the kernel by doing a load from the paca and add an offset.

This also replaces an mfmsr and an ori to compute the MSR value for
the handler with a load from the paca.  That makes it unnecessary to
have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
mode.

We can no longer use a direct branches in the exception prolog code,
which means that the SLB miss handlers can't branch directly to
.slb_miss_realmode any more.  Instead we have to compute the address
and do an indirect branch.  This is conditional on CONFIG_RELOCATABLE;
for non-relocatable kernels we use a direct branch as before.  (A later
change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)

Since the secondary CPUs on pSeries start execution in the first 0x100
bytes of real memory and then have to get to wherever the kernel is,
we can't use a direct branch to get there.  Instead this changes
__secondary_hold_spinloop from a flag to a function pointer.  When it
is set to a non-NULL value, the secondary CPUs jump to the function
pointed to by that value.

Finally this eliminates one code difference between 32-bit and 64-bit
by making __secondary_hold be the text address of the secondary CPU
spinloop rather than a function descriptor for it.

Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Rearrange head_64.S to move interrupt handler code to the beginning

This rearranges head_64.S so that we have all the first-level exception
prologs together starting at 0x100, followed by all the second-level
handlers that are invoked from the first-level prologs, followed by
other code. This doesn't make any functional change but will make
following changes for relocatable kernel support easier.

Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels

Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.).  Each of these regions have
their own sizes and are currently added under 'linux,usable-memory'
property under each memory@xxx node of the device tree.

The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size).  If one or more or part of the above mentioned
regions lie under one of the lmb from ibm,dynamic-memory property,
there is a need to identify those regions within the given lmb.

This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools.  Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions.  The number of cells in the count value is given by
the #size-cells property of the root node.

Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Check rc of notifier chain for memory remove

The return code from invocation of the notifier for
pSeries_reconfig_chain during update of the device tree is not
checked. This causes writes to /proc/ppc64/ofdt to update memory
properties (i.e. ibm,dyamic-reconfiguration-memory) to always
return success, instead of the result of the notifier chain.

This happens specifically when we remove/add memory from the
device tree on machines using memory specified in the
ibm,dynamic-reconfiguration-memory property of the device tree.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: New copy_4K_page()

This new copy_4K_page() function was originally tuned for the best
performance on the Cell processor, but after testing on more 64bit
powerpc chips it was found that with a small modification it either
matched the performance offered by the current mainline version or
bettered it by a small amount.

It was found that on a Cell-based QS22 blade the amount of system
time measured when compiling a 2.6.26 pseries_defconfig decreased
by 4%. Using the same test, a 4-way 970MP machine saw a decrease of
2% in system time. No noticeable change was seen on Power4, Power5
or Power6.

The 4096 byte page is copied in thirty-two 128 byte strides. An
initial setup loop executes dcbt instructions for the whole source
page and dcbz instructions for the whole destination page. To do
this, the cache line size is retrieved from ppc64_caches.

A new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, (introduced in the
previous patch) is used to make the modification to this new copy
routine - on Power4, 970 and Cell the feature bit is set so the
setup loop is executed, but on all other 64bit chips the setup
loop is nop'ed out.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ

Add a new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, to be added to the
64bit powerpc chips that benefit from having dcbt and dcbz
instructions used in their memory copy routines.

This will be used in a subsequent patch that updates copy_4K_page().
The new bit is added to Cell, PPC970 and Power4 because they show
better performance with the new copy_4K_page() when dcbt and dcbz
instructions are used.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Fix duplicate test of MACIO_FLAG_SCCB_ON

Evidently MACIO_FLAG_SCCA_ON was meant.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>

Merge branch 'linux-2.6'

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix memmap=exactmap boot argument
  x86: disable static NOPLs on 32 bits
  xen: fix 2.6.27-rc5 xen balloon driver warnings

x86: fix memmap=exactmap boot argument

When using kdump modifying the e820 map is yielding strange results.

For example starting with

BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 0000000000093400 (usable)
BIOS-e820: 0000000000093400 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
BIOS-e820: 000000003fee0000 - 000000003fef3000 (ACPI data)
BIOS-e820: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)

and booting with args

memmap=exactmap memmap=640K@0K memmap=5228K@16384K memmap=125188K@22252K memmap=76K#1047424K memmap=564K#1047500K

resulted in:

user-defined physical RAM map:
user: 0000000000000000 - 0000000000093400 (usable)
user: 0000000000093400 - 00000000000a0000 (reserved)
user: 0000000000100000 - 000000003fee0000 (usable)
user: 000000003fee0000 - 000000003fef3000 (ACPI data)
user: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
user: 000000003ff80000 - 0000000040000000 (reserved)
user: 00000000e0000000 - 00000000f0000000 (reserved)
user: 00000000fec00000 - 00000000fec10000 (reserved)
user: 00000000fee00000 - 00000000fee01000 (reserved)
user: 00000000ff000000 - 0000000100000000 (reserved)

But should have resulted in:

user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000001000000 - 000000000151b000 (usable)
user: 00000000015bb000 - 0000000008ffc000 (usable)
user: 000000003fee0000 - 000000003ff80000 (ACPI data)

This is happening because of an improper usage of strcmp() in the
e820 parsing code. The strcmp() always returns !0 and never resets the
value for e820.nr_map and returns an incorrect user-defined map.

This patch fixes the problem.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] cio: allow offline processing for disconnected devices
  [S390] cio: handle ssch() return codes correctly.
  [S390] cio: Correct cleanup on error.
  [S390] CVE-2008-1514: prevent ptrace padding area read/write in 31-bit mode

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] IP22: Fix detection of second HPC3 on Challenge S

Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6

* 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: make minimum fanout 3
  UBIFS: fix division by zero
  UBIFS: amend f_fsid
  UBIFS: fill f_fsid
  UBIFS: improve statfs reporting even more
  UBIFS: introduce LEB overhead
  UBIFS: add forgotten gc_idx_lebs component
  UBIFS: fix assertion
  UBIFS: improve statfs reporting
  UBIFS: remove incorrect index space check
  UBIFS: push empty flash hack down
  UBIFS: do not update min_idx_lebs in stafs
  UBIFS: allow for racing between GC and TNC
  UBIFS: always read hashed-key nodes under TNC mutex
  UBIFS: fix zero-length truncations

lib: Correct printk %pF to work on all architectures

It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2.  However,
the current way its coded doesn't work on parisc64.  For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors

Make dereference_function_descriptor() more accommodating by allowing
architecture overrides.  I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add Atheros maintainer for atlx

Jie Yang at Atheros is getting more directly involved with upstream work
on the atl* drivers. This patch changes the ATL1 entry to ATLX (atl2
support posted to netdev today) and adds him as a maintainer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

update Documentation/filesystems/Locking for 2.6.27 changes

In the 2.6.27 circle ->fasync lost the BKL, and the last remaining
->open variant that takes the BKL is also gone. ->get_sb and ->kill_sb
didn't have BKL forever, so updated the entries while we're at that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[S390] cio: allow offline processing for disconnected devices

When disconnected ccw devices are removed, the device has to be set
offline, otherwise there will be side effects including a reference
count imbalance. This patch modifies ccw_device_offline to work for
devices in disconnecte/not operational state. ccw_device_offline is
called by cio for devices which are online during device removal.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] cio: handle ssch() return codes correctly.

ssch() has two classes of return codes:
- condition codes (0-3) which need to be translated to Linux
  error codes
- Linux error codes (-EIO on exceptions) which should be passed
  to the caller (instead of erronously being handled like
  condition code 3)

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] cio: Correct cleanup on error.

Fix cleanup on error in chp_new() and init_channel_subsystem()
(must not call kfree() on structures that had been registered).

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] CVE-2008-1514: prevent ptrace padding area read/write in 31-bit mode

When running a 31-bit ptrace, on either an s390 or s390x kernel,
reads and writes into a padding area in struct user_regs_struct32
will result in a kernel panic.

This is also known as CVE-2008-1514.

Test case available here:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/user-area-padding.c?cvsroot=systemtap

Steps to reproduce:
1) wget the above
2) gcc -o user-area-padding-31bit user-area-padding.c -Wall -ggdb2 -D_GNU_SOURCE -m31
3) ./user-area-padding-31bit
<panic>

Test status
-----------
Without patch, both s390 and s390x kernels panic. With patch, the test case,
as well as the gdb testsuite, pass without incident, padding area reads
returning zero, writes ignored.

Nb: original version returned -EINVAL on write attempts, which broke the
gdb test and made the test case slightly unhappy, Jan Kratochvil suggested
the change to return 0 on write attempts.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: pm_standby low-power ram bug fix
avr32: Fix lockup after Java stack underflow in user mode

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: Fix rare boot build breakage
  powerpc/spufs: Fix possible scheduling of a context to multiple SPEs
  powerpc/spufs: Fix race for a free SPU
  powerpc/spufs: Fix multiple get_spu_context()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "crypto: camellia - Use kernel-provided bitops, unaligned access helpers"

Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5241/1: provide ioremap_wc()
  [ARM] omap: fix virtual vs physical address space confusions
  [ARM] remove unused #include <version.h>
  [ARM] omap: fix build error in ohci-omap.c
  [ARM] omap: fix gpio.c build error

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: arch_reinit_sched_domains() must destroy domains to force rebuild
sched, cpuset: rework sched domains and CPU hotplug handling (v4)

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ahci: RAID mode SATA patch for Intel Ibex Peak DeviceIDs
  pata_sil680: remove duplicate pcim_enable_device
  libata-sff: kill spurious WARN_ON() in ata_hsm_move()
  sata_nv: disable hardreset for generic
  ahci: disable PMP for marvell ahcis
  sata_mv: add RocketRaid 1720 PCI ID to driver
  ahci, pata_marvell: play nicely together

Fix format of MAINTAINERS

... one entry lacked a colon which broke one of my scripts.

Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  bridge: don't allow setting hello time to zero
  netns : fix kernel panic in timewait socket destruction
  pkt_sched: Fix qdisc state in net_tx_action()
  netfilter: nf_conntrack_irc: make sure string is terminated before calling simple_strtoul
  netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixlet
  netfilter: nf_conntrack_gre: more locking around keymap list
  netfilter: nf_conntrack_sip: de-static helper pointers

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Prevent sparc64 from invoking irq handlers on offline CPUs
sparc64: Fix IPI call locking.

usb: fix null deferences in low level usb serial

The hw interface drivers for the usb serial devices deference the tty
structure to set up the parameters for the initial console. The tty
structure should be passed as a parameter to the set_termios() call.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

NFS: Restore missing hunk in NFS mount option parser

Automounter maps can contain mount options valid for other NFS
implementations but not for Linux.  The Linux automounter uses the
mount command's "-s" command line option ("s" for "sloppy") so that
mount requests containing such options are not rejected.

Commit f45663ce5fb30f76a3414ab3ac69f4dd320e760a attempted to address a
known regression with text-based NFS mount option parsing.  Unrecognized
mount options would cause mount requests to fail, even if the "-s"
option was used on the mount command line.

Unfortunately, this commit was not complete as submitted.  It adds a
new mount option, "sloppy".  But it is missing a hunk, so it now allows
NFS mounts with unrecognized mount options, even if the "sloppy" option
is not present.  This could be a problem if a required critical mount
option such as "sync" is misspelled, for example, and is considered a
regression from 2.6.26.

This patch restores the missing hunk.  Now, the default behavior of
text-based NFS mount options is as before: any unrecognized mount option
will cause the mount to fail.

Please include this in 2.6.27-rc.

Thanks to Neil Brown for reporting this.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bridge: don't allow setting hello time to zero

Dushan Tcholich reports that on his system ksoftirqd can consume
between %6 to %10 of cpu time, and cause ~200 context switches per
second.

He then correlated this with a report by bdupree@techfinesse.com:

http://marc.info/?l=linux-kernel&m=119613299024398&w=2

and the culprit cause seems to be starting the bridge interface.
In particular, when starting the bridge interface, his scripts
are specifying a hello timer interval of "0".

The bridge hello time can't be safely set to values less than 1
second, otherwise it is possible to end up with a runaway timer.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns : fix kernel panic in timewait socket destruction

How to reproduce ?
- create a network namespace
- use tcp protocol and get timewait socket
- exit the network namespace
- after a moment (when the timewait socket is destroyed), the kernel
panics.

# BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d

Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related
to a network namespace. The timewait sockets life cycle is not tied with
the network namespace, that means the timewait sockets stay alive while
the network namespace dies. The timewait sockets are for avoiding to
receive a duplicate packet from the network, if the network namespace is
freed, the network stack is removed, so no chance to receive any packets
from the outside world. Furthermore, having a pending destruction timer
on these sockets with a network namespace freed is not safe and will lead
to an oops if the timer callback which try to access data belonging to
the namespace like for example in:
inet_twdr_do_twkill_work
-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at the network namespace destruction will:
1) speed up memory freeing for the namespace
2) fix kernel panic on asynchronous timewait destruction

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: disable static NOPLs on 32 bits

On 32-bit, at least the generic nops are fairly reasonable, but the
default nops for 64-bit really look pretty sad, and the P6 nops really do
look better.

So I would suggest perhaps moving the static P6 nop selection into the
CONFIG_X86_64 thing.

The alternative is to just get rid of that static nop selection, and just
have two cases: 32-bit and 64-bit, and just pick obviously safe cases for
them.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

xen: fix 2.6.27-rc5 xen balloon driver warnings

Set the class so it doesn't clash with the normal memory class.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
===================================================================

[MIPS] IP22: Fix detection of second HPC3 on Challenge S

The second HPC3 could be found only on Guiness systems (Challenge-S),
but not on fullhouse (Indigo2) systems.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

ahci: RAID mode SATA patch for Intel Ibex Peak DeviceIDs

Add the Intel Ibex Peak (PCH) SATA RAID Controller DeviceIDs.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_sil680: remove duplicate pcim_enable_device

Remove duplicate call to pcim_enable_device in sil680_init_one.

Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata-sff: kill spurious WARN_ON() in ata_hsm_move()

On HSM_ST_ERR, ata_hsm_move() triggers WARN_ON() if AC_ERR_DEV or
AC_ERR_HSM is not set. PHY events may trigger HSM_ST_ERR with other
error codes and, with or without it, there just isn't much reason to
do WARN_ON() on it. Even if error code is not set there, core EH
logic won't have any problem dealing with the error condition.

OSDL bz#11065 reports this problem.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sata_nv: disable hardreset for generic

of them being unifying probing, hotplug and EH reset paths uniform.
Previously, broken hardreset could go unnoticed as it wasn't used
during probing but when something goes wrong or after hotplug the
problem will surface and bite hard.

OSDL bug 11195 reports that sata_nv generic flavor falls into this
category.  Hardreset itself succeeds but PHY stays offline after
hardreset.  I tried longer debounce timing but the result was the
same.

  http://bugzilla.kernel.org/show_bug.cgi?id=11195

So, it seems we'll have to drop hardreset from the generic flavor.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: disable PMP for marvell ahcis

Marvell ahcis don't play nicely with PMPs. Disable it.

Reported by KueiHuan Chen in the following thread.

http://thread.gmane.org/gmane.linux.ide/33296

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: KueiHuan Chen <kueihuan.chen@gmail.com>
Cc: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sata_mv: add RocketRaid 1720 PCI ID to driver

Signed-off-by: Petr Jelen <petr.jelen@gmail.com>
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci, pata_marvell: play nicely together

I've been chasing Jeff about this for months.  Jeff added the Marvell
device identifiers to the ahci driver without making the AHCI driver
handle the PATA port. This means a lot of users can't use current
kernels and in most distro cases can't even install.

This has been going on since March 2008 for the 6121 Marvell, and late 2007
for the 6145!!!

This was all pointed out at the time and repeatedly ignored. Bugs assigned
to Jeff about this are ignored also.

To quote Jeff in email

> "Just switch the order of 'ahci' and 'pata_marvell' in
> /etc/modprobe.conf, then use Fedora's tools regenerate the initrd.

> See?  It's not rocket science, and the current configuration can be
> easily made to work for Fedora users."

(Which isn't trivial, isn't end user, shouldn't be needed, and as it usually
breaks at install time is in fact impossible)

To quote Jeff in August 2007

> "   mv-ahci-pata
> Marvell 6121/6141 PATA support.  Needs fixing in the 'PATA controller
> command' area before it is usable, and can go upstream."

Only he add the ids anyway later and caused regressions, adding a further
id in March causing more regresions.

The actual fix for the moment is very simple. If the user has included
the pata_marvell driver let it drive the ports. If they've only selected
for SATA support give them the AHCI driver which will run the port a fraction
faster. Allow the user to control this decision via ahci.marvell_enable as
a module parameter so that distributions can ship 'it works' defaults and
smarter users (or config tools) can then flip it over it desired.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

powerpc: Fix rare boot build breakage

A make -j20 powerpc kernel build broke a couple of months ago saying:
In file included from arch/powerpc/boot/gunzip_util.h:13,
from arch/powerpc/boot/prpmc2800.c:21:
arch/powerpc/boot/zlib.h:85: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘*’ token
arch/powerpc/boot/zlib.h:630: warning: type defaults to ‘int’ in declaration of ‘Byte’
arch/powerpc/boot/zlib.h:630: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token

It happened again yesterday: too rare for me to confirm the fix, but
it looks like the list of dependants on gunzip_util.h was incomplete.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

Revert "crypto: camellia - Use kernel-provided bitops, unaligned access helpers"

This reverts commit bd699f2df6dbc2f4cba528fe598bd63a4d3702c5,
which causes camellia to fail the included self-test vectors.
It has also been confirmed that it breaks existing encrypted
disks using camellia.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

pkt_sched: Fix qdisc state in net_tx_action()

net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc
is neither ran nor rescheduled, which may cause endless loop in
dev_deactivate().

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack_irc: make sure string is terminated before calling simple_strtoul

Alexey Dobriyan points out:

1. simple_strtoul() silently accepts all characters for given base even
   if result won't fit into unsigned long. This is amazing stupidity in
   itself, but

2. nf_conntrack_irc helper use simple_strtoul() for DCC request parsing.
   Data first copied into 64KB buffer, so theoretically nothing prevents
   reading past the end of it, since data comes from network given 1).

This is not actually a problem currently since we're guaranteed to have
a 0 byte in skb_shared_info or in the buffer the data is copied to, but
to make this more robust, make sure the string is actually terminated.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixlet

It does "kfree(list_head)" which looks wrong because entity that was
allocated is definitely not list_head.

However, this all works because list_head is first item in
struct nf_ct_gre_keymap.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack_gre: more locking around keymap list

gre_keymap_list should be protected in all places.
(unless I'm misreading something)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack_sip: de-static helper pointers

Helper's ->help hook can run concurrently with itself, so iterating over
SIP helpers with static pointer won't work reliably.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc/spufs: Fix possible scheduling of a context to multiple SPEs

We currently have a race when scheduling a context to a SPE -
after we have found a runnable context in spusched_tick, the same
context may have been scheduled by spu_activate().

This may result in a panic if we try to unschedule a context that has
been freed in the meantime.

This change exits spu_schedule() if the context has already been
scheduled, so we don't end up scheduling it twice.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - make Lenovo 3000 N100 blacklist entry more specific
  Input: bcm5974 - add BTN_TOUCH event for mousedev benefit
  Input: bcm5974 - improve finger tracking and counting
  Input: bcm5974 - small formatting cleanup
  Input: bcm5974 - add maintainer entry

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: cpu_init(): fix memory leak when using CPU hotplug
  x86: pda_init(): fix memory leak when using CPU hotplug
  x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags
  x86: move mtrr cpu cap setting early in early_init_xxxx
  x86: delay early cpu initialization until cpuid is done
  x86: use X86_FEATURE_NOPL in alternatives
  x86: add NOPL as a synthetic CPU feature bit
  x86: boot: stub out unimplemented CPU feature words

Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource, acpi_pm.c: check for monotonicity
  clocksource, acpi_pm.c: use proper read function also in errata mode
  ntp: fix calculation of the next jiffie to trigger RTC sync
  x86: HPET: read back compare register before reading counter
  x86: HPET fix moronic 32/64bit thinko
  clockevents: broadcast fixup possible waiters
  HPET: make minimum reprogramming delta useful
  clockevents: prevent endless loop lockup
  clockevents: prevent multiple init/shutdown
  clockevents: enforce reprogram in oneshot setup
  clockevents: prevent endless loop in periodic broadcast handler
  clockevents: prevent clockevent event_handler ending up handler_noop

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
Fix CONFIG_AC97_BUS dependency

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Probe initrd header only if explicitly specified
  [MIPS] TX39xx: Add missing local_flush_icache_range initialization
  [MIPS] TXx9: Fix txx9_pcode initialization
  [MIPS] Fix WARNING: at kernel/smp.c:290
  [MIPS] Fix data bus error recovery

Merge branch 'sched/cpuset' into sched/urgent

x86: cpu_init(): fix memory leak when using CPU hotplug

Exception stacks are allocated each time a CPU is set online.
But the allocated space is never freed. Thus with one CPU hotplug
offline/online cycle there is a memory leak of 24K (6 pages) for
a CPU.

Fix is to allocate exception stacks only once -- when the CPU is
set online for the first time.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: pda_init(): fix memory leak when using CPU hotplug

pda->irqstackptr is allocated whenever a CPU is set online.
But it is never freed. This results in a memory leak of 16K
for each CPU offline/online cycle.

Fix is to allocate pda->irqstackptr only once.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags

Using native_pte_val triggers the BUG_ON() in the paravirt_ops
version of pte_flags().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: arch_reinit_sched_domains() must destroy domains to force rebuild

What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.

What this means is that doing
echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take affect untill something changes
in the cpuset setup (ie new sets created or deleted).

This patch fixes restore correct behaviour where domains must be rebuilt in
order to enable MC powersaving flags.

Test on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on dual-core Core2 laptop. Lockdep is happy and things are working
as expected.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: move mtrr cpu cap setting early in early_init_xxxx

Krzysztof Helt found MTRR is not detected on k6-2

root cause:
we moved mtrr_bp_init() early for mtrr trimming,
and in early_detect we only read the CPU capability from cpuid,
so some cpu doesn't have that bit in cpuid.

So we need to add early_init_xxxx to preset those bit before mtrr_bp_init
for those earlier cpus.

this patch is for v2.6.27

Reported-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: delay early cpu initialization until cpuid is done

Move early cpu initialization after cpu early get cap so the
early cpu initialization can fix up cpu caps.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

clocksource, acpi_pm.c: check for monotonicity

The current check for monotonicity is way too weak: Andreas Mohr reports (
http://lkml.org/lkml/2008/8/10/77 ) that on one of his test systems the
current check only triggers in 50% of all cases, leading to catastrophic
timer behaviour. To fix this issue, expand the check for monotonicity by
doing ten consecutive tests instead of one.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

clocksource, acpi_pm.c: use proper read function also in errata mode

On all hardware (some Intel ICH4, PIIX4 and PIIX4E chipsets) affected by a
hardware errata there's about a 4.2% chance that initialization of the
ACPI PMTMR fails.  On those chipsets, we need to read out the timer value
at least three times to get a correct result, for every once in a while
(i.e.  within a 3 ns window every 69.8 ns) the read returns a bogus
result.  During normal operation we work around this issue, but during
initialization reading a bogus value may lead to -EINVAL even though the
hardware is usable.

Thanks to Andreas Mohr for spotting this issue.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ntp: fix calculation of the next jiffie to trigger RTC sync

We have a bug in the calculation of the next jiffie to trigger the RTC
synchronisation.  The aim here is to run sync_cmos_clock() as close as
possible to the middle of a second.  Which means we want this function to
be called less than or equal to half a jiffie away from when now.tv_nsec
equals 5e8 (500000000).

If this is not the case for a given call to the function, for this purpose
instead of updating the RTC we calculate the offset in nanoseconds to the
next point in time where now.tv_nsec will be equal 5e8.  The calculated
offset is then converted to jiffies as these are the unit used by the
timer.

Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode,
where the resulting value is rounded up.  As a result the range of
now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.

As a result if for example sync_cmos_clock() happens to be called at the
time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 +
TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
same range of now.tv_nsec again.  Similarly for cases offsetted by an
integer multiple of TICK_NSEC.

This change addresses the problem by subtracting TICK_NSEC / 2 from the
nanosecond offset to the next point in time where now.tv_nsec will be
equal 5e8, effectively shifting the following rounding in
timespec_to_jiffies() so that it produces a rounded-to-nearest result.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[ARM] 5241/1: provide ioremap_wc()

This patch provides an ARM implementation of ioremap_wc().

We use different page table attributes depending on which CPU we
are running on:

- Non-XScale ARMv5 and earlier systems: The ARMv5 ARM documents four
  possible mapping types (CB=00/01/10/11).  We can't use any of the
  cached memory types (CB=10/11), since that breaks coherency with
  peripheral devices.  Both CB=00 and CB=01 are suitable for _wc, and
  CB=01 (Uncached/Buffered) allows the hardware more freedom than
  CB=00, so we'll use that.

  (The ARMv5 ARM seems to suggest that CB=01 is allowed to delay stores
  but isn't allowed to merge them, but there is no other mapping type
  we can use that allows the hardware to delay and merge stores, so
  we'll go with CB=01.)

- XScale v1/v2 (ARMv5): same as the ARMv5 case above, with the slight
  difference that on these platforms, CB=01 actually _does_ allow
  merging stores.  (If you want noncoalescing bufferable behavior
  on Xscale v1/v2, you need to use XCB=101.)

- Xscale v3 (ARMv5) and ARMv6+: on these systems, we use TEXCB=00100
  mappings (Inner/Outer Uncacheable in xsc3 parlance, Uncached Normal
  in ARMv6 parlance).

  The ARMv6 ARM explicitly says that any accesses to Normal memory can
  be merged, which makes Normal memory more suitable for _wc mappings
  than Device or Strongly Ordered memory, as the latter two mapping
  types are guaranteed to maintain transaction number, size and order.
  We use the Uncached variety of Normal mappings for the same reason
  that we can't use C=1 mappings on ARMv5.

  The xsc3 Architecture Specification documents TEXCB=00100 as being
  Uncacheable and allowing coalescing of writes, which is also just
  what we need.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Fix CONFIG_AC97_BUS dependency

CONFIG_AC97_BUS is used from both sound and ucb1400 drivers.
The recent change in Kconfig introduced the exclusive dependency on
CONFIG_SOUND, and disabled the ucb1400 build without sound.
This patch makes CONFIG_AC97_BUS independent.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>

x86: HPET: read back compare register before reading counter

After fixing the u32 thinko I sill had occasional hickups on ATI chipsets
with small deltas. There seems to be a delay between writing the compare
register and the transffer to the internal register which triggers the
interrupt. Reading back the value makes sure, that it hit the internal
match register befor we compare against the counter value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: HPET fix moronic 32/64bit thinko

We use the HPET only in 32bit mode because:
1) some HPETs are 32bit only
2) on i386 there is no way to read/write the HPET atomic 64bit wide

The HPET code unification done by the "moron of the year" did
not take into account that unsigned long is different on 32 and
64 bit.

This thinko results in a possible endless loop in the clockevents
code, when the return comparison fails due to the 64bit/332bit
unawareness.

unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit.
but the final compare will fail and return -ETIME causing endless
loops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

clockevents: broadcast fixup possible waiters

Until the C1E patches arrived there where no users of periodic broadcast
before switching to oneshot mode. Now we need to trigger a possible
waiter for a periodic broadcast when switching to oneshot mode.
Otherwise we can starve them for ever.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: use X86_FEATURE_NOPL in alternatives

Use X86_FEATURE_NOPL to determine if it is safe to use P6 NOPs in
alternatives. Also, replace table and loop with simple if statement.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: add NOPL as a synthetic CPU feature bit

The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum. Instead, probe for NOPL
directly by executing a NOPL instruction and see if we get #UD.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: boot: stub out unimplemented CPU feature words

The CPU feature detection code in the boot code is somewhat minimal,
and doesn't include all possible CPUID words. In particular, it
doesn't contain the code for CPU feature words 2 (Transmeta),
3 (Linux-specific), 5 (VIA), or 7 (scattered). Zero them out, so we
can still set those bits as known at compile time; in particular, this
allows creating a Linux-specific NOPL flag and have it required (and
therefore resolvable at compile time) in 64-bit mode.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

drivers/mmc/card/block.c: fix refcount leak in mmc_block_open()

mmc_block_open() increments md->usage although it returns with -EROFS when
default mounting a MMC/SD card with write protect switch on. This
reference counting bug prevents /dev/mmcblkX from being released on card
removal, and situation worsen with reinsertion until the minor number
range runs out.

Reported-by: <sasin@solomon-systech.com>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tracehook: comment pasto fixes

Fix some pasto's in comments in the new linux/tracehook.h and
asm-generic/syscall.h files.

Reported-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

atmel_lcdfb: fix oops in rmmod when framebuffer fails to register

If framebuffer registration failed in platform driver ->probe() callback,
dev_get_drvdata() points to freed memory region, but ->remove() function
try to use it and the following oops occurs:

Unable to handle kernel NULL pointer dereference at virtual address 00000228
pgd = c3a20000
[00000228] *pgd=23a2b031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
Modules linked in: atmel_lcdfb(-) cfbcopyarea cfbimgblt cfbfillrect [last unloaded: atmel_lcdfb]
CPU: 0    Not tainted  (2.6.27-rc2 #116)
PC is at atmel_lcdfb_remove+0x14/0xf8 [atmel_lcdfb]
LR is at platform_drv_remove+0x20/0x24
pc : [<bf006bc4>]    lr : [<c0157d28>]    psr: a0000013
sp : c3a45e84  ip : c3a45ea0  fp : c3a45e9c
r10: 00000002  r9 : c3a44000  r8 : c0026c04
r7 : 00000880  r6 : c02bb228  r5 : 00000000  r4 : c02bb230
r3 : bf007e3c  r2 : c02bb230  r1 : 00000004  r0 : c02bb228
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 23a20000  DAC: 00000015
Process rmmod (pid: 6799, stack limit = 0xc3a44260)
Stack: (0xc3a45e84 to 0xc3a46000)
5e80:          c02bb230 bf007e3c bf007e3c c3a45eac c3a45ea0 c0157d28 bf006bc0
5ea0: c3a45ec4 c3a45eb0 c0156d20 c0157d18 c02bb230 c02bb2d8 c3a45ee0 c3a45ec8
5ec0: c0156da8 c0156cb8 bf007e3c bf007ee0 c02c8e14 c3a45efc c3a45ee4 c0156018
5ee0: c0156d50 bf007e3c bf007ee0 00000000 c3a45f18 c3a45f00 c0157220 c0155f9c
5f00: 00000000 bf007ee0 bf008000 c3a45f28 c3a45f1c c0157e34 c01571ec c3a45f38
5f20: c3a45f2c bf006ba8 c0157e30 c3a45fa4 c3a45f3c c005772c bf006ba4 656d7461
5f40: 636c5f6c 00626664 c004c988 c3a45f80 c3a45f5c 00000000 c3a45fb0 00000000
5f60: ffffffff becaccd8 00000880 00000000 000a5e80 00000001 bf007ee0 00000880
5f80: c3a45f84 00000000 becaccd4 00000002 000003df 00000081 00000000 c3a45fa8
5fa0: c0026a60 c0057584 00000002 000003df 00900081 000a5e80 00000880 00000000
5fc0: becaccd4 00000002 000003df 00000000 000a5e80 00000001 00000002 0000005f
5fe0: 4004f5ec becacbe8 0001a158 4004f5fc 20000010 00900081 f9ffbadf 7bbfb2bb
Backtrace:
[<bf006bb0>] (atmel_lcdfb_remove+0x0/0xf8 [atmel_lcdfb]) from [<c0157d28>] (platform_drv_remove+0x20/0x24)
r6:bf007e3c r5:bf007e3c r4:c02bb230
[<c0157d08>] (platform_drv_remove+0x0/0x24) from [<c0156d20>] (__device_release_driver+0x78/0x98)
[<c0156ca8>] (__device_release_driver+0x0/0x98) from [<c0156da8>] (driver_detach+0x68/0x90)
r5:c02bb2d8 r4:c02bb230
[<c0156d40>] (driver_detach+0x0/0x90) from [<c0156018>] (bus_remove_driver+0x8c/0xb4)
r6:c02c8e14 r5:bf007ee0 r4:bf007e3c
[<c0155f8c>] (bus_remove_driver+0x0/0xb4) from [<c0157220>] (driver_unregister+0x44/0x48)
r6:00000000 r5:bf007ee0 r4:bf007e3c
[<c01571dc>] (driver_unregister+0x0/0x48) from [<c0157e34>] (platform_driver_unregister+0x14/0x18)
r6:bf008000 r5:bf007ee0 r4:00000000
[<c0157e20>] (platform_driver_unregister+0x0/0x18) from [<bf006ba8>] (atmel_lcdfb_exit+0x14/0x1c [atmel_lcdfb])
[<bf006b94>] (atmel_lcdfb_exit+0x0/0x1c [atmel_lcdfb]) from [<c005772c>] (sys_delete_module+0x1b8/0x22c)
[<c0057574>] (sys_delete_module+0x0/0x22c) from [<c0026a60>] (ret_fast_syscall+0x0/0x2c)
r7:00000081 r6:000003df r5:00000002 r4:becaccd4
Code: e92dd870 e24cb004 e59050c4 e1a06000 (e5954228)
---[ end trace 85476b184d9e68d8 ]---

This patch fixes the oops.

Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

forcedeth: fix kexec regression

Fix regression tracked as http://bugzilla.kernel.org/show_bug.cgi?id=11361
and caused by commit f735a2a1a4f2a0f5cd823ce323e82675990469e2 ("[netdrvr]
forcedeth: setup wake-on-lan before shutting down") that makes network
adapters integrated into the NVidia MCP55 chipsets fail to work in kexeced
kernels. The problem appears to be that if the adapter is put into D3_hot
during ->shutdown(), it cannot be brought back into D0 after kexec (ref.
http://marc.info/?l=linux-kernel&m=121900062814967&w=4). Therefore, only
put forcedeth into D3 during ->shutdown() if the system is to be powered
off.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

res_counter: fix off-by-one bug in setting limit

I found we can no longer set limit to 0 with 2.6.27-rcX:
# mount -t cgroup -omemory xxx /mnt
# mkdir /mnt/0
# echo 0 > /mnt/0/memory.limit_in_bytes
bash: echo: write error: Device or resource busy

It turned out 'limit' can't be set to 'usage', which is wrong IMO.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix process time monotonicity
sched_clock: fix NOHZ interaction

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: add io delay quirk for Presario F700

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
async_tx: fix the bug in async_tx_run_dependencies

Merge git://git.infradead.org/~dwmw2/dwmw2-2.6.27

* git://git.infradead.org/~dwmw2/dwmw2-2.6.27:
  Revert "[ARM] use the new byteorder headers"
  Fix conditional export of kvh.h and a.out.h to userspace.
  [MTD] [NAND] tmio_nand: fix base address programming

Merge branch 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  i2c: fix i2c-sh_mobile timing issues
  sh64: resume_kernel fix for kernel oops built with CONFIG_BKL_PREEMPT=y.
  sh: resume_kernel fix for kernel oops built with CONFIG_BKL_PREEMPT=y.
  sh: fix semtimedop syscall
  sh: update AP325RXA defconfig
  sh: update Migo-R defconfig
  sh: fix platform_resource_setup_memory() section mismatch
  sh: fix kexec entry point for crash kernels
  sh: crash kernel resource fix
  sh: fix ptrace_64.c:user_disable_single_step()
  sh64: re-add the __strnlen_user() prototype

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (98 commits)
  V4L/DVB (8881): gspca: After 'while (retry--) {...}', retry will be -1 but not 0.
  V4L/DVB (8880): PATCH: Fix parents on some webcam drivers
  V4L/DVB (8877): b2c2 and bt8xx: udelay to mdelay
  V4L/DVB (8876): budget: udelay changed to mdelay
  V4L/DVB (8874): gspca: Adjust hstart for sn9c103/ov7630 and update usb-id's.
  V4L/DVB (8873): gspca: Bad image offset with rev012a of spca561 and adjust exposure.
  V4L/DVB (8872): gspca: Bad image format and offset with rev072a of spca561.
  V4L/DVB (8870): gspca: Fix dark room problem with sonixb.
  V4L/DVB (8869): gspca: Move the Sonix webcams with TAS5110C1B from sn9c102 to gspca.
  V4L/DVB (8868): gspca: Support for vga modes with sif sensors in sonixb.
  V4L/DVB (8844): dabusb_fpga_download(): fix a memory leak
  V4L/DVB (8843): tda10048_firmware_upload(): fix a memory leak
  V4L/DVB (8842): vivi_release(): fix use-after-free
  V4L/DVB (8840): dib0700: add basic support for Hauppauge Nova-TD-500 (84xxx)
  V4L/DVB (8839): dib0700: add comment to identify 35th USB id pair
  V4L/DVB (8837): dvb: fix I2C adapters name size
  V4L/DVB (8835): gspca: Same pixfmt as the sn9c102 driver and raw Bayer added in sonixb.
  V4L/DVB (8834): gspca: Have a bigger buffer for sn9c10x compressed images.
  V4L/DVB (8833): gspca: Cleanup the sonixb code.
  V4L/DVB (8832): gspca: Bad pixelformat of vc0321 webcams.
  ...

Merge branch 'core/debugobjects' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core/debugobjects' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
debugobjects: fix lockdep warning

Merge branch 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-2.6

* 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-2.6:
  acer-wmi: remove debugfs entries upon unloading
  ACPI: Avoid bogus timeout about SMbus check
  fujitsu-laptop: fix regression for P8010 in 2.6.27-rc
  ACPI: Make Len Brown the ACPI maintainer again
  ACPI: thinkpad-acpi: wan radio control is not experimental
  PNPACPI: ignore the producer/consumer bit for extended IRQ descriptors
  acpi: add checking for NULL early param
  ACPI: Fix typo in "Disable MWAIT via DMI on broken Compal board"
  ACPI: Fix now signed module parameter.
  ACPI: Change package length error to warning
  ACPI: Fix now signed module parameter.

[MIPS] Probe initrd header only if explicitly specified

Currently init_initrd() probes initrd header at the last page of kernel
image, but it is valid only if addinitrd was used. If addinitrd was not
used, the area contains garbage so probing there might misdetect initrd
header (magic number is not strictly robust).

This patch introduces CONFIG_PROBE_INITRD_HEADER to explicitly enable this
probing.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] TX39xx: Add missing local_flush_icache_range initialization

Commmit 59e39ecd933ba49eb6efe84cbfa5597a6c9ef18a ("Fix WARNING: at
kernel/smp.c:290") introduced local_flush_icache_range but lacks
initialization for some TX39 case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] TXx9: Fix txx9_pcode initialization

The txx9_pcode variable was introduced in commit
fe1c2bc64f65003b39f331a8e4b0d15b235a4afd ("TXx9: Add 64-bit support")
but was not initialized properly.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix WARNING: at kernel/smp.c:290

trap_init issues flush_icache_range(), which uses ipi functions to
get icache flushing done on all cpus. But this is done before interrupts
are enabled and caused WARN_ON messages. This changeset introduces
a new local_flush_icache_range() and uses it before interrupts (and
additional CPUs) are enabled to avoid this problem.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix data bus error recovery

With -ffunction-section the entries in __dbe_table aren't no longer
sorted, so the lookup of exception addresses in do_be() failed for
some addresses. To avoid this we now sort __dbe_table.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Revert "mac80211: Use IWEVASSOCREQIE instead of IWEVCUSTOM"

This reverts commit 087d833e5a9f67ba933cb32eaf5a2279c1a5b47c, which was
reported to break wireless at least in some combinations with 32bit user
space and a 64bit kernel. Alex Williamnson bisected it to this commit.

Reported-and-bisected-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Cc: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

UBIFS: make minimum fanout 3

UBIFS does not really work correctly when fanout is 2,
because of the way we manage the indexing tree. It may
just become a list and UBIFS screws up.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBIFS: fix division by zero

If fanout is 3, we have division by zero in
'ubifs_read_superblock()':

divide error: 0000 [#1] PREEMPT SMP

Pid: 28744, comm: mount Not tainted (2.6.27-rc4-ubifs-2.6 #23)
EIP: 0060:[<f8f9e3ef>] EFLAGS: 00010202 CPU: 0
EIP is at ubifs_reported_space+0x2d/0x69 [ubifs]
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: f0ae64b0 EBP: f1f9fcf4 ESP: f1f9fce0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

sched: fix process time monotonicity

Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
reason for the problem is that signal_struct maintains it's own utime and
stime (of exited tasks), these are not updated using the new task_utime()
routine, hence sig->utime can go backwards and cause the same problem
to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
fixes the problem

TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.

Reported-by: spencer@bluehost.com
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched_clock: fix NOHZ interaction

If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Revert "[ARM] use the new byteorder headers"

This reverts commit ae82cbfc8beaa69007aa09966d3983ac938c3577. It
needs the new byteorder headers to be exported to userspace, and
they aren't yet -- and probably shouldn't be, at this point in the
2.6.27 release cycle (or ever, for that matter).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>