pilppa.com Git - linux-2.6-omap-h63xx.git/log
Paul Jackson [Sun, 8 Jan 2006 09:01:52 +0000 (01:01 -0800)]
[PATCH] cpuset: update_nodemask code reformat

Restructure code layout of the kernel/cpuset.c update_nodemask() routine,
removing embedded returns and nested if's in favor of goto completion labels.
This is being done in anticipation of adding more logic to this routine, which
will favor the goto style structure.
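A minimal userspace sketch of the goto-completion-label style described above. `update_mask_sketch()`, `struct cpuset_sketch` and its fields are hypothetical stand-ins, not the actual update_nodemask() code: every failed check jumps to one exit label instead of returning early or nesting ifs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cpuset: only the memory-node masks matter here. */
struct cpuset_sketch {
	uint64_t mems_allowed;	/* nodes this cpuset may allocate from */
	uint64_t parent_mems;	/* nodes the parent cpuset allows */
};

/* Parse a node mask from buf and install it; single exit via "done". */
static int update_mask_sketch(struct cpuset_sketch *cs, const char *buf)
{
	unsigned long long newmask;
	char *end;
	int retval = -EINVAL;

	assert(cs != NULL);
	if (buf == NULL || *buf == '\0')
		goto done;
	newmask = strtoull(buf, &end, 0);
	if (*end != '\0')
		goto done;	/* trailing garbage */
	if (newmask == 0)
		goto done;	/* an empty node set is not allowed */
	if ((newmask & ~cs->parent_mems) != 0)
		goto done;	/* must be a subset of the parent */

	cs->mems_allowed = newmask;
	retval = 0;
done:
	/* Later cleanup (unlocking, freeing) would be added here, once. */
	return retval;
}
```

The payoff is that any logic added later (locking, allocation) only needs one cleanup path at the label.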

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:51 +0000 (01:01 -0800)]
[PATCH] cpuset: minor spacing initializer fixes

Four trivial cpuset fixes: remove extra spaces, remove useless initializers,
mark one __read_mostly.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:51 +0000 (01:01 -0800)]
[PATCH] cpuset: remove marker_pid documentation

Remove documentation for the cpuset 'marker_pid' feature, which was in the
patch "cpuset: change marker for relative numbering".  That patch was previously
pulled from *-mm at my (pj) request.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:50 +0000 (01:01 -0800)]
[PATCH] cpuset: document additional features

Document the additional cpuset features:
notify_on_release
marker_pid
memory_pressure
memory_pressure_enabled

Rearrange and improve formatting of existing documentation for
cpu_exclusive and mem_exclusive features.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:49 +0000 (01:01 -0800)]
[PATCH] cpuset: memory pressure meter

Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.

This enables batch managers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure that job is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned them, and with
tightly coupled, long running, massively parallel scientific computing jobs
that will dramatically fail to meet required performance goals if they
start to use more memory than allowed to them.

This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure.  It's up to the batch
manager or other user code to decide what to do about it and take action.

==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero.  So only
    systems that enable this feature will compute the metric.

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over all the (dynamically changing) set of tasks in the cpuset.

A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.
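A minimal single-threaded sketch of such a filter, assuming one decay step per elapsed second. The constants are assumptions chosen to match the text: multiplying by 0.933 each second halves the value in about 10 seconds (0.933^10 is about 0.5), and FM_SCALE = 1000 gives the "times 1000" units. The three words are cnt, val and time; the per-cpuset spinlock is omitted.

```c
#include <assert.h>

#define FM_COEF     933		/* per-second decay factor, scaled by FM_SCALE */
#define FM_SCALE    1000	/* fixed-point scale: the "times 1000" */
#define FM_MAXTICKS 99		/* cap on decay steps; val is ~0 after that */
#define FM_MAXCNT   1000000	/* cap on pending events to avoid overflow */

struct fmeter_sketch {
	int cnt;	/* events since last update, scaled by FM_SCALE */
	int val;	/* filtered rate */
	long time;	/* time of last update, in seconds */
};

/* Decay val once per elapsed second, then fold in the pending events. */
static void fmeter_update(struct fmeter_sketch *fmp, long now)
{
	long ticks = now - fmp->time;

	assert(ticks >= 0);
	if (ticks == 0)
		return;
	if (ticks > FM_MAXTICKS)
		ticks = FM_MAXTICKS;
	while (ticks-- > 0)
		fmp->val = (FM_COEF * fmp->val) / FM_SCALE;
	fmp->time = now;
	fmp->val += ((FM_SCALE - FM_COEF) * fmp->cnt) / FM_SCALE;
	fmp->cnt = 0;
}

/* Record one direct-reclaim event (a real version would hold a spinlock). */
static void fmeter_markevent(struct fmeter_sketch *fmp, long now)
{
	fmeter_update(fmp, now);
	fmp->cnt += FM_SCALE;
	if (fmp->cnt > FM_MAXCNT)
		fmp->cnt = FM_MAXCNT;
}

/* Current rate: reclaims per second, times 1000. */
static int fmeter_getrate(struct fmeter_sketch *fmp, long now)
{
	fmeter_update(fmp, now);
	return fmp->val;
}
```

With one reclaim per second sustained, the reading climbs toward (but, due to integer truncation, stays slightly below) 1000; a single burst decays by roughly half every 10 seconds.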

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:47 +0000 (01:01 -0800)]
[PATCH] cpuset: mempolicy one more nodemask conversion

Finish converting mm/mempolicy.c from bitmaps to nodemasks.  The previous
conversion had left one routine using bitmaps, since converting it involved a
corresponding change to kernel/cpuset.c.

Fix that interface by replacing it with a simple macro that calls nodes_subset(),
or, if !CONFIG_CPUSETS, returns (1).
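A userspace sketch of that macro shape, assuming a 64-node mask. All names here are hypothetical: `SKETCH_CONFIG_CPUSETS` stands in for the kernel's CONFIG_CPUSETS, and `current_mems_allowed` stands in for the current task's allowed nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CONFIG_CPUSETS 1	/* remove to model the !CONFIG_CPUSETS case */

/* Hypothetical 64-node nodemask. */
typedef struct { uint64_t bits; } nodemask_sketch_t;

/* True if every node in src1 is also in src2. */
static inline bool nodes_subset_sketch(nodemask_sketch_t src1,
				       nodemask_sketch_t src2)
{
	return (src1.bits & ~src2.bits) == 0;
}

/* Stand-in for the current task's allowed memory nodes. */
static nodemask_sketch_t current_mems_allowed = { 0x0F };

#ifdef SKETCH_CONFIG_CPUSETS
#define cpuset_nodes_subset_sketch(nodes) \
	nodes_subset_sketch((nodes), current_mems_allowed)
#else
/* Without cpusets every node set is acceptable; nodes is not evaluated. */
#define cpuset_nodes_subset_sketch(nodes) (1)
#endif
```

Using a macro rather than an exported function lets the !CONFIG_CPUSETS build fold the check away to a constant.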

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Sun, 8 Jan 2006 09:01:46 +0000 (01:01 -0800)]
[PATCH] cpuset: better bitmap remap defaults

Fix the default behaviour for the remap operators in bitmap, cpumask and
nodemask.

As previously submitted, the pair of masks <A, B> defined a map of the
positions of the set bits in A to the corresponding bits in B.  This is still
true.

The issue is how to map the other positions, corresponding to the unset (0)
bits in A.  As previously submitted, they were all mapped to the first set bit
position in B, a constant map.

When I tried to code per-vma mempolicy rebinding using these remap operators,
I realized this was wrong.

This patch changes the default to map all the unset bit positions in A to the
same positions in B, the identity map.

For example, if A has bits 4-7 set, and B has bits 9-12 set, then the map
defined by the pair <A, B> maps each bit position in the first 32 bits as
follows:

0 ==> 0
  ...
3 ==> 3
4 ==> 9
  ...
7 ==> 12
8 ==> 8
9 ==> 9
  ...
31 ==> 31

This now corresponds to the typical behaviour desired when migrating pages and
policies from one cpuset to another.

The pages on nodes within the original cpuset, and the references in memory
policies to nodes within the original cpuset, are migrated to the
corresponding cpuset-relative nodes in the destination cpuset.  Other pages
and node references are left untouched.
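A 32-bit sketch of the single-bit remap with the new identity default, loosely modeled on the behaviour described above (names are hypothetical). A set bit in A is mapped by its ordinal: the n-th set bit of A goes to the n-th set bit of B. Wrapping modulo the weight of B when A has more set bits than B is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in map. */
static int weight32(uint32_t map)
{
	int w = 0;

	for (; map != 0; map &= map - 1)
		w++;	/* clear the lowest set bit each iteration */
	return w;
}

/* Ordinal of bit pos among the set bits of map, or -1 if pos is unset. */
static int pos_to_ord(uint32_t map, int pos)
{
	int ord = 0;

	if (!(map & (UINT32_C(1) << pos)))
		return -1;
	for (int i = 0; i < pos; i++)
		if (map & (UINT32_C(1) << i))
			ord++;
	return ord;
}

/* Position of the ord-th set bit of map (counting from 0), or -1. */
static int ord_to_pos(uint32_t map, int ord)
{
	for (int i = 0; i < 32; i++)
		if ((map & (UINT32_C(1) << i)) && ord-- == 0)
			return i;
	return -1;
}

/* Map oldbit through <oldmap, newmap>; bits unset in oldmap map to themselves. */
static int bitremap_sketch(int oldbit, uint32_t oldmap, uint32_t newmap)
{
	int w, n;

	assert(oldbit >= 0 && oldbit < 32);
	w = weight32(newmap);
	n = pos_to_ord(oldmap, oldbit);
	if (n < 0 || w == 0)
		return oldbit;	/* identity map, the new default */
	return ord_to_pos(newmap, n % w);
}
```

For the example pair above, A = 0xF0 (bits 4-7) and B = 0x1E00 (bits 9-12): bit 4 maps to 9, bit 7 to 12, and bit 8 stays at 8.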

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Mackall [Sun, 8 Jan 2006 09:01:45 +0000 (01:01 -0800)]
[PATCH] slob: introduce the SLOB allocator

configurable replacement for slab allocator

This adds a CONFIG_SLAB option under CONFIG_EMBEDDED.  When CONFIG_SLAB is
disabled, the kernel falls back to using the 'SLOB' allocator.

SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
similar to the original Linux kmalloc allocator that SLAB replaced.  It's
significantly smaller code and is more memory efficient.  But like all
similar allocators, it scales poorly and suffers from fragmentation more
than SLAB, so it's only appropriate for small systems.
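To illustrate the "traditional K&R" style, here is a minimal first-fit free-list allocator over a fixed static arena. It is a sketch under stated assumptions (a hypothetical 1024-unit arena, no locking, no SLAB emulation layer), not mm/slob.c itself. The linear scan in kr_alloc() is why this class of allocator scales poorly, and leftover gaps between allocations are where fragmentation comes from.

```c
#include <assert.h>
#include <stddef.h>

/* One allocation unit; headers and payloads are whole units. */
typedef struct block {
	size_t units;		/* block size in units, header included */
	struct block *next;	/* next free block, in address order */
} block_t;

#define ARENA_UNITS 1024
static block_t arena[ARENA_UNITS];	/* hypothetical fixed heap */
static block_t *freelist;		/* address-ordered free list */

static void arena_init(void)
{
	arena[0].units = ARENA_UNITS;
	arena[0].next = NULL;
	freelist = &arena[0];
}

/* First fit: take the first free block that is large enough. */
static void *kr_alloc(size_t bytes)
{
	/* One header unit plus enough whole units for the payload. */
	size_t need = 1 + (bytes + sizeof(block_t) - 1) / sizeof(block_t);
	block_t **link = &freelist;

	for (block_t *b = freelist; b != NULL; link = &b->next, b = b->next) {
		if (b->units < need)
			continue;
		if (b->units == need) {
			*link = b->next;	/* exact fit: unlink it */
		} else {
			b->units -= need;	/* split: hand out the tail */
			b += b->units;
			b->units = need;
		}
		return b + 1;			/* payload follows the header */
	}
	return NULL;	/* nothing big enough: exhausted or fragmented */
}

/* Free: insert in address order and merge with adjacent free blocks. */
static void kr_free(void *p)
{
	block_t *b, *prev = NULL, *cur = freelist;

	if (p == NULL)
		return;
	b = (block_t *)p - 1;
	assert(b >= arena && b < arena + ARENA_UNITS);

	while (cur != NULL && cur < b) {
		prev = cur;
		cur = cur->next;
	}
	if (cur != NULL && b + b->units == cur) {	/* merge with successor */
		b->units += cur->units;
		b->next = cur->next;
	} else {
		b->next = cur;
	}
	if (prev != NULL && prev + prev->units == b) {	/* merge with predecessor */
		prev->units += b->units;
		prev->next = b->next;
	} else if (prev != NULL) {
		prev->next = b;
	} else {
		freelist = b;
	}
}
```

Keeping the free list in address order is what makes coalescing cheap: a freed block only needs to be compared with its two list neighbours.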

It's been tested extensively in the Linux-tiny tree.  I've also
stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
recommended).

Here's a comparison for otherwise identical builds, showing SLOB saving
nearly half a megabyte of RAM:

$ size vmlinux*
   text    data     bss     dec     hex filename
3336372  529360  190812 4056544  3de5e0 vmlinux-slab
3323208  527948  190684 4041840  3dac70 vmlinux-slob

$ size mm/{slab,slob}.o
   text    data     bss     dec     hex filename
  13221     752      48   14021    36c5 mm/slab.o
   1896      52       8    1956     7a4 mm/slob.o

/proc/meminfo:
                  SLAB          SLOB      delta
MemTotal:        27964 kB      27980 kB     +16 kB
MemFree:         24596 kB      25092 kB    +496 kB
Buffers:            36 kB         36 kB       0 kB
Cached:           1188 kB       1188 kB       0 kB
SwapCached:          0 kB          0 kB       0 kB
Active:            608 kB        600 kB      -8 kB
Inactive:          808 kB        812 kB      +4 kB
HighTotal:           0 kB          0 kB       0 kB
HighFree:            0 kB          0 kB       0 kB
LowTotal:        27964 kB      27980 kB     +16 kB
LowFree:         24596 kB      25092 kB    +496 kB
SwapTotal:           0 kB          0 kB       0 kB
SwapFree:            0 kB          0 kB       0 kB
Dirty:               4 kB         12 kB      +8 kB
Writeback:           0 kB          0 kB       0 kB
Mapped:            560 kB        556 kB      -4 kB
Slab:             1756 kB          0 kB   -1756 kB
CommitLimit:     13980 kB      13988 kB      +8 kB
Committed_AS:     4208 kB       4208 kB       0 kB
PageTables:         28 kB         28 kB       0 kB
VmallocTotal:  1007312 kB    1007312 kB       0 kB
VmallocUsed:        48 kB         48 kB       0 kB
VmallocChunk:  1007264 kB    1007264 kB       0 kB

(this work has been sponsored in part by CELF)

From: Ingo Molnar <mingo@elte.hu>

   Fix 32-bitness bugs in mm/slob.c.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Mackall [Sun, 8 Jan 2006 09:01:43 +0000 (01:01 -0800)]
[PATCH] slob: introduce mm/util.c for shared functions

Add mm/util.c for functions common between SLAB and SLOB.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Sun, 8 Jan 2006 09:01:42 +0000 (01:01 -0800)]
[PATCH] DEBUG_SLAB depends on SLAB

Make DEBUG_SLAB depend on SLAB.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sun, 8 Jan 2006 09:01:41 +0000 (01:01 -0800)]
[PATCH] radix-tree: reduce tree height upon partial truncation

Shrink the height of a radix tree when it is partially truncated; at present
we only shrink it on full truncation.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sun, 8 Jan 2006 09:01:41 +0000 (01:01 -0800)]
[PATCH] radix tree: early termination of tag clearing

Correctly determine the tags to be cleared in radix_tree_delete() so we
don't keep moving up the tree clearing tags that we don't need to.  For
example, if a tag is simply not set in the deleted item, nor anywhere up
the tree, radix_tree_delete() would attempt to clear it up the entire
height of the tree.

Also, tag_set() had been made conditional so as not to dirty too many
cachelines high up in the radix tree.  Move this logic into
radix_tree_tag_set() instead.
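A simplified single-tag model of both ideas, with hypothetical names: each node on an item's path holds one tag bit per child slot, path[0] is the lowest node and the last entry is the root. Setting writes a node only if its bit is not already set, and clearing walks upward but stops as soon as the bit was never set or a node still has other tagged slots (the check any_tag_set_sketch() makes, in the spirit of the any_tag_set() helper). The real radix_tree_delete() tracks several tags at once; this sketch does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_SIZE 64	/* slots per node (assumed) */

/* Hypothetical node: one tag bitmap, one bit per child slot. */
struct node_sketch {
	uint64_t tags;
};

/* Does any slot in this node still carry the tag? */
static bool any_tag_set_sketch(const struct node_sketch *node)
{
	return node->tags != 0;
}

/* Set the tag along the path, writing only nodes whose bit is clear. */
static int tag_set_path(struct node_sketch *path[], const int offset[], int height)
{
	int written = 0;

	for (int i = 0; i < height; i++) {
		uint64_t bit;

		assert(offset[i] >= 0 && offset[i] < MAP_SIZE);
		bit = UINT64_C(1) << offset[i];
		if (!(path[i]->tags & bit)) {
			path[i]->tags |= bit;	/* dirty the line only when needed */
			written++;
		}
	}
	return written;
}

/* Clear the tag bottom-up, terminating early; returns nodes written. */
static int tag_clear_path(struct node_sketch *path[], const int offset[], int height)
{
	int written = 0;

	for (int i = 0; i < height; i++) {
		uint64_t bit;

		assert(offset[i] >= 0 && offset[i] < MAP_SIZE);
		bit = UINT64_C(1) << offset[i];
		if (!(path[i]->tags & bit))
			break;		/* item was not tagged: leave ancestors alone */
		path[i]->tags &= ~bit;
		written++;
		if (any_tag_set_sketch(path[i]))
			break;		/* siblings still tagged: parent keeps its bit */
	}
	return written;
}
```

In a two-level tree with two tagged items under the same lowest node, clearing the first item touches only that node; clearing the second also clears the root bit.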

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sun, 8 Jan 2006 09:01:40 +0000 (01:01 -0800)]
[PATCH] radix tree: code consolidation

Introduce helper any_tag_set() rather than repeat the same code sequence 4
times.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul E. McKenney [Sun, 8 Jan 2006 09:01:39 +0000 (01:01 -0800)]
[PATCH] remove get_task_struct_rcu()

The latest set of signal-RCU patches does not use get_task_struct_rcu().
This patch removes it.

Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul E. McKenney [Sun, 8 Jan 2006 09:01:38 +0000 (01:01 -0800)]
[PATCH] Simpler signal-exit concurrency handling

Some simplification in checking signal delivery against concurrent exit.
Instead of using get_task_struct_rcu(), which increments the task_struct
reference count, check the reference count after acquiring sighand lock.

Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Sun, 8 Jan 2006 09:01:37 +0000 (01:01 -0800)]
[PATCH] RCU signal handling

RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked.  This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.

Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul E. McKenney [Sun, 8 Jan 2006 09:01:35 +0000 (01:01 -0800)]
[PATCH] Make RCU task_struct safe for oprofile

Applying RCU to the task structure broke oprofile, because
free_task_notify() can now be called from softirq.  This means that the
task_mortuary lock must be acquired with irq disabled in order to avoid
intermittent self-deadlock.  Since irq is now disabled, the critical
section within process_task_mortuary() has been restructured to be O(1) in
order to maximize scalability and minimize realtime latency degradation.
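One common way to make such a critical section O(1) is to detach the whole shared list while holding the lock and then walk it with the lock dropped. The sketch below shows that pattern in userspace with a C11 atomic_flag spinlock; the names are hypothetical, it does not disable interrupts as the kernel version must, and it uses one list where the real code may use more.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock standing in for the task_mortuary lock. */
static atomic_flag mortuary_lock = ATOMIC_FLAG_INIT;

static void mortuary_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&mortuary_lock,
						 memory_order_acquire)) {
		/* spin */
	}
}

static void mortuary_release(void)
{
	atomic_flag_clear_explicit(&mortuary_lock, memory_order_release);
}

struct dead_task {
	struct dead_task *next;
	int id;
};

static struct dead_task *dead_tasks;	/* shared with the producer side */

/* Producer: push one entry, constant time under the lock. */
static void mark_dead(struct dead_task *t)
{
	mortuary_acquire();
	t->next = dead_tasks;
	dead_tasks = t;
	mortuary_release();
}

/* Consumer: steal the list in O(1), then process it unlocked. */
static int process_mortuary(void (*release)(struct dead_task *))
{
	struct dead_task *local;
	int n = 0;

	mortuary_acquire();
	local = dead_tasks;	/* detach the whole list ... */
	dead_tasks = NULL;	/* ... leaving the shared one empty */
	mortuary_release();

	while (local != NULL) {
		struct dead_task *next = local->next;

		if (release != NULL)
			release(local);	/* slow work happens outside the lock */
		local = next;
		n++;
	}
	return n;
}
```

The time spent with the lock held no longer depends on how many tasks are queued, which is what bounds the latency.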

Kudos to Wu Fengguang for finding this problem!

CC: Wu Fengguang <wfg@mail.ustc.edu.cn>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Sun, 8 Jan 2006 09:01:34 +0000 (01:01 -0800)]
[PATCH] uml: prevent MODE_SKAS=n and MODE_TT=n

If MODE_TT=n, MODE_SKAS must be y.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Sun, 8 Jan 2006 09:01:33 +0000 (01:01 -0800)]
[PATCH] uml: whitespace cleanup

This fixes some mangled whitespace added by the earlier trap_user.c patch.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Sun, 8 Jan 2006 09:01:32 +0000 (01:01 -0800)]
[PATCH] consolidate asm/futex.h

Most of the architectures have the same asm/futex.h.  This consolidates them
into asm-generic, with the arches including it from their own asm/futex.h.

In the case of UML, this reverts the old broken futex.h and goes back to using
the same one as almost everyone else.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gennady Sharapov [Sun, 8 Jan 2006 09:01:32 +0000 (01:01 -0800)]
[PATCH] uml: merge trap_user.c and trap_kern.c

The serial UML OS-abstraction layer patch (um/kernel dir).

This merges the trap_user.c and trap_kern.c files.

Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gennady Sharapov [Sun, 8 Jan 2006 09:01:31 +0000 (01:01 -0800)]
[PATCH] uml: move libc-dependent code from trap_user.c

The serial UML OS-abstraction layer patch (um/kernel dir).

This moves all system calls from trap_user.c into the os-Linux directory.

Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gennady Sharapov [Sun, 8 Jan 2006 09:01:29 +0000 (01:01 -0800)]
[PATCH] uml: move libc-dependent code from signal_user.c

The serial UML OS-abstraction layer patch (um/kernel dir).

This moves all system calls from signal_user.c into the os-Linux directory.

Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Woody Suwalski [Sun, 8 Jan 2006 09:01:29 +0000 (01:01 -0800)]
[PATCH] ARM: Netwinder ds1620 driver needs an export to be built as module

The ds1620 module uses the gpio_read symbol, so it only works when built in.
The symbol needs to be exported from the kernel image.

Signed-off-by: Woody Suwalski <woodys@xandros.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ravikiran G Thirumalai [Sun, 8 Jan 2006 09:01:28 +0000 (01:01 -0800)]
[PATCH] Kill L1_CACHE_SHIFT_MAX

Kill L1_CACHE_SHIFT_MAX from all arches.  Since L1_CACHE_SHIFT_MAX is no longer
used now that INTERNODE_CACHE has been introduced, kill it.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ravikiran G Thirumalai [Sun, 8 Jan 2006 09:01:27 +0000 (01:01 -0800)]
[PATCH] Change maxaligned_in_smp alignment macros to internodealigned_in_smp macros

____cacheline_maxaligned_in_smp is currently used to align critical structures
and avoid false sharing.  It uses the per-arch L1_CACHE_SHIFT_MAX, and people
find L1_CACHE_SHIFT_MAX useless.

However, we have been using ____cacheline_maxaligned_in_smp to align
structures on the internode cacheline size.  As per Andi's suggestion, the
following patch kills ____cacheline_maxaligned_in_smp and introduces
INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
Arches needing L3/internode cacheline alignment can define
INTERNODE_CACHE_SHIFT in the arch asm/cache.h.  The patch replaces
____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp.

With this patch, L1_CACHE_SHIFT_MAX can be killed.
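A userspace sketch of the default-and-override scheme, assuming a 64-byte L1 line (L1_CACHE_SHIFT = 6) and the GCC/Clang aligned attribute. The macro is spelled `cacheline_internodealigned_in_smp` here (the kernel name has four leading underscores, which are reserved in userspace), and `SKETCH_CONFIG_SMP` and `struct hot_counters` are hypothetical.

```c
#include <assert.h>

#define L1_CACHE_SHIFT 6	/* assumed 64-byte L1 lines */

/* An arch needing L3/internode alignment would define this first. */
#ifndef INTERNODE_CACHE_SHIFT
#define INTERNODE_CACHE_SHIFT L1_CACHE_SHIFT
#endif
#define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)

static_assert((INTERNODE_CACHE_BYTES & (INTERNODE_CACHE_BYTES - 1)) == 0,
	      "internode cache line size must be a power of two");

#define SKETCH_CONFIG_SMP 1
#if SKETCH_CONFIG_SMP
#define cacheline_internodealigned_in_smp \
	__attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
#else
#define cacheline_internodealigned_in_smp	/* no padding on UP */
#endif

/* Hot data that should not share a line with its neighbours. */
struct hot_counters {
	long hits;
	long misses;
} cacheline_internodealigned_in_smp;
```

Because the attribute aligns the type, sizeof is also rounded up, so adjacent array elements land on separate (internode) cache lines.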

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:26 +0000 (01:01 -0800)]
[PATCH] frv: fix uninitialised variable in serverworks driver

Fix an uninitialised variable warning in the serverworks driver.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:25 +0000 (01:01 -0800)]
[PATCH] frv: fix uninitialised variable in atm nicstar driver

Fix an uninitialised variable warning in the atm nicstar driver.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:25 +0000 (01:01 -0800)]
[PATCH] frv: miscellaneous changes

Fix a number of miscellaneous items:

 (1) Declare lock sections in the linker script.

 (2) Recurse in the correct manner in the arch makefile.

 (3) asm/bug.h requires asm/linkage.h to be included first. One C file puts
     asm/bug.h first.

 (4) Add an empty RTC header file to avoid missing header file errors.

 (5) sg_dma_address() should use the dma_address member of a scatter list.

 (6) Add trivial pci_unmap support.

 (7) Add pgprot_noncached().

 (8) Discard u_quad_t.

 (9) Use ~0UL rather than ULONG_MAX in unistd.h in case the latter isn't
     declared.

(10) Add an empty VGA header file to avoid missing header file errors.

(11) Add an XOR header file to use the generic XOR stuff.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:24 +0000 (01:01 -0800)]
[PATCH] frv: make get_user macro cast pointers

Make the get_user macro cast the source pointer to an appropriate type for the
specified size.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:23 +0000 (01:01 -0800)]
[PATCH] frv: force serial driver inclusion

Force the 8250 serial driver to be built in if the on-CPU UARTs are to be
used.  It can't be built as a module because the arch setup needs to call
into it.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:22 +0000 (01:01 -0800)]
[PATCH] frv: fix PCMCIA configuration

Fix PCMCIA configuration for FRV by including the stock PCMCIA configuration
description file.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:22 +0000 (01:01 -0800)]
[PATCH] frv: add pci_iomap

Implement pci_iomap() for FRV.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:21 +0000 (01:01 -0800)]
[PATCH] frv: add module support stubs

Add stubs for FRV module support.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:20 +0000 (01:01 -0800)]
[PATCH] frv: supply various missing I/O access primitives

Supply various I/O access primitives that are missing for the FRV arch:

 (*) mmiowb()

 (*) read*_relaxed()

 (*) ioport_*map()

 (*) ioread*(), iowrite*(), ioread*_rep() and iowrite*_rep()

 (*) pci_io*map()

 (*) check_signature()

The patch also makes __is_PCI_addr() more efficient.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:19 +0000 (01:01 -0800)]
[PATCH] frv: support module exception tables

Fix the exception table handling so that exceptions in modules are dealt with.
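The general idea can be sketched as: on a fault, look up the faulting address in the core kernel's sorted exception table, and if it is not there, in each loaded module's table. This is a hypothetical userspace model (the entry layout, `struct module_sketch` and the search functions are assumptions), not the FRV code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: faulting instruction address and its fixup address. */
struct extable_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Hypothetical module descriptor carrying its own exception table. */
struct module_sketch {
	const struct extable_entry *extable;
	size_t num_exentries;
	const struct module_sketch *next;
};

/* Binary search of one table sorted by insn; NULL if addr is absent. */
static const struct extable_entry *
search_one_table(const struct extable_entry *tbl, size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == addr)
			return &tbl[mid];
		if (tbl[mid].insn < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Core table first, then every module's table. */
static const struct extable_entry *
search_all_tables(const struct extable_entry *core, size_t ncore,
		  const struct module_sketch *mods, unsigned long addr)
{
	const struct extable_entry *e = search_one_table(core, ncore, addr);

	for (const struct module_sketch *m = mods; e == NULL && m != NULL; m = m->next)
		e = search_one_table(m->extable, m->num_exentries, addr);
	return e;
}
```

A fault handler that only consulted the core table would miss faults raised by user-access code inside modules, which is the case the patch covers.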

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:19 +0000 (01:01 -0800)]
[PATCH] frv: implement and export various things required by modules

Export a number of features required to build all the modules.  It also
implements the following simple features:

 (*) csum_partial_copy_from_user() for MMU as well as no-MMU.

 (*) __ucmpdi2().

so that they can be exported too.
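For reference, GCC's libgcc documents __ucmpdi2() as an unsigned 64-bit comparison returning 0 if a < b, 1 if a == b and 2 if a > b; a 32-bit CPU such as FRV may emit calls to it. The sketch below shows that contract built from 32-bit halves. It uses a different name (`ucmpdi2_sketch`) so it cannot clash with the real routine.

```c
#include <assert.h>
#include <stdint.h>

/* Compare two unsigned 64-bit values via their 32-bit halves. */
static int ucmpdi2_sketch(uint64_t a, uint64_t b)
{
	uint32_t ahi = (uint32_t)(a >> 32), bhi = (uint32_t)(b >> 32);
	uint32_t alo = (uint32_t)a, blo = (uint32_t)b;

	if (ahi != bhi)
		return ahi < bhi ? 0 : 2;	/* high words decide */
	if (alo != blo)
		return alo < blo ? 0 : 2;	/* otherwise the low words */
	return 1;				/* equal */
}
```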

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:18 +0000 (01:01 -0800)]
[PATCH] frv: drop unsupported debugging features

Drop support for debugging features that aren't supported on FRV:

 (*) EARLY_PRINTK

The on-chip UARTs are set up early enough that this isn't required,
and VGA support isn't available. There's also a gdbstub available.

 (*) DEBUG_PAGEALLOC

This can't easily be done since we use huge static mappings to
cover the kernel, not pages.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:17 +0000 (01:01 -0800)]
[PATCH] frv: drop 8/16-bit xchg and cmpxchg

Drop support for 8-bit and 16-bit xchg and cmpxchg emulation, and implement
32-bit xchg with the SWAP/SWAPI instruction.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Howells [Sun, 8 Jan 2006 09:01:16 +0000 (01:01 -0800)]
[PATCH] frv: suppress configuration of certain features for FRV

Suppress configuration of certain features for the FRV arch as they can't be
built for FRV at the moment:

 (*) RTC

 (*) HISAX_*

 (*) PARPORT_PC

 (*) VGA_CONSOLE

 (*) BINFMT_ELF

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:14 +0000 (01:01 -0800)]
[PATCH] cs89x0: fix up after pnx0105 Kconfig symbol renaming

The Kconfig symbol for pnx0105 was recently renamed to ARCH_PNX010X.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:13 +0000 (01:01 -0800)]
[PATCH] fix Kconfig depends for cs89x0 (PNX010X support)

PNX010X support for CS89x0 should be conditional on NET_PCI, as it is an 'on
board controller' and NET_PCI includes that category of NICs.  Since
ARCH_PNX0105 was recently changed to ARCH_PNX010X, incorporate that change as
well while we're at it.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:12 +0000 (01:01 -0800)]
[PATCH] cs89x0: switch {in,out}sw to {read,write}words

Implement readwords/writewords that use readword/writeword, and switch the
rest of the driver over to use these.
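A sketch of string I/O helpers layered on single-word accessors. The loopback FIFO standing in for the device, and the exact readwords()/writewords() signatures, are assumptions for illustration. Moving data byte by byte (low byte first) avoids casting the caller's buffer to a `uint16_t *`, which could be misaligned.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_WORDS 16
/* Hypothetical loopback device: words written to the port can be read back. */
static uint16_t fifo[FIFO_WORDS];
static unsigned int fifo_head, fifo_tail;

static uint16_t readword(unsigned long base_addr, int portno)
{
	(void)base_addr;
	(void)portno;	/* a real driver would access the I/O port here */
	return fifo[fifo_tail++ % FIFO_WORDS];
}

static void writeword(unsigned long base_addr, int portno, uint16_t value)
{
	(void)base_addr;
	(void)portno;
	fifo[fifo_head++ % FIFO_WORDS] = value;
}

/* Read length 16-bit words from the port into buf, low byte first. */
static void readwords(unsigned long base_addr, int portno, void *buf, int length)
{
	uint8_t *buf8 = buf;

	for (; length > 0; length--) {
		uint16_t w = readword(base_addr, portno);

		*buf8++ = (uint8_t)w;
		*buf8++ = (uint8_t)(w >> 8);
	}
}

/* Write length 16-bit words from buf (low byte first) to the port. */
static void writewords(unsigned long base_addr, int portno, const void *buf, int length)
{
	const uint8_t *buf8 = buf;

	for (; length > 0; length--) {
		uint16_t w = (uint16_t)(buf8[0] | (buf8[1] << 8));

		writeword(base_addr, portno, w);
		buf8 += 2;
	}
}
```

Since only readword()/writeword() touch the hardware, per-board variants need to change just those two functions.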

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:11 +0000 (01:01 -0800)]
[PATCH] cs89x0: cleanly implement ixdp2x01 and pnx0501 support

Implement suitable versions of the readword/writeword macros for ixdp2x01 and
pnx0501.  Handle the 32-bit spacing of the registers in these functions
instead of in the header file.
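A sketch of handling 32-bit register spacing inside the accessors, assuming each 16-bit register occupies its own 32-bit slot on the bus. Register offset `portno` (0, 2, 4, ...) then lives at byte offset `portno << 1`. The simulated `regs` window is hypothetical, and a real driver would use the platform's MMIO accessors rather than plain volatile pointers.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t regs[16];	/* simulated register window, 32 bits per slot */

static uint16_t readword(uintptr_t base_addr, int portno)
{
	const volatile uint32_t *reg =
		(const volatile uint32_t *)(base_addr + ((uintptr_t)portno << 1));

	return (uint16_t)*reg;	/* only the low 16 bits carry data */
}

static void writeword(uintptr_t base_addr, int portno, uint16_t value)
{
	volatile uint32_t *reg =
		(volatile uint32_t *)(base_addr + ((uintptr_t)portno << 1));

	*reg = value;
}
```

Keeping the shift in these two functions lets the shared header use the chip's native register offsets on every board.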

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:10 +0000 (01:01 -0800)]
[PATCH] cs89x0: make {read,write}reg use {read,write}word

Make readreg/writereg use readword/writeword.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:09 +0000 (01:01 -0800)]
[PATCH] cs89x0: swap {read,write}reg and {read,write}word

Reverse the order of readreg/writereg and readword/writeword in the
file, so that we can make readreg/writereg use readword/writeword.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:08 +0000 (01:01 -0800)]
[PATCH] cs89x0: convert {inw,outw} calls to {read,write}word

Switch all occurrences of inw/outw in the driver over to readword/writeword.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lennert Buytenhek [Sun, 8 Jan 2006 09:01:06 +0000 (01:01 -0800)]
[PATCH] cs89x0: make {read,write}word take base_addr

readword() and writeword() take a 'struct net_device *' and deref its
->base_addr member.  Make them take the base_addr directly instead, so
that we can switch the other occurrences of inw/outw in the file over
to readword/writeword as well.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: dmitry pervushin <dpervushin@ru.mvista.com>
Cc: <dsaxena@plexity.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kirill Korotaev [Sun, 8 Jan 2006 09:01:05 +0000 (01:01 -0800)]
[PATCH] Optimise oom kill of current task

When the OOM killer kills current, there's no need to call
schedule_timeout_interruptible(), since the task must die ASAP.

Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Move page migration related functions near do_migrate_pages()
Christoph Lameter [Sun, 8 Jan 2006 09:01:04 +0000 (01:01 -0800)]
[PATCH] Move page migration related functions near do_migrate_pages()

Group page migration functions in mempolicy.c

Add a forward declaration for migrate_page_add() (like gather_stats()) and use
our newfound mobility to group all page migration related functions around
do_migrate_pages().

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mempolicies: unexport get_vma_policy()
Christoph Lameter [Sun, 8 Jan 2006 09:01:03 +0000 (01:01 -0800)]
[PATCH] mempolicies: unexport get_vma_policy()

Since the numa_maps functionality is now in mempolicy.c we no longer need to
export get_vma_policy().

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Drop page table lock before calling migrate_page_add()
Christoph Lameter [Sun, 8 Jan 2006 09:01:02 +0000 (01:01 -0800)]
[PATCH] Drop page table lock before calling migrate_page_add()

migrate_page_add cannot be called with a spinlock held (it calls
isolate_lru_page, which calls schedule_on_each_cpu).  Drop the ptl lock in
check_pte_range before calling migrate_page_add().

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Fold numa_maps into mempolicies.c
Christoph Lameter [Sun, 8 Jan 2006 09:01:02 +0000 (01:01 -0800)]
[PATCH] Fold numa_maps into mempolicies.c

First discussed at http://marc.theaimsgroup.com/?t=113149255100001&r=1&w=2

- Use the check_range() in mempolicy.c to gather statistics.

- Improve the numa_maps code in general and fix some comments.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mempolicies: private pointer in check_range and MPOL_MF_INVERT
Christoph Lameter [Sun, 8 Jan 2006 09:01:01 +0000 (01:01 -0800)]
[PATCH] mempolicies: private pointer in check_range and MPOL_MF_INVERT

This was first posted at
http://marc.theaimsgroup.com/?l=linux-mm&m=113149240227584&w=2

(Part of this functionality is also contained in the direct migration
patchset. The functionality here is more generic and independent of that
patchset.)

- Add an internal flag, MPOL_MF_INVERT, to control check_range() behavior.

- Replace the pagelist passed through check_range() with a general
  private pointer that may be used for other purposes.
  (The following patches will use that to merge numa_maps into
  mempolicy.c and to better group the page migration code in
  the policy layer)

- Improve some comments.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] rmap: additional diagnostics in page_remove_rmap()
Dave Jones [Sun, 8 Jan 2006 09:01:00 +0000 (01:01 -0800)]
[PATCH] rmap: additional diagnostics in page_remove_rmap()

We seem to be hitting this assertion failure too often for hardware bugs
to be the cause.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mm: clean up local variables
Tobias Klauser [Sun, 8 Jan 2006 09:00:59 +0000 (01:00 -0800)]
[PATCH] mm: clean up local variables

Clean up a local variable with the same name as a variable in a larger
block.  Also move a variable into the block where it's actually used.

Spotted by http://linuxicc.sourceforge.net/

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] set_page_count() macro safety
Avishay Traeger [Sun, 8 Jan 2006 09:00:58 +0000 (01:00 -0800)]
[PATCH] set_page_count() macro safety

Fix set_page_count() macro to handle complex arguments.
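The hazard being fixed is the usual one for function-like macros: an unparenthesized argument can bind to the operators around it. The following is a minimal sketch that assumes a count stored with a -1 bias. struct fake_page and both macro names are hypothetical and are not the kernel's definitions:

```c
/* Hypothetical stand-in for struct page; only the count field matters. */
struct fake_page {
	int _count;     /* stored biased by -1 in this sketch */
};

/* Unsafe: v is pasted in as-is, so "v - 1" can bind to part of v. */
#define set_page_count_unsafe(p, v)	((p)->_count = v - 1)

/* Safe: the argument is parenthesized before the arithmetic. */
#define set_page_count_safe(p, v)	((p)->_count = (v) - 1)

/* Store a count of (busy ? 2 : 3) with each macro and report the result. */
static int stored_unsafe(int busy)
{
	struct fake_page pg;

	set_page_count_unsafe(&pg, busy ? 2 : 3);  /* busy ? 2 : (3 - 1) */
	return pg._count;
}

static int stored_safe(int busy)
{
	struct fake_page pg;

	set_page_count_safe(&pg, busy ? 2 : 3);    /* (busy ? 2 : 3) - 1 */
	return pg._count;
}
```

With busy = 1, the intended stored value is (2) - 1 = 1. The unsafe expansion, busy ? 2 : 3 - 1, stores 2 instead.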

Signed-off-by: Avishay Traeger <atraeger@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mm: make hugepages obey cpusets.
Christoph Lameter [Sun, 8 Jan 2006 09:00:57 +0000 (01:00 -0800)]
[PATCH] mm: make hugepages obey cpusets.

See http://marc.theaimsgroup.com/?l=linux-kernel&m=113167000201265&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=113167267527312&w=2

Make hugepages obey cpusets.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] cpusets: swap migration interface
Paul Jackson [Sun, 8 Jan 2006 09:00:56 +0000 (01:00 -0800)]
[PATCH] cpusets: swap migration interface

Add a boolean "memory_migrate" to each cpuset, represented by a file
containing "0" or "1" in each directory below /dev/cpuset.

It defaults to false (file contains "0").  It can be set true by writing
"1" to the file.

If true, then anytime that a task is attached to the cpuset so marked, the
pages of that task will be moved to that cpuset, preserving, to the extent
practical, the cpuset-relative placement of the pages.

Also, any time that a cpuset so marked has its memory placement changed (by
writing to its "mems" file), the tasks in that cpuset will have their pages
moved to the cpuset's new nodes, preserving, to the extent practical, the
cpuset-relative placement of the moved pages.
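The flag can be set from a program as well as from the shell. The following is a minimal sketch that assumes the cpuset filesystem is mounted at /dev/cpuset, as described above. cpuset_write_flag() and the cpuset name "mycpuset" are hypothetical:

```c
#include <stdio.h>

/*
 * Write "1" or "0" to a boolean cpuset flag file such as memory_migrate.
 * cpuset_dir is the cpuset's directory, e.g. "/dev/cpuset/mycpuset"
 * (hypothetical name).  Returns 0 on success, -1 on failure.
 */
static int cpuset_write_flag(const char *cpuset_dir, const char *flag, int on)
{
	char path[4096];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", cpuset_dir, flag);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;		/* formatting error or truncated path */

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(on ? "1" : "0", f) != EOF;
	if (fclose(f) != 0)		/* buffered write is flushed here */
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, cpuset_write_flag("/dev/cpuset/mycpuset", "memory_migrate", 1) would turn migration on for that cpuset.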

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SwapMig: Switch error handling in migrate_pages to use -Exx
Christoph Lameter [Sun, 8 Jan 2006 09:00:55 +0000 (01:00 -0800)]
[PATCH] SwapMig: Switch error handling in migrate_pages to use -Exx

Use -Exxx error codes instead of numeric return codes, and clean up the code
in migrate_pages() accordingly.

Consolidate successful migration handling.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SwapMig: Extend parameters for migrate_pages()
Christoph Lameter [Sun, 8 Jan 2006 09:00:55 +0000 (01:00 -0800)]
[PATCH] SwapMig: Extend parameters for migrate_pages()

Extend the parameters of migrate_pages() to allow the caller control over the
fate of successfully migrated or impossible to migrate pages.

Swap migration and direct migration will have the same interface after this
patch so that patches can be independently applied to the policy layer and the
core migration code.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SwapMig: Drop unused pages immediately
Christoph Lameter [Sun, 8 Jan 2006 09:00:54 +0000 (01:00 -0800)]
[PATCH] SwapMig: Drop unused pages immediately

Drop unused pages immediately

If a page is encountered that is only referenced by the migration code then
there is no reason to swap or migrate the page.  Release the page by calling
move_to_lru().

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SwapMig: add_to_swap() avoid atomic allocations
Christoph Lameter [Sun, 8 Jan 2006 09:00:53 +0000 (01:00 -0800)]
[PATCH] SwapMig: add_to_swap() avoid atomic allocations

Add gfp_mask to add_to_swap

add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
swapping.  During migration we may use add_to_swap extensively, which may
lead to out-of-memory errors.

This patch makes add_to_swap take a parameter that specifies the gfp mask.
The page migration code can then make add_to_swap use GFP_KERNEL.

Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SwapMig: CONFIG_MIGRATION fixes
Christoph Lameter [Sun, 8 Jan 2006 09:00:52 +0000 (01:00 -0800)]
[PATCH] SwapMig: CONFIG_MIGRATION fixes

Move move_to_lru, putback_lru_pages and isolate_lru into a section surrounded
by CONFIG_MIGRATION, saving some code size for single processor kernels.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: sys_migrate_pages interface
Christoph Lameter [Sun, 8 Jan 2006 09:00:51 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: sys_migrate_pages interface

sys_migrate_pages implementation using swap based page migration

This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

The intent of sys_migrate is to migrate the memory of a process.  A process may
have migrated to another node, while its memory was allocated optimally for the
prior context.  sys_migrate_pages allows shifting the memory to the new node.

sys_migrate_pages is also useful for manually moving a process's memory when
its available memory nodes have changed through cpuset operations.  Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed.  However, a user may decide
to manually control the migration.

This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends.  The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
implementation.

The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to its containing
nodeset (which may be a cpuset).  When direct page migration becomes available,
the implementation needs to be changed to do an isomorphic move of pages
between different nodesets.  The current implementation simply evicts all
pages in the source nodeset that are not in the target nodeset.
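The eviction rule above can be stated as a set difference on nodesets. The following sketch models nodesets as bitmasks, where bit n stands for node n. This is a simplification of the kernel's nodemask_t. nodes_to_evict() and count_evicted() are hypothetical helpers:

```c
#include <stddef.h>

/* A page is evicted iff its node is in "from" but not in "to". */
static unsigned long nodes_to_evict(unsigned long from, unsigned long to)
{
	return from & ~to;
}

/*
 * Count how many pages would be evicted, given each page's node number.
 * Assumes every node number is smaller than the bit width of a long.
 */
static size_t count_evicted(const int *page_node, size_t npages,
			    unsigned long from, unsigned long to)
{
	unsigned long evict = nodes_to_evict(from, to);
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (evict & (1UL << page_node[i]))
			n++;
	return n;
}
```

Take from = {1, 2} (0x6) and to = {2, 3} (0xC). Only pages on node 1 are evicted. Pages on node 2 stay where they are because node 2 is in both sets. This is why the approach cannot preserve the layout relative to the containing nodeset.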

Patch supports ia64, i386 and x86_64.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: MPOL_MF_MOVE interface
Christoph Lameter [Sun, 8 Jan 2006 09:00:50 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: MPOL_MF_MOVE interface

Add page migration support via swap to the NUMA policy layer

This patch adds page migration support to the NUMA policy layer.  An
additional flag MPOL_MF_MOVE is introduced for mbind.  If MPOL_MF_MOVE is
specified then pages that do not conform to the memory policy will be evicted
from memory.  When they are paged back in, new pages will be allocated
following the NUMA policy.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support
Christoph Lameter [Sun, 8 Jan 2006 09:00:49 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support

Include page migration if the system is NUMA or has a memory model that
allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).

And:
- Only include lru_add_drain_per_cpu if building for an SMP system.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: migrate_pages() function
Christoph Lameter [Sun, 8 Jan 2006 09:00:48 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: migrate_pages() function

This adds the basic page migration function with a minimal implementation that
only allows the eviction of pages to swap space.

Page eviction and migration may be useful to migrate pages, to suspend
programs or for remapping single pages (useful for faulty pages or pages with
soft ECC failures).

The process is as follows:

The function wanting to migrate pages must first build a list of pages to be
migrated or evicted and take them off the lru lists via isolate_lru_page().
isolate_lru_page determines that a page is freeable based on the LRU bit set.

Then the actual migration or swapout can happen by calling migrate_pages().

migrate_pages does its best to migrate or swap out the pages and makes multiple
passes over the list.  Some pages may only be swappable if they are not dirty.
migrate_pages may start writing out dirty pages in the initial passes over
the pages.  However, migrate_pages may not be able to migrate or evict all
pages for a variety of reasons.

The remaining pages may be returned to the LRU lists using putback_lru_pages().
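The isolate, migrate and putback steps above can be modeled in a few lines. This toy sketch works under stated assumptions and is not the kernel's migrate_pages(). struct sim_page and sim_migrate_pages() are hypothetical. Pages are array entries rather than LRU list members, and writeback is assumed to complete between passes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: only the properties this sketch needs. */
struct sim_page {
	bool dirty;	/* must be written out before it can be evicted */
	bool pinned;	/* can never be migrated or evicted in this model */
	bool migrated;	/* set once the page has been evicted */
};

/*
 * Each pass evicts clean, unpinned pages and starts writeback on dirty
 * ones, which are then clean on the next pass.  Stops after max_passes
 * or when a pass makes no progress, and returns how many pages are left
 * for the caller to put back on the LRU.
 */
static size_t sim_migrate_pages(struct sim_page *pages, size_t n, int max_passes)
{
	size_t remaining = n;

	for (int pass = 0; pass < max_passes && remaining > 0; pass++) {
		bool progress = false;

		remaining = 0;
		for (size_t i = 0; i < n; i++) {
			struct sim_page *p = &pages[i];

			if (p->migrated)
				continue;
			if (p->pinned) {
				remaining++;
			} else if (p->dirty) {
				p->dirty = false;	/* writeback started */
				remaining++;
				progress = true;
			} else {
				p->migrated = true;	/* evicted */
				progress = true;
			}
		}
		if (!progress)
			break;
	}
	return remaining;
}
```

Consider a list holding one clean page, one dirty page and one pinned page. A single pass leaves two pages over. Three passes evict the dirty page on the second pass and leave only the pinned page for putback.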

Changelog V4->V5:
- Use the lru caches to return pages to the LRU

Changelog V3->V4:
- Restructure code so that applying patches to support full migration does
  require minimal changes. Rename swapout_pages() to migrate_pages().

Changelog V2->V3:
- Extract common code from shrink_list() and swapout_pages()

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap
Christoph Lameter [Sun, 8 Jan 2006 09:00:47 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap

Add PF_SWAPWRITE to control a process's permission to write to swap.

- Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
  and pdflush

- Set PF_SWAPWRITE flag for kswapd and pdflush
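The change replaces a check on task identity with a check on a per-task capability bit. The following sketch shows that pattern, loosely modeled on the may_write_to_queue() test described above. The flag value, the helper names and the congestion rule are assumptions made for illustration:

```c
#include <stdbool.h>

/* Hypothetical flag value for the sketch; not necessarily the kernel's. */
#define SKETCH_PF_SWAPWRITE 0x00800000UL

/*
 * A task carrying the flag may always write; other tasks only when the
 * backing device queue is not congested (a simplifying assumption).
 */
static bool sketch_may_write_to_queue(unsigned long task_flags,
				      bool queue_congested)
{
	if (task_flags & SKETCH_PF_SWAPWRITE)
		return true;
	return !queue_congested;
}

/* What kswapd and pdflush would do to their own flags at startup. */
static unsigned long mark_swapwriter(unsigned long task_flags)
{
	return task_flags | SKETCH_PF_SWAPWRITE;
}
```

Any other task, such as one performing migration, can be granted the same permission by setting the bit, without the check naming kswapd or pdflush.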

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Swap Migration V5: LRU operations
Christoph Lameter [Sun, 8 Jan 2006 09:00:45 +0000 (01:00 -0800)]
[PATCH] Swap Migration V5: LRU operations

This is the start of the `swap migration' patch series.

Swap migration allows the moving of the physical location of pages between
nodes in a numa system while the process is running.  This means that the
virtual addresses that the process sees do not change.  However, the system
rearranges the physical location of those pages.

The main intent of page migration patches here is to reduce the latency of
memory access by moving pages near to the processor where the process
accessing that memory is running.

The patchset allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while
setting a new memory policy.

The pages of a process can also be relocated from another process using the
sys_migrate_pages() function call.  Requires CAP_SYS_ADMIN.  The migrate_pages
function call takes two sets of nodes and moves pages of a process that are
located on the from nodes to the destination nodes.

Manual migration is very useful if for example the scheduler has relocated a
process to a processor on a distant node.  A batch scheduler or an
administrator can detect the situation and move the pages of the process
nearer to the new processor.

sys_migrate_pages() could be used on non-numa machines as well, to force all
of a particular process's pages out to swap, if someone thinks that's useful.

Larger installations usually partition the system using cpusets into sections
of nodes.  Paul has equipped cpusets with the ability to move pages when a
task is moved to another cpuset.  This allows automatic control over locality
of a process.  If a task is moved to a new cpuset then also all its pages are
moved with it so that the performance of the process does not sink
dramatically (as is the case today).

Swap migration works by simply evicting the page.  The pages must be faulted
back in.  The pages are then typically reallocated by the system near the node
where the process is executing.

For swap migration the destination of the move is controlled by the allocation
policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
in order to move the pages as intended.

No allocation policy changes are performed for sys_migrate_pages().  This
means that the pages may not be faulted in to the specified nodes if no
allocation policy was set by other means.  The pages will just end up near the
node where the fault occurred.

There's another patch series in the pipeline which implements "direct
migration".

The direct migration patchset extends the migration functionality to avoid
going through swap.  The destination node of the relocation is controllable
during the actual moving of pages.  The crutch of using the allocation policy
to relocate is not necessary and the pages are moved directly to the target.
It's also faster since swap is not used.

And sys_migrate_pages() can then move pages directly to the specified node.
Implement functions to isolate pages from the LRU and put them back later.

This patch:

An earlier implementation was provided by Hirokazu Takahashi
<taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
memory hotplug project.

From: Magnus

This breaks out isolate_lru_page() and putback_lru_page().  Needed for swap
migration.

Signed-off-by: Magnus Damm <magnus.damm@gmail.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] add schedule_on_each_cpu()
Christoph Lameter [Sun, 8 Jan 2006 09:00:43 +0000 (01:00 -0800)]
[PATCH] add schedule_on_each_cpu()

swap migration's isolate_lru_page() currently uses an IPI to notify other
processors that the lru caches need to be drained if the page cannot be
found on the LRU.  The IPI interrupt may interrupt a processor that is just
processing lru requests and cause a race condition.

This patch introduces a new function, schedule_on_each_cpu(), that uses
keventd to run the LRU draining on each processor.  Processors disable
preemption when dealing with the LRU caches (these are per processor), and thus
executing LRU draining from another process is safe.

Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mm: free_pages opt
Nick Piggin [Sun, 8 Jan 2006 09:00:42 +0000 (01:00 -0800)]
[PATCH] mm: free_pages opt

Try to streamline free_pages_bulk by ensuring callers don't pass in a
'count' that exceeds the list size.

Some cleanups:
Rename __free_pages_bulk to __free_one_page.
Put the page list manipulation from __free_pages_ok into free_one_page.
Make __free_pages_ok static.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] mm: cleanup zone_pcp
Nick Piggin [Sun, 8 Jan 2006 09:00:41 +0000 (01:00 -0800)]
[PATCH] mm: cleanup zone_pcp

Use zone_pcp everywhere even though NUMA code "knows" the internal details
of the zone.  This stops other people from trying to copy it, and it looks nicer.

Also, only print the pagesets of online cpus in zoneinfo.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Make high and batch sizes of per_cpu_pagelists configurable
Rohit Seth [Sun, 8 Jan 2006 09:00:40 +0000 (01:00 -0800)]
[PATCH] Make high and batch sizes of per_cpu_pagelists configurable

Recently there has been a lot of traffic about the right values for the batch
and high watermarks of per_cpu_pagelists.  This patch makes these two
variables configurable through the /proc interface.

A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
controls the fraction of pages at most in each zone that are allocated for
each per cpu page list.  The min value for this is 8.  It means that we
don't allow more than 1/8th of pages in each zone to be allocated in any
single per_cpu_pagelist.

The batch value of each per cpu pagelist is also updated as a result.  It
is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8).
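The sizing rule above works out as follows. The sketch assumes 4 KiB pages (PAGE_SHIFT = 12, so the batch cap is 96). It also assumes a floor of 1 page for batch, which the text does not state. pcp_sizes_for() and struct pcp_sizes are hypothetical:

```c
/* Assumed 4 KiB pages; the real value is architecture-dependent. */
#define SKETCH_PAGE_SHIFT 12
/* Minimum accepted fraction, per the text above. */
#define MIN_PERCPU_PAGELIST_FRACTION 8

struct pcp_sizes {
	unsigned long high;	/* max pages on one per-cpu list */
	unsigned long batch;	/* pages moved per refill or drain */
};

/* Returns 0 and fills *out, or -1 if fraction is below the minimum. */
static int pcp_sizes_for(unsigned long zone_pages, unsigned long fraction,
			 struct pcp_sizes *out)
{
	unsigned long high, batch;

	if (fraction < MIN_PERCPU_PAGELIST_FRACTION)
		return -1;
	high = zone_pages / fraction;	/* at most 1/fraction of the zone */
	batch = high / 4;
	if (batch < 1)
		batch = 1;		/* assumed floor for tiny zones */
	if (batch > SKETCH_PAGE_SHIFT * 8)
		batch = SKETCH_PAGE_SHIFT * 8;
	out->high = high;
	out->batch = batch;
	return 0;
}
```

Consider a zone of 2^20 pages (4 GiB of 4 KiB pages) with the minimum fraction of 8. high comes out at 131072 pages, and batch is capped at 96.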

Signed-off-by: Rohit Seth <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] drop-pagecache
Andrew Morton [Sun, 8 Jan 2006 09:00:39 +0000 (01:00 -0800)]
[PATCH] drop-pagecache

Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
discard as much pagecache and/or as many reclaimable slab objects as it can.
This operation requires root permissions.

It won't drop dirty data, so the user should run `sync' first.
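The following sketch shows the same usage from C: flush with sync(), then write the mode. The mode values 1 (pagecache), 2 (reclaimable slab objects such as dentries and inodes) and 3 (both) follow the kernel's sysctl documentation rather than this commit message. The path is a parameter so the sketch can be exercised against an ordinary file. drop_caches_write() is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700	/* for sync() in <unistd.h> */
#include <stdio.h>
#include <unistd.h>

/* Mode values as documented for /proc/sys/vm/drop_caches. */
enum drop_what {
	DROP_PAGECACHE = 1,	/* pagecache only */
	DROP_SLAB      = 2,	/* reclaimable slab objects (dentries, inodes) */
	DROP_BOTH      = 3,	/* both of the above */
};

/*
 * Flush dirty data, then ask the kernel to drop clean caches by writing
 * the mode to path (normally "/proc/sys/vm/drop_caches", which needs
 * root).  Returns 0 on success, -1 on failure.
 */
static int drop_caches_write(const char *path, enum drop_what what)
{
	FILE *f;
	int ok;

	if (what < DROP_PAGECACHE || what > DROP_BOTH)
		return -1;

	sync();		/* dirty data is not dropped, so write it back first */

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", (int)what) > 0;
	if (fclose(f) != 0)	/* buffered write is flushed here */
		ok = 0;
	return ok ? 0 : -1;
}
```

As a benchmarking aid, a harness would call drop_caches_write("/proc/sys/vm/drop_caches", DROP_BOTH) as root before each timed run.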

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
   so the discarding can be controlled on a per-node basis.

This is a debugging feature: useful for getting consistent results between
filesystem benchmarks.  We could possibly put it under a config option, but
it's less than 300 bytes.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: remove nested #ifdef CONFIG_NUMA
Christoph Lameter [Sun, 8 Jan 2006 09:00:38 +0000 (01:00 -0800)]
[PATCH] slab: remove nested #ifdef CONFIG_NUMA

For some reason there is an #ifdef CONFIG_NUMA within another #ifdef
CONFIG_NUMA in the page allocator.  Remove the innermost #ifdef CONFIG_NUMA.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: fix code formatting
Pekka Enberg [Sun, 8 Jan 2006 09:00:37 +0000 (01:00 -0800)]
[PATCH] slab: fix code formatting

The slab allocator code is inconsistent in coding style and messy.  For this
patch, I ran Lindent for mm/slab.c and fixed up goofs by hand.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: extract slab order calculation to separate function
Pekka Enberg [Sun, 8 Jan 2006 09:00:36 +0000 (01:00 -0800)]
[PATCH] slab: extract slab order calculation to separate function

This patch moves the ugly loop that determines the 'optimal' size (page order)
of cache slabs from kmem_cache_create() to a separate function and cleans it
up a bit.

Thanks to Matthew Wilcox for the help with this patch.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: extract slabinfo header printing to separate function
Pekka Enberg [Sun, 8 Jan 2006 09:00:36 +0000 (01:00 -0800)]
[PATCH] slab: extract slabinfo header printing to separate function

This patch extracts slabinfo header printing to a separate function
print_slabinfo_header() to make s_start() more readable.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: remove unused align parameter from alloc_percpu
Pekka Enberg [Sun, 8 Jan 2006 09:00:33 +0000 (01:00 -0800)]
[PATCH] slab: remove unused align parameter from alloc_percpu

__alloc_percpu and alloc_percpu both take an 'align' argument which is
completely ignored.  snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use
it, but it will be ignored.  Therefore, remove the 'align' argument and fix up
the lone caller.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.
Olaf Hering [Sun, 8 Jan 2006 09:00:32 +0000 (01:00 -0800)]
[PATCH] Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.

Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.
Also remove unneeded declarations and add a public function.

drivers/base/memory.c:53: error: static declaration of 'register_memory_notifier' follows non-static declaration
include/linux/memory.h:85: error: previous declaration of 'register_memory_notifier' was here
drivers/base/memory.c:58: error: static declaration of 'unregister_memory_notifier' follows non-static declaration
include/linux/memory.h:86: error: previous declaration of 'unregister_memory_notifier' was here
drivers/base/memory.c:68: error: static declaration of 'register_memory' follows non-static declaration
include/linux/memory.h:73: error: previous declaration of 'register_memory' was here

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] ARM Netwinder watchdog wdt977 update
Woody Suwalski [Sun, 8 Jan 2006 09:00:31 +0000 (01:00 -0800)]
[PATCH] ARM Netwinder watchdog wdt977 update

Cleanup for the ARM-only watchdog driver wdt977.

This is probably the last update, since we want to merge with w83977f_wdt.
Jose Goncalves has ported this driver to i386, so probably we can iron out
configuration differences.

Signed-off-by: Woody Suwalski <woodys@xandros.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] small hp_sdc_rtc cleanup: use no_llseek
Marcelo Tosatti [Sun, 8 Jan 2006 09:00:29 +0000 (01:00 -0800)]
[PATCH] small hp_sdc_rtc cleanup: use no_llseek

Use no_llseek function.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: "Brian S. Julin" <bri@calyx.com>
Acked-by: Vojtech Pavlik <vojtech@suse.cz>
Cc: Dmitry Torokhov <dtor_core@ameritech.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] asm-generic/atomic.h needs types.h
Andrew Morton [Sun, 8 Jan 2006 09:00:29 +0000 (01:00 -0800)]
[PATCH] asm-generic/atomic.h needs types.h

For BITS_PER_LONG

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] revert "mm: page_state fixes"
Andrew Morton [Sun, 8 Jan 2006 09:00:28 +0000 (01:00 -0800)]
[PATCH] revert "mm: page_state fixes"

Hugh says:

page_alloc_cpu_notify() specifically contains code to

  /* Add dead cpu's page_states to our own. */

which handles this more efficiently.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[IPV6]: small cleanups
Adrian Bunk [Sat, 7 Jan 2006 21:24:25 +0000 (13:24 -0800)]
[IPV6]: small cleanups

This patch contains the following cleanups:
- addrconf.c: make addrconf_dad_stop() static
- inet6_connection_sock.c should #include <net/inet6_connection_sock.h>
  for getting the prototypes of its global functions

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4]: make ip_fragment() static
Adrian Bunk [Sat, 7 Jan 2006 21:23:39 +0000 (13:23 -0800)]
[IPV4]: make ip_fragment() static

Since there's no longer any external user of ip_fragment() we can make
it static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Add dummy nf_hook{_thresh}() when NETFILTER is disabled.
David S. Miller [Sat, 7 Jan 2006 20:50:27 +0000 (12:50 -0800)]
[NETFILTER]: Add dummy nf_hook{_thresh}() when NETFILTER is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: ip_conntrack_proto_sctp.c needs linux/interrupt.h
Joe Kappus [Sat, 7 Jan 2006 07:15:04 +0000 (23:15 -0800)]
[NETFILTER]: ip_conntrack_proto_sctp.c needs linux/interrupt.h

Signed-off-by: Joe Kappus <joecool1029@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AX25/MKISS]: unbalanced spinlock_bh in ax_encaps()
Francois Romieu [Sat, 7 Jan 2006 07:08:42 +0000 (23:08 -0800)]
[AX25/MKISS]: unbalanced spinlock_bh in ax_encaps()

The unlocking disappeared during commit
5793f4be23f0171b4999ca68a39a9157b44139f3.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Add ipt_policy/ip6t_policy matches
Patrick McHardy [Sat, 7 Jan 2006 07:06:48 +0000 (23:06 -0800)]
[NETFILTER]: Add ipt_policy/ip6t_policy matches

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Handle NAT in IPsec policy checks
Patrick McHardy [Sat, 7 Jan 2006 07:06:30 +0000 (23:06 -0800)]
[NETFILTER]: Handle NAT in IPsec policy checks

Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi
of the original packet from the conntrack information for IPsec policy
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Keep conntrack reference until IPsec policy checks are done
Patrick McHardy [Sat, 7 Jan 2006 07:06:10 +0000 (23:06 -0800)]
[NETFILTER]: Keep conntrack reference until IPsec policy checks are done

Keep the conntrack reference until policy checks have been performed for
IPsec NAT support. The reference needs to be dropped before a packet is
queued to avoid making the conntrack module impossible to unload.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Redo policy lookups after NAT when neccessary
Patrick McHardy [Sat, 7 Jan 2006 07:05:36 +0000 (23:05 -0800)]
[NETFILTER]: Redo policy lookups after NAT when neccessary

When NAT changes the key used for the xfrm lookup it needs to be done
again. If a new policy is returned in POST_ROUTING the packet needs
to be passed to xfrm4_output_one manually after all hooks were called
because POST_ROUTING is called with fixed okfn (ip_finish_output).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Use conntrack information to determine if packet was NATed
Patrick McHardy [Sat, 7 Jan 2006 07:05:17 +0000 (23:05 -0800)]
[NETFILTER]: Use conntrack information to determine if packet was NATed

Preparation for IPsec support for NAT:
Use conntrack information instead of saving and comparing the
addresses to determine if a packet was NATed and needs to be rerouted, to
make it easier to extend the key.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
Patrick McHardy [Sat, 7 Jan 2006 07:04:54 +0000 (23:04 -0800)]
[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder

ip_route_me_harder doesn't use the port numbers for the xfrm lookup and
uses ip_route_input for non-local addresses, which doesn't do an xfrm
lookup; ip6_route_me_harder doesn't do an xfrm lookup at all.

Use xfrm_decode_session and do the lookup manually; make sure both
only do the lookup if the packet hasn't already been transformed.

Making sure the lookup happens only once requires a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
therefore increased to 48 bytes. Apparently the IPv6 mobile extensions
need some more room anyway.
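A minimal sketch of why the size of skb->cb matters: each layer overlays
its own control block on the same fixed scratch buffer, so a compile-time
check can catch a block that outgrows it. The `demo_` structures, the
`DEMO_IP6CB` macro and the field names are hypothetical and only loosely
modelled on the IPv6 control block; they are not the kernel's layout.

```c
#include <stdint.h>

/* Hypothetical, simplified skb: only the control buffer matters here.
 * 48 bytes matches the new size; aligned so a struct can overlay it. */
struct demo_skb {
	_Alignas(8) char cb[48];
};

/* Hypothetical per-layer control block (illustrative fields only). */
struct demo_ip6cb {
	int      iif;
	uint16_t ra, hop, dst0, srcrt, dst1;
	uint16_t lastopt, nhoff;
	uint16_t flags;   /* e.g. a "lookup already done" bit */
};

/* Compile-time guarantee that the control block fits in skb->cb. */
_Static_assert(sizeof(struct demo_ip6cb) <= sizeof(((struct demo_skb *)0)->cb),
	       "control block does not fit in skb->cb");

/* Accessor in the style of IP6CB(skb). Like the kernel, this type-puns
 * through the char buffer and assumes -fno-strict-aliasing semantics. */
#define DEMO_IP6CB(skb) ((struct demo_ip6cb *)((skb)->cb))
```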

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4]: reset IPCB flags when necessary
Patrick McHardy [Sat, 7 Jan 2006 07:04:01 +0000 (23:04 -0800)]
[IPV4]: reset IPCB flags when necessary

Reset the IPSKB_XFRM_TUNNEL_SIZE flag in the ipip and ip_gre
hard_start_xmit functions before the packet reenters IP. This is necessary
so that the encapsulated packets are checked again for being oversized in
xfrm4_output.c. Reset all flags in sit when a packet changes its address
family.

Also remove some obsolete IPSKB flags.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV4/6]: Netfilter IPsec input hooks
Patrick McHardy [Sat, 7 Jan 2006 07:03:34 +0000 (23:03 -0800)]
[IPV4/6]: Netfilter IPsec input hooks

When the innermost transform uses transport mode, the decapsulated packet
is not visible to netfilter. Pass the packet through the PRE_ROUTING and
LOCAL_IN hooks again before handing it to upper layer protocols, to make
netfilter visibility symmetrical to the output path.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Move nextheader offset to the IP6CB
Patrick McHardy [Sat, 7 Jan 2006 07:02:34 +0000 (23:02 -0800)]
[IPV6]: Move nextheader offset to the IP6CB

Move the nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip headers that
have already been parsed. As a nice side effect, this gets rid of the
manual hopopts skipping in ip6_input_finish.
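The idea of remembering parsing progress in the control block can be
sketched with a toy header format. Everything here is an assumption for
illustration: the 2-byte header layout (next type, total length in bytes),
the `demo_cb` fields, and the `max` parameter, which stands in for a pass
that stops early and hands the packet elsewhere before it is re-injected.
Real IPv6 extension headers encode their length differently, and the
kernel records the offset of the nexthdr field rather than a separate
type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol numbers for the toy packet format. */
enum { HDR_HOPOPTS = 0, HDR_DSTOPTS = 60, HDR_UPPER = 6 };

/* Hypothetical control block: where parsing stopped last time. */
struct demo_cb {
	size_t  off;      /* offset of the next unparsed header */
	uint8_t nexthdr;  /* type of that header */
};

static int is_ext_header(uint8_t type)
{
	return type == HDR_HOPOPTS || type == HDR_DSTOPTS;
}

/* Toy extension header: byte 0 = next header type, byte 1 = total
 * length of this header in bytes (>= 2). Parses at most max headers,
 * resuming from cb. Returns the number parsed in this call, or -1 if
 * the packet is truncated or malformed. */
static int parse_ext_headers(const uint8_t *pkt, size_t len,
			     struct demo_cb *cb, int max)
{
	int parsed = 0;

	while (parsed < max && is_ext_header(cb->nexthdr)) {
		if (cb->off + 2 > len || pkt[cb->off + 1] < 2 ||
		    cb->off + pkt[cb->off + 1] > len)
			return -1;
		cb->nexthdr = pkt[cb->off];     /* type of the following header */
		cb->off += pkt[cb->off + 1];    /* skip this header */
		parsed++;
	}
	return parsed;
}
```

Calling it a second time continues from the saved offset instead of
re-parsing headers from the start of the packet.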

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[XFRM]: Netfilter IPsec output hooks
Patrick McHardy [Sat, 7 Jan 2006 07:01:48 +0000 (23:01 -0800)]
[XFRM]: Netfilter IPsec output hooks

Call netfilter hooks before IPsec transforms. Packets visit the
FORWARD/LOCAL_OUT and POST_ROUTING hooks before the first encapsulation,
and the LOCAL_OUT and POST_ROUTING hooks before each following tunnel
mode transform.

Patch from Herbert Xu <herbert@gondor.apana.org.au>:

Move the loop from dst_output into xfrm4_output/xfrm6_output, since they're
the only ones that need it. xfrm{4,6}_output_one() processes the first SA
and all subsequent transport mode SAs, and is called in a loop that calls
the netfilter hooks between each two calls.
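The loop structure described above can be modelled with counters in place
of real processing. This is a hypothetical sketch, not the kernel code:
`output`, `output_one` and `run_hooks` only mirror the roles this message
assigns to xfrm{4,6}_output, xfrm{4,6}_output_one and the netfilter hooks.

```c
#include <stddef.h>

/* Hypothetical SA modes for the illustration. */
enum sa_mode { SA_TRANSPORT, SA_TUNNEL };

/* Counters standing in for real work, so the call pattern is visible. */
static unsigned int hook_calls, output_one_calls;

/* Stand-in for running the netfilter hooks before a round. */
static void run_hooks(void)
{
	hook_calls++;
}

/* Apply the SA at index i plus every following transport mode SA, and
 * stop before the next tunnel mode SA. Returns the next index. */
static size_t output_one(const enum sa_mode *sa, size_t n, size_t i)
{
	output_one_calls++;
	i++;
	while (i < n && sa[i] == SA_TRANSPORT)
		i++;
	return i;
}

/* Call the hooks, then output_one(), until the bundle is applied. */
static void output(const enum sa_mode *sa, size_t n)
{
	size_t i = 0;

	while (i < n) {
		run_hooks();
		i = output_one(sa, n, i);
	}
}
```

For a bundle of transport, transport, tunnel, transport, tunnel, the hooks
run three times: before the first SA and before each of the two tunnel
mode SAs.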

In order to avoid the tail call issue, I've added the inline function
nf_hook, which is nf_hook_slow plus the empty list check.
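The inline fast-path pattern can be sketched as follows. The names
(`struct demo_hook`, `demo_hook_slow`, `demo_hook`, `accept_all`) and the
verdict convention are hypothetical; the real nf_hook and nf_hook_slow
take different arguments.

```c
#include <stddef.h>

/* Hypothetical verdicts and hook list entry. */
enum { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

struct demo_hook {
	int (*fn)(void *pkt);
	struct demo_hook *next;
};

static unsigned int slow_calls;

/* Slow path: walk the registered hooks; stop at the first drop. */
static int demo_hook_slow(struct demo_hook *list, void *pkt)
{
	slow_calls++;
	for (struct demo_hook *h = list; h; h = h->next)
		if (h->fn(pkt) == DEMO_DROP)
			return DEMO_DROP;
	return DEMO_ACCEPT;
}

/* Inline wrapper: with no hooks registered, skip the call entirely,
 * so the common case costs a single pointer test. */
static inline int demo_hook(struct demo_hook *list, void *pkt)
{
	if (list == NULL)
		return DEMO_ACCEPT;
	return demo_hook_slow(list, pkt);
}

/* Example hook that accepts everything. */
static int accept_all(void *pkt)
{
	(void)pkt;
	return DEMO_ACCEPT;
}
```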

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>