Julia Lawall [Fri, 28 Mar 2008 21:43:10 +0000 (14:43 -0700)]
fs/ocfs2/aops.c: test for IS_ERR rather than 0
The function ocfs2_start_trans always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 5 Mar 2008 08:11:46 +0000 (16:11 +0800)]
ocfs2: Add inode stealing for ocfs2_reserve_new_inode
Inode allocation is modified to look in other nodes allocators during
extreme out of space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 3 Mar 2008 09:12:30 +0000 (17:12 +0800)]
ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
In inode stealing, we no longer restrict the allocation to
happen in the local node. So it is neccessary for us to add
a new member in ocfs2_alloc_context to indicate which slot
we are using for allocation. We also modify the process of
local alloc so that this member can be used there also.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 3 Mar 2008 09:12:09 +0000 (17:12 +0800)]
ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
In some cases(Inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap since it may already be full. So add a new parameter
for this.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 30 Jan 2008 06:21:32 +0000 (14:21 +0800)]
ocfs2: Enable cross extent block merge.
In ocfs2_figure_merge_contig_type, we judge whether there exists
a cross extent block merge and enable it by setting CONTIG_LEFT
and CONTIG_RIGHT accordingly.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 30 Jan 2008 06:21:05 +0000 (14:21 +0800)]
ocfs2: Add support for cross extent block
In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT"
with the first extent record of the next extent block, we will merge it to
the next extent block and change all the related extent blocks accordingly.
In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge
it to the prevoius extent block and change all the related extent blocks
accordingly.
As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be more efficient and easier.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Wed, 30 Jan 2008 01:08:26 +0000 (17:08 -0800)]
ocfs2: Move /sys/o2cb to /sys/fs/o2cb
/sys/fs is where we really want file system specific sysfs objects.
Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.
Mark Fasheh [Tue, 29 Jan 2008 22:35:18 +0000 (14:35 -0800)]
sysfs: Allow removal of symlinks in the sysfs root
Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case
sysfs_root will be used as the parent directory. This allows us to tear down
top level symlinks created via sysfs_create_link(), which already has
similar handling of a NULL parent object.
Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Tao Ma [Wed, 5 Mar 2008 07:50:12 +0000 (15:50 +0800)]
ocfs2: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.
It disconnects on net timeout is ok, but it should attempt to
reconnect back. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
And if we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang in the process).
So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.
If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Fri, 14 Mar 2008 18:18:24 +0000 (11:18 -0700)]
ocfs2/dlm: Cleanup lockres print
A previous patch added KERN_NOTICE to printks printing the lockres that
cluttered the output. This patch removes the log level. For people concerned
with syslog clutter, please note we now use this facility to print lockres
only during an error.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:29 +0000 (15:16 -0700)]
ocfs2/dlm: Fix lockname in lockres print function
__dlm_print_one_lock_resource was printing lockname incorrectly.
Also, we now use printk directly instead of mlog as the latter prints
the line context which is not useful for this print.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:25 +0000 (15:16 -0700)]
ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
This patch moves some mle related definitions from dlmmaster.c
to dlmcommon.h. Future patches need these definitions to dump mle
debugging information.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.beckeroracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:21 +0000 (15:16 -0700)]
ocfs2/dlm: Link all lockres' to a tracking list
This patch links all the lockres' to a tracking list in dlm_ctxt.
We will use this in an upcoming patch that will walk the entire
list and to dump the lockres states to a debugfs file.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:20 +0000 (15:16 -0700)]
ocfs2/dlm: Create slabcaches for lock and lockres
This patch makes the o2dlm allocate memory for lockres, lockname and lock
structures from slabcaches rather than kmalloc. This allows us to not only
make these allocs more efficient but also allows us to track the memory being
consumed by these structures.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 5 Mar 2008 01:58:56 +0000 (17:58 -0800)]
ocfs2: Allow selection of cluster plug-ins.
ocfs2 now supports plug-ins for the classic O2CB stack as well as
userspace cluster stacks in conjunction with fs/dlm. This allows zero,
one, or both of the plug-ins to be selected in Kconfig. For local mounts
(non-clustered), neither plug-in is needed. Both plugins can be loaded
at one time, the runtime will select the one needed for the cluster
systme in use.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 5 Mar 2008 00:09:39 +0000 (16:09 -0800)]
ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
The masklog code is in the o2cb stack, but ocfs2_lockid.h now needs to
be included by the user stack. The BUG() in ocfs2_lock_type_string()
does not need masklog support, so change it to a regular BUG_ON().
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Feb 2008 23:39:44 +0000 (15:39 -0800)]
ocfs2: Add the 'set version' message to the ocfs2_control device.
The "SETV" message sets the filesystem locking protocol version as
negotiated by the client. The client negotiates based on the maximum
version advertised in /sys/fs/ocfs2/max_locking_protocol.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Feb 2008 22:44:34 +0000 (14:44 -0800)]
ocfs2: Add the local node id to the handshake.
This is the second part of the ocfs2_control handshake. After
negotiating the ocfs2_control protocol, the daemon tells the filesystem
what the local node id is via the SETN message.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Tue, 19 Feb 2008 03:40:12 +0000 (19:40 -0800)]
ocfs2: Start the ocfs2_control handshake.
When a control daemon opens the ocfs2_control device, it must perform a
handshake to tell the filesystem it is something capable of monitoring
cluster status. Only after the handshake is complete will the filesystem
allow mounts.
This is the first part of the handshake. The daemon reads all supported
ocfs2_control protocols, then writes in the protocol it will use.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 23:17:30 +0000 (15:17 -0800)]
ocfs2: Add the 'cluster_stack' sysfs file.
Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
the classic stack. Thus, old tools that do not know how to modify this
file will work just fine. The stack cannot be modified if there is a
live filesystem.
ocfs2_cluster_connect() now takes the expected cluster stack as an
argument. This way, the filesystem and the stack glue ensure they are
speaking to the same backend.
If the stack is 'o2cb', the o2cb stack plugin is used. For any other
value, the fsdlm stack plugin is selected.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 23:08:23 +0000 (15:08 -0800)]
ocfs2: Add the USERSPACE_STACK incompat bit.
The filesystem gains the USERSPACE_STACK incomat bit and the
s_cluster_info field on the superblock. When a userspace stack is in
use, the name of the stack is stored on-disk for mount-time
verification.
The "cluster_stack" option is added to mount(2) processing. The mount
process needs to pass the matching stack name. If the passed name and
the on-disk name do not match, the mount is failed.
When using the classic o2cb stack, the incompat bit is *not* set and no
mount option is used other than the usual heartbeat=local. Thus, the
filesystem is compatible with older tools.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 07:56:17 +0000 (23:56 -0800)]
ocfs2: Create stack glue sysfs files.
Introduce a set of sysfs files that describe the current stack glue
state. The files live under /sys/fs/ocfs2. The locking_protocol file
displays the version of ocfs2's locking code. The
loaded_cluster_plugins file displays all of the currently loaded stack
plugins. When filesystems are mounted, the active_cluster_plugin file
will display the plugin in use.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 23:03:57 +0000 (15:03 -0800)]
ocfs2: Break out stackglue into modules.
We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c. This becomes the
ocfs2_stack_o2cb.ko module.
The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module. This module now provides an interface to
register drivers. The ocfs2_stack_o2cb driver registers itself. As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().
ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Joel Becker [Fri, 1 Feb 2008 23:02:36 +0000 (15:02 -0800)]
ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.
Define the ocfs2_stack_operations structure. Build o2cb_stack_ops from
all of the o2cb-specific stack functions. Change the generic stack glue
functions to call the stack_ops instead of the o2cb functions directly.
The o2cb functions are moved to stack_o2cb.c. The headers are cleaned up
to where only needed headers are included.
In this code, stackglue.c and stack_o2cb.c refer to some shared
extern variables. When they become modules, that will change.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 22:51:03 +0000 (14:51 -0800)]
ocfs2: Split o2cb code from generic stack functions.
Split off the o2cb-specific funtionality from the generic stack glue
calls. This is a precurser to wrapping the o2cb functionality in an
operations vector.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 30 Jan 2008 00:59:55 +0000 (16:59 -0800)]
ocfs2: Abstract out a debugging function for underlying dlms.
dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's
create a generic ocfs2_dlm_dump_lksb() function. This allows underlying
DLMs to print whatever they want about their lock.
We then move the o2dlm dump into stackglue.c where it belongs.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 22:45:08 +0000 (14:45 -0800)]
ocfs2: Remove CANCELGRANT from the view of dlmglue.
o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling). This is called CANCELGRANT after the
status code sent to the callback. fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.
Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it. ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.
Making these changes opens up a locking race. dlmglue uses the the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time. But dlmglue must unlock the lockres before calling into the
dlm. In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock. The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.
Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op. There's nothing to cancel. With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel. Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed. The downconvert thread can then do its
downconvert.
Without CANCELGRANT, there is nothing to clean up the cancellation
state. The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us. With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way. dlmglue is deadlocked.
The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
can check whether the lock is cancelable. If not, it just loops around
to try again. Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Wed, 30 Jan 2008 00:59:56 +0000 (16:59 -0800)]
ocfs2: Fill node number during cluster stack init
It doesn't make sense to query for a node number before connecting to the
cluster stack. This should be safe to do because node_num is only just
printed,
and we're actually only moving the setting of node num a small amount
further in the mount process.
[ Disconnect when node query fails -- Joel ]
Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 30 Jan 2008 00:59:56 +0000 (16:59 -0800)]
ocfs2: Move o2hb functionality into the stack glue.
The last bit of classic stack used directly in ocfs2 code is o2hb.
Specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.
We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl. Other stacks will just leave hangup() empty.
The check for heartbeat is moved into ocfs2_cluster_connect(). It will
be matched by a similar check for other stacks.
With this change, only stackglue.c includes cluster/ headers.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 30 Jan 2008 23:38:24 +0000 (15:38 -0800)]
ocfs2: Abstract out node number queries.
ocfs2 asks the cluster stack for the local node's node number for two
reasons; to fill the slot map and to print it. While the slot map isn't
necessary for userspace cluster stacks, the printing is very nice for
debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
this value. It is anticipated that the slot map will not be used under a
userspace cluster stack, so validity checks of the node num only need to
exist in the slot map code. Otherwise, it just gets used and printed as an
opaque value.
[ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
truly opaque. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 22:39:35 +0000 (14:39 -0800)]
ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.
This step introduces a cluster stack agnostic API for initializing and
exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack. It is all handled in stackglue.c.
heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.
The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
o2dlm initialization in one block. Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain. Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call. I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect(). The filesystem has already set
itself to ignore the callback.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 20:15:37 +0000 (12:15 -0800)]
ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.
Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
This is the first step towards elminiating dlm_status in
fs/ocfs2/dlmglue.c. The change also passes -errno values to
->unlock_ast().
[ Fix a return code in dlmglue.c and change the error translation table into
an array of ints. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 30 Jan 2008 01:37:32 +0000 (17:37 -0800)]
ocfs2: Separate out dlm lock functions.
This is the first in a series of patches to isolate ocfs2 from the
underlying cluster stack. Here we wrap the dlm locking functions with
ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
callbacks, we can eliminate the callbacks from the filesystem visible
functions.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 20:06:54 +0000 (12:06 -0800)]
ocfs2: New slot map format
The old slot map had a few limitations:
- It was limited to one block, so the maximum slot count was 255.
- Each slot was signed 16bits, limiting node numbers to INT16_MAX.
- An empty slot was marked by the magic 0xFFFF (-1).
The new slot map format provides 32bit node numbers (UINT32_MAX), a
separate space to mark a slot in use, and extra room to grow. The slot
map is now bounded by i_size, not a block.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 20:04:48 +0000 (12:04 -0800)]
ocfs2: De-magic the in-memory slot map.
The in-memory slot map uses the same magic as the on-disk one. There is
a special value to mark a slot as invalid. It relies on the size of
certain types and so on.
Write a new in-memory map that keeps validity as a separate field. Outside
of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to.
It also is no longer tied to the type size.
This also means that only the I/O functions refer to 16bit quantities.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 20:03:57 +0000 (12:03 -0800)]
ocfs2: Change the recovery map to an array of node numbers.
The old recovery map was a bitmap of node numbers. This was sufficient
for the maximum node number of 254. Going forward, we want node numbers
to be UINT32. Thus, we need a new recovery map.
Note that we can't keep track of slots here. We must write down the
node number to recovery *before* we get the locks needed to convert a
node number into a slot number.
The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.
Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init(). This actually cleans up
ocfs2_initialize_super() a little as well. Following on, recovery cleaup
becomes part of ocfs2_recovery_exit().
A number of node map functions are rendered obsolete and are removed.
Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event. This is a cleanup from Mark.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 20:01:05 +0000 (12:01 -0800)]
ocfs2: Make ocfs2_slot_info private.
Just use osb_lock around the ocfs2_slot_info data. This allows us to
take the ocfs2_slot_info structure private in slot_info.c. All access
is now via accessors.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Fri, 1 Feb 2008 19:59:09 +0000 (11:59 -0800)]
ocfs2: Move slot map access into slot_map.c
journal.c and dlmglue.c would refresh the slot map by hand. Instead, have
the update and clear functions do the work inside slot_map.c. The eventual
result is to make ocfs2_slot_info defined privately in slot_map.c
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
it821x: do not describe noraid parameter with its value
Pb1200/DBAu1200: fix bad IDE resource size
Au1200: IDE driver build fix
Au1200: kill IDE driver function prototypes
avr32 mustn't select HAVE_IDE
Sergei Shtylyov [Wed, 16 Apr 2008 23:14:33 +0000 (01:14 +0200)]
Pb1200/DBAu1200: fix bad IDE resource size
The header files for the Pb1200/DBAu1200 boards have wrong definition for the
IDE interface's decoded range length -- it should be 512 bytes according to
what the IDE driver does. In addition, the IDE platform device claims 1 byte
too many for its memory resource -- fix the platform code and the IDE driver
in accordance.
Sergei Shtylyov [Wed, 16 Apr 2008 23:14:33 +0000 (01:14 +0200)]
Au1200: IDE driver build fix
The driver fails to compile with CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA enabled:
drivers/ide/mips/au1xxx-ide.c: In function `auide_build_dmatable':
drivers/ide/mips/au1xxx-ide.c:256: error: implicit declaration of function
`sg_virt'
drivers/ide/mips/au1xxx-ide.c:275: error: implicit declaration of function
`sg_next'
drivers/ide/mips/au1xxx-ide.c:275: warning: assignment makes pointer from
integer without a cast
Fix this by including <linux/scatterlist.h>. While at it, remove the #include's
without which the driver happily builds.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Wed, 16 Apr 2008 23:14:33 +0000 (01:14 +0200)]
Au1200: kill IDE driver function prototypes
Fix these warnings emitted when compiling drivers/ide/mips/au1xxx-ide.c:
include/asm/mach-au1x00/au1xxx_ide.h:137: warning: 'auide_tune_drive' declared
`static' but never defined
include/asm/mach-au1x00/au1xxx_ide.h:138: warning: 'auide_tune_chipset' declared
`static' but never defined
by wiping out the whole "function prototyping" section from the header file
<asm-mips/mach-au1x00/au1xxx_ide.h> as it mostly declared functions that are
already dead in the IDE driver; move the only useful prototype into the driver.
Adrian Bunk [Wed, 16 Apr 2008 23:14:32 +0000 (01:14 +0200)]
avr32 mustn't select HAVE_IDE
There's a libata based PATA driver for avr32, but no support for
drivers/ide/ on avr32.
This patch fixes the following compile error:
<-- snip -->
...
CC [M] drivers/ide/ide-cd.o
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/ide/ide-cd.c:37:
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/ide.h:209:21: error: asm/ide.h: No such file or directory
make[3]: *** [drivers/ide/ide-cd.o] Error 1
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: remove broken usb-serial num_endpoints check
USB: option: Add new vendor ID and device ID for AMOI HSDPA modem
USB: support more Huawei data card product IDs
USB: option.c: add more device IDs
USB: Obscure Maxon BP3-USB Device Support 16d8:6280 for option driver
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Add return value indication to tcp_prune_ofo_queue().
PS3: gelic: fix the oops on the broken IE returned from the hypervisor
b43legacy: fix DMA mapping leakage
mac80211: remove message on receiving unexpected unencrypted frames
Update rt2x00 MAINTAINERS entry
Add rfkill to MAINTAINERS file
rfkill: Fix device type check when toggling states
b43legacy: Fix usage of struct device used for DMAing
ssb: Fix usage of struct device used for DMAing
MAINTAINERS: move to generic repository for iwlwifi
b43legacy: fix initvals loading on bcm4303
rtl8187: Add missing priv->vif assignments
netconsole: only set CON_PRINTBUFFER if the user specifies a netconsole
[CAN]: Update documentation of struct sockaddr_can
MAINTAINERS: isdn4linux@listserv.isdn4linux.de is subscribers-only
[TCP]: Fix never pruned tcp out-of-order queue.
[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop
The num_interrupt_in, num_bulk_in, and other checks in the usb-serial
code are just wrong, there are too many different devices out there with
different numbers of endpoints. We need to just be sticking with the
device ids instead of trying to catch this kind of thing. It broke too
many different devices.
This fixes a large number of usb-serial devices to get them working
properly again.
- declare the unusal device for Huawei data card devices in
unusual_devs.h
- disable the product ID matching for Huawei data card devices in
usb_match_device function of driver.c
- declare the product IDs in option.c.
James Cameron [Wed, 9 Apr 2008 08:59:13 +0000 (18:59 +1000)]
USB: Obscure Maxon BP3-USB Device Support 16d8:6280 for option driver
The modem was detected, the ttyUSB{0,1,2} appeared, a call could be
made, and the expected data rate was achieved. Tested for an hour or
two, total of 100Mb. I shall do more testing.
Signed-off-by: James Cameron <quozl@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Krzysztof Helt [Tue, 15 Apr 2008 21:34:47 +0000 (14:34 -0700)]
acpi thermal trip points increased to 12
The THERMAL_MAX_TRIPS value is set to 10. It is too few for the Compaq AP550
machine which has 12 trip points.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 15 Apr 2008 21:34:46 +0000 (14:34 -0700)]
spi: spi_s3c24xx must initialize num_chipselect
The SPI core now expects num_chipselect to be set correctly as due to added
checks on the chip being selected before an transfer is allowed. This patch
adds a num_cs field to the platform data which needs to be set correctly
before adding the SPI platform device.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 15 Apr 2008 21:34:44 +0000 (14:34 -0700)]
spi: spi_s3c24xx driver must init completion
The s3c24xx_spi_txrx() function should initialise the completion each time
before using it, otherwise we end up with the possibility of returning success
before the interrupt handler has processed all the data.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 15 Apr 2008 21:34:43 +0000 (14:34 -0700)]
vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
filesystems are calling this function while holding xattr_sem so possible
recursion into the fs violates locking ordering of xattr_sem and transaction
start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems
can specify desired gfp mask and use GFP_NOFS from all of them.
Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Dave Jones <davej@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation: correct overcommit caveat in hugetlbpage.txt
As shown by Gurudas Pai recently, we can put hugepages into the surplus
state (by echo 0 > /proc/sys/vm/nr_hugepages), even when
/proc/sys/vm/nr_overcommit_hugepages is 0. This is actually correct, to
allow the original goal (shrink the static pool to 0) to succeed (we are
converting hugepages to surplus because they are in use). However, the
documentation does not accurately reflect this case. Update it.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
add "Isolate" migratetype name to /proc/pagetypeinfo
In a5d76b54a3f3a40385d7f76069a2feac9f1bad63 (memory unplug: page isolation by
KAMEZAWA Hiroyuki), "isolate" migratetype added. but unfortunately, it
doesn't treat /proc/pagetypeinfo display logic.
this patch add "Isolate" to pagetype name field.
/proc/pagetype
before:
------------------------------------------------------------------------------------------------------------------------
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 2 2 2 1 2 2 1 1 0 0
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Movable 2 3 3 1 3 3 2 0 0 0 0
Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA, type <NULL> 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 1 9 7 4 1 1 1 1 0 0 0
Node 0, zone Normal, type Reclaimable 5 2 0 0 1 1 0 0 0 1 0
Node 0, zone Normal, type Movable 0 1 1 0 0 0 1 0 0 1 60
Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone Normal, type <NULL> 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone HighMem, type Unmovable 0 0 1 1 1 0 1 1 2 2 0
Node 0, zone HighMem, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone HighMem, type Movable 236 62 6 2 2 1 1 0 1 1 16
Node 0, zone HighMem, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone HighMem, type <NULL> 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Reclaimable Movable Reserve <NULL>
Node 0, zone DMA 1 0 2 1 0
Node 0, zone Normal 10 40 169 1 0
Node 0, zone HighMem 2 0 283 1 0
after:
------------------------------------------------------------------------------------------------------------------------
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 2 2 2 1 2 2 1 1 0 0
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Movable 2 3 3 1 3 3 2 0 0 0 0
Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 0 2 1 1 0 1 0 0 0 0 0
Node 0, zone Normal, type Reclaimable 1 1 1 1 1 0 1 1 1 0 0
Node 0, zone Normal, type Movable 0 1 1 1 0 1 0 1 0 0 196
Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone HighMem, type Unmovable 0 1 0 0 0 1 1 1 2 2 0
Node 0, zone HighMem, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone HighMem, type Movable 1 0 1 1 0 0 0 0 1 0 200
Node 0, zone HighMem, type Reserve 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone HighMem, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
Node 0, zone DMA 1 0 2 1 0
Node 0, zone Normal 8 4 207 1 0
Node 0, zone HighMem 2 0 283 1 0
Dmitri Vorobiev [Tue, 15 Apr 2008 21:34:40 +0000 (14:34 -0700)]
Fix typos in Documentation/filesystems/seq_file.txt
A couple of typos crept into the newly added document about the seq_file
interface. This patch corrects those typos and simultaneously deletes
unnecessary trailing spaces.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Tue, 15 Apr 2008 21:34:38 +0000 (14:34 -0700)]
uml: compile error fix
This patch fixes this error:
In file included from /home/wangcong/projects/linux-2.6/arch/um/kernel/smp.c:9:
include2/asm/tlb.h: In function `tlb_remove_page':
include2/asm/tlb.h:101: error: implicit declaration of function `page_cache_release'
And since including <linux/pagemap.h> in <linux/swap.h> will break sparc,
we add this #include in uml's own header.
Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Tue, 15 Apr 2008 21:34:37 +0000 (14:34 -0700)]
memcg: fix oops in oom handling
When I used a test program to fork mass processes and immediately move them to
a cgroup where the memory limit is low enough to trigger oom kill, I got oops:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000808
IP: [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
PGD 4c95f067 PUD 4406c067 PMD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in:
Kay Sievers [Tue, 15 Apr 2008 21:34:35 +0000 (14:34 -0700)]
serial: fix platform driver hotplug/coldplug
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform modalias is
prefixed with "platform:". Add MODULE_ALIAS() to the hotpluggable serial
platform drivers, to re-enable auto loading.
NOTE that Kconfig for some of these drivers doesn't allow modular builds, and
thus doesn't match the driver source's unload support. Presumably their
unload code is buggy and/or weakly tested...
[dbrownell@users.sourceforge.net: more drivers, registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kay Sievers [Tue, 15 Apr 2008 21:34:34 +0000 (14:34 -0700)]
pcmcia: fix platform driver hotplug/coldplug
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform modalias is
prefixed with "platform:". Add MODULE_ALIAS() to the hotpluggable PCMCIA
platform drivers, to re-enable auto loading.
[dbrownell@users.sourceforge.net: registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kay Sievers [Tue, 15 Apr 2008 21:34:33 +0000 (14:34 -0700)]
misc: fix platform driver hotplug/coldplug
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform modalias is
prefixed with "platform:". Add MODULE_ALIAS() to the hotpluggable 'misc'
platform drivers, to re-enable auto loading.
[dbrownell@users.sourceforge.net: bugfix, registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kay Sievers [Tue, 15 Apr 2008 21:34:30 +0000 (14:34 -0700)]
leds: fix platform driver hotplug/coldplug
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform
modalias is prefixed with "platform:". Add MODULE_ALIAS() to the
hotpluggable platform LED drivers, to re-enable auto loading.
[dbrownell@users.sourceforge.net: more drivers, registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rtc: fix the error in the function of cmos_set_alarm
There is a bug in the function of cmos_set_alarm. RTC alarm time for October
can't be set correctly.
For October: 0x0A will be written into the RTC region (MONTH_ALARM) in current
kernel. But in fact 0x10 should be written. Wildcards are also not handled
correctly.
Kay Sievers [Tue, 15 Apr 2008 21:34:28 +0000 (14:34 -0700)]
mmc: fix platform driver hotplug/coldplug
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform modalias is
prefixed with "platform:". Add MODULE_ALIAS() to the hotpluggable MMC host
platform drivers, to re-enable auto loading.
Also, add missing owner declarations in driver init.
[dbrownell@users.sourceforge.net: registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix memory corruption and crash on 32-bit x86 systems.
If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of
RAM, then we call memory_present() with a start/end that goes outside
the scope of MAX_PHYSMEM_BITS.
That causes this loop to happily walk over the limit of the sparse
memory section map:
for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
unsigned long section = pfn_to_section_nr(pfn);
struct mem_section *ms;
ms = __nr_to_section(section);
if (!ms->section_mem_map)
ms->section_mem_map = sparse_encode_early_nid(nid) |
SECTION_MARKED_PRESENT;
'ms' will be out of bounds and we'll corrupt a small amount of memory by
encoding the node ID and writing SECTION_MARKED_PRESENT (==0x1) over it.
The corruption might happen when encoding a non-zero node ID, or due to
the SECTION_MARKED_PRESENT which is 0x1:
mmzone.h:#define SECTION_MARKED_PRESENT (1UL<<0)
The fix is to sanity check anything the architecture passes to
sparsemem.
This bug seems to be rather old (as old as sparsemem support itself),
but the exact incarnation depended on random details like configs, which
made this bug more prominent in v2.6.25-to-be.
An additional enhancement might be to print a warning about ignored or
trimmed memory ranges.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Yinghai Lu <Yinghai.Lu@sun.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The signal trampolines were accidently flushing the kernel I$ instead of
the users. Fix that up, and also add a missing user D$ flush while
we're at it.
Fix locking bug in "acquire_console_semaphore_for_printk()"
When I cleaned up printk() and split up the printk locking logic in
commit 266c2e0abeca649fa6667a1a427ad1da507c6375 ("Make printk() console
semaphore accesses sensible") I had incorrectly moved the call to
have_callable_console() outside of the console semaphore.
That was buggy. The console semaphore protects the console_drivers list
that is used by have_callable_console().
Thanks go to Bongani Hlope who saw this as a hang on shutdown and reboot
and bisected the bug to the right commit, and tested this patch. See
PS3: gelic: fix the oops on the broken IE returned from the hypervisor
This fixes the bug that the driver would try to over-scan the memory
if the sum of the length field of every IEs does not match the length
returned from the hypervisor.
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes a DMA mapping leakage in the case where we reject a DMA buffer
because of its address.
The patch by Michael Buesch has been ported to b43legacy.
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it> Cc: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Sun, 13 Apr 2008 08:12:47 +0000 (10:12 +0200)]
mac80211: remove message on receiving unexpected unencrypted frames
Some people are getting this message a lot, and we have traced it to
broken access points that much too often send completely empty frames
(all bytes zeroed, which they shouldn't do at all.)
Since we cannot do anything about such frames in any case except the
special case where we're debugging an AP, just remove the message.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>