Herbert Xu [Tue, 22 Apr 2008 07:46:42 +0000 (00:46 -0700)]
[IPSEC]: Fix catch-22 with algorithm IDs above 31
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.
The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.
After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.
The following patch does exactly that.
This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Asselstine [Mon, 21 Apr 2008 21:44:16 +0000 (14:44 -0700)]
hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c
These cli()/sti() calls are made in start_timer() and are therefor
redundant since the register_lock is now used to protect register
io from within scc_isr() and write_scc() (where all calls to
start_timer() originate).
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP]: Convert do_gettimeofday() to getnstimeofday().
What do_gettimeofday() does is to call getnstimeofday() and
to convert the result from timespec{} to timeval{}.
We do not always need timeval{} and we can convert timespec{}
when we really need (to print).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 21 Apr 2008 21:23:03 +0000 (14:23 -0700)]
[NETNS]: The ip6_fib_timer can work with garbage on net namespace stop.
The del_timer() function doesn't guarantee, that the timer callback
is not active by the time it exits.
Thus, the fib6_net_exit() may kfree() all the data, that is required
by the fib6_run_gc(). The race window is tiny, but slab poisoning can
trigger this bug.
Using del_timer_sync() will cure this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[IPV4]: Convert do_gettimeofday() to getnstimeofday().
What do_gettimeofday() does is to call getnstimeofday() and
to convert the result from timespec{} to timeval{}.
After that, these callers convert the result again to msec.
Use getnstimeofday() and convert the units at once.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse [Sun, 20 Apr 2008 23:07:43 +0000 (16:07 -0700)]
[NET]: Expose netdevice dev_id through sysfs
Expose dev_id to userspace, because it helps to disambiguate between
interfaces where the MAC address is unique.
This should allow us to simplify the handling of persistent naming for
S390 network devices in udev -- because it can depend on a simple
attribute of the device like the other match criteria, rather than
having a special case for SUBSYSTEMS=="ccwgroup".
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Bernard Pidoux [Sun, 20 Apr 2008 01:41:51 +0000 (18:41 -0700)]
rose: Socket lock was not released before returning to user space
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xfbbd/3683 is leaving the kernel with locks still held!
1 lock held by xfbbd/3683:
#0: (sk_lock-AF_ROSE){--..}, at: [<c8cd1eb3>] rose_connect+0x73/0x420 [rose]
Pavel Machek [Sun, 20 Apr 2008 01:17:26 +0000 (18:17 -0700)]
hci_usb: remove code obfuscation
_urb_free is an alias for kfree... making code longer & harder to
read. Remove it.
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Sun, 20 Apr 2008 01:13:40 +0000 (18:13 -0700)]
hci_usb: do not initialize static variables to 0
hci_usb: do not initialize static variables to 0.
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Sun, 20 Apr 2008 01:12:33 +0000 (18:12 -0700)]
tg3: 5701 DMA corruption fix
Herbert Xu's commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9, entitled
"[TCP]: Fix size calculation in sk_stream_alloc_pskb", has triggered a
bug in the 5701 where the 5701 DMA engine will corrupt outgoing
packets. This problem only happens when the starting address of the
packet matches a certain range of offsets and only when the 5701 is
placed downstream of a particular Intel bridge.
This patch detects the problematic bridge and if present, readjusts the
starting address of the packet data to a dword aligned boundary.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Asselstine [Sun, 20 Apr 2008 01:10:46 +0000 (18:10 -0700)]
atm nicstar: Removal of debug code containing deprecated calls to cli()/sti()
Code within NS_DEBUG_SPINLOCKS contained deprecated cli()/sti()
function calls. NS_DEBUG_SPINLOCKS and the associated code seems to
be of little use these days so the strategy of removing this code
rather then updating it to use spinlocks has been taken.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 20 Apr 2008 01:09:39 +0000 (18:09 -0700)]
iwlwifi: Fix unconditional access to station->tidp[].agg.
Reportred by Ingo Molnar:
drivers/net/wireless/iwlwifi/iwl-debugfs.c: In function 'iwl_dbgfs_stations_read':
drivers/net/wireless/iwlwifi/iwl-debugfs.c:256: error: 'struct iwl4965_tid_data' has no member named 'agg'
Needs CONFIG_IWL4965_HT protection.
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: fix up documentation for security_module_enable
Security: Introduce security= boot parameter
Audit: Final renamings and cleanup
SELinux: use new audit hooks, remove redundant exports
Audit: internally use the new LSM audit hooks
LSM/Audit: Introduce generic Audit LSM hooks
SELinux: remove redundant exports
Netlink: Use generic LSM hook
Audit: use new LSM hooks instead of SELinux exports
SELinux: setup new inode/ipc getsecid hooks
LSM: Introduce inode_getsecid and ipc_getsecid hooks
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
Add the security= boot parameter. This is done to avoid LSM
registration clashes in case of more than one bult-in module.
User can choose a security module to enable at boot. If no
security= boot parameter is specified, only the first LSM
asking for registration will be loaded. An invalid security
module name will be treated as if no module has been chosen.
LSM modules must check now if they are allowed to register
by calling security_module_enable(ops) first. Modify SELinux
and SMACK to do so.
Do not let SMACK register smackfs if it was not chosen on
boot. Smackfs assumes that smack hooks are registered and
the initial task security setup (swapper->security) is done.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org>
Ahmed S. Darwish [Fri, 18 Apr 2008 23:59:43 +0000 (09:59 +1000)]
Audit: Final renamings and cleanup
Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org>
SELinux: use new audit hooks, remove redundant exports
Setup the new Audit LSM hooks for SELinux.
Remove the now redundant exported SELinux Audit interface.
Audit: Export 'audit_krule' and 'audit_field' to the public
since their internals are needed by the implementation of the
new LSM hook 'audit_rule_known'.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org>
instad of (respectively) :
selinux_audit_rule_init
selinux_audit_rule_free
audit_rule_has_selinux
selinux_audit_rule_match
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org>
Those hooks are only available if CONFIG_AUDIT is enabled.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Paul Moore <paul.moore@hp.com>
Remove the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
They can be substitued with the following generic equivalents
respectively:
new LSM hook, inode_getsecid(inode, secid)
new LSM hook, ipc_getsecid*(ipcp, secid)
LSM hook, task_getsecid(tsk, secid)
LSM hook, sid_to_secctx(sid, ctx, len)
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Paul Moore <paul.moore@hp.com>
Don't use SELinux exported selinux_get_task_sid symbol.
Use the generic LSM equivalent instead.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Paul Moore <paul.moore@hp.com>
Audit: use new LSM hooks instead of SELinux exports
Stop using the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
kfree(ctx)
and use following generic LSM equivalents respectively:
security_inode_getsecid(inode, secid)
security_ipc_getsecid*(ipcp, secid)
security_task_getsecid(tsk, secid)
security_sid_to_secctx(sid, ctx, len)
security_release_secctx(ctx, len)
Call security_release_secctx only if security_secid_to_secctx
succeeded.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Paul Moore <paul.moore@hp.com>
Setup the new inode_getsecid and ipc_getsecid() LSM hooks
for SELinux.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Paul Moore <paul.moore@hp.com>
LSM: Introduce inode_getsecid and ipc_getsecid hooks
Introduce inode_getsecid(inode, secid) and ipc_getsecid(ipcp, secid)
LSM hooks. These hooks will be used instead of similar exported
SELinux interfaces.
Let {inode,ipc,task}_getsecid hooks set the secid to 0 by default
if CONFIG_SECURITY is not defined or if the hook is set to
NULL (dummy). This is done to notify the caller that no valid
secid exists.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Paul Moore <paul.moore@hp.com>
[NET]: Fix and allocate less memory for ->priv'less netdevices
This patch effectively reverts commit d0498d9ae1a5cebac363e38907266d5cd2eedf89
aka "[NET]: Do not allocate unneeded memory for dev->priv alignment."
It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST
removal.
For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size"
was also 2048 bytes, but allocator with debugging options turned on started
giving out !32-byte aligned memory resulting in redzones overwrites.
Patch does small optimization in ->priv'less case: bumping size to next
32-byte boundary was always done to ensure ->priv will also be aligned.
But, no ->priv, no need to do that.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
do not return a -EINVAL when mmap()-ing PCI holes.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
[SCSI] iscsi: bidi support for iscsi_tcp
[SCSI] iscsi: bidi support at the generic libiscsi level
[SCSI] iscsi: extended cdb support
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
[SCSI] zfcp: fix 31 bit compile warnings
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
[SCSI] bsg: remove minor in struct bsg_device
[SCSI] bsg: use better helper list functions
[SCSI] bsg: replace kobject_get with blk_get_queue
[SCSI] bsg: takes a ref to struct device in fops->open
[SCSI] qla1280: remove version check
[SCSI] libsas: fix endianness bug in sas_ata
[SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
[SCSI] aacraid: Do not describe check_reset parameter with its value
[SCSI] aacraid: Fix down_interruptible() to check the return value
[SCSI] sun3_scsi_vme: add MODULE_LICENSE
[SCSI] st: rename flush_write_buffer()
[SCSI] tgt: use KMEM_CACHE macro
[SCSI] initio: fix big endian problems for auto request sense
...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (43 commits)
firewire: cleanups
firewire: fix synchronization of gap counts
firewire: wait until PHY configuration packet was transmitted (fix bus reset loop)
firewire: remove unused struct member
firewire: use bitwise and to get reg in handle_registers
firewire: replace more hex values with defined csr constants
firewire: reread config ROM when device reset the bus
firewire: replace static ROM cache by allocated cache
firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more)
firewire: fw-ohci: extend logging of bus generations and node ID
firewire: fw-ohci: conditionally log busReset interrupts
firewire: fw-ohci: don't append to AT context when it's not active
firewire: fw-ohci: log regAccessFail events
firewire: fw-ohci: make sure HCControl register LPS bit is set
firewire: fw-ohci: missing PPC PMac feature calls in failure path
firewire: fw-ohci: untangle a mixed unsigned/signed expression
firewire: debug interrupt events
firewire: fw-ohci: catch self_id_count == 0
firewire: fw-ohci: add self ID error check
firewire: fw-ohci: refactor probe, remove, suspend, resume
...
James Bottomley [Fri, 18 Apr 2008 18:18:48 +0000 (13:18 -0500)]
libata: fix boot panic with SATAPI devices on non-SFF HBAs
The kernel now panics reliably on boot if you have a SATAPI device
connected.
The problem was introduced by the libata merge trying to pull out all
the SFF code into a separate module. Unfortunately, if you're a satapi
device you usually need to call atapi_request_sense, which has a bare
invocation of a SFF callback which is NULL on non-SFF HBAs. Fix this by
making the call conditional.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (64 commits)
ocfs2/net: Add debug interface to o2net
ocfs2: Only build ocfs2/dlm with the o2cb stack module
ocfs2/cluster: Get rid of arguments to the timeout routines
ocfs2: Put tree in MAINTAINERS
ocfs2: Use BUG_ON
ocfs2: Convert ocfs2 over to unlocked_ioctl
ocfs2: Improve rename locking
fs/ocfs2/aops.c: test for IS_ERR rather than 0
ocfs2: Add inode stealing for ocfs2_reserve_new_inode
ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
ocfs2: Enable cross extent block merge.
ocfs2: Add support for cross extent block
ocfs2: Move /sys/o2cb to /sys/fs/o2cb
sysfs: Allow removal of symlinks in the sysfs root
ocfs2: Reconnect after idle time out.
ocfs2/dlm: Cleanup lockres print
ocfs2/dlm: Fix lockname in lockres print function
ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
ocfs2/dlm: Dumps the purgelist into a debugfs file
...
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (49 commits)
[GFS2] fix assertion in log_refund()
[GFS2] fix GFP_KERNEL misuses
[GFS2] test for IS_ERR rather than 0
[GFS2] Invalidate cache at correct point
[GFS2] fs/gfs2/recovery.c: suppress warnings
[GFS2] Faster gfs2_bitfit algorithm
[GFS2] Streamline quota lock/check for no-quota case
[GFS2] Remove drop of module ref where not needed
[GFS2] gfs2_adjust_quota has broken unstuffing code
[GFS2] possible null pointer dereference fixup
[GFS2] Need to ensure that sector_t is 64bits for GFS2
[GFS2] re-support special inode
[GFS2] remove gfs2_dev_iops
[GFS2] fix file_system_type leak on gfs2meta mount
[GFS2] Allow bmap to allocate extents
[GFS2] Fix a page lock / glock deadlock
[GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
[GFS2] gfs2/ops_file.c should #include "ops_inode.h"
[GFS2] be*_add_cpu conversion
[GFS2] Fix bug where we called drop_bh incorrectly
...
[SCSI] iscsi: bidi support at the generic libiscsi level
- prepare the additional bidi_read rlength header.
- access the right scsi_in() and/or scsi_out() side of things.
also for resid.
- Handle BIDI underflow overflow from target
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Reviewed-by: Pete Wyckoff <pw@osc.edu> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Martin Peschke [Fri, 18 Apr 2008 10:51:55 +0000 (12:51 +0200)]
[SCSI] zfcp: fix 31 bit compile warnings
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_rscn’:
drivers/s390/scsi/zfcp_aux.c:1379: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_plogi’:
drivers/s390/scsi/zfcp_aux.c:1432: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_logo’:
drivers/s390/scsi/zfcp_aux.c:1457: warning: cast from pointer to integer of
different size
..
Just passing pointers rids us of these warnings and improves readability.
Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:41 +0000 (10:03 +0900)]
[SCSI] bsg: remove minor in struct bsg_device
minor in struct bsg_device is used as identifier to find the
corresponding struct bsg_device_class. However, request_queuse can be
used as identifier for that and the minor in struct bsg_device is
unnecessary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:38 +0000 (10:03 +0900)]
[SCSI] bsg: takes a ref to struct device in fops->open
bsg_register_queue() takes a ref to struct device that a caller
passes. For example, bsg takes a ref to the sdev_gendev for scsi
devices. However, bsg doesn't inrease the refcount in fops->open. So
while an application opens a bsg device, the scsi device that the bsg
device holds can go away (bsg also takes a ref to a queue, but it
doesn't prevent the device from going away).
With this patch, bsg increases the refcount of struct device in
fops->open and decreases it in fops->release.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch exposes o2net information via debugfs. The information includes
the list of sockets (sock_containers) as well as the list of outstanding
messages (send_tracking). Useful for o2dlm debugging.
(This patch is derived from an earlier one written by Zach Brown that
exposed the same information via /proc.)
[Mark: checkpatch fixes]
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Fri, 4 Apr 2008 19:45:55 +0000 (12:45 -0700)]
ocfs2: Only build ocfs2/dlm with the o2cb stack module
fs/ocfs2/dlm/ocfs2_dlm.ko and fs/ocfs2/dlm/ocfs2_dlmfs.ko get built if
CONFIG_FS_OCFS2 is specified. This isn't quite how it should happen any more
- the "o2cb" dlm modules should only be built if CONFIG_FS_OCFS2_O2CB is
set, so update the dlm Makefile accordingly.
Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com>
Jeff Mahoney [Fri, 28 Mar 2008 23:44:13 +0000 (16:44 -0700)]
ocfs2/cluster: Get rid of arguments to the timeout routines
We keep seeing bug reports related to NULL pointer derefs in
o2net_set_nn_state(). When I originally wrote up the configurable timeout
patch, I had tried to plan for multiple clusters. This was silly.
The timeout routines all use o2nm_single_cluster so there's no point in
passing an argument at all. This patch removes the arguments and kills those
bugs dead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Julia Lawall [Fri, 28 Mar 2008 21:43:10 +0000 (14:43 -0700)]
fs/ocfs2/aops.c: test for IS_ERR rather than 0
The function ocfs2_start_trans always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 5 Mar 2008 08:11:46 +0000 (16:11 +0800)]
ocfs2: Add inode stealing for ocfs2_reserve_new_inode
Inode allocation is modified to look in other nodes allocators during
extreme out of space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 3 Mar 2008 09:12:30 +0000 (17:12 +0800)]
ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
In inode stealing, we no longer restrict the allocation to
happen in the local node. So it is neccessary for us to add
a new member in ocfs2_alloc_context to indicate which slot
we are using for allocation. We also modify the process of
local alloc so that this member can be used there also.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Mon, 3 Mar 2008 09:12:09 +0000 (17:12 +0800)]
ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
In some cases(Inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap since it may already be full. So add a new parameter
for this.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 30 Jan 2008 06:21:32 +0000 (14:21 +0800)]
ocfs2: Enable cross extent block merge.
In ocfs2_figure_merge_contig_type, we judge whether there exists
a cross extent block merge and enable it by setting CONTIG_LEFT
and CONTIG_RIGHT accordingly.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tao Ma [Wed, 30 Jan 2008 06:21:05 +0000 (14:21 +0800)]
ocfs2: Add support for cross extent block
In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT"
with the first extent record of the next extent block, we will merge it to
the next extent block and change all the related extent blocks accordingly.
In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge
it to the prevoius extent block and change all the related extent blocks
accordingly.
As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be more efficient and easier.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Mark Fasheh [Wed, 30 Jan 2008 01:08:26 +0000 (17:08 -0800)]
ocfs2: Move /sys/o2cb to /sys/fs/o2cb
/sys/fs is where we really want file system specific sysfs objects.
Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.
Mark Fasheh [Tue, 29 Jan 2008 22:35:18 +0000 (14:35 -0800)]
sysfs: Allow removal of symlinks in the sysfs root
Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case
sysfs_root will be used as the parent directory. This allows us to tear down
top level symlinks created via sysfs_create_link(), which already has
similar handling of a NULL parent object.
Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Tao Ma [Wed, 5 Mar 2008 07:50:12 +0000 (15:50 +0800)]
ocfs2: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.
It disconnects on net timeout is ok, but it should attempt to
reconnect back. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
And if we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang in the process).
So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.
If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Fri, 14 Mar 2008 18:18:24 +0000 (11:18 -0700)]
ocfs2/dlm: Cleanup lockres print
A previous patch added KERN_NOTICE to printks printing the lockres that
cluttered the output. This patch removes the log level. For people concerned
with syslog clutter, please note we now use this facility to print lockres
only during an error.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:29 +0000 (15:16 -0700)]
ocfs2/dlm: Fix lockname in lockres print function
__dlm_print_one_lock_resource was printing lockname incorrectly.
Also, we now use printk directly instead of mlog as the latter prints
the line context which is not useful for this print.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:25 +0000 (15:16 -0700)]
ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
This patch moves some mle related definitions from dlmmaster.c
to dlmcommon.h. Future patches need these definitions to dump mle
debugging information.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.beckeroracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:21 +0000 (15:16 -0700)]
ocfs2/dlm: Link all lockres' to a tracking list
This patch links all the lockres' to a tracking list in dlm_ctxt.
We will use this in an upcoming patch that will walk the entire
list and to dump the lockres states to a debugfs file.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Sunil Mushran [Mon, 10 Mar 2008 22:16:20 +0000 (15:16 -0700)]
ocfs2/dlm: Create slabcaches for lock and lockres
This patch makes the o2dlm allocate memory for lockres, lockname and lock
structures from slabcaches rather than kmalloc. This allows us to not only
make these allocs more efficient but also allows us to track the memory being
consumed by these structures.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 5 Mar 2008 01:58:56 +0000 (17:58 -0800)]
ocfs2: Allow selection of cluster plug-ins.
ocfs2 now supports plug-ins for the classic O2CB stack as well as
userspace cluster stacks in conjunction with fs/dlm. This allows zero,
one, or both of the plug-ins to be selected in Kconfig. For local mounts
(non-clustered), neither plug-in is needed. Both plugins can be loaded
at one time, the runtime will select the one needed for the cluster
systme in use.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 5 Mar 2008 00:09:39 +0000 (16:09 -0800)]
ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
The masklog code is in the o2cb stack, but ocfs2_lockid.h now needs to
be included by the user stack. The BUG() in ocfs2_lock_type_string()
does not need masklog support, so change it to a regular BUG_ON().
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Feb 2008 23:39:44 +0000 (15:39 -0800)]
ocfs2: Add the 'set version' message to the ocfs2_control device.
The "SETV" message sets the filesystem locking protocol version as
negotiated by the client. The client negotiates based on the maximum
version advertised in /sys/fs/ocfs2/max_locking_protocol.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Wed, 20 Feb 2008 22:44:34 +0000 (14:44 -0800)]
ocfs2: Add the local node id to the handshake.
This is the second part of the ocfs2_control handshake. After
negotiating the ocfs2_control protocol, the daemon tells the filesystem
what the local node id is via the SETN message.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Tue, 19 Feb 2008 03:40:12 +0000 (19:40 -0800)]
ocfs2: Start the ocfs2_control handshake.
When a control daemon opens the ocfs2_control device, it must perform a
handshake to tell the filesystem it is something capable of monitoring
cluster status. Only after the handshake is complete will the filesystem
allow mounts.
This is the first part of the handshake. The daemon reads all supported
ocfs2_control protocols, then writes in the protocol it will use.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Joel Becker [Fri, 1 Feb 2008 23:17:30 +0000 (15:17 -0800)]
ocfs2: Add the 'cluster_stack' sysfs file.
Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
the classic stack. Thus, old tools that do not know how to modify this
file will work just fine. The stack cannot be modified if there is a
live filesystem.
ocfs2_cluster_connect() now takes the expected cluster stack as an
argument. This way, the filesystem and the stack glue ensure they are
speaking to the same backend.
If the stack is 'o2cb', the o2cb stack plugin is used. For any other
value, the fsdlm stack plugin is selected.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>