pilppa.com Git - linux-2.6-omap-h63xx.git/log

UBIFS: fix recovery bug

UBIFS did not recovery in a situation in which it could
have. The relevant function assumed there could not be
more nodes in an eraseblock after a corrupted node, but
in fact the last (NAND) page written might contain anything.
The correct approach is to check for empty space (0xFF bytes)
from then on.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>

Merge branch 'linus' into locking-for-linus

Conflicts:
lib/Kconfig.debug

regulator: twl4030 VAUX3 supports 3.0V

TWL4030 and TWL5030 support 3.0V on VAUX3.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>

regulator: Support disabling of unused regulators by machines

At present it is not possible for machine constraints to disable
regulators which have been left on when the system starts, for example
as a result of fixed default configurations in hardware. This means that
power may be wasted by these regulators if they are not in use.

Provide intial support for this with a late_initcall which will disable
any unused regulators if the machine has enabled this feature by calling
regulator_has_full_constraints(). If this has not been called then print
a warning to encourage users to fully specify their constraints so that
we can change this to be the default behaviour in future.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Don't increment use_count for boot_on regulators

Don't set use_count for regulators that are enabled at boot since this
stops the supply being disabled by well-behaved consumers which do
balanced enables and disabled. Any consumers which don't do disables
which are not matched by enables are unable to share regulators - shared
regulators are the common case so the API should facilitate them.

Consumers that want to disable regulators that are enabled when they
start have two options:

- Do a regulator_enable() prior to the disable to bring the use count
   in sync with the hardware state; this will ensure that if the
   regulator was enabled by another driver then this consumer will play
   nicely with it.
- Use regulator_force_disable(); this explicitly bypasses any checks
   done by the core and documents the inability of the driver to share
   the supply.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

twl4030-regulator: expose VPLL2

Add VPLL2 to the set of twl4030-family regulators exposed for
use by various drivers. It's commonly used to power the digital
video outputs (e.g. LCD or DVI displays) on OMAP3 systems.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: refcount fixes

Fix some refcounting issues in the regulator framework, supporting
regulator_disable() for regulators that were enabled at boot time
via machine constraints:

- Update those regulators' usecounts after enabling, so they
   can cleanly be disabled at that level.

- Remove the problematic per-consumer usecount, so there's
   only one level of enable/disable.

Buggy consumers could notice different bug symptoms.  The main
example would be refcounting bugs; also, any (out-of-tree) users
of the experimental regulator_set_optimum_mode() stuff which
don't call it when they're done using a regulator.

This is a net minor codeshrink.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Don't warn if we failed to get a regulator

The consumer can print a message if required, some consumers may have
optional regulators and wish to downgrade the logging for them or ignore
their absence.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Allow boot_on regulators to be disabled by clients

Rather than incrementing the reference count for boot_on regulators
(which prevents them being disabled later on) simply force the
regulator to be enabled when applying the constraints. Previously
boot_on was essentially equivalent to always_on.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Implement list_voltage for WM835x LDOs and DCDCs

Implement the recently added voltage step listing API for the WM835x
DCDCs and LDOs. DCDCs can use values up to 0x66, LDOs can use the full
range of values in the mask. Both masks are the lower bits of the
register.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

twl4030-regulator: list more VAUX4 voltages

The VAUX4 voltage table scrolls onto a second page in many versions
of the TWL4030 family manuals. This doesn't mean we should ignore
those values! Some boards use the (fully supported) 2.8V setting.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Don't warn on omitted voltage constraints

Specifying voltage constraints is optional (and only needed if the
consumer is allowed to change the voltage) so don't complain unless
a voltage has been specified.

Also avoid surprises with a dangling else while we're here.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Implement list_voltage() for WM8400 DCDCs and LDOs

All DCDCs and LDOs are identical.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

MMC: regulator utilities

Glue between MMC and regulator stacks ... verified with
some OMAP3 boards using adjustable and configured-as-fixed
regulators on several MMC controllers.

These calls are intended to be used by MMC host adapters
using at least one regulator per host. Examples include
slots with regulators supporting multiple voltages and
ones using multiple voltage rails (e.g. DAT4..DAT7 using a
separate supply, or a split rail chip like certain SDIO
WLAN or eMMC solutions).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: twl4030 voltage enumeration (v2)

Update previously-posted twl4030 regulator driver to export
supported voltages to upper layers using a new mechanism.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: twl4030 regulators

Support most of the LDO regulators in the twl4030 family chips.
In the case of LDOs supporting MMC/SD, the voltage controls are
used; but in most other cases, the regulator framework is only
used to enable/disable a supplies, conserving power when a given
voltage rail is not needed.

The drivers/mfd/twl4030-core.c code already sets up the various
regulators according to board-specific configuration, and knows
that some chips don't provide the full set of voltage rails.

The omitted regulators are intended to be under hardware control,
such as during the hardware-mediated system powerup, powerdown,
and suspend states. Unless/until software hooks are known to
be safe, they won't be exported here.

These regulators implement the new get_status() operation, but
can't realistically implement get_mode(); the status output is
effectively the result of a vote, with the relevant hardware
inputs not exposed.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: get_status() grows kerneldoc

Add kerneldoc for the new get_status() message.  Fix the existing
kerneldoc for that struct in two ways:

(a) Syntax, making sure parameter descriptions immediately
     follow the one-line struct description and that the first
     blank lines is before any more expansive description;
(b) Presentation for a few points, to highlight the fact that
     the previous "get" methods exist only to report the current
     configuration, not to display actual status.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: enumerate voltages (v2)

Add a basic mechanism for regulators to report the discrete
voltages they support: list_voltage() enumerates them using
selectors numbered from 0 to an upper bound.

Use those methods to force machine-level constraints into bounds.
(Example: regulator supports 1.8V, 2.4V, 2.6V, 3.3V, and board
constraints for that rail are 2.0V to 3.6V ... so the range of
voltages is then 2.4V to 3.3V on this board.)

Export those voltages to the regulator consumer interface, so for
example regulator hooked up to an MMC/SD/SDIO slot can report the
actual voltage options available to cards connected there.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Fix get_mode() for WM835x DCDCs

The WM835x regulators need a different register checking for force
mode on each DCDC. Previously the force mode status for DCDC1 was
checked.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Allow regulators to set the initial operating mode

This is useful when wishing to run in a fixed operating mode that isn't
the default.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Suggest use of datasheet supply or pin names for consumers

Update the documentation to suggest the use of datasheet names for
the supplies requested by regulator consumers. Doing this makes it
easier to tie the design for a given platform up with the requirements
of the driver for a consumer.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: email - update email address and regulator webpage.

Remove deceased email address and update to new address. Also update
website details in MAINTAINERS with correct page.

Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: add unset_regulator_supplies to fix regulator_unregister

Currently regulator_unregister does not clear regulator <--> consumer
mapping.
This patch introduces unset_regulator_supplies that clear the map.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: fix header file missing kernel-doc

Fix regulator/driver.h missing kernel-doc:

Warning(linux-next-20090120//include/linux/regulator/driver.h:108): No description found for parameter 'get_status'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Liam Girdwood <lrg@slimlogic.co.uk>
cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Hoist struct regulator_dev out of core to fix notifiers

Commit 872ed3fe176833f7d43748eb88010da4bbd2f983 caused regulator drivers
to take the struct regulator_dev lock themselves which requires that the
struct be visible to them. Band aid this by making the struct visible.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Mark attributes table for virtual regulator static

It's not exported.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Make fixed voltage regulators visible in Kconfig

This allows users to enable or disable support for these regulators at
build time as they can for other regulators rather than having platforms
force the regulators to be built in.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Allow init_data to be passed to fixed voltage regulators

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Allow init data to be supplied for bq24022

Previously it was not possible to do so, making it impossible for
machines to configure the driver.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: Pass regulator init data as explict argument when registering

Rather than having the regulator init data read from the platform_data
member of the struct device that is registered for the regulator make
the init data an explict argument passed in when registering. This
allows drivers to use the platform data for their own purposes if they
wish.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

Regulator: Push lock out of _notifier_call_chain + add voltage change event.

Regulator: Push lock out of _notifier_call_chain and into caller functions
(side effect of fixing deadlock in regulator_force_disable)
+ Add a voltage changed event.
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: minor cleanup of virtual consumer

On Thu, 15 Jan 2009 16:10:22 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 14 Jan 2009 13:16:27 -0800
> David Brownell <david-b@pacbell.net> wrote:
>
> > From: David Brownell <dbrownell@users.sourceforge.net>
> >
> > Minor cleanup to the regulator set_mode sysfs support:
> > switch to sysfs_streq() in set_mode(), which is also
> > a code shrink.  Use the same strings that get_mode()
> > uses, shrinking data too.
> >
> > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> > ---
> >  drivers/regulator/virtual.c |    8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > --- a/drivers/regulator/virtual.c
> > +++ b/drivers/regulator/virtual.c
> > @@ -226,13 +226,13 @@ static ssize_t set_mode(struct device *d
> >   unsigned int mode;
> >   int ret;
> >
> > - if (strncmp(buf, "fast", strlen("fast")) == 0)
> > + if (sysfs_streq(buf, "fast\n") == 0)
> >   mode = REGULATOR_MODE_FAST;
> > - else if (strncmp(buf, "normal", strlen("normal")) == 0)
> > + else if (sysfs_streq(buf, "normal\n") == 0)
> >   mode = REGULATOR_MODE_NORMAL;
> > - else if (strncmp(buf, "idle", strlen("idle")) == 0)
> > + else if (sysfs_streq(buf, "idle\n") == 0)
> >   mode = REGULATOR_MODE_IDLE;
> > - else if (strncmp(buf, "standby", strlen("standby")) == 0)
> > + else if (sysfs_streq(buf, "standby\n") == 0)
> >   mode = REGULATOR_MODE_STANDBY;
>
> we don't need the \n's, do we?

oh, it's for the string sharing.  Sneaky.

I wonder how many people will try to fix that up for us?

Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: add get_status()

Based on previous LKML discussions:

* Update docs for regulator sysfs class attributes to highlight
   the fact that all current attributes are intended to be control
   inputs, including notably "state" and "opmode" which previously
   implied otherwise.

* Define a new regulator driver get_status() method, which is the
   first method reporting regulator outputs instead of inputs.
   It can report on/off and error status; or instead of simply
   "on", report the actual operating mode.

For the moment, this is a sysfs-only interface, not accessible to
regulator clients.  Such clients can use the current notification
interfaces to detect errors, if the regulator reports them.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

regulator: minor cleanup of virtual consumer

Minor cleanup to the regulator set_mode sysfs support:
switch to sysfs_streq() in set_mode(), which is also
a code shrink. Use the same strings that get_mode()
uses, shrinking data too.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>

sh: ap325 and Migo-R use new sh_mobile_ceu_info flags

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Revert "xfs: increase the maximum number of supported ACL entries"

This reverts commit 8b112171734c791afaf43ccc8c6ec492e7006e44.
Premature unintended commit.

Signed-off-by: Felix Blyakher <felixb@sgi.com>

md/raid5 revise rules for when to update metadata during reshape

We currently update the metadata :
1/ every 3Megabytes
2/ When the place we will write new-layout data to is recorded in
    the metadata as still containing old-layout data.

Rule one exists to avoid having to re-do too much reshaping in the
face of a crash/restart.  So it should really be time based rather
than size based.  So change it to "every 10 seconds".

Rule two turns out to be too harsh when restriping an array
'in-place', as in that case the metadata much be updates for every
stripe.
For the in-place update, it can only possibly be safe from a crash if
some user-space program data a backup of every e.g. few hundred
stripes before allowing them to be reshaped.  In that case, the
constant metadata update is pointless.
So only update the metadata if the new metadata will report that the
end of the 'old-layout' data is beyond where we are currently
writing 'new-layout' data.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: minor code cleanups in make_request.

... and to be certain the that make_request doesn't wait forever,
add a 'wake_up' when ->reshape_progress has been set to MaxSector

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove CONFIG_MD_RAID_RESHAPE config option.

This was only needed when the code was experimental. Most of it
is well tested now, so the option is no longer useful.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: be more careful about write ordering when reshaping.

When we are reshaping an array, it is very important that we read
the data from a particular sector offset before writing new data
at that offset.

In most cases when growing or shrinking an array we read long before
we even consider writing.  But when restriping an array without
changing it size, there is a small possibility that we might have
some data to available write before the read has happened at the same
location.  This would require some stripes to be in cache already.

To guard against this small possibility, we check, before writing,
that the 'old' stripe at the same location is not in the process of
being read.  And we ensure that we mark all 'source' stripes as such
before allowing new 'destination' stripes to proceed.

Signed-off-by: NeilBrown <neilb@suse.de>

md: don't display meaningless values in sysfs files resync_start and sync_speed

When no resync if happening, both of these files currently have
meaningless values (is slightly different ways).
Change them to "none" in that case.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: allow layout and chunksize to be changed on active array.

If an array has 3 or more devices, we allow the chunksize or layout
to be changed and when a reshape starts, we use these as the 'new'
values.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: reshape using largest of old and new chunk size

This ensures that even when old and new stripes are overlapping,
we will try to read all of the old before having to write any
of the new.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: prepare for allowing reshape to change layout

Add prev_algo to raid5_conf_t along the same lines as prev_chunk
and previous_raid_disks.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: prepare for allowing reshape to change chunksize.

Add "prev_chunk" to raid5_conf_t, similar to "previous_raid_disks", to
remember what the chunk size was before the reshape that is currently
underway.

This seems like duplication with "chunk_size" and "new_chunk" in
mddev_t, and to some extent it is, but there are differences.
The values in mddev_t are always defined and often the same.
The prev* values are only defined if a reshape is underway.

Also (and more significantly) the raid5_conf_t values will be changed
at the same time (inside an appropriate lock) that the reshape is
started by setting reshape_position. In contrast, the new_chunk value
is set when the sysfs file is written which could be well before the
reshape starts.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.

During a raid5 reshape, we have some stripes in the cache that are
'before' the reshape (and are still to be processed) and some that are
'after'. They are currently differentiated by having different
->disks values as the only reshape current supported involves changing
the number of disks.

However we will soon support reshapes that do not change the number
of disks (chunk parity or chunk size). So make the difference more
explicit with a 'generation' number.

Signed-off-by: NeilBrown <neilb@suse.de>

Documentation/md.txt update

Update md.txt to reflect recent changes in a number of sysfs
attributes.

Signed-off-by: NeilBrown <neilb@suse.de>

md: allow number of drives in raid5 to be reduced

When reshaping a raid5 to have fewer devices, we work from the end of
the array to the beginning.
md_do_sync gives addresses to sync_request that go from the beginning
to the end. So largely ignore them use the internal state variable
"reshape_progress" to keep track of what to do next.

Never allow the size to be reduced below the minimum (4 for raid6,
3 otherwise).

We require that the size of the array has already been reduced before
the array is reshaped to a smaller size. This is because simply
reducing the size is an easily reversible operation, while the reshape
is immediately destructive and so is not reversible for the blocks at
the ends of the devices.
Thus to reshape an array to have fewer devices, you must first write
an appropriately small size to md/array_size.

When reshape finished, we remove any drives that are no longer
needed and fix up ->degraded.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: change reshape-progress measurement to cope with reshaping backwards.

When reducing the number of devices in a raid4/5/6, the reshape
process has to start at the end of the array and work down to the
beginning. So we need to handle expand_progress and expand_lo
differently.

This patch renames "expand_progress" and "expand_lo" to avoid the
implication that anything is getting bigger (expand->reshape) and
every place they are used, we make sure that they are used the right
way depending on whether delta_disks is positive or negative.

Signed-off-by: NeilBrown <neilb@suse.de>

md: add explicit method to signal the end of a reshape.

Currently raid5 (the only module that supports restriping)
notices that the reshape has finished be sync_request being
given a large value, and handles any cleanup them.

This patch changes it so md_check_recovery calls into an
explicit finish_reshape method as well.

The clean-up from sync_request can do things that need to be
done promptly, typically things local to the raid5_conf_t
structure.

The "finish_reshape" method is called under the mddev_lock
so it can do things involving reconfiguring the device.

This allows us to get rid of md_set_array_sectors_locked, which
would have caused a deadlock if you tried to stop and array
while a reshape was happening.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: enhance raid5_size to work correctly with negative delta_disks

This is the first of four patches which combine to allow md/raid5 to
reduce the number of devices in the array by restriping the data over
a subset of the devices.

If the number of disks in a raid4/5/6 is being reduced, then the
default size must be based on the new number, not the old number
of devices.
In general, it should be based on the smaller of new and old.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: drop qd_idx from r6_state

We now have this value in stripe_head so we don't need to duplicate
it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid6: move raid6 data processing to raid6_pq.ko

Move the raid6 data processing routines into a standalone module
(raid6_pq) to prepare them to be called from async_tx wrappers and other
non-md drivers/modules. This precludes a circular dependency of raid456
needing the async modules for data processing while those modules in
turn depend on raid456 for the base level synchronous raid6 routines.

To support this move:
1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
2/ The raid6_call, recovery calls, and table symbols are exported
3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
compile

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid5 run(): Fix max_degraded for raid level 4.

raid4 allows only one failed disk.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: 'array_size' sysfs attribute

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md: centralize ->array_sectors modifications

Get personalities out of the business of directly modifying
->array_sectors. Lays groundwork to introduce policy on when
->array_sectors can be modified.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md: add 'size' as a personality method

In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested. For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md: fix typo in FSF address

Hello,

I found a typo Bosto"m" in FSF address.
And I am checking around linux source code.
Here is the only place which uses Bosto"m" (not Boston).

Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: add takeover support for converting raid6 back into raid5

If a raid6 is still in the layout that comes from converting raid5
into a raid6. this will allow us to convert it back again.

Signed-off-by: NeilBrown <neilb@suse.de>

md: add takeover support for raid4 -> raid5 conversion.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: allow layout/chunksize to be changed on an active 2-drive raid5.

2-drive raid5's aren't very interesting. But if you are converting
a raid1 into a raid5, you will at least temporarily have one. And
that it a good time to set the layout/chunksize for the new RAID5
if you aren't happy with the defaults.

layout and chunksize don't actually affect the placement of data
on a 2-drive raid5, so we just do some internal book-keeping.

Signed-off-by: NeilBrown <neilb@suse.de>

md: add ->takeover method for raid5 to be able to take over raid1

The RAID1 must have two drives and be a suitable size to
be a multiple of a chunksize that isn't too small.

Signed-off-by: NeilBrown <neilb@suse.de>

md: add ->takeover method to support changing the personality managing an array

Implement this for RAID6 to be able to 'takeover' a RAID5 array. The
new RAID6 will use a layout which places Q on the last device, and
that device will be missing.
If there are any available spares, one will immediately have Q
recovered onto it.

Signed-off-by: NeilBrown <neilb@suse.de>

md: enable suspend/resume of md devices.

To be able to change the 'level' of an md/raid array, we need to
suspend the device so that no requests are active - then move some
pointers around etc.

The code already keeps counts of active requests and the ->quiesce
function can be used to wait until those counts hit zero.
However the quiesce function blocks new requests once they are all
ready 'inside' the personality module, and that is too late if we want
to replace the personality modules.

So make all md requests come in through a common md_make_request
function that keeps track of how many requests have entered the
modules but may not yet be on the internal reference counts.
Allow md_make_request to be blocked when we want to suspend the
device, and make it possible to wait for all those in-transit requests
to be added to internal lists so that ->quiesce can wait for them.

There is still a problem that when a request completes, we drop the
ref count inside the personality code so there is a short time between
when the refcount hits zero, and when the personality code is no
longer being used.
The personality code never blocks (schedule or spinlock) between
dropping the refcount and exiting the routine, so this should be safe
(as put_module calls synchronize_sched() before unmapping the module
code).

Signed-off-by: NeilBrown <neilb@suse.de>

md: md_unregister_thread should cope with being passed NULL

Mostly md_unregister_thread is only called when we know that the
thread is NULL, but sometimes we need to check first. It is safer
to put the check inside md_unregister_thread itself.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: refactor raid5 "run"

.. so that the code to create the private data structures is separate.
This will help with future code to change the level of an active
array.

Signed-off-by: NeilBrown <neilb@suse.de>

md: make sure new_level, new_chunksize, new_layout always have sensible values.

When an md array is undergoing a change, we have new_* fields that
show the new values.
When no change is happening, it is least confusing if these have
the same value as the normal fields.
This is true in most cases, but not when the values are set via sysfs.

So fix this up.

A subsequent patch will BUG_ON if these things aren't consistent.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: finish support for DDF/raid6

DDF requires RAID6 calculations over different devices in a different
order.
For md/raid6, we calculate over just the data devices, starting
immediately after the 'Q' block.
For ddf/raid6 we calculate over all devices, using zeros in place of
the P and Q blocks.

This requires unfortunately complex loops...

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Add support for new layouts for raid5 and raid6.

DDF uses different layouts for P and Q blocks than current md/raid6
so add those that are missing.
Also add support for RAID6 layouts that are identical to various
raid5 layouts with the simple addition of one device to hold all of
the 'Q' blocks.
Finally add 'raid5' layouts to match raid4.
These last to will allow online level conversion.

Note that this does not provide correct support for DDF/raid6 yet
as the order in which data blocks are summed to produce the Q block
is significant and different between current md code and DDF
requirements.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: simplify raid5_compute_sector interface

Rather than passing 'pd_idx' and 'qd_idx' to be filled in, pass
a 'struct stripe_head *' and fill in the relevant fields. This is
more extensible.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid6: remove expectation that Q device is immediately after P device.

Code currently assumes that the devices in a raid6 stripe are
  0 1 ... N-1 P Q
in some rotated order.  We will shortly add new layouts in which
this strict pattern is broken.
So remove this expectation.  We still assume that the data disks
are roughly in-order.  However P and Q can be inserted anywhere within
that order.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: change raid5_compute_sector and stripe_to_pdidx to take a 'previous' argument

This similar to the recent change to get_active_stripe.
There is no functional change, just come rearrangement to make
future patches cleaner.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: simplify interface for init_stripe and get_active_stripe

Rather than passing 'pd_idx' and 'disks' to these functions, just pass
'previous' which tells whether to use the 'previous' or 'current'
geometry during a reshape, and let init_stripe calculate
disks and pd_idx and anything else it might need.

This is not a substantial simplification and even adds a division.
However we will shortly be adding more complexity to init_stripe
to handle more interesting 'reshape' activities, and without this
change, the interface to these functions would get very complex.

Signed-off-by: NeilBrown <neilb@suse.de>

md: Represent raid device size in sectors.

This patch renames the "size" field of struct mdk_rdev_s to
"sectors" and changes this field to store sectors instead of
blocks.

All users of this field, linear.c, raid0.c and md.c, are fixed up
accordingly which gets rid of many multiplications and divisions.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Make mddev->size sector-based.

This patch renames the "size" field of struct mddev_s to "dev_sectors"
and stores the number of 512-byte sectors instead of the number of
1K-blocks in it.

All users of that field, including raid levels 1,4-6,10, are adjusted
accordingly. This simplifies the code a bit because it allows to get
rid of a couple of divisions/multiplications by two.

In order to make checkpatch happy, some minor coding style issues
have also been addressed. In particular, size_store() now uses
strict_strtoull() instead of simple_strtoull().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: be more consistent about setting WriteMostly flag when adding a drive to an array

When a drive is added to an array using ADD_NEW_DISK, there are two
places we can get certain flags from: the metadata on the disk or the
flags passed through the IOCTL.

For the WriteMostly flag (aka MD_DISK_WRITEMOSTLY) we take the value
from either of those sources depending on if it is set (i.e. we
effectively 'or' the two sources together).

This makes it awkward to clear, and is at best inconsistent.

As documented code (in mdadm) requires that setting
MD_DISK_WRITEMOSTLY in the ioctl will be effective, we resolve the
inconsistency by always using the value for this flag from the ioctl,
and ignoring the value on disk.

Signed-off-by: NeilBrown <neilb@suse.de>

md: occasionally checkpoint drive recovery to reduce duplicate effort after a crash

Version 1.x metadata has the ability to record the status of a
partially completed drive recovery.
However we only update that record on a clean shutdown.
It would be nice to update it on unclean shutdowns too, particularly
when using a bitmap that removes much to the 'sync' effort after an
unclean shutdown.

One complication with checkpointing recovery is that we only know
where we are up to in terms of IO requests started, not which ones
have completed.  And we need to know what has completed to record
how much is recovered.  So occasionally pause the recovery until all
submitted requests are completed, then update the record of where
we are up to.

When we have a bitmap, we already do that pause occasionally to keep
the bitmap up-to-date.  So enhance that code to record the recovery
offset and schedule a superblock update.
And when there is no bitmap, just pause 16 times during the resync to
do a checkpoint.
'16' is a fairly arbitrary number.  But we don't really have any good
way to judge how often is acceptable, and it seems like a reasonable
number for now.

Signed-off-by: NeilBrown <neilb@suse.de>

md: move md_k.h from include/linux/raid/ to drivers/md/

It really is nicer to keep related code together..

Signed-off-by: NeilBrown <neilb@suse.de>

md: move lots of #include lines out of .h files and into .c

This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h

Remove include/raid/md.h as its only remaining use was to #include
other files.

Signed-off-by: NeilBrown <neilb@suse.de>

md: move most content from md.h to md_k.h

The extern function definitions are kernel-internal definitions, so
they belong in md_k.h

The MD_*_VERSION values could reasonably go in a number of places,
but md_u.h seems most reasonable.

This leaves almost nothing in md.h. It will go soon.

Signed-off-by: NeilBrown <neilb@suse.de>

md: move LEVEL_* definition from md_k.h to md_u.h

.. as they are part of the user-space interface.
Also move MdpMinorShift into there so we can remove duplication.

Lastly move mdp_major in. It is less obviously part of the user-space
interface, but do_mounts_md.c uses it, and it is acting a bit like
user-space.

Signed-off-by: NeilBrown <neilb@suse.de>

md: move headers out of include/linux/raid/

Move the headers with the local structures for the disciplines and
bitmap.h into drivers/md/ so that they are more easily grepable for
hacking and not far away. md.h is left where it is for now as there
are some uses from the outside.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>

cleanup drivers/md/Makefile

Use the -y variables instead of the old -objs so we can easily add
conditional objects to the modules. Also always use += to add
subobjects to avoid problems when placing additional objects in
some place in the file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>

md: stop defining MAJOR_NR

MAJOR_NR was only required for magic in linux/blk.h in 2.4 or earlier
kernels, so no need to keep it around.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>

MD data integrity support

md: Add support for data integrity to MD

If all subdevices support the same protection format the MD device is
flagged as integrity capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: write bitmap information to devices that are undergoing recovery.

When we add some spares to an array and start recovery, and we have
a bitmap which is stored 'internally' on all devices, we call
bitmap_write_all to make sure the bitmap is correct on the new
device(s).
However that doesn't work as write_sb_page only writes to
'In_sync' devices, and devices undergoing recovery are not
'In_sync' until recovery finishes.

So extend write_sb_page (actually next_active_rdev) to include devices
that are under recovery.

Signed-off-by: NeilBrown <neilb@suse.de>

md: never clear bit from the write-intent bitmap when the array is degraded.

It is safe to clear a bit from the write-intent bitmap for a raid1
if we know the data has been written to all devices, which is
what the current test does.

But it is not always safe to update the 'events_cleared' counter in
that case.  This is because one request could complete successfully
after some other request has partially failed.

So simply disable the clearing and updating of events_cleared whenever
the array is degraded.  This might end up not clearing some bits that
could safely be cleared, but it is safest approach.

Note that the bug fixed here did not risk corrupting data by letting
the array get out-of-sync.  Rather it meant that when a device is
removed and re-added to the array, it might incorrectly require a full
recovery rather than just recovering based on the bitmap.

Signed-off-by: NeilBrown <neilb@suse.de>

md: Allow write-intent bitmaps to have chunksize < PAGE_SIZE

md currently insists that the chunk size used for write-intent
bitmaps (the amount of data that corresponds to one chunk)
be at least one page.

The reason for this restriction is lost in the mists of time,
but a review of the code (and a vague memory) suggests that the only
problem would be related to resync. Resync tries very hard to
work in multiples of a page, but also needs to sync with units
of a bitmap_chunk too.

This connection comes out in the bitmap_start_sync call.

So change bitmap_start_sync to always work in multiples of a page.
If the bitmap chunk size is less that one page, we flag multiple
chunks as 'syncing' and generally make them all appear to the
resync routines like one chunk.

All other code either already works with data ranges that could
span multiple chunks, or explicitly only cares about a single chunk.

Signed-off-by: Neil Brown <neilb@suse.de>

md: Fix is_mddev_idle test (again).

There are two problems with is_mddev_idle.

1/ sync_io is 'atomic_t' and hence 'int'.  curr_events and all the
   rest are 'long'.
   So if sync_io were to wrap on a 64bit host, the value of
   curr_events would go very negative suddenly, and take a very
   long time to return to positive.

   So do all calculations as 'int'.  That gives us plenty of precision
   for what we need.

2/ To initialise rdev->last_events we simply call is_mddev_idle, on
   the assumption that it will make sure that last_events is in a
   suitable range.  It used to do this, but now it does not.
   So now we need to be more explicit about initialisation.

Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'master' of git://git.kernel.org/pub/scm/fs/xfs/xfs

parisc: fix macro expansion in atomic.h

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Merge branch 'cpumask-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

Conflicts:

arch/x86/include/asm/topology.h
drivers/oprofile/buffer_sync.c
(Both cases: changed in Linus' tree, removed in Ingo's).

parisc: iosapic: fix build breakage

drivers/parisc/iosapic.c:717: error: incompatible types in assignment

irq_desc::affinity was changed from cpumask_t to cpumask_var_t in
7f7ace0cda (cpumask: update irq_desc to use cpumask_var_t)

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: oops_enter()/oops_exit() in die()

As pointed out by Russell in http://marc.info/?l=linux-arch&m=118208089204630&w=2

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: document light weight syscall ABI

Document the LWS ABI including implementation notes for
userspace, and comment cleanup.

Remove extraneous .align 16 after lws_lock_start.

Signed-off-by: Carlos O'Donell <carlos@systemhalted.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: blink all or loadavg LEDs on oops

- depending on machine type, blink all leds or just the loadavg
leds twice a second on oops.
- cancel_rearming_delayed_workqueue() is obsolete,
use cancel_delayed_work_sync() instead

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: add ftrace (function and graph tracer) functionality

This patch adds the ftrace debugging functionality to the parisc kernel.
It will currently only work with 64bit kernels, because the gcc options -pg
and -ffunction-sections can't be enabled at the same time and -ffunction-sections
is still needed to be able to link 32bit kernels.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: simplify sys_clone()

No need to test clone_flags here and set parent_tidptr and child_tidptr
accordingly. The same check will be done in do_fork() and copy_process() anyway.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>

parisc: add LATENCYTOP_SUPPORT and CONFIG_STACKTRACE_SUPPORT

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>