pilppa.com Git - linux-2.6-omap-h63xx.git/log

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
jfs: ensure symlinks are NUL-terminated

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  sata_sil: add Large Block Transfer support
  [libata] ata_piix: cleanup dmi strings checking
  DMI: add dmi_match
  libata: blacklist NCQ on OCZ CORE 2 SSD (resend)
  [libata] Update kernel-doc comments to match source code
  libata: perform port detach in EH
  libata: when restoring SControl during detach do the PMP links first
  libata: beef up iterators

Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  oprofile: select RING_BUFFER
  ring_buffer: adding EXPORT_SYMBOLs
  oprofile: fix lost sample counter
  oprofile: remove nr_available_slots()
  oprofile: port to the new ring_buffer
  ring_buffer: add remaining cpu functions to ring_buffer.h
  oprofile: moving cpu_buffer_reset() to cpu_buffer.h
  oprofile: adding cpu_buffer_entries()
  oprofile: adding cpu_buffer_write_commit()
  oprofile: adding cpu buffer r/w access functions
  ftrace: remove unused function arg in trace_iterator_increment()
  ring_buffer: update description for ring_buffer_alloc()
  oprofile: set values to default when creating oprofilefs
  oprofile: implement switch/case in buffer_sync.c
  x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c
  x86/oprofile: reordering IBS code in op_model_amd.c
  oprofile: fix typo
  oprofile: whitspace changes only
  oprofile: update comment for oprofile_add_sample()
  oprofile: comment cleanup

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: avoid leaking caches or refcounts on sysfs error
  slab: Fix comment on #endif
  slab: remove GFP_THISNODE clearing from alloc_slabmgmt()
  slub: Add might_sleep_if() to slab_alloc()
  SLUB: failslab support
  slub: Fix incorrect use of loose
  slab: Update the kmem_cache_create documentation regarding the name parameter
  slub: make early_kmem_cache_node_alloc void
  slab: unsigned slabp->inuse cannot be less than 0
  slub - fix get_object_page comment
  SLUB: Replace __builtin_return_address(0) with _RET_IP_.
  SLUB: cleanup - define macros instead of hardcoded numbers

Merge branch 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (37 commits)
  drm/i915: fix modeset devname allocation + agp init return check.
  drm/i915: Remove redundant test in error path.
  drm: Add a debug node for vblank state.
  drm: Avoid use-before-null-test on dev in drm_cleanup().
  drm/i915: Don't print to dmesg when taking signal during object_pin.
  drm: pin new and unpin old buffer when setting a mode.
  drm/i915: un-EXPORT and make 'intelfb_panic' static
  drm/i915: Delete unused, pointless i915_driver_firstopen.
  drm/i915: fix sparse warnings: returning void-valued expression
  drm/i915: fix sparse warnings: move 'extern' decls to header file
  drm/i915: fix sparse warnings: make symbols static
  drm/i915: fix sparse warnings: declare one-bit bitfield as unsigned
  drm/i915: Don't double-unpin buffers if we take a signal in evict_everything().
  drm/i915: Fix fbcon setup to align display pitch to 64b.
  drm/i915: Add missing userland definitions for gem init/execbuffer.
  i915/drm: provide compat defines for userspace for certain struct members.
  drm: drop DRM_IOCTL_MODE_REPLACEFB, add+remove works just as well.
  drm: sanitise drm modesetting API + remove unused hotplug
  drm: fix allowing master ioctls on non-master fds.
  drm/radeon: use locked rmmap to remove sarea mapping.
  ...

Merge branch 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6

* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: Fix broken ® symbol in device name.
agp/intel: add support for G41 chipset

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (98 commits)
  sparc: move select of ARCH_SUPPORTS_MSI
  sparc: drop SUN_IO
  sparc: unify sections.h
  sparc: use .data.init_task section for init_thread_union
  sparc: fix array overrun check in of_device_64.c
  sparc: unify module.c
  sparc64: prepare module_64.c for unification
  sparc64: use bit neutral Elf symbols
  sparc: unify module.h
  sparc: introduce CONFIG_BITS
  sparc: fix hardirq.h removal fallout
  sparc64: do not export pus_fs_struct
  sparc: use sparc64 version of scatterlist.h
  sparc: Commonize memcmp assembler.
  sparc: Unify strlen assembler.
  sparc: Add asm/asm.h
  sparc: Kill memcmp_32.S code which has been ifdef'd out for centuries.
  sparc: replace for_each_cpu_mask_nr with for_each_cpu
  sparc: fix sparse warnings in irq_32.c
  sparc: add include guards to kernel.h
  ...

Merge branch 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  bio: get rid of bio_vec clearing
  bounce: don't rely on a zeroed bio_vec list
  cciss: simplify parameters to deregister_disk function
  cfq-iosched: fix race between exiting queue and exiting task
  loop: Do not call loop_unplug for not configured loop device.
  loop: Flush possible running bios when loop device is released.
  alpha: remove dead BIO_VMERGE_BOUNDARY
  Get rid of CONFIG_LSF
  block: make blk_softirq_init() static
  block: use min_not_zero in blk_queue_stack_limits
  block: add one-hit cache for disk partition lookup
  cfq-iosched: remove limit of dispatch depth of max 4 times quantum
  nbd: tell the block layer that it is not a rotational device
  block: get rid of elevator_t typedef
  aio: make the lookup_ioctx() lockless
  bio: add support for inlining a number of bio_vecs inside the bio
  bio: allow individual slabs in the bio_set
  bio: move the slab pointer inside the bio_set
  bio: only mempool back the largest bio_vec slab cache
  block: don't use plugging on SSD devices
  ...

Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, sparseirq: clean up Kconfig entry
  x86: turn CONFIG_SPARSE_IRQ off by default
  sparseirq: fix numa_migrate_irq_desc dependency and comments
  sparseirq: add kernel-doc notation for new member in irq_desc, -v2
  locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
  sparseirq, xen: make sure irq_desc is allocated for interrupts
  sparseirq: fix !SMP building, #2
  x86, sparseirq: move irq_desc according to smp_affinity, v7
  proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
  sparse irqs: add irqnr.h to the user headers list
  sparse irqs: handle !GENIRQ platforms
  sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
  sparseirq: fix Alpha build failure
  sparseirq: fix typo in !CONFIG_IO_APIC case
  x86, MSI: pass irq_cfg and irq_desc
  x86: MSI start irq numbering from nr_irqs_gsi
  x86: use NR_IRQS_LEGACY
  sparse irq_desc[] array: core kernel and x86 changes
  genirq: record IRQ_LEVEL in irq_desc[]
  irq.h: remove padding from irq_desc on 64bits

Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimers: fix warning in kernel/hrtimer.c
  x86: make sure we really have an hpet mapping before using it
  x86: enable HPET on Fujitsu u9200
  linux/timex.h: cleanup for userspace
  posix-timers: simplify de_thread()->exit_itimers() path
  posix-timers: check ->it_signal instead of ->it_pid to validate the timer
  posix-timers: use "struct pid*" instead of "struct task_struct*"
  nohz: suppress needless timer reprogramming
  clocksource, acpi_pm.c: put acpi_pm_read_slow() under CONFIG_PCI
  nohz: no softirq pending warnings for offline cpus
  hrtimer: removing all ur callback modes, fix
  hrtimer: removing all ur callback modes, fix hotplug
  hrtimer: removing all ur callback modes
  x86: correct link to HPET timer specification
  rtc-cmos: export second NVRAM bank

Fixed up conflicts in sound/drivers/pcsp/pcsp.c and sound/core/hrtimer.c
manually.

Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.

sata_sil: add Large Block Transfer support

This implements support for the Large Block Transfer feature found in Silicon
Image 311x controllers. This allows transferring bigger contiguous chunks of
data from system memory and avoids the 64KB boundary restriction of standard
SFF controllers.

This is based on a patch from Jeff Garzik (from the sii-lbt branch of
libata-dev) but includes a few bug fixes: Since the bmdma2 register does not
implement the status bits, the original bmdma register must be used except
where the bmdma2 register is required. As well the DMA boundary should be
31-bit instead of 32-bit since the top bit of the length field is still
required for the PRD end-of-table flag.

Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[libata] ata_piix: cleanup dmi strings checking

Commit
ATA: piix, fix pointer deref on suspend
fixed a possible oops in an ugly manner. Use newly introduced dmi_match()
to make the code pretty again.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alexandru Romanescu <a_romanescu@yahoo.co.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

DMI: add dmi_match

Add a wrapper for testing system_info which will handle also NULL
system infos.

This will be used by the ata PIIX driver.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alexandru Romanescu <a_romanescu@yahoo.co.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: blacklist NCQ on OCZ CORE 2 SSD (resend)

The patchlet below blacklists NCQ on OCZ CORE v2 SSD drive(s). Even
though the drive advertises NCQ support with queue depth 1, it responds
with all-zeroes FIS to NCQ commands which triggers ata error handling
several times before the kernel decides to disable NCQ on the drive.

Signed-off-by: Lubomir Bulej <lubomir.bulej@dsrg.mff.cuni.cz>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branch 'topic/failslab' into for-linus

Conflicts:

mm/slub.c

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

Merge branches 'topic/fixes', 'topic/cleanups' and 'topic/documentation' into for-linus

slub: avoid leaking caches or refcounts on sysfs error

If a slab cache is mergeable and the sysfs alias cannot be added, the
target cache shall have its refcount decremented. kmem_cache_create()
will return NULL, so if kmem_cache_destroy() is ever called on the target
cache, it will never be freed if the refcount has been leaked.

Likewise, if a slab cache is not mergeable and the sysfs link cannot be
added, the new cache shall be removed from the slab_caches list.
kmem_cache_create() will return NULL, so it will be impossible to call
kmem_cache_destroy() on it.

Both of these operations require slub_lock since refcount of all slab
caches and slab_caches are protected by the lock.

In the mergeable case, it would be better to restore objsize and offset
back to their original values, but this could race with another merge
since slub_lock was dropped.

Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

slab: Fix comment on #endif

This #endif in slab.h is described as closing the inner block while it's for
the big CONFIG_NUMA one. That makes reading the code a bit harder.

This trivial patch fixes the comment.

Signed-off-by: Pascal Terjan <pterjan@mandriva.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

slab: remove GFP_THISNODE clearing from alloc_slabmgmt()

Commit 6cb062296f73e74768cca2f3eaf90deac54de02d ("Categorize GFP flags")
left one call-site in alloc_slabmgmt() to clear GFP_THISNODE instead of
GFP_CONSTRAINT_MASK. Unfortunately, that ends up clearing __GFP_NOWARN
and __GFP_NORETRY as well which is not what we want. As the only caller
of alloc_slabmgmt() already clears GFP_CONSTRAINT_MASK before passing
local_flags to it, we can just remove the clearing of GFP_THISNODE.

This patch should fix spurious page allocation failure warnings on the
mempool_alloc() path. See the following URL for the original discussion
of the bug:

http://lkml.org/lkml/2008/10/27/100

Acked-by: Christoph Lameter <cl@linux-foundation.org>
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

slub: Add might_sleep_if() to slab_alloc()

Currently SLUB doesn't warn about __GFP_WAIT. Add it into slab_alloc().

Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

SLUB: failslab support

Currently fault-injection capability for SLAB allocator is only
available to SLAB. This patch makes it available to SLUB, too.

[penberg@cs.helsinki.fi: unify slab and slub implementations]
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

drm/i915: fix modeset devname allocation + agp init return check.

devname needs to be allocated before the irq is installed, so the
irq routines get the correct name in /proc.

Also check the return value from the AGP init function, and
fixup the exit points.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Remove redundant test in error path.

The error path for object list being null is in the second goto target.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm: Add a debug node for vblank state.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm: Avoid use-before-null-test on dev in drm_cleanup().

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: Don't print to dmesg when taking signal during object_pin.

This showed up in logs where people had a hung chip, so pinning was blocked
on the chip unpinning other buffers, and the X Server took its scheduler
signal during that time.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm: pin new and unpin old buffer when setting a mode.

This removes the requirement for user space to pin a buffer before
setting a mode that is backed by the pixels from that buffer.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: un-EXPORT and make 'intelfb_panic' static

Fix this sparse warning:

drivers/gpu/drm/i915/intel_fb.c:417:5: warning: symbol 'intelfb_panic' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: Delete unused, pointless i915_driver_firstopen.

Thanks to Hannes Eder for pointing out that this code was dead according to
sparse.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: fix sparse warnings: returning void-valued expression

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: fix sparse warnings: move 'extern' decls to header file

Move 'extern'-decls from "intel_dvo.c" to "dvo.h", as "dvo.h" is
included by and only by files where the symbols are either defined or
used.

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: fix sparse warnings: make symbols static

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: fix sparse warnings: declare one-bit bitfield as unsigned

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: Don't double-unpin buffers if we take a signal in evict_everything().

We haven't seen this in practice, but it was visible when looking at a bug
report from when i915_gem_evict_everything() was broken and would always
return error.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>

drm/i915: Fix fbcon setup to align display pitch to 64b.

This is required by the display plane, and fixes 1400x1050 laptop displays.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Add missing userland definitions for gem init/execbuffer.

fdo bug #19132.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

i915/drm: provide compat defines for userspace for certain struct members.

Painfully userspace started using new names that were never actually to be
used from the external repo.

Also fill out the gaps in the structure for old/new userspace compat

Add compat defines for these structs.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: drop DRM_IOCTL_MODE_REPLACEFB, add+remove works just as well.

The replace fb ioctl replaces the backing buffer object for a modesetting
framebuffer object. This can be acheived by just creating a new
framebuffer backed by the new buffer object, setting that for the crtcs
in question and then removing the old framebuffer object.

Signed-off-by: Kristian Hogsberg <krh@redhat.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: sanitise drm modesetting API + remove unused hotplug

The initially merged modesetting API has some uglies in it, this
cleans up the struct members and ioctl ordering for initial submission.

It also removes the unneeded hotplug infrastructure.

airlied:- I've pulled this patch in from git modesetting-gem tree.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: fix allowing master ioctls on non-master fds.

The multi-master patches changed master to a pointer, and this fell out,
change to use is_master.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: use locked rmmap to remove sarea mapping.

this exports the locked version of the symbol as struct_mutex locks it all.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: fix missing hunk from the master changes.

Thanks to Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> for reporting
this.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: fix useless gcc unused variable warning

the calling function doesn't call this function unless one of the two
states that sets the value is true.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: fix warning due to PAGE_SIZE max

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kconfig have drm core select i2c for kms

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: PAGE_CACHE_WC is x86 only so far

The page protections need to be checked whether they need to be more flexible.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: pick an 800x600@60HZ mode by default for unknown CRT.

This is what X picks now, so we should do the same.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Fix stolen memory detection on G45 and GM45.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Register module dependencies for the modesetting code.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

DRM: i915: add mode setting support

This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported.  HDMI, DisplayPort and additional SDVO output support will
follow.

Support for the mode setting code is controlled by the new 'modeset'
module option.  A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.

Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing.  So use caution when
enabling this code; be sure your user level code supports the new
interfaces.

A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.

Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

DRM: add mode setting support

Add mode setting support to the DRM layer.

This is a fairly big chunk of work that allows DRM drivers to provide
full output control and configuration capabilities to userspace.  It was
motivated by several factors:
  - the fb layer's APIs aren't suited for anything but simple
    configurations
  - coordination between the fb layer, DRM layer, and various userspace
    drivers is poor to non-existent (radeonfb excepted)
  - user level mode setting drivers makes displaying panic & oops
    messages more difficult
  - suspend/resume of graphics state is possible in many more
    configurations with kernel level support

This commit just adds the core DRM part of the mode setting APIs.
Driver specific commits using these new structure and APIs will follow.

Co-authors: Jesse Barnes <jbarnes@virtuousgeek.org>, Jakob Bornecrantz <jakob@tungstengraphics.com>
Contributors: Alan Hourihane <alanh@tungstengraphics.com>, Maarten Maathuis <madman2003@gmail.com>

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: add GEM GTT mapping support

Use the new core GEM object mapping code to allow GTT mapping of GEM
objects on i915. The fault handler will make sure a fence register is
allocated too, if the object in question is tiled.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: GEM mmap support

Add core support for mapping of GEM objects. Drivers should provide a
vm_operations_struct if they want to support page faulting of objects.
The code for handling GEM object offsets was taken from TTM, which was
written by Thomas Hellström.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Add /proc debugging entry for reading out the HWS.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: reorganise start and load.

Make sure we have the primary node so the device can add maps.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: fix leak of uninitialized data to userspace

...so drm_getunique() is trying to copy some uninitialized data to
userspace. The ECX register contains the number of words that are
left to copy -- so there are 5 * 4 = 20 bytes left. The offset of the
first uninitialized byte (counting from the start of the string) is
also 20 (i.e. 0xf65d2294&((1 << 5)-1) == 20). So somebody tried to
copy 40 bytes when the string was only 19 long.

In drm_set_busid() we have this code:

        dev->unique_len = 40;
        dev->unique = drm_alloc(dev->unique_len + 1, DRM_MEM_DRIVER);
      ...
        len = snprintf(dev->unique, dev->unique_len, pci:%04x:%02x:%02x.%d",

...so it seems that dev->unique is never updated to reflect the
actual length of the string. The remaining bytes (20 in this case)
are random uninitialized bytes that are copied into userspace.

This patch fixes the problem by setting dev->unique_len after the
snprintf().

airlied- I've had to fix this up to store the alloced size so
we have it for drm_free later.

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Vegard Nossum <vegardno@thuin.ifi.uio.no>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: move to kref per-master structures.

This is step one towards having multiple masters sharing a drm
device in order to get fast-user-switching to work.

It splits out the information associated with the drm master
into a separate kref counted structure, and allocates this when
a master opens the device node. It also allows the current master
to abdicate (say while VT switched), and a new master to take over
the hardware.

It moves the Intel and radeon drivers to using the sarea from
within the new master structures.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: cleanup exit path for module unload

The current sub-module unload exit path is a mess, it tries
to abuse the idr. Just keep a list of devices per driver struct
and free them in-order on rmmod.

Signed-off-by: Dave Airlie <airlied@redhat.com>

bio: get rid of bio_vec clearing

We don't need to clear the memory used for adding bio_vec entries,
since nobody should be looking at members unitialized. Any valid
use should be below bio->bi_vcnt, and that members up until that count
must be valid since they were added through bio_add_page().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bounce: don't rely on a zeroed bio_vec list

__blk_queue_bounce() relies on a zeroed bio_vec list, since it looks
up arbitrary indexes in the allocated bio. The block layer only
guarentees that added entries are valid, so clear memory after alloc.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cciss: simplify parameters to deregister_disk function

Simplify parameters to deregister_disk function.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: fix race between exiting queue and exiting task

Original patch from Nikanth Karthikesan <knikanth@suse.de>

When a queue exits the queue lock is taken and cfq_exit_queue() would free all
the cic's associated with the queue.

But when a task exits, cfq_exit_io_context() gets cic one by one and then
locks the associated queue to call __cfq_exit_single_io_context. It looks like
between getting a cic from the ioc and locking the queue, the queue might have
exited on another cpu.

Fix this by rechecking the cfq_io_context queue key inside the queue lock
again, and not calling into __cfq_exit_single_io_context() if somebody
beat us to it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

loop: Do not call loop_unplug for not configured loop device.

In loop_unplug() function is expected that mapping is set
and lo->lo_backing_file is not NULL.

Unfortunately loop_set_fd() set the request queue unplug function,
but loop_clr_fd() doesn't clear that.

Loop device allows open of non-configured loop in some situations.
If the unplug on request queue is called, loop module oopses because
of missing lo_backing_file.

Simple reproducer:
losetup /dev/loop0 /xxx
losetup -d /dev/loop0
dmsetup create x --table "0 1 linear /dev/loop0 0"

EIP is at loop_unplug+0x1d/0x3b
...
  Call Trace:
   blk_unplug+0x57/0x5e
   dm_table_unplug_all+0x34/0x77 [dm_mod]
   destroy_inode+0x27/0x38
   generic_delete_inode+0xd5/0xd9
   iput+0x4b/0x4e
   dm_resume+0xca/0xfe [dm_mod]
   dev_suspend+0x143/0x165 [dm_mod]
   dm_ctl_ioctl+0x18e/0x1cf [dm_mod]
   dev_suspend+0x0/0x165 [dm_mod]
   dm_ctl_ioctl+0x0/0x1cf [dm_mod]
   vfs_ioctl+0x22/0x69
   do_vfs_ioctl+0x39d/0x3c7
   trace_hardirqs_on+0xb/0xd
   remove_vma+0x50/0x56
   do_munmap+0x21c/0x237
   sys_ioctl+0x2c/0x45
   sysenter_do_call+0x12/0x31

Several reports here
http://www.kerneloops.org/search.php?search=loop_unplug

Fix it by simply clear unplug function together with
removing of backing file.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

loop: Flush possible running bios when loop device is released.

When there are still queued bios and reference count
drops to zero, loop device must flush all queued bios.

Otherwise it can lead to situation that caller
closes the device, but some bios are still running
and endio() function call later OOpses when uses
unallocated mempool.

This happens for example when running dm-crypt over loop,
here is typical oops backtrace:

Oops: 0000 [#1] PREEMPT SMP
EIP is at mempool_free+0x12/0x6b
...
crypt_dec_pending+0x50/0x54 [dm_crypt]
crypt_endio+0x9f/0xa7 [dm_crypt]
crypt_endio+0x0/0xa7 [dm_crypt]
bio_endio+0x2b/0x2e
loop_thread+0x37a/0x3b1
do_lo_send_aops+0x0/0x165
autoremove_wake_function+0x0/0x33
loop_thread+0x0/0x3b1
kthread+0x3b/0x61
kthread+0x0/0x61
kernel_thread_helper+0x7/0x10

(But crash is reproducible with different dm targets
running over loop device too.)

Patch fixes it by flushing the bios in release call,
reusing the flush mechanism for switching backing store.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

alpha: remove dead BIO_VMERGE_BOUNDARY

The block layer dropped the virtual merge feature
(b8b3e16cfe6435d961f6aaebcfd52a1ff2a988c5). BIO_VMERGE_BOUNDARY
definition is meaningless now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Get rid of CONFIG_LSF

We have two seperate config entries for large devices/files. One
is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF
that handles large files. This doesn't make a lot of sense, you typically
want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording
to indicate that it covers both.

Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make blk_softirq_init() static

Sparse asked whether these could be static.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: use min_not_zero in blk_queue_stack_limits

zero is invalid for max_phys_segments, max_hw_segments, and
max_segment_size. It's better to use use min_not_zero instead of
min. min() works though (because the commit 0e435ac makes sure that
these values are set to the default values, non zero, if a queue is
initialized properly).

With this patch, blk_queue_stack_limits does the almost same thing
that dm's combine_restrictions_low() does. I think that it's easy to
remove dm's combine_restrictions_low.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add one-hit cache for disk partition lookup

disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: remove limit of dispatch depth of max 4 times quantum

This basically limits the hardware queue depth to 4*quantum at any
point in time, which is 16 with the default settings. As CFQ uses
other means to shrink the hardware queue when necessary in the first
place, there's really no need for this extra heuristic. Additionally,
it ends up hurting performance in some cases.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

nbd: tell the block layer that it is not a rotational device

Then we can get rid of that manual elevator type fiddling.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: get rid of elevator_t typedef

Just use struct elevator_queue everywhere instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

aio: make the lookup_ioctx() lockless

The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.

There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: add support for inlining a number of bio_vecs inside the bio

When we go and allocate a bio for IO, we actually do two allocations.
One for the bio itself, and one for the bi_io_vec that holds the
actual pages we are interested in.

This feature inlines a definable amount of io vecs inside the bio
itself, so we eliminate the bio_vec array allocation for IO's up
to a certain size. It defaults to 4 vecs, which is typically 16k
of IO.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: allow individual slabs in the bio_set

Instead of having a global bio slab cache, add a reference to one
in each bio_set that is created. This allows for personalized slabs
in each bio_set, so that they can have bios of different sizes.

This means we can personalize the bios we return. File systems may
want to embed the bio inside another structure, to avoid allocation
more items (and stuffing them in ->bi_private) after the get a bio.
Or we may want to embed a number of bio_vecs directly at the end
of a bio, to avoid doing two allocations to return a bio. This is now
possible.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: move the slab pointer inside the bio_set

In preparation for adding differently sized bios.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: only mempool back the largest bio_vec slab cache

We only very rarely need the mempool backing, so it makes sense to
get rid of all but one of the mempool in a bio_set. So keep the
largest bio_vec count mempool so we can always honor the largest
allocation, and "upgrade" callers that fail.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: don't use plugging on SSD devices

We just want to hand the first bits of IO to the device as fast
as possible. Gains a few percent on the IOPS rate.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix empty barrier on write-through w/ ordered tag

Empty barrier on write-through (or no cache) w/ ordered tag has no
command to execute and without any command to execute ordered tag is
never issued to the device and the ordering is never achieved. Force
draining for such cases.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: simplify empty barrier implementation

Empty barrier required special handling in __elv_next_request() to
complete it without letting the low level driver see it.

With previous changes, barrier code is now flexible enough to skip the
BAR step using the same barrier sequence selection mechanism. Drop
the special handling and mask off q->ordered from start_ordered().

Remove blk_empty_barrier() test which now has no user.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make barrier completion more robust

Barrier completion had the following assumptions.

* start_ordered() couldn't finish the whole sequence properly.  If all
  actions are to be skipped, q->ordseq is set correctly but the actual
  completion was never triggered thus hanging the barrier request.

* Drain completion in elv_complete_request() assumed that there's
  always at least one request in the queue when drain completes.

Both assumptions are true but these assumptions need to be removed to
improve empty barrier implementation.  This patch makes the following
changes.

* Make start_ordered() use blk_ordered_complete_seq() to mark skipped
  steps complete and notify __elv_next_request() that it should fetch
  the next request if the whole barrier has completed inside
  start_ordered().

* Make drain completion path in elv_complete_request() check whether
  the queue is empty.  Empty queue also indicates drain completion.

* While at it, convert 0/1 return from blk_do_ordered() to false/true.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make every barrier action optional

In all barrier sequences, the barrier write itself was always assumed
to be issued and thus didn't have corresponding control flag. This
patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in
start_ordered() such that any barrier action can be skipped.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: remove duplicate or unused barrier/discard error paths

* Because barrier mode can be changed dynamically, whether barrier is
  supported or not can be determined only when actually issuing the
  barrier and there is no point in checking it earlier.  Drop barrier
  support check in generic_make_request() and __make_request(), and
  update comment around the support check in blk_do_ordered().

* There is no reason to check discard support in both
  generic_make_request() and __make_request().  Drop the check in
  __make_request().  While at it, move error action block to the end
  of the function and add unlikely() to q existence test.

* Barrier request, be it empty or not, is never passed to low level
  driver and thus it's meaningless to try to copy back req->sector to
  bio->bi_sector on error.  In addition, the notion of failed sector
  doesn't make any sense for empty barrier to begin with.  Drop the
  code block from __end_that_request_first().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: reorganize QUEUE_ORDERED_* constants

Separate out ordering type (drain,) and action masks (preflush,
postflush, fua) from visible ordering mode selectors
(QUEUE_ORDERED_*). Ordering types are now named QUEUE_ORDERED_BY_*
while action masks are named QUEUE_ORDERED_DO_*.

This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
optional to improve empty barrier implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: reorder struct bio to remove padding on 64bit

Remove 8 bytes of padding from struct bio which also removes 16 bytes from
struct bio_pair to make it 248 bytes. bio_pair then fits into one fewer
cache lines & into a smaller slab.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: use cancel_work_sync() instead of kblockd_flush_work()

After many improvements on kblockd_flush_work, it is now identical to
cancel_work_sync, so a direct call to cancel_work_sync is suggested.

The only difference is that cancel_work_sync is a GPL symbol,
so no non-GPL modules anymore.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set

Allow the scsi request REQ_QUIET flag to be propagated to the buffer
file system layer. The basic ideas is to pass the flag from the scsi
request to the bio (block IO) and then to the buffer layer.  The buffer
layer can then suppress needless printks.

This patch declutters the kernel log by removed the 40-50 (per lun)
buffer io error messages seen during a boot in my multipath setup . It
is a good chance any real errors will be missed in the "noise" it the
logs without this patch.

During boot I see blocks of messages like
"
__ratelimit: 211 callbacks suppressed
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242847
Buffer I/O error on device sdm, logical block 1
Buffer I/O error on device sdm, logical block 5242878
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242872
"
in my logs.

My disk environment is multipath fiber channel using the SCSI_DH_RDAC
code and multipathd.  This topology includes an "active" and "ghost"
path for each lun. IO's to the "ghost" path will never complete and the
SCSI layer, via the scsi device handler rdac code, quick returns the IOs
to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi
layer messages.

I am wanting to extend the QUIET behavior to include the buffer file
system layer to deal with these errors as well. I have been running this
patch for a while now on several boxes without issue.  A few runs of
bonnie++ show no noticeable difference in performance in my setup.

Thanks for John Stultz for the quiet_error finalization.

Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: don't take lock on changing ra_pages

There's no need to take queue_lock or kernel_lock when modifying
bdi->ra_pages. So remove them. Also remove out of date comment for
queue_max_sectors_store().

Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Documentation: remove reference to ll_rw_blk.c and moved drivers/block/elevator.c

The drivers/block/ll_rw_block.c has been split and organized in the block/
directory, and also drivers/block/elevator.c has been moved to the block/
directory. Update Documentation/block/biodoc.txt accordingly

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block/blk-tag.c: cleanup kernel-doc

There is no argument named @tags in blk_init_tags,
remove its' comment.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cciss: switch to using hlist for command list management

This both cleans up the code and also helps detect the spurious case
of a command attempted being removed from a queue it doesn't belong
to.

Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Do not free io context when taking recursive faults in do_exit

When taking recursive faults in do_exit, if the io_context is not null,
exit_io_context() is being called. But it might decrement the refcount
more than once. It is better to leave this task alone.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cdrom: reduce stack usage of mmc_ioctl_dvd_read_struct

1. kmalloc 192 bytes in dvd_read_bca (which is inlined into dvd_read_struct)
2. Pass struct packet_command to all dvd_read_* functions.

Checkstack output:
Before: mmc_ioctl_dvd_read_struct: 280
After: mmc_ioctl_dvd_read_struct: 56

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cdrom: split mmc_ioctl to lower stack usage

Checkstack output:

Before:
mmc_ioctl:                  584

After:
mmc_ioctl_dvd_read_struct:  280
mmc_ioctl_cdrom_subchannel: 152
mmc_ioctl_cdrom_read_data:  120
mmc_ioctl_cdrom_volume:     104
mmc_ioctl_cdrom_read_audio: 104
(mmc_ioctl is inlined into cdrom_ioctl - 104 bytes)

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

scsi-ioctl: use clock_t <> jiffies

Convert the timeout ioctl scalling to use the clock_t functions
which are much more accurate with some USER_HZ vs HZ combinations.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: leave the request timeout timer running even on an empty list

For sync IO, we'll often do them serialized. This means we'll be touching
the queue timer for every IO, as opposed to only occasionally like we
do for queued IO. Instead of deleting the timer when the last request
is removed, just let continue running. If a new request comes up soon
we then don't have to readd the timer again. If no new requests arrive,
the timer will expire without side effect later.

This improves high iops sync IO by ~1%.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add comment in blk_rq_timed_out() about why next can not be 0

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: optimizations in blk_rq_timed_out_timer()

Now the rq->deadline can't be zero if the request is in the
timeout_list, so there is no need to have next_set. There is no need to
access a request's deadline field if blk_rq_timed_out is called on it.

Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

xen-blkfront: set queue paravirt flag

Xen's blkfront sets noop as the default I/O scheduler at initialization
time to avoid elevator overheads such as idling, but with the advent of
basic disk profiling capabilities this is not necessary anymore. We
should just tell the block layer that we are a paravirt front-end driver
and the elevator will automatically make the necessary adjustments.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>