Chris Mason [Mon, 10 Nov 2008 12:31:30 +0000 (07:31 -0500)]
Btrfs: Make sure pages are dirty before doing delalloc for them
This adds a PageDirty check to the writeback path that locks pages
for delalloc. If a page wasn't dirty at this point, it is in the
process of being truncated away.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 10 Nov 2008 12:26:33 +0000 (07:26 -0500)]
Btrfs: Don't substract too much from the allocation target (avoid wrapping)
When metadata allocation clustering has to fall back to unclustered
allocs because large free areas could not be found, it was sometimes
substracting too much from the total bytes to allocate. This would
make it wrap below zero.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 7 Nov 2008 23:22:45 +0000 (18:22 -0500)]
Btrfs: Avoid unplug storms during commit
While doing a commit, btrfs makes sure all the metadata blocks
were properly written to disk, calling wait_on_page_writeback for
each page. This writeback happens after allowing another transaction
to start, so it competes for the disk with other processes in the FS.
If the page writeback bit is still set, each wait_on_page_writeback might
trigger an unplug, even though the page might be waiting for checksumming
to finish or might be waiting for the async work queue to submit the
bio.
This trades wait_on_page_writeback for waiting on the extent writeback
bits. It won't trigger any unplugs and substantially improves performance
in a number of workloads.
This also changes the async bio submission to avoid requeueing if there
is only one device. The requeue just wastes CPU time because there are
no other devices to service.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 7 Nov 2008 17:35:44 +0000 (12:35 -0500)]
Btrfs: make sure compressed bios don't complete too soon
When writing a compressed extent, a number of bios are created that
point to a single struct compressed_bio. At end_io time an atomic counter in
the compressed_bio struct makes sure that all of the bios have finished
before final end_io processing is done.
But when multiple bios are needed to write a compressed extent, the
counter was being incremented after the first bio was sent to submit_bio.
It is possible the bio will complete before the counter is incremented,
making the end_io handler free the compressed_bio struct before
processing is finished.
The fix is to increment the atomic counter before bio submission,
both for compressed reads and writes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 7 Nov 2008 03:02:51 +0000 (22:02 -0500)]
Btrfs: Optimize compressed writeback and reads
When reading compressed extents, try to put pages into the page cache
for any pages covered by the compressed extent that readpages didn't already
preload.
Add an async work queue to handle transformations at delayed allocation processing
time. Right now this is just compression. The workflow is:
1) Find offsets in the file marked for delayed allocation
2) Lock the pages
3) Lock the state bits
4) Call the async delalloc code
The async delalloc code clears the state lock bits and delalloc bits. It is
important this happens before the range goes into the work queue because
otherwise it might deadlock with other work queue items that try to lock
those extent bits.
The file pages are compressed, and if the compression doesn't work the
pages are written back directly.
An ordered work queue is used to make sure the inodes are written in the same
order that pdflush or writepages sent them down.
This changes extent_write_cache_pages to let the writepage function
update the wbc nr_written count.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 7 Nov 2008 03:03:00 +0000 (22:03 -0500)]
Btrfs: Add ordered async work queues
Btrfs uses kernel threads to create async work queues for cpu intensive
operations such as checksumming and decompression. These work well,
but they make it difficult to keep IO order intact.
A single writepages call from pdflush or fsync will turn into a number
of bios, and each bio is checksummed in parallel. Once the checksum is
computed, the bio is sent down to the disk, and since we don't control
the order in which the parallel operations happen, they might go down to
the disk in almost any order.
The code deals with this somewhat by having deep work queues for a single
kernel thread, making it very likely that a single thread will process all
the bios for a single inode.
This patch introduces an explicitly ordered work queue. As work structs
are placed into the queue they are put onto the tail of a list. They have
three callbacks:
->func (cpu intensive processing here)
->ordered_func (order sensitive processing here)
->ordered_free (free the work struct, all processing is done)
The work struct has three callbacks. The func callback does the cpu intensive
work, and when it completes the work struct is marked as done.
Every time a work struct completes, the list is checked to see if the head
is marked as done. If so the ordered_func callback is used to do the
order sensitive processing and the ordered_free callback is used to do
any cleanup. Then we loop back and check the head of the list again.
This patch also changes the checksumming code to use the ordered workqueues.
One a 4 drive array, it increases streaming writes from 280MB/s to 350MB/s.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 31 Oct 2008 16:46:39 +0000 (12:46 -0400)]
Btrfs: Compression corner fixes
Make sure we keep page->mapping NULL on the pages we're getting
via alloc_page. It gets set so a few of the callbacks can do the right
thing, but in general these pages don't have a mapping.
Don't try to truncate compressed inline items in btrfs_drop_extents.
The whole compressed item must be preserved.
Don't try to create multipage inline compressed items. When we try to
overwrite just the first page of the file, we would have to read in and recow
all the pages after it in the same compressed inline items. For now, only
create single page inline items.
Make sure we lock pages in the correct order during delalloc. The
search into the state tree for delalloc bytes can return bytes before
the page we already have locked.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan Zheng [Thu, 30 Oct 2008 18:25:28 +0000 (14:25 -0400)]
Btrfs: Add fallocate support v2
This patch updates btrfs-progs for fallocate support.
fallocate is a little different in Btrfs because we need to tell the
COW system that a given preallocated extent doesn't need to be
cow'd as long as there are no snapshots of it. This leverages the
-o nodatacow checks.
Yan Zheng [Thu, 30 Oct 2008 18:20:02 +0000 (14:20 -0400)]
Btrfs: update nodatacow code v2
This patch simplifies the nodatacow checker. If all references
were created after the latest snapshot, then we can avoid COW
safely. This patch also updates run_delalloc_nocow to do more
fine-grained checking.
Yan Zheng [Thu, 30 Oct 2008 18:19:50 +0000 (14:19 -0400)]
Btrfs: Fix bookend extent race v2
When dropping middle part of an extent, btrfs_drop_extents truncates
the extent at first, then inserts a bookend extent.
Since truncation and insertion can't be done atomically, there is a small
period that the bookend extent isn't in the tree. This causes problem for
functions that search the tree for file extent item. The way to fix this is
lock the range of the bookend extent before truncation.
Yan Zheng [Thu, 30 Oct 2008 18:19:41 +0000 (14:19 -0400)]
Btrfs: update hole handling v2
This patch splits the hole insertion code out of btrfs_setattr
into btrfs_cont_expand and updates btrfs_get_extent to properly
handle the case that file extent items are not continuous.
Chris Mason [Thu, 30 Oct 2008 15:23:27 +0000 (11:23 -0400)]
Btrfs: prevent looping forever in finish_current_insert and del_pending_extents
finish_current_insert and del_pending_extents process extent tree modifications
that build up while we are changing the extent tree. It is a confusing
bit of code that prevents recursion.
Both functions run through a list of pending operations and both funcs
add to the list of pending operations. If you have two procs in either
one of them, they can end up looping forever making more work for each other.
This patch makes them walk forward through the list of pending changes instead
of always trying to process the entire list. At transaction commit
time, we catch any changes that were left over.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan Zheng [Wed, 29 Oct 2008 18:49:05 +0000 (14:49 -0400)]
Btrfs: Add root tree pointer transaction ids
This patch adds transaction IDs to root tree pointers.
Transaction IDs in tree pointers are compared with the
generation numbers in block headers when reading root
blocks of trees. This can detect some types of IO errors.
Josef Bacik [Wed, 29 Oct 2008 18:49:05 +0000 (14:49 -0400)]
Btrfs: nuke fs wide allocation mutex V2
This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch
of little locks.
There is now a pinned_mutex, which is used when messing with the pinned_extents
extent io tree, and the extent_ins_mutex which is used with the pending_del and
extent_ins extent io trees.
The locking for the extent tree stuff was inspired by a patch that Yan Zheng
wrote to fix a race condition, I cleaned it up some and changed the locking
around a little bit, but the idea remains the same. Basically instead of
holding the extent_ins_mutex throughout the processing of an extent on the
extent_ins or pending_del trees, we just hold it while we're searching and when
we clear the bits on those trees, and lock the extent for the duration of the
operations on the extent.
Also to keep from getting hung up waiting to lock an extent, I've added a
try_lock_extent so if we cannot lock the extent, move on to the next one in the
tree and we'll come back to that one. I have tested this heavily and it does
not appear to break anything. This has to be applied on top of my
find_free_extent redo patch.
I tested this patch on top of Yan's space reblancing code and it worked fine.
The only thing that has changed since the last version is I pulled out all my
debugging stuff, apparently I forgot to run guilt refresh before I sent the
last patch out. Thank you,
Josef Bacik [Wed, 29 Oct 2008 18:49:05 +0000 (14:49 -0400)]
Btrfs: fix enospc when there is plenty of space
So there is an odd case where we can possibly return -ENOSPC when there is in
fact space to be had. It only happens with Metadata writes, and happens _very_
infrequently. What has to happen is we have to allocate have allocated out of
the first logical byte on the disk, which would set last_alloc to
first_logical_byte(root, 0), so search_start == orig_search_start. We then
need to allocate for normal metadata, so BTRFS_BLOCK_GROUP_METADATA |
BTRFS_BLOCK_GROUP_DUP. We will do a block lookup for the given search_start,
block_group_bits() won't match and we'll go to choose another block group.
However because search_start matches orig_search_start we go to see if we can
allocate a chunk.
If we are in the situation that we cannot allocate a chunk, we fail and ENOSPC.
This is kind of a big flaw of the way find_free_extent works, as it along with
find_free_space loop through _all_ of the block groups, not just the ones that
we want to allocate out of. This patch completely kills find_free_space and
rolls it into find_free_extent. I've introduced a sort of state machine into
this, which will make it easier to get cache miss information out of the
allocator, and will work well with my locking changes.
The basic flow is this: We have the variable loop which is 0, meaning we are
in the hint phase. We lookup the block group for the hint, and lookup the
space_info for what we want to allocate out of. If the block group we were
pointed at by the hint either isn't of the correct type, or just doesn't have
the space we need, we set head to space_info->block_groups, so we start at the
beginning of the block groups for this particular space info, and loop through.
This is also where we add the empty_cluster to total_needed. At this point
loop is set to 1 and we just loop through all of the block groups for this
particular space_info looking for the space we need, just as find_free_space
would have done, except we only hit the block groups we want and not _all_ of
the block groups. If we come full circle we see if we can allocate a chunk.
If we cannot of course we exit with -ENOSPC and we are good. If not we start
over at space_info->block_groups and loop through again, with loop == 2. If we
come full circle and haven't found what we need then we exit with -ENOSPC.
I've been running this for a couple of days now and it seems stable, and I
haven't yet hit a -ENOSPC when there was plenty of space left.
Also I've made a groups_sem to handle the group list for the space_info. This
is part of my locking changes, but is relatively safe and seems better than
holding the space_info spinlock over that entire search time. Thanks,
Yan Zheng [Wed, 29 Oct 2008 18:49:05 +0000 (14:49 -0400)]
Btrfs: Improve space balancing code
This patch improves the space balancing code to keep more sharing
of tree blocks. The only case that breaks sharing of tree blocks is
data extents get fragmented during balancing. The main changes in
this patch are:
Add a 'drop sub-tree' function. This solves the problem in old code
that BTRFS_HEADER_FLAG_WRITTEN check breaks sharing of tree block.
Remove relocation mapping tree. Relocation mappings are stored in
struct btrfs_ref_path and updated dynamically during walking up/down
the reference path. This reduces CPU usage and simplifies code.
This patch also fixes a bug. Root items for reloc trees should be
updated in btrfs_free_reloc_root.
Chris Mason [Wed, 29 Oct 2008 18:49:59 +0000 (14:49 -0400)]
Btrfs: Add zlib compression support
This is a large change for adding compression on reading and writing,
both for inline and regular extents. It does some fairly large
surgery to the writeback paths.
Compression is off by default and enabled by mount -o compress. Even
when the -o compress mount option is not used, it is possible to read
compressed extents off the disk.
If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later.
* While finding delalloc extents, the pages are locked before being sent down
to the delalloc handler. This allows the delalloc handler to do complex things
such as cleaning the pages, marking them writeback and starting IO on their
behalf.
* Inline extents are inserted at delalloc time now. This allows us to compress
the data before inserting the inline extent, and it allows us to insert
an inline extent that spans multiple pages.
* All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
are changed to record both an in-memory size and an on disk size, as well
as a flag for compression.
From a disk format point of view, the extent pointers in the file are changed
to record the on disk size of a given extent and some encoding flags.
Space in the disk format is allocated for compression encoding, as well
as encryption and a generic 'other' field. Neither the encryption or the
'other' field are currently used.
In order to limit the amount of data read for a single random read in the
file, the size of a compressed extent is limited to 128k. This is a
software only limit, the disk format supports u64 sized compressed extents.
In order to limit the ram consumed while processing extents, the uncompressed
size of a compressed extent is limited to 256k. This is a software only limit
and will be subject to tuning later.
Checksumming is still done on compressed extents, and it is done on the
uncompressed version of the data. This way additional encodings can be
layered on without having to figure out which encoding to checksum.
Compression happens at delalloc time, which is basically singled threaded because
it is usually done by a single pdflush thread. This makes it tricky to
spread the compression load across all the cpus on the box. We'll have to
look at parallel pdflush walks of dirty inodes at a later time.
Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Fri, 10 Oct 2008 14:24:32 +0000 (10:24 -0400)]
Btrfs: make tree_search_offset more flexible in its searching
Sometimes we end up freeing a reserved extent because we don't need it, however
this means that its possible for transaction->last_alloc to point to the middle
of a free area.
When we search for free space in find_free_space we do a tree_search_offset
with contains set to 0, because we want it to find the next best free area if
we do not have an offset starting on the given offset.
Unfortunately that currently means that if the offset we were given as a hint
points to the middle of a free area, we won't find anything. This is especially
bad if we happened to last allocate from the big huge chunk of a newly formed
block group, since we won't find anything and have to go back and search the
long way around.
This fixes this problem by making it so that we return the free space area
regardless of the contains variable. This made cache missing happen _alot_
less, and speeds things up considerably.
Linus Torvalds [Thu, 9 Oct 2008 21:04:54 +0000 (14:04 -0700)]
Don't allow splice() to files opened with O_APPEND
This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.
It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example. So
we could make up any semantics we want, including the old ones.
But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).
So disallow O_APPEND entirely for now. I doubt anybody cares, and this
way we have one less gray area to worry about.
Matt Mackall [Wed, 8 Oct 2008 19:51:57 +0000 (14:51 -0500)]
SLOB: fix bogus ksize calculation fix
This fixes the previous fix, which was completely wrong on closer
inspection. This version has been manually tested with a user-space
test harness and generates sane values. A nearly identical patch has
been boot-tested.
The problem arose from changing how kmalloc/kfree handled alignment
padding without updating ksize to match. This brings it in sync.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Creating a subvolume is in many ways like a normal VFS ->mkdir, and we
really need to play with the VFS topology locking rules. So instead of
just creating the snapshot on disk and then later getting rid of
confliting aliases do it correctly from the start. This will become
especially important once we allow for subvolumes anywhere in the tree,
and not just below a hidden root.
Note that snapshots will need the same treatment, but do to the delay
in creating them we can't do it currently. Chris promised to fix that
issue, so I'll wait on that.
Yan Zheng [Thu, 9 Oct 2008 15:46:19 +0000 (11:46 -0400)]
Btrfs: Fix leaf reference cache miss
Due to the optimization for truncate, tree leaves only containing
checksum items can be deleted without being COW'ed first. This causes
reference cache misses. The way to fix the miss is create cache
entries for tree leaves only contain checksum.
This patch also fixes a -EEXIST issue in shared reference cache.
Yan Zheng [Thu, 9 Oct 2008 15:46:24 +0000 (11:46 -0400)]
Btrfs: Remove offset field from struct btrfs_extent_ref
The offset field in struct btrfs_extent_ref records the position
inside file that file extent is referenced by. In the new back
reference system, tree leaves holding references to file extent
are recorded explicitly. We can scan these tree leaves very quickly, so the
offset field is not required.
This patch also makes the back reference system check the objectid
when extents are in deleting.
hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X
Enable driver checking of the DMI product name (when enabled) on
an Abit AT8 32X, instead of falling back to a manual probe. This
eliminates false negatives and eventually will help avoid
unnecessary bus probes on unsupported mainboards.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Tested-by: Daniel Exner <dex@dragonslave.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X
The table for the Abit AT8 32X was incorrectly missing an entry
for the sixth ("AUX3") fan. Add this entry, exporting the fan
reading to userspace.
Closes lm-sensors.org ticket #2339.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Tested-by: Daniel Exner <dex@dragonslave.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 9 Oct 2008 13:33:58 +0000 (15:33 +0200)]
hwmon: (it87) Prevent power-off on Shuttle SN68PT
On the Shuttle SN68PT, FAN_CTL2 is apparently not connected to a fan,
but to something else. One user has reported instant system power-off
when changing the PWM2 duty cycle, so we disable it.
I use the board name string as the trigger in case the same board is
ever used in other systems.
This closes lm-sensors ticket #2349:
pwmconfig causes a hard poweroff
http://www.lm-sensors.org/ticket/2349
Corentin Chary [Thu, 9 Oct 2008 13:33:57 +0000 (15:33 +0200)]
eeepc-laptop: Fix hwmon interface
Creates a name file in the sysfs directory, that
is needed for the libsensors library to work.
Also rename fan1_pwm to pwm1 and scale its value as needed.
This fixes bug #11520:
http://bugzilla.kernel.org/show_bug.cgi?id=11520
Signed-off-by: Corentin Chary <corentincj@iksaif.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.
net: Fix netdev_run_todo dead-lock
tcp: Fix possible double-ack w/ user dma
net: only invoke dev->change_rx_flags when device is UP
netrom: Fix sock_orphan() use in nr_release
ax25: Quick fix for making sure unaccepted sockets get destroyed.
Revert "ax25: Fix std timer socket destroy handling."
[Bluetooth] Add reset quirk for A-Link BlueUSB21 dongle
[Bluetooth] Add reset quirk for new Targus and Belkin dongles
[Bluetooth] Fix double frees on error paths of btusb and bpa10x drivers
tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.
Because of rounding, in certain conditions, i.e. when in congestion
avoidance state rho is smaller than 1/128 of the current cwnd, TCP
Hybla congestion control starves and the cwnd is kept constant
forever.
This patch forces an increment by one segment after #send_cwnd calls
without increments(newreno behavior).
Signed-off-by: Daniele Lacamera <root@danielinux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 7 Oct 2008 22:50:03 +0000 (15:50 -0700)]
net: Fix netdev_run_todo dead-lock
Benjamin Thery tracked down a bug that explains many instances
of the error
unregister_netdevice: waiting for %s to become free. Usage count = %d
It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.
The problem is really quite silly. We were trying to create
parallelism where none was required. As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.
There is no need for a second mutex or spinlock.
This is exactly what the following patch does.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Ali Saidi [Tue, 7 Oct 2008 22:31:19 +0000 (15:31 -0700)]
tcp: Fix possible double-ack w/ user dma
From: Ali Saidi <saidi@engin.umich.edu>
When TCP receive copy offload is enabled it's possible that
tcp_rcv_established() will cause two acks to be sent for a single
packet. In the case that a tcp_dma_early_copy() is successful,
copied_early is set to true which causes tcp_cleanup_rbuf() to be
called early which can send an ack. Further along in
tcp_rcv_established(), __tcp_ack_snd_check() is called and will
schedule a delayed ACK. If no packets are processed before the delayed
ack timer expires the packet will be acked twice.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 7 Oct 2008 22:26:48 +0000 (15:26 -0700)]
net: only invoke dev->change_rx_flags when device is UP
Jesper Dangaard Brouer <hawk@comx.dk> reported a bug when setting a VLAN
device down that is in promiscous mode:
When the VLAN device is set down, the promiscous count on the real
device is decremented by one by vlan_dev_stop(). When removing the
promiscous flag from the VLAN device afterwards, the promiscous
count on the real device is decremented a second time by the
vlan_change_rx_flags() callback.
The root cause for this is that the ->change_rx_flags() callback is
invoked while the device is down. The synchronization is meant to mirror
the behaviour of the ->set_rx_mode callbacks, meaning the ->open function
is responsible for doing a full sync on open, the ->close() function is
responsible for doing full cleanup on ->stop() and ->change_rx_flags()
is meant to do incremental changes while the device is UP.
Only invoke ->change_rx_flags() while the device is UP to provide the
intended behaviour.
Tested-by: Jesper Dangaard Brouer <jdb@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Mackall [Tue, 7 Oct 2008 16:37:35 +0000 (11:37 -0500)]
SLOB: fix bogus ksize calculation
SLOB's ksize calculation was braindamaged and generally harmlessly
underreported the allocation size. But for very small buffers, it could
in fact overreport them, leading code depending on krealloc to overrun
the allocation and trample other data.
Signed-off-by: Matt Mackall <mpm@selenic.com> Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Theodore Ts'o (tytso@mit.edu) wrote:
>
> I've been playing with adding some markers into ext4 to see if they
> could be useful in solving some problems along with Systemtap. It
> appears, though, that as of 2.6.27-rc8, markers defined in code which is
> compiled directly into the kernel (i.e., not as modules) don't show up
> in Module.markers:
>
> kvm_trace_entryexit arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_entryexit arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
>
> (Note the lack of any of the kernel_sched_* markers, and the markers I
> added for ext4_* and jbd2_* are missing as wel.)
>
> Systemtap apparently depends on in-kernel trace_mark being recorded in
> Module.markers, and apparently it's been claimed that it used to be
> there. Is this a bug in systemtap, or in how Module.markers is getting
> built? And is there a file that contains the equivalent information
> for markers located in non-modules code?
Linus Torvalds [Mon, 6 Oct 2008 21:29:16 +0000 (14:29 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: gart iommu have direct mapping when agp is present too
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
ide: workaround for bogus gcc warning in ide_sysfs_register_port()
ide-cd: Optiarc DVD RW AD-7200A does play audio
IDE: Fix platform device registration in Swarm IDE driver (v2)
ide-dma: fix ide_build_dmatable() for TRM290
ide-cd: temporary tray close fix
Linus Torvalds [Mon, 6 Oct 2008 21:27:39 +0000 (14:27 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] IP27: Fix build errors if CONFIG_MAPPED_KERNEL=y
[MIPS] Fix CMP Kconfig configuration and mark as broken.
Linus Torvalds [Mon, 6 Oct 2008 21:27:15 +0000 (14:27 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (33 commits)
V4L/DVB (9103): em28xx: HVR-900 B3C0 - fix audio clicking issue
V4L/DVB (9099): em28xx: Add detection for K-WORLD DVB-T 310U
V4L/DVB (9092): gspca: Bad init values for sonixj ov7660.
V4L/DVB (9080): gspca: Add a delay after writing to the sonixj sensors.
V4L/DVB (9075): gspca: Bad check of returned status in i2c_read() spca561.
V4L/DVB (9053): fix buffer overflow in uvc-video
V4L/DVB (9043): S5H1420: Fix size of shadow-array to avoid overflow
V4L/DVB (9037): Fix support for Hauppauge Nova-S SE
V4L/DVB (9029): Fix deadlock in demux code
V4L/DVB (8979): sms1xxx: Add new USB product ID for Hauppauge WinTV MiniStick
V4L/DVB (8978): sms1xxx: fix product name for Hauppauge WinTV MiniStick
V4L/DVB (8967): Use correct XC3028L firmware for AMD ATI TV Wonder 600
V4L/DVB (8963): s2255drv field count fix
V4L/DVB (8961): zr36067: Fix RGBR pixel format
V4L/DVB (8960): drivers/media/video/cafe_ccic.c needs mm.h
V4L/DVB (8958): zr36067: Return proper bytes-per-line value
V4L/DVB (8957): zr36067: Restore the default pixel format
V4L/DVB (8955): bttv: Prevent NULL pointer dereference in radio_open
V4L/DVB (8935): em28xx-cards: Remove duplicate entry (EM2800_BOARD_KWORLD_USB2800)
V4L/DVB (8933): gspca: Disable light frquency for zc3xx cs2102 Kokom.
...
atmel-mci: Initialize BLKR before sending data transfer command
The atmel-mci driver sometimes fails data transfers like this:
mmcblk0: error -5 transferring data
end_request: I/O error, dev mmcblk0, sector 2749769
end_request: I/O error, dev mmcblk0, sector 2749777
It turns out that this might be caused by the BLKR register (which
contains the block size and the number of blocks being transfered) being
initialized too late. This patch moves the initialization of BLKR so
that it contains the correct value before the block transfer command is
sent.
This error is difficult to reproduce, but if you insert a long delay
(mdelay(10) or thereabouts) between the calls to atmci_start_command()
and atmci_submit_data(), all transfers seem to fail without this patch,
while I haven't seen any failures with this patch.
Jarek Poplawski [Mon, 6 Oct 2008 19:54:57 +0000 (12:54 -0700)]
netrom: Fix sock_orphan() use in nr_release
While debugging another bug it was found that NetRom socks
are sometimes seen unorphaned in sk_free(). This patch moves
sock_orphan() in nr_release() to the beginning (like in ax25,
or rose).
Reported-and-tested-by: Bernard Pidoux f6bvp <f6bvp@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2008 19:53:50 +0000 (12:53 -0700)]
ax25: Quick fix for making sure unaccepted sockets get destroyed.
Since we reverted 30902dc3cb0ea1cfc7ac2b17bcf478ff98420d74 ("ax25: Fix
std timer socket destroy handling.") we have to put some kind of fix
in to cure the issue whereby unaccepted connections do not get destroyed.
The approach used here is from Tihomir Heidelberg - 9a4gl
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wessel [Mon, 6 Oct 2008 18:50:59 +0000 (13:50 -0500)]
kgdb: call touch_softlockup_watchdog on resume
The softlockup watchdog needs to be touched when resuming the from the
kgdb stopped state to avoid the printk that a CPU is stuck if the
debugger was active for longer than the softlockup threshold.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jan Kiszka [Mon, 6 Oct 2008 18:50:59 +0000 (13:50 -0500)]
kgdb, x86: Avoid invoking kgdb_nmicallback twice per NMI
Stress-testing KVM's latest NMI support with kgdbts inside an SMP guest,
I came across spurious unhandled NMIs while running the singlestep test.
Looking closer at the code path each NMI takes when KGDB is enabled, I
noticed that kgdb_nmicallback is called twice per event: One time via
DIE_NMI_IPI notification, the second time on DIE_NMI. Removing the first
invocation cures the unhandled NMIs here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
x86 ACPI: Blacklist two HP machines with buggy BIOSes
There is a bug in the BIOSes of some HP boxes with AMD Turions which
connects IO-APIC pins with ACPI thermal trip points in such a way that
if the state of the IO-APIC is not as expected by the (buggy) BIOS, the
thermal trip points are set to insanely low values (usually all of them
become 16 degrees Celsius). As a result, thermal throttling kicks in
and knock the system down to its shoes.
Unfortunately some of the recent IO-APIC changes made the bug show up.
To prevent this from happening, blacklist machines that are known to be
affected (nx6115 and 6715b in this particular case).
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=11516 listed as
a regression from 2.6.26.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
and the whole story is described in this (huge) thread:
As far as the Dmitry's and Jason's boxes are concerned, I recognized the
symptoms and asked them to verify that the blacklisting helped.
It appears that the buggy BIOS code has been copy-pasted to the entire
range of machines, for no good reason.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Jason Vas Dias <jason.vas.dias@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcel Holtmann [Mon, 6 Oct 2008 10:22:52 +0000 (12:22 +0200)]
[Bluetooth] Add reset quirk for A-Link BlueUSB21 dongle
The new A-Link Bluetooth dongle is another one based on the BCM2046 chip
from Broadcom and it also needs to send HCI_Reset before it becomes fully
operational. Without the quirk it will show a lot of I/O errors.
Marcel Holtmann [Mon, 6 Oct 2008 10:22:51 +0000 (12:22 +0200)]
[Bluetooth] Add reset quirk for new Targus and Belkin dongles
Targus and Belkin have come out with new Bluetooth 2.1 capable dongles
using the latest BCM2046 chip from Broadcom. Both of them are so called
HID proxy dongles and they need to send HCI_Reset before they become
fully operational.
Marcel Holtmann [Mon, 6 Oct 2008 10:22:51 +0000 (12:22 +0200)]
[Bluetooth] Fix double frees on error paths of btusb and bpa10x drivers
The transfer buffer of an URB will be automatically freed when using
the URB_FREE_BUFFER transfer_flag. So the extra calls to kfree() will
cause a double free.
Ralf Baechle [Sun, 5 Oct 2008 16:23:28 +0000 (18:23 +0200)]
IDE: Fix platform device registration in Swarm IDE driver (v2)
The Swarm IDE driver uses a release method which is defined in the driver
itself thus potentially oopsable. The simple fix would be to just leak
the device but this patch goes the full length and moves the entire
handling of the platform device in the platform code and retains only
the platform driver code in drivers/ide/mips/swarm.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
[bart: remove no longer needed BLK_DEV_IDE_SWARM from ide/Kconfig] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Apparently, 'xcount' being 0 does not mean 0 bytes for TRM290; it means 4 bytes,
judging from the code immediately preceding this check. So, we must never try
to "split" the PRD for TRM290.
This is probably never hit anyway -- with the DMA buffers aligned to at least
512 bytes and ATAPI DMA not being used for non block I/O commands...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Borislav Petkov [Sun, 5 Oct 2008 16:23:27 +0000 (18:23 +0200)]
ide-cd: temporary tray close fix
This one fixes http://bugzilla.kernel.org/show_bug.cgi?id=11602.
A more generic fix for drives which cannot autoclose tray will follow.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com>
[bart: add an extra parentheses for consistency with the rest of kernel code] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
V4L/DVB (9099): em28xx: Add detection for K-WORLD DVB-T 310U
Correct firmware type to MTS
Correct audio routing for composite/s-video
Add DVB-T detection.
This patch uses the eeprom hash method for detection as the vendor/product
ids are also used for the DIGIVOX_AD. This may be a clone of the same
product. Explanatory text has been added prior to the hask look-up in
anticipation that it may help others.
The following has been tested to work:
Analogue TV (PAL-I)
Composite In
DVB-T (UK Crystal Palace)
USB AUDIO
The following has not been tested but probably works:
S-Video In
Ralph Loader [Tue, 23 Sep 2008 00:06:48 +0000 (21:06 -0300)]
V4L/DVB (9053): fix buffer overflow in uvc-video
There is a buffer overflow in drivers/media/video/uvc/uvc_ctrl.c:
INFO: 0xf2c5ce08-0xf2c5ce0b. First byte 0xa1 instead of 0xcc
INFO: Allocated in uvc_query_v4l2_ctrl+0x3c/0x239 [uvcvideo] age=13 cpu=1 pid=4975
...
A fixed size 8-byte buffer is allocated, and a variable size field is read
into it; there is no particular bound on the size of the field (it is
dependent on hardware and configuration) and it can overflow [also
verified by inserting printk's.]
The patch attempts to size the buffer to the correctly.
V4L/DVB (9037): Fix support for Hauppauge Nova-S SE
Different backends have different input busses (saa7146, flexcop).
To reflect that a config-option to the s5h1420-driver was added which makes
the output mode selectable.
Furthermore the s5h1420-driver is now doing the same i2c-method as it was done
before adding support for other i2c-users.
This patch needs to go into the current release of the kernel, as this driver
is currently broken.
(Thanks to Eberhard Kaltenhaeuser for helping out to debug this issue.)
Signed-off-by: Patrick Boettcher <pb@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The functions dvb_dmxdev_section_callback, dvb_dmxdev_ts_callback,
dvb_dmx_swfilter_packet, dvb_dmx_swfilter_packets, dvb_dmx_swfilter and
dvb_dmx_swfilter_204 may be called from both interrupt and process
context. Therefore they need to be protected by spin_lock_irqsave()
instead of spin_lock().
This fixes a deadlock discovered by lockdep.
Signed-off-by: Andreas Oberritter <obi@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jean Delvare [Fri, 5 Sep 2008 13:39:27 +0000 (10:39 -0300)]
V4L/DVB (8961): zr36067: Fix RGBR pixel format
The zr36067 driver is improperly declaring pixel format RGBP twice,
once as "16-bit RGB LE" and once as "16-bit RGB BE". The latter is
actually RGBR. Fix the code to properly map both pixel formats.
drivers/media/video/cafe_ccic.c: In function 'cafe_setup_siobuf':
drivers/media/video/cafe_ccic.c:1192: error: implicit declaration of function 'PAGE_ALIGN'
drivers/media/video/cafe_ccic.c: At top level:
drivers/media/video/cafe_ccic.c:1430: error: variable 'cafe_v4l_vm_ops' has initializer but incomplete type
drivers/media/video/cafe_ccic.c:1431: error: unknown field 'open' specified in initializer
drivers/media/video/cafe_ccic.c:1431: warning: excess elements in struct initializer
drivers/media/video/cafe_ccic.c:1431: warning: (near initialization for 'cafe_v4l_vm_ops')
drivers/media/video/cafe_ccic.c:1432: error: unknown field 'close' specified in initializer
drivers/media/video/cafe_ccic.c:1433: warning: excess elements in struct initializer
drivers/media/video/cafe_ccic.c:1433: warning: (near initialization for 'cafe_v4l_vm_ops')
drivers/media/video/cafe_ccic.c: In function 'cafe_v4l_mmap':
drivers/media/video/cafe_ccic.c:1444: error: 'VM_WRITE' undeclared (first use in this function)
drivers/media/video/cafe_ccic.c:1444: error: (Each undeclared identifier is reported only once
drivers/media/video/cafe_ccic.c:1444: error: for each function it appears in.)
drivers/media/video/cafe_ccic.c:1444: error: 'VM_SHARED' undeclared (first use in this function)
drivers/media/video/cafe_ccic.c:1461: error: 'VM_DONTEXPAND' undeclared (first use in this function)
This build breakage is caused by some header file shuffle in linux-next. But
I suggest that this patch be merged ahead of linux-next to avoid bisection
breakage.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jean Delvare [Sun, 7 Sep 2008 08:56:55 +0000 (05:56 -0300)]
V4L/DVB (8958): zr36067: Return proper bytes-per-line value
The zr36067 driver should return the actual bytes-per-line value when
queried with ioctl VIDIOC_G_FMT, instead of 0. Otherwise user-space
applications can get confused.
Likewise, with ioctl VIDIOC_S_FMT, we are supposed to fill the
bytes-per-line value. And we shouldn't fail if the caller sets the
initial value to something different from 0. This is perfectly valid
for applications to pre-fill this field with the value they expect.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jean Delvare [Sun, 7 Sep 2008 08:21:34 +0000 (05:21 -0300)]
V4L/DVB (8957): zr36067: Restore the default pixel format
Restore the default pixel format to YUYV as it used to be before
kernel 2.6.23. It was accidentally changed to BGR3 by commit 603d6f2c8f9f3604f9c6c1f8903efc2df30a000f.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This happens because radio_open assumes that all present bttv devices
have a radio function. If a bttv device without radio and one with
radio are installed on the same system, and the one without radio is
registered first, then radio_open checks for the radio device number
of a bttv device that has no radio function, and this breaks. All we
have to do to fix it is to skip bttv devices without a radio function.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>