x86, bitops: select the generic bitmap search functions
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.
I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now
almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER
and has an extra inline function set_bit_string. The define currently
has no influence on the generated code, but it can be argued that
setting it on i386 is the right thing to do anyhow. The addition
of the extra inline function on i386 does not hurt either.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86, UML: remove x86-specific implementations of find_first_bit
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.
also update UML.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avoid a call to find_first_bit if the bitmap size is know at
compile time and small enough to fit in a single long integer.
Modeled after an optimization in the original x86_64-specific
code.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: generic versions of find_first_(zero_)bit, convert i386
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: merge the simple bitops and move them to bitops.h
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.
For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.
Improved comments for ffs, ffz, fls and variations.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.
On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:
In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
On x86_64, 52 locations are optimized with a minimal increase in
code size:
Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename 5392637 846592 724424 6963653 6a41c5 vmlinux
After removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename 5392358 846592 724424 6963374 6a40ae vmlinux
After this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename 5392396 846592 724424 6963412 6a40d4 vmlinux
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
x86 PAT: decouple from nonpromisc devmem
x86 PAT: tone down debugging messages
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (7751): ir-kbd-i2c: Save a temporary memory allocation in ir_probe
V4L/DVB (7750): au0828/ cleanups and fixes
V4L/DVB (7748): tuner-core: some adjustments at tuner logs, if debug enabled
V4L/DVB (7746): pvrusb2: make signed one-bit bitfields unsigned
V4L/DVB (7744): pvrusb2-dvb: add atsc/qam support for Hauppauge pvrusb2 model 751xx
V4L/DVB (7742): cx88: Add support for the DViCO FusionHDTV_7_GOLD digital modes
V4L/DVB (7741): s5h1411: Adding support for this ATSC/QAM demodulator
V4L/DVB (7740): tuner-xc2028.c dubious !x & y
V4L/DVB (7739): mt312.h: dubious one-bit signed bitfield
V4L/DVB (7735): Fix compilation for au0828
V4L/DVB (7734): em28xx: copy and paste error in em28xx_init_isoc
V4L/DVB (7733): blackbird_find_mailbox negative return ignored in blackbird_initialize_codec()
V4L/DVB (7732): vivi: fix a warning
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (61 commits)
ide: sanitize handling of IDE_HFLAG_NO_SET_MODE host flag
sis5513: fail early for unsupported chipsets
it821x: fix kzalloc() failure handling
qd65xx: use IDE_HFLAG_SINGLE host flag
qd65xx: always use ->selectproc method
ide-cd: put proc-related functions together under single ifdef
ide-cd: Replace __FUNCTION__ with __func__
IDE: Coding Style fixes to drivers/ide/ide-cd.c
IDE: Coding Style fixes to drivers/ide/pci/cy82c693.c
IDE: Coding Style fixes to drivers/ide/pci/it8213.c
IDE: Coding Style fixes to drivers/ide/ide-floppy.c
IDE: Coding Style fixes to drivers/ide/legacy/ali14xx.c
IDE: Coding Style fixes to drivers/ide/legacy/hd.c
IDE: Coding Style fixes to drivers/ide/pci/cmd640.c
IDE: Coding Style fixes to drivers/ide/pci/opti621.c
IDE: Coding Style fixes to drivers/ide/ide-pnp.c
IDE: Coding Style fixes to drivers/ide/ide-proc.c
IDE: Coding Style fixes to drivers/ide/legacy/ide-4drives.c
IDE: Coding Style fixes to drivers/ide/legacy/umc8672.c
IDE: Coding Style fixes to drivers/ide/pci/generic.c
...
Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp: convert drivers/char/agp/frontend.c to use unlocked_ioctl
agp: fix shadowed variable warning in amd-k7-agp.c
Merge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: _end is shadowing real _end, just rename it.
drm/vbl rework: rework how the drm deals with vblank.
drm: reorganise minor number handling using backported modesetting code.
drm/i915: Handle tiled buffers in vblank tasklet
drm/i965: On I965, use correct 3DSTATE_DRAWING_RECTANGLE command in vblank
drm: Remove unneeded dma sync in ATI pcigart alloc
drm: Fix mismerge of non-coherent DMA patch
Ingo Molnar [Mon, 3 Mar 2008 11:38:52 +0000 (12:38 +0100)]
x86: add optimized inlining
add CONFIG_OPTIMIZE_INLINING=y.
allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:
Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):
qd_select() checks itself whether timings should be reprogrammed so
remove superfluous qd_timing_ok() and always use ->selectproc method
(rename qd_select() to qd65xx_select() while at it).
All host drivers now either set hwif->mmio or reserve continuous
I/O resources so remove no longer needed hwif->straight8 flag
and never reached code for 'hwif->straight8 == 0' case.
Adrian Bunk [Sat, 26 Apr 2008 15:36:37 +0000 (17:36 +0200)]
remove include/linux/hdsmart.h
include/linux/hdsmart.h is not used by the kernel and should therefore
be removed.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>, Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This removes the internal ide-cd buffer and falls back to read-ahead block layer
capabilities. Thorough testing (cd burning, dvd read, raw read) gives with the
bufferless mode marginally better performance in addition to simplified code.
bufferless:
dd: reading `/dev/hdc': Input/output error
6238+0 records in
6238+0 records out 204406784 bytes (204 MB) copied, 259.891 s, 787 kB/s
real 4m21.598s
user 0m0.014s
sys 0m0.744s
with the old buffer (2.6.25-rc1):
dd: reading `/dev/hdc': Input/output error
6238+0 records in
6238+0 records out 204406784 bytes (204 MB) copied, 262.893 s, 778 kB/s
Sergei Shtylyov [Sat, 26 Apr 2008 15:36:31 +0000 (17:36 +0200)]
ide: make ide_pci_check_iomem() actually work
This function didn't actually check if a given BAR is in I/O space because of
using the bogus PCI_BASE_ADDRESS_IO_MASK (which equals ~3) to test the resource
flags instead of IORESOURCE_IO -- fix this, make ide_hwif_configure() check the
results failing if necessary, and move the printk() call to the failure path.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Jacek Luczak [Fri, 11 Apr 2008 11:29:04 +0000 (13:29 +0200)]
x86: section mismatch fixes, #3
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jacek Luczak [Fri, 11 Apr 2008 11:28:49 +0000 (13:28 +0200)]
x86: section mismatch fixes, #2
This patch fixes section mismatch warnings in smpboot_setup_io_apic().
WARNING: arch/x86/kernel/built-in.o(.text+0x11781): Section mismatch in reference from the function smpboot_setup_io_apic()
to the function .init.text:setup_IO_APIC()
The function smpboot_setup_io_apic() references
the function __init setup_IO_APIC().
This is often because smpboot_setup_io_apic lacks a __init
annotation or the annotation of setup_IO_APIC is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>