pilppa.com Git - linux-2.6-omap-h63xx.git/commit

[PATCH] files: fix rcu initializers

First of a number of files_lock scalability patches.

Here are the x86 numbers -

tiobench on a 4(8)-way (HT) P4 system on ramdisk :

                                         (lock-free)
Test            2.6.10-vanilla  Stdev   2.6.10-fd       Stdev
-------------------------------------------------------------
Seqread         1400.8          11.52   1465.4          34.27
Randread        1594            8.86    2397.2          29.21
Seqwrite        242.72          3.47    238.46          6.53
Randwrite       445.74          9.15    446.4           9.75

The random-read improvement is very significant (roughly 50%).
We are getting killed by the cacheline bouncing of the files_struct
lock here. Writes on ramdisk (ext2) seem to vary too much to yield
any meaningful numbers.

Also, with Tridge's thread_perf test on a 4(8)-way (HT) P4 Xeon system :

2.6.12-rc5-vanilla :

Running test 'readwrite' with 8 tasks
Threads     0.34 +/- 0.01 seconds
Processes   0.16 +/- 0.00 seconds

2.6.12-rc5-fd :

Running test 'readwrite' with 8 tasks
Threads     0.17 +/- 0.02 seconds
Processes   0.17 +/- 0.02 seconds

I repeated the measurements on ramfs (as opposed to ext2 on ramdisk in
the earlier measurement) and I got more consistent results from tiobench :

4(8)-way Xeon P4
-----------------
                                         (lock-free)
Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
-------------------------------------------------------------
Seqread         1282            18.59   1343.6          26.37
Randread        1517            7       2415            34.27
Seqwrite        702.2           5.27    709.46           5.9
Randwrite       846.86          15.15   919.68          21.4

4-way ppc64
------------
                                         (lock-free)
Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
-------------------------------------------------------------
Seqread         1549            91.16   1569.6          47.2
Randread        1473.6          25.11   1585.4          69.99
Seqwrite        1096.8          20.03   1136            29.61
Randwrite       1189.6           4.04   1275.2          32.96

Also running Tridge's thread_perf test on ppc64 :

2.6.12-rc5-vanilla
--------------------
Running test 'readwrite' with 4 tasks
Threads     0.20 +/- 0.02 seconds
Processes   0.16 +/- 0.01 seconds

2.6.12-rc5-fd
--------------------
Running test 'readwrite' with 4 tasks
Threads     0.18 +/- 0.04 seconds
Processes   0.16 +/- 0.01 seconds

The benefits are huge (up to ~60%) in some cases on x86, primarily
due to the atomic operations during acquisition of ->file_lock
and cache line bouncing in the fast path. The ppc64 benefits are modest
due to LL/SC-based locking, but still statistically significant.
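
As a rough check of the "up to ~60%" figure, the relative gain can be
recomputed from the table values above. The helper name pct_gain is
hypothetical and only illustrates the arithmetic; it is not part of the
patch or of tiobench.

```c
/*
 * Relative throughput gain of the patched kernel over the baseline,
 * in percent. pct_gain is a hypothetical helper for illustration only.
 */
double pct_gain(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

With the Randread rows, pct_gain(1517, 2415) comes to about 59% for the
x86 ramfs run, and pct_gain(1473.6, 1585.4) to about 7.6% for ppc64.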

This patch:

RCU head initializer no longer needs the head variable name since we don't use
list.h lists anymore.
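
The change described above can be sketched in userspace C roughly as
follows. This is an illustration, not the patch itself: struct rcu_head
here is a stand-in for the kernel structure, and the OLD_-prefixed
macros, struct foo and rcu_head_demo are hypothetical names added for
comparison.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct rcu_head (assumed layout). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/*
 * Old style (hypothetical OLD_ prefix): the initializer takes the
 * variable name, a leftover from when a list.h list head had to be
 * initialized to point back at itself. The argument is unused.
 */
#define OLD_RCU_HEAD_INIT(head) { .next = NULL, .func = NULL }
#define OLD_RCU_HEAD(head) struct rcu_head head = OLD_RCU_HEAD_INIT(head)

/* New style: the initializer takes no argument. */
#define RCU_HEAD_INIT { .next = NULL, .func = NULL }
#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT

/* Hypothetical structure embedding an rcu_head. */
struct foo {
	int data;
	struct rcu_head rcu;
};

/* Returns 1 if both styles produce an identically zeroed head. */
int rcu_head_demo(void)
{
	OLD_RCU_HEAD(a);
	RCU_HEAD(b);
	struct foo f = { .data = 42, .rcu = RCU_HEAD_INIT };

	return a.next == NULL && a.func == NULL &&
	       b.next == NULL && b.func == NULL &&
	       f.rcu.next == NULL && f.rcu.func == NULL &&
	       f.data == 42;
}
```

Dropping the argument makes the initializer usable inside another
structure's initializer without having to name the enclosing variable.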

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

author	Dipankar Sarma <dipankar@in.ibm.com>
	Fri, 9 Sep 2005 20:04:07 +0000 (13:04 -0700)
committer	Linus Torvalds <torvalds@g5.osdl.org>
	Fri, 9 Sep 2005 20:57:54 +0000 (13:57 -0700)
commit	8b6490e5faafb3a16ea45654fb55f9ff086f1495
tree	7af6f19fb36afe14a3405a4a656c29ad7ce251eb
parent	0f97a931b337e4662e736ca67f1fab0a187f5852