The higher the number of processors the large the order sizes used for various
slab caches will become. This has been shown to address the performance issues
in hackbench on 16p etc.
The calculation is only performed if slub_min_objects is zero (default). If one
specifies a slub_min_objects on boot then that setting is taken.
As suggested by Zhang Yanmin's performance tests on 16-core Tigerton, use the
formula '4 * (fls(nr_cpu_ids) + 1)':