I did some benchmark runs (lmbench and kernel compile) with a number of
guests running in parallel to compare the performance of numa=on versus
numa=off. As soon as the machine is put under load, performance drops
in the numa=off case. The tests were done on an 8-node machine (4 cores
per node). lmbench (which essentially copies large amounts of memory)
shows a dramatic drop, and I noticed a significant performance decrease
even for a tmpfs-based Linux kernel compile. Here is a summary of the
data:
lmbench's rd benchmark (results normalized to native Linux = 100):

guests         numa=off                 numa=on          avg increase
            min    avg    max       min     avg     max
   1               78.0                    102.3
   7       37.4   45.6   62.0       90.6   102.3   110.9       124.4%
  15       21.0   25.8   31.7       41.7    48.7    54.1        88.2%
  23       13.4   17.5   23.2       25.0    28.0    30.1        60.2%
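For reference, the "avg increase" column appears to be the ratio of the
two average scores (small deviations are rounding in the table values):

    avg increase = (avg_on / avg_off - 1) * 100%
                 = (102.3 / 45.6 - 1) * 100% ~= 124%   (7-guest row)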
kernel compile in tmpfs, 1 VCPU and 2 GB RAM per guest, average elapsed
time (in seconds):

guests   numa=off    numa=on   increase
   1      480.610    464.320      3.4%
   7      482.109    461.721      4.2%
  15      515.297    477.669      7.3%
  23      548.427    495.180      9.7%
again with 2 VCPUs per guest and make -j2:

guests   numa=off    numa=on   increase
   1      264.580    261.690      1.1%
   7      279.763    258.907      7.7%
  15      330.385    272.762     17.4%
  23      463.510    390.547     15.7%   (46 VCPUs on 32 physical CPUs)
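For the compile runs the "increase" column appears to be the relative
reduction in elapsed time (numbers from the 1-VCPU table above; minor
rounding differences aside):

    increase = (1 - t_on / t_off) * 100%
             = (1 - 464.320 / 480.610) * 100% ~= 3.4%   (single-guest row)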
Selected tests on a 4-node machine showed similar behavior (a 7.9%
increase with 6 parallel guests in the 2-VCPU kernel compile
benchmark).
Note that this does not affect non-NUMA machines at all: the code turns
NUMA off again if no NUMA topology is detected.
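To illustrate that fallback, here is a minimal standalone sketch; the
names below (other than numa_off) are hypothetical and only mirror the
idea, they are not the code this patch touches:

/*
 * Minimal standalone sketch (hypothetical names, not the hypervisor
 * code this patch touches): with numa=on as the new default, boot code
 * still probes the firmware topology and reverts to numa_off when no
 * multi-node topology is found, so non-NUMA machines behave as before.
 */
#include <stdio.h>

static int numa_off = 0;        /* new default set by this patch */

/* Stand-in for the real probe (e.g. ACPI SRAT parsing). */
static int detected_node_count(void)
{
    return 1;                   /* pretend only one node was found */
}

static void numa_init(void)
{
    if (detected_node_count() < 2)
        numa_off = 1;           /* fall back to the flat, non-NUMA setup */
}

int main(void)
{
    numa_init();
    printf("numa_off = %d\n", numa_off);
    return 0;
}

The point is only that defaulting to numa=on costs nothing when the
hardware is not NUMA.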
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
-/* Default NUMA to off for now. acpi=on required to enable it. */
-int numa_off __devinitdata = 1;
+int numa_off __devinitdata = 0;
int acpi_numa __devinitdata;