James K. Lowden
2014-08-12 00:29:13 UTC
I'm testing different algorithms for performance against a single
memory-mapped file. How can I best ensure that for each run the
page cache is cold, that the prior run left nothing for the current run
to take advantage of?
I tried calling mmap(2) on an unrelated file larger than RAM, and
iterating over it. Not only would I prefer something more
deterministic, but the following output suggests to me that technique
wasn't effective:
begin= 0x7f7f9de1e000, n= 10000000
sorted 1 of 100000000 in 2 seconds
begin= 0x7f7fa2a69408, n= 10000000
sorted 2 of 100000000 in 4 seconds
begin= 0x7f7fa76b4810, n= 10000000
sorted 3 of 100000000 in 6 seconds
begin= 0x7f7fac2ffc18, n= 10000000
sorted 4 of 100000000 in 8 seconds
begin= 0x7f7fb0f4b020, n= 10000000
sorted 5 of 100000000 in 9 seconds
begin= 0x7f7fb5b96428, n= 10000000
sorted 6 of 100000000 in 11 seconds
begin= 0x7f7fba7e1830, n= 10000000
sorted 7 of 100000000 in 13 seconds
begin= 0x7f7fbf42cc38, n= 10000000
sorted 8 of 100000000 in 14 seconds
begin= 0x7f7fc4078040, n= 10000000
sorted 9 of 100000000 in 16 seconds
begin= 0x7f7fc8cc3448, n= 10000000
sorted 10 of 100000000 in 19 seconds
As the algorithm works its way through the billion elements (100 million
at a time), N remains constant but T grows rather dramatically. I
suspect that's because of I/O (and maybe I should measure that, too).
Suggestions? What would you do?
--jkl
memory-mapped file. How can I best ensure that for each run the
page cache is cold, that the prior run left nothing for the current run
to take advantage of?
I tried calling mmap(2) on an unrelated file larger than RAM, and
iterating over it. Not only would I prefer something more
deterministic, but the following output suggests to me that technique
wasn't effective:
begin= 0x7f7f9de1e000, n= 10000000
sorted 1 of 100000000 in 2 seconds
begin= 0x7f7fa2a69408, n= 10000000
sorted 2 of 100000000 in 4 seconds
begin= 0x7f7fa76b4810, n= 10000000
sorted 3 of 100000000 in 6 seconds
begin= 0x7f7fac2ffc18, n= 10000000
sorted 4 of 100000000 in 8 seconds
begin= 0x7f7fb0f4b020, n= 10000000
sorted 5 of 100000000 in 9 seconds
begin= 0x7f7fb5b96428, n= 10000000
sorted 6 of 100000000 in 11 seconds
begin= 0x7f7fba7e1830, n= 10000000
sorted 7 of 100000000 in 13 seconds
begin= 0x7f7fbf42cc38, n= 10000000
sorted 8 of 100000000 in 14 seconds
begin= 0x7f7fc4078040, n= 10000000
sorted 9 of 100000000 in 16 seconds
begin= 0x7f7fc8cc3448, n= 10000000
sorted 10 of 100000000 in 19 seconds
As the algorithm works its way through the billion elements (100 million
at a time), N remains constant but T grows rather dramatically. I
suspect that's because of I/O (and maybe I should measure that, too).
Suggestions? What would you do?
--jkl