It's getting late here so I can't try it myself, at least not today, but I'd try moving in a fixed, sparse, pseudorandom pattern while varying the operation size.
I think this should give a more staircase-y plot, with jumps whenever the size of each individual copy passes a whole multiple of the cache line width.
I think this way you don't need to fight prefetching, and there will be one read-modify-write per operation, except right at the edges, which is another signal.
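
To make the expected shape concrete, here's a rough Python model of the idea. It is not a benchmark: it only counts how many cache lines each copy would touch and how many of those are only partly overwritten (the read-modify-write candidates). The 64-byte line width, the 4096-byte stride, the seed, and all the function names are my own assumptions for illustration; the real timing loop would have to be written in C or similar.

```python
import random

LINE = 64  # assumed cache line width in bytes; check your own CPU


def make_offsets(count, buffer_size, stride=4096, seed=1234):
    """Fixed, sparse, pseudorandom start offsets (multiples of `stride`).

    The fixed seed means every operation size is tested on the same pattern.
    The stride is an assumed spacing; keep copy sizes below it so that
    neighbouring copies never overlap.
    """
    rng = random.Random(seed)
    slots = buffer_size // stride
    return [slot * stride for slot in rng.sample(range(slots), count)]


def lines_touched(offset, size, line=LINE):
    """Number of distinct cache lines covered by [offset, offset + size)."""
    if size <= 0:
        return 0
    first = offset // line
    last = (offset + size - 1) // line
    return last - first + 1


def partial_lines(offset, size, line=LINE):
    """Lines only partly overwritten, i.e. read-modify-write candidates."""
    if size <= 0:
        return 0
    head = offset % line != 0           # copy starts mid-line
    tail = (offset + size) % line != 0  # copy ends mid-line
    if lines_touched(offset, size, line) == 1:
        return 1 if (head or tail) else 0
    return int(head) + int(tail)


if __name__ == "__main__":
    offsets = make_offsets(count=1000, buffer_size=1 << 28)
    print("size  lines/op  partial/op")
    for size in (1, 32, 63, 64, 65, 127, 128, 129, 192, 193):
        lines = sum(lines_touched(o, size) for o in offsets) / len(offsets)
        part = sum(partial_lines(o, size) for o in offsets) / len(offsets)
        print(f"{size:4d}  {lines:8.2f}  {part:10.2f}")
```

With line-aligned starts like these, the lines-per-operation column steps up each time the size crosses a multiple of the line width, and the partial-line column drops to zero exactly at those multiples. If the real timings follow the same steps, that would support the guess.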