As much as we would like to encourage smart thinking in programming contests, in the end, what makes the difference most of the time is who knows more tips and tricks to gain the extra advantage required. Today I am going to present another collection of them, believing that maybe someday, one of these will help you win.
More Tricks to Gain Speed in Programming Contests - Maintain the cache intact (Page 2 of 4 )
"What cache?" you may ask. Well most people do not know, but at read time the processor reads a cache line, not only a single value. The length of this cache varies, depending on a few factors (though most of the time it is 32 bytes). However, in the end, it is always a number that is a power of two. Take this into account when you read your arrays.
The same is true when your construct structures. To not force the Central Processing Unit to read a value from the memory twice, construct the structures so they will use an amount of memory that is a power of two. To visualize this problem, I have written a short program that will just present and measure the difference.
#include "stdio.h"
#include "stdlib.h"
// using Windows interface QueryPerformanceCounter( )
The code of interest is inside the f function. It is about the line: “s += t[j][i];”. Here you can observe that we calculate the sum of the matrix by breaking the cache line after every read. When the CPU reads t[i][j] it will also read the t[i][j+1], t[i][j+2] and t[i][j+3]. However, if we calculate the sum this way at every read, it will have to reread the values, as the ones inside the catch are no longer usable. The time required to complete the process on my machine, with an E4300 Intel Dual Core at 1.8 GHz, is:
Now if we just switch up the two variables and write:
s += t[i][j];
The speed of the program now is:
That is double. Talk about a huge difference! In the future, remember that the CPU is still much faster than the memory, so avoid breaking the cache. Also, avoid using the global variables for local purpose; first, because this can lead to strange problems, and second, because this can slow down the application. Now this will have very little influence, but the influence is still there.
The local memory pool (stack) is much faster than the global one. This is limited to 4MB, though, so avoid exceeding this amount. In contests the limit is set at 1MB, so really, do not overuse it.