CS520
Fall 2013
Program 4
Due Sunday October 20


Cache Testing

As we will be learning in class, not all memory accesses are created equal. The purpose of this assignment is to give you experience with how the different kinds and levels of caches interact with one another, and how that affects the overall access time.

Programming part

Write a program called cachetest that takes two command line arguments and does a bunch of reads from an array of randomized data. When it has finished running, it should print out the number of seconds it took. Note that this will probably be a fractional number. There are a number of different ways you could implement the timer, but I expect you to have at least millisecond resolution.

The program should be contained in a file called cache.c. The program will be compiled with gcc using -Wall -std=c99 -O0 flags. Please keep everything confined to a single file; this program is so small there should be no reason to have more than one file. For example, my program is 64 lines long, and it contains a bunch of commented out stuff.

Command Line Arguments

The program should take the following arguments:
  1. The number of reads to do
  2. The size of the array of randomized data

For example:

./cachetest 1000 50
denotes doing 1000 randomized reads from an array that contains 50 items. While you may make the array contain any kind of item you wish, I recommend using integers.

Functionality

The program should begin by creating an array of the size specified in the command line argument. Next, it should write a random number to every value in the array. At this point, the program is ready to begin the time trial.

The program should next read a randomly selected value from the array and perform some kind of arithmetic operation using that number, or some kind of bitwise operation with that number. The reason for this is that if you do not use the number, you may end up with the loop getting optimized out for being useless. You should do this as many time as is specified in the command line argument.

Once this has finished, the time trial is over, and you need to figure out how much time elapsed.

Last, you need to check to see what the result of the calculation is. I recommend returning 0 if the calculation is 0, otherwise returning 1, with the assumption that your calculation, if done using random input, probably won't return a 0. The reason, once again, is that you do not want that getting optimized out.

Return Status

The program should return 0, unless there is some kind of error.

Building the Program

Normally, gcc tries to optimize your code, and it can occasionally make a mess of things. I recommend compiling with the -O0 flag (dash capital O followed by a zero). This tells the compiler to do no optimization at all, and render your code pretty much literally the way it is in assembly, which is precisely what you want to happen for this assignment.

Written Report Part

Once you have written the program, the next step is to use the program to do some tests to determine how well the different cache levels work.

I recommend starting by doing some pilot experiments just messing around with your program. Begin with an array with 10 integers in it, and adjust the number of reads you do until it takes approximately 1 second for the program to complete all of the reads. The exact number of reads you will be able to do may vary, so I would just increase exponentially until you figure out something close. Note that the total runtime of the program will be somewhat related to the size of the array, because initializing the array takes time proportional to the size of the array, but you don't count that time.

Run the lscpu command to figure out what kind of hardware you have at your disposal.

cmw@paris:~/cs520/p4/cache$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               1600.000
BogoMIPS:              6186.03
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
This command tells you a ton of stuff, but the important ones are the caches, in red.

As you change the size of the array, the amount of time to do the same number of reads varies, and the way the time varies has to do with how well the array fits into the L1, L2, and L3 caches.

Produce the following charts:

  1. A chart showing the transition from the L1 cache to the L2 cache
  2. A chart showing the transition from the L2 cache to the L3 cache (or main memory, depending on architecture)
Note that each of these will have a different size depending on which machine you are running on. I know this can be done on agate, but agate has a really large L3 cache, so you may be better served doing this on a more lame computer (like one of the ones in the PAC, N218, or the undergraduate lounge). The other problem with using agate is that agate has a ton of unrelated stuff running on it like 415 students, etc, which can cause problems for your timing. I am not certain, but I believe that doing this on a virtual machine may also prove to be problematic. For this reason, I think the best choices for doing this assignment are probably going to be the individual machines in the clusters.

In order to make sure you are on (or not on) agate, use the following bash command:

cmw@paris:~/cs520/p4/cache$ echo $HOSTNAME
paris
The last step is to use your program to estimate the how long it takes to do 1 read from each level of the cache and main memory. The report should include: The report should be in pdf format, and should be called cache.pdf. I will be using a script to print these out, so it must have the correct name.


Grading

  1. 30 points will be awarded for making a basic correct cache testing program.
  2. 10 points will be awarded for making a robust cache testing program. Robustness is defined as handling stupid and otherwise incorrect input. I will start a post on Piazza (and if I don't, students should do so instead) where you may discuss "stupid input" ideas to look for and handle appropriately.
  3. 60 points will be awarded based upon the charts and calculations showing what you believe to be the average read time for each of the caches.

In addition, remember, you may lose points if your program is not properly structured or adequately documented. Coding guidelines are given on the course overview webpage.

Your programs will be graded using agate.cs.unh.edu so be sure to test in that environment.

Remember: as always you are expected to do your own work on this assignment. Copying code from another student or from sites on the internet is explicitly forbidden!

Submission

Your programs should be submitted for grading from agate.cs.unh.edu. In order to turn in the program, first make sure you are SSH'd in to agate. To turn in this program, type:
% ~cs520/bin/DoSubmission.py prog4 file1 file2 file3 file4

This submission script is new. It passed what testing I have done on it, but it may still have issues. If there are any problems, please contact me via email and I will do my best to assist you. If I cannot be reached, please send me a copy of your assignment via email, and we will deal with the submission script later.

Due Date

This assignment is due Sunday October 20. The standard late policy concerning late submissions will be in effect.

Complications

I have found that the cache programs work as expected on agate and on zelinka (two computers at school), but does not seem to work very well on paris (a computer at home). I think it has to do with how old the computers are, but I am not certain. As mentioned before, I strongly recommend doing this assignment either on agate or even better on one of the other computers the department owns.