Caches: LRU v. random
Once upon a time, my computer architecture professor mentioned that using a random eviction policy for caches really isn't so bad. That random eviction isn't bad can be surprising — if your cache fills up and you have to get rid of something, picking the least recently used (LRU) is an obvious choice, since you're more likely to use something if you've used it recently. If you have a tight loop, LRU is going to be great as long as the loop fits in cache, but it's going to cause a miss every time if the loop doesn't fit. A random eviction policy degrades gracefully as the loop gets too big.
In practice, on real workloads, random tends to do worse than other algorithms. But what if we take two random choices and just use LRU between those two choices?
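To make the policies concrete, here's a minimal sketch of all three on a toy fully associative cache. This is just an illustration of the mechanics (it's nothing like the Dinero IV setup used for the numbers below), with a plain list standing in for the recency state:

```python
import random

def simulate(evict, accesses, cache_size):
    """Miss count for a toy fully associative cache under an eviction
    policy. `recency` holds cached lines, least recently used first."""
    cache, recency, misses = set(), [], 0
    for addr in accesses:
        if addr in cache:
            recency.remove(addr)
        else:
            misses += 1
            if len(cache) == cache_size:
                victim = evict(recency)
                cache.remove(victim)
                recency.remove(victim)
            cache.add(addr)
        recency.append(addr)  # addr is now the most recently used
    return misses

def lru(recency):
    return recency[0]  # least recently used line overall

def rand(recency):
    return random.choice(recency)

def two_random(recency):
    # The policy in question: pick two random lines and evict whichever
    # of the two was used less recently.
    a, b = random.sample(recency, 2)
    return a if recency.index(a) < recency.index(b) else b

# A loop one line too big for the cache: LRU misses on every access,
# while the randomized policies degrade gracefully.
loop = list(range(9)) * 1000
for name, evict in [("lru", lru), ("random", rand), ("2-random", two_random)]:
    print(name, simulate(evict, loop, cache_size=8))
```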
Here are the relative miss rates we get for SPEC CPU with a Sandy Bridge-like cache (8-way associative, 64k, 256k, and 2MB L1, L2, and L3 caches, respectively). These are ratios (algorithm miss rate : random miss rate); lower is better. Each cache uses the same policy at all levels of the cache.
Policy | L1 (64k) | L2 (256k) | L3 (2MB) |
---|---|---|---|
2-random | 0.91 | 0.93 | 0.95 |
FIFO | 0.96 | 0.97 | 1.02 |
LRU | 0.90 | 0.90 | 0.97 |
random | 1.00 | 1.00 | 1.00 |
Random and FIFO are both strictly worse than either LRU or 2-random. LRU and 2-random are pretty close, with LRU edging out 2-random for the smaller caches and 2-random edging out LRU for the larger caches.
To see if anything odd is going on in any individual benchmark, we can look at the raw results on each sub-benchmark. The L1, L2, and L3 miss rates are all plotted in the same column for each benchmark, below:
As we might expect, LRU does worse than 2-random when the miss rates are high, and better when the miss rates are low.
At this point, it isn't clear if 2-random is beating LRU in L3 cache miss rates because it does better when the caches are large or because it does better as the third level in a hierarchical cache. Since a cache line that's being actively used in L1 or L2 isn't touched in the L3, that line can get evicted from the L3 (which forces an eviction from both the L1 and L2), because, as far as the L3 is concerned, that line hasn't been used recently. This makes it less obvious that LRU is a good eviction policy for an L3 cache.
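As a toy illustration of that effect (assuming a strictly inclusive hierarchy; the lines and structures here are made up for the example):

```python
# "A" is hot: the program touches it constantly, but every touch hits in
# the L1, so the L3 never sees those accesses and A's recency in the L3
# goes stale.
l1 = {"A"}                      # A always hits in the L1
l3_recency = ["A", "B", "C"]    # L3's LRU stack, least recently used first

def touch(line):
    if line in l1:
        return                  # hit absorbed by L1; l3_recency not updated

for _ in range(1000):
    touch("A")

# When the L3 needs a victim, its stale stack nominates the hot line ...
victim = l3_recency.pop(0)      # -> "A"
# ... and inclusion forces the L1 (and L2) to drop it as well.
l1.discard(victim)
print(victim, l1)               # A is evicted everywhere despite being hot
```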
To separate out the effects, let's look at the relative miss rates for non-hierarchical (single-level) vs. hierarchical caches at various sizes. For the hierarchical cache, the L1 and L2 sizes are as above, 64k and 256k, and only the L3 cache size varies. Below, we have the geometric means of the ratios of how each policy does (over all SPEC sub-benchmarks, compared to random eviction). A possible downside to this metric is that if we have some very low miss rates, those could dominate the mean, since small fluctuations will have a large effect on the ratio, but we can look at the distribution of results to see if that's the case.
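Concretely, the summary statistic is just this (a sketch; the miss counts in the example are made up):

```python
import math

def geomean_of_ratios(policy_misses, random_misses):
    """Geometric mean of per-benchmark miss ratios (policy : random);
    lower is better."""
    ratios = [p / r for p, r in zip(policy_misses, random_misses)]
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))

# The worry about low miss rates: on a benchmark where random barely
# misses at all, a tiny absolute difference becomes a huge ratio and
# can dominate the mean.
print(geomean_of_ratios([500, 30], [510, 10]))  # the 30:10 ratio dominates
```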
Sizes below 512k are missing for the hierarchical case because of the 256k L2 — we're using an inclusive L3 cache here, so it doesn't really make sense to have an L3 that's smaller than the L2. Sizes above 16M are omitted because cache miss rates converge once the cache gets big enough, which is uninteresting.
Looking at the single cache case, it seems that LRU works a bit better than 2-random for smaller caches (lower miss ratio is better), and 2-random edges out LRU as the cache gets bigger. The story is similar in the hierarchical case, except that we don't really get to see the smaller cache sizes where LRU is superior.
Comparing the two cases, the results are different, but similar enough that it looks like our original results weren't just an artifact of looking at the last level of a hierarchical cache.
Below, we'll look at the full distribution so we can see if the mean of the ratios is being skewed by tiny results.
It looks like, for a particular cache size (one column of the graph), the randomized algorithms do better when miss rates are relatively high and worse when miss rates are relatively low, so, if anything, they're disadvantaged when we just look at the geometric mean — if we were to take the arithmetic mean, the result would be dominated by the larger results, where 2 random choices and plain old random do relatively well.
From what we've seen of the mean ratios, 2-random looks good for large caches, and from what we've seen of the distribution of the results, that's despite 2-random being penalized by the mean-ratio metric — which makes it look even better for large caches.
However, it's common to implement pseudo-LRU policies because true LRU can be too expensive to be workable. Since 2-random requires having at least as much information as LRU, let's look at what happens when we use pseudo 2-random (approximately 80% accurate) and pseudo 3-random (a two-level tournament, each level of which is approximately 80% accurate).
Since random and FIFO are clearly not good replacement policies, I'll leave them out of the following graphs. Also, since the results were similar in the single cache and multi-level cache cases, we can just look at the results from the more realistic multi-level cache case.
Since pseudo 2-random acts like random 20% of the time and 2-random 80% of the time, we might expect it to fall somewhere between 2-random and random, which is exactly what happens. A simple tweak to try to improve pseudo 2-random is pseudo 3-random (evict the least recently used of three random choices). While that's still not quite as good as true 2-random, it's pretty close, and it's still better than LRU (and pseudo-LRU) for caches larger than 1M.
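Here's a sketch of what pseudo 2-random and pseudo 3-random mean here, under the stated assumption that the cheap pseudo-LRU comparison picks the truly older of two lines about 80% of the time (the `ACCURACY` constant and the `age` map are stand-ins for whatever the real hardware tracks):

```python
import random

ACCURACY = 0.80  # assumed probability the cheap comparison is right

def pseudo_older(a, b, age):
    """Approximate 'which was used less recently?'. Returns the truly
    older line with probability ACCURACY, otherwise the wrong one."""
    older, newer = (a, b) if age[a] > age[b] else (b, a)
    return older if random.random() < ACCURACY else newer

def pseudo_2_random(lines, age):
    a, b = random.sample(lines, 2)
    return pseudo_older(a, b, age)

def pseudo_3_random(lines, age):
    # Two-level tournament: the approximate loser of one comparison
    # then faces a third random candidate.
    a, b, c = random.sample(lines, 3)
    return pseudo_older(pseudo_older(a, b, age), c, age)
```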
The one big variable we haven't explored is set associativity. To see how LRU compares with 2-random across different cache sizes and associativities, let's look at the LRU:2-random miss ratio (higher/red means LRU is better, lower/green means 2-random is better).
On average, increasing associativity increases the difference between the two policies. As before, LRU is better for small caches and 2-random is better for large caches. Associativities of 1 and 2 aren't shown because the policies are identical there: with one way there's no choice to make, and with two ways, taking the least recently used of two random lines out of two total lines is just LRU.
There's still a combinatorial explosion of possibilities we haven't tried yet. One thing to do is to try different eviction policies at different cache levels (LRU for L1 and L2 with 2-random for L3 looks promising). Another is to try this for different types of caches. I happened to choose CPU caches because it's easy to find simulators and benchmark traces, but in today's "put a cache on it" world, there are a lot of other places 2-random might be applied.
For any comp arch folks: from this data, I suspect that 2-random doesn't keep up with adaptive policies like DIP (although it might — it's in the right ballpark, but DIP was characterized on a different workload using a different simulator, so it isn't 100% clear). However, a pseudo 2-random policy can be implemented that uses barely more resources than pseudo-LRU policies, which makes it very cheap compared to DIP. Also, we can see that pseudo 3-random is noticeably better than pseudo 2-random, which indicates that k-random is probably an improvement over 2-random for the right k. Some k-random policy might be an improvement over DIP.
So we've seen that this works, but why would anyone think to do this in the first place? The Power of Two Random Choices: A Survey of Techniques and Results by Mitzenmacher, Richa, and Sitaraman has a great explanation. The mathematical intuition is that if we (randomly) throw n balls into n bins, the maximum number of balls in any bin is O(log n / log log n) with high probability, which is pretty much just O(log n). But if (instead of choosing randomly) we choose the least loaded of k random bins, the maximum is O(log log n / log k) with high probability, i.e., even with two random choices, it's basically O(log log n), and each additional choice beyond two only reduces the load by a constant factor.
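The gap is easy to see in a quick simulation (just the standard balls-and-bins model, nothing cache-specific):

```python
import random

def max_load(n, k):
    """Throw n balls into n bins, placing each ball in the least loaded
    of k uniformly random bins; return the fullest bin's load."""
    bins = [0] * n
    for _ in range(n):
        best = min((random.randrange(n) for _ in range(k)),
                   key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

n = 100_000
print("1 choice: ", max_load(n, 1))  # grows like log n / log log n
print("2 choices:", max_load(n, 2))  # grows like log log n: much flatter
```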
This turns out to have all sorts of applications; things like load balancing and hash distribution are natural fits for the balls-and-bins model. There are also a lot of applications that aren't obviously analogous to the balls-and-bins model, like circuit routing and Erdős–Rényi graphs.
Thanks to Jan Edler and Mark Hill for making Dinero IV freely available, to Aleksandar Milenkovic for providing SPEC CPU traces, to Carl Vogel, James Porter, Peter Fraenkel, Katerina Barone-Adesi, Jesse Luehrs, Lea Albaugh, and Kevin Lynagh for advice on plots and plotting packages, to Mindy Preston for finding a typo in the acknowledgments, to Lindsey Kuper for pointing out some terminology issues, to Tom Wenisch for suggesting that I check out CMP$im for future work, and to Leah Hanson for extensive comments on the entire post.