How much memory bandwidth do large Amazon instances offer? – Daniel Lemire's blog
In my previous post, I described how you can write a C++ program to estimate your read memory bandwidth. It is not very difficult: you allocate a large memory region and you read it as fast as you can. To see how much bandwidth you have when using multithreaded applications, you can use multiple threads, where each thread reads a chunk of the large memory region.
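For illustration only, here is a minimal C++ sketch of that idea, not the actual benchmark from the previous post: each thread sums one chunk of a large buffer, and the aggregate read rate is computed from the wall-clock time. The 4 GB buffer size and the thread count are arbitrary choices.

```cpp
// Minimal multithreaded read-bandwidth sketch (assumed parameters, not the
// original benchmark): each thread scans its own chunk of a large buffer.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
  const size_t bytes = size_t(4) * 1024 * 1024 * 1024; // 4 GB buffer (arbitrary)
  const size_t words = bytes / sizeof(uint64_t);
  // Initializing the buffer also touches every page before we start timing.
  std::vector<uint64_t> buffer(words, 1);
  const size_t thread_count = 16; // pick your thread count
  std::vector<uint64_t> sums(thread_count);

  auto start = std::chrono::steady_clock::now();
  {
    std::vector<std::thread> workers;
    const size_t chunk = words / thread_count;
    for (size_t t = 0; t < thread_count; t++) {
      workers.emplace_back([&, t] {
        const uint64_t *begin = buffer.data() + t * chunk;
        const uint64_t *end = (t + 1 == thread_count)
                                  ? buffer.data() + words
                                  : begin + chunk;
        // The sum forces the reads: the compiler cannot optimize them away.
        sums[t] = std::accumulate(begin, end, uint64_t(0));
      });
    }
    for (auto &w : workers) { w.join(); }
  }
  auto stop = std::chrono::steady_clock::now();

  double seconds = std::chrono::duration<double>(stop - start).count();
  uint64_t checksum = std::accumulate(sums.begin(), sums.end(), uint64_t(0));
  printf("read %.1f GB/s (checksum %llu)\n", bytes / seconds / 1e9,
         (unsigned long long)checksum);
  return 0;
}
```

A real benchmark would repeat the measurement several times and keep the best run; the checksum is printed only so that the reads cannot be elided.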
The server I used for the blog post, a two-CPU Intel Ice Lake server, has a maximal bandwidth of about 130 GB/s. You can double this amount of bandwidth with NUMA-aware code, but it will require extra engineering.
But you do not have access to my server. What about a big Amazon server? So I spun up an r6i.metal instance from Amazon. These servers support 128 physical threads, they have 1 terabyte of RAM (1024 GB) and 6.25 GB/s of network bandwidth.
Running my benchmark program on this Amazon server revealed that it has about 115 GB/s of read memory bandwidth. That is without counting NUMA and other sophisticated tricks. Plotting the bandwidth versus the number of threads used reveals that, once again, you need about 20 threads to maximize the memory bandwidth, although you get most of it with only 15 threads.