Horrible Code, Clear Efficiency – Johnny’s Software program Lab
We at Johnny’s Software program Lab LLC are specialists in efficiency. If efficiency is in any means concern in your software program venture, be at liberty to contact us.
Relating to software program efficiency, it appears there isn’t a restrict to how a lot one can complicate issues so as to make efficiency enhancements. Right here we current a brief instance of optimizations went wild – horrible code with clear efficiency. Homage to Clean Code, Horrible Performance.
Clear Code
Let’s say we’ve a category known as my_string
, that appears like this:
class my_string { char* ptr; size_t measurement; };
The category implements a string, however not a null-terminated form. Pointer to character block is saved in ptr
, and the dimensions of string is saved in measurement
.
Let’s say we wish to implement a find_substring(substring)
technique, which, tries to seek out the primary prevalence of substring in a string. Right here is the naive and easy implementation of such technique:
std::pair<bool, size_t> find_substring(const my_string& substr) { if (substr.measurement > measurement) { return { false, 0 }; } size_t s = measurement - substr.measurement; char* ptr_str = ptr; char* ptr_substr = substr.ptr; size_t size_substr = substr.measurement; for (int i = 0; i < s; ++i) { bool discovered = true; for (int j = 0; j < size_substr; j++) { if (ptr_str[i + j] != ptr_substr[j]) { discovered = false; break; } } if (discovered) { return { true, i }; } } return { false, 0 }; }
The recent loop is on strains 10-21. The loop over i
iterates the string, and the loop over j
tries to seek out the substring contained in the string at place i
. This can be a quite simple and straight-forward answer.
The Downside
The issue with this strategy is the interior loop, which iterates over the substring. It reads the identical information from the information cache to the registers again and again. The interior loop over j
will most definitely break in a short time after it begins.
From the efficiency standpoint, it could be very handy if the information belonging to the substring had been saved in CPU registers and never the information caches. If the dimensions of substring had been identified at compile-time, and small, the compiler may carry out an optimization: place the entire substring in registers and by no means contact reminiscence once more.
In our case, the compiler doesn’t know the dimensions of the substring. In concept, it may put part of the substring in a register, in our case, the primary half. And, when wanted, seize the remaining from the reminiscence. This could be very helpful, as a part of the substring that’s accessed probably the most is the start of the substring. However compilers by no means try this robotically.
Like what you’re studying? Observe us on LinkedIn , Twitter or Mastodon and get notified as quickly as new content material turns into obtainable.
Need assistance with software program efficiency? Contact us!
Horrible Code
Nicely, what the compiler can’t do, we are able to do for it. Let’s rewrite the unique code by allocating a small, fastened sized array to carry the start of the substring. Right here is the code:
std::pair<bool, size_t> find_substring2(const my_string& substr) { if (substr.measurement > measurement) { return { false, 0 }; } static constexpr size_t substring_buffer_max_size = 8; char substring_buffer[substring_buffer_max_size]; // Fill the statically allotted substring size_t substring_buffer_size = std::min(substring_buffer_max_size, substr.measurement); for (int i = 0; i < substring_buffer_size; ++i) { substring_buffer[i] = substr.ptr[i]; } size_t s = measurement - substr.measurement; char* ptr_str = ptr; char* ptr_substr = substr.ptr; size_t size_substr = substr.measurement; for (int i = 0; i < s; ++i) { bool discovered = true; int j; for (j = 0; j < substring_buffer_size; ++j) { if (ptr_str[i + j] != substring_buffer[j]) { discovered = false; break; } } if (discovered) { for (; j < size_substr; ++j) { if (ptr_str[i + j] != ptr_substr[j]) { discovered = false; break; } } } if (discovered) { return { true, i }; } } return { false, 0 }; }
This code is way more complicated than the unique. We introduce a brand new fastened measurement array substring_buffer
(line 7) and fill it with information from the unique substring strains 11-13. In sizzling loop, we first test if we’ve a match from the substring_buffer
(strains 22-27). If no, we break the interior loop and transfer to the following worth of i
. Else, we test if the remainder of the substring
is a match (strains 30-35).
The compiler truly takes this into consideration when producing code. Right here is the meeting output:
First 4 characters are saved in registers sil
, dil
, r8b
and r9b
. The remainder of the characters for the substring are reloaded from reminiscence, however this isn’t an issue with us as a result of comparability more often than not fails on first character of the substring. But, this meeting signifies it is sensible to lower the dimensions of substring_buffer
to 4 characters.
It’s fairly clear why we name this code horrible. A developer unfamiliar with compiler optimizations would look with disgust at this code. However is it value it? Let’s discover out.
Like what you’re studying? Observe us on LinkedIn , Twitter or Mastodon and get notified as quickly as new content material turns into obtainable.
Need assistance with software program efficiency? Contact us!
Benchmarking
We measure the efficiency of this answer on each GCC 11 and Clang 15. As an enter we use a string which is 256 MB in measurement. We’re on the lookout for a substring that’s not current; this can make it possible for the entire string is processed. Listed here are the runtimes:
Authentic | With substring caching | |
---|---|---|
GCC 11 | Runtime: 0.368 s Instr: 2966 M CPI: 0.473 |
Runtime: 0.255 s Instr: 2426 M CPI: 0.357 |
Clang 15 | Runtime: 0.272 s Instr: 2964 M CPI: 0.35 |
Runtime: 0.169 s Instr: 2155 M CPI: 0.301 |
On each compilers the runtime of the unique is worse and the optimized model is each quicker and extra environment friendly.
Backside Line
Is that this code horrible? Sure, it’s.
Is it quick? As demonstrated, it’s quick.
Is that this efficiency moveable? The pace enchancment is moveable between GCC and CLANG. It could be attainable for a compiler to make this optimization with out our intervention, however my intestine feeling recommend that compilers that do that are uncommon.
Is it value it? That is debatable and to a sure extent extra a desire than a reality. For me personally, it is best to by no means write such a code as a primary implementation. It’s messy, and the intention will not be clear to a median developer. However, if there’s a efficiency bottleneck that it’s essential to handle, then this answer can be acceptable.
Like what you’re studying? Observe us on LinkedIn , Twitter or Mastodon and get notified as quickly as new content material turns into obtainable.
Need assistance with software program efficiency? Contact us!