Effortless Performance Improvements in C++: std::unordered_map
This is the second post in the series about effortless performance
improvements in C++. Check out the first
post to get the
full story!
Last time we saw that our hotspots were mostly located in the
standard library, and we concluded that we may be using it wrong. The
top offender being a function from std::map
, let's dig into that.
In our initial implementation we have a single map, used to store the
key-value pairs obtained from the input. We are doing three lookups in
this map. The first one is obvious: it happens when we process a "value"
record, where we have to check if the provided key has been set:
else if (tokens[0] == "value")
{
  // Obvious lookup in the map.
  if (entries.find(tokens[1]) != entries.end())
    /* … */
But there are also two less-obvious searches. Just one line below the
previous scan:
else if (tokens[0] == "value")
{
  if (entries.find(tokens[1]) != entries.end())
    // Less obvious lookup in the map, via operator[].
    result += "value:" + entries[tokens[1]] + '\n';
That's right, calling std::map::operator[]
does a lookup, just like find
, to check if there is already an entry with the given key. But since we
did a search just above, couldn't it be optimized away by the compiler?
Well, it's complicated, but the answer is no.
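To see why there is a second lookup, here is roughly what the subscript operator has to do. This is only a simplified sketch of the specified behavior, not the actual standard-library code:

```cpp
#include <map>
#include <string>
#include <utility>

// Simplified sketch of what entries[key] boils down to: search for the
// key again and insert a default-constructed value if it is missing.
std::string& subscript_sketch(std::map<std::string, std::string>& entries,
                              const std::string& key)
{
  // A second search for the same key, even if we just called find() above.
  return entries.insert(std::make_pair(key, std::string())).first->second;
}
```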
Before showing the solution for this point, here is the third lookup,
in the processing of a "set"
record:
else if (tokens[0] == "set")
  // Another lookup in the map, via operator[].
  entries[tokens[1]] = tokens[2];
Again, the call to std::map::operator[]
does a lookup.
There is not much we can do to avoid the search in the processing
of a "set"
record. We could use std::map::emplace
to avoid overwriting an existing key, but it would change the behavior of
our program. So we are going to leave it as it is.
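As an illustration of that behavior difference, here is a small sketch with made-up keys and values (not our actual input):

```cpp
#include <cassert>
#include <map>
#include <string>

int main()
{
  std::map<std::string, std::string> entries;
  entries["key"] = "first";

  // operator[] overwrites the value of an existing key…
  entries["key"] = "second";
  assert(entries["key"] == "second");

  // …while emplace leaves an existing entry untouched.
  entries.emplace("key", "third");
  assert(entries["key"] == "second");

  return 0;
}
```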
In the processing of "value"
we could abuse the fact that accessing
a non-existent key via std::map::operator[]
creates a new value
using the default constructor of its type. In our case it would be an
empty string, so the output would be the same:
else if (tokens[0] == "value")
{
  // entries[tokens[1]] is the existing value if any, or an empty
  // string if none.
  result += "value:" + entries[tokens[1]] + '\n';
But by doing so we would insert a new entry in the map for every
non-existent key we test! This is not what we want.
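To make the problem concrete, here is a tiny sketch of that side effect, again with a made-up key:

```cpp
#include <cassert>
#include <map>
#include <string>

int main()
{
  std::map<std::string, std::string> entries;

  // Reading a non-existent key via operator[] silently inserts an empty
  // string under that key.
  const std::string value = entries["missing-key"];

  assert(value.empty());
  assert(entries.size() == 1); // The map grew, even though we only "read".

  return 0;
}
```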
The correct solution to get an optional value associated with the key
is to keep the iterator returned by std::map::find()
:
else if (tokens[0] == "value")
{
  // Single search in the map.
  const entry_map::const_iterator it = entries.find(tokens[1]);

  if (it != entries.end())
    // Here we can access the value via the iterator.
    result += "value:" + it->second + '\n';
  else
    result += "value:\n";

  ++line_count;
}
Just like in the initial code, we search the map using
std::map::find
, but now we store the result in a local variable. The
result is an iterator that we can compare with entries.end()
, as
before. Now, if the iterator is not the end iterator, we can get
the value from the map by dereferencing the iterator and accessing its
second
member.
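As a side note, if the code base can use C++17, the same single-lookup pattern can be written a bit more compactly with auto and an if-statement initializer. Here is a sketch with a hypothetical helper function wrapping the lookup; the entry_map alias is assumed to be the map type used above:

```cpp
#include <map>
#include <string>

using entry_map = std::map<std::string, std::string>;

// Hypothetical helper equivalent to the lookup above, using C++17's
// if-with-initializer instead of an explicit iterator declaration.
std::string value_line(const entry_map& entries, const std::string& key)
{
  if (const auto it = entries.find(key); it != entries.end())
    return "value:" + it->second + '\n';

  return "value:\n";
}
```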
This is not a huge change; what difference can it make?
5 to 9% faster with such a small change! Pretty good, isn't it?
Let's check the hotspots again with perf
:
- 26% in std::map<std::string, std::string>::find.
  - Under which the top offender is std::string::operator<.
- 15% in tokenize.
  - Mostly in std::vector<std::string>::push_back.
- 10% in std::map<std::string, std::string>::operator[].
  - Again with std::string::operator< as the top offender.
- 6% in std::getline.
- 5% in replace.
  - With std::string::find as the top offender.
The time spent in std::map::operator[]
went down from 15% to
10%. It is now less of a problem than the call to tokenize
, but we
still have std::map::find
at the top. As we saw before, there is not
much we can do to avoid these calls, so let's see if a
different approach could be a more efficient replacement.
The implementation of std::map
is a binary tree, where each node
points to two sub-trees: one where the items are smaller than the one
in the current node, and one where the items are
larger. Conceptually, it looks like this:
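Something like the following simplified node structure can serve as an illustration; it is only a conceptual sketch, not the layout actually used by the standard library (which is typically a balanced red-black tree with extra bookkeeping):

```cpp
#include <memory>
#include <string>

// Conceptual node of the binary search tree behind
// std::map<std::string, std::string> (simplified for illustration).
struct map_node
{
  std::string key;
  std::string value;

  std::unique_ptr<map_node> smaller; // Sub-tree with keys ordered before key.
  std::unique_ptr<map_node> larger;  // Sub-tree with keys ordered after key.
};
```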
The algorithmic complexity for searching, inserting, or removing an
entry is O(log2(n)), where n is the number of items in
the container.
Since C++11 the standard library provides another associative container in
the form of std::unordered_map
. Its implementation is a hash table:
an integral hash is computed for each item, then this hash is used as
an index into a table where the items are stored. If many items end up
in the same slot, they are chained, as in a linked list. Conceptually,
it looks like this:
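Again as a rough conceptual sketch only, not the real std::unordered_map internals:

```cpp
#include <string>
#include <vector>

// Conceptual layout of a chained hash table (simplified for illustration).
struct bucket_node
{
  std::string key;
  std::string value;
  bucket_node* next_in_bucket; // Chaining in case of hash collisions.
};

struct hash_table
{
  // The hash of a key, modulo buckets.size(), gives the index of the
  // bucket holding that key.
  std::vector<bucket_node*> buckets;
};
```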
The algorithmic complexity for searching, inserting, or removing an
entry is O(1) on average, which is clearly better than that of std::map
.
Hidden in the cost of std::map<std::string, T>
is the comparison of
the strings as we search the map. On the other hand,
std::unordered_map<std::string, T>
has to compute a hash of the provided
strings.
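To give an idea of where the work goes in each case, here is a minimal, purely illustrative sketch of the per-lookup operations: the tree compares the key at every node it visits, while the hash table hashes the whole key once before indexing its buckets:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// One of the O(log2(n)) comparisons done while descending the tree of a
// std::map<std::string, T>.
bool tree_step(const std::string& key, const std::string& node_key)
{
  return key < node_key;
}

// The single hash computed over the whole key before indexing the buckets
// of a std::unordered_map<std::string, T>.
std::size_t hash_step(const std::string& key)
{
  return std::hash<std::string>{}(key);
}
```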
Let's apply the following changes to our program and see how they
affect its performance.
@@ -5,5 +5,5 @@
 #include "tokenize.hpp"

-#include <map>
+#include <unordered_map>

 #include <sstream>
@@ -27,5 +27,5 @@
   std::string result;
   std::uint64_t next_date = 0;
-  std::map<std::string, std::string> entries;
+  std::unordered_map<std::string, std::string> entries;

   while (std::getline(iss, line))
A very simple modification. What difference does it make?
Up to 30% improvement over the baseline, and 23% compared with the
previous implementation! Note that it does not matter whether the map is
small or large: it is always a win to use std::unordered_map
here.
Let's have a look at the perf
profile.
- 17% in std::unordered_map<std::string, std::string>::operator[].
- 17% in tokenize.
  - Mostly in std::vector<std::string>::push_back.
- 12% in std::unordered_map<std::string, std::string>::find.
- 7% in replace.
  - With std::string::find as the top offender.
- 6% in std::getline.
It looks like our maps are less of a problem now, and tokenize
is going
up with its uses of std::vector
.
We will see if we can do something about it in next week's post. Stay
tuned!