A journey to faster C++ compile times

2024-01-06 13:21:06

In this post I’ll talk about bringing compile times of the {fmt} library
on par with the C standard I/O library (stdio).

First some background: {fmt} is a popular open-source formatting library for
C++ that provides a better alternative to C++ iostreams and C stdio. It has
already surpassed stdio in many areas:

  • Type safety with compile-time format string checks available by
    default since C++20 and as an opt-in for C++14/17. Runtime format strings
    are also safe to use in {fmt}, which is impossible to achieve in printf.
  • Extensibility: user-defined types can be made formattable and most
    standard library types such as containers, dates and times are formattable
    out of the box.
  • Performance: {fmt} is significantly faster than common standard library
    implementations of printf, in some cases by an order of magnitude
    (e.g. on floating-point formatting).
  • Portable Unicode support.

However, one area where stdio remained significantly better was compile times.

We’ve put a lot of effort into optimizing compile times in {fmt} by applying
type erasure on both argument and output level, limiting templates to a small
top-level API layer and introducing fmt/core.h with minimal dependencies.
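
To illustrate the idea of a thin template layer over type-erased arguments,
here is a minimal sketch. All names in it (erased_arg, vcount_strings,
count_strings) are hypothetical and are not {fmt}'s actual internals; it only
shows the general shape of the technique and assumes C++14 or later.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical namespace, not part of {fmt}

enum class arg_type { int_type, cstring_type };

// A type-erased argument: a tag plus a union of the supported value types.
struct erased_arg {
  arg_type type;
  union {
    long long int_value;
    const char* string_value;
  };
  constexpr erased_arg(int v) : type(arg_type::int_type), int_value(v) {}
  constexpr erased_arg(const char* s)
      : type(arg_type::cstring_type), string_value(s) {}
};

// Non-template worker: compiled once, no matter how many different
// combinations of argument types appear at call sites.
constexpr std::size_t vcount_strings(const erased_arg* args, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (args[i].type == arg_type::cstring_type) ++count;
  return count;
}

// Thin template layer: it only erases the argument types and forwards.
// This sketch requires at least one argument.
template <typename... T>
constexpr std::size_t count_strings(const T&... args) {
  const erased_arg erased[] = {erased_arg(args)...};
  return vcount_strings(erased, sizeof...(T));
}

}  // namespace sketch
```

Each distinct call such as sketch::count_strings(1, "a", 2) instantiates only
the small forwarding template, while the real work lives in one non-template
function.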

This made {fmt} faster to compile than C++ alternatives such as
iostreams, Boost Format and Folly Format but couldn’t close the gap with stdio.
We knew that the bottleneck was in the <string> dependency but it was also
needed for the main API, fmt::format.

Over time it has become clear that there are use cases that don’t need (or want)
std::string. Quoting Sean Middleditch’s comment on GitHub:

If I don’t use std::string (and I don’t) I don’t want to pull in the heavy
dependencies of that header and to every single TU that might do some
formatting (and hence needs to have access to formatter<> specializations).

{fmt} has become increasingly used for I/O and logging libraries where
std::string objects may only appear as arguments at some call sites.

And the most important use case of them all is, of course, Godbolt where people
often use {fmt} to print things, especially the ones not supported by printf,
and a few hundred milliseconds of overhead is noticeable.

However, it’s hard to avoid <string> in C++. If you are using any
part of the library it will likely get pulled in transitively. Also compile
times weren’t terrible and there were more important things to do, so for a
while I haven’t been actively working on it.

C++20 changed the situation dramatically. Consider the following Hello World
program with basic formatted output (hello.cc):

#include <fmt/core.h>

int main() {
  fmt::print("Hello, {}!\n", "world");
}

With C++11 it took ~225ms to compile it with clang on an M1 MacBook Pro
(here and below I report the best of three runs):

% time c++ -c hello.cc -I include -std=c++11
c++ -c hello.cc -I include -std=c++11  0.17s user 0.04s system 90% cpu 0.225 total

With C++20 it now takes ~319ms, about 40% more:

% time c++ -c hello.cc -I include -std=c++20
c++ -c hello.cc -I include -std=c++20  0.26s user 0.05s system 95% cpu 0.319 total

For comparison, here is an equivalent C program (hello-stdio.c):

#include <stdio.h>

int main() {
  printf("Hello, %s!\n", "world");
}

and it takes only ~59ms:

% time cc hello-stdio.c
cc hello-stdio.c  0.05s user 0.02s system 121% cpu 0.059 total

So thanks to uncontrolled standard library bloat between C++11 and C++20 we are
now more than 5 times slower to compile than printf, all because of the <string>
include. Can we do something about it?

It turned out that with type erasure the dependency on std::string in
fmt/core.h has become minimal so I decided to give it a try and remove this
dependency. But first let’s see what’s going on by getting a time trace:

c++ -ftime-trace -c hello.cc -I include -std=c++20

and opening hello.json in Chrome using
chrome://tracing/:

The time spent in fmt/core.h itself is only ~7.5ms and most of the time is
spent in includes:

  • <iterator>: ~71ms
  • <memory>: ~37ms
  • <string>: ~122ms (highlighted on the above trace)

OK, <string> is indeed the worst but what about the other ones? Unfortunately
removing the other includes doesn’t change the situation because the amount of
stuff pulled in transitively remains roughly the same. These headers show up on
the trace only because they happen to be included before <string>.

Thorough research (googling) revealed that we can actually do something about it
in libc++ thanks to _LIBCPP_REMOVE_TRANSITIVE_INCLUDES. Let’s try
it out:

% time c++ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -c hello.cc -I include -std=c++20
c++ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -c hello.cc -I include -std=c++20  0.18s user 0.03s system 91% cpu 0.231 total

So this reduced the compile time to ~231ms, almost C++11 level, which is good
but still a far cry from stdio.

But without spaghetti transitive dependencies it now makes sense to get rid of
<iterator> and <memory>.

<memory> is only used in one place for std::addressof to work around a broken
implementation of std::vector<bool>::reference in libc++ that provides a very
innovative overload of unary operator&. Here’s this usage:

custom.value = const_cast<value_type*>(std::addressof(val));

We can replace it with a few casts at the cost of not being able to directly
format std::vector<bool>::reference at compile time, which is a tradeoff I can
live with:

if constexpr (std::is_same<decltype(&val), T*>::value)
  custom.value = const_cast<value_type*>(&val);
if (!is_constant_evaluated())
  custom.value = const_cast<char*>(&reinterpret_cast<const char&>(val));
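
To make the two branches more concrete, here is a minimal self-contained
sketch of the same idea. The names has_plain_address, raw_address, plain and
proxy are hypothetical and chosen for illustration only; proxy stands in for
a type that, like the libc++ std::vector<bool>::reference mentioned above,
overloads unary operator&.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of {fmt}

struct plain {
  int x;
};

// A type with an overloaded unary operator& that does not return an address.
struct proxy {
  int x;
  int operator&() const { return 0; }
};

// True when &val yields an ordinary T*, so taking the address with the
// built-in operator is safe and usable at compile time.
template <typename T>
using has_plain_address =
    std::is_same<decltype(&std::declval<T&>()), T*>;

// Obtains the real address while bypassing any overloaded operator&.
// reinterpret_cast is not allowed in constant evaluation, so this path
// is runtime-only, which mirrors the tradeoff described above.
template <typename T>
const void* raw_address(const T& val) {
  return &reinterpret_cast<const char&>(val);
}

}  // namespace sketch
```

For plain the trait is true and the built-in & can be used directly; for
proxy it is false, so only the cast-based path gives the object's address.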

Now that we don’t have <memory> (I would prefer not to have memory of this
workaround) this brings the time down to ~195ms, better than the initial C++11
time.

Removing <iterator> is a bit trickier because we use back_insert_iterator
to detect and optimize formatting into contiguous containers. Unfortunately it’s
not even possible to detect it with SFINAE because back_insert_iterator has
the same API shape as front_insert_iterator. There are a number of solutions to
this problem such as moving the optimization to fmt/format.h. For now I added
a simple local replacement, fmt::back_insert_iterator. Without <iterator>
the time was down to ~178ms.
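
As a rough illustration of what a local replacement can look like, here is a
minimal sketch of an appending output iterator that does not need <iterator>,
together with a trait that recognizes it by type. It is an assumption about
the general shape, not the actual fmt::back_insert_iterator; the namespace,
small_buffer and demo are hypothetical, and the iterator omits the member
typedefs a fully conforming iterator would provide. It assumes C++14 or later.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of {fmt}

// A minimal output iterator that appends to a container via push_back.
template <typename Container>
class back_insert_iterator {
 private:
  Container* container_;

 public:
  constexpr explicit back_insert_iterator(Container& c) : container_(&c) {}

  // Assigning a value through the iterator appends it to the container.
  constexpr back_insert_iterator& operator=(
      const typename Container::value_type& value) {
    container_->push_back(value);
    return *this;
  }

  // Dereference and increment are no-ops, as for the standard counterpart.
  constexpr back_insert_iterator& operator*() { return *this; }
  constexpr back_insert_iterator& operator++() { return *this; }
  constexpr back_insert_iterator operator++(int) { return *this; }
};

// Because the iterator is a known class template, it can be detected by
// exact type instead of by API shape.
template <typename It>
struct is_back_insert_iterator : std::false_type {};
template <typename Container>
struct is_back_insert_iterator<back_insert_iterator<Container>>
    : std::true_type {};

// A tiny fixed-capacity container used only to exercise the iterator.
struct small_buffer {
  using value_type = char;
  char data[16] = {};
  std::size_t size = 0;
  constexpr void push_back(char c) { data[size++] = c; }
};

// Writes two characters through the iterator and returns the new size.
constexpr std::size_t demo() {
  small_buffer buf;
  auto out = back_insert_iterator<small_buffer>(buf);
  *out++ = 'h';
  *out++ = 'i';
  return buf.size;
}

}  // namespace sketch
```

Detecting the iterator by its exact type sidesteps the problem that
back_insert_iterator and front_insert_iterator cannot be told apart by their
interface alone.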

This would be the time to tackle <string> but as it turned out we also
conditionally included <string_view> or <experimental/string_view> (sigh).
It doesn’t add any overhead directly because it’s pulled in from <string>
anyway but we need to remove one in order to get rid of the other. We already
have a trait in ranges to detect std::string_view-like API which we can
apply here with some simplifications:

template <typename T, typename Enable = void>
struct is_string_like : std::false_type {};

// A heuristic to detect std::string and std::string_view.
template <typename T>
struct is_string_like<T, void_t<decltype(std::declval<T>().find_first_of(
                             typename T::value_type(), 0))>> : std::true_type {
};

This can give false positives but they are benign since the worst thing that
can happen is that your type that looks like a string will be formatted as a
string. If you don’t want this you can always opt out.
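
The following self-contained sketch exercises the trait above, including a
benign false positive. The void_t helper is spelled out because the snippet
relies on one being available, and the lookalike type is hypothetical. The
demo includes <string> and <vector> only to have something to test against.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of {fmt}

// A C++11-compatible void_t.
template <typename...>
struct make_void {
  using type = void;
};
template <typename... T>
using void_t = typename make_void<T...>::type;

template <typename T, typename Enable = void>
struct is_string_like : std::false_type {};

// A heuristic to detect std::string and std::string_view.
template <typename T>
struct is_string_like<T, void_t<decltype(std::declval<T>().find_first_of(
                             typename T::value_type(), 0))>> : std::true_type {
};

// Not a string, but it has the same find_first_of shape, so the heuristic
// accepts it: an example of a false positive.
struct lookalike {
  using value_type = char;
  std::size_t find_first_of(char, std::size_t) const { return 0; }
};

}  // namespace sketch
```

Here std::string and lookalike are both classified as string-like, while int
and std::vector<int> are not.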

Now to the final boss, <string>. There were very few references to
std::string in fmt/core.h. However, we also had std::char_traits which
was used in our fallback implementation of string_view needed for
compatibility with C++11. char_traits didn’t bring much value and was easy to
replace with C functions such as strlen and their fallbacks for constexpr.
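
A minimal sketch of such a replacement might look as follows. The function
names are hypothetical, and the dispatch through std::is_constant_evaluated
is an assumption about one reasonable way to combine strlen with a constexpr
fallback; it is only enabled when the standard library advertises that
feature, and the sketch assumes C++14 or later.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of {fmt}

// Fallback usable in constant evaluation: a plain loop over the characters.
constexpr std::size_t constexpr_strlen(const char* s) {
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

// Uses strlen at run time when the compiler can tell the two contexts
// apart, and the loop otherwise.
constexpr std::size_t length(const char* s) {
#ifdef __cpp_lib_is_constant_evaluated
  if (!std::is_constant_evaluated()) return std::strlen(s);
#endif
  return constexpr_strlen(s);
}

}  // namespace sketch
```

With this in place a string_view fallback can compute its size from a C
string without std::char_traits<char>::length.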

The only API that used std::string was fmt::format. One option was moving it
to fmt/format.h but this would be a big breaking change. So I decided to do
something terrible but non-breaking and forward declare std::basic_string
instead. Doing such things is frowned upon but it’s not the worst thing we had
to do in {fmt} to work around limitations of C and C++ standard libraries. Here
is a slightly simplified version:

#ifdef FMT_BEGIN_NAMESPACE_STD
FMT_BEGIN_NAMESPACE_STD
template <typename Char>
struct char_traits;
template <typename T>
class allocator;
template <typename Char, typename Traits, typename Allocator>
class basic_string;
FMT_END_NAMESPACE_STD
#else
# include <string>
#endif

FMT_BEGIN_NAMESPACE_STD and FMT_END_NAMESPACE_STD are defined depending on
the implementation. Currently both major standard libraries, libstdc++ and
libc++, are supported.

Obviously this didn’t work with our definition of fmt::format:

template <typename... T>
 FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<char> {
   return vformat(fmt, fmt::make_format_args(args...));
 }

giving the following error:

In file included from hello.cc:1:
include/fmt/core.h:2843:31: error: implicit instantiation of undefined template 'std::basic_string<char, std::char_traits<char>, std::allocator<char>>'
FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
                              ^

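As usual in C++, the solution is more levels of indirection, or rather
templates. Making the return type depend on a template parameter defers the
need for a complete basic_string until format is actually instantiated: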

template <typename... T, typename Char = char>
 FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<Char> {
   return vformat(fmt, fmt::make_format_args(args...));
 }
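
The effect of the extra template parameter can be reproduced in isolation.
The sketch below uses a hypothetical namespace lib and a hypothetical class
template basic_text as stand-ins for std and std::basic_string, so that it
does not forward declare anything in the real standard library; make_text is
likewise made up for illustration. It assumes C++14 or later.

```cpp
#include <cstddef>

namespace lib {  // hypothetical stand-in for namespace std

// Forward declaration only: basic_text is incomplete at this point.
template <typename Char>
class basic_text;

// The return type depends on Char, so it is not instantiated here.
// Writing basic_text<char> instead would require a complete type at the
// point of definition and fail to compile.
template <typename... T, typename Char = char>
constexpr auto make_text(const T&...) -> basic_text<Char> {
  return basic_text<Char>(sizeof...(T));
}

// The full definition arrives later, like a heavy header that only the
// callers who need it include.
template <typename Char>
class basic_text {
 private:
  std::size_t size_;

 public:
  constexpr explicit basic_text(std::size_t size) : size_(size) {}
  constexpr std::size_t size() const { return size_; }
};

}  // namespace lib
```

A call such as lib::make_text(1, 2, 3) compiles because basic_text<char> is
complete by the time the function template is instantiated.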

Was it worth it? Let’s check:

% time c++ -c hello.cc -I include -std=c++20
c++ -c hello.cc -I include -std=c++20  0.04s user 0.02s system 81% cpu 0.069 total

We’re down from ~319ms to ~69ms and don’t even need
_LIBCPP_REMOVE_TRANSITIVE_INCLUDES any more. With all the optimizations,
fmt/core.h is now comparable to stdio.h in terms of compile times, with
only ~17% difference in our test. I think this is a very low price to pay for
improved safety, performance and extensibility.

P.S. After the optimization stdio.h is now the second heaviest include, adding
a whopping 5ms to compile times.


Last modified on 2024-01-06
