Lowering code dimension in librsvg by eradicating an pointless generic struct
Somebody talked about cargo-bloat the opposite day and it
jogged my memory that I’ve been desirous to measure the code dimension for
generic features in librsvg, and see if there are enhancements to be
made.
Cargo-bloat may give you a tough estimate of the code dimension for every
Rust crate in a compiled binary, and in addition a extra detailed view of the
quantity of code generated for particular person features. It wants a [bin]
goal to work on; when you have only a [lib]
, it won’t do
something. So, for librsvg’s functions, I ran cargo-bloat on the
rsvg-convert
binary.
$ cargo bloat --launch --crates
Completed launch [optimized] goal(s) in 0.23s
Analyzing goal/launch/rsvg-bench
File .textual content Measurement Crate
10.0% 38.7% 1.0MiB librsvg
4.8% 18.8% 505.5KiB std
2.5% 9.8% 262.8KiB clap
1.8% 7.1% 191.3KiB regex
... traces omitted ...
25.8% 100.0% 2.6MiB .textual content part dimension, the file dimension is 10.2MiB
Observe: numbers above are a consequence of guesswork. They are not 100% appropriate and by no means will be.
The output above is for cargo bloat --release --crates
. The
--release
possibility is to generate an optimized binary, and --crates
tells cargo-bloat to simply print a abstract of crate sizes. The numbers
usually are not utterly correct since, for instance, inlined features could
have an effect on callers of a specific crate. Nonetheless, that is adequate to
begin getting an thought of the sizes of issues.
On this case, the librsvg crate’s code is about 1.0 MB.
Now, let’s discover what generic features we might be able to condense.
When cargo-bloat is run with out --crates
, it prints the scale of
particular person features. After some experimentation, I ended up with
cargo bloat --release -n 0 --filter librsvg
. The -n 0
possibility
tells cargo-bloat to print all features, not simply the highest N largest
ones, and --filter librsvg
is to make it print features solely in
that crate, not for instance in std
or regex
.
$ cargo bloat --release -n 0 --filter librsvg
File .textual content Measurement Crate Identify
0.0% 0.0% 1.2KiB librsvg librsvg::factor::ElementInner<T>::new
0.0% 0.0% 1.2KiB librsvg librsvg::factor::ElementInner<T>::new
0.0% 0.0% 1.2KiB librsvg librsvg::factor::ElementInner<T>::new
... output omitted ...
0.0% 0.0% 825B librsvg librsvg::factor::ElementInner<T>::set_style_attribute
0.0% 0.0% 825B librsvg librsvg::factor::ElementInner<T>::set_style_attribute
0.0% 0.0% 825B librsvg librsvg::factor::ElementInner<T>::set_style_attribute
... output omitted ...
0.0% 0.0% 358B librsvg librsvg::factor::ElementInner<T>::get_cond
0.0% 0.0% 358B librsvg librsvg::factor::ElementInner<T>::get_cond
0.0% 0.0% 358B librsvg librsvg::factor::ElementInner<T>::get_cond
... and so on ...
After trying a bit on the output, I discovered the “duplicated” features
I needed to search out. What is going on right here is that ElementInner<T>
is
a kind with generics, and rustc is producing one copy of every of its
strategies for each kind occasion. So, there’s one copy of every methodology
for ElementInner<Circle>
, one for ElementInner<Rect>
, and so forth
for all of the SVG factor varieties.
The code round that may be a bit convoluted; it is in part of the
library that hasn’t gotten a lot cleanup after the C to Rust port and
preliminary refactoring. Let’s examine what it’s like.
The preliminary code
Librsvg parses the XML in an SVG doc and builds one thing that
resembles a DOM tree. The tree itself makes use of the rctree
crate; it has reference-counted nodes and features like
first_child
or next_sibling
. Nodes can characterize XML components, or
character content material inside XML tags. Right here we’re all for components
solely.
Take into account a component like this:
<path d="M0,0 L10,10 L0,10 Z" fill="black"/>
Let’s take a look at how librsvg represents that. Inside every
reference-counted node in an rctree
, librsvg retains a NodeData
enum
that may differentiate between components and character content material:
enum NodeData {
Aspect(Aspect),
Textual content(Chars),
}
Then, Aspect
is an enum that may distinguish between all of the
components within the svg
namespace that librsvg helps:
enum Aspect {
Circle(Field<ElementInner<Circle>>),
Ellipse(Field<ElementInner<Ellipse>>),
Path(Field<ElementInner<Path>>),
// ... about 50 others omitted ...
}
Inside every of these enum’s variants there’s an ElementInner<T>
, a
struct with a generic kind parameter. ElementInner
holds the information
for the DOM-like factor:
struct ElementInner<T: ElementTrait> {
element_name: QualName,
attributes: Attributes,
// ... different fields omitted
element_impl: T,
}
For the <path>
factor above, this struct would include the next:
element_name
: a professional identifypath
with ansvg
namespace.attributes
: an array of(identify, worth)
pairs, on this case(d, "M0,0 L10,10 L0,10 Z")
,
(fill, "black")
.element_impl
: A concrete kind,Path
on this case.
The specifics of the Path
kind usually are not terribly attention-grabbing right here;
it is simply the internal representation for Bézier paths.
struct Path {
path: Rc<SvgPath>,
}
Let’s take a look at the small print of the reminiscence format for all of this.
Preliminary reminiscence format
Right here is how the enums and structs above are specified by reminiscence, in
phrases of allocations, with out bearing in mind the rctree:Node
that wraps a NodeData
.
There may be one allotted block for the NodeData
enum, and that block
holds the enum’s discriminant and the embedded Aspect
enum. In
flip, the Aspect
enum has its personal discriminant and house for a
Field
(i.e. a pointer), since every of its variants simply holds a single
field.
That field factors to an allocation for an ElementInner<T>
, which itself
comprises a Path
struct.
It’s awkward that the fields to carry XML-isms like a component’s identify
and its attributes are in ElementInner<T>
, not in Aspect
. However
extra importantly, ElementInner<T>
has somewhat bunch of strategies:
impl<T: ElementTrait> ElementInner<T> {
fn new(...) -> ElementInner<T> {
// plenty of building
}
fn element_name(&self) -> &QualName {
...
}
fn get_attributes(&self) -> &Attributes {
...
}
// A bunch of different strategies
}
Nonetheless, none however certainly one of these strategies really use the element_impl: T
area! That’s, all of them do issues which can be widespread to all
factor varieties. The one methodology that basically offers with the
element_impl
area is the ::draw()
methodology, and the one factor it
does is to delegate all the way down to the concrete kind’s implementation of
::draw()
.
Eradicating that generic kind
So, let’s shuffle issues round. I did this:
-
Flip
enum Aspect
right into astruct Aspect
, with the fields widespread to all factor varieties. -
Have an
Aspect.element_data
area… -
… that’s of kind
ElementData
, an enum that truly is aware of about
all supported factor varieties.
There are not any varieties with generics in right here:
struct Aspect {
element_name: QualName,
attributes: Attributes,
// ... different fields omitted
element_data: ElementData,
}
enum ElementData {
Circle(Field<Circle>),
Ellipse(Field<Ellipse>),
Path(Field<Path>),
// ...
}
Now the reminiscence format seems like this:
One additional allocation, however let’s have a look at if this adjustments the code dimension.
Code dimension
We wish to know the scale of the .textual content
part within the ELF file.
# previous
$ objdump --section-headers ./goal/launch/rsvg-bench
Idx Identify Measurement VMA LMA File off Algn
15 .textual content 0029fa17 000000000008a060 000000000008a060 0008a060 2**4
(2750999 bytes)
# new
Idx Identify Measurement VMA LMA File off Algn
15 .textual content 00271ff7 000000000008b060 000000000008b060 0008b060 2**4
(2564087 bytes)
The brand new code is is 186912 bytes smaller. Not earth-shattering, however
cargo-bloat now not reveals duplicated features which don’t have any motive
to be monomorphized, since they do not contact the various information.
previous:
$ cargo bloat --release --crates
File .textual content Measurement Crate
10.0% 38.7% 1.0MiB librsvg
# traces omitted
25.8% 100.0% 2.6MiB .textual content part dimension, the file dimension is 10.2MiB
new:
$ cargo bloat --release --crates
File .textual content Measurement Crate
9.2% 37.5% 939.5KiB librsvg
24.6% 100.0% 2.4MiB .textual content part dimension, the file dimension is 10.0MiB
Much less code ought to assist a bit with cache locality, however the features
concerned usually are not in sizzling loops. Virtually all of librsvg’s time is
spent in Cairo for rasterization, and Pixman for compositing.
Dynamic dispatch
All of the concrete varieties (Circle
, ClipPath
, and so on.) implement
ElementTrait
, which has issues like a draw()
methodology, though that
shouldn’t be seen within the varieties above. That is what’s most handy
for librsvg; utilizing Field<ElementTrait>
for kind erasure could be somewhat
awkward there — we used it a very long time in the past, however not anymore.
Finally the code wants to search out the ElementTrait
vtable that
corresponds to every of ElementData
‘s variants:
let information: &dyn ElementTrait = match self {
ElementData::Circle(d) => &**d,
ElementData::ClipPath(d) => &**d,
ElementData::Ellipse(d) => &**d,
// ...
};
information.some_method_in_the_trait(...);
The ugly &**d
is to reach on the &dyn ElementTrait
that every
variant implements. It should get much less ugly when pattern matching for
boxes
will get stabilized within the Rust compiler.
This isn’t the one approach of doing issues. For librsvg it’s
handy to really know the kind of a component, that’s, to maintain
an enum
of the identified factor varieties. Other forms of code could also be
completely proud of the kind erasure that occurs when you will have a
Field<SomeTrait>
. If that code wants to return to the concrete kind,
another is to make use of one thing just like the
downcast-rs crate, which lets
you get well the concrete kind contained in the field.
Heap utilization really modified
It’s possible you’ll discover within the diagrams under that the unique NodeData
did not field its variants, however now it does.
Previous:
enum NodeData {
Aspect(Aspect),
Textual content(Chars),
}
New:
enum NodeData {
Aspect(Field<Aspect>),
Textual content(Field<Chars>),
}
One factor I did not discover throughout the first round of memory
reduction is that the NodeData::Textual content(Chars)
variant shouldn’t be
boxed. That’s, the scale of NodeData
enum is the scale of the
largest of Aspect
and Chars
, plus house for the enum’s
discriminant. I needed to make each variants the identical dimension, and by
boxing them they occupy solely a pointer every.
I measured heap utilization for a reasonably large SVG:
I used Valgrind’s Massif to measure peak reminiscence consumption throughout loading:
valgrind --instrument=massif --massif-out-file=massif.out ./goal/launch/rsvg-bench --num-load 1 --num-render 0 India_roadway_map.svg
ms_print massif.out
The very first thing that ms_print
reveals is an outline of this system’s
reminiscence utilization over time, and the listing of snapshots it created. The next is an extract of its output for the new model of the code, the place snapshot 36 is the one with peak reminiscence utilization:
MB
14.22^ :
| @#:::::::::::::::::::::
| @@@#: :::: :: ::: ::
| @@@@@#: :::: :: ::: ::
| @@@ @@@#: :::: :: ::: ::
| @@@ @ @@@#: :::: :: ::: ::
| @@@@ @ @@@#: :::: :: ::: ::
| @@@@@@@ @ @@@#: :::: :: ::: ::
| @@@@ @@@@ @ @@@#: :::: :: ::: ::
| @@ @@ @@@@ @ @@@#: :::: :: ::: :::
| @@@@ @@ @@@@ @ @@@#: :::: :: ::: :::
| @@ @@ @@ @@@@ @ @@@#: :::: :: ::: :::
| @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: :::
| @@@@@@ @@ @@ @@@@ @ @@@#: :::: :: ::: :::
| :@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
| @@@:@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
| @@@@@ @:@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
| :::@ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
| :@:: @ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
| @@@@::::@:: @ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#: :::: :: ::: ::@
0 +----------------------------------------------------------------------->Mi
0 380.9
Quantity of snapshots: 51
Detailed snapshots: [3, 4, 5, 9, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 (peak), 50]
Since we’re simply measuring reminiscence consumption throughout loading, the
chart reveals that reminiscence utilization climbs steadily till it peaks when the
full SVG is loaded, after which it stays kind of fixed whereas
librsvg does the preliminary CSS cascade.
The model of librsvg with out adjustments reveals this (notice how the massif
snapshot with peak utilization is quantity 39 on this one):
--------------------------------------------------------------------------------
n time(i) whole(B) helpful-heap(B) additional-heap(B) stacks(B)
--------------------------------------------------------------------------------
39 277,635,004 15,090,640 14,174,848 915,792 0
That’s, 15,090,640 bytes.
And after making the adjustments in reminiscence format, we get this:
--------------------------------------------------------------------------------
n time(i) whole(B) helpful-heap(B) additional-heap(B) stacks(B)
--------------------------------------------------------------------------------
36 276,041,810 14,845,456 13,935,702 909,754 0
I.e. after the adjustments, the height utilization of heap reminiscence when the entire
file is loaded is 14,845,456 bytes. So the adjustments above not solely
diminished the code dimension, but in addition barely lowered reminiscence consumption at runtime.
Good!
Wall-clock efficiency
This file shouldn’t be large — say, 15 MB when loaded — so no matter we gained
in reminiscence consumption is a negligible win. It is good to know that
code dimension could be diminished, however it’s not an issue for librsvg both
approach.
I did a number of measurements of the time utilized by the previous and new
variations to render the identical file, and there was no vital
distinction. It’s because though we could get higher cache locality
and the whole lot, the time spent executing the element-related code is
a lot smaller than the rendering code. That’s, Cairo takes up most
of the runtime of rsvg-convert
, and librsvg itself takes comparatively
little of it.
Conclusion
Not less than for this case, it was possible to cut back the quantity of code
emitted for generics, since this can be a case the place we positively did not
want generics! The code dimension within the ELF file’s .textual content
part shrank
by 186912 bytes, out of two.6 MB.
For code that does want generics, one can take totally different approaches.
For instance, a perform that take arguments of kind AsRef<Path>
can
first really receive the &Path
, after which move that to a perform
that does the actual work. For instance, from the usual library:
impl PathBuf {
pub fn push<P: AsRef<Path>>(&mut self, path: P) {
self._push(path.as_ref())
}
fn _push(&mut self, path: &Path) {
// plenty of code right here
}
}
The push
perform can be monomorphized into very tiny features
that decision _push
after changing what you handed to a &Path
reference, however the large _push
perform is just emitted as soon as.
There may be additionally the momo crate, which helps doing comparable issues
routinely. I’ve not used it but, so I can not remark additional on
it.
You may see the patches for librsvg in the merge request.