Diagram format engines: Minimizing hierarchical edge crossings
TALA is a brand new format engine designed particularly for software program structure. At its core, it combines many alternative graph visualization algorithms to supply aesthetic diagrams. This text is part of a sequence the place we dive into the internal workings of TALA.
At present’s matter is on minimizing edge crossings in hierarchical layouts.
For background, a hierarchical format is a diagram that splits all of the objects into layers of a hierarchy in a single route, e.g. top-down being the most common. Many format engines similar to Graphviz’s DOT are solely hierarchical, that means your diagram shall be a hierarchy whether or not or not the mannequin you supposed is hierarchical in nature. TALA, then again, identifies parts of the diagram which look like hierarchical, and selectively applies a hierarchical format algorithm to these parts.
An instance hierarchical diagram rendered by TALA.
The Drawback
Let’s begin with probably the most primary hierarchy: a -> b -> c
. There’s just one technique to prepare this hierarchically (assuming top-down).
Now let’s throw in 2 extra objects.
There may be now extra than simply the one technique to prepare. You may swap the nodes within the second or third layers, transfer a
to the left, transfer nodes to different layers.
All else being equal, the format engine ought to select the association that minimizes edge crossings. Typically, fewer edge crossings means a cleaner and most well-liked format.
We will fairly simply see {that a} swap in both the second or third layer ends in fewer crossings.
What about this diagram? Are you able to see what must rearrange to optimize this?
When the diagram will get bigger, it is not simple to search out the swaps and actions to scale back crossings. We definitely cannot eyeball it. One of these drawback is NP-hard: there’s no algorithm that’ll give an optimum answer for all instances in an inexpensive runtime. So we approximate with heuristics. The crossing minimization steps lined on this article rearrange the above diagram such that the variety of crossings is diminished from the 12 (pictured above), to the two (pictured beneath).
Assumptions
By the point the diagram reaches the crossing-minimization stage of this text, it has already handed by some related phases:
- Hierarchy identification
- Connection-breaking
- Layer task
These are meaty sufficient to be posts of their very own, so we’ll maintain the reasons of them transient. Additionally word that there are phases that run after this, so what we arrive at is not the top results of your entire engine.
Hierarchy identification
This stage asks inquiries to establish whether or not a subgraph ought to use the hierarchical format algorithms. For instance, does this subgraph have majority connections flowing in a single route? If not, a hierarchy is probably going a poor selection of format. Does it comprise particular shapes? If it has a sequence of steps, that takes priority over hierarchies.
Connection-breaking
Dummy nodes are nodes inserted in between lengthy connections. They exist just for the span of crossing minimization.
This helps cross-minimization localize its algorithm layer-by-layer, as a substitute of getting to contemplate adjoining nodes that skip layers.
Dummy nodes are represented with -1 all through this text.
Layer task
This stage turns an inventory of related nodes and assigns them their layers.
After this stage, layers are fastened, so a node can’t transfer up or down layers. The one factor they do in crossing minimization is transfer inside its personal layer.
A two-phase method
TALA’s technique for minimizing edge crossing is separated into two phases.
Section 1: Layer by layer crossing minimization
If we will run a heuristic search, we have to first outline the aim to attenuate, one thing quantifiable.
Check out the next distinction in a crossing vs non-crossing, how would possibly you generalize?
Within the first diagram, the place of the node in its row (index) and its adjoining node are the identical. Within the second, they’re swapped.
What if we add a node?
This produces one further crossing with one of many edges, however not the opposite. And the sting that it created the crossing for, the index distinction of the nodes elevated (the highest one is at index 2 whereas the underside one is at index 0).
When the index distinction between a node and its adjoining nodes is decrease, there are fewer crossings. Let’s name this distinction its “adjacency delta”. For instance, within the above diagram, this might be the calculation for the adjacency delta of the 2 backside nodes:
- Left node: 2 (as a result of the highest node it is related to is in place 2, and it is in place 0).
- Proper node: 0.5 (it is related to 2 prime nodes, at positions 0 and 1, and we simply take the common, which is 0.5. The node itself is in place 1, so the delta is 0.5).
Generalized, for your entire hierarchy, we need to prepare the nodes such that the sum of the adjacency deltas for all nodes is minimized.
If we swap the underside two nodes, the numbers change:
- Left node: 1 (its place is 1, the highest related node place is 2, delta is 1)
- Proper node: 0.5 nonetheless
General, that swap decreases the sum of adjacency deltas.
What makes this quite a bit more durable in a hierarchy is that nodes are sometimes interconnected. While you transfer one node, you are affecting a number of adjacency deltas: that node’s plus all of its adjoining nodes. And it cascades.
So, as a substitute of making an attempt to optimize globally (take into account all nodes within the hierarchy), we simplify and optimize one layer at a time on this part.
Per layer, rearrangements occur in two extra, nested, steps:
- Contemplate just one facet (both prime or backside) and type primarily based on their adjacency deltas.
- Contemplate each side, however solely enable swapping with sibling neighbors to not utterly overwrite step 1.
The purpose of getting completely different targets is so to flee native minimas. Each of those algorithms will run layer by layer, top-down, a number of occasions, stopping when an optimum format (no modifications in a complete iteration) is reached or the max variety of iterations is hit.
That is the pseudocode:
for i = 0 to N:
for layer in topDown(graph):
sortByUpperAdjacency(layer)
for node in layer:
bestSwap = findBestSwapNeighbor(node, layer)
if bestSwap:
swap(node, bestSwap)
This is an instance:
From this prime layer, node a
has an adjacency delta of 5/3, as a result of its neighbors are at positions 0, 1, and 4, and the common is 5/3. Node c
has 3, and node b
has 5/3 as properly.
So based on the algorithm, the type will end in a
, b
, c
.
Shifting to the second layer, that is the ensuing rearrangement. We can’t present all of the calculations once more, however the type strikes one node.
Since we’re on the second layer, we search for swaps by contemplating each above and beneath.
Discover how there is a crossing between h
and j
, and -1
and d
. Swapping h
and -1
reduces their adjacency deltas, eradicating that crossing.
e
and f
can equally be improved by swapping.
We then proceed down all the way in which to the underside. As soon as it is made it by all of the layers, that is thought of one move. We carry out a number of passes (the precise quantity is set by the variety of layers and max width of layers).
Section 2: Sifting
Within the earlier instance, we converged on an optimum state after a pair passes. The format engine nonetheless has steps to do after this, however for crossing-minimization, there is no such thing as a technique to rearrange for much less crossings.
However that is not all the time the case. It may very well be that they only shuffle round, getting caught at an area minima (even with the makes an attempt to flee it).
That is what this second part is for.
Sifting goes by the hierarchy, node by node ordered by its diploma (variety of connections), as a substitute of the layer by layer method within the first part. For every one among these nodes, we merely examine if swapping with any sibling in the identical layer would end in much less crossings (taking a look at each side), and if that’s the case make that swap.
for i = 0 to M:
Q = sortNodesByDegree(G)
for node in Q:
for sibling in layer:
swap(node, sibling)
countCrossings()
swap(sibling, node)
moveToMinEdgeCrossings(node)
This is an instance of sifting. Word that the diagram is again to the beginning place earlier than part 1, in order that we are able to show on an already-familiar diagram.
The very best diploma node is f
, so we strive swapping it amongst indices in its layer, protecting observe of the place with the bottom crossing depend.
Then we transfer onto node a
, as a result of it has the second highest diploma. It has no swaps that’d enhance edge crossing depend.
Subsequent up is node c
, which is swapped with b
.
And it continues swapping nodes round till all nodes are processed. Then it repeats, as long as there have been modifications within the earlier iterations or we attain the utmost 10 iterations.
Contemplating containers
TALA checks on tons of of real-world software program structure diagrams, and we regularly make changes and issues for that individual use case. The commonest one which we continually have to include into our implementation is the help of containers. It is hardly ever talked about in graph visualization publications, but it surely’s important for software program structure and nontrivially impacts algorithms.
This is an instance of containers showing inside hierarchies. It provides the constraint that nodes contained in the container can’t transfer out of it, and vice versa. So, making a swap demonstrated right here can be unlawful.
As a substitute of constructing a swap between these two nodes, we have modified the algorithm to carry out grouped swaps. As a substitute of a single node swapping, if it is inside a container, the entire container swaps with the node, carrying its descendents with it.
Container descendents can solely order and swap amongst themselves.
This was an easy change to Section 2’s sifting: simply filter the listing of siblings that swaps take into account. Nevertheless, for Section 1, the algorithm has to vary to be recursive, going from least nested to most nested nodes.
The existence of containers additionally opens up the opportunity of a swap that does not lower crossing minimizations however does lower edge size. Contemplate the next. The 2 nodes within the container can swap for a format with shorter edge lengths. This would not be attainable with out containers, because the remoted node simply would not be a part of the hierarchy.
Conclusion
There’s numerous completely different algorithms that exist and work for hierarchical layouts. It is a cool side of NP-hard issues: there’s usually extra variety of attention-grabbing concepts. What’s extra, every algorithm can have numerous weights and tuning, and randomization strategies that additional cut up up the breadth of outcomes. There is no such thing as a one proper approach. D2 bundles Dagre and ELK, that are each hierarchical layouts, and it’s normal for all three to present completely different layouts for a similar diagram!
TALA’s dealing with of hierarchies is an space of energetic enchancment. A kind of enhancements is in how we establish hierarchies. There’s many attention-grabbing heuristics to make use of, and it is a distinctive benefit of TALA that it might probably select whether or not to enter the hierarchical algorithms or not. For instance, on this diagram, the creator didn’t image a hierarchical mannequin of their head, however utilizing Dagre will end in a hierarchical format regardless:
Whereas TALA selected to make use of an orthogonal format as a substitute:
If you would like to make these diagrams your self, it is quick and straightforward with D2, an open-source, fashionable scripting language for turning textual content to diagrams.
References
If you would like to be taught extra: