Now Reading
Simulation Islands :: Box2D

Simulation Islands :: Box2D

2023-10-10 02:55:58

Island administration is a basic low degree function of physics engines and may have a big effect on solver design and efficiency. This was one of many first issues I made a decision to work on for Box2D model 3 (v3).

Since I started engaged on v3 I have been evaluating a number of algorithms for island administration. My purpose has been to make island constructing scale higher with a number of CPU cores. Listed here are the three approaches I’ve thought of:

  • Depth-first search (DFS)
  • Parallel union-find
  • Persistent islands

Let’s dive in!

What are islands?

You’ll be able to view a inflexible physique simulation as a graph the place our bodies are nodes and constraints are edges. Constraints include contacts (shapes touching) and joints (hinges, sliders, and many others). From the graph perspective contacts and joints are the identical.

In graph principle islands are referred to as connected components. In inflexible physique simulation static our bodies will not be a part of the graph. For instance, this picture exhibits three islands.

islandsIslands

Static our bodies will not be a part of the island graph as a result of they are often shared amongst a number of islands. They want to shared as a result of in a online game a number of our bodies will likely be touching the static floor. With out sharing the static our bodies amongst a number of islands there will likely be far fewer islands. Islands then lose their worth. Islands will not be required for correctness. They’re an optimization (as we will see) and so they solely work effectively as an optimization if there are numerous islands. Static our bodies can be shared as a result of nothing within the simulation impacts their place or velocity. Sharing static our bodies amongst a number of islands requires just a little little bit of care within the algorithms, however that is straight ahead and for simplicity I cannot embrace such logic in my examples.

In follow an island is an information construction that retains monitor of a set of our bodies and a set of constraints. That is the island information construction in Box2D v2.4:

class b2Island
{
    b2Body** our bodies;
    b2Contact** contacts;
    b2Joint** joints;
    int bodyCount;
    int contactCount;
    int jointCount;
};

The island makes use of arrays of pointers as a result of the island doesn’t personal the our bodies or constraints. An island is a subset of the simulation world.

Customers of Box2D do not work together with islands. They’re an implementation element of the solver. Box2D v2.4 doesn’t retain islands throughout time steps, so there isn’t any technique to entry them.

Sleeping and waking

Islands are very helpful in sport physics. An important facet is sleeping. Sleeping inflexible our bodies are faraway from the solver and this drastically reduces their CPU load. The engine checks islands throughout simulation for low vitality. If all our bodies in an island have low vitality for a number of time steps, then all of the our bodies within the island are flagged as sleeping and now not added to the energetic simulation. The Box2D debug show exhibits sleeping our bodies as grey.

sleep

Sleeping have to be finished per island, not per physique. I made this error early in my work on sport physics. In case you put a physique to sleep however not the entire island then you will note shapes start to overlap and joints start to dislodge.

Sleep has a flip facet as effectively: waking.

Our bodies wake different sleeping our bodies by constraints and this should propagate by all touching our bodies instantly. In any other case you get these nasty overlapping shapes and dislodged joints. Islands are simply as vital for waking as for sleeping.

Parallel islands

Islands work effectively for multicore simulation. Sport worlds typically have many separate islands of our bodies. A ragdoll over right here. A pile of particles over there. This pertains to the idea of spatial coherence as described by Gino van den Bergen in his e book Collision Detection in Interactive 3D Environments. Spatial conherence says that inflexible our bodies are inclined to unfold out and we will benefit from this to enhance efficiency and decrease reminiscence utilization.

An island might be simulated indepedently from different islands. Subsequently an island might be simulated on a thread and you may ship a number of islands to a number of threads utilizing a activity system akin to enkiTS. This scales very effectively with CPU core depend so long as there are a adequate variety of islands. That is additionally cache environment friendly as a result of the core is doing vital work on a subset of the simulation world.

Parallel islands simulation just isn’t a silver bullet. There will not be sufficient islands to span all of the CPU cores. In some video games there could also be a big pile or tower of inflexible our bodies which can be in a single island. This huge single island can dominate efficiency even with multithreading. It could actually result in a scenario the place many cores are sitting idle. I plan to debate giant islands in a future put up.

towerGiant Island

Box2D v2.4 makes use of depth-first search DFS to construct islands from scratch each time step. Model 2.4 doesn’t have multithreading assist, so islands are solely used for sleeping and waking.

The SolveIslands operate takes all of the our bodies and constraints of the simulation world and finds their simulation islands. Earlier than discovering the islands, it clears a mark (visitation flag) on each physique and constraint. These marks be certain that a physique or constraint is simply added to a single island.

The algorithm has a simulation island G that’s reused for every island discovered. The island finder loops over all our bodies and appears for awake dynamic our bodies to be the seed for the island. Ranging from the seed, linked our bodies and constraints are added to G. As soon as all of the linked our bodies have been traversed the island G is simulated, updating the constraint forces and the physique velocities and positions. Then the algorithm continues searching for the following seed physique that hasn’t already been simulated within the present time step.

operate SolveIslands(our bodies, constraints)
    ClearMarks(our bodies)
    ClearMarks(constraints)
    let G be an island

    for seed in our bodies
        if seed not marked and seed is dynamic and seed is awake
            Clear(G)
            Mark(seed)
            let S be a stack
            S.Push(seed)
            whereas S not empty
                b = S.Pop()
                G.Add(b)
                for c in b.GetConstraints()
                    if c not marked
                        G.Add(c)
                        Mark(c)
                        different = OtherBody(c)
                        if different not marked
                            S.Push(different)
            G.Simulate()

DFS has linear time complexity within the variety of our bodies. It may be made quicker by solely utilizing awake our bodies as seeds. This may be finished by retaining an array of awake dynamic our bodies.

Traversal mark administration may also be costly. At the price of extra reminiscence, every physique and constraint can maintain a time step counter. The DFS can then examine a physique’s time step counter with the present time step depend of the physics world. Marking a physique or constraint is finished by setting the native time step counter equal to the world time step depend.

The DFS naturally wakes our bodies. When a physique is added to an island it’s flagged as awake. A physique can solely be flagged as sleeping with one other routine referred to as IslandSleep.

A part of island simulation is checking for sleeping. Each physique has velocity and a sleep timer. If the rate is bigger than a velocity threshold then sleep timer is reset. In any other case it’s superior by the point step. If the minimal sleep timer of all our bodies in an island is bigger than a time to sleep threshold then your complete island is marked as sleeping.

operate IslandSleep(islandBodies, timeStep)
    minSleepTime = MAX_FLOAT
    for b in islandBodies
        if b velocity > velocityThreshold
            b.sleepTime = 0
            minSleepTime = 0
        else
            b.sleepTime += timeStep
            minSleepTime = Min(minSleepTime, b.sleepTime)

    if minSleepTime > timeToSleep
        for b in islandBodies
            MarkSleeping(b)

Then the following time step no collisions between sleeping our bodies are up to date and sleeping our bodies will not be added to simulation islands except an awake physique begins interacting with a sleeping physique. It is a big win for efficiency in giant sport worlds.

Watch this video if you’d like extra particulars on DFS:

Union-find

Union-find (UF) is a competitor to DFS for sport physics island simulation. This algorithm can also be referred to as a Disjoint-set data structure.

UF focuses on merging units (islands). Every dynamic physique begins alone in its personal set. As constraints are added to the world the units are merged. Inside every set a single physique is taken into account the basis physique and uniquely identifies the set. In spite of everything constraints have been added the union-find is full.

This video collection explains union-find very effectively: https://youtube.com/playlist?list=PLDV1Zeh2NRsBI1C-mR6ZhHTyfoEJWlxvq&si=brluQ5CldgIXFMhv. If you’re not acquainted with union-find and/or path compression, please go watch this and are available again.

Like DFS, union-find (with path compression) has linear time complexity for constructing islands. Union-find generally is a drop-in alternative for DFS. Nevertheless, some further care have to be used to make sure that all touching our bodies are awakened.

The Amdahl drawback

DFS and UF are quick however they aren’t parallel algorithms. So after I started multithreading Box2D I began to get timing outcomes like this:

island gap

That is Amdahl’s law in motion. Multi-core scaling is proscribed by single-threaded processing. Additionally, you will discover a single-threaded broadphase part which I will talk about in a future put up.

Parallel union-find

On the GDC 2022, Jorrit Rouwe launched a parallel union-find algorithm. The slides are here. You too can discover an implementation in Jolt Physics.

I applied this algorithm in Box2D v3. Within the course of I realized loads about utilizing atomics and lock-free algorithms. If you wish to study extra about atomics and the C/C++ reminiscence mannequin, I extremely advocate this two-part video by Herb Sutter:

The primary problem I confronted was making the algorithm recursive in order that it might wake whole islands in a single time step. I wanted so as to add some further iteration to make sure that all touching our bodies are woken up. After fumbling with atomics a bit, I acquired one thing working.

Nevertheless, I rapidly bumped into an issue and did a number of head scratching attempting to determine the issue. Fortuitously I nonetheless have loads of hair.

Parallel union-find makes use of atomic compare-and-swap (CAS) when merging islands. This implies the order of constraints within the ensuing islands is non-deterministic. The ultimate order relies on which core occurs to be quicker at finishing the CAS. This may change dramatically throughout time steps and repeated runs of the applying.

It is a major problem for the solver utilized in sport physics as a result of the constraints are solved separately, sequentially. The outcomes from every constraint propagate to the following constraint. This contraint solver algorithm is formally referred to as the Gauss-Seidel method. The issue is magnified as a result of Box2D makes use of heat beginning for contacts, which will get confused when the contraint order varies every time step. You’ll be able to learn extra about Gauss-Seidel and heat beginning here.

The constraints might be sorted after union-find and they are often sorted for every island independently. Nevertheless, I couldn’t discover a quicker method to do that than by utilizing quicksort which is O(n log n). For big islands this added value is critical, even when sorting a minimal index array. This video exhibits the impact of a parallel union-find and the ensuing non-deterministic constraint order.

The serial DFS and UF do not need this drawback. They generate constraints in a deterministic order. The lack of determinism is because of multithreading and the usage of atomic CAS. If you wish to study extra about determinism and multithreading, I like to recommend this video:

What’s subsequent?

After implementing the parallel union-find I took a step again and reconsidered my choices and tried to take a look at the large image.

First, I made a decision that I didn’t need to sacrifice determinism. As a programmer I worth predictability within the software program I write. Determinism impacts my potential to debug and any Box2D person’s potential to debug. I am keen to surrender efficiency for my software program to be debuggable.

Second, Amdahl says that the serial a part of a program dominates scaling. Nevertheless, having a really quick serial half is likely to be okay. Not each algorithm must be parallel whether it is quick sufficient. So possibly I can discover one thing quicker than DFS or serial UF.

Third, Gino additionally describes temporal coherence in his e book. Temporal coherence says the configuration of inflexible our bodies in a simulation doesn’t change a lot throughout a single time step. How typically do islands change? I think islands change slowly.

Fourth, possibly the price of islands might be unfold out in different methods. For instance, it’s pressing that islands are merged inside the present time step. That is essential to wake islands instantly. However it isn’t pressing to place islands to sleep or to separate islands when our bodies cease interacting. Delayed sleep is a minor efficiency loss.

Persistent islands

The factors above led to me examine persistent islands. Persistent islands are retained throughout time steps. They should assist including and eradicating our bodies and contraints incrementally.

When a dynamic physique is created it additionally creates a brand new island: an island of a single physique. Static our bodies don’t get islands.

When a constraint between two dynamic our bodies is created the 2 related islands are merged. It might be the case that the our bodies are already in the identical island and that is an early out within the island merger.

When a dynamic physique is destroyed it’s faraway from its island. If that island is now empty then the island can also be destroyed.

When a constraint is destroyed then it’s faraway from the island. The island turns into a candidate for splitting.

I must merge persistent islands instantly. That is vital to make sure that all our bodies that ought to be awake are added to the energetic simulation.

Persistent island splitting might be deferred. I can put a quota on island splitting. For instance, one island per time step can break up. Perhaps I can select to separate the biggest island first. Or possibly I can select to separate the island with probably the most constraints eliminated. Some heuristic can be utilized. I think any affordable heuristic is okay.

I modified the v2.4 island construction to assist including and eradicating our bodies and constraints rapidly utilizing linked lists. This additionally makes it quick to merge islands. Linked lists are sluggish to traverse due to cache misses, however I think I cannot must traverse them typically. I additionally added a flag maySplit to point that the island has had contacts or joints eliminated and it might be doable to separate the island into two or extra islands.

struct b2Island
{
	int index;
	int parentIsland;

	int headBody;
	int tailBody;

	int headContact;
	int tailContact;

	int headJoint;
	int tailJoint;

	bool maySplit;
};

If I’ve an current set of islands I can add an edge and this may increasingly result in two islands merging. Island merging ought to be quick and it needs to be deterministic. I made a decision to stay with serial union-find for merging islands.

Right here is the code for including a contact constraint. Joints are related. I abbreviated the code a bit and not noted path compression. You’ll be able to see the total model here. Discover that islands are flagged as awake when they’re merged.

void b2LinkContact(b2Island* islands, b2Body* bodyA, b2Body* bodyB, b2Contact* contact)
{
	int islandIndexA = bodyA->islandIndex;
	int islandIndexB = bodyB->islandIndex;

	if (islandIndexA == islandIndexB)
	{
        // bodyA and bodyB are already in the identical island
		b2AddContactToIsland(&islands[islandIndexA], contact);
		return;
	}

	// Discover root of islandA
	b2Island* rootA = &islands[islandIndexA];
	b2WakeIsland(rootA);
    whereas (rootA->parentIsland != B2_NULL_INDEX)
    {
        rootA = &islands[rootA->parentIsland];
        b2WakeIsland(rootA);
    }

	// Discover root of islandB
	b2Island* rootB = &islands[islandIndexB];
	b2WakeIsland(rootB);
    whereas (rootB->parentIsland != B2_NULL_INDEX)
    {
        rootB = &islands[rootB->parentIsland];
        b2WakeIsland(rootB);
    }

	// Make islandB a toddler of islandA
	if (rootA != rootB)
	{
		rootB->parentIsland = rootA->index;
	}

	b2AddContactToIsland(rootA, contact);
}

Eradicating a contact from an island entails linked record bookkeeping and flagging the island for spitting (maySplit). Union-find just isn’t concerned.

As soon as all the brand new constraints have been added to all islands, there’s a serial merge step. Right here is an abbreviated model of the code. The complete model is here. The operate b2MergeIslandWithParent is just a few boring bookkeeping code. Observe that b2DestroyIsland invalidates the present island, so tread rigorously.

void b2MergeIslands(b2Island* islands, int depend)
{
	// Step 1: guarantee each little one island factors on to its root island
	for (int i = 0; i < depend; ++i)
	{
		b2Island* island = &islands[i];
		b2Island* rootIsland = island;

		whereas (rootIsland->parentIsland != B2_NULL_INDEX)
		{
			rootIsland = &islands[rootIsland->parentIsland];
		}

		if (rootIsland != island)
		{
			island->parentIsland = rootIsland->index;
		}
	}

	// Step 2: merge each awake island into its guardian
	for (int i = 0; i < depend; ++i)
	{
		b2Island* island = &islands[i];
		if (island->parentIsland != B2_NULL_INDEX)
		{
            b2MergeIslandWithParent(island);
            b2DestroyIsland(island);
		}
    }
}

These code snippets present that union-find builds a number of island bushes then collapses every island tree into its root island. With persistence this course of can happen incrementally as shapes come into contact.

Island splitting

I deal with island splitting by taking an island and utilizing all of its our bodies as seeds for the DFS algorithm. This may very well be finished with union-find as effectively. Any island discovering algorithm will do. I am probably not splitting the island, I simply constructing new islands from the unique island.

As a result of the island is assured not to connect with different islands, the depth-first traversal is assured to remain inside the unique island. That is crucial to permit the unique island to be break up concurrently with different work.

Right here is the pseudo-code for island splitting. It is vitally much like the DFS algorithm above. As a substitute of coping with all of the our bodies and constraints in your complete simulation world, it’s restricted to the our bodies and constraints of a single island. After a brand new island is constructed, it’s added to the world as a brand new persistent islands. After the island is break up the unique island is destroyed. That is nice as a result of nothing ought to discuss with the unique island at this level.

See Also

operate SplitIsland(world, island)
    our bodies = GetIslandBodies(island)
    constraints = GetIslandConstraints(island)
    ClearMarks(our bodies)
    ClearMarks(constraints)

    let G be an island

    for seed in our bodies
        if seed not marked and seed is dynamic and seed is awake
            Clear(G)
            Mark(seed)
            let S be a stack
            S.Push(seed)
            whereas S not empty
                b = S.Pop()
                G.Add(b)
                for c in b.GetConstraints()
                    if c not marked
                        G.Add(c)
                        Mark(c)
                        different = OtherBody(c)
                        if different not marked
                            S.Push(different)
            world.AddIsland(G)

    world.DestroyIsland(island)

The results of splitting the island is a number of new islands. It’s a little unhappy if just one new island outcomes, however it might be the proper consequence. Eradicating a constraint could or could not trigger and island to separate into two islands. Figuring out whether or not an island will break up or not is much like the work of doing the DFS, so I’d as effectively optimistically try to separate the island.

Making persistent islands quick

Whereas engaged on parallel union-find I realized that the narrow-phase can drive island administration. The narrow-phase is the simulation stage the place contact factors are computed between colliding shapes. The contact pairs that handle doubtlessly colliding shapes are persevered throughout time steps. Every contact pair holds the present contact factors. The variety of contact factors is 0, 1, or 2. If the variety of contact factors modifications from zero to non-zero or vice-versa then there’s an edge that ought to be added or faraway from the island graph.

Scalability requires the narrow-phase to be executed in parallel. It is a naturally parallel algorithm as a result of the contact factors between one form pair will not be affected by different form pairs. That is the best a part of a physics engine to unfold throughout a number of cores. Like butter on bread.

When a contact pair is up to date there are three outcomes related to persistent islands:

  1. edge added
  2. edge eliminated
  3. unchanged

The gambit is that almost all pairs fall within the third camp in a typical sport state of affairs. This appears to be the case up to now in my testing.

Edge additions must be processed instantly, earlier than the islands are solved. I deal with this with serial union-find as described above. Edge removals are much less pressing. The contact constraint must be eliminated instantly, however the island does not must be break up. Eradicating and edge does not imply the island will break up. There could also be different edges holding the island collectively. Nevertheless, the island would possibly break up. So I flag the island as doubtlessly splitting.

I can defer island splitting to later. For instance, I can break up an island after it has been solved, in the identical activity. This works as a result of island A does not care if island B is break up. Island A solely cares if island B needs to merge with it. And merging is dealt with serially.

How about determinism? Determinism is maintained if the sting additions and removals are processed in deterministic order. It seems in Box2D that the order of contact pairs on this planet is deterministic, so I simply must course of the sting modifications in keeping with the order of the contact pairs. Nevertheless, there generally is a big variety of contact pairs and looping over all of the pairs serially searching for modifications goes to be very sluggish.

There’s one information construction that may have an giant variety of entries and nonetheless be quick to iterate throughout. The bit array! Fashionable CPUs assist us do that rapidly with bit scanning intrinsics. See this and this.

The bit array can even work effectively with multithreading. Every narrow-phase employee can entry a thread context that holds an area bit array. When the contact pair has an edge add or take away final result, it flips a bit within the thread context bit array. Then after the slim section is full, the primary thread can bit-wise OR all of the bit arrays collectively. That is fairly quick utilizing a bit array constructed on 64-bit phrases. And no atomics are wanted.

Placing this all collectively I’ve this system circulation proven beneath. This instance has three threads within the slim section. Every thread will get its personal bit array which is initially all zeros. When a contact pair is processed it checks if the variety of contact factors went from 0 to non-zero or vice-versa. In both case it units the thread’s bit array on the index related to the contact pair. In spite of everything contact pairs are processed the bits from every thread are mixed into a world bit array utilizing bit-wise OR. Then a loop iterates over the worldwide bit array searching for set bits. If a set bit is discovered, the code appears up the contact pair and determines if an edge ought to be added or faraway from the island graph after which does that work. These additions and removals are finished serially in the primary thread. This retains a deterministic constraint order.

persistent island flowPersistent island program circulation

In follow the variety of set bits may be very small. Bit traversal may be very quick, even for numerous contact pairs.

Outcomes

And efficiency? Efficiency is nice. Gino was proper!

This picture exhibits a typical timing consequence. The blue bars are the contact pairs and the yellow bars are the island constraint solvers. The hole in between is the serial island administration. The hole additionally contains island solver preparation and activity system overhead.

persistent island performance

Let us take a look at some benchmarks.

The primary check is 182 pyramids with a base of 10 containers, so 55 our bodies every and a complete of 10010 our bodies. These pyramids will not be shifting and so this check favors persistent islands.

pyramid benchmark

The tumbler check has 2000 containers inside a hole field that’s consistently rotating on a revolute joint. This check ought to be troublesome for persistent islands as a result of constraints are consistently being added and eliminated. The darker containers are coloured that method as a result of Box2D considers them quick sufficient to interact steady collision checks.

tumbler benchmark

Listed here are the outcomes for the 2 exams. Instances are in milliseconds. The primary time worth is the common and the worth in parentheses is the utmost. For DFS I am utilizing commit #32 of Box2D v3. For persistent islands I am utilizing commit #36.

For persistent islands there might be many frames in a row the place the island administration has no work, making it difficult to benchmark. Persistent islands are strangly quick. I needed to confirm the code was working just a few instances. Certainly it’s doing actual work as I’ve examined island merging and splitting in addition to sleeping and waking.

Take the maximums with a grain of salt. They may very well be as a result of something. I embrace them as a result of the persistent island has a heavier load when the our bodies are created and that is related to keep away from body charge dips.

Check DFS Persistent
pyramids 0.69 (0.98) 0.01 (0.45)
tumbler 0.43 (0.87) 0.08 (0.43)

Persistent islands are roughly an order of magnitude quicker than DFS.

What about splitting?

The present implementation makes use of DFS to separate at most one island per time step. My present heuristic is to separate the biggest island that has had a number of constraints eliminated.

Island splitting is not free however it’s normally straightforward to cover the fee. I’ve experimented with placing the island splitting on the finish of an island resolve activity. When there are numerous islands and a number of cores then this time is nearly utterly hidden and insignificant, amounting to a couple microseconds on a single core.

If there are a small variety of giant islands then the price of splitting might be vital. However, island splitting might be finished in parallel with different work and it is a vital achieve over the normal serial DFS/UF.

Abstract

I lined a number of materials on this put up and there are a number of references and auxilliary data. Simulation islands contact a number of methods, so it’s difficult to totally perceive islands with out understanding how a physics engine is put collectively. Hopefully I’ve supplied sufficient background data so you may perceive the scope of island mangement. If not, please let me know!

These are the primary takeaways I’ve realized or bolstered whereas engaged on islands:

  • island administration can turn into a bottleneck in multicore simulation
  • determinism is vital for sport physics
  • spatial and temporal coherence present many alternatives for enhancing efficiency
  • generally a quick serial algorithm could also be higher than a parallel algorithm
  • bit arrays are superior!

What’s subsequent?

On the finish of this journey I’ve many instruments to work with islands. This can assist me as I discover different facets of enhancing Box2D.

There’s extra work to do on Box2D and extra to jot down up. I plan to put up extra sooner or later on broad-phase enhancements and coping with giant islands. Keep tuned!

Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top