RTS devlog #8: Methods to beat lag
It's time to deal with the bane of online multiplayer games: lag. Previously Command & Construct just displayed raw network updates: as soon as a message arrived, it would update units accordingly. However with my low-bandwidth design on a (simulated) really terrible network, that ends up looking like this:
In other words, unplayably awful. The challenge? Make the client show the game as smoothly and accurately as possible, even under network conditions that bad.
I've been coding professionally for over a decade now, and writing code to keep the game running smoothly with an unpredictable network was still very tough to get right. But I'm here to tell you how it all works and show you the code I ended up with!
Simulating latency
At the transport layer, the Internet is chaos. Messages may not arrive at all. They can arrive in the wrong order. Every message can have a different latency. Things like reliable ordered delivery are built on top of this in software, using queues and retransmission to create order from the chaos.
There are three main measurements that affect the network quality:
- Latency: how long a message takes to travel one way (from client to server, or vice versa). Multiplayer gamers are probably familiar with ping time, but note that this may refer to the round-trip time, i.e. the time for a message to travel to the destination and for a response to be received back; here I'm using latency to refer to the one-way time.
- Packet delay variation (aka PDV or jitter): how much variation there is in the latency of different messages. For example if one message takes 150ms and the next takes 250ms, that implies a PDV of 100ms.
- Packet loss: the percentage of messages that simply go missing and never arrive. For example if this is 5% then on average 5 out of every 100 messages will never be delivered.
I've noted previously how the single-player mode uses the same server architecture as a multiplayer game, as it's simpler and helps with testing. Therefore I can test how things work on a poor network by running single-player mode and simulating latency, PDV and packet loss, by artificially delaying (or dropping) messages. This is far easier to develop with than actually trying to find a poor quality network and repeatedly running real multiplayer games!
Construct's Multiplayer plugin already has a latency simulation feature, but single-player games don't use the multiplayer feature, so I wrote some custom latency simulation code for testing. It's all in latencySimulation.js, where you can set ENABLE_LATENCY_SIMULTATION to true to turn on simulated latency for testing, and there are constants for the latency, PDV and packet loss to simulate.
In short, the packet loss is a random chance that a message is dropped, and the latency and PDV are used to create an artificial delay both upon sending and receiving messages. However as with much of multiplayer coding, there are subtleties to this that need to be taken care of:
- Reliable messages subject to packet loss must still arrive – I roughly guessed a 3x multiplier to the latency to simulate retransmission, representing random extra delays to some messages.
- Ordered messages must not become unordered due to PDV – e.g. if one message takes 400ms to send and the next takes 200ms, the second will overtake the first, but must not be treated as received until the prior message arrives, to preserve the ordering guarantee.
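To make those two rules concrete, here is a minimal sketch of how a simulated delay could be worked out. It is not the code from latencySimulation.js: the constant names, simulateDelay() and OrderedChannel are all hypothetical, and applying the 3x multiplier to the whole delay of a "lost" reliable message is my assumption about how such a guess might be applied.

```javascript
// Hypothetical settings, similar in spirit to the constants described above.
const SIM_LATENCY_MS = 200;      // base one-way latency
const SIM_PDV_MS = 200;          // extra random delay added on top (jitter)
const SIM_PACKET_LOSS = 0.2;     // chance that a message is lost
const RETRANSMIT_MULTIPLIER = 3; // rough cost of a retransmission (assumption)

// Returns the simulated delay in ms for one message, or null if it is dropped.
// `random` can be swapped out (defaults to Math.random) to make testing easy.
function simulateDelay(isReliable, random = Math.random) {
  const lost = random() < SIM_PACKET_LOSS;
  const delay = SIM_LATENCY_MS + random() * SIM_PDV_MS;
  if (!lost) return delay;
  // A lost unreliable message simply never arrives.
  if (!isReliable) return null;
  // A lost reliable message must still arrive, just a lot later.
  return delay * RETRANSMIT_MULTIPLIER;
}

// Tracks arrival times on one ordered channel, so a message is never treated
// as received before the message that was sent ahead of it.
class OrderedChannel {
  constructor() {
    this.lastArrivalTime = 0;
  }

  // sendTime and delay are in ms; returns the time the message is delivered.
  getArrivalTime(sendTime, delay) {
    const arrival = Math.max(sendTime + delay, this.lastArrivalTime);
    this.lastArrivalTime = arrival;
    return arrival;
  }
}
```

With this sketch, a message sent at 0ms with a 400ms delay followed by one sent at 10ms with a 200ms delay are both delivered at 400ms, rather than the second overtaking the first.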
That sorted, I then simulated a really awful network: a random 200-400ms latency (base latency of 200ms with added 200ms PDV), and 20% packet loss. This produced the appalling result shown in the previous video. My thinking is almost all real-world networks should be better than this, so if I can make the game work at least OK-ish under these conditions, it should be fine on most real networks.
Network timelines
How do we solve this? Let's talk about the theory. The overall principle is the following:
- Synchronise the time on both the client and server.
- The server timestamps every message it sends.
- The client can then store a timeline of updates, ordered by the timestamp.
- The client then follows the timeline, but with an extra added delay. This means:
  - The client has a short amount of "buffer time" for network events to arrive before it uses them.
  - The client can interpolate between the previous and the next upcoming value, so it can smoothly apply changes.
Here's a visualisation of the timeline for a single value, such as a turret's offset angle, to demonstrate the principle. If you want to properly understand how this system works, it's well worth spending some time examining this, as it's the key to the entire system of handling lag.
Note the following about how this system works.
- The client has a good estimate of what the real server time is. It then subtracts the latency, subtracts a little extra delay too, and displays the game at that time.
- The server timestamps all messages it sends with the current server time. Values received over the network (shown in blue) are placed on the timeline according to the server time in the message. This ensures they are correctly ordered and are applied on schedule regardless of variation in network timings (such as PDV).
- Note that PDV, or retransmission for lost messages, can still cause messages to arrive late (behind the current time the client is displaying). However the fact the client adds an extra delay on top of the latency means that with moderate PDV, messages just fall in to the extra buffer time rather than arriving late.
- In the above diagram perhaps one of the updates is missing, as there is a bit of a gap. This could be due to packet loss, or retransmission causing it to be too delayed to be used. However the client can see the previous value and the next upcoming value, so it can interpolate between them. This will likely cover up the fact a message went missing.
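As a rough illustration of the idea (not the actual game code), a timeline can be as simple as a list of timestamped values kept sorted by server time, with a lookup that linearly interpolates between the entries either side of the requested time. SimpleTimeline and its method names are made up for this sketch, and it ignores details like wrapping angles around 360 degrees.

```javascript
// A minimal timeline of { time, value } entries kept sorted by server time.
class SimpleTimeline {
  constructor() {
    this.entries = [];
  }

  // Insert by timestamp, so messages that arrive out of order still end up
  // in the right place on the timeline.
  add(time, value) {
    let i = this.entries.length;
    while (i > 0 && this.entries[i - 1].time > time) i--;
    this.entries.splice(i, 0, { time, value });
  }

  // Linearly interpolate between the entries either side of `time`.
  getAt(time) {
    const e = this.entries;
    if (e.length === 0) return undefined;
    if (time <= e[0].time) return e[0].value;
    for (let i = 1; i < e.length; i++) {
      if (time <= e[i].time) {
        const prev = e[i - 1];
        const next = e[i];
        const t = (time - prev.time) / (next.time - prev.time);
        return prev.value + (next.value - prev.value) * t;
      }
    }
    // Past the newest entry: nothing to interpolate towards, so hold the last value.
    return e[e.length - 1].value;
  }
}
```

If the values 0 and 20 are stored at 100ms and 300ms, looking up 200ms returns 10 whether or not an update for 200ms ever arrived, which is how interpolation papers over a missing message.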
It's also interesting to note that it's not really latency that messes up the client representation of the game. Even if you have a fairly high latency, as long as there is low packet loss and low PDV, then basically everything is perfectly reliable. The only downside is the delay you see the game on. That's important for things like first-person shooters where reaction time matters, but not so much for games like this one. What really messes up the client representation is high packet loss and high PDV. This means messages keep arriving late, behind the current time the client is displaying. Late messages mean the client either stops updating things or guesses where they ended up, and then later has to correct it. If it's really bad then things will start jumping all over the place. So aside from the delay caused by latency, packet loss and PDV are more important aspects of the connection quality for a smooth representation on the client.
With my chosen simulated latency of 200-400ms and 20% packet loss, there will definitely be late messages. So that makes sure I have to write code to deal with it!
Implementing timelines
The implementation is fairly complex, so I'll just point out the relevant sections of code. Also much of the code may look relatively straightforward, but that's after lots of testing, rearrangement and rewriting – as I mentioned this was tough to get right, and it's the kind of thing that's full of corner cases and subtleties that are easy to miss.
Clock synchronization
The first task is to get the client to measure its estimated latency. This and other network timing related tasks are handled by PingManager.
Latency is measured by sending a "ping" message every 2 seconds. The server immediately sends a "pong" message back with the current server time. Then the client can work out:
- The estimated latency, based on the time for the "pong" to come back, but divided by 2 to make it the one-way time. (There's not actually any guarantee that the latency is the same in both directions, but we have to make a best guess.)
- The current server time, based on the time in the "pong" message, plus the latency. This is used to calculate the difference between the server time and the client time. In theory that is a constant value, and it also means the client can calculate the server time at any instant based on its own time.
These measurements will all have some variation. So it keeps the last few latency values and averages them. It also smooths out changes in the calculated time difference at a rate of 1% (10ms per second), so even if the measured time difference varies, the smoothed time difference should end up hovering around the real time difference with no sudden jumps. Similarly the client delay (the latency plus the extra delay time) is smoothed out, as the latency measurements can change over time and we don't want to see any jumps. This actually means it runs the game ever so slightly in fast forward or slow motion to compensate as the calculated delay changes!
This means the client now has a good idea of the server time, latency, and the client delay, and none of them change suddenly. The most important time value is called the simulation time, which is the green arrow in the diagram above. This is the time at which the client should be representing the game. It's calculated as the estimated current server time, minus latency, minus an extra delay (somewhat arbitrarily set at 80ms at the moment).
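The arithmetic can be sketched roughly as follows. This is a simplified illustration rather than the real PingManager: ClockEstimator, its method names and the choice of averaging 5 samples are all assumptions, and it only smooths the time difference (the real thing also smooths the client delay).

```javascript
const EXTRA_DELAY_MS = 80;           // the extra delay mentioned above
const MAX_LATENCY_SAMPLES = 5;       // assumption: how many "last few" values to average
const MAX_DIFF_CHANGE_PER_SEC = 10;  // time difference may drift by 10ms per second (1%)

class ClockEstimator {
  constructor() {
    this.latencies = [];         // recent one-way latency estimates, in ms
    this.targetTimeDiff = 0;     // latest measured (server time - client time)
    this.timeDiff = 0;           // smoothed time difference actually used
    this.hasMeasurement = false;
  }

  // Call when a "pong" arrives. pingSentTime and pongReceivedTime are on the
  // client's clock; pongServerTime is the server time in the message.
  onPong(pingSentTime, pongReceivedTime, pongServerTime) {
    const latency = (pongReceivedTime - pingSentTime) / 2;
    this.latencies.push(latency);
    if (this.latencies.length > MAX_LATENCY_SAMPLES) this.latencies.shift();

    // The pong left the server one latency ago, so the server is now at
    // pongServerTime + latency.
    this.targetTimeDiff = pongServerTime + latency - pongReceivedTime;
    if (!this.hasMeasurement) {
      this.timeDiff = this.targetTimeDiff;  // nothing to smooth from yet
      this.hasMeasurement = true;
    }
  }

  // Call every frame: move the smoothed difference towards the measured one,
  // limited to a small rate of change so the clock never jumps.
  tick(dtSeconds) {
    const maxStep = MAX_DIFF_CHANGE_PER_SEC * dtSeconds;
    const diff = this.targetTimeDiff - this.timeDiff;
    this.timeDiff += Math.max(-maxStep, Math.min(maxStep, diff));
  }

  getLatency() {
    if (this.latencies.length === 0) return 0;
    return this.latencies.reduce((a, b) => a + b, 0) / this.latencies.length;
  }

  getServerTime(clientTime) {
    return clientTime + this.timeDiff;
  }

  getSimulationTime(clientTime) {
    return this.getServerTime(clientTime) - this.getLatency() - EXTRA_DELAY_MS;
  }
}
```

For example, if a ping sent at client time 1000 gets its pong at 1200 carrying server time 5000, the estimated latency is 100ms, the server is estimated to be at 5100, and the simulation time is 5100 - 100 - 80 = 4920.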
Value timelines
There's a ValueTimeline base class that represents a series of timestamped updates much like in the diagram above. There are two derived classes representing variants of the basic timeline:
- InterpolatedValueTimeline is able to interpolate between values, such as smoothly rotating between turret offset angle updates from the network.
- SteppedValueTimeline is used for one-off events, such as network events like "projectile fired", and also the position updates (since if you remember, in the bandwidth design these are only sent every 2 seconds, so are more like events than continuous values). This kind of timeline doesn't interpolate, it just returns the values when the client time reaches them.
A good example of the usage of timelines is the ClientTurret, which is a simple case as at the moment it only has one interpolated value for its offset angle. When a value is received from the network, it calls OnNetworkUpdateOffsetAngle() and inserts the value to the timeline. The client also "ticks" units every frame with the current simulation time, and in Tick() the turret looks up its current offset angle in its timeline. It also deletes old entries so they don't accumulate and waste memory.
Network events
One-off events like "projectile fired" are handled with a stepped timeline in GameClientMessageHandler. This basically just queues up received events until the simulation time catches up, at which point it then applies them.
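A stripped-down version of that queueing behaviour might look like the following. EventQueue and takeDue() are names invented for this sketch; the real SteppedValueTimeline will differ in its details.

```javascript
// Holds one-off events until the simulation time reaches their server time.
class EventQueue {
  constructor() {
    this.pending = [];  // { time, event } entries sorted by server time
  }

  // Insert in timestamp order, regardless of the order events arrived in.
  add(serverTime, event) {
    let i = this.pending.length;
    while (i > 0 && this.pending[i - 1].time > serverTime) i--;
    this.pending.splice(i, 0, { time: serverTime, event });
  }

  // Remove and return every event whose time has now been reached,
  // oldest first. Call this once per tick with the simulation time.
  takeDue(simulationTime) {
    let count = 0;
    while (count < this.pending.length &&
           this.pending[count].time <= simulationTime) {
      count++;
    }
    return this.pending.splice(0, count).map(entry => entry.event);
  }
}
```

Each tick the client would call something like takeDue(simulationTime) and apply whatever comes back, so events take effect on schedule no matter when they actually arrived.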
Dealing with lateness
Handling late arrival of network events works out quite nicely. Late events can be easily detected by seeing if the server time is behind the simulation time on arrival, and the difference also tells the client how late it is. Then the client can simulate the event having happened at the right time in the past!
This works well with the "projectile fired" event. Projectiles have entirely predictable motion, moving in a straight line at a fixed speed. So if a "projectile fired" event arrives 400ms late, the client advances the projectile by the distance it would have travelled in 400ms. The end result is that a slightly late "projectile fired" event means the projectile just suddenly materialises in the correct place. Of course if the message is really late then the projectile may have already disappeared on the server. That's too bad – the client will never see it, but as the client's display of projectiles is purely cosmetic, it won't have any bearing on the actual gameplay.
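Here is a small sketch of that catch-up calculation. The event fields (serverTime, x, y, angle, speed, range) and the function name are hypothetical, chosen just to show the arithmetic; speed is in pixels per second and times are in milliseconds.

```javascript
// Works out where a projectile should first appear on the client, given the
// "projectile fired" event and the current simulation time. Returns null if
// the event is so late that the projectile would already have finished.
function spawnLateProjectile(ev, simulationTime) {
  // How late the event is, in seconds (0 if it is being applied on time).
  const lateSeconds = Math.max(0, (simulationTime - ev.serverTime) / 1000);

  // Straight-line motion at fixed speed, so distance is just speed * time.
  const travelled = ev.speed * lateSeconds;
  if (travelled >= ev.range) return null;

  return {
    x: ev.x + Math.cos(ev.angle) * travelled,
    y: ev.y + Math.sin(ev.angle) * travelled,
    distanceTravelled: travelled
  };
}
```

So a projectile moving at 500 pixels per second whose event is 400ms late starts out 200 pixels along its path instead of at the turret.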
Handling late events isn't as easy for "projectile hit" and "unit destroyed" events: in both cases it's difficult to hide the fact the event arrived late. The best the client can do is avoid creating explosions if the event is really late, so as to avoid drawing the player's attention to laggy events; instead the client just tries to quietly catch up.
Late position updates
Remember that according to the bandwidth design, unit positions are only sent in "full" updates, rotating through all units every 2 seconds. At first I made clients ignore late position updates. However with poor network conditions, many position updates arrive late. Ignoring them means units drift significantly off their correct position, sometimes for relatively long periods of time, and then eventually jump back to the correct position when a position update arrives on time.
I decided late position updates have to be used, otherwise the state of the game gets too far off with a poor network. But how do you use a message that says something like "this unit was at this position 400ms ago"?
The solution was to keep a history of the unit position on the client. Fortunately we can use timelines for this as well, only keeping a list of values in the past. ClientPlatform keeps its position over the past 2 seconds. Then when a late position update occurs, it can look in the history, see where it was at the given server time, and then work out how far it is off that position. For example it can determine "400ms ago I was at (100, 100), but the server said I should have been at (80, 110), so I'm currently off by (20, -10)". The logic for this is in OnNetworkUpdatePosition().
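The history lookup can be sketched like this. It's an illustration under my own assumptions rather than the code in ClientPlatform: PositionHistory and getPositionError() are invented names, and the history here interpolates between recorded positions to estimate where the unit was at the exact server time.

```javascript
// Keeps the unit's recent client-side positions so a late update can be
// compared against where the unit was at that moment.
class PositionHistory {
  constructor(maxAgeMs = 2000) {
    this.maxAgeMs = maxAgeMs;
    this.entries = [];  // { time, x, y } in increasing time order
  }

  // Record the position each tick, dropping anything older than maxAgeMs.
  record(time, x, y) {
    this.entries.push({ time, x, y });
    while (this.entries.length > 0 &&
           this.entries[0].time < time - this.maxAgeMs) {
      this.entries.shift();
    }
  }

  // Interpolated position at a past time, or null if the history no longer
  // (or does not yet) cover that time.
  getAt(time) {
    const e = this.entries;
    if (e.length === 0 || time < e[0].time || time > e[e.length - 1].time) {
      return null;
    }
    for (let i = 1; i < e.length; i++) {
      if (time <= e[i].time) {
        const a = e[i - 1];
        const b = e[i];
        const t = b.time === a.time ? 0 : (time - a.time) / (b.time - a.time);
        return { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t };
      }
    }
    return { x: e[0].x, y: e[0].y };  // single entry exactly at `time`
  }
}

// How far off the client was at serverTime: client position minus the
// position the server reported. Returns null if the update is too old.
function getPositionError(history, serverTime, serverX, serverY) {
  const past = history.getAt(serverTime);
  if (!past) return null;
  return { x: past.x - serverX, y: past.y - serverY };
}
```

With a history that puts the unit at (100, 100) at the update's server time and a server position of (80, 110), getPositionError() returns (20, -10), matching the example above.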
Finally, the correction is also smoothed. This also applies to updates that arrive on schedule – if the client finds out by any means that the unit is in the wrong position, we don't want it to jump. Instead it saves its offset from the true position in the #xCorrection and #yCorrection properties, and applies the correction over time in the Tick() method. For small updates this just moves the unit at 20 pixels per second linearly. Usually the updates are small and slow enough to not be noticeable. If it's way off though, it will move it exponentially at a 95% correction per second. This can help give the impression of the unit "catching up" instead of just teleporting elsewhere.
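One way to combine the two rates is sketched below. How the real Tick() method switches between linear and exponential correction isn't spelled out here, so this sketch assumes it simply uses whichever gives the bigger step; the function names and the plain xCorrection/yCorrection properties (standing in for the private ones) are also assumptions. The correction is taken to mean client position minus true position.

```javascript
const LINEAR_RATE = 20;           // pixels per second for small errors
const EXP_REMAINING_PER_SEC = 0.05;  // 95% corrected per second leaves 5%

// How many pixels of the remaining error to remove this tick.
function getCorrectionStep(distance, dt) {
  const linear = LINEAR_RATE * dt;
  const exponential = distance * (1 - Math.pow(EXP_REMAINING_PER_SEC, dt));
  // Assumption: take the larger step, but never overshoot the target.
  return Math.min(distance, Math.max(linear, exponential));
}

// Moves the unit towards its true position and shrinks the stored
// correction by the same amount. dt is the tick length in seconds.
function applyCorrection(unit, dt) {
  const distance = Math.hypot(unit.xCorrection, unit.yCorrection);
  if (distance === 0) return;
  const fraction = getCorrectionStep(distance, dt) / distance;
  const dx = unit.xCorrection * fraction;
  const dy = unit.yCorrection * fraction;
  unit.x -= dx;
  unit.y -= dy;
  unit.xCorrection -= dx;
  unit.yCorrection -= dy;
}
```

With these numbers the linear rate wins for errors below roughly 7 pixels at typical frame rates, and the exponential rate takes over when the unit is further off.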
Lazy creation
One last network-related update I made was that rather than having the server send all the initial units on startup, the client just creates units as it receives full updates for units it doesn't yet know about. To make it look better, the units fade in when they are first created, so you see a kind of cool effect as the whole level fades in – a neat trick to cover up the network syncing.
The client also times out units if they don't receive any update from the server for several seconds. The server should update every unit every 2 seconds; if a unit goes 7 seconds without an update, the client assumes the "unit destroyed" event got lost, and removes the unit. However with the lazy creation, if the server does send a full update again, it will just pop back in to existence and carry on. This means even a temporary complete outage in network transmission – e.g. going through a tunnel while on a train – will eventually allow the game to re-sync and carry on. (I think that's also impossible with the traditional "lockstep" approach, so that's a nice advantage!)
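Lazy creation and timeouts fit together in very little code. The sketch below uses invented names (UnitTracker, onFullUpdate(), removeTimedOut()) and a plain object per unit; the fade-in is only hinted at by starting new units with an opacity of 0.

```javascript
const UNIT_TIMEOUT_MS = 7000;  // remove units not updated for this long

class UnitTracker {
  constructor() {
    this.units = new Map();  // unit id -> unit state
  }

  // Called for every full update from the server. Unknown units are created
  // on the spot, which also brings back units that had timed out.
  onFullUpdate(id, data, now) {
    let unit = this.units.get(id);
    if (!unit) {
      unit = { id, opacity: 0 };  // starts invisible so it can fade in
      this.units.set(id, unit);
    }
    Object.assign(unit, data);
    unit.lastUpdateTime = now;
    return unit;
  }

  // Called periodically: assume any unit the server has gone quiet about
  // was destroyed and the message got lost.
  removeTimedOut(now) {
    for (const [id, unit] of this.units) {
      if (now - unit.lastUpdateTime >= UNIT_TIMEOUT_MS) {
        this.units.delete(id);
      }
    }
  }
}
```

Because creation is driven entirely by incoming updates, there is no special "re-sync" path: after an outage the next round of full updates rebuilds whatever was timed out.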
Outcomes
There's now a decent framework for dealing with lag, interpolating between values, and handling late updates. Here's how the previous example now looks with exactly the same simulated latency – 200-400ms random delays with 20% packet loss.
Compare to the previous video – it's far better. Notice the top-right unit lags behind – it looks like the message with its speed update arrived late, but the client quickly finds the error, and then exponential correction makes it visually catch up with its correct position. After that it works smoothly.
The game should still be playable under these conditions! Even with an appalling simulated connection that loses 1 in 5 messages and has large unpredictable random delays, it's relatively smooth. The units react on a short delay and occasionally lag behind a bit, but overall it works.
I tried this on a couple of real-world Internet connections and it works very well. I think I'm right that most real-world connections are better than what I simulated. Even my phone running over mobile data gets about 100ms latency with only 30-40ms PDV, which is low enough that messages are rarely late, and so everything looks pretty good with hardly any noticeable lag. Playing from the office with a home PC over broadband had just 30ms latency.
Conclusion
I'm very pleased with this result! It was tough to get right but now there's a good system for making the best of even really bad internet connections, and with a decent quality connection it works great. I think this will be OK for most people, and will probably only struggle if you have a really poor wi-fi signal, or if you try to play with someone on the opposite side of the globe. (Although I'd be interested to hear how it works out if you do!)
It's looking like a solid foundation: it can handle 1000 units with low bandwidth, low CPU usage, and smooth things over on poor connections. It's probably time to finally stop units being able to drive over each other! Then there's things like scenery and pathfinding, and trying to stop traffic jams. The crowd control aspect will be tough as well. But hey, I signed up for a challenge!
As ever you can try it out now at CommandAndConstruct.com, although I've noticed a problem with random disconnects sometimes – oops! I'll try to figure that out soon. All the code is on GitHub so you can dig in to the code and see all the details – as ever I've tried to make sure there are plenty of clear and detailed comments.