Why Cities: Skylines 2 performs poorly

2023-11-05 11:54:07

The teeth are not the only problem


Table of contents
  1. (This is not) a performance review

  2. Pulling back the curtain

  3. Renderdoc analysis

  4. Summary and conclusions

One of the most highly anticipated PC games of the year, Cities: Skylines 2, was released last week to a mixed reception. My impression is that gameplay and simulation-wise it seems to be a step in the right direction, and at least on paper the game seems more well-rounded in terms of features than the original was at launch. There are still significant issues with the game, ranging from balance problems and questionable design choices to bugs rendering a lot of the game’s economic simulation almost pointless. Whether or not the game is a worthy successor to the original is an open question at this point, but one thing is almost universally agreed upon: the title’s performance is not up to par.



(This is not) a performance review

There were warning signs. Under a month before the release date there was an announcement that the game’s recommended system requirements had been raised and that the console release was delayed to 2024. Quite a few YouTubers and streamers got early access to the game, but they were explicitly forbidden to talk about performance until the regular review embargo was lifted. This wasn’t exceptional, as games tend to receive frequent performance optimizations and other fixes in the last few weeks before release, but it wasn’t a good sign either. Then, just a week before release, Colossal Order issued a statement which I’d describe as a pre-emptive apology for the poor performance of the game. And then the game was released.

Simulation-heavy games like city builders can be surprisingly hard to run at a good framerate, but what makes Cities: Skylines 2 stand out is that on most systems and in most situations the game is GPU-bound. That is rather unusual for a game of this genre, as games like this tend to be really heavy on the CPU (such as the original Cities: Skylines) but relatively light on graphics. Visually the game is an improvement on most aspects compared to the 2015 original, but nothing really justifies the game being harder to run than maxed out Cyberpunk 2077 with path tracing enabled. Personally I’d even argue that C:S2 looks rather unpleasant; while individual models are relatively detailed and the sense of scale is impressive, the shading is decidedly last-gen and the screen is completely covered in rendering artifacts and poorly filtered textures. Comparing the game’s graphics to a relatively close competitor, Anno 1800 (which was released in 2019), doesn’t do it any favours. Anno goes for a slightly more stylized look and in my humble opinion manages to look more polished and consistent, while providing decent performance even on hardware that was considered low-to-mid range in 2019.

A screenshot of a character in Cities: Skylines 2
There’s a lot going on with this character: oddly smooth hair, weirdly chunky beard, suspicious moustache, 100% opaque drug dealer sunglasses, a seam between the head and the rest of the body, and painted-on clothes.

I’m not going to waste my time meticulously benchmarking the game because many others have done that already and to a considerably higher standard than I could. If you are interested in that, check out this written article from PC Games Hardware (in German) or this video from Gamers Nexus (in Americanese). I’ll summarize the results as such: when the settings are turned above the bare minimum (the “very low” graphics preset completely disables decadent luxuries such as shadows and fog) you need a graphics card that costs about 1000 to 2000 euros to run the game at 60 frames per second at 1080p. As a comparison, similar hardware in Alan Wake 2 (which was released the same week as C:S2 and is considered by some to be the best looking game of this console generation) reaches comparable average framerates with all settings cranked up including path tracing, either at 1440p without any upscaling magic or at 4K with some help from DLSS. I think that’s a good illustration of how bizarrely demanding C:S2 is.

A screenshot of a CPU usage graph and utilization percentages for different aspects of hardware.
The game ran so poorly that Windows Game Bar refused to acknowledge there even was a framerate.

Some personal experiences to cap off this introduction: when I started the game for the first time on my relatively beefy gaming PC (equipped with an NVidia RTX 3080 graphics card, an AMD Ryzen 7 5800X CPU and a 5120×1440 super ultrawide monitor) I was greeted by a frame rate of under 10 FPS in the main menu. After tweaking the settings as instructed by the developer (which involved disabling depth of field, motion blur and volumetric effects) my FPS reached almost 90. What made this especially bizarre was that the menu features just a static background image and a few buttons. Loading into an empty map gave me about 30 to 40 FPS, and the frame rate stayed around that level after playing for about an hour, with the occasional stutter. Let’s investigate.

A screenshot of Cities: Skylines 2’s main menu.
100% GPU utilization and seven frames per second, everybody. Cropped from 32:9.



Pulling back the curtain

Cities: Skylines 2, like its predecessor, is made in Unity, which means the game can be decompiled and inspected quite easily using any .NET decompiler. I used JetBrains dotPeek, which has a decent Visual Studio-like UI with a large variety of search and analysis options. However, static analysis doesn’t really tell us anything concrete about the rendering performance of the game. To investigate what’s going on with rendering I used Renderdoc, an open source graphics debugger which has saved my bacon with some of my previous GPU-y personal projects.



Engine and structure

Let’s go through some of the technical basics of the game. Cities: Skylines 2 uses Unity 2022.3.7, which is only a few months old at the time of writing. The most notable aspect of Unity 2022 was the stabilization of the DOTS set of technologies which Unity has been working on for several years, and C:S2 seems to be largely built on top of this stuff, including the newfangled Entity Component System (ECS) implementation and the Burst compiler. I’ve been interested in the ECS architecture for a few years and have experimented with implementations such as Specs, Legion and most recently Bevy, and if it weren’t for the numerous issues in this game I’d rather be writing about how ECS is basically the perfect architecture for a game like this. Cities: Skylines 2 seems to use DOTS to great effect, as the game uses multiple CPU cores much more efficiently than its predecessor. Unfortunately a lot of the graphics-related issues are indirectly caused by the game’s use of DOTS. I’ll expand on that later.

A screenshot from dotPeek showing different ECS-related types from the game.
Judging by the code there are about 1200 different systems powering practically all of the game logic.

The game also uses a bunch of third party middleware and some custom / forked libraries. Unlike DOTS, Unity’s UI Toolkit is apparently still not ready for production, as C:S2 uses the HTML, CSS and JavaScript based Coherent Gameface (what a name!) for its user interfaces. A brief look at the JS bundle reveals that they are using React and bundling with Webpack. While this is something that is guaranteed to make the average native development purist yell at clouds and complain that the darned kids should get off their lawn, I think at least on paper this will make the game’s UIs considerably easier to maintain and modify than before. Other notable bundled libraries include InstaLOD, Odin Serializer, and a DLL file for NVidia DLSS 3, although the technology is not currently supported by the game.

For graphics rendering the game uses Direct3D 11 and Unity’s High Definition Render Pipeline, also known as HDRP. Unity’s regular rendering system only works with traditional MonoBehaviour-based game objects, so a game built using DOTS & ECS needs something to bridge the gap. Unity has a package called Entities Graphics, but surprisingly Cities: Skylines 2 doesn’t seem to use that. The reason might be its relative immaturity and its limited set of supported rendering features; according to the feature matrix both skinning (used for animated models like characters) and occlusion culling (not rendering things that are behind other things) are marked as experimental, and virtual texturing (making GPU texture handling more complex but hopefully more efficient) is not supported at all. Instead it seems that Colossal Order decided to implement the glue between the ECS and the renderer by themselves, using BatchRendererGroup and a lot of relatively low level code. I’ll cover this and its many implications in more detail later.



Attachment issues

Getting Renderdoc attached to a process and collecting rendering events is usually quite trivial. Normally you just need to give Renderdoc the path of the executable, the working directory and some command line arguments, after which Renderdoc starts the binary and injects itself into the game process. However, my issue was that I had the game on Xbox Game Pass, which does some weird sandboxing and / or NTFS ownership magic to limit what you can do with the game files. Renderdoc was not allowed to read the game’s executable, even when running as administrator. Before I knew Game Pass was the problem I also tried to use NVidia Nsight Graphics™️ instead (a tool similar to Renderdoc from NVidia), but it had the same issue. Eventually I ended up solving that particular problem with my credit card: I bought the game again on Steam at full price, despite knowing it had severe issues. Sorry.

However, the Steam version didn’t immediately start co-operating either. This time the problem was Paradox Launcher, a little piece of bloatware used in most big budget Paradox-published titles. The launcher binary is also included in the Game Pass version, but at least on launch it seemed to be completely unused. Basically when you start C:S2 from Steam it pops up Paradox Launcher, you click either Resume or Play, and then it actually runs the game binary. I tried to attach Renderdoc by running Cities2.exe directly but that didn’t work: it creates the game window, but then runs for a few seconds, opens the launcher and then exits. There is an option in Renderdoc called “Capture child processes” which should in theory make Renderdoc inject itself into all processes started by the target process (so it should attach itself to the launcher started by the game binary, and then get injected into the game binary again), but I think there was some extra layer of indirection which unfortunately prevented that from working. I configured Renderdoc to start Paradox Launcher directly, but in short that didn’t work either, as Steam and the launcher do some communication to select which game to start and to handle authentication / DRM thingies. Some of that communication happens through command line arguments which I was able to extract using Process Explorer, but reusing the same arguments didn’t work either, so I gave up on that approach as well.

Eventually I was finally able to attach Renderdoc by using the Global Process Hook option, which the program hides by default and advises against using. It’s a very invasive method of hooking because it injects a DLL into every single process that is started on the system, but hey, it worked! We can finally see what’s going on.

This place is a message… and part of a system of messages… pay attention to it!
Sending this message was important to us. We considered ourselves to be a powerful culture.
This place is not a place of honor… no highly esteemed deed is commemorated here… nothing valued is here.

-- Renderdoc when you try to enable global process hooking.

I was later able to get NVidia Nsight Graphics™️ working as well. Instead of trying to start the game or the launcher, I opened Steam from Nsight and then started the game from Steam’s UI as usual. Ultimately I wasn’t able to get much more information out of Nsight than I already had from Renderdoc, as it seems that many of Nsight’s profiling and performance focused features are not supported with D3D11.



Renderdoc analysis

I’ll preface this section by admitting that I’m not a professional graphics programmer nor even a particularly proficient hobbyist. I do graphics programming occasionally and have spent quite some time toying with game engines, but I’m not an expert on either subject. I’ve never implemented Actual Proper Graphics Things like deferred rendering or cascaded shadow mapping, though I think I know how they should work in theory. So as difficult as it might be to believe, there’s a chance that I’m wrong about some of the things I’m about to say. If you think I’m wrong, please let me know!

Let’s begin by analyzing the following frame (click to open it in a new tab):

A screenshot from Cities: Skylines 2 at night

This is a decently complex frame, but it’s far from the scale the game can actually reach. This was captured in a town of about 1000 inhabitants under an hour into a new save. There’s rain and it’s night time, but in my experience neither moves the needle much in terms of performance. The game version was 1.0.11f1, so the first post-release hotfix is included. It should be noted that the latest patch at the time of publication (1.0.12f1) was released during the making of this article and it includes some improvements for the issues I’m about to describe, but it’s far from having solved all of them.

Renderdoc reports that the frame took about 87.8 milliseconds to render, which would average to about 11.4 FPS. The game was running at 30-40 FPS on average at the time, so either this frame is an outlier (which, as we’ve learned from the Gamers Nexus video, is ironically quite common) or perhaps more likely Renderdoc adds a bit of overhead in a way that affects the measurements, as all of the frames I’ve captured have reported slightly higher frame times than what I’ve seen in-game when playing normally. I’m making the assumption that even if Renderdoc does add some overhead, it adds it in a way that doesn’t completely invalidate the measurements, for example by making specific API calls take 10x longer than they normally would.

For reference, at a constant 60 FPS the frame time should always be about 1000 / 60 = 16.666... milliseconds.
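
The conversion between frame time and frame rate is simple enough to sketch in a few lines of Python. The function names are my own and purely illustrative; the only inputs are the 87.8 millisecond frame and the 60 FPS target mentioned above.

```python
def fps_from_frame_ms(frame_ms: float) -> float:
    """Average frames per second if every frame took frame_ms milliseconds."""
    return 1000.0 / frame_ms


def frame_ms_from_fps(fps: float) -> float:
    """Frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # The captured frame: 87.8 ms per frame.
    print(f"{fps_from_frame_ms(87.8):.1f} FPS")   # prints 11.4 FPS
    # The budget for a steady 60 FPS.
    print(f"{frame_ms_from_fps(60):.3f} ms")      # prints 16.667 ms
```

Every pass timing later in this article can be read against that 16.7 millisecond budget.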

Here are some basic rendering statistics reported by Renderdoc:

Draw calls: 6705
Dispatch calls: 191
API calls: 53361
Index/vertex bind calls: 8724
Constant bind calls: 25006
Sampler bind calls: 563
Resource bind calls: 13451
Shader set calls: 1252
Blend set calls: 330
Depth/stencil set calls: 301
Rasterization set calls: 576
Resource update calls: 1679
Output set calls: 739
API:Draw/Dispatch call ratio: 7.73796

342 Textures - 3926.25 MB (3924.10 MB over 32x32), 180 RTs - 2327.51 MB.
Avg. tex dimension: 1611.08x2212.36 (2133.47x2984.88 over 32x32)
4144 Buffers - 446.59 MB total 6.48 MB IBs 43.35 MB VBs.
6700.34 MB - Grand total GPU buffer + texture load.

There’s not much we can deduce from these numbers alone. 6705 draw calls and over 50000 API calls both sound like a lot, but without further context their cost is hard to evaluate. 6.7 gigabytes of used video memory is a lot for a relatively simple scene like this, especially considering there are still current generation mid-tier graphics cards with only 8 gigabytes of VRAM.
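
As a quick sanity check, the derived figures in the statistics can be reproduced from the raw counts. This is only arithmetic on the numbers Renderdoc reported; the variable names are mine, and treating "MB" as mebibytes when comparing against an 8 GB card is an assumption.

```python
# Raw counts from the Renderdoc statistics.
draw_calls = 6705
dispatch_calls = 191
api_calls = 53361

# API calls per draw or dispatch call.
ratio = api_calls / (draw_calls + dispatch_calls)

# Memory figures in MB, as reported.
textures_mb = 3926.25
render_targets_mb = 2327.51
buffers_mb = 446.59
total_mb = textures_mb + render_targets_mb + buffers_mb

# Share of an 8 GB card, assuming "MB" means mebibytes (assumption).
vram_share = total_mb / (8 * 1024)

if __name__ == "__main__":
    print(f"API calls per draw/dispatch: {ratio:.5f}")
    print(f"Total GPU memory: {total_mb:.2f} MB")
    print(f"Share of 8 GB of VRAM: {vram_share:.0%}")
```

The total comes out within a rounding error of the 6700.34 MB grand total, so the three categories (textures, render targets, buffers) seem to account for all of it.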

A screenshot from Renderdoc, showing API calls grouped by rendering pass.

Since the game uses HDRP, its documentation can serve as a good starting point for understanding the different rendering and compute passes the game performs on each frame. I’m not going to do a fancy graphics study like those legendary ones for DOOM 2016 and GTA V, but I’ll go through most of the rendering process step by step and highlight some of the more interesting things along the way.



DOTS instance data update

Almost every draw call the game makes uses instancing, which is necessary in a game of this scale. To make instancing work the game has a single large buffer of instance data which contains everything necessary for rendering any and all objects. The contents and size of per-instance data varies by the type of entity, but it seems that regular game objects like buildings take about 50 floats per instance, roads considerably more. I haven’t fully figured out how the buffer is managed because it’s a very complex system, but essentially the instance data for every visible object is updated to the buffer each frame, and the changes are then uploaded to the GPU. The buffer starts at about 60 megabytes, and is reallocated to a larger size when necessary.

The buffer is used for practically every draw call the game makes, and according to Renderdoc it’s at least accessible in both vertex and pixel shaders, though I would assume it’s mostly only used in vertex shaders. It would be interesting to know how this buffer affects the GPU’s cache, as I would assume instances are not laid out in the buffer in the same order they are rendered and that could be a problem for caching, but I lack the expertise to figure that out. Regardless, there’s a certain cost associated with looking up data from this buffer for every vertex, and it might explain some of the issues regarding high poly meshes I’ll get to soon.
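
To get a feel for the scale of this buffer, here is a rough back-of-the-envelope sketch. It assumes 32-bit floats, reads "about 60 megabytes" as 60 MiB, and uses a doubling growth policy that is purely hypothetical; I have not confirmed how the game actually resizes the buffer.

```python
FLOAT_BYTES = 4  # size of a 32-bit float (assumption about the data format)


def instances_that_fit(buffer_bytes: int, floats_per_instance: int) -> int:
    """How many instances of a given size fit into the buffer."""
    return buffer_bytes // (floats_per_instance * FLOAT_BYTES)


def grown_capacity(current_bytes: int, needed_bytes: int) -> int:
    """Hypothetical growth policy: double the buffer until the data fits."""
    capacity = current_bytes
    while capacity < needed_bytes:
        capacity *= 2
    return capacity


if __name__ == "__main__":
    initial = 60 * 1024 * 1024  # "about 60 megabytes", read as MiB (assumption)
    # Roughly 50 floats per building-like instance, per the text.
    print(instances_that_fit(initial, 50))  # prints 314572
```

Under these assumptions the initial buffer has room for roughly 300 thousand building-sized instances before it needs to be reallocated.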



Simulation

Several compute shaders are used for graphics-related simulations, such as water, snow and particles, as well as skeletal animation. These take about 1.5 milliseconds in total, which is under 2% of frame time.

One early theory regarding the game’s poor performance was that maybe it was offloading a lot of the actual game simulation to the GPU, saving CPU time but taking processing power away from rendering. However, I can conclude based on both decompiled code and GPU calls that this is simply not the case.



Virtual texturing cache update

Remember how I mentioned that virtual texturing is not supported by Entities Graphics? Well, it seems that C:S2 implements its own virtual texturing / texture streaming system. I first assumed that the game is using Unity’s built-in solution for that, but in traditional Unity fashion, even though it was added to the engine in 2020 following an acquisition, it remains as experimental and unsupported as ever (if not more so).

What’s virtual texturing, anyway? My understanding is that virtual texturing is an approach for loading and managing texture data in a potentially more memory efficient way than the méthode traditionnelle of using one GPU texture per texture asset. Textures are stored in texture atlases, which are basically fancier versions of sprite sheets (which I also happened to cover in my GPU tile mapping article). Atlases consist of tiles of a fixed size, and each tile can contain one or more textures. The trick that can save memory is that large textures can be split into multiple tiles, so if you have a large texture that is only visible in a small portion of the screen, you only need to load the tiles that are actually visible. Virtual texture visibility information is produced as a side product of normal rendering in a later pass, and the visibility information is used on the CPU side to determine which tiles need to be loaded and which can be unloaded. If you want to know more, Unreal Engine’s documentation seems to offer a great description of the process in more detail. The game seems to use virtual texturing for all static 3D objects except the terrain.
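
To make the tile idea a bit more concrete, here is a minimal Python sketch of the "which tiles do I need" step. It is not the game’s implementation: the function name, the 256 pixel tile size and the idea of describing visibility as a single UV rectangle are all assumptions made for illustration. A real system works from per-pixel feedback produced by the GPU rather than a tidy rectangle.

```python
import math


def visible_tiles(tex_w, tex_h, tile_size, u0, v0, u1, v1):
    """Return the set of (x, y) tile indices touched by a visible UV rectangle.

    UV coordinates are in the range 0..1; tile indices count from the
    top-left corner of the texture.
    """
    tiles_x = math.ceil(tex_w / tile_size)
    tiles_y = math.ceil(tex_h / tile_size)
    # Convert the UV rectangle to texel coordinates, then to tile indices.
    first_x = int(u0 * tex_w) // tile_size
    first_y = int(v0 * tex_h) // tile_size
    # Clamp so that u1 == 1.0 or v1 == 1.0 does not index past the last tile.
    last_x = min(tiles_x - 1, int(u1 * tex_w) // tile_size)
    last_y = min(tiles_y - 1, int(v1 * tex_h) // tile_size)
    return {
        (x, y)
        for x in range(first_x, last_x + 1)
        for y in range(first_y, last_y + 1)
    }


if __name__ == "__main__":
    # A 4096x4096 texture split into 256x256 tiles has 16 * 16 = 256 tiles.
    everything = visible_tiles(4096, 4096, 256, 0.0, 0.0, 1.0, 1.0)
    corner = visible_tiles(4096, 4096, 256, 0.0, 0.0, 0.1, 0.1)
    print(len(everything), len(corner))  # prints 256 4
```

In this toy example showing only a tenth of the texture along each axis means 4 tiles have to be resident instead of 256, which is where the potential memory saving comes from.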

A screenshot from Renderdoc, showing a virtual texture tile atlas.
One of the tile atlases used in the rendering of my example frame. Massively downscaled; the original is 16368×8448.

This approach to texturing is quite elegant in theory but it comes with many tradeoffs, and the game’s implementation still has some teething issues, such as high resolution textures sometimes failing to load even when the surface is close to the camera. The use of virtual texturing is also likely the culprit for the game’s lack of support for anisotropic texture filtering, a standard feature in PC games since the beginning of the millennium.

The pass took about 0.5 milliseconds.



Skybox generation

The game uses Unity HDRP’s built-in sky system, so it generates a skybox texture (a cubemap) every frame. This takes about 0.65 milliseconds which isn’t a lot compared to everything else, but if the game was targeting 60 FPS it would be almost 4% of the total frame time budget.



Pre-pass

A screenshot from Renderdoc, showing the normal/roughness buffer of my example frame.

Now we get to the actual rendering. C:S2 uses deferred rendering, which basically means that rendering is done in many phases and using several different intermediate render targets. The first phase is the pre-pass, which produces per-pixel depth, normal and (presumably) smoothness information into two separate textures.

This pass is surprisingly heavy as it takes about 8.2 milliseconds, or roughly about far too long, and this is where some of the biggest issues with the game’s rendering start to appear. But first we need to talk about THE TEETH.



The teeth controversy

A screenshot from Renderdoc, showing a simply shaded mesh of a character’s mouth.

One bizarre yet popular talking point about Cities: Skylines 2’s performance is the fact that the character models have fully modelled teeth, even though there’s literally no way to see them in-game, unless we count using the photo mode and clipping the camera inside a character’s head. Reddit user Hexcoder0 did some digging using NVidia Nsight Graphics™️ and posted their findings in a thread in the official subreddit (which inspired me to do my own research and write this pointlessly long article). It was revealed that not only does the game have fully modelled teeth, they are rendered literally all the time at maximum quality. More importantly this is the case for everything related to characters: none of the character meshes have any LOD variants. Colossal Order was quick to acknowledge this publicly, and they even referenced broader problems with LOD handling. Ignore all the weird rambling about simulating citizens’ teeth and whatnot; this isn’t Dwarf Fortress so they are not doing that, and even if they were, that obviously wouldn’t require rendering the teeth.

Colossal Order has also told us that they are using a middleware called Didimo Popul8 to generate the character models. If I recall correctly the teeth controversy started even before the game was released when someone noticed that the Didimo character specification includes separate meshes for things like teeth and eyelashes. I had initially assumed that the game is using Didimo’s default character meshes (because to be honest they look very generic and soulless) but now I’m not so sure. The meshes in the game in fact have a lot more polygons than Didimo’s defaults: the infamous mouth / teeth model for example consists of 6108 vertices, considerably more than the default mesh’s 1060. A single character even before we add hair, clothes and accessories is about 56 thousand vertices, which is a lot. For context the average low-density residential building uses less than 10 thousand vertices before yard props and other details are added.

In this example frame the game renders 13 sets of teeth, and their visual impact on the frame is zero: not a single pixel is affected. Even the characters themselves contribute basically nothing to the frame other than noise and artifacts.

A screenshot from Renderdoc highlighting the impact a single character has on the final image.
At this distance and rendering resolution an individual character (the purple Tetris block) affects a literal handful of pixels. Scaled 4x to make the individual pixels visible.



Pre-pass continued, featuring the high poly hall of shame

The egregiously unoptimized character models are not the sole cause of the game’s poor performance (because it’s never that simple), but they are an indicator of the broader issues with the game’s assets and rendering. The game regularly draws too many objects with too many polygons which have quite literally zero impact on the final image. This isn’t specific to the pre-pass, as the same issues seem to affect all rendering passes which rasterize geometry. I think there are two main reasons for this:

  1. Some models don’t have any LOD variants at all.
  2. The game’s culling system is not very advanced; the custom rendering code only implements frustum culling and there’s no sign of occlusion culling at all. There is some culling based on distance but it’s not very aggressive, which is good for avoiding pop-in but bad for performance.

Here are a few other examples besides the character models.

The pixels affected by a pallet of gas tanks
The gas tank pallet mesh
This highly detailed pallet of gas tanks consists of over 17K vertices.
These densely packed clotheslines are comprised of 25K vertices per piece and have dozens of individually modelled clothespins, or laundry boys as we call them in Finland. There’s also an even more dense variant featuring over 30K vertices.
This parking booth mesh is technically not used in my example frame’s pre-pass, but it’s still present in the scene and later used in the shadow mapping pass. This mesh consists of over 40K vertices with no LODs, and features luxurious details you don’t even get in most AAA games, like individually modelled cables connecting screens and keyboards. They are even routed through a (relatively round) hole in the desk! Combining the building and the furniture into one mesh saves on draw calls, but it also means that the props can’t be culled individually.
A pile of logs
This mesh of a pile of logs is similarly only used in the shadow rendering pass, and features over 100K vertices. It’s the highest poly model in the game I’ve encountered so far, though I haven’t played for more than a few hours.

Now you might say that these are just cherry-picked examples, and that modern hardware handles models like these just fine. And you would be broadly correct in that, but the problem is that all of these relatively small costs start to add up, especially in a city builder where one unoptimized model might get rendered a few hundred times in a single frame. Rasterizing tens of thousands of polygons per instance per frame and literally not affecting a single pixel is just wasteful, whether or not the hardware can handle it. The issues are luckily quite easy to fix, both by creating more LOD variants and by improving the culling system. It will take some time though, and it remains to be seen if CO and Paradox want to invest that time, especially if it involves going through most of the game’s assets and fixing them one by one.
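
For the curious, this is roughly what size-based LOD selection looks like. It’s a minimal sketch under my own assumptions, not the game’s code: the pixel thresholds, the 60 degree field of view and the function names are all made up for illustration, and the projection is the basic pinhole camera formula.

```python
import math


def projected_height_px(object_height_m, distance_m, vertical_fov_deg, screen_height_px):
    """Approximate on-screen height of an object in pixels (pinhole camera)."""
    # Height of the visible slice of the world at the object's distance.
    visible_height_m = 2.0 * distance_m * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return object_height_m / visible_height_m * screen_height_px


def select_lod(height_px, thresholds=(200.0, 50.0, 10.0)):
    """Pick a LOD index (0 = most detailed) from hypothetical pixel thresholds.

    Returns None when the object is too small to be worth drawing at all.
    """
    for lod, min_px in enumerate(thresholds):
        if height_px >= min_px:
            return lod
    return None


if __name__ == "__main__":
    # A 1.8 m tall character, 500 m from the camera, on a 1440 px tall screen.
    height = projected_height_px(1.8, 500.0, 60.0, 1440)
    print(f"{height:.1f} px -> LOD {select_lod(height)}")
```

With these made-up numbers a distant character covers only a few pixels of screen height and would be skipped entirely instead of being rasterized at full detail, teeth and all.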

To be clear, having highly detailed models is not a problem in itself, especially if you are aspiring to make a self-proclaimed next generation city builder. The problem is that the game is struggling to handle this level of detail, and that polygons are used inefficiently and inconsistently. For every character model with opulently modelled nose hairs there are common props with surprisingly low polycounts. I think if the game ran well people would be celebrating these highly detailed models and making hyperbolic social media posts and clickbait videos titled “OMG the devs thought of EVERYTHING 🤯🤯🤯” and “I can’t believe they modelled the cables in the parking booth 😱😱😱” and “CITY SKYLINES 2 MOST DETAILED GAME EVER CONFIRMED?”. Instead we are here.

Oh yeah, I was talking about rendering at some point, wasn’t I? Let’s continue.



Motion vectors

The game renders per-pixel motion vectors as a separate pass, which can be used for anti-aliasing and motion blur. I think motion vectors are slightly broken right now, which might be the reason the game doesn’t support DLSS or FSR2 at the time of writing. There’s a temporal anti-aliasing option hidden in the advanced settings menu and it improves the rendering quality to some extent, but things animated using vertex shaders like trees are just covered in artifacts and ghosting.

This pass takes about 0.6 milliseconds.



Roads and decals

The decal buffer of my example frame.

A closeup of the decal buffer.

Now we are finally rendering something recognizable: roads! And lawns, and other things that follow the surface of the terrain.

This pass takes about 1 millisecond.



Main pass

The albedo buffer of the main pass.

A closeup of the albedo buffer.

The normal buffer of the main pass.

This is the meat (vegan options are available) of the deferred rendering process. This pass takes in all the intermediate render targets produced so far alongside the virtual texture caches and some seemingly hardcoded textures and produces several more buffers, including ones for albedo, normals, different PBR properties and depth. It also produces the virtual texture visibility information I mentioned earlier. It’s rendered at half horizontal resolution, presumably as an optimization. Terrain doesn’t use virtual texturing, so it’s rendered at full resolution and with a constant colour regardless of the actual terrain texture.

The virtual texture visibility buffer.

This pass takes 16.7 milliseconds, or about as long as the entire frame should take if we were aiming for 60 frames per second. The pass rasterizes all of the geometry again, so the same reasons for the pre-pass being slow apply here as well. The additional cost is probably explained by the number of additional outputs, plus the cost of virtual texture cache lookups and texture mapping itself.




Ambient occlusion

Next the game produces an ambient occlusion buffer using motion vectors, normals and the depth buffer, plus copies of the last two from the previous frame. Judging by the debug names of the shaders the algorithm is GTAO. This takes about 1.6 milliseconds.



Cascaded shadow mapping

The shadow map buffer.

C:S2 uses cascaded shadow mapping, and in my opinion not very well. Shadows are full of artifacts and constantly flickering, especially when either the sun or any foliage is moving (and they are, always). Even when the screen isn’t completely covered with artifacts, the resolution of the shadows is quite low, and the jump in quality between the different shadow cascades is very noticeable.

The game uses 4 cascades with a resolution of 2048×2048 pixels per cascade. There’s a directional shadow map resolution setting in the advanced graphics settings menu, but at the time of writing it’s not connected to anything in the code; neither the individual setting nor the overall shadow quality setting alters the resolution of the shadow map. This is the reason why the medium and the high shadow setting presets are actually identical. I don’t know whether this is an oversight or if the setting was hastily disabled because it was causing issues. The low preset differs from medium and high in that it disables shadows cast by the terrain.
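For a rough sense of scale, the cascade setup can be turned into a texel count and a memory estimate. This is a back-of-the-envelope sketch: the cascade count and resolution come from the text, but the 4 bytes per texel is an assumption (a 32-bit depth format), since the actual shadow map format isn't stated.

```python
CASCADES = 4
RESOLUTION = 2048        # texels per side, per cascade (from the text)
BYTES_PER_TEXEL = 4      # ASSUMPTION: 32-bit depth; the real format is not stated


def shadow_texels(cascades: int, resolution: int) -> int:
    """Total number of shadow map texels across all cascades."""
    return cascades * resolution * resolution


def shadow_memory_mib(cascades: int, resolution: int, bytes_per_texel: int) -> float:
    """Approximate shadow map size in MiB under the assumed texel format."""
    return shadow_texels(cascades, resolution) * bytes_per_texel / (1024 * 1024)


for res in (RESOLUTION, 4096):
    mib = shadow_memory_mib(CASCADES, res, BYTES_PER_TEXEL)
    print(f"{CASCADES} x {res}x{res}: {shadow_texels(CASCADES, res):,} texels, ~{mib:.0f} MiB")
```

Doubling the resolution per side quadruples both numbers, which is worth keeping in mind when wishing for sharper shadows.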

Despite the low quality, it’s by far the slowest rendering pass, taking about 40 milliseconds or almost half of the total frametime. It also dwarfs all other passes in terms of the number of draw calls: in my test frame 4828 out of 6705 draw calls were for shadow mapping, a staggering 72%. This is why there’s such a huge performance gain when shadows are disabled.

The reasons behind this pass’s slowness are mostly the same as with the pre-pass and the main pass: too much unnecessary geometry rendered with way too many draw calls. Renderdoc’s performance counters view indicates that many of the draw calls affect between zero and under 100 pixels in the shadow map, and the teeth are back again. The game seems to treat every single 3D object as a potential shadow caster on all quality settings regardless of size or distance. There’s a lot of room for optimization here, and in theory general improvements to LODs and culling should have a significant impact on shadow mapping performance as well. Hopefully after performance has been improved CO (or modders) can turn the shadow quality setting back up, and raise the shadow map resolution to something more 2023.
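One common way to avoid drawing shadow casters that touch almost no texels is to estimate each object's footprint in the shadow map and skip the ones below a threshold. The sketch below illustrates that general idea only; it is not the game's culling code. The `Caster` type, the 4-texel threshold and the cascade widths are all hypothetical, and it assumes an orthographic cascade (as used for a directional light) so that size in texels doesn't depend on distance within the cascade.

```python
import math
from dataclasses import dataclass


@dataclass
class Caster:
    name: str
    radius_m: float  # bounding-sphere radius in world units (metres here)


def texel_footprint(radius_m: float, cascade_width_m: float, resolution: int = 2048) -> float:
    """Approximate shadow map texels covered by the bounding sphere's disc
    in an orthographic cascade that spans cascade_width_m world units."""
    texels_per_metre = resolution / cascade_width_m
    radius_texels = radius_m * texels_per_metre
    return math.pi * radius_texels ** 2


def cull_small_casters(casters, cascade_width_m, min_texels=4.0, resolution=2048):
    """Keep only casters whose estimated footprint reaches min_texels
    (a hypothetical tuning value)."""
    return [
        c for c in casters
        if texel_footprint(c.radius_m, cascade_width_m, resolution) >= min_texels
    ]


casters = [Caster("apartment block", 20.0), Caster("park bench", 0.8), Caster("tiny prop", 0.05)]
for width in (100.0, 2000.0):  # a near cascade and a far cascade, made-up sizes
    kept = cull_small_casters(casters, cascade_width_m=width)
    print(f"{width:.0f} m cascade keeps: {[c.name for c in kept]}")
```

A per-cascade test like this lets small props keep their shadows up close while dropping out of the wide, distant cascades where they would cover a fraction of a texel.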

Let’s end this part on a positive side note: when digging into the shadow handling code I stumbled upon the fact that the game computes the positions of the sun and the moon using the current date, time and coordinates of the city. That’s a really neat detail!
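To give an idea of what computing the sun's position from date, time and latitude involves, here is a minimal sketch using a common textbook approximation. It is not the game's code: it ignores the equation of time, atmospheric refraction and the longitude/time zone offset, and it takes local solar time (where 12.0 is solar noon) as input. The function names are made up for illustration.

```python
import math


def solar_declination_deg(day_of_year: int) -> float:
    """Approximate solar declination in degrees (simple cosine model)."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def sun_elevation_deg(latitude_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Approximate elevation of the sun above the horizon in degrees.

    solar_hour is local solar time in hours; 12.0 means solar noon.
    """
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


# A hypothetical city at 60 degrees north, around the June solstice (day 172).
for hour in (0, 6, 12, 18):
    print(f"{hour:02d}:00 solar time: {sun_elevation_deg(60.0, 172, hour):6.1f} degrees")
```

The resulting elevation (plus an azimuth, omitted here) is what a renderer would turn into a directional light vector for shadow mapping.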



Screen space reflections and global illumination

The screen space reflection buffer.

The screen space global illumination buffer.

The game uses Unity HDRP’s built-in implementations of screen space reflections (SSR) and screen space global illumination (SSGI). I won’t be covering them in detail because Unity’s documentation is already decently comprehensive, plus I’m not going to pretend I fully understand them. Global illumination uses ray-marching and is evaluated by default at half resolution. Denoising and temporal accumulation are used to improve the quality. It would be nice if the game, as a self-proclaimed next generation city builder, supported hardware accelerated ray tracing in addition to these screen space solutions, but I’m not holding my breath.

These two effects combined took about 3 milliseconds.



Deferred lighting

The results of deferred lighting

This is where it all comes together. Most of the intermediate buffers produced so far are combined to render the near-final image. Not much more to say about this pass, except that it takes about 2.1 milliseconds.



Weird clothes pass

There’s a small rendering pass just for the clothes of the Didimo characters, in this case 3 dresses, 1 jumpsuit and 1 set of suit trousers. The remaining 8 characters are either naked or their clothes use different shaders. This pass affects almost no pixels at this zoom level. Luckily it takes just 0.2 milliseconds.



Sky rendering

The sky is rendered next using the previously generated skybox texture, though it’s not visible in my example frame. This pass takes about 0.3 milliseconds.



Transparent objects pre-pass

Traditional deferred rendering doesn’t work with transparent objects, so they are rendered separately. Transparent objects are rendered in two phases, starting with this pre-pass which only updates the normal and depth buffers. There are not many unique transparent objects in the frame, so this pass takes about 0.12 milliseconds.



Water rendering

The game does some pre-processing in compute shaders to prepare for water rendering and then produces several downscaled and blurred versions of the almost-final image. These inputs are fed to the main water rendering shader, which renders the water surface. This takes about 1 millisecond.



Particles, rain and transparent objects

This pass handles most things transparent, including particles, weather effects and 3D objects made of glass and other transparent materials. No particles are visible in the frame, but the game still tries to render the smoke from the industrial zone’s chimneys, as well as the stream of goop produced by the sewage pipe. Rain is rendered next, using 20 instances of 12K vertices each. Curiously the remaining transparent objects are rendered after the rain, causing some weirdness when transparent objects (like greenhouses and power lines) and rain overlap. All of this takes about 0.56 milliseconds.



VT feedback processing

The virtual texture feedback buffer.

The virtual texture visibility buffer we got earlier is processed with a compute shader, resulting in an output texture 1/16th of the original resolution. For the visualization I nearest neighbor scaled the output 8x to make it more readable. This is the information the game eventually gets back from the GPU to decide which texture tiles to load and unload. Renderdoc reported very little time spent on this, well under 0.1 milliseconds.
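Nearest neighbor scaling, as used for the visualization, simply repeats every source pixel in a block without blending, which keeps the individual feedback values readable. A minimal sketch on a plain list-of-rows image (the function name and the tiny sample data are made up for illustration):

```python
def upscale_nearest(image, factor):
    """Upscale a 2D image (list of rows) by an integer factor,
    repeating each source pixel factor x factor times."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


feedback = [
    [1, 2],
    [3, 4],
]
for row in upscale_nearest(feedback, 2):
    print(row)
```

Because no interpolation happens, each value in the enlarged image still corresponds exactly to one entry of the original buffer.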



Bunch of post-processing

The game uses many of Unity’s built-in post-processing effects, including temporal AA (which is a bit broken as I previously mentioned), bloom and tonemapping, plus DOF and motion blur if enabled. I can’t be bothered to sum up the timings of all of these, but it’s about 1 to 2 milliseconds in total.



Outlines, text and other UI

Road names are rendered using SDF

The last remaining draw calls are used to render all the different UI elements, both the ones that are drawn into the world as well as the more traditional UI elements like the bottom bar and other controls. Quite a few draw calls are used for the Gameface-powered UI elements, though ultimately these calls are very fast compared to the rest of the rendering process. The names of roads are rendered into the scene using 2D signed distance fields. The depth buffer is used to blend the text with the scene if the text is behind a building or other object, which is a nice touch. This final pass takes an irrelevant amount of time.

And we’re done!
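The usual trick behind signed distance field text is to store, per texel, the distance to the glyph edge and turn that into opacity with a narrow smooth threshold, which keeps edges crisp at any zoom level. The sketch below shows that general technique plus a simple depth-based fade; it is not taken from the game's shaders. It assumes a normalized SDF sample where 0.5 is the glyph edge, a depth convention where smaller means closer to the camera, and a made-up `occluded_opacity` value.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite interpolation between 0 and 1 as x goes from edge0 to edge1."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def sdf_alpha(sample: float, softness: float = 0.05) -> float:
    """Glyph coverage from a normalized SDF sample (ASSUMPTION: 0.5 = edge,
    larger values are inside the glyph)."""
    return smoothstep(0.5 - softness, 0.5 + softness, sample)


def text_opacity(sample: float, text_depth: float, scene_depth: float,
                 occluded_opacity: float = 0.3) -> float:
    """Fade the label when scene geometry is in front of it.

    ASSUMPTION: smaller depth means closer to the camera; occluded_opacity
    is a hypothetical tuning value.
    """
    occluded = scene_depth < text_depth
    return sdf_alpha(sample) * (occluded_opacity if occluded else 1.0)


print(sdf_alpha(0.5))                                         # on the glyph edge
print(text_opacity(0.9, text_depth=10.0, scene_depth=5.0))    # label behind a building
```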

A screenshot from Cities: Skylines 2 at night

I tried not to make this into an in-depth graphics study, but I think I failed. Hope you learned something new.



Summary and conclusions

So why is Cities: Skylines 2 so incredibly heavy on the GPU? The short answer is that the game is throwing so much unnecessary geometry at the graphics card that the game manages to be largely limited by the available rasterization performance. The cause for the unnecessary geometry is both the lack of simplified LOD variants for many of the game’s meshes, as well as the simplistic and seemingly untuned culling implementation. And the reason why the game has its own culling implementation instead of using Unity’s built-in solution (which should at least in theory be much more advanced) is that Colossal Order had to implement quite a lot of the graphics side themselves, because Unity’s integration between DOTS and HDRP is still very much a work in progress and arguably unsuitable for most actual games. Similarly Unity’s virtual texturing solution remains eternally in beta, so CO had to implement their own solution for that too, which still has some teething issues.
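To back the short answer with the numbers quoted in the walkthrough, here is a small tally of the per-pass timings from the motion vector pass onward. Passes described earlier in the article are left out, so the total is not the full frame time. Two entries are assumptions: the VT feedback pass is entered at its 0.1 ms upper bound and post-processing at the midpoint of the quoted 1 to 2 ms range.

```python
# Per-pass GPU timings in milliseconds, as quoted in the walkthrough.
PASS_TIMINGS_MS = {
    "motion vectors": 0.6,
    "roads and decals": 1.0,
    "main pass": 16.7,
    "ambient occlusion": 1.6,
    "cascaded shadow mapping": 40.0,
    "SSR + SSGI": 3.0,
    "deferred lighting": 2.1,
    "clothes": 0.2,
    "sky": 0.3,
    "transparent pre-pass": 0.12,
    "water": 1.0,
    "particles, rain, transparents": 0.56,
    "VT feedback": 0.1,      # ASSUMPTION: upper bound of "well under 0.1 ms"
    "post-processing": 1.5,  # ASSUMPTION: midpoint of "1 to 2 ms"
}

# The two passes in this list that rasterize the full scene geometry.
GEOMETRY_HEAVY = ("main pass", "cascaded shadow mapping")


def geometry_share(timings, heavy):
    """Fraction of the listed time spent in the geometry-heavy passes."""
    total = sum(timings.values())
    return sum(timings[name] for name in heavy) / total


total_ms = sum(PASS_TIMINGS_MS.values())
print(f"listed passes: {total_ms:.2f} ms")
print(f"main pass + shadows: {geometry_share(PASS_TIMINGS_MS, GEOMETRY_HEAVY):.0%} of that")
```

Even with the earlier passes excluded, the two passes that re-rasterize the scene account for the large majority of the listed time, while the screen space effects and post-processing are comparatively cheap.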

Here’s what I think happened (a.k.a. this is speculation): Colossal Order took a big gamble on Unity’s new and shiny tech, and in some ways it paid off massively and in others it caused them a lot of headache. This is not a rare situation in software development and is something I’ve experienced myself as well in my dayjob as a web-leaning developer. They chose DOTS as the architecture to fix the CPU bottlenecks their previous game suffered from and to increase the scale & depth of the simulation, and largely succeeded on that front. CO started the game when DOTS was still experimental, and it probably came as a surprise how much they had to implement themselves even when DOTS was officially considered production ready. I wouldn’t be surprised if they started the game with Entities Graphics but then had to pivot to custom solutions for culling, skeletal animation, texture streaming and so on when they realized Unity’s official solution was not going to cut it. Ultimately the game had to be released too early when these systems were still unpolished, likely due to financial and / or publisher pressure. None of these technical issues were news to the developers on launch day, and I don’t believe their claim that the game was intended to target 30 FPS from the beginning: no purebred PC game has done that since the early 2000s, and the graphical fidelity doesn’t justify it.

While I did find a lot to complain about in the game’s technology, this little investigation, which has been eating a large share of my free time for the past 1.5 weeks, has also made me appreciate the game’s lofty goals and sympathize more with the developers of this technically ambitious yet troubled game. I’ve learned a lot about how Cities: Skylines 2 & Unity HDRP work under the hood, and I’ve also gotten some good practice with Renderdoc.

If you liked this article, good for you! I don’t have anything to sell you. Write a comment or something to the media aggregator or social media of your choice. Subscribe to my Atom feed if it still works. Stay tuned for my next article in a couple of years.
