Building the DirectX shader compiler better than Microsoft?
This is a horror story about the messy state of Microsoft's DirectX shader compiler, and about trying to wrangle it into a nicer experience for game developers. In some respects, we now build the DXC compiler better than Microsoft does.
Setting the stage
For Mach engine we've been building an experimental graphics API called sysgpu using Zig, aiming to be a successor and descendant of WebGPU for native graphics. It will support Metal, Vulkan, Direct3D, and OpenGL backends. As part of this, we need to compile shader programs into something that Direct3D 12 can consume. But what does it consume?
A quick history lesson
The DirectX graphics API uses HLSL as its shading language of choice. Previously, with Direct3D 11 and earlier, the compiler was called 'FXC' (the 'effects compiler').
FXC is deprecated, DXC enters the scene with Direct3D 12
Unfortunately, FXC is rather notoriously slow among game developers, and its code generation is suboptimal, meaning shaders often both compile and execute fairly slowly.
With the release of Direct3D 12 and Shader Model 6.0 (SM6), Microsoft officially deprecated the FXC compiler distributed as part of the Windows OS in favor of a new compiler called 'DXC' ('DirectX compiler'). DXC is a public, Microsoft-official fork of LLVM/Clang v3.7 (Microsoft/DirectXShaderCompiler), and prebuilt binaries are available for download.
In this Microsoft fork of LLVM, changes are meticulously annotated with // HLSL Change Begin and // HLSL Change End comments, making it clear who owns what code.
What a DirectX driver eats for breakfast: DXBC or DXIL
Although HLSL is the language of choice for Direct3D programming, at the end of the day GPUs under the hood all have different compute architectures and requirements: the compiled binary form of a shader program that an Intel GPU needs is going to be different from what an NVIDIA GPU needs, and the same goes for AMD.
Microsoft's role is to provide the nice game developer frontend APIs (like Direct3D and the HLSL shading language), while working with independent hardware vendors (IHVs) like Intel/AMD/NVIDIA who write the drivers. The drivers bridge these nice frontend APIs to whatever is hopefully closest to the hardware manufacturer's instruction set architecture (ISA) under the hood. You can think of it like web browsers making sure JavaScript can run on both a Windows PC and a macOS Apple Silicon machine, although graphics developers would spit at the mere comparison.
DirectX versions 9 through 11 had driver manufacturers consuming what is called DXBC (DirectX Byte Code). Game developers would produce DXBC either with a CLI tool like fxc.exe to compile their HLSL programs, or at runtime using the d3dcompiler APIs. The driver's job was then to take that decently-optimized shader bytecode and turn it into the actual binary that the GPU would run. This bytecode was an undocumented, proprietary format really only shared between Microsoft and GPU driver manufacturers, excluding a few odd-ball Linux developers who cared enough to reverse engineer it for Proton.
With the arrival of DirectX 12 and Shader Model 6.0, Microsoft had aspirationally intended to create their own standard IR called DXIR, but in 2021 they removed all language suggesting they would do so. The intent was for DXIR to be the 'high level', 'unoptimized' IR form which compilers (think: Rust) could target. The DXC compiler would then lower DXIR into the optimized DXIL bytecode form, a new 'low level' post-optimization IR format, before handing it off to graphics drivers to muck with as they please before it gets translated to run on the actual hardware.
Asked about DXIR documentation, a Microsoft employee noted this in 2019:
Unfortunately, documentation on the lowering process [from DXIR to DXIL] is mostly non-existent. […]
Oh, and DXIR isn't anything official, but just the first LLVM IR after CodeGen.
As you'll soon see, this theme of 'there are no docs, just whatever our compiler actually does' becomes a common pattern.
DXIL
DXIL (pronunciation?) is the official format that DirectX 12 driver manufacturers consume today.
A game developer produces DXIL bytecode using the DXC compiler, a fork of LLVM/clang heavily modified to support HLSL compilation. The DirectX APIs hand that DXIL over to the graphics driver, which converts the IR into its own intermediate languages, performs any secret-sauce optimization passes on it, and ultimately boils it down to the actual machine code that runs on the GPU hardware.
Much like DXBC, the old bytecode format it replaced, DXIL is also undocumented: specifically, it is LLVM version 3.7's post-codegen, post-optimization-passes bitcode format. It's undocumented not because nobody wants to document it, but rather because the documentation is literally 'whatever the Microsoft fork of LLVM v3.7, with all the HLSL changes we made, actually emits as LLVM bitcode after CodeGen and optimization passes have run, plus a small custom container/wrapper file format on top.'
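To make the 'small custom container on top of LLVM bitcode' idea concrete, here is a minimal C sketch that walks a container and finds the raw bitcode inside its DXIL part. The layout is an assumption based on my reading of DxilContainer.h in the DXC repository (a "DXBC" fourcc shared by DXBC and DXIL shaders, a 16-byte digest, a part table, and a "DXIL" part holding a program header, a bitcode header, and then LLVM bitcode starting with 'B' 'C' 0xC0 0xDE). The function names and the demo builder are hypothetical, and the builder produces a synthetic blob no driver would accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed container header: "DXBC" fourcc, 16-byte digest, u16 major +
   u16 minor version, u32 total size, u32 part count, then one u32 offset
   per part. All integers are little-endian. */
#define DXBC_HEADER_SIZE 32u

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Return a pointer to the LLVM bitcode inside the "DXIL" part, or NULL
   if the blob is malformed or has no DXIL part. */
const uint8_t *dxil_find_bitcode(const uint8_t *blob, size_t len,
                                 uint32_t *bitcode_size) {
    static const uint8_t bc_magic[4] = {'B', 'C', 0xC0, 0xDE};
    if (len < DXBC_HEADER_SIZE || memcmp(blob, "DXBC", 4) != 0) return NULL;
    size_t total = rd32(blob + 24);
    uint32_t parts = rd32(blob + 28);
    if (total > len) return NULL;
    for (uint32_t i = 0; i < parts; i++) {
        size_t off_pos = DXBC_HEADER_SIZE + 4u * (size_t)i;
        if (off_pos + 4 > total) return NULL;
        size_t off = rd32(blob + off_pos);
        if (off + 8 > total) return NULL;
        const uint8_t *part = blob + off;   /* u32 fourcc, u32 size, body */
        size_t part_size = rd32(part + 4);
        if (off + 8 + part_size > total) return NULL;
        if (memcmp(part, "DXIL", 4) != 0) continue;
        /* Body: u32 program version, u32 size in dwords, then a bitcode
           header: "DXIL", u32 version, u32 offset, u32 size. */
        if (part_size < 24) return NULL;
        const uint8_t *bc_hdr = part + 16;
        if (memcmp(bc_hdr, "DXIL", 4) != 0) return NULL;
        size_t bc_off = rd32(bc_hdr + 8);   /* relative to bc_hdr */
        size_t bc_size = rd32(bc_hdr + 12);
        if (8 + bc_off + bc_size > part_size || bc_size < 4) return NULL;
        const uint8_t *bc = bc_hdr + bc_off;
        if (memcmp(bc, bc_magic, 4) != 0) return NULL;
        *bitcode_size = (uint32_t)bc_size;
        return bc;
    }
    return NULL;
}

/* Hypothetical helper: wrap `bc` in a minimal one-part container. Digest
   and version fields stay zero. Returns bytes written, or 0 if too small. */
size_t dxil_build_demo(uint8_t *out, size_t cap, const uint8_t *bc,
                       uint32_t bc_size) {
    assert(out != NULL && bc != NULL);
    size_t part_off = DXBC_HEADER_SIZE + 4;  /* header + one offset entry */
    size_t body = 8 + 16 + (size_t)bc_size;  /* program hdr + bitcode hdr */
    size_t total = part_off + 8 + body;
    if (total > cap) return 0;
    memset(out, 0, total);
    memcpy(out, "DXBC", 4);                  /* digest (bytes 4..19) = 0 */
    wr32(out + 24, (uint32_t)total);
    wr32(out + 28, 1);                       /* part count */
    wr32(out + 32, (uint32_t)part_off);      /* offset of part 0 */
    uint8_t *part = out + part_off;
    memcpy(part, "DXIL", 4);
    wr32(part + 4, (uint32_t)body);
    memcpy(part + 16, "DXIL", 4);            /* bitcode header magic */
    wr32(part + 24, 16);                     /* bitcode follows the header */
    wr32(part + 28, bc_size);
    memcpy(part + 32, bc, bc_size);
    return total;
}
```

Everything a driver actually cares about lives in that bitcode payload; the wrapper is mostly bookkeeping (plus the digest, which comes up again with dxil.dll).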
Fixing the Microsoft fork of LLVM
Microsoft themselves are well aware that having a bunch of independent driver manufacturers rely on a hyper-specific, undocumented LLVM bitcode format produced only by their fork is, well, less than ideal. They are also aware that their fork of LLVM is not super fun to maintain, either. Quoting another Microsoft employee (Sep 2023) who was asked about the possibility of adding DirectX 9/10/11 support to the new/better DXC compiler:
DXC's fork of LLVM removed and/or broke much of the code generation layer and infrastructure [of LLVM]. Given that, supporting DXBC generation in DXC would be a massive job to fix and restore broken LLVM functionality. Due to the large scale of this issue and resource constraints on our team we're not going to address this issue in [the new] DXC [compiler] ever.
We may support DXBC generation in Clang in the future (we mentioned that in the original proposal to LLVM). That work is unlikely to begin for a few years as our focus will be on supporting DXIL and SPIR-V generation first.
As noted above, in March of 2022 Microsoft proposed and began work on upstreaming HLSL compilation support directly into LLVM/clang proper. That work is still ongoing today, and it involves adding legacy LLVM v3.7 bitcode writing support back into modern LLVM/clang versions:
By isolating as much of the DXIL-specific code as possible into a target we hope to minimize the cost on the community to maintain our legacy bitcode writing support.
i.e. the plan to get away from their fork is to upstream HLSL and DXIL support into LLVM/clang proper.
The challenge for gamedevs, WebGPU, etc.
Graphics abstraction layers which aim to provide a unified interface to modern graphics APIs like Metal, Direct3D 12, and Vulkan ultimately need to provide a unified shading language as well. If you look today, you'll find that most WebGPU implementations doing this have had a goal of 'someday we might be able to emit DXIL directly..', but in practice none actually do.
Instead, basically every WebGPU implementation today behaves as follows:
- The WGSL textual language is first translated to HLSL at runtime
- The HLSL is compiled into DXBC or DXIL using an HLSL compiler
- The optimized DXBC/DXIL is handed to the graphics driver, which converts it to the various vendor-specific ILs before it finally becomes machine code that runs on the GPU.
A quick detour: SPIR-V
Vulkan/SPIR-V works much the same way as the above. In fact, most drivers can't assume SPIR-V is optimized at all (although some do, and this varies between mobile and desktop GPUs), so they have more work to do to turn SPIR-V into a driver-compiled native binary.
This is exactly why Valve has Fossilize and maintains caches for each specific (GPU, driver version, etc.) pairing, along with the actual driver-compiled binary for each SPIR-V blob: it enables downloading 'pre-cached shaders' from Valve's servers before you play a game, so that you don't spend all day waiting for your computer to go brrr compiling and optimizing SPIR-V shaders into actual native code your GPU understands.
In other words, DXIL is always post-optimization-passes LLVM bitcode, whereas SPIR-V may or may not be in an optimized form, and GPU manufacturers write their drivers based on what SPIR-V looks like in the wild, which may or may not be pre-optimized. SPIR-V is closer to hardware than a textual shading language, but still very far from the native machine code a GPU understands.
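Although both are binary IRs, DXIL containers, bare LLVM bitcode, and SPIR-V modules are easy to tell apart by their first four bytes. The sketch below assumes only the well-known magic values: "DXBC" for the container (used by both DXBC and DXIL shaders), 'B' 'C' 0xC0 0xDE for LLVM bitcode, and the SPIR-V magic word 0x07230203 stored in the module's byte order. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    BLOB_UNKNOWN,
    BLOB_DXBC_CONTAINER, /* DXBC and DXIL shaders both use this wrapper */
    BLOB_LLVM_BITCODE,   /* bare bitcode, e.g. extracted from a DXIL part */
    BLOB_SPIRV
} blob_kind;

blob_kind classify_shader_blob(const uint8_t *p, size_t len) {
    static const uint8_t llvm_magic[4] = {'B', 'C', 0xC0, 0xDE};
    assert(p != NULL || len == 0);
    if (len < 4) return BLOB_UNKNOWN;
    if (memcmp(p, "DXBC", 4) == 0) return BLOB_DXBC_CONTAINER;
    if (memcmp(p, llvm_magic, 4) == 0) return BLOB_LLVM_BITCODE;
    /* SPIR-V begins with the word 0x07230203 in the module's endianness,
       so accept both the little- and big-endian byte orders. */
    uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if (w == 0x07230203u || w == 0x03022307u) return BLOB_SPIRV;
    return BLOB_UNKNOWN;
}
```

Note what the check cannot tell you: whether a SPIR-V module has been optimized. That property isn't visible in the header, which is part of why drivers can't assume it.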
Only Apple's Metal graphics API supports compiling directly to the target hardware's native binary format (thanks to the iron fist they hold over their hardware, I suppose.)
To use dxcompiler.dll or not?
Since WGSL->HLSL->DXIL happens at runtime, WebGPU runtimes face a choice: do we use the new DXC HLSL compiler, or the old, officially deprecated FXC compiler with its worse performance and codegen quality? On the surface, this hardly sounds like a tough choice!
Yet despite this, many indie devs and game engines choose to use FXC by default. Bevy game engine's documentation puts it very well:
The Fxc compiler (default) is old, slow and unmaintained. However, it doesn't require any extra .dlls to be shipped with the application.
The Dxc compiler is new, fast and maintained. However, it requires both dxcompiler.dll and dxil.dll to be shipped with the application. These files can be downloaded from https://github.com/microsoft/DirectXShaderCompiler/releases.
As a result, a lot of software defaults to the old, slow and unmaintained compiler. And it's not just Bevy: wgpu Rust users, Dawn WebGPU users, etc. all face this same question. It's likely one of the reasons WebGPU doesn't support Shader Model 6.0+ functionality today. Using the DXC compiler is not so great: it is, after all, a large, clunky Microsoft fork of a C++ codebase from nearly a decade ago!
Well, why not just statically link against it?
You can't.
Firstly, there's the issue that Microsoft's fork of LLVM doesn't support static linking. On the surface, this appears to be just because some cmake files assume SHARED instead of STATIC when creating libraries, but if you dig into it (as I did) you'll soon find it's much more involved than that.
Switching SHARED to STATIC everywhere in the CMake files will appear to get you a build with ~15 different static libraries to link against (not great compared to just one.) You might think using cmake OBJECT libraries could solve this, but you'll quickly hit an issue: although the cmake files are structured logically as dependants, they actually have implicit dependencies on each other due to HLSL changes Microsoft made. I'm 80% sure you would need to rewrite every cmake file in the repository to support OBJECT libraries. I can say this because I tried!
You might be thinking: linking against ~15 static libraries isn't SO bad, as long as the final executable is static, right?
Not so fast: many parts of DXC's COM interface implementation are also explicitly designed to load themselves as a DLL, i.e. to load dxcompiler.dll and dxil.dll as dynamic libraries and invoke their own methods through them.
OK, then we just need to patch the implementation not to call LoadLibraryW, basically, right?
Introducing dxil.dll: the proprietary code signing blob for DirectX shaders
If you've ever built DirectXShaderCompiler from source, you might notice something: dxil.dll doesn't get built. Why? It's distributed in every release on GitHub, both for Windows (x86/arm) and Linux (x86 only).
Strange, I thought the compiler was supposed to be open source? Well, it wouldn't be the first time[0][1] I've encountered a Microsoft 'open source' repository that actually depends entirely on some proprietary, platform-specific code blobs behind the scenes.
Incidentally, I stumbled across the D3D12 Shader Cache API specification, which mentions the existence of this proprietary code signing blob as a 'good reason for invoking the shader compiler at runtime':
D3D12 will only accept signed shaders. That means that if any patching or runtime optimizations are performed, such as constant folding, the shader must be re-validated and re-signed, which is non-trivial.
And in the recent 'preview release' of Shader Model 6.8 functionality, Microsoft notes how they appear to leverage this DLL to restrict new experimental shader functionality:
The DXIL signing library (dxil.dll/libdxil.so) is not provided with this preview release. DXIL generated with this compiler targeting Shader Model 6.8 is not final, cannot be validated, and is not supported for distribution or execution on machines not running in developer mode.
In other words: if you don't have dxil.dll, your shaders will not be signed/validated. And if your shaders aren't signed/validated, they cannot run on a Windows machine unless it's running in Developer Mode.
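A rough way to see whether a compiled container went through signing is to look at its 16-byte digest field, which sits right after the "DXBC" fourcc. The sketch below assumes that DXC builds running without dxil.dll leave this digest zeroed; treat that as an assumption that may vary across DXC versions. The function name is hypothetical, and a non-zero digest says nothing about whether the signature would actually be accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Heuristic only: true if the digest (bytes 4..19, right after the
   "DXBC" fourcc) contains any non-zero byte. */
bool dxbc_digest_is_set(const uint8_t *blob, size_t len) {
    assert(blob != NULL || len == 0);
    if (len < 20 || memcmp(blob, "DXBC", 4) != 0) return false;
    for (size_t i = 4; i < 20; i++) {
        if (blob[i] != 0) return true;
    }
    return false;
}
```

This also illustrates the Shader Cache spec's point: the digest is computed over the container contents, so patching any bytes afterwards (constant folding, for instance) leaves a digest that no longer matches and has to be regenerated.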
Platform support challenges
For a moment, I'd like to return to something I wrote at the start of this article:
For Mach engine […] we need to compile shader programs into something that Direct3D 12 can consume.
I'd like us to be able to perform offline shader compilation, and skip distributing the heavy DXC dependency when desired.
But Microsoft only distributes copies of dxil.dll for Windows (x86/arm) and Linux (x86). There's no Linux aarch64 binary. There's no macOS binary. In other words, you can't produce Windows builds of your cross-platform game with offline shader compilation on a Mac, or in your Arm Linux CI pipeline. You need a Windows or x86_64 Linux machine to run the proprietary blob.
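For a CI pipeline, that availability matrix can be encoded directly so a job fails early instead of discovering the missing blob halfway through a shader build. The following is a minimal sketch with hypothetical names; it encodes only the platforms listed above, and the host detection is best-effort, assuming that anything not Windows or Apple is Linux and anything not 64-bit Arm is x86_64.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HOST_WINDOWS, HOST_LINUX, HOST_MACOS } host_os;
typedef enum { HOST_X86_64, HOST_AARCH64 } host_arch;

/* Official dxil.dll / libdxil.so builds: Windows (x86 and Arm) and
   Linux x86 only. */
bool official_dxil_available(host_os os, host_arch arch) {
    switch (os) {
    case HOST_WINDOWS: return true;
    case HOST_LINUX:   return arch == HOST_X86_64;
    case HOST_MACOS:   return false;
    }
    return false;
}

/* Best-effort detection of the platform this file is compiled for. */
host_os detect_host_os(void) {
#if defined(_WIN32)
    return HOST_WINDOWS;
#elif defined(__APPLE__)
    return HOST_MACOS;
#else
    return HOST_LINUX;
#endif
}

host_arch detect_host_arch(void) {
#if defined(__aarch64__) || defined(_M_ARM64)
    return HOST_AARCH64;
#else
    return HOST_X86_64;
#endif
}
```

A build step could call official_dxil_available(detect_host_os(), detect_host_arch()) and bail out with a clear message. On an Apple Silicon Mac or an Arm Linux runner, it returns false.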
Recap
To recap:
- We can't build DXC as a static library, because the nearly decade-old Microsoft fork of LLVM v3.7 has a very messy build system.
- Even if we could, we couldn't build DXC as a static library because of the proprietary code-signing blob.
- We can't compile DirectX HLSL shaders offline on a Mac, or build our cross-platform game in an Arm Linux CI pipeline, because Microsoft doesn't distribute copies of the proprietary code signing blob for those platforms.
Going deeper
Un#$@&%*! the build system
The first problem I wanted to tackle was actually building this codebase into a single static library.
After several days of trying to fix the implicit dependencies that surface when changing the cmake libraries from SHARED -> OBJECT, I gave up. Initially, my intent was to use their existing cmake build system (so as not to diverge from their codebase too much) and just swap in zig cc as the build toolchain for cross-compilation.
After it slowly and painfully became apparent that this route was not going to be any better than maintaining the entire build system myself, I decided to just bite the bullet and rewrite their entire CMake build system, some ~10.5k lines of code, using build.zig instead. To keep things simpler, I chose to build only the two components we (and others) really care about as consumers of the code: the dxcompiler.dll library, and the dxc.exe binary for offline compilation / testing. (We'll deal with dxil.dll later.)
This resulted in somewhere around ~1k lines of build.zig logic, and in practice it's less than that, because much of it just handles running git clone on the source repository, letting Zig package consumers use a prebuilt binary instead of building the large C++ library from source, and header/source generation (though we're still not done with that, thanks to llvm-tablegen).
Un#$@&%*! the dynamic library dependency
As mentioned earlier, DXC is written with the expectation that dxcompiler.dll and dxil.dll exist. Reading the code, it almost appears as if the COM API implementation invokes the DLL, which then invokes itself dynamically depending on which is available.
Taking some advice from Microsoft, I got my hands dirty, forked their codebase and set to work on the actual C++ code. I began annotating my changes with cute // Mach change begin and // Mach change end comments, to know who owns what code. All of this is a choice that I expect will come back to haunt my dreams someday, just as much as Microsoft's own choice to underfund the HLSL team and fork LLVM 3.7 in the first place.
I was off to the races: simulating DllMain entrypoints, disabling the ability to print compiler version info derived from the DLLs, and emulating dynamic library function pointer loads.
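One way to emulate dynamic-library function pointer loads in a single static binary is to swap a GetProcAddress-style lookup for a table compiled into the program. The sketch below is illustrative only and is not the actual mach-dxcompiler patch. The stub functions and the second export name are hypothetical. "DxcCreateInstance" is the real export name of dxcompiler.dll, but its real signature differs, so a caller would cast the returned pointer to the correct type, just as it would with GetProcAddress. The // Mach change markers mirror the annotation style described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*generic_fn)(void);

/* Hypothetical stand-ins for functions that would normally be exported
   by dxcompiler.dll / dxil.dll and resolved at runtime. */
static int stub_create_instance(void) { return 0; }
static int stub_validate(void) { return 1; }

struct static_export {
    const char *name;
    generic_fn fn;
};

// Mach change begin (illustrative)
static const struct static_export k_exports[] = {
    {"DxcCreateInstance", stub_create_instance},
    {"HypotheticalValidate", stub_validate},
};

/* Drop-in replacement for a GetProcAddress-style lookup: resolves the
   symbol from a table linked into the same binary, so no DLL is loaded. */
generic_fn static_get_proc(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof k_exports / sizeof k_exports[0]; i++) {
        if (strcmp(k_exports[i].name, name) == 0) return k_exports[i].fn;
    }
    return NULL; /* same "not found" convention as GetProcAddress */
}
// Mach change end
```

Because call sites already handle a NULL result from a failed lookup, this kind of substitution keeps the surrounding C++ largely untouched. That matters when the goal is to stay close to upstream.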
Un#$@&%*! the proprietary code signing
All that was left was that pesky dxil.dll. What sort of magic might Microsoft be using in that library to "sign shaders"? How can they prevent unsigned shaders from running on Windows machines that aren't in developer mode? And how are they able to distribute that binary on Linux, too?
I won't comment on any of those questions, but I will say that you'll find dxil.dll is NOT a dependency of mach-dxcompiler in any form. You can compile an HLSL shader on a macOS machine using mach-dxcompiler, without the proprietary dxil.dll blob, and end up with a DXIL bytecode file that's byte-for-byte identical to one that runs on a typical Windows box. Enjoy!
Results
We now have prebuilt, static binaries of the dxcompiler library, as well as the dxc CLI here, with zero dependency on the proprietary dxil.dll. At the time of writing, we have binaries building in our CI pipeline for:
- macOS (the first ever in history), both Apple Silicon (aarch64) and Intel (x86_64).
- Linux, including musl and glibc, as well as aarch64 (first ever in history) and x86_64.
- Windows, x86_64 and aarch64, including the MinGW/GNU ABI (first ever in history?)
Also included is a small C API that the library now exposes, as an alternative to the traditionally required COM API.
Zig game developers will find the repository also includes a Zig API; see the src/main.zig tests for usage. By default, prebuilt binaries are downloaded and used.
You can build from source yourself for any OS/arch with only zig and git. Just make sure you have the right Zig version:
git clone https://github.com/hexops/mach-dxcompiler
cd mach-dxcompiler/
zig build -Dfrom_source -Dtarget=aarch64-macos
zig build -Dfrom_source -Dtarget=x86_64-windows-gnu
zig build -Dfrom_source -Dtarget=x86_64-linux-gnu
Caveats
It's not all roses; there are some drawbacks:
- Windows MSVC ABI binaries currently don't build due to a small bug in the C bindings. I'll fix it quickly if it's important to you, otherwise at our own pace.
- Linux musl binaries are untested. They build fine, and I'd be curious to know if they run fine!
- With Mach engine, we plan to use Zig itself as our shading language, not HLSL, so I don't build SPIR-V output support, sorry! I have no plans to add it.
- There are currently no plans to update this to support SM6.7 (released very recently), though perhaps in the future.
- LLVM's cmake build system is not trivial, and some parts are yet to be translated. See generated-include/ for the specifics that still come from the cmake build system.
- If you use this, you'll be relying on me to fix/address any issues. I'm the only person working on this, and it exists solely to solve Mach's own problems. If it works for you, great, but there may come a time when we find a better path forward for us and it gets deprecated, so keep that in mind.
On a personal note

My name is Stephen. I work a normal tech job, and after signing off from work at the end of the day I go online to build Mach engine. I've been dreaming of being able to build a game engine like this for a long time, and I'm finally doing it!
FOSS is in my roots. I believe we should own our tools; they should empower us, not be part of the 'open source' game which is all too prevalent today (even among 'open source' engines.) I want Mach to genuinely be software you can love.
My dream is to someday live a simple, modest life, earning a living building Mach for everyone and selling high-quality games. Please consider sponsoring my work if you believe in my vision. It means the world to me!
Thanks for reading
