Mesh Shaders and Meshlet Culling in Metal 3 – Metal by Example
Metal 3, introduced by Apple at WWDC 2022, brought with it a large assortment of features that enable modern rendering techniques, faster resource loading, and flexible shader compilation. It also includes an all-new geometry pipeline that unlocks novel rendering techniques by allowing developers to bypass most of the traditional vertex processing steps and submit geometry directly to the rasterizer.
In this article, we will explore the features of the new geometry pipeline in Metal 3, how it works, and its use cases. We will then go into more depth on how to use mesh shaders to implement meshlet culling, an important feature of modern GPU-driven rendering engines.
Download the sample code for this article here. Some implementation details are omitted from this exposition for the sake of brevity, and there is no substitute for reading the code.
A New Geometry Pipeline
If you are accustomed to using Metal’s basic rendering features, you have almost certainly used vertex descriptors. Vertex descriptors indicate how vertex attributes are laid out in vertex buffers. Including a vertex descriptor in your render pipeline descriptor allows the shader compiler to inject code (a vertex function preamble) that automatically loads the current vertex’s data into the vertex function’s stage-in argument. This feature is called vertex fetch. By contrast, when not using a vertex descriptor, you are responsible for manually loading vertex data from vertex buffers yourself; this is called vertex pull.
Relying on vertex fetch simplifies writing vertex functions. Without it, you not only have to manually load data; you also have to manually perform any necessary conversions (e.g., from normalized integer representations to floating-point, or from 3-element vectors to 4-element vectors). But the traditional approach is not without downsides.
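To make those manual conversions concrete, here is a minimal CPU-side C++ sketch of the work a vertex-pull function has to do by hand. The PackedVertex layout and the helper names are hypothetical, chosen only for illustration; they are not taken from the sample code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed vertex layout, chosen only for illustration:
// a 3-element float position and a 4-byte unsigned-normalized color.
struct PackedVertex {
    float position[3];
    std::uint8_t color[4]; // 0 maps to 0.0, 255 maps to 1.0
};

struct Float4 { float x, y, z, w; };

// Convert an 8-bit unsigned-normalized value to floating point.
inline float unorm8_to_float(std::uint8_t value) {
    return static_cast<float>(value) / 255.0f;
}

// Load a position by hand, expanding 3 elements to 4 (w = 1 for a point).
inline Float4 pull_position(const PackedVertex *vertices, std::uint32_t vertexIndex) {
    assert(vertices != nullptr);
    const PackedVertex &v = vertices[vertexIndex];
    return Float4{ v.position[0], v.position[1], v.position[2], 1.0f };
}

// Load a color by hand, converting each normalized byte to a float.
inline Float4 pull_color(const PackedVertex *vertices, std::uint32_t vertexIndex) {
    assert(vertices != nullptr);
    const PackedVertex &v = vertices[vertexIndex];
    return Float4{ unorm8_to_float(v.color[0]), unorm8_to_float(v.color[1]),
                   unorm8_to_float(v.color[2]), unorm8_to_float(v.color[3]) };
}
```

With vertex fetch, the injected preamble performs the equivalent of both helpers before the vertex function body runs.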
Motivation
The vertex processing portion of the traditional programmable graphics pipeline is suboptimal for many kinds of scenes. Because draw calls have an encoding overhead, it has historically been advised to reduce the draw call count. This in turn means that each draw call might draw geometry that spans a large area and contains triangles oriented in arbitrary directions. Because vertices must be processed before they are assembled into primitives, memory bandwidth is wasted on loading vertex data that belongs to triangles that will then immediately be culled. Furthermore, because vertex functions operate at the vertex level, they have no way to reason about primitives and no way to efficiently reject primitives midstream.
Even for triangles that survive culling, the traditional approach often underutilizes the post-transform vertex cache. This is because index buffers are often not sorted such that vertices belonging to adjacent primitives are referenced by adjacent indices. This leads to indexed meshes looking a lot like triangle soup from the vertex cache’s perspective.
Since 2015 there has been an increasing emphasis on using compute shaders to determine triangle visibility before passing geometry to the rasterizer. Graham Wihlidal’s influential GDC 2016 talk is a useful reference to get acquainted with the mindset of compute-based geometry processing, though it is heavily technical and specific to AMD’s GCN architecture. In our discussion of meshlet culling below, we will implement a couple of these ideas in Metal.
As compute-based geometry processing techniques became more common, GPU vendors and graphics API designers began incorporating these ideas into their products, giving rise to modern geometry pipelines. Apple’s implementation of compute-oriented geometry processing adds two programmable stages to the graphics pipeline that can run in-line (in the same command encoder) with rasterization. These stages are object shaders and mesh shaders, and we will look at each in turn.
Object Functions
The word “mesh” is somewhat ambiguous. We often think of meshes as 3D models, which might have any number of submeshes, each of which often has distinct material properties.
Within the context of object and mesh shaders, the word “mesh” has a more particular meaning. It is a small collection of vertices, indices, and per-primitive data. A mesh might be a sprig of grass, a strand of hair, or any other shape consisting of at most a few dozen vertices and a few dozen triangles. The reason meshes are limited in size is that they need to be able to fit into threadgroup memory for efficient processing.
The chief idea behind object shaders is to give the programmer the opportunity to decide which chunks of geometry should proceed through the rest of the pipeline. Just like compute grids, objects are abstract entities that can represent any collection of work that makes sense in your application. An object might be a collection of models, a single submesh of a larger model, a patch of terrain, or any other unit of work that might give rise to one or more meshes.
A crucial idea in object shaders is the notion of amplification. Because of the object shader execution model, each object can spawn zero, one, or many meshes.
Just as each vertex or fragment is processed by an invocation of a vertex or fragment function, an object is processed by an invocation of an object function. Similar to how we dispatch a compute grid by specifying a threadgroup size, a threadgroup count, and a compute pipeline, we draw objects by calling a method on a render command encoder that takes the size of the object threadgroup to execute and how many object threadgroups should be launched.
Each object threadgroup produces a payload, a set of arbitrary data that is passed to the mesh function invocations it launches for further processing. We will look at how to write object shaders below, but the main point is: object shaders achieve amplification by determining how many meshes should be produced by the object and providing payload data to each subsequently launched mesh threadgroup.
Mesh Functions
Mesh functions are a new kind of shader function that can operate on a group of vertices instead of individual vertices. As mentioned above, a mesh (in this context) is a parcel of vertices, indices, and per-primitive data that is produced by a mesh shader and passed on to the fixed-function rasterizer.
Just as an object function can produce a payload describing zero or more meshes, a mesh function can produce a mesh comprising zero or more primitives (which are stitched together from its constituent vertices). Each mesh threadgroup provoked by an object function produces a single (potentially empty) mesh.
Together, the threads of a mesh threadgroup perform the following work:
- Copy vertex data into the output mesh
- Copy index data into the output mesh
- Copy per-primitive data into the output mesh
- Set the total number of primitives contained by the mesh
How to divide the work of mesh generation across threads is up to you. It is possible for a single thread to do all of the work, especially for a small mesh. Often it will be more efficient to share the load across threads, with each thread producing a vertex, a few indices, and/or a primitive.
We will see an in-depth example of how to divide the work up in the section on meshlet culling below.
Mesh Shader Use Cases
Mesh shaders are useful whenever you want to process geometry at a coarser level than individual triangles. Mesh shaders can generate procedural geometry such as hair, fur, foliage, or particle traces. They can be used to select among precomputed levels of detail based on metrics such as screen-space coverage and distance to the camera. And they can take advantage of spatial coherency to produce meshlets that fully exploit vertex reuse and avoid the wasted work of processing back-facing triangles.
We will look at each of these use cases in turn.
Procedural Geometry
One of the most significant use cases for mesh shaders is procedural geometry. Procedural geometry is a category of processes and techniques for generating geometry algorithmically rather than using premade assets. Instead of keeping a full representation of a shape in memory, mesh shaders can generate shapes on the fly from a simplified representation, greatly increasing scene detail without increasing a scene’s memory footprint. Procedural methods have been used for many years in graphics, and mesh shaders increase their appeal. For an example of procedural geometry in the context of fur rendering, see the WWDC 2022 session on mesh shading.
LOD Selection
Another use case for mesh shaders is level of detail (LOD) selection. LOD selection is the process of choosing the appropriate level of detail for an object based on its distance from the camera or some other measure. Levels of detail can be computed in advance at a set of discrete levels (similar to mipmaps), or generated on the fly from a parametric representation, similar to fixed-function tessellation.
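As a rough illustration of the discrete case, the following C++ sketch picks an LOD index from camera distance. The function name and the idea of a sorted threshold table are assumptions made for this example; a real implementation might use screen-space coverage instead, and in a mesh-shading pipeline this decision would typically be made in the object function.

```cpp
#include <cassert>
#include <cstddef>

// Pick a discrete LOD index from the distance to the camera.
// `thresholds` holds ascending switch distances (hypothetical values chosen
// by the application). LOD 0 is the most detailed level.
inline std::size_t select_lod(float distance, const float *thresholds, std::size_t thresholdCount) {
    assert(thresholds != nullptr || thresholdCount == 0);
    std::size_t lod = 0;
    // Step to a coarser level each time the distance reaches the next threshold.
    while (lod < thresholdCount && distance >= thresholds[lod]) {
        ++lod;
    }
    return lod; // in the range [0, thresholdCount]
}
```

With thresholds of 10, 50, and 200 units, an object 75 units away would use LOD 2, and anything at 200 units or more would use the coarsest level, LOD 3.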
The sample code accompanying the WWDC session on mesh shaders mentioned above offers a rudimentary example of how to implement level-of-detail selection.
Now that we have mentioned a couple of potential use cases for mesh shaders (procedural geometry and level-of-detail selection), let’s dive deeper into an increasingly common use of the new geometry pipeline: meshlet culling.
Meshlets and Meshlet Culling
One of the most interesting use cases of mesh shaders is meshlet culling. To understand the virtues of this technique, we first need to know what a meshlet is.
Most 3D modeling packages produce assets in a handful of common formats (Wavefront .obj, glTF, USD[Z], etc.). Frequently, the meshes produced by these programs are unorganized lists of vertices and indices that are stitched together in a way that mirrors the program’s internal representation (rather than any kind of optimal ordering). If we load such an asset and render it, there is a good chance that many of the triangles in a given mesh will be facing away from the camera, and a good chance that the index buffer will fail to reference vertices in an order that makes optimal use of the vertex cache.
What can we do? It has become more common in recent years to subdivide meshes into meshlets. As the name implies, meshlets are small meshes that together make up a larger mesh. Importantly, meshlets are constructed for coherence. The vertices in a meshlet should be near one another, the meshlet’s indices should be laid out to match the vertices’ adjacency, and the normals of the meshlet’s triangles should point in the same general direction as one another.
Preprocessing Meshes to Meshlets
How do we turn a mesh into meshlets? Essentially, it comes down to reordering the mesh’s vertices so that spatially coherent vertices can be referenced in sequence, and building small index buffers that connect the vertices into triangles. Each meshlet, then, consists of a reference to a span of the original vertex buffer which contains its vertices and a list of index triples that make up its triangles. Since meshlets can only reference a limited number of vertices, these indices are usually smaller than the 16- or 32-bit indices that we commonly use when doing indexed drawing. We use 8-bit unsigned integers as our indices in the sample code.
The open-source meshoptimizer library is a flexible, efficient tool for dividing meshes into meshlets. We will not delve into all of meshoptimizer’s features (there are many); instead we will use its simplest meshlet generation function, meshopt_buildMeshlets. Together with its companion meshopt_computeMeshletBounds, which computes the bounds in the last two items, this function takes an indexed mesh or submesh and produces the following:
- A meshlet vertex list, which maps optimally ordered vertices to their positions in the original vertex list,
- A meshlet triangle list, which is a list of 8-bit indices, three for each triangle,
- A list of meshlets, each of which references a span of vertices and a span of triangles,
- An approximate bounding sphere for each meshlet, and
- A cone representing the average orientation and spread of each meshlet’s vertex normals.
These outputs can be copied into Metal buffers and used directly by our object and mesh shaders. The figure below illustrates how the meshlet vertex buffer maps indices onto the original vertex buffer and how meshlets indicate their respective portions of the triangle list and vertex list. One interesting thing to note is that although the mapping from meshlet vertices to original vertices (the top set of arrows) is rather incoherent (scattered in memory), the references made by indices within a meshlet are highly coherent and dense.
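The mapping described above can be sketched on the CPU in a few lines of C++. The MeshletSpan struct and resolve_vertex function are hypothetical names for this illustration; the sketch assumes that the triangle offset counts entries in the 8-bit triangle list and that the vertex offset counts entries in the meshlet vertex list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of one entry in the list of meshlets.
struct MeshletSpan {
    std::uint32_t vertexOffset;   // first entry in the meshlet vertex list
    std::uint32_t vertexCount;
    std::uint32_t triangleOffset; // first entry in the meshlet triangle list
    std::uint32_t triangleCount;
};

// Resolve one corner of one triangle to an index into the original vertex buffer.
inline std::uint32_t resolve_vertex(const MeshletSpan &meshlet,
                                    const std::vector<std::uint32_t> &meshletVertices,
                                    const std::vector<std::uint8_t> &meshletTriangles,
                                    std::uint32_t triangle,
                                    std::uint32_t corner) {
    assert(triangle < meshlet.triangleCount && corner < 3);
    // Step 1: read the 8-bit index, which is only meaningful inside this meshlet.
    std::uint8_t localIndex = meshletTriangles[meshlet.triangleOffset + triangle * 3 + corner];
    assert(localIndex < meshlet.vertexCount);
    // Step 2: map the local index through the meshlet vertex list.
    return meshletVertices[meshlet.vertexOffset + localIndex];
}
```

The first lookup is the dense, coherent one; the second is the scattered one, since neighboring entries of the meshlet vertex list can point anywhere in the original vertex buffer.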
If you are curious about exactly how meshlets are generated, take a look at the meshletgen target in the sample code. It is a small command-line utility that uses Model I/O to load 3D models and produce preprocessed, “meshletized” meshes in a custom format.
Meshlet Culling Techniques
Once we have diced a mesh up into meshlets, how can we use them to make rendering more efficient? We will exploit the spatial coherence and normal coherence of meshlets in conjunction with object shaders to cull invisible meshlets before we spend any time processing their vertices. We will do this by performing frustum culling and normal cone culling.
Meshlet frustum culling is done by converting the viewing volume (i.e., the view frustum) into a set of planes against which we can cheaply test a meshlet’s bounding sphere. If the bounding sphere lies entirely in the negative half-space of any frustum plane, it is not in the viewing volume and can be culled.
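A minimal C++ sketch of the sphere-versus-planes test follows. It assumes each plane is stored as a unit-length normal pointing into the frustum plus an offset d, so that dot(normal, p) + d is the signed distance of p from the plane. The types and the explicit planeCount parameter are choices made for this example and need not match the utility function in the sample code.

```cpp
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0.
// Assumption: normal is unit length and points toward the inside of the frustum.
struct Plane { Vec3 normal; float d; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns false only when the sphere lies entirely in the negative
// half-space of at least one plane, i.e. when it is certainly not visible.
inline bool sphere_intersects_frustum(const Plane *planes, std::size_t planeCount,
                                      Vec3 center, float radius) {
    assert(planes != nullptr && radius >= 0.0f);
    for (std::size_t i = 0; i < planeCount; ++i) {
        float signedDistance = dot(planes[i].normal, center) + planes[i].d;
        if (signedDistance < -radius) {
            return false; // completely behind this plane
        }
    }
    return true;
}
```

The test is conservative: a sphere just outside a frustum corner can pass every per-plane check and still be invisible, which costs a little wasted work but never drops visible geometry.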
Normal cone culling is slightly more involved. As part of the meshlet preprocessing phase, we generate a cone for each meshlet that is oriented along the average direction of its triangles’ normals. The width of the cone represents the maximal spread between the average normal and the vertex normals. With this information available, we can cull any meshlet whose normal cone faces sufficiently far away from the camera: if the normal cone does not contain the camera, then by definition, the camera cannot see any face in the meshlet. This is a form of aggregate backface culling that considers all triangles of a meshlet at once. It was introduced to graphics (as far as I am aware) by Shirmun and Abi-Ezzi in 1993, in the context of Bézier patch culling.
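One way to express the cone test is sketched below in C++. It assumes the apex/axis/cutoff convention used by meshoptimizer for perspective projection, where a meshlet is rejected when dot(normalize(apex - camera), axis) >= cutoff. The helper types are invented for this example, and the sample code’s utility function may be written differently.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 subtract(Vec3 a, Vec3 b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true when every triangle in the meshlet faces away from the camera.
// Assumption: coneAxis is unit length and coneCutoff follows the meshoptimizer
// convention described in the lead-in.
inline bool cone_is_backfacing(Vec3 coneApex, Vec3 coneAxis, float coneCutoff,
                               Vec3 cameraPosition) {
    assert(std::isfinite(coneCutoff));
    Vec3 viewToApex = subtract(coneApex, cameraPosition);
    float distance = std::sqrt(dot(viewToApex, viewToApex));
    if (distance == 0.0f) {
        return false; // camera sits on the apex: keep the meshlet to be safe
    }
    // Same as dot(normalize(viewToApex), coneAxis) >= coneCutoff,
    // with both sides multiplied by the distance to avoid a division.
    return dot(viewToApex, coneAxis) >= coneCutoff * distance;
}
```

For a meshlet whose normals point along +z, a camera in front of it (on the +z side) keeps the meshlet, while a camera directly behind it culls the meshlet.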
We will look at how to implement these two culling techniques in an object shader below, after a brief introduction to object and mesh shader basics.
Creating Mesh Render Pipeline States
Creating a render pipeline state that incorporates mesh shaders is very similar to creating a pipeline state using the traditional geometry pipeline. One chief difference is that because we will be manually loading the vertex data from buffers, we do not include a vertex descriptor.
In addition to the usual work of setting attachment pixel formats and blending state, we create our object, mesh, and fragment functions and set them on the corresponding properties of a render pipeline descriptor:
    id<MTLFunction> objectFunction = [library newFunctionWithName:@"my_object_function"];
    id<MTLFunction> meshFunction = [library newFunctionWithName:@"my_mesh_function"];
    id<MTLFunction> fragmentFunction = [library newFunctionWithName:@"my_fragment_function"];
    MTLMeshRenderPipelineDescriptor *pipelineDescriptor = [MTLMeshRenderPipelineDescriptor new];
    pipelineDescriptor.objectFunction = objectFunction;
    pipelineDescriptor.meshFunction = meshFunction;
    pipelineDescriptor.fragmentFunction = fragmentFunction;
Then we can use the new -newRenderPipelineStateWithMeshDescriptor:options:reflection:error: method on our device to get a mesh render pipeline state:
    NSError *error = nil;
    id<MTLRenderPipelineState> pipelineState =
        [device newRenderPipelineStateWithMeshDescriptor:pipelineDescriptor
                                                 options:MTLPipelineOptionNone
                                              reflection:nil
                                                   error:&error];
Binding Object and Mesh Resources
Object and mesh functions can reference resources just like other kinds of shader functions. Mesh shaders add a number of new methods to the MTLRenderCommandEncoder protocol for this purpose, including these:
    - (void)setObjectBytes:(const void *)bytes length:(NSUInteger)length atIndex:(NSUInteger)index;
    - (void)setObjectBuffer:(id <MTLBuffer>)buffer offset:(NSUInteger)offset atIndex:(NSUInteger)index;
    - (void)setObjectTexture:(id <MTLTexture>)texture atIndex:(NSUInteger)index;
    - (void)setMeshBytes:(const void *)bytes length:(NSUInteger)length atIndex:(NSUInteger)index;
    - (void)setMeshBuffer:(id <MTLBuffer>)buffer offset:(NSUInteger)offset atIndex:(NSUInteger)index;
    - (void)setMeshTexture:(id <MTLTexture>)texture atIndex:(NSUInteger)index;
Binding these resources works exactly as it does for other programmable stages.
Mesh Draw Calls
Understanding the structure of mesh draw calls is crucial, because the two-tier object/mesh execution model is the central aspect of the whole feature.
Metal mesh shaders add several new draw methods to the MTLRenderCommandEncoder protocol. We will use just one of them, -drawMeshThreadgroups:threadsPerObjectThreadgroup:threadsPerMeshThreadgroup:. Its signature looks like this:
    - (void)drawMeshThreadgroups:(MTLSize)threadgroupsPerGrid
     threadsPerObjectThreadgroup:(MTLSize)threadsPerObjectThreadgroup
       threadsPerMeshThreadgroup:(MTLSize)threadsPerMeshThreadgroup;
The threadgroupsPerGrid parameter tells Metal how many object threadgroups should be launched. Recall that each object threadgroup can ultimately launch zero, one, or many mesh threadgroups.
The threadsPerObjectThreadgroup parameter specifies the number of threads in each object threadgroup. As with compute kernels, this number should ideally be a multiple of the pipeline’s thread execution width, which you can retrieve from the render pipeline state’s objectThreadExecutionWidth property (it will commonly be 32 for current Apple GPUs).
The threadsPerMeshThreadgroup parameter specifies the number of threads in each mesh threadgroup. Like the previous parameter, it should be a multiple of the thread execution width, which in this case is available as the meshThreadExecutionWidth property on MTLRenderPipelineState.
Note that we do not specify the number of mesh threadgroups that will be launched by this draw call. After all, the whole point of object shaders is that the object function itself determines how many meshes to process.
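As a small worked example of choosing threadgroupsPerGrid, suppose each object thread handles one meshlet. The number of object threadgroups is then the meshlet count divided by the threadgroup size, rounded up. The C++ helper below is a hypothetical sketch of that arithmetic, not part of the Metal API.

```cpp
#include <cassert>
#include <cstdint>

// Number of object threadgroups to launch when each object thread
// processes exactly one meshlet (an assumption made for this sketch).
inline std::uint32_t object_threadgroup_count(std::uint32_t meshletCount,
                                              std::uint32_t threadsPerObjectThreadgroup) {
    assert(threadsPerObjectThreadgroup > 0);
    // Integer ceiling division: round up so the last few meshlets get a threadgroup.
    return (meshletCount + threadsPerObjectThreadgroup - 1) / threadsPerObjectThreadgroup;
}
```

For 100 meshlets and 32 threads per object threadgroup this yields 4 threadgroups, or 128 threads. The 28 surplus threads in the last threadgroup have no meshlet to process, so an object function dispatched this way would need to compare its thread position against the meshlet count and treat out-of-range threads as culled.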
Object Functions
The core structure of an object function looks like this:
    [[object]]
    void my_object_function(
        object_data Payload &object [[payload]],
        mesh_grid_properties grid)
    {
        // Optionally populate the object's payload
        object.someProperty = ...;
        // Set the output grid's threadgroup count
        // (Only do this from one object thread!)
        grid.set_threadgroups_per_grid(uint3(meshCount, 1, 1));
    }
An object function is prefixed with the new [[object]] attribute, which marks it as an object function.
An object function can take a parameter with the [[payload]] attribute. If present, this parameter must be a reference or pointer in the object_data address space. You control the type of this parameter; it is a structure containing whatever data your mesh shader might need to reference from its provoking object function. You populate this parameter however you like in the body of the function. We will use it below to tell the mesh shader which meshlets to render.
An object function also takes a parameter of type mesh_grid_properties, which has a single method: set_threadgroups_per_grid. This is the mechanism by which an object function causes mesh threadgroups to be dispatched. Setting the threadgroup count to a non-zero value tells Metal it should launch that many threadgroups of the pipeline’s mesh function.
Importantly, only one thread in each object threadgroup should populate the threadgroup count. You might choose to add a parameter attributed with [[thread_position_in_threadgroup]] to your object function so you can check the current thread’s position and only write this property when the thread’s position is 0 (the sample code demonstrates this).
A Meshlet Culling Object Shader
The job of our meshlet culling object function is to perform meshlet culling as described in the section on culling techniques. For each meshlet in the mesh being rendered, we load just enough information to determine whether it should be processed by the mesh function.
The discussion below assumes that whatever buffers are needed by a function have been bound appropriately to the current render command encoder; refer to the sample code if you care about the details.
Suppose we have the following structure that encapsulates all of the data belonging to a meshlet:
    struct MeshletDescriptor {
        uint vertexOffset;
        uint vertexCount;
        uint triangleOffset;
        uint triangleCount;
        packed_float3 boundsCenter;
        float boundsRadius;
        packed_float3 coneApex;
        packed_float3 coneAxis;
        float coneCutoff;
        //...
    };
The offset and count members refer to spans within the meshlet vertex buffer and meshlet triangle buffer, respectively. These are not used by the object shader. We will only be using the bounding properties and cone properties to perform culling.
We also need a custom type to store our object’s payload. This will simply consist of a list of meshlet indices that pass the culling tests:
    struct ObjectPayload {
        uint meshletIndices[kMeshletsPerObject];
    };
For the sake of exposition, I will slightly simplify the object function. See the sample code for the full implementation. Here is the object function signature:
    [[object]]
    void object_main(
        device const MeshletDescriptor *meshlets [[buffer(0)]],
        constant InstanceData &instance [[buffer(1)]],
        uint meshletIndex [[thread_position_in_grid]],
        uint threadIndex [[thread_position_in_threadgroup]],
        object_data ObjectPayload &outObject [[payload]],
        mesh_grid_properties outGrid)
Notice that, as before, we have a payload parameter and a mesh grid properties parameter. We also take a pointer to a buffer containing our meshlet metadata and a small buffer containing some per-instance data.
In the object function body, we use our thread’s position in the object grid to retrieve the meshlet for which we will perform culling:
    device const MeshletDescriptor &meshlet = meshlets[meshletIndex];
The particulars of frustum culling and normal cone culling are not important here; we perform both by calling out to small utility functions:
    bool frustumCulled = !sphere_intersects_frustum(frustumPlanes, meshlet.boundsCenter, meshlet.boundsRadius);
    bool normalConeCulled = cone_is_backfacing(meshlet.coneApex, meshlet.coneAxis, meshlet.coneCutoff, cameraPosition);
Since we are operating on many meshlets concurrently, we need to coordinate our object threads so that each one writes to the appropriate index of the payload array.
We start by combining our culling results into a single integer value:
    int passed = (!frustumCulled && !normalConeCulled) ? 1 : 0;
We then use a prefix sum operation to determine how many threads with a smaller index than ours passed their culling tests. If you are not familiar with prefix sums, consult a resource such as this one.
    int payloadIndex = simd_prefix_exclusive_sum(passed);
The resulting payload index tells our thread where it should write its meshlet’s index if the meshlet was not culled. So we consult the value of passed and then perform the write:
    if (passed) {
        outObject.meshletIndices[payloadIndex] = meshletIndex;
    }
The final job of the object function is to write out the mesh threadgroup count. As mentioned above, we only want one thread to do this, so we first compute the total number of non-culled meshlets, then (if we are the first thread in our object threadgroup) write out the number of mesh threadgroups to launch:
    uint visibleMeshletCount = simd_sum(passed);
    if (threadIndex == 0) {
        outGrid.set_threadgroups_per_grid(uint3(visibleMeshletCount, 1, 1));
    }
This concludes the body of the object function. The payload now contains the indices of the meshlets to be rendered, and the grid properties contain the number of mesh threadgroups to run.
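To see why the exclusive prefix sum produces a tightly packed payload, it can help to emulate one object threadgroup serially on the CPU. The C++ sketch below is such an emulation, with invented names. It assumes the object threadgroup is a single SIMD-group, since simd_prefix_exclusive_sum and simd_sum only combine values across the threads of one SIMD-group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CompactionResult {
    std::vector<std::uint32_t> meshletIndices; // stands in for the payload array
    std::uint32_t visibleCount;                // stands in for the mesh threadgroup count
};

// Serial emulation of one SIMD-group of object threads.
// passed[i] is 1 if thread i's meshlet survived culling and 0 otherwise.
inline CompactionResult compact_visible(const std::vector<int> &passed,
                                        std::uint32_t firstMeshletIndex) {
    CompactionResult result{ std::vector<std::uint32_t>(passed.size(), 0), 0 };
    std::uint32_t running = 0; // sum of passed[] over all lower-indexed threads
    for (std::size_t thread = 0; thread < passed.size(); ++thread) {
        assert(passed[thread] == 0 || passed[thread] == 1);
        std::uint32_t payloadIndex = running; // the exclusive prefix sum for this thread
        if (passed[thread]) {
            result.meshletIndices[payloadIndex] =
                firstMeshletIndex + static_cast<std::uint32_t>(thread);
        }
        running += static_cast<std::uint32_t>(passed[thread]);
    }
    result.visibleCount = running; // what simd_sum would return to every thread
    return result;
}
```

For the flags 1, 0, 0, 1, 1, 0 starting at meshlet 32, the surviving meshlets 32, 35, and 36 land in payload slots 0, 1, and 2, and the visible count is 3. On the GPU every thread computes its slot at the same time, with no atomics or loops.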
Mesh Shader Outputs
A mesh function has a parameter of user-defined type that collects the vertices, indices, and per-primitive data generated by the function. The threads of the mesh’s threadgroup collaborate to produce this data. You define the type of the mesh by specifying a type that aggregates the vertex data and per-primitive data.
We start by defining the vertex data. This looks very similar to the return type of an ordinary vertex function.
    struct MeshletVertex {
        float4 position [[position]];
        float3 normal;
        float2 texCoords;
    };
In our simple example, the only per-primitive data we pass along is a color, for visualization purposes:
    struct MeshletPrimitive {
        float4 color [[flat]];
    };
These structures are incorporated into the output mesh by declaring a type alias that names a template instantiation of the metal::mesh class:
    using Meshlet = metal::mesh<MeshletVertex, MeshletPrimitive, kMaxVerticesPerMeshlet, kMaxTrianglesPerMeshlet, topology::triangle>;
A Meshlet Mesh Shader
At this stage of the pipeline, Metal will launch a mesh grid containing the number of threadgroups specified by each object threadgroup. The number of threads in each mesh threadgroup is specified when encoding the draw call, so this number should be the maximum number of threads necessary to process a single meshlet.
The number of threads in a mesh threadgroup depends on the maximum number of vertices and triangles in the meshlet and how the work of generating the meshlet is distributed over the mesh threads. In our case, we will output at most one vertex and one triangle per mesh shader invocation, so the thread count is the larger of the maximum number of vertices in a meshlet (128) and the maximum number of triangles in a meshlet (256), or 256, a nice round multiple of the typical thread execution width.
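The arithmetic can be written down as a small compile-time helper. This C++ sketch uses the limits quoted above (128 vertices, 256 triangles); the function name and the round-up step for other limits are additions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMaxVerticesPerMeshlet = 128;  // limit quoted in the text
constexpr std::uint32_t kMaxTrianglesPerMeshlet = 256; // limit quoted in the text

// Threads needed when each thread emits at most one vertex and one triangle,
// rounded up to a multiple of the thread execution width.
constexpr std::uint32_t mesh_threads_per_threadgroup(std::uint32_t maxVertices,
                                                     std::uint32_t maxTriangles,
                                                     std::uint32_t executionWidth) {
    return ((maxVertices > maxTriangles ? maxVertices : maxTriangles)
            + executionWidth - 1) / executionWidth * executionWidth;
}

static_assert(mesh_threads_per_threadgroup(kMaxVerticesPerMeshlet,
                                           kMaxTrianglesPerMeshlet, 32) == 256,
              "128 vertices and 256 triangles need 256 threads");
```

With smaller limits, such as 64 vertices and 126 triangles, the same helper rounds 126 up to 128 threads so that no SIMD-group is partially filled.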
The mesh shader has access to the payload produced by its provoking object threadgroup; this is how data is passed between the two stages of the geometry pipeline. It can also use any number of other resources (buffers, textures), like an ordinary compute kernel or vertex function. In our case, we will bind buffers containing the meshlet descriptors (metadata), vertex attributes, meshlet vertex map, meshlet triangle indices, and per-instance data. We will also take an out-parameter of type Meshlet containing the mesh being constructed by our threadgroup.
    [[mesh]]
    void mesh_main(
        object_data ObjectPayload const& object [[payload]],
        device const Vertex *meshVertices [[buffer(0)]],
        constant MeshletDescriptor *meshlets [[buffer(1)]],
        constant uint *meshletVertices [[buffer(2)]],
        constant uchar *meshletTriangles [[buffer(3)]],
        constant InstanceData &instance [[buffer(4)]],
        uint payloadIndex [[threadgroup_position_in_grid]],
        uint threadIndex [[thread_position_in_threadgroup]],
        Meshlet outMesh)
To find the meshlet we will be rendering, we look up its index in the payload we received from the object shader, then retrieve it from the meshlet buffer:
    uint meshletIndex = object.meshletIndices[payloadIndex];
    constant MeshletDescriptor &meshlet = meshlets[meshletIndex];
Each thread in a mesh threadgroup can perform up to three tasks: generate a vertex, generate a primitive, and/or set the primitive count of the mesh. We reference our thread index to test whether the current thread should do each of these things in turn.
If our thread index is less than the number of vertices in the mesh(let), we load the vertex data from our vertex buffer and copy it to the output mesh:
    if (threadIndex < meshlet.vertexCount) {
        device const Vertex &meshVertex = meshVertices[meshletVertices[meshlet.vertexOffset + threadIndex]];
        MeshletVertex v;
        v.position = instance.modelViewProjectionMatrix * float4(meshVertex.position, 1.0f);
        v.normal = (instance.normalMatrix * float4(meshVertex.normal, 0.0f)).xyz;
        v.texCoords = meshVertex.texCoords;
        outMesh.set_vertex(threadIndex, v);
    }
This looks a lot like a vertex function that uses vertex pull, and that is no accident. The one major difference is the double indirection to first look up the index of the current vertex in the meshlet vertex list, then look up the actual vertex data in the vertex buffer. This could be avoided by duplicating the vertices ahead of time into a vertex buffer with a more optimal layout; as usual, there is a tradeoff to be made between execution time and memory usage.
The next step of our mesh function does double duty: it writes out the indices of a triangle and copies the data associated with the current primitive. We only perform this step if the index of the current mesh thread is less than the number of triangles in the meshlet:
    if (threadIndex < meshlet.triangleCount) {
        uint i = threadIndex * 3;
        outMesh.set_index(i + 0, meshletTriangles[meshlet.triangleOffset + i + 0]);
        outMesh.set_index(i + 1, meshletTriangles[meshlet.triangleOffset + i + 1]);
        outMesh.set_index(i + 2, meshletTriangles[meshlet.triangleOffset + i + 2]);
        MeshletPrimitive prim = { .color = ... };
        outMesh.set_primitive(threadIndex, prim);
    }
The last task of the mesh function is to write out the final triangle count for the meshlet. We only want to do this once, so we check that the current thread is the first in the current mesh’s threadgroup beforehand:
    if (threadIndex == 0) {
        outMesh.set_primitive_count(meshlet.triangleCount);
    }
This concludes the mesh function. At this point, we have constructed a complete mesh(let) that is suitable for rasterization. As in the traditional pipeline, we write a fragment function to produce a shaded color for each fragment.
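The division of labor among mesh threads can be checked with a serial CPU emulation in which a loop plays the role of the threadgroup. In this C++ sketch the OutputMesh type is a stand-in for the real output mesh and stores source vertex indices instead of transformed vertices. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MeshletSpan {
    std::uint32_t vertexOffset, vertexCount, triangleOffset, triangleCount;
};

// Stand-in for the output mesh: records which source vertex each slot refers to.
struct OutputMesh {
    std::vector<std::uint32_t> vertices;
    std::vector<std::uint8_t> indices;
    std::uint32_t primitiveCount = 0;
};

// Runs every "thread" of one mesh threadgroup in sequence.
inline OutputMesh run_mesh_threadgroup(const MeshletSpan &meshlet,
                                       const std::vector<std::uint32_t> &meshletVertices,
                                       const std::vector<std::uint8_t> &meshletTriangles,
                                       std::uint32_t threadsPerThreadgroup) {
    // One vertex and one triangle per thread, so the group must be large enough.
    assert(meshlet.vertexCount <= threadsPerThreadgroup);
    assert(meshlet.triangleCount <= threadsPerThreadgroup);
    OutputMesh out;
    out.vertices.resize(meshlet.vertexCount);
    out.indices.resize(meshlet.triangleCount * 3);
    for (std::uint32_t threadIndex = 0; threadIndex < threadsPerThreadgroup; ++threadIndex) {
        if (threadIndex < meshlet.vertexCount) {         // task 1: one vertex
            out.vertices[threadIndex] = meshletVertices[meshlet.vertexOffset + threadIndex];
        }
        if (threadIndex < meshlet.triangleCount) {       // task 2: one triangle
            std::uint32_t i = threadIndex * 3;
            for (std::uint32_t corner = 0; corner < 3; ++corner) {
                out.indices[i + corner] = meshletTriangles[meshlet.triangleOffset + i + corner];
            }
        }
        if (threadIndex == 0) {                          // task 3: primitive count, once
            out.primitiveCount = meshlet.triangleCount;
        }
    }
    return out;
}
```

Because each thread writes only to slots derived from its own index, no two threads touch the same output element, which is why the real mesh function needs no synchronization between these steps.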
To get meshlet data into the fragment shader, we define a structure that combines the interpolated vertex data and the per-primitive data we generated above:
    struct FragmentIn {
        MeshletVertex vert;
        MeshletPrimitive prim;
    };
We can then use this data in our fragment function however we choose. In the case of the sample code, we apply some basic diffuse lighting and also tint each meshlet by its unique color to show the boundaries between them.
    [[fragment]]
    float4 fragment_main(FragmentIn in [[stage_in]]) { ... }
Sample App
The sample code includes an implementation of meshlet culling using Metal mesh shaders. It uses a pre-meshletized version of the famous Stanford dragon, where each triangle is colored to show the meshlet it belongs to. Download it here.
Acknowledgements
Thanks to ChatGPT for writing an early draft of this post.