Lines are fine, but surfaces are perfect.
So let's rasterize some triangles.
The Pipeline
We'll call our new pipeline the "objects pipeline", rather than -- say -- a triangles pipeline, because we're going to specifically design it to transform, light, and draw instanced vertex data; just like you might use to draw a bunch of objects in a scene.
As you might expect, we're going to base our objects pipeline on our lines pipeline.
Copying the Pipeline Declaration
Let's start by copying and modifying the Tutorial::LinesPipeline structure to make a new Tutorial::ObjectsPipeline structure.
Put it just under the lines pipeline declaration in Tutorial.hpp:
As before, add calls to the create and destroy functions in Tutorial.cpp:
Just as before, compiling should work at this point but linking should fail.
Copying the Pipeline Definition
Copy Tutorial-LinesPipeline.cpp to Tutorial-ObjectsPipeline.cpp,
and add it to the build:
Now edit Tutorial-ObjectsPipeline.cpp to switch it over to drawing triangles with our new shaders.
Load the correct shaders:
Update the structure names:
And switch the pipeline to drawing triangles:
Copy the Shaders
Copy lines.vert to objects.vert and lines.frag to objects.frag.
We'll edit these later, but for now copying will let us build the code.
Speaking of building the code, go ahead and do so now.
Your new code should build and run, though it won't do anything different yet.
A Static Vertex Buffer
Though streaming vertices from the CPU to the GPU is useful -- especially for transient debug information -- it's inefficient when the vertices being drawn don't actually change frame-to-frame.
For example, if we're moving a camera through a scene made of objects that only change under easy-to-encode-in-a-matrix transformations, sending their vertex data every frame would be redundant and wasteful.
So, for our objects pipeline, instead of creating and uploading a vertex buffer every frame, we will compute a vertex buffer and upload it once at the start of the program.
Start by adding a data member to Tutorial to hold our static vertex buffer:
Add appropriate creation and destruction code to Tutorial's constructor and destructor:
Notice that we're using the Helpers::transfer_to_buffer function to upload the data outside of our rendering function.
I wonder how that works?
(Foreshadowing!)
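If you're writing this without the reference listing handy, here's a rough sketch of what the creation and destruction code could look like. The helper names and signatures (create_buffer, destroy_buffer, Helpers::Unmapped) are assumptions based on how the rest of this codebase is organized, not a verbatim listing:

```cpp
//in Tutorial::Tutorial(), after pipeline creation (sketch -- helper signatures assumed):
{
	std::vector< PosColVertex > vertices;
	// ...fill in one triangle's worth of positions and colors here...

	size_t bytes = vertices.size() * sizeof(vertices[0]);

	//device-local buffer, usable as a vertex buffer and as a transfer destination:
	object_vertices = helpers.create_buffer(
		bytes,
		VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
		VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
		Helpers::Unmapped //no need to map: we only write to it via a transfer
	);

	//upload once, outside the per-frame render function:
	helpers.transfer_to_buffer(vertices.data(), bytes, object_vertices);
}

//...and in Tutorial::~Tutorial():
helpers.destroy_buffer(std::move(object_vertices));
```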
It's only a single triangle, but let's draw it:
Notice two things about this new code.
First, we didn't need to re-bind the camera descriptor set -- we were able to leave it bound because set 0 of the lines pipeline and set 0 of the objects pipeline have compatible layouts.
Second, notice the somewhat awkward way we're computing the number of vertices to draw -- we'll fix this shortly.
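If it helps to see the shape of that code, here's a sketch; the pipeline and buffer member names, and the .size field on the allocated-buffer type, are assumptions:

```cpp
//sketch of the added draw code in render():
{ //draw with the objects pipeline:
	vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, objects_pipeline.handle);

	{ //use object_vertices (at offset zero) as vertex buffer binding 0:
		std::array< VkBuffer, 1 > vertex_buffers{ object_vertices.handle };
		std::array< VkDeviceSize, 1 > offsets{ 0 };
		vkCmdBindVertexBuffers(workspace.command_buffer, 0, uint32_t(vertex_buffers.size()), vertex_buffers.data(), offsets.data());
	}

	//no need to re-bind set 0 -- the camera descriptor set from the lines pipeline is still compatible

	//the "somewhat awkward" vertex count: buffer size divided by the size of one vertex
	//(still the lines-style PosColVertex at this point):
	vkCmdDraw(workspace.command_buffer, uint32_t(object_vertices.size / sizeof(PosColVertex)), 1, 0, 0);
}
```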
For now, compile and run the code and you should have a fancy new triangle in your scene:
Transferring Vertices without refsol::
Let's open up that mysterious Helpers::transfer_to_buffer command:
More refsol:: code?!
Well, I guess we'll need to replace that with our own code.
To start with, if we're going to run transfer commands on the GPU we're going to need a command buffer, and to make a command buffer we're going to need a command pool.
So let's add data members to Helpers to hold those:
And let's have Helpers create and destroy those in its create and destroy functions:
Now that we've got a command buffer to work with, it's time to start on our transfer code.
To begin with, we'll create a transfer source buffer and sketch out the rest of the transfer code:
Copying the data to the source buffer is a simple memcpy:
The command buffer recording looks like the first part of our rendering command buffer, but without a render pass or complicated synchronization commands:
Running the command buffer is as simple as submitting it to the graphics queue:
And, to wait until the transfer is finished, we wait until the graphics queue is idle:
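Putting those pieces together, the whole function ends up looking something like the sketch below. The Vulkan calls are the standard ones; the VK() error-checking wrapper, the rtg.graphics_queue and transfer_command_buffer members, and the AllocatedBuffer fields are assumptions that follow the naming used elsewhere in this code:

```cpp
//sketch of Helpers::transfer_to_buffer (needs <cstring> for std::memcpy):
void Helpers::transfer_to_buffer(void const *data, size_t size, AllocatedBuffer &target) {
	//(1) a host-visible source buffer to copy from:
	AllocatedBuffer transfer_src = create_buffer(
		size,
		VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
		VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
		Mapped //persistently mapped so we can memcpy into it
	);

	//(2) copy the data into the (mapped) source buffer:
	std::memcpy(transfer_src.allocation.data(), data, size);

	//(3) record a command buffer that copies source -> target:
	VK( vkResetCommandBuffer(transfer_command_buffer, 0) );
	VkCommandBufferBeginInfo begin_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
		.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
	};
	VK( vkBeginCommandBuffer(transfer_command_buffer, &begin_info) );

	VkBufferCopy copy_region{
		.srcOffset = 0,
		.dstOffset = 0,
		.size = size,
	};
	vkCmdCopyBuffer(transfer_command_buffer, transfer_src.handle, target.handle, 1, &copy_region);

	VK( vkEndCommandBuffer(transfer_command_buffer) );

	//(4) submit to the graphics queue...
	VkSubmitInfo submit_info{
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.commandBufferCount = 1,
		.pCommandBuffers = &transfer_command_buffer,
	};
	VK( vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, VK_NULL_HANDLE) );

	//(5) ...and (crudely) wait for the transfer to finish:
	VK( vkQueueWaitIdle(rtg.graphics_queue) );

	//clean up the temporary source buffer:
	destroy_buffer(std::move(transfer_src));
}
```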
With that, everything should compile and run (and show the triangle); and your code is free of one more refsol:: call.
Normals and TexCoords, oh my!
Before we dive into making some fancier solid objects, let's update our vertex structure.
A position and color is fine for some simple lines, but for surfaces we want something fancier.
Particularly, if we want to compute lighting we need normals (surface orientations);
and while we're at it, we might as well use texture coordinates so we can get sub-triangle-level color detail as well.
A PosNorTexVertex Structure
Make a header and cpp file for our new vertex type by copying PosColVertex.hpp to PosNorTexVertex.hpp and PosColVertex.cpp to PosNorTexVertex.cpp.
Then edit the header as follows:
And update the cpp file to reflect the new layout of the vertex (and the new structure name):
Notice that we're putting Position at location 0, Normal at location 1, and TexCoord at location 2.
We'll need to remember those for when we update our shader.
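For reference, here's a sketch of the layout we just described. The member names and file organization are illustrative (presumably the real code mirrors how PosColVertex is organized):

```cpp
//sketch of PosNorTexVertex and its input descriptions:
#include <vulkan/vulkan.h>
#include <cstddef> //for offsetof

struct PosNorTexVertex {
	struct { float x, y, z; } Position;
	struct { float x, y, z; } Normal;
	struct { float s, t; } TexCoord;
};
static_assert(sizeof(PosNorTexVertex) == 8 * 4, "PosNorTexVertex is tightly packed.");

//binding 0 reads consecutive PosNorTexVertex structures:
static const VkVertexInputBindingDescription binding{
	.binding = 0,
	.stride = sizeof(PosNorTexVertex),
	.inputRate = VK_VERTEX_INPUT_RATE_VERTEX,
};

//attribute locations 0, 1, 2 -- as promised in the text:
static const VkVertexInputAttributeDescription attributes[3]{
	{ .location = 0, .binding = 0, .format = VK_FORMAT_R32G32B32_SFLOAT, .offset = offsetof(PosNorTexVertex, Position) },
	{ .location = 1, .binding = 0, .format = VK_FORMAT_R32G32B32_SFLOAT, .offset = offsetof(PosNorTexVertex, Normal) },
	{ .location = 2, .binding = 0, .format = VK_FORMAT_R32G32_SFLOAT,    .offset = offsetof(PosNorTexVertex, TexCoord) },
};
```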
Now add to Maekfile.js so the new vertex type will be included in the build:
Building and running at this point should work (but won't do anything different because our pipeline isn't using the new vertex type yet).
Updating the Pipeline
Updating the pipeline is surprisingly easy, thanks to the fact that we used using to make a local vertex definition:
Running now will produce a validation error warning about our vertex input state supplying an attribute not consumed by the shader.
Also, our triangle probably won't show up; and, in fact, our shader is probably reading past the end of the vertex buffer, because we're still trying to feed it from the older, smaller vertex format.
Updating the Vertex Buffer
Let's change the vertex buffer over to the new vertex format:
Note that I'm setting the texture coordinate to the \( (x,y) \) position so we can check later that it's coming through to the shader.
If you compile and run the code now, the triangle is back, but it's blue for some reason:
Update the shaders
Why is the triangle blue?
That's because our shader is reading its Color attribute from location 1, which is now fed from the Normal member of our structure and, therefore, always set to \( (0,0,1) \).
(The alpha value still comes out as one because "short" attributes are expanded by adding values from \( (0,0,0,1) \) -- see Conversion to RGBA, as per Vertex Input Extraction.)
So let's fix that by getting our shaders re-written for the new vertex format.
We'll have the vertex shader pass the position, normal, and texCoord onward to the fragment shader:
And we'll update the fragment shader to accept these values and (for now) display the texCoord as a color:
Note that I'm using fract on texCoord so it's easier to see where textures will repeat.
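As a sketch of the fragment-shader half of that change (variable names here are my own):

```glsl
#version 450
//objects.frag -- sketch, not the exact listing:

layout(location = 0) in vec3 position;
layout(location = 1) in vec3 normal;
layout(location = 2) in vec2 texCoord;

layout(location = 0) out vec4 outColor;

void main() {
	//show the texture coordinate as a color; fract() makes tiling visible:
	outColor = vec4(fract(texCoord), 0.0, 1.0);
}
```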
These updates get us back to seeing our triangle with a colorful gradient:
Meshes
Now that we've got a good vertex format, let's make a few meshes.
We're going to put the data for all of these meshes into the same vertex buffer, so let's go ahead and create a little wrapper to package the index information together for each particular mesh:
The ObjectVertices structure stores the index of the first vertex and the count of vertices (exactly the parameters used by vkCmdDraw, in fact) for each mesh whose vertices are stored in our object_vertices array.
For now, that'll just be two meshes we generate with code, but one can imagine loading a whole library of meshes from disk into one vertex buffer and building a std::unordered_map< std::string, ObjectVertices > to track the location of each within the larger buffer.
This would be a performant way of storing static vertex data for a general scene (saves on vertex buffer re-binding commands).
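A sketch of such a wrapper, plus the two meshes we're about to generate (names illustrative):

```cpp
//sketch: index range for one mesh inside the shared object_vertices buffer.
struct ObjectVertices {
	uint32_t first = 0; //index of the mesh's first vertex in the buffer
	uint32_t count = 0; //number of vertices in the mesh
};

//meshes generated below (members of Tutorial):
ObjectVertices plane_vertices;
ObjectVertices torus_vertices;
```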
Let's go ahead and build those meshes:
And we can go ahead and generate a torus by writing loops to iterate around the major and minor angles.
Note the use of a helper function to avoid writing the vertex computation in more than one place:
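As a sketch of what that generation might look like -- using the PosNorTexVertex member names from the earlier sketch, and appending into a std::vector that gets uploaded to object_vertices afterward; constants and names are illustrative:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

constexpr float PI = 3.14159265358979f;

//sketch: R = major (ring) radius, r = minor (tube) radius, U x V = parameter grid resolution
void make_torus(std::vector< PosNorTexVertex > &vertices, float R = 1.0f, float r = 0.3f, uint32_t U = 32, uint32_t V = 16) {
	//helper: emit one vertex at major angle u (around the ring) and minor angle v (around the tube):
	auto emit = [&](uint32_t ui, uint32_t vi) {
		float u = ui / float(U) * 2.0f * PI;
		float v = vi / float(V) * 2.0f * PI;
		float cu = std::cos(u), su = std::sin(u);
		float cv = std::cos(v), sv = std::sin(v);
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = (R + r * cv) * cu, .y = (R + r * cv) * su, .z = r * sv },
			.Normal{ .x = cv * cu, .y = cv * su, .z = sv },
			.TexCoord{ .s = ui / float(U), .t = vi / float(V) },
		});
	};
	//two triangles per quad of the (U x V) parameter grid:
	for (uint32_t ui = 0; ui < U; ++ui) {
		for (uint32_t vi = 0; vi < V; ++vi) {
			emit(ui, vi); emit(ui + 1, vi);     emit(ui + 1, vi + 1);
			emit(ui, vi); emit(ui + 1, vi + 1); emit(ui, vi + 1);
		}
	}
}
```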
Compiling and running now, you get to see all of the geometry stacked up together in one spot:
Lighting (briefly)
Since we went to all the trouble to define a vertex normal, we might as well use it for something.
In your fragment shader, normalize the interpolated vertex normal and then use the basic hemisphere light equation to shade your meshes:
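A sketch of that shading (variable names are my own, and the sky direction and energy are hard-coded for now):

```glsl
//objects.frag -- hemisphere ("sky") lighting sketch:
void main() {
	vec3 n = normalize(normal); //re-normalize after interpolation

	vec3 sky_direction = vec3(0.0, 0.0, 1.0); //assumed "up", in the space normals live in
	vec3 sky_energy = vec3(1.0);
	vec3 albedo = vec3(fract(texCoord), 0.0); //or vec3(1.0) to see the shading clearly

	//hemisphere light: full energy when n points at the sky,
	//fading to zero only when n points directly away from it:
	vec3 e = albedo * sky_energy * (0.5 * dot(n, sky_direction) + 0.5);
	outColor = vec4(e, 1.0);
}
```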
It's actually relatively hard to see the shading with the albedo set to the texture coordinate, but if you set the albedo to all 1's (i.e., albedo = vec3(1.0)), you get this:
Objects
To get our geometry unstacked, we need a way of positioning individual instances of our meshes in the scene.
We are going to do this by sending transformation matrices to our vertex shader, and moving the objects by these matrices.
But how to send the matrices to the shader?
We've already used push constants, and we've already used uniform blocks, so we're going to try something a bit different: storage buffers.
A Storage Buffer in the Vertex Shader
Storage buffers can be both read from and written to in shaders (though our code will only read from them).
They are much like uniform blocks in that they store global data for the shader, but storage buffers are accessed through a cache hierarchy (instead of from fast local memory), and -- thus -- while slightly slower, can hold much, much more data than uniforms.
(GPUs are allowed to support as little as 16 KB of uniforms, while storage buffers can be allocated up to the size of device memory. If each object needs two 4x4 float matrices -- 128 bytes -- a 16 KB uniform buffer can only hold 128 object transforms.)
The upshot of storage buffers being so big is that we can send a whole batch of matrices to the GPU at once and grab only the ones we need for the current object.
This will save us descriptor set binds later.
(And, also, gives us a good excuse to set up a different kind of descriptor than what we've used before.)
This is how we declare a storage buffer in our vertex shader (and, while we're at it, get rid of the camera uniform; we don't need that any more; it's baked into our per-object CLIP_FROM_LOCAL matrix):
Particularly, the set of transforms we'll package for each object will be: the transformation from the object's local space directly to clip space -- CLIP_FROM_LOCAL -- and the transformations from local space to world space (by which I really mean: "the space we'll do lighting computations in") for positions -- WORLD_FROM_LOCAL -- and for normals -- WORLD_FROM_LOCAL_NORMAL.
And we might as well go ahead and actually use the transforms in the shader as well:
As you can see from how we actually use the transforms, it would have been more efficient to make WORLD_FROM_LOCAL a mat4x3 and WORLD_FROM_LOCAL_NORMAL a mat3.
However, that would slightly complicate our data assembly on the CPU side so we don't.
Note, also, the use of gl_InstanceIndex here -- this is set by the last parameter of vkCmdDraw and is a sneaky way of getting a 32-bit index into your shader without using a push constant.
(At least it's sneaky if you aren't drawing more than one instance; if you are using instanced rendering then it's just the expected way of getting an index into the vertex shader.)
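Putting the whole vertex-shader side together, here's a sketch; the block, member, and varying names are my guesses, and the three matrices are the ones described above:

```glsl
#version 450
//objects.vert -- sketch of the per-object transforms storage buffer and its use:

struct Transform {
	mat4 CLIP_FROM_LOCAL;
	mat4 WORLD_FROM_LOCAL;
	mat4 WORLD_FROM_LOCAL_NORMAL;
};
layout(set = 1, binding = 0, std140) readonly buffer Transforms {
	Transform TRANSFORMS[];
};

layout(location = 0) in vec3 Position;
layout(location = 1) in vec3 Normal;
layout(location = 2) in vec2 TexCoord;

layout(location = 0) out vec3 position;
layout(location = 1) out vec3 normal;
layout(location = 2) out vec2 texCoord;

void main() {
	//gl_InstanceIndex selects this object's transforms (set via vkCmdDraw's firstInstance):
	Transform t = TRANSFORMS[gl_InstanceIndex];

	gl_Position = t.CLIP_FROM_LOCAL * vec4(Position, 1.0);
	position = mat4x3(t.WORLD_FROM_LOCAL) * vec4(Position, 1.0); //only the upper 4x3 is used
	normal = mat3(t.WORLD_FROM_LOCAL_NORMAL) * Normal;           //only the upper 3x3 is used
	texCoord = TexCoord;
}
```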
A Storage Buffer Descriptor
Hmm, the shader is accessing memory.
You know what this means: we need a buffer to hold the data the shader is accessing, a descriptor to point to that buffer, a descriptor set to hold the descriptor, and a descriptor set layout to describe the type of the descriptor set.
Let's start with the descriptor set layout:
Which, of course, we still need to properly create and destroy:
Then we'll need a descriptor set and a buffer to point it at.
We'll stream the transformations per-frame, so let's go ahead and define these in Workspace:
Taking a cue from what we did with the lines data, we'll dynamically re-allocate the buffers in our render function as needed.
But we should still write allocation code for the descriptor set (which also will require us to adjust the limits on our descriptor pool):
And, of course, clean-up code for everything:
Transforms and Objects
How should we fill the transforms buffer?
Let's make a CPU-side list of objects to draw -- storing the vertex indices and transforms for each.
We can fill it up in update and copy the transforms portion to the GPU in render.
First, the structure:
Now some test transformations:
Note that to properly transform normal vectors, the upper left 3x3 of WORLD_FROM_LOCAL_NORMAL (i.e., the only part of the matrix our shader uses) should be the inverse transpose of the upper left 3x3 of WORLD_FROM_LOCAL.
However, since our matrices are orthonormal, the inverse transpose is simply the matrix itself.
(And, thus, we avoid having to write a matrix inverse helper function in our library code.)
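For concreteness, a sketch of the sort of structure we're talking about; the names are assumptions, and mat4 stands in for whatever 4x4 matrix type this codebase uses:

```cpp
//in Tutorial.hpp, inside struct Tutorial (sketch):

//CPU-side copy of the data the shader reads per object:
struct ObjectInstance {
	ObjectVertices vertices; //which range of object_vertices to draw
	struct Transform {
		mat4 CLIP_FROM_LOCAL;
		mat4 WORLD_FROM_LOCAL;
		mat4 WORLD_FROM_LOCAL_NORMAL; //upper-left 3x3 must be the inverse transpose of WORLD_FROM_LOCAL's;
		                              //for our orthonormal transforms, that's just the matrix itself
	} transform;
};
std::vector< ObjectInstance > object_instances; //rebuilt every update()
```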
Next, we have to actually get the transform data into a buffer on the GPU.
To do this we can just copy exactly what we did with lines_vertices[_src] and make a few changes to the names of things (and usage flags):
Two interesting changes in this block.
First, notice that the list of transforms is built directly into the mapped transforms source memory, avoiding any additional copies.
Second, notice that a descriptor set write is included so that the descriptor set stays up to date with the re-allocated buffer.
Right, let's (finally) retrofit our drawing code to bind the descriptor set as set 1 and draw each object instance with a proper instance ID:
Notice how we used the firstSet parameter to vkCmdBindDescriptorSets to make sure our descriptor set got bound as set 1, not set 0.
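A sketch of the retrofitted draw loop (member names assumed):

```cpp
//sketch of the objects-pipeline draw code in render():
{ //bind the Transforms descriptor set as set 1 (set 0, the camera set, stays bound from the lines pipeline):
	std::array< VkDescriptorSet, 1 > descriptor_sets{ workspace.Transforms_descriptors };
	vkCmdBindDescriptorSets(
		workspace.command_buffer,
		VK_PIPELINE_BIND_POINT_GRAPHICS,
		objects_pipeline.layout,
		1, //firstSet == 1, so this lands in set 1
		uint32_t(descriptor_sets.size()), descriptor_sets.data(),
		0, nullptr //no dynamic offsets
	);
}

//draw each instance, passing its index to the shader via firstInstance / gl_InstanceIndex:
for (uint32_t i = 0; i < uint32_t(object_instances.size()); ++i) {
	ObjectVertices const &range = object_instances[i].vertices;
	vkCmdDraw(workspace.command_buffer, range.count, 1, range.first, i);
}
```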
With all this done, compiling and running should produce no validation errors or warnings, and will display the plane and torus, with the torus spinning:
Some quick testing on a system with an AMD Ryzen 7950X CPU and an NVIDIA GeForce RTX 3080 GPU (with debug turned on, running under Linux) suggests that this method of drawing can push something like 64,000 torus instances at 60fps -- and upwards of 125,000 torus instances with the added optimization of using a single draw call to draw all instances of the same mesh (by setting instanceCount to the actual count of instances!).
It even maintains ~15fps on 700,000+ instances.
Textures
We've been pushing texture coordinates around for a while now; let's actually use them to draw a texture.
Sampling a Texture in a Shader
Let's start at the place our texture is used: the fragment shader.
Edit objects.frag to include a sampler2D uniform and to read from it:
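Something along these lines (a sketch -- in particular, I'm assuming the texture descriptor ends up as set 2):

```glsl
//objects.frag -- sketch of adding a texture lookup:
layout(set = 2, binding = 0) uniform sampler2D TEXTURE;

void main() {
	vec3 n = normalize(normal);
	vec3 albedo = texture(TEXTURE, texCoord).rgb; //read the texture at the interpolated coordinate

	//same hemisphere-style lighting as before:
	vec3 e = albedo * (0.5 * dot(n, vec3(0.0, 0.0, 1.0)) + 0.5);
	outColor = vec4(e, 1.0);
}
```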
Attaching an Image to a sampler2D
Just like any other uniform, we need to bind a descriptor to tell the Vulkan driver where TEXTURE should point.
Unlike uniform blocks or storage buffers, a sampler2D is an "opaque descriptor", which means we direct it to image data (along with some sampler state), rather than a block of memory in a known format.
We'd like our application to support many different textures, with the texture choosable per object draw.
So let's go ahead and make data structures to hold the texture images along with everything else we need to make the descriptors:
In the code we just wrote, textures holds handles to the actual image data, texture_views are references to portions of the textures (in this case: the whole texture), the texture_sampler gives the sampler state (wrapping, interpolation, etc) for reading from the textures, the texture_descriptor_pool is the pool from which we allocate texture descriptor sets, and, finally, texture_descriptors includes a descriptor for each of our textures.
The reason we have a separate descriptor pool just for the textures is so that you could -- conceivably -- re-allocate the pool if your code loaded a texture in the middle of rendering frames, without disturbing any of the other descriptors used by our code.
It also makes our texture creation code a bit more self-contained.
And we'll add an index to our ObjectInstance structure to indicate which texture descriptor to bind when drawing each instance:
Now we update the drawing code to bind the correct descriptor set:
Since this code will access past-the-end of the texture descriptors array if we compile and run now, it's probably a good idea for us to actually make some texture descriptors.
Making Some Texture Descriptors (Descriptor Set Layout)
To make a descriptor set, we first need a descriptor set layout.
So let's update our ObjectsPipeline structure definition:
And we can also write the code to create the descriptor set layout and add it to the pipeline layout:
Notice that the type of the descriptor in this set is VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, because a GLSL sampler2D references both an image and the parameters for how to sample from that image.
If we wanted to have a separate descriptor just for how to sample from an image, we'd use a VK_DESCRIPTOR_TYPE_SAMPLER descriptor and a sampler-type uniform in GLSL.
If we wanted to have a separate descriptor just for an image that could be sampled from, we'd use a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE descriptor and the type texture2D in GLSL.
(Note that these split sampler/texture types only exist in GLSL meant to be compiled to Vulkan, and are not available in OpenGL GLSL.)
Let's not forget to destroy our descriptor set layout:
Actually Making Some Texture Descriptors
Now that we've got a descriptor set layout for our texture descriptors, we can actually write the code that makes them.
But, hey, why don't we write the clean-up code first:
Now we'll start with a general plan for our creation code, at the end of Tutorial::Tutorial:
In keeping with the theme of our tutorial, we'll fill in this plan backwards.
Once everything else is created, all we need to do is allocate the descriptor sets and then do descriptor writes for each texture into its descriptor set.
This is the same way we've allocated descriptor sets in the past, with the added twist that we just use the exact same alloc_info repeatedly.
The descriptor set writes, however, are a bit more complicated than we've seen previously, because we're going to do them in a batch, but we need a different VkDescriptorImageInfo per-write.
We don't want the addresses of the image info structures to change so we pre-size the vector that holds them (we could also have reserve'd enough space and emplace_back'd the structures, but I figured this made things clearer).
Note that part of the image info for a descriptor is the layout we are promising the image will be in when the descriptor is used.
Image layouts are the way that Vulkan talks about how an image is organized in memory.
We talked about this a bit back when making a render pass; and we're going to talk about it a bit more in the rest of this section, since a big part of dealing with textures in Vulkan is making sure they undergo the right layout transitions to be in the state we need them in when, e.g., sampling from them in a fragment shader.
For now, make a mental note that we had better make sure our textures are in the layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL by the time our fragment shader runs.
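Here's a sketch of that allocate-then-batch-write pattern; the VK() wrapper, rtg.device, and the member names (including set2_TEXTURE) are assumptions following the rest of this code:

```cpp
//sketch: allocate and write one descriptor set per texture.
{
	texture_descriptors.assign(textures.size(), VK_NULL_HANDLE);

	VkDescriptorSetAllocateInfo alloc_info{
		.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
		.descriptorPool = texture_descriptor_pool,
		.descriptorSetCount = 1,
		.pSetLayouts = &objects_pipeline.set2_TEXTURE, //the combined-image-sampler layout
	};
	for (VkDescriptorSet &descriptor_set : texture_descriptors) {
		VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &descriptor_set) ); //same alloc_info every time
	}

	//pre-size so the addresses of the image infos don't move while we build the writes:
	std::vector< VkDescriptorImageInfo > infos(textures.size());
	std::vector< VkWriteDescriptorSet > writes(textures.size());

	for (size_t i = 0; i < textures.size(); ++i) {
		infos[i] = VkDescriptorImageInfo{
			.sampler = texture_sampler,
			.imageView = texture_views[i],
			.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, //we promise the image will be in this layout when used
		};
		writes[i] = VkWriteDescriptorSet{
			.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
			.dstSet = texture_descriptors[i],
			.dstBinding = 0,
			.descriptorCount = 1,
			.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
			.pImageInfo = &infos[i],
		};
	}
	vkUpdateDescriptorSets(rtg.device, uint32_t(writes.size()), writes.data(), 0, nullptr);
}
```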
Let's back up and make the descriptor pool.
Nothing complicated here; more-or-less the same as the code that made descriptor_pool -- just with a different descriptor type, descriptor count, and max sets.
We know how many sets and descriptors are needed because we know how many textures we have.
Taking another step back we come to creating the sampler.
We already used this in the descriptor writes -- it contains all the information that controls how the GPU will read from a texture:
A few things to notice here are that anisotropic sampling is supported out of the box (OpenGL relegates this to an extension, IIRC);
you control how the texture repeats (or doesn't) with the "addressing modes";
and mip mapping is controlled separately from the minification and magnification filtering modes (OpenGL combines these together).
The settings we've used basically turn off mip-mapping (clamp to level zero and only sample the nearest level);
if you do want mip-mapping in Vulkan you need to compute and upload your texture mip levels yourself.
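A sketch of a sampler with the settings just described (the min/mag filter choice is up to you):

```cpp
//sketch: sampler creation with mip-mapping effectively disabled.
VkSamplerCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO,
	.magFilter = VK_FILTER_NEAREST, //or VK_FILTER_LINEAR, whichever look you want
	.minFilter = VK_FILTER_NEAREST,
	.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST, //only ever sample the nearest mip level...
	.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT, //texture coordinates wrap
	.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT,
	.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT,
	.mipLodBias = 0.0f,
	.anisotropyEnable = VK_FALSE, //core in Vulkan (needs the samplerAnisotropy device feature to enable)
	.maxAnisotropy = 0.0f,
	.compareEnable = VK_FALSE,
	.minLod = 0.0f,
	.maxLod = 0.0f, //...and clamp to level zero, so mip-mapping is effectively off
	.borderColor = VK_BORDER_COLOR_OPAQUE_BLACK,
	.unnormalizedCoordinates = VK_FALSE,
};
VK( vkCreateSampler(rtg.device, &create_info, nullptr, &texture_sampler) );
```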
And what is actually getting sampled with the sampler?
If you recall the descriptors we wrote: image views (handle type VkImageView)!
These are references to particular aspects (color, depth, stencil) of particular ranges of the mip levels and array slices of an image, interpreted in a certain format, and presented in a certain arrangement (2D, 3D, cubemap, ...).
You can't crop an image in an image view, but you can view the same image as (e.g.) a texture array or a cube map; or as depth data or RGBA bytes; or even as SRGB-encoded or linearly-encoded data.
So let's make those, using some convenience members of AllocatedImage to get the information we need:
These are unexciting image views: they just take the color aspect of the first mip level of the first array layer of the image, in the same format we defined the image with, and present it as a 2D image.
To wrap things up, let's build and upload some textures.
We start by making a 128x128 checkerboard texture (with a red blob at the origin so we know where that is):
To actually get it to the GPU we use Helpers::create_image and Helpers::transfer_to_image:
We're using VK_FORMAT_R8G8B8A8_UNORM as the format for the image.
This means that there is no SRGB decoding of the data: our 0x55/0xbb checkerboard will decode linearly to about 0.333 and 0.733 when sampled in the shader.
Also, recalling our mental note about image layouts, notice this comment in Helpers.hpp:
So, in fact, everything is ready to go right now and we could compile and run.
Let's do it!
Okay, now for a bit of a victory lap.
Let's use the classic demoscene trick of xor-ing the x- and y- coordinates together to make a texture with an interesting binary noise pattern:
I've decided that the colors in this texture should be interpreted as if they are SRGB-encoded, and have set the format appropriately.
This is probably the right format for albedo texture images you load from disk as well, since this is generally the color space we use for visual material.
But using it for, say, normal map images is definitely a trap!
Now you can update your scene creation code to apply texture one to a few objects, and things will get a lot more colorful:
Just One More Thing: transfer_to_image
Before we wrap up textures, we should really take a look at what that Helpers::transfer_to_image function is doing:
If we want to understand how data gets copied into a VkImage, we should re-write this to remove the reference code.
Let's start with a framework that's very similar to what we did for transfer_to_buffer:
The interesting new things here are the two layout transitions -- this is adding a command that tells the GPU to re-arrange the image in memory -- and the use of a different copy command (since the one we used before is buffer-to-buffer).
The way we check to make sure the data is the right size is also a bit different than before, but less interesting.
Let's get to it:
The only way to figure out how many bytes are needed for each pixel in an image is to read the spec and make a big table that maps from format constants to bytes.
Thankfully, the vkuFormatElementSize function, part of the vk_format_utils.h header included with the SDK, already does that, so we don't need to write that function ourselves.
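So the size check can be as small as this sketch (the AllocatedImage fields are assumptions):

```cpp
#include <cassert>
#include <vulkan/utility/vk_format_utils.h> //path in recent SDKs; older SDKs ship vk_format_utils.h elsewhere

//...inside Helpers::transfer_to_image (sketch; 'target' is the AllocatedImage being uploaded to):
uint32_t pixel_size = vkuFormatElementSize(target.format); //e.g., 4 bytes for VK_FORMAT_R8G8B8A8_UNORM
assert(size == target.extent.width * target.extent.height * pixel_size);
```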
Creating the source buffer and copying the data into the source buffer proceed exactly the same way as in the other transfer function:
Nothing exciting about starting the command buffer recording, either:
To tell the GPU to put the image in a specific layout, we use a pipeline barrier command with a VkImageMemoryBarrier structure.
This is a synchronization primitive that requires that every command before the barrier (in a certain pipeline stage, doing a certain memory operation) must happen before the layout transition, and that every command after the barrier (in a certain pipeline stage, doing a certain memory operation) must happen after the layout transition.
Specifically, by setting srcAccessMask to zero this barrier doesn't place any conditions on earlier commands, but the dstAccessMask (write) and dstStageMask (transfer) indicate that the transition must complete before any transfers write data to the image.
These constraints make sense in context because this transition is taking the image from VK_IMAGE_LAYOUT_UNDEFINED (which means "throw away any image contents") to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (which is "whatever layout is best for receiving data").
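A sketch of that first transition (member names are assumptions; whole_image is the full-image subresource range we'll also want for the second barrier):

```cpp
//sketch: transition the whole image from "undefined" to "transfer destination".
VkImageSubresourceRange whole_image{
	.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
	.baseMipLevel = 0,
	.levelCount = 1,
	.baseArrayLayer = 0,
	.layerCount = 1,
};

VkImageMemoryBarrier barrier{
	.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
	.srcAccessMask = 0, //no constraint on earlier memory operations
	.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT, //transition must finish before transfer writes
	.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED, //"throw away whatever is in the image"
	.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
	.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.image = target.handle,
	.subresourceRange = whole_image,
};
vkCmdPipelineBarrier(
	transfer_command_buffer,
	VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, //srcStageMask: don't wait on any earlier stage...
	VK_PIPELINE_STAGE_TRANSFER_BIT,    //dstStageMask: ...but block transfers until the transition is done
	0, //dependencyFlags
	0, nullptr, //memory barriers
	0, nullptr, //buffer memory barriers
	1, &barrier //image memory barriers
);
```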
Now that the image is in a good layout to copy into, we can record the copy command:
This is a different copy function but shouldn't be too daunting.
The region describes what part of the image to copy, and the other parameters indicate the buffer and image to copy between and the current layout of the image.
Frustratingly, the imageSubresource field of VkBufferImageCopy is a VkImageSubresourceLayers not a VkImageSubresourceRange, otherwise we could have used our convenient whole_image structure from above.
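A sketch of the copy (assuming the image's extent is stored as a 2D extent):

```cpp
//sketch: copy the whole source buffer into the whole image.
VkBufferImageCopy region{
	.bufferOffset = 0,
	.bufferRowLength = 0,   //0 == rows are tightly packed...
	.bufferImageHeight = 0, //...and so are layers
	.imageSubresource{ //VkImageSubresourceLayers, not VkImageSubresourceRange:
		.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
		.mipLevel = 0,
		.baseArrayLayer = 0,
		.layerCount = 1,
	},
	.imageOffset{ .x = 0, .y = 0, .z = 0 },
	.imageExtent{ .width = target.extent.width, .height = target.extent.height, .depth = 1 },
};
vkCmdCopyBufferToImage(
	transfer_command_buffer,
	transfer_src.handle,
	target.handle,
	VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, //the layout we just transitioned the image into
	1, &region
);
```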
Now another image layout transition, this time to the optimal-to-read-from-in-a-shader format:
Notice that the access masks and stage flags here are different than the previous barrier.
In this case, the barrier waits until all transfer writes are complete, then transitions the image, then allows fragment shader reads to proceed.
(This second part is not strictly necessary in this function because we're going to wait for the queue to drain at the end of the function; but if you were doing layout changes as part of rendering you'd definitely want to force texture reads to wait like this.)
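Compared to the first barrier, only the layouts, access masks, and stage masks change; a sketch:

```cpp
//sketch: transition from "transfer destination" to "shader read only".
VkImageMemoryBarrier barrier{
	.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
	.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT, //wait for the copy's writes...
	.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,    //...before any shader reads
	.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
	.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
	.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.image = target.handle,
	.subresourceRange = whole_image,
};
vkCmdPipelineBarrier(
	transfer_command_buffer,
	VK_PIPELINE_STAGE_TRANSFER_BIT,        //after transfers...
	VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, //...and before fragment shader reads
	0, 0, nullptr, 0, nullptr, 1, &barrier
);
```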
The final steps in the function are just as we've done before: finish the command buffer, submit it to the queue, and wait until the queue finishes running.
And with that we've eliminated another refsol call and learned a bit more about how to wrangle images on the GPU.
If you compile and run now, the code should work exactly as it did before, and you should feel a sense of pride that you now understand the texture uploading process.
Lighting (pt2)
To wrap up our solid object drawing, let's revisit our lighting computation.
Particularly, we'll set up a basic sun (directional light) + sky (hemisphere light) and make the parameters of the lights adjustable from the CPU.
A reminder, first, of the current lighting setup:
Lights and Colors
Let's start by creating a uniform block to hold our new light parameters:
It's actually a bit redundant to have a direction for both the sun and the sky because we're doing lighting in world space, so we could re-orient world space so that (e.g.) the sky was always directly upward.
But that seems likely to end up confusing things if we ever wanted to make world-position-dependent shaders; or if we wanted to have some of our object transforms remain static instead of uploading all of them each frame.
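A sketch of such a block -- the member names are my own; just keep yours consistent between the shader and the CPU-side struct:

```glsl
//objects.frag -- sketch of the new light-parameter uniform block:
layout(set = 0, binding = 0, std140) uniform World {
	vec3 SKY_DIRECTION; //direction toward the "sky" hemisphere (world space, unit length)
	vec3 SKY_ENERGY;    //energy of the hemisphere light
	vec3 SUN_DIRECTION; //direction toward the sun (world space, unit length)
	vec3 SUN_ENERGY;    //energy of the directional light
};
```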
Let's go ahead and use these values in our lighting computation:
Notice the difference in how the dot product is used in the hemisphere light (only reaches zero energy when the normal is exactly opposite the light direction) and the directional light (reaches zero energy when the normal is perpendicular to the lighting direction).
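In shader terms, the difference is just which (re-mapped or clamped) dot product scales which energy; a sketch:

```glsl
//objects.frag -- sketch of the combined sun + sky lighting (inside main()):
vec3 n = normalize(normal);
vec3 albedo = texture(TEXTURE, texCoord).rgb;

//hemisphere ("sky") light: half energy when n is perpendicular to SKY_DIRECTION,
//zero only when n points directly away from it:
vec3 e = SKY_ENERGY * (0.5 * dot(n, SKY_DIRECTION) + 0.5);

//directional ("sun") light: zero as soon as n is perpendicular to SUN_DIRECTION:
e += SUN_ENERGY * max(0.0, dot(n, SUN_DIRECTION));

outColor = vec4(albedo * e, 1.0);
```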
Compiling and running the code now produces a bunch of validation errors (and -- at least for me -- solid black objects).
But that isn't surprising -- the descriptor set that's bound at 0 when running the objects pipeline (i.e., the camera descriptor set for our lines pipeline) isn't compatible with the layout of the descriptor set our shader is expecting.
A World Struct and Descriptor Set Layout
Let's get the CPU-side type information for our descriptor sorted out first:
Notice the padding included after the vec3 members of the structure.
This is required by the std140 layout, which aligns vec3s to 16-byte (four-float) boundaries.
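A sketch of the matching CPU-side struct, using the member names from the uniform-block sketch above:

```cpp
//sketch: the padding keeps the layout std140-compatible (each vec3 is aligned to 16 bytes).
struct World {
	struct { float x, y, z; } SKY_DIRECTION;
	float pad0 = 0.0f;
	struct { float r, g, b; } SKY_ENERGY;
	float pad1 = 0.0f;
	struct { float x, y, z; } SUN_DIRECTION;
	float pad2 = 0.0f;
	struct { float r, g, b; } SUN_ENERGY;
	float pad3 = 0.0f;
};
static_assert(sizeof(World) == 4 * 16, "World is padded to match the std140 layout.");
```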
Now we write the code to create the descriptor set layout for set 0.
If you copy-paste your code for set1_Transforms make sure to change the descriptor type and stage flags:
And what we create we must also destroy:
And, now that we have the descriptor set layout, we can add it to the pipeline layout:
Building and running now should work, but we'll still have a stream of errors because we still don't have any descriptor sets to bind to set 0.
Buffers and Sets
We're going to stream world information per-frame.
So let's add the appropriate CPU- and GPU-side buffers to our Workspace:
Notice that this is the same setup as for the other uniforms and storage buffers.
In fact, for the creation code, we can copy-paste-modify the code we used for the lines pipeline's Camera uniform block:
And we can even use the same vkUpdateDescriptorSets call to point binding zero in the world descriptor set to its associated buffer:
Two bits of book-keeping to do though.
First, we should remember to destroy these resources:
And, second, we need to size our descriptor pool properly to account for the new descriptor set we're allocating:
Compiling and running at this point will, again, work -- but there's still a ton of per-frame warnings because we haven't bound our new descriptor set.
Transfers and Bindings
Okay, time to get data into our buffer.
Let's set up some scene variables to track the sun and sky position:
And actually set the variables to something in the update function:
Now to the render function to upload the data (a slightly simpler version of what we did for the camera):
And, finally, we can add code to bind the descriptor set:
Compiling and running, we finally have no validation errors, and our lighting shows up in the world.