Now that we have an interesting background layer, let's actually render something in the foreground.
Nothing says classic rasterized graphics quite like a grid, so we'll render one of those.
This will require building a pipeline that renders lines,
as well as handling the memory transfers required to get vertices for those lines to the GPU.
We'll finish up by also sending a clip-from-local transformation matrix to our shader so we can fly a camera around our lines.
The Pipeline
We're going to base the lines pipeline on our existing background pipeline.
Before we copy the background pipeline, let's replace the refsol:: code being used in the BackgroundPipeline::destroy() function:
Build and run the code now and things should behave just as before (and with no complaints in the console from the validation layer about things not being properly destroyed).
Copying the Pipeline Declaration
Let's start by copying and modifying the Tutorial::BackgroundPipeline structure to make a new Tutorial::LinesPipeline structure.
Put it just under the background pipeline declaration in Tutorial.hpp:
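If your copy ends up shaped like ours, it might look something like this (a sketch -- the RTG parameter and member names simply mirror the background pipeline's declaration):

```cpp
struct LinesPipeline {
	//no descriptor set layouts yet (we'll add one later in this chapter)

	VkPipelineLayout layout = VK_NULL_HANDLE;
	VkPipeline handle = VK_NULL_HANDLE;

	void create(RTG &, VkRenderPass render_pass, uint32_t subpass);
	void destroy(RTG &);
} lines_pipeline;
```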
And go ahead and add calls to the create and destroy functions in Tutorial.cpp:
Compiling the source at this point should work, but linking should fail because of the missing function definitions.
Copying the Pipeline Definition
Edit the Maekfile.js to build our soon-to-be-created lines pipeline:
Copy Tutorial-BackgroundPipeline.cpp to Tutorial-LinesPipeline.cpp,
and add it to the build:
Edit Tutorial-LinesPipeline.cpp to adapt it to our new purposes.
Load the correct shaders (hmm, we should write those soon):
Change the create function's structure name:
We're eventually going to use a descriptor set to pass data to this pipeline.
For now, update the pipeline layout to not include any push constants:
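Something like this sketch, assuming the VK() error-checking macro used elsewhere in the codebase:

```cpp
VkPipelineLayoutCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
	.setLayoutCount = 0, //no descriptor sets (yet)
	.pSetLayouts = nullptr,
	.pushConstantRangeCount = 0, //no push constants (the background pipeline had some; this pipeline doesn't)
	.pPushConstantRanges = nullptr,
};

VK( vkCreatePipelineLayout(rtg.device, &create_info, nullptr, &layout) );
```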
Update the input assembly state to reflect the fact that the lines pipeline will draw lines:
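That is, the topology becomes a line list:

```cpp
VkPipelineInputAssemblyStateCreateInfo input_assembly_state{
	.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
	.topology = VK_PRIMITIVE_TOPOLOGY_LINE_LIST, //every two vertices in the stream make an independent line
	.primitiveRestartEnable = VK_FALSE,
};
```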
Enable the depth test:
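A sketch of the depth/stencil state (assuming a conventional less-than depth test):

```cpp
VkPipelineDepthStencilStateCreateInfo depth_stencil_state{
	.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
	.depthTestEnable = VK_TRUE, //discard fragments that fail the depth test...
	.depthWriteEnable = VK_TRUE, //...and have passing fragments update the depth buffer
	.depthCompareOp = VK_COMPARE_OP_LESS,
	.depthBoundsTestEnable = VK_FALSE,
	.stencilTestEnable = VK_FALSE,
};
```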
And, finally, remember to change the structure name for the destroy function:
The Shaders
Now the only thing standing between us and compiling the code is the lack of shader programs.
So let's write those.
The lines pipeline vertex shader will copy the position supplied as a vertex attribute into the gl_Position output and pass the color supplied as a vertex attribute onward to the fragment shader:
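In GLSL, that shader might look like this (a sketch; the attribute and varying names follow the convention described below):

```glsl
#version 450

layout(location=0) in vec3 Position;
layout(location=1) in vec4 Color;

layout(location=0) out vec4 color;

void main() {
	gl_Position = vec4(Position, 1.0);
	color = Color;
}
```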
Recall from the background pipeline that layout(location=N) decorations are how varying values are matched up between the vertex and fragment shaders.
This is also the case for vertex shader inputs -- the location decorators are how our CPU-side code will assign a stream of data to each input.
By the way, in the shaders for this class, I'll tend to use Uppercase variables for attributes (vertex shader stream inputs), lowercase variables for varyings (vertex shader outputs / fragment shader inputs), and -- with some exceptions -- SHOUTYCASE variables for uniforms (global inputs).
Most of the time the shaders will be so simple it won't particularly matter, but it's nice to have a convention for better at-a-glance understanding of code.
The fragment shader will write its input color to the output:
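Which is about as short as a shader gets:

```glsl
#version 450

layout(location=0) in vec4 color;

layout(location=0) out vec4 outColor;

void main() {
	outColor = color;
}
```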
Compiling and running the code now should result in the Vulkan validation layer warning in the console about missing vertex attribute descriptions in the vertex input state.
This is because our vertex shader expects inputs but our pipeline creation code hasn't provided any information about how to get those inputs.
A Vertex
A vertex can hold whatever data you want. In our background pipeline, it held nothing at all.
For this pipeline -- as you've already seen in the shader code -- the vertex will have both position and color attributes.
Let's define a vertex structure.
You'll generally be using the same vertex formats across different pipelines, so we're going to make a new header and C++ file for this vertex structure.
We'll begin with the layout itself -- a 3-vector of floating point numbers for the position and a 4-vector of 8-bit unsigned integers to store an RGBA color.
The static_assert is here just to make sure that the structure's layout in memory is as we expect (no padding).
We'll also go ahead and associate a VkPipelineVertexInputStateCreateInfo structure with this vertex type to make it easier to instantiate a pipeline that uses a stream of PosColVertex as input:
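A sketch of the header (PosColVertex.hpp):

```cpp
#pragma once

#include <vulkan/vulkan_core.h>

#include <cstdint>

struct PosColVertex {
	struct { float x, y, z; } Position;
	struct { uint8_t r, g, b, a; } Color;

	//a pipeline vertex input state that works with a buffer holding an array of PosColVertex:
	static const VkPipelineVertexInputStateCreateInfo array_input_state;
};

static_assert(sizeof(PosColVertex) == 3*4 + 4*1, "PosColVertex is tightly packed.");
```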
Now make a PosColVertex.cpp so we can define array_input_state:
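Here's one way to write it (the binding and attribute descriptions need static storage duration, since the create info only stores pointers to them):

```cpp
#include "PosColVertex.hpp"

#include <array>
#include <cstddef> //for offsetof

static std::array< VkVertexInputBindingDescription, 1 > bindings{
	VkVertexInputBindingDescription{
		.binding = 0,
		.stride = sizeof(PosColVertex),
		.inputRate = VK_VERTEX_INPUT_RATE_VERTEX,
	},
};

static std::array< VkVertexInputAttributeDescription, 2 > attributes{
	VkVertexInputAttributeDescription{ //Position, at location 0:
		.location = 0,
		.binding = 0,
		.format = VK_FORMAT_R32G32B32_SFLOAT,
		.offset = offsetof(PosColVertex, Position),
	},
	VkVertexInputAttributeDescription{ //Color, at location 1:
		.location = 1,
		.binding = 0,
		.format = VK_FORMAT_R8G8B8A8_UNORM, //8-bit values, normalized when read
		.offset = offsetof(PosColVertex, Color),
	},
};

const VkPipelineVertexInputStateCreateInfo PosColVertex::array_input_state{
	.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
	.vertexBindingDescriptionCount = uint32_t(bindings.size()),
	.pVertexBindingDescriptions = bindings.data(),
	.vertexAttributeDescriptionCount = uint32_t(attributes.size()),
	.pVertexAttributeDescriptions = attributes.data(),
};
```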
This defines a pipeline vertex input that takes data from one binding (location in GPU memory).
There are two attribute streams that are assembled from that binding:
the stream at location 0 is made of 3-vectors of 32-bit floating point values;
and the stream at location 1 is made of 4-vectors of 8-bit unsigned integer values, which will be treated as "normalized values" (values between 0.0 and 1.0) by dividing by 255.0.
Add the PosColVertex files to the Maekfile.js:
Now is a good time to check that everything builds okay.
Once you've sorted out any typos, let's tell our lines pipeline about the vertex format it will be using.
We'll do this with a using (a type alias) in the LinesPipeline structure:
And we'll update the lines pipeline creation code to use the input state structure we've conveniently already created:
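Putting those two pieces together (a sketch):

```cpp
//in Tutorial.hpp (with #include "PosColVertex.hpp" at the top), inside LinesPipeline:
	using Vertex = PosColVertex; //the vertex format this pipeline consumes

//in LinesPipeline::create, in the graphics pipeline create info:
	.pVertexInputState = &Vertex::array_input_state,
```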
If you compile and run the code now, you'll see that the warning about vertex attributes is no longer displayed,
since our code is now supplying an input state definition that matches the vertex shader's inputs.
A List of Vertices
We know the type of our vertices, but we haven't actually created any vertices yet.
Vulkan gives us a lot of flexibility in how we choose to store and transfer vertices.
What method we choose depends on how we plan to write and render the vertices.
For this grid, we're going to generate the vertices CPU-side every frame and stream them to the GPU for drawing.
This would be inefficient for large, static objects, but it is exactly what we'd want if we were (e.g.) making a list of lines every frame to display debugging info in our 3D scene.
Generating a Grid
Let's start with code to generate the vertex stream.
We'll need a vector< PosColVertex > to write vertices into every frame:
And we'll put code in Tutorial::update to generate some vertices -- in this case, a simple "x" shape for testing:
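Here's a sketch -- the exact positions and colors are arbitrary test values:

```cpp
//in Tutorial.hpp:
	std::vector< PosColVertex > lines_vertices;

//in Tutorial::update -- two crossing segments (four vertices, since the topology is a line *list*):
	lines_vertices.clear();
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x = -1.0f, .y = -1.0f, .z = 0.0f },
		.Color{ .r = 0xff, .g = 0xff, .b = 0xff, .a = 0xff },
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x =  1.0f, .y =  1.0f, .z = 0.0f },
		.Color{ .r = 0xff, .g = 0xff, .b = 0xff, .a = 0xff },
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x = -1.0f, .y =  1.0f, .z = 0.0f },
		.Color{ .r = 0x00, .g = 0xff, .b = 0x00, .a = 0xff },
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x =  1.0f, .y = -1.0f, .z = 0.0f },
		.Color{ .r = 0x00, .g = 0xff, .b = 0x00, .a = 0xff },
	});
```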
Now to actually get these lines into GPU memory.
Memory Wrangling
Vulkan has a nuanced way of talking about memory.
Not all memory in Vulkan is equivalent.
Instead, Vulkan gives you a way to ask for different types of memory with different features (and, potentially, allocated from different memory heaps).
When you've allocated memory through Vulkan you receive a VkDeviceMemory handle.
This is an opaque handle that is useless to both the CPU and the GPU unless you do one (or more) of three things to get a more useful view of the memory:
Map it into the CPU's address space with vkMapMemory. This will give you a pointer that you can use in your CPU code to do regular memory reads and writes on the memory. (This is not possible for all memory types.)
Bind it to a VkBuffer. This tags the memory with enough additional metadata to allow you to use it for various data-buffer-like purposes on the GPU (like copying data or using it as a source for a vertex stream). (This is, again, not possible for all memory types and buffer usages.)
Bind it to a VkImage. This allows "image-like" memory access on the GPU, e.g., sampling it as a texture in a shader, or writing to it as a framebuffer attachment.
Vulkan provides few guarantees about the actual memory layout (the mapping between pixels and addresses) of an image, and provides explicit support for changing that layout, in order to allow the use of dedicated special-purpose framebuffer and texture access units on the GPU.
(This is, again, not supported for all memory types.)
As such, memory allocation in Vulkan is generally a multi-step process.
First, you figure out what you want to do with the memory;
you use that to figure out how much device memory of what type you need;
you allocate that memory (ideally from a larger slab you've already allocated, since it's not efficient to ask the Vulkan driver to manage many small allocations);
and finally you bind and/or map it as needed.
In our case, we have some helper functions and structures to manage this process in Helpers.hpp, as you'll see in a moment.
(And, yes, you'll re-write these eventually.)
Actually talking about our code now
To get our vertices from a CPU-side heap-allocated array into GPU device memory we're going to do a pair of copies.
First, we'll have the CPU copy the vertices into a VkBuffer whose backing VkDeviceMemory is mapped into the CPU's address space;
then we'll have the GPU copy the memory into a VkBuffer in device memory that is suitable for using as a vertex attribute source.
We'll allocate these buffers per-workspace to avoid race conditions.
(Where, e.g., the CPU is computing vertices for the next frame into a buffer that the GPU hasn't finished copying out of for the previous frame.)
We'll write code to actually allocate these buffers later, but to start with let's make sure they get cleaned up when the application is finished by adding a call to Helpers::destroy_buffer in the per-Workspace part of Tutorial's destructor:
Now we'll add code in the render function to resize the lines buffers if needed.
To start with, we'll just compute how many bytes of buffer are needed and -- if the current allocation is too small -- a nice size to reallocate to:
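A sketch, assuming the helper's buffer wrapper exposes .handle and .size members:

```cpp
size_t needed_bytes = lines_vertices.size() * sizeof(lines_vertices[0]);
if (workspace.lines_vertices_src.handle == VK_NULL_HANDLE
 || workspace.lines_vertices_src.size < needed_bytes) {
	//round up to the next multiple of 4096 so small size changes don't trigger constant re-allocation:
	size_t new_bytes = ((needed_bytes + 4095) / 4096) * 4096;

	//...clean up old buffers and allocate new ones of new_bytes (next two steps)...
}
```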
Now clean-up code for the buffers if they are already allocated:
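Assuming Helpers::destroy_buffer takes ownership of the buffer wrapper (the same call used in the destructor above), the clean-up is short:

```cpp
if (workspace.lines_vertices_src.handle != VK_NULL_HANDLE) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices_src));
}
if (workspace.lines_vertices.handle != VK_NULL_HANDLE) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices));
}
```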
And, finally, the actual allocation.
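In sketch form -- the exact name and signature of the allocation helper (create_buffer here) and its Mapped/Unmapped option come from Helpers.hpp, so check yours:

```cpp
workspace.lines_vertices_src = rtg.helpers.create_buffer(
	new_bytes,
	VK_BUFFER_USAGE_TRANSFER_SRC_BIT, //the GPU will copy *from* this buffer
	VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, //mappable, and no explicit flushes needed
	Helpers::Mapped //map the memory into the CPU address space
);
workspace.lines_vertices = rtg.helpers.create_buffer(
	new_bytes,
	VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, //vertex source; the GPU will copy *to* this buffer
	VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //actually on the GPU
	Helpers::Unmapped //don't map (the name of this option is a guess)
);

std::cout << "Re-allocated lines buffers to " << new_bytes << " bytes." << std::endl;
```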
Notice that we're supplying the allocation helper with both a VkBufferUsageFlags to indicate what we will do with each buffer and a VkMemoryPropertyFlags to indicate properties of the memory to allocate it in.
We are allocating lines_vertices_src to use as a "staging buffer" -- the buffer that we copy a frame's lines data into using the CPU, before having the GPU transfer the data to the other buffer.
Therefore, we pass VK_BUFFER_USAGE_TRANSFER_SRC_BIT for buffer usage (we plan to have the GPU copy data from it);
we request memory that is both VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (the memory can be mapped from the CPU side) and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT (the memory doesn't require special flush operations to make host writes available) so that it will be easy for us to copy into the memory on the CPU side;
and we ask the allocation helper to map the memory (put it somewhere in the CPU address space) by passing Helpers::Mapped as the last parameter.
We are allocating lines_vertices to use as the GPU-side vertex buffer, and to receive a copy of the data held in the lines src buffer.
Therefore, we pass both VK_BUFFER_USAGE_VERTEX_BUFFER_BIT (use as a vertex buffer) and VK_BUFFER_USAGE_TRANSFER_DST_BIT (use as the target of a memory copy) for usage flags;
we request memory that is VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (actually on the GPU);
and we don't ask the allocation helper to map the memory.
When you compile and run this code, you'll notice that the message Re-allocated lines buffers to 4096 bytes. is printed twice.
Think about this for a moment to make sure you understand why this is not a bug.
Copying Vertices to the GPU
Now that our buffers are large enough, it's time to copy data to the GPU.
To do this we will first use the CPU to copy from the lines_vertices vector to the workspace.lines_vertices_src staging buffer;
and then record a command to have the GPU copy the data from the staging buffer to the workspace.lines_vertices buffer.
The CPU-side copy is easy since the staging buffer is mapped.
We can just use std::memcpy:
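Assuming the helper exposes the mapped pointer as allocation.data() (check Helpers.hpp), and with <cstring> included for std::memcpy:

```cpp
assert(workspace.lines_vertices_src.size >= needed_bytes);
std::memcpy(workspace.lines_vertices_src.allocation.data(), lines_vertices.data(), needed_bytes);
```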
The command to have the GPU do the host-to-GPU copy is also refreshingly straightforward:
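It's a single command plus a small structure describing the region to copy (workspace.command_buffer stands in for whatever command buffer the render function is recording into):

```cpp
VkBufferCopy copy_region{
	.srcOffset = 0,
	.dstOffset = 0,
	.size = needed_bytes,
};
vkCmdCopyBuffer(workspace.command_buffer, workspace.lines_vertices_src.handle, workspace.lines_vertices.handle, 1, &copy_region);
```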
Note that the srcOffset and dstOffset members of the VkBufferCopy are offsets into the buffers, not their allocations.
This, despite our aside about offsets earlier.
Making Sure The Copy Finishes
You might be thinking at this point that copying data to the GPU is remarkably uncomplicated for a Vulkan task.
And you're partly right.
Starting the copy is very straightforward!
Making sure the copy finishes before the GPU runs other commands that depend on it is more complicated.
Add this code before the render pass begins:
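Concretely, a global memory barrier from transfer writes to vertex-input reads:

```cpp
{ //make sure the copies complete before vertex data is read:
	VkMemoryBarrier memory_barrier{
		.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
		.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
		.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
	};

	vkCmdPipelineBarrier(workspace.command_buffer,
		VK_PIPELINE_STAGE_TRANSFER_BIT, //srcStageMask
		VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, //dstStageMask
		0, //dependencyFlags
		1, &memory_barrier, //memory barriers (count, data)
		0, nullptr, //buffer memory barriers (count, data)
		0, nullptr //image memory barriers (count, data)
	);
}
```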
The vkCmdPipelineBarrier command establishes a memory dependency: memory operations matching srcAccessMask, performed by pipeline stages matching srcStageMask in commands recorded before the barrier, must complete before memory operations matching dstAccessMask, performed by stages matching dstStageMask in commands recorded after the barrier.
In this case, this means that any memory writes done by transfer commands before the barrier (like the copy we just wrote!) must be visible to any memory reads in the vertex input stage of any pipelines run after the barrier (like the draw we're about to write!).
Drawing
We've got some lines uploaded to the GPU, but we haven't actually asked the GPU to do anything with them yet.
Let's fix that.
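A sketch of the drawing commands (recorded inside the render pass, after the background pipeline runs):

```cpp
{ //draw with the lines pipeline:
	vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, lines_pipeline.handle);

	{ //use lines_vertices (starting at byte 0) as vertex binding 0:
		std::array< VkBuffer, 1 > vertex_buffers{ workspace.lines_vertices.handle };
		std::array< VkDeviceSize, 1 > offsets{ 0 };
		vkCmdBindVertexBuffers(workspace.command_buffer, 0, uint32_t(vertex_buffers.size()), vertex_buffers.data(), offsets.data());
	}

	//draw all of the uploaded vertices:
	vkCmdDraw(workspace.command_buffer, uint32_t(lines_vertices.size()), 1, 0, 0);
}
```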
This code is very similar to the commands used to run the background pipeline.
Of course, we aren't sending any push constants;
and we have a vertex buffer, so we record the state command vkCmdBindVertexBuffers in order to bind it for use in the pipeline.
Finally, we record a draw command with the number of vertices we've uploaded.
Compiling and running the code, we get to see our "x" in pixels:
Of course this simple "x" isn't doing much except showing us that (-1,-1) is indeed the upper left of normalized device coordinate space.
Let's see if we can do a bit more and test that depth buffering is working:
Examining the output, we can see that the depth buffer appears to be working properly (the blue lines pass in front of the yellow lines in some places and behind them in others):
Let's Do 3D: a clip-from-local matrix
Drawing lines is fine but wouldn't it be cool if we could see them in 3D?
Yes. Yes, it would.
To see our lines in 3D we need to write down a viewing and perspective transform between the local coordinate system of the lines and clip space.
Conveniently, all the transforms we need can be represented and concatenated as linear functions on 4D homogeneous coordinates.
And linear functions on 4D coordinates can be tabulated as 4x4 matrices.
So let's write a quick 4x4 matrix math library.
We're going to write this for demonstration and learning purposes but -- in production -- you should probably move to glm, or at least write your own library making use of SIMD intrinsics (x64; arm).
We start by defining a mat4 as an array of 16 floats and a vec4 as an array of 4 floats:
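For example (in, say, a mat4.hpp -- the file name is up to you):

```cpp
#pragma once

#include <array>

using mat4 = std::array< float, 16 >; //column-major 4x4 matrix
using vec4 = std::array< float, 4 >;

static_assert(sizeof(mat4) == 16*4, "mat4 is exactly 16 32-bit floats.");
```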
We will think of our matrices as stored in column major order;
in other words, the elements will be stored with columns written contiguously.
This means that the element at row r and column c is stored in mat[c * 4 + r].
Let's start with applying the linear function tabulated in a matrix to a vector; i.e., doing a matrix-vector multiply:
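Using the indexing convention above:

```cpp
//apply the linear function tabulated in m to v:
inline vec4 operator*(mat4 const &m, vec4 const &v) {
	vec4 out;
	for (int r = 0; r < 4; ++r) {
		out[r] = m[0*4+r] * v[0] + m[1*4+r] * v[1] + m[2*4+r] * v[2] + m[3*4+r] * v[3];
	}
	return out;
}
```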
And we'll follow that up with the composition of two linear functions tabulated in matrices; i.e., matrix-matrix multiplication:
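Same convention, one more loop:

```cpp
//tabulate the composition "first apply b, then apply a":
inline mat4 operator*(mat4 const &a, mat4 const &b) {
	mat4 out;
	for (int c = 0; c < 4; ++c) {
		for (int r = 0; r < 4; ++r) {
			out[c*4+r] = a[0*4+r] * b[c*4+0] + a[1*4+r] * b[c*4+1]
			           + a[2*4+r] * b[c*4+2] + a[3*4+r] * b[c*4+3];
		}
	}
	return out;
}
```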
Now let's write a function to compute a perspective matrix:
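Here is one such function, written to pass the sanity checks below (y flipped for Vulkan's y-down clip space, depth mapped to [0,1]):

```cpp
#include <cmath>

//perspective projection for a camera looking down -z with +x right and +y up;
// maps z = -n (near) to depth 0 and z = -f (far) to depth 1:
inline mat4 perspective(float vfov, float aspect, float n, float f) {
	const float scale = 1.0f / std::tan(0.5f * vfov);
	mat4 m{}; //zero-initialized
	m[0*4+0] = scale / aspect;     //x_clip = scale/aspect * x
	m[1*4+1] = -scale;             //y_clip = -scale * y (Vulkan clip space has y pointing down)
	m[2*4+2] = -f / (f - n);       //these two entries make z_clip/w_clip come out...
	m[3*4+2] = -(n * f) / (f - n); // ...to 0 at z=-n and 1 at z=-f
	m[2*4+3] = -1.0f;              //w_clip = -z
	return m;
}
```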
Note that -- as per convention since time immemorial -- this does perspective projection for a camera looking down the \( -z \) axis with \( +x \) right and \( +y \) up.
One can do some quick sanity checks by substituting in \( z = -n \) and verifying that the output \( z \) coordinate (after homogeneous divide) is \( 0 \);
and, similarly, that points with \( z = -f \) map to results with \( z = 1 \).
Further, setting the vertical fov to \( \pi/2 \) radians and the near plane distance to 1 unit, we know that \( (\pm 1, \pm 1, -1) \) should map to \( (\pm 1, \mp 1, 0) \) after transformation and homogeneous divide.
Now we've got a way to look at things through a perspective camera, but we don't have a way to move the camera.
So let's write a function to compute a "look at" matrix.
If you want a good check on your vector math intuition, just read the comments and try to write the code from those alone.
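Here's a commented sketch to check yourself against (plain float parameters to keep it dependency-free; eye is the camera position, target the point looked at, up the approximate up direction):

```cpp
//world-to-camera transform for a camera at (eye_*), looking toward (target_*):
inline mat4 look_at(
	float eye_x, float eye_y, float eye_z,
	float target_x, float target_y, float target_z,
	float up_x, float up_y, float up_z) {

	//normalized direction from eye toward target ("forward"):
	float fx = target_x - eye_x, fy = target_y - eye_y, fz = target_z - eye_z;
	float inv = 1.0f / std::sqrt(fx*fx + fy*fy + fz*fz);
	fx *= inv; fy *= inv; fz *= inv;

	//normalized "right" direction (forward cross up):
	float rx = fy*up_z - fz*up_y, ry = fz*up_x - fx*up_z, rz = fx*up_y - fy*up_x;
	inv = 1.0f / std::sqrt(rx*rx + ry*ry + rz*rz);
	rx *= inv; ry *= inv; rz *= inv;

	//camera-space "up" (right cross forward; already unit length):
	float ux = ry*fz - rz*fy, uy = rz*fx - rx*fz, uz = rx*fy - ry*fx;

	//rotation rows are (right, up, -forward); translation brings eye to the origin:
	mat4 m{};
	m[0*4+0] =  rx; m[1*4+0] =  ry; m[2*4+0] =  rz; m[3*4+0] = -(rx*eye_x + ry*eye_y + rz*eye_z);
	m[0*4+1] =  ux; m[1*4+1] =  uy; m[2*4+1] =  uz; m[3*4+1] = -(ux*eye_x + uy*eye_y + uz*eye_z);
	m[0*4+2] = -fx; m[1*4+2] = -fy; m[2*4+2] = -fz; m[3*4+2] =  (fx*eye_x + fy*eye_y + fz*eye_z);
	m[3*4+3] = 1.0f;
	return m;
}
```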
A Rotating Camera
Let's compute a matrix through which to view our grid of lines.
We'll call it CLIP_FROM_WORLD to indicate it will be used to transform between world space and clip space.
We'll compute CLIP_FROM_WORLD in update using the functions we've already written.
And, just to make sure everything is working, let's transform our lines vertices by the CLIP_FROM_WORLD linear function on the CPU:
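A sketch of that update code -- the orbit parameters, field of view, and the rtg.swapchain_extent aspect-ratio source are all assumptions here:

```cpp
//in Tutorial.hpp:
	mat4 CLIP_FROM_WORLD;

//in Tutorial::update (time is an accumulator we advance by dt every frame):
	float ang = 2.0f * 3.14159265358979f * (time / 10.0f); //one orbit every ten seconds

	CLIP_FROM_WORLD = perspective(
		60.0f * 3.14159265358979f / 180.0f, //vfov
		rtg.swapchain_extent.width / float(rtg.swapchain_extent.height), //aspect
		0.1f, //near
		1000.0f //far
	) * look_at(
		5.0f * std::cos(ang), 5.0f * std::sin(ang), 2.0f, //eye
		0.0f, 0.0f, 0.0f, //target
		0.0f, 0.0f, 1.0f //up
	);

	//TEMPORARY: transform the vertices on the CPU (we'll move this to the GPU shortly):
	for (PosColVertex &v : lines_vertices) {
		vec4 res = CLIP_FROM_WORLD * vec4{ v.Position.x, v.Position.y, v.Position.z, 1.0f };
		v.Position.x = res[0] / res[3];
		v.Position.y = res[1] / res[3];
		v.Position.z = res[2] / res[3];
	}
```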
Compile and run and you should have a new perspective on your lines:
Moving the matrix to the GPU: Descriptors
GPUs are -- more-or-less -- built to multiply matrices and rasterize triangles.
And, at present, we're all out of triangles.
So let's transfer our CLIP_FROM_WORLD matrix to the GPU so it can do the multiplication work.
To start with, let's stop doing the multiplication on the CPU:
We could use push constants -- our matrix is only 64 bytes and we've got at least 128 bytes of push constants available -- but for educational purposes let's use a different way of giving our shader access to the matrix: a uniform block.
Shader programs can access GPU memory through different pieces of special- and general-purpose hardware.
In shader code, you select the memory access path for each piece of global data by specifying its type (and, sometimes, with layout decorators).
For example, a uniform sampler2D TEX will read data through a texture unit with filtering and interpolation;
while a buffer Particles { vec4 PARTICLES[100]; } provides read/write access to main GPU memory through a more conventional cache;
and a uniform Camera { mat4 CLIP_FROM_LOCAL; } will provide fast read-only access to data that is copied into core-local scratchpad memory.
When using Vulkan to run a shader program, you provide data for each global data access location in the shader via a descriptor.
A descriptor is a pointer to a resource in GPU memory.
Descriptors are typed: the type of the descriptor specifies how the shader can use the resource it points to.
For example, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE allows a shader program to read and write pixels of a VkImage (well, through a VkImageView) whereas VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER allows a shader program to use that same image (again through a view) as a sampled texture.
Further, the type of a descriptor must correspond to the type of the global resources in the shader that it provides (though the correspondence is not exactly 1-1.)
Descriptors are stored in sets (handle type: VkDescriptorSet) to provide a mechanism to coordinate swapping out many descriptors at the same time.
Descriptor "set" is somewhat of a misnomer though -- these sets are actually ordered lists of bindings where each binding is an array of descriptors of the same type.
Shaders indicate the descriptor supplying a global resource by using a layout decorator to assign it a descriptor set index and a binding index within that set -- e.g., layout(set=2,binding=1) uniform sampler2D textures[2] will connect to an array of two descriptors stored in set two's binding one.
When recording a command buffer, your code binds descriptor sets to specific set indices in order to switch out blocks of global resources.
Dividing descriptors into sets allows your code to only re-bind descriptors that it needs to change between draw calls, while leaving other descriptors unchanged.
This is an optimization because the process of binding a descriptor set may involve, e.g., the GPU needing to reconfigure a texture unit, or copy data from main GPU memory into local scratchpad memory.
Let's make our vertex shader expect a uniform buffer descriptor which will supply our camera information, and use the matrix to transform the input position.
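That is, the shader gains a uniform block in set 0, binding 0:

```glsl
#version 450

layout(set=0, binding=0, std140) uniform Camera {
	mat4 CLIP_FROM_WORLD;
};

layout(location=0) in vec3 Position;
layout(location=1) in vec4 Color;

layout(location=0) out vec4 color;

void main() {
	gl_Position = CLIP_FROM_WORLD * vec4(Position, 1.0);
	color = Color;
}
```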
Note that the std140 in the layout decorator indicates how the data will be organized in memory.
The OpenGL Specification, section 7.6.2.2 gives the layout algorithm for std140 interface blocks as well as std430 blocks (which are mostly the same but pack data in arrays more tightly).
In this case, the layout algorithm says that our mat4 CLIP_FROM_WORLD will be stored at offset zero and without any padding.
If you compile and run the code now, the validation layer will complain loudly that you are trying to use a pipeline with a shader whose descriptor set layout doesn't match the layout of the pipeline.
And, indeed, this is the case.
When we created our pipeline layout, we said it didn't include any descriptors; but our shader says there must be a single descriptor in set zero at binding zero that has type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER.
So we need to create an appropriate descriptor set layout (handle type: VkDescriptorSetLayout) and add it to our pipeline layout.
We'll want the descriptor set layout handle later for creating descriptor sets as well, so we'll add it to the LinesPipeline structure:
And we can add code to Tutorial-LinesPipeline.cpp to create, supply to the pipeline layout, and destroy the descriptor set layout:
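A sketch of all three pieces (set0_Camera being the name we'll assume for the new VkDescriptorSetLayout member):

```cpp
//in LinesPipeline::create, *before* creating the pipeline layout:
{ //create the descriptor set layout for set 0 (the camera uniform):
	std::array< VkDescriptorSetLayoutBinding, 1 > bindings{
		VkDescriptorSetLayoutBinding{
			.binding = 0,
			.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
			.descriptorCount = 1,
			.stageFlags = VK_SHADER_STAGE_VERTEX_BIT, //only the vertex shader reads it
		},
	};

	VkDescriptorSetLayoutCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
		.bindingCount = uint32_t(bindings.size()),
		.pBindings = bindings.data(),
	};

	VK( vkCreateDescriptorSetLayout(rtg.device, &create_info, nullptr, &set0_Camera) );
}

{ //the pipeline layout now references the descriptor set layout:
	std::array< VkDescriptorSetLayout, 1 > layouts{ set0_Camera };

	VkPipelineLayoutCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
		.setLayoutCount = uint32_t(layouts.size()),
		.pSetLayouts = layouts.data(),
		.pushConstantRangeCount = 0,
		.pPushConstantRanges = nullptr,
	};

	VK( vkCreatePipelineLayout(rtg.device, &create_info, nullptr, &layout) );
}

//in LinesPipeline::destroy:
	if (set0_Camera != VK_NULL_HANDLE) {
		vkDestroyDescriptorSetLayout(rtg.device, set0_Camera, nullptr);
		set0_Camera = VK_NULL_HANDLE;
	}
```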
The code here is pretty self-explanatory. We're making a DSL with a single binding that has descriptorCount = 1 descriptors of type descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER which can be accessed only in the vertex shader.
One quirk here is that the order of bindings in VkDescriptorSetLayoutCreateInfo::pBindings does not mean anything -- instead each VkDescriptorSetLayoutBinding specifies which binding index it is filling in with its .binding field.
The same is NOT true for the order of DSLs in VkPipelineLayoutCreateInfo::pSetLayouts -- the n-th element of that array is the layout that the pipeline will expect in its n-th set index.
Now that we've done the work of properly updating the type information for our pipeline, we can get to the business of actually creating descriptor sets and (finally) pointing the descriptors in them to the data we want the shader to access.
We'll be uploading our CLIP_FROM_WORLD matrix to the GPU every frame, so we'll need a buffer and a staging buffer for it in each workspace; further, we'll need a descriptor set that points to the buffer.
You would think that we could just create a VkDescriptorSet (the handle for a descriptor set) immediately using our VkDescriptorSetLayout but... no.
In Vulkan, descriptor sets are allocated from a pool (handle type VkDescriptorPool).
The idea is that -- for some descriptors sets -- you might want to create them transiently as your code draws, and then free them all at once (which you can do by resetting the pool).
In our case, we don't need to reconfigure our descriptor sets, so we'll allocate them once from a pool and free the pool at the end of the program.
Since we want these VkDescriptorPools to be made per-workspace, we're going to make them members of Tutorial::Workspace:
Now let's write create and destroy code for everything, starting with the descriptor set pool:
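A sketch of a per-workspace pool sized for exactly what we'll allocate from it (member name descriptor_pool assumed):

```cpp
//in the per-workspace setup in Tutorial's constructor:
{ //create the descriptor pool:
	std::array< VkDescriptorPoolSize, 1 > pool_sizes{
		VkDescriptorPoolSize{
			.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
			.descriptorCount = 1, //one uniform buffer descriptor, total
		},
	};

	VkDescriptorPoolCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
		.flags = 0, //no FREE_DESCRIPTOR_SET_BIT: we never free individual sets, just the whole pool
		.maxSets = 1, //one descriptor set (the camera set)
		.poolSizeCount = uint32_t(pool_sizes.size()),
		.pPoolSizes = pool_sizes.data(),
	};

	VK( vkCreateDescriptorPool(rtg.device, &create_info, nullptr, &workspace.descriptor_pool) );
}

//in the per-workspace clean-up in Tutorial's destructor (this also frees sets allocated from the pool):
	if (workspace.descriptor_pool != VK_NULL_HANDLE) {
		vkDestroyDescriptorPool(rtg.device, workspace.descriptor_pool, nullptr);
		workspace.descriptor_pool = VK_NULL_HANDLE;
	}
```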
Notice that you have to pre-size the pool.
You have to specify both the maximum number of sets you can allocate as well as the number of sets of each descriptor type you can allocate.
We're allocating exactly one set of exactly one descriptor from the pool per workspace, so it's not so hard for us to count.
Now let's handle the per-workspace resources.
The creation and destruction of the buffer and staging buffer for the camera descriptor are similar to those for the line vertices, but we don't need to dynamically resize them, and we set the usage flags differently (VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT instead of VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, for reasons that are probably self-evident).
To allocate the descriptor set, we just need to specify the layout.
We don't need to free the descriptor set -- just free the pool that it was allocated from.
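Allocation, then, looks like this (Camera_descriptors being our assumed name for the per-workspace VkDescriptorSet member):

```cpp
{ //allocate the camera descriptor set from the workspace's pool:
	VkDescriptorSetAllocateInfo alloc_info{
		.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
		.descriptorPool = workspace.descriptor_pool,
		.descriptorSetCount = 1,
		.pSetLayouts = &lines_pipeline.set0_Camera,
	};

	VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &workspace.Camera_descriptors) );
}
```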
Finally, we need to write a descriptor for (reference to) the workspace.Camera buffer into the descriptor set:
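This is done with vkUpdateDescriptorSets -- no command buffer involved; the write happens immediately:

```cpp
{ //point the descriptor in Camera_descriptors at the Camera buffer:
	VkDescriptorBufferInfo Camera_info{
		.buffer = workspace.Camera.handle,
		.offset = 0,
		.range = workspace.Camera.size,
	};

	std::array< VkWriteDescriptorSet, 1 > writes{
		VkWriteDescriptorSet{
			.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
			.dstSet = workspace.Camera_descriptors,
			.dstBinding = 0,
			.dstArrayElement = 0,
			.descriptorCount = 1,
			.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
			.pBufferInfo = &Camera_info,
		},
	};

	vkUpdateDescriptorSets(
		rtg.device,
		uint32_t(writes.size()), writes.data(), //descriptor writes (count, data)
		0, nullptr //descriptor copies (count, data)
	);
}
```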
Using The Descriptor Sets
All of our various buffers and pointers are ready, so let's put them into service in the render function.
To start with, let's actually bind them during drawing. This will get rid of the validation layer's complaints.
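Alongside the other lines-pipeline state commands (right after vkCmdBindPipeline), something like:

```cpp
{ //bind the camera descriptor set (set 0):
	std::array< VkDescriptorSet, 1 > descriptor_sets{
		workspace.Camera_descriptors, //set 0: Camera
	};
	vkCmdBindDescriptorSets(
		workspace.command_buffer, //command buffer
		VK_PIPELINE_BIND_POINT_GRAPHICS, //bind point
		lines_pipeline.layout, //pipeline layout
		0, //first set index
		uint32_t(descriptor_sets.size()), descriptor_sets.data(), //descriptor sets (count, data)
		0, nullptr //dynamic offsets (count, data)
	);
}
```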
Notice that the vkCmdBindDescriptorSets function allows you to bind several contiguous sets, and that you can select where that binding starts.
This means that you can (e.g.) have a set 0 that gets bound once per frame and stays bound, and then have sets 1 and 2 that you bind new descriptors to per draw call.
Also -- and this is kinda wild -- you can actually re-bind pipelines and keep the same descriptor set N bound, as long as the layouts of sets 0 .. N match between the pipelines.
So you can have some global sets that stay the same even between, e.g., materials.
If you build and run the code now, those ugly Vulkan validation layer warnings will be gone.
But you probably won't see any lines.
That's because we aren't uploading the matrix data yet.
To do that, we need to set up copies, just below where we copy the lines vertices:
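These mirror the lines-vertices copies (again assuming the helper's allocation.data() mapped pointer):

```cpp
{ //upload the camera matrix:
	//host-side copy into Camera_src (our mat4 already matches the std140 layout of the shader's block):
	std::memcpy(workspace.Camera_src.allocation.data(), &CLIP_FROM_WORLD, sizeof(CLIP_FROM_WORLD));

	//device-side copy Camera_src -> Camera:
	VkBufferCopy copy_region{
		.srcOffset = 0,
		.dstOffset = 0,
		.size = sizeof(CLIP_FROM_WORLD),
	};
	vkCmdCopyBuffer(workspace.command_buffer, workspace.Camera_src.handle, workspace.Camera.handle, 1, &copy_region);
}
```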
And just like that, we're back to rendering our lines (and saving a lot of floating point work on the CPU).