Swapchain Wrangling
The next step on our journey to implement (and thus understand) all refsol:: code is in Tutorial.cpp.
By my count we have seven calls to replace, with the complicated calls being ones that deal with constructing framebuffers and associated resources.
The First Warm-Up: Workspace Constructor/Destructor
We begin with the code that allocates and frees the command buffer used in the render function.
We've already written this code in Helpers.cpp, and we can copy-paste that over here.
First, allocation (copied from Helpers::create with some renaming):
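The tutorial's exact code isn't reproduced here, but the allocation looks roughly like the following sketch (member names such as workspace.command_buffer, and the VK(...) error-check macro, follow the tutorial's conventions; adapt them to your actual structure):

```cpp
//allocate one primary command buffer per workspace:
for (Workspace &workspace : workspaces) {
	VkCommandBufferAllocateInfo alloc_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
		.commandPool = command_pool,
		.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
		.commandBufferCount = 1,
	};
	VK(vkAllocateCommandBuffers(rtg.device, &alloc_info, &workspace.command_buffer));
}
```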
Then the free code (copied from Helpers::destroy with some renaming):
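A sketch of the matching clean-up (again assuming tutorial-style names):

```cpp
for (Workspace &workspace : workspaces) {
	if (workspace.command_buffer != VK_NULL_HANDLE) {
		//freeing returns the buffer's storage to the pool:
		vkFreeCommandBuffers(rtg.device, command_pool, 1, &workspace.command_buffer);
		workspace.command_buffer = VK_NULL_HANDLE;
	}
}
```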
At this point the code should compile and run with no warnings or other problems. Great -- two of seven calls done.
Second Warm-Up: Render Submit
The second warm-up is refsol::Tutorial_render_submit in our render function.
Now, we've seen vkQueueSubmit before, but this one is a bit different in that it needs to wait on and signal semaphores as well as signal a fence.
What's going on with all these signals?
First, notice that there are two types of synchronization primitives at play here.
The VkSemaphores are used to synchronize work between on-GPU workloads.
Semaphores start in the un-signalled state, and the work that signals them must be submitted before the work that waits on them; in other words: even a GPU that only runs work in submission order cannot deadlock.
(Note that Vulkan has two types of semaphores -- binary semaphores, which are GPU-only; and timeline semaphores, which can also be used for GPU-CPU synchronization. We're talking about binary semaphores here.)
The VkFence is used to synchronize between GPU and CPU workloads.
Fences start in whatever state you'd like, and -- for similar but probably more obvious reasons -- you shouldn't wait on a fence from a CPU thread before that thread submits the work that will signal that fence (though you could submit work to signal the fence from another CPU thread).
Now that we understand the types of the synchronization primitives, here are their purposes:

- The image_available semaphore will be signalled when the image is done being presented and is ready to render to. By the time the render function runs, the work that signals this semaphore has already been submitted by the window system interface layer. So our vkQueueSubmit indicates that no work in the queue should reach the point in the pipeline where the color buffer is written until after the semaphore is signalled. Notice that this means that, e.g., our copy operations can run before the image is ready to be drawn to. Nifty!
- The image_done semaphore should be signalled after the rendering work in this batch is done. The work that waits on this semaphore will be submitted by the window system interface layer after we finish the render call. Note that, if we wanted to, we could actually have extra GPU work that runs after the image is finished (e.g., processing some motion blur textures for next frame), and this extra work could run even after image_done was signalled. We'd need to submit this work with a separate vkQueueSubmit and set its semaphores appropriately.
- The workspace_available fence is used to make sure that nothing is still using members of the workspace (e.g., copying to/from various buffers) when it is recycled for use in a new render call. Our code uses two workspaces so that one can still be being read from by the GPU working on the current frame while the CPU is copying to the other one to prepare the next frame. Thus we pass workspace_available to vkQueueSubmit to be signalled when all the submitted work is done, which is exactly when the workspace is free to be re-used.
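Putting the three primitives together, the submit call looks roughly like this sketch (the render_params and workspace names are assumptions based on the discussion above; your members may be named differently):

```cpp
//wait on image_available, but only before writing color output:
VkSemaphore wait_semaphores[]{ render_params.image_available };
VkPipelineStageFlags wait_stages[]{ VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
//signal image_done when this batch's work finishes:
VkSemaphore signal_semaphores[]{ render_params.image_done };

VkSubmitInfo submit_info{
	.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
	.waitSemaphoreCount = 1,
	.pWaitSemaphores = wait_semaphores,
	.pWaitDstStageMask = wait_stages,
	.commandBufferCount = 1,
	.pCommandBuffers = &workspace.command_buffer,
	.signalSemaphoreCount = 1,
	.pSignalSemaphores = signal_semaphores,
};

//workspace_available is signalled when *all* of the submitted work is done:
VK(vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, render_params.workspace_available));
```

Note how the wait stage mask is what lets the copy operations mentioned above start before the image is actually available.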
We'll see the Vulkan Window System Interface (WSI) code that handles submitting work to signal/wait on the semaphores, along with the RTG:: code that waits on the fence, in the next step.
For now, that's another reference solution function eliminated. The code should, again, compile and work without any problems here.
The Render Pass
I believe we already talked about the purpose of render passes back in Step 1. As a brief reminder, render passes tell the GPU how to sequence data to/from various images during rendering (and framebuffers actually provide references to those images). As you can now appreciate, this is very analogous to how pipeline layouts (and associated descriptor set layouts) describe the layout of global resources accessed during shader execution (while descriptor sets actually provide references to those resources).
So let's go ahead and write the constructor code that creates the render pass (and the command pool, which got rolled into the same refsol function not because it was related but because it made things more compact).
We'll actually begin by selecting the format for the depth buffer that will be used in the render pass:
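A sketch of that selection, using the format helper from last step (the helper's exact name and signature here are assumptions; use whatever you actually wrote):

```cpp
//pick the first depth format the GPU supports, in preference order:
depth_format = rtg.helpers.find_image_format(
	{ VK_FORMAT_D32_SFLOAT, VK_FORMAT_X8_D24_UNORM_PACK32 },
	VK_IMAGE_TILING_OPTIMAL,
	VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT
);
```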
Recall that we wrote the image format helper last step.
This call checks which of two possible image formats -- VK_FORMAT_D32_SFLOAT or VK_FORMAT_X8_D24_UNORM_PACK32 -- can be used as a depth format on the current GPU.
This is because the specification's required format support tables don't guarantee that both of these are supported, but do guarantee that at least one of them is.
(Also, since our function returns the first matching format, the order here indicates that we have a preference for the floating point depth buffer over the 24-bit fixed-point depth buffer.)
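If the "first matching format wins" behavior feels abstract, here it is boiled down to standalone C++ (the numeric format values are stand-ins for the real VkFormat enums, and this find_format is a simplified model of the helper, not its actual implementation):

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>
#include <vector>

//stand-in "format" values (the real code uses VkFormat):
constexpr uint32_t D32_SFLOAT = 126;
constexpr uint32_t X8_D24_UNORM_PACK32 = 125;

//return the first candidate that the (simulated) device supports:
uint32_t find_format(std::vector<uint32_t> const &candidates, std::set<uint32_t> const &supported) {
	for (uint32_t f : candidates) {
		if (supported.count(f)) return f;
	}
	throw std::runtime_error("No supported format found.");
}
```

Because the candidates are listed in preference order, a device that supports both formats yields D32_SFLOAT, while a device that supports only the 24-bit format falls back to X8_D24_UNORM_PACK32.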
Now the code to actually create the render pass.
This is quite verbose because render passes describe some subtle image layout transition and synchronization information.
We'll start with the vkCreateRenderPass call itself, which takes a list of attachments (descriptions of the images to be rendered to), a list of subpasses (descriptions of the steps in the rendering), and a list of dependencies (descriptions of the synchronization required between the subpasses, as well as with external operations).
For the list of attachments, we'll have both a color image and a depth image, with the formats determined by the output surface (we'll write the code that picks this next step) and the depth format selection we just wrote above, respectively.
Notice that for each attachment we define the format, but also how to actually load the data (.loadOp) before rendering happens, what layout (.initialLayout) the image will be transitioned to before the load, how to write the data back after rendering (.storeOp), and what layout (.finalLayout) the image will be transitioned to after the store.
These parameters are important because they allow, e.g., a GPU that has a special fast framebuffer memory to -- in our case -- not bother copying anything into this memory at the start of the render pass and to only copy the color buffer out of this memory at the end of the render pass, while discarding the depth buffer. (Though, admittedly, special fast framebuffer memory is -- I think -- more of a mobile GPU thing than a desktop GPU thing.)
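A sketch of the two attachment descriptions, matching the load/store discussion above (the surface-format and depth-format member names are assumptions):

```cpp
std::array<VkAttachmentDescription, 2> attachments{
	VkAttachmentDescription{ //0: color attachment
		.format = rtg.surface_format.format,
		.samples = VK_SAMPLE_COUNT_1_BIT,
		.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR, //don't read old contents; clear instead
		.storeOp = VK_ATTACHMENT_STORE_OP_STORE, //keep the result (it gets presented)
		.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
		.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
		.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
		.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
	},
	VkAttachmentDescription{ //1: depth attachment
		.format = depth_format,
		.samples = VK_SAMPLE_COUNT_1_BIT,
		.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
		.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE, //depth isn't needed after the pass
		.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
		.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
		.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
		.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
	},
};
```

The color attachment's STORE and the depth attachment's DONT_CARE store ops are exactly the "copy the color buffer out, discard the depth buffer" behavior described above.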
Now the subpasses. Subpasses are parts of the rendering that can proceed (potentially) in parallel. They allow you to specify the memory dependencies of multi-pass rendering. For example, in a deferred renderer, you might have one subpass that renders position, normal, and material information out to several images, and a second subpass that reads these images and runs lighting and material computations. It may appear that these passes must proceed in series since the second depends on the first, but since the dependencies are all to the same pixels, a sufficiently clever GPU and driver could interleave the execution of the passes, working one "tile" of the framebuffer at a time, to avoid having to ever allocate a whole-screen-sized temporary buffer for the intermediate values.
For the tutorial, however, we only have one subpass:
Each subpass is equipped with a list of what attachments it reads from, writes to, and uses as a depth/stencil buffer.
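A sketch of our single subpass, which reads nothing, writes one color attachment, and uses one depth attachment (the attachment indices refer into the attachments list described above):

```cpp
VkAttachmentReference color_attachment_ref{
	.attachment = 0, //index into the render pass's attachments list
	.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
};
VkAttachmentReference depth_attachment_ref{
	.attachment = 1,
	.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
};
VkSubpassDescription subpass{
	.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
	.inputAttachmentCount = 0, //reads no attachments...
	.colorAttachmentCount = 1, //...writes one color attachment...
	.pColorAttachments = &color_attachment_ref,
	.pDepthStencilAttachment = &depth_attachment_ref, //...and uses one depth buffer
};
```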
Finally, the dependencies indicate how/when subpass computations can be overlapped, and how the computations for this renderpass can interleave with work that goes before or comes after it.
These dependencies are actually surprisingly tricky to parse (and get right!), so I'll attempt to walk through them carefully.
In each case, a memory dependency (which is a happens-before relationship plus a guarantee that writes from before can be read by reads after) is created between operations that perform certain types of accesses in certain pipeline stages in certain subpasses (or VK_SUBPASS_EXTERNAL, which just means "everything before" / "everything after" this render pass).
An important note is that the layout transitions between the initial layout (VkAttachmentDescription::initialLayout), per-subpass layout (VkAttachmentReference::layout), and final layout (VkAttachmentDescription::finalLayout) for each attachment happen at times given by subpass dependencies.
Recall how we used a VkImageMemoryBarrier to transition the layout of our textures during upload. This is similar.
Each subpass dependency that involves an image layout transition positions the image layout transition at the synchronization point of the dependency.
So anything that happens before the dependency (that is, happens in the stages in srcStageMask, or transitively before something in those stages) happens before the transition, and anything that happens after the dependency (i.e., executes in the stages in dstStageMask, or transitively after something in those stages) happens after the transition. For memory synchronization, any accesses in the source access scope (srcAccessMask) performed in the source stages resolve before the layout transition, and any accesses in the destination access scope (dstAccessMask) in the destination stages resolve after the layout transition.
With this in mind, the first dependency is saying "finish all work in the color attachment output stage, then do the layout transition, then start work in the color attachment output stage again". Further, it doesn't force any memory operations to complete before the transition, but it does make sure the layout transition [and load, since this is the first transition] is visible to operations that write to the image.
The easier dependency to talk about here is the second one in the list. It establishes that all existing work finishes the late fragment tests stage (the last point in the pipeline that touches the depth buffer), then the layout transition for the depth image happens, and that transition must finish before subpass zero of this render pass can do operations in its early fragment tests stage (the earliest stage that touches the depth buffer). The memory dependencies ensure that writes resolve before the layout transition (and that the layout transition finishes before the load operation writes to the image to clear it).
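Expressed in code, the two dependencies and the create call look roughly like this sketch (it assumes attachments and subpass variables built as described earlier in this section):

```cpp
std::array<VkSubpassDependency, 2> dependencies{
	VkSubpassDependency{ //color: transition sits between color-attachment-output work
		.srcSubpass = VK_SUBPASS_EXTERNAL,
		.dstSubpass = 0,
		.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
		.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
		.srcAccessMask = 0, //no memory accesses must resolve before the transition...
		.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT, //...but writes wait for it
	},
	VkSubpassDependency{ //depth: transition between late and early fragment tests
		.srcSubpass = VK_SUBPASS_EXTERNAL,
		.dstSubpass = 0,
		.srcStageMask = VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT,
		.dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT,
		.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
		.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
		               | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
	},
};

VkRenderPassCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
	.attachmentCount = uint32_t(attachments.size()),
	.pAttachments = attachments.data(),
	.subpassCount = 1,
	.pSubpasses = &subpass,
	.dependencyCount = uint32_t(dependencies.size()),
	.pDependencies = dependencies.data(),
};
VK(vkCreateRenderPass(rtg.device, &create_info, nullptr, &render_pass));
```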
Cleaning Up
The clean-up code is a lot more straightforward:
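Something along these lines in the destructor (guarding against double-destroy in the tutorial's usual style):

```cpp
if (render_pass != VK_NULL_HANDLE) {
	vkDestroyRenderPass(rtg.device, render_pass, nullptr);
	render_pass = VK_NULL_HANDLE;
}
```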
The Command Pool
In the last section, we left ourselves some TODOs relating to command pool creation and destruction. But we already wrote the code we need; we can just copy (with some minor renaming) from Helpers::create and Helpers::destroy:
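A sketch of the creation side (the queue family member name is an assumption based on the tutorial's RTG class):

```cpp
VkCommandPoolCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
	//we re-record command buffers every frame, so allow individual resets:
	.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
	.queueFamilyIndex = rtg.graphics_queue_family.value(),
};
VK(vkCreateCommandPool(rtg.device, &create_info, nullptr, &command_pool));
```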
And the destruction code:
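A sketch of the destruction side (destroying the pool also frees any command buffers still allocated from it):

```cpp
if (command_pool != VK_NULL_HANDLE) {
	vkDestroyCommandPool(rtg.device, command_pool, nullptr);
	command_pool = VK_NULL_HANDLE;
}
```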
And now, once again, the code should build and run without any validation errors or warnings.
The Swapchain Views and Framebuffers
The list of all images that might be rendered to is called the swapchain.
In our code, the RTG class is responsible for maintaining these images, but it is up to our Tutorial code to maintain any additional buffers, and to package the swapchain images together with these additional buffers into VkFramebuffers to point the render pass at.
Whenever our code needs to re-build these framebuffers (e.g., if the window is resized), the Tutorial::on_swapchain function is called by the RTG's run function.
Let's first take a look at the RTG::SwapchainEvent structure it receives as a parameter:
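(The actual declaration lives in RTG.hpp; the field names below are a guess based on how on_swapchain uses them, so check them against your copy of the code:)

```cpp
struct SwapchainEvent {
	VkExtent2D extent; //size of the swapchain images
	std::vector<VkImage> const &images; //the swapchain images themselves
	std::vector<VkImageView> const &image_views; //views of those images
};
```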
Let's sketch out the basic structure of the on_swapchain function:
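Something like the following skeleton, with TODOs for the steps the rest of this section fills in (the destroy_framebuffers helper name is an assumption):

```cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//clean up any existing framebuffers (and the depth image + its view):
	destroy_framebuffers();
	//TODO: allocate a depth image at swapchain.extent
	//TODO: create a view of the depth image
	//TODO: make one framebuffer per swapchain image view
}
```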
This function's job is pretty simple: it re-makes the depth buffer at the correct size, gets a view of it (image views refer to parts of images, but in this case we're just taking the whole image), and then packages it up with each of the swapchain images to make a set of framebuffers -- one for each possible swapchain image.
We'll handle the framebuffer clean-up by calling a function -- which we are about to re-write -- that does exactly that:
Depth image creation uses our image creation helper and the depth format we already determined during startup:
The depth image view references the entire depth image as a 2D texture with depth values:
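A sketch of that view creation (the swapchain_depth_image member names are assumptions based on the tutorial's image helper):

```cpp
VkImageViewCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
	.image = swapchain_depth_image.handle,
	.viewType = VK_IMAGE_VIEW_TYPE_2D,
	.format = depth_format,
	.subresourceRange{
		.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT, //depth values, not color
		.baseMipLevel = 0, .levelCount = 1, //the whole image:
		.baseArrayLayer = 0, .layerCount = 1,
	},
};
VK(vkCreateImageView(rtg.device, &create_info, nullptr, &swapchain_depth_image_view));
```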
The actual VkFramebuffer creation is handled by a loop over the swapchain image views supplied in the parameter. (Notice the use of std::vector::assign as a handy way to set the size and contents of the swapchain_framebuffers container.)
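That loop looks roughly like this sketch (attachment order must match the render pass: color at index 0, depth at index 1):

```cpp
//one framebuffer per swapchain image view:
swapchain_framebuffers.assign(swapchain.image_views.size(), VK_NULL_HANDLE);
for (size_t i = 0; i < swapchain.image_views.size(); ++i) {
	std::array<VkImageView, 2> attachments{
		swapchain.image_views[i], //color; attachment 0 in the render pass
		swapchain_depth_image_view, //depth; attachment 1
	};
	VkFramebufferCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
		.renderPass = render_pass,
		.attachmentCount = uint32_t(attachments.size()),
		.pAttachments = attachments.data(),
		.width = swapchain.extent.width,
		.height = swapchain.extent.height,
		.layers = 1,
	};
	VK(vkCreateFramebuffer(rtg.device, &create_info, nullptr, &swapchain_framebuffers[i]));
}
```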
At this point the code should compile and run, though we really should get that clean-up function written too.
Celebrate!
And that's it. We can remove the refsol header as a final check:
The code should continue to compile and run.
The only remaining refsol:: usage is in RTG.cpp, and that's what we'll tackle in the next step.