Swapchain Wrangling

The next step on our journey to implement (and thus understand) all refsol:: code is in Tutorial.cpp.

By my count we have seven calls to replace, with the complicated calls being ones that deal with constructing framebuffers and associated resources.

The First Warm-Up: Workspace Constructor/Destructor

We begin with the code that allocates and frees the command buffer used in the render function. We've already written this code in Helpers.cpp, and we can copy-paste that over here.

First, allocation (copied from Helpers::create with some renaming):

in Tutorial.cpp
//in Tutorial::Tutorial:
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		//REMOVED: refsol::Tutorial_constructor_workspace(rtg, command_pool, &workspace.command_buffer);
		{ //allocate command buffer:
			VkCommandBufferAllocateInfo alloc_info{
				.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
				.commandPool = command_pool,
				.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
				.commandBufferCount = 1,
			};
			VK( vkAllocateCommandBuffers(rtg.device, &alloc_info, &workspace.command_buffer) );
		}
		//...
	}

Then the free code (copied from Helpers::destroy with some renaming):

in Tutorial.cpp
//in Tutorial::~Tutorial:
	for (Workspace &workspace : workspaces) {
		//REMOVED: refsol::Tutorial_destructor_workspace(rtg, command_pool, &workspace.command_buffer);
		if (workspace.command_buffer != VK_NULL_HANDLE) {
			vkFreeCommandBuffers(rtg.device, command_pool, 1, &workspace.command_buffer);
			workspace.command_buffer = VK_NULL_HANDLE;
		}
		//...
	}

At this point the code should compile and run with no warnings or other problems. Great -- two of seven calls done.

Second Warm-Up: Render Submit

The second warm-up is refsol::Tutorial_render_submit in our render function. We've seen vkQueueSubmit before, but this call is a bit different: it needs to wait on and signal semaphores as well as signal a fence.

in Tutorial.cpp
//in Tutorial::render:

	{ //submit `workspace.command_buffer` for the GPU to run:
		//REMOVED: refsol::Tutorial_render_submit(rtg, render_params, workspace.command_buffer);
		std::array< VkSemaphore, 1 > wait_semaphores{
			render_params.image_available
		};
		std::array< VkPipelineStageFlags, 1 > wait_stages{
			VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
		};
		static_assert(wait_semaphores.size() == wait_stages.size(), "every semaphore needs a stage");

		std::array< VkSemaphore, 1 > signal_semaphores{
			render_params.image_done
		};
		VkSubmitInfo submit_info{
			.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
			.waitSemaphoreCount = uint32_t(wait_semaphores.size()),
			.pWaitSemaphores = wait_semaphores.data(),
			.pWaitDstStageMask = wait_stages.data(),
			.commandBufferCount = 1,
			.pCommandBuffers = &workspace.command_buffer,
			.signalSemaphoreCount = uint32_t(signal_semaphores.size()),
			.pSignalSemaphores = signal_semaphores.data(),
		};

		VK( vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, render_params.workspace_available) );
	}
}

What's going on with all these signals?

First, notice that there are two types of synchronization primitives at play here. The VkSemaphores are used to synchronize work between on-GPU workloads. Semaphores start in the un-signalled state and the work that signals them must be submitted before the work that waits on them; in other words: a GPU that only runs work in order must not deadlock. (Note that Vulkan has two types of semaphores -- binary semaphores, which are GPU-only; and timeline semaphores, which can also be used for GPU-CPU synchronization. We're talking about binary semaphores here.)

The VkFence is used to synchronize between GPU and CPU workloads. Fences start in whatever state you'd like, and -- for similar but probably more obvious reasons -- you shouldn't wait on a fence from a CPU thread before that thread submits the work that will signal that fence (though you could submit work to signal the fence from another CPU thread).
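
The submit-before-wait rule for binary semaphores can be illustrated with a toy model. This is a sketch, not Vulkan code: `Submission`, `runs_in_order`, and the integer semaphore ids are hypothetical stand-ins. It walks a list of submissions strictly in order and reports whether every wait finds its semaphore already signalled by earlier work.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model only (not the Vulkan API): a "semaphore" is just an integer id.
struct Submission {
	std::vector< int > wait;   //semaphores this work waits on before running
	std::vector< int > signal; //semaphores this work signals when it finishes
};

// Returns true if a device that runs submissions strictly in order never stalls,
// i.e., every wait is preceded by a submission that signals the same semaphore.
bool runs_in_order(std::vector< Submission > const &queue) {
	std::set< int > signalled; //binary semaphores start un-signalled
	for (Submission const &work : queue) {
		for (int s : work.wait) {
			//a successful wait also resets the semaphore to un-signalled:
			if (signalled.erase(s) == 0) return false; //no earlier work signals s
		}
		for (int s : work.signal) {
			signalled.insert(s);
		}
	}
	return true;
}
```

In this model, a queue of "signal A", then "wait A, signal B", then "wait B" passes the check, while swapping the first two submissions fails it.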

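Now that we understand the types of the synchronization primitives, here are their purposes, as they can be read off the submit call above. The render_params.image_available semaphore is waited on (at the color attachment output stage) so that rendering doesn't write to the swapchain image before it is ready. The render_params.image_done semaphore is signalled when the rendering work finishes, so that the work that presents the image can wait on it. The render_params.workspace_available fence is signalled when the rendering work finishes, so that the CPU can tell when the workspace (and its command buffer) is safe to reuse.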

We'll see the Vulkan Window System Interface (WSI) code that handles submitting work to signal/wait on the semaphores, along with the RTG:: code that waits on the fence, in the next step.

For now, that's another reference solution function eliminated. The code should, again, compile and work without any problems here.

The Render Pass

I believe we already talked about the purpose of render passes back in Step 1. As a brief reminder, render passes tell the GPU how to sequence data to/from various images during rendering (and framebuffers actually provide references to those images). As you can now appreciate, this is very analogous to how pipeline layouts (and associated descriptor set layouts) describe the layout of global resources accessed during shader execution (while descriptor sets actually provide references to those resources).

So let's go ahead and write the constructor code that creates the render pass (and the command pool, which got rolled into the same refsol function not because it was related but because it made things more compact).

We'll actually begin by selecting the format for the depth buffer that will be used in the render pass:

in Tutorial.cpp
//in Tutorial::Tutorial:
	//REMOVED: refsol::Tutorial_constructor(rtg, &depth_format, &render_pass, &command_pool);
	//select a depth format:
	//  (at least one of these two must be supported, according to the spec; but neither is individually required)
	depth_format = rtg.helpers.find_image_format(
		{ VK_FORMAT_D32_SFLOAT, VK_FORMAT_X8_D24_UNORM_PACK32 },
		VK_IMAGE_TILING_OPTIMAL,
		VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT
	);

	//TODO: create render pass

	//TODO: create command pool

Recall that we wrote the image format helper last step. This call checks which of two possible image formats -- VK_FORMAT_D32_SFLOAT or VK_FORMAT_X8_D24_UNORM_PACK32 -- can be used as a depth format on the current GPU. This is because the specification's required format support tables don't guarantee that both of these are supported, but do guarantee that at least one of them is. (Also, since our function returns the first matching format, the order here indicates that we have a preference for the floating point depth buffer over the 24-bit fixed-point depth buffer.)
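
The "first match wins" behavior is what turns the candidate order into a preference order. Here is a minimal sketch of that selection logic, assuming nothing about the real helper beyond what is described above: `Format`, `first_supported`, and the `is_supported` callback are hypothetical names, and the enum stands in for VkFormat so the example compiles without Vulkan headers.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for VkFormat (not the Vulkan enum).
enum class Format { D32_SFLOAT, X8_D24_UNORM_PACK32 };

// Return the first candidate accepted by `is_supported`.
// Candidates are listed in order of preference, so earlier entries win ties.
Format first_supported(
	std::vector< Format > const &candidates,
	std::function< bool(Format) > const &is_supported
) {
	for (Format format : candidates) {
		if (is_supported(format)) return format;
	}
	throw std::runtime_error("No supported format matches request.");
}
```

On a device that supports both formats this returns D32_SFLOAT; on one that supports only the 24-bit format it falls through to X8_D24_UNORM_PACK32.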

Now the code to actually create the render pass. This is quite verbose because render passes are describing some subtle image transformation and synchronization information. We'll start with the vkCreateRenderPass call itself, which takes a list of attachments (descriptions of the images to be rendered to), a list of subpasses (descriptions of the steps in the rendering), and a list of dependencies (description of the synchronization required between the subpasses, as well as with external operations).

in Tutorial.cpp
//in Tutorial::Tutorial:
	{ //create render pass
		//TODO: attachments

		//TODO: subpass

		//TODO: dependencies

		VkRenderPassCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
			.attachmentCount = uint32_t(attachments.size()),
			.pAttachments = attachments.data(),
			.subpassCount = 1,
			.pSubpasses = &subpass,
			.dependencyCount = uint32_t(dependencies.size()),
			.pDependencies = dependencies.data(),
		};

		VK( vkCreateRenderPass(rtg.device, &create_info, nullptr, &render_pass) );
	}

For the list of attachments, we'll have both a color image and a depth image, with the formats determined by the output surface (we'll write the code that picks this next step) and the depth image finding we just wrote above, respectively.

in Tutorial.cpp
//in Tutorial::Tutorial:
	//TODO: attachments
	std::array< VkAttachmentDescription, 2 > attachments{
		VkAttachmentDescription{ //0 - color attachment:
			.format = rtg.surface_format.format,
			.samples = VK_SAMPLE_COUNT_1_BIT,
			.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
			.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
			.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
			.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
			.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
			.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
		},
		VkAttachmentDescription{ //1 - depth attachment:
			.format = depth_format,
			.samples = VK_SAMPLE_COUNT_1_BIT,
			.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
			.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
			.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
			.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
			.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
			.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
		},
	};

Notice that for each attachment we define the format, but also how to load the data (.loadOp) before rendering happens, what layout (.initialLayout) the image will be in when the render pass begins, how to write the data back after rendering (.storeOp), and what layout (.finalLayout) the image will be transitioned to after the store. (Using VK_IMAGE_LAYOUT_UNDEFINED as the initial layout says we don't care about the image's previous contents, which is fine here since we clear both attachments.)

These parameters are important because they allow, e.g., a GPU that has a special fast framebuffer memory to -- in our case -- not bother copying anything into this memory at the start of the render pass and to only copy the color buffer out of this memory at the end of the render pass, while discarding the depth buffer. (Though, admittedly, special fast framebuffer memory is -- I think -- more of a mobile GPU thing than a desktop GPU thing.)
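
To make the memory-traffic argument concrete, here is a toy model of an imagined GPU with fast framebuffer memory. It is only a sketch: `LoadOp`, `StoreOp`, `Attachment`, `Traffic`, and `count_traffic` are hypothetical names that mirror (but are not) the Vulkan types, and real hardware is more subtle than "one copy per attachment".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for VkAttachmentLoadOp / VkAttachmentStoreOp (toy model, not Vulkan).
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

struct Attachment {
	LoadOp load_op;
	StoreOp store_op;
};

struct Traffic {
	int copies_in = 0;  //attachments copied into fast memory at the start of the pass
	int copies_out = 0; //attachments copied back to main memory at the end of the pass
};

// Count the whole-image copies the imagined GPU would need for one render pass.
Traffic count_traffic(std::vector< Attachment > const &attachments) {
	Traffic traffic;
	for (Attachment const &attachment : attachments) {
		//only Load needs the old contents; Clear and DontCare can start from fast memory alone:
		if (attachment.load_op == LoadOp::Load) traffic.copies_in += 1;
		//only Store needs the results to outlive the pass:
		if (attachment.store_op == StoreOp::Store) traffic.copies_out += 1;
	}
	return traffic;
}
```

With the tutorial's settings (color: clear then store; depth: clear then don't-care) the model counts zero copies in and one copy out.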

Now the subpasses. Subpasses are parts of the rendering that can proceed (potentially) in parallel. They allow you to specify the memory dependencies of multi-pass rendering. For example, in a deferred renderer, you might have one subpass that renders position, normal, and material information out to several images, and a second subpass that reads these images and runs lighting and material computations. It may appear that these passes must proceed in series since the second depends on the first, but since the dependencies are all to the same pixels, a sufficiently clever GPU and driver could interleave the execution of the passes, working one "tile" of the framebuffer at a time, to avoid having to ever allocate a whole-screen-sized temporary buffer for the intermediate values.

For the tutorial, however, we only have one subpass:

in Tutorial.cpp
//in Tutorial::Tutorial:
	//TODO: subpass
	VkAttachmentReference color_attachment_ref{
		.attachment = 0,
		.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
	};

	VkAttachmentReference depth_attachment_ref{
		.attachment = 1,
		.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
	};

	VkSubpassDescription subpass{
		.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
		.inputAttachmentCount = 0,
		.pInputAttachments = nullptr,
		.colorAttachmentCount = 1,
		.pColorAttachments = &color_attachment_ref,
		.pDepthStencilAttachment = &depth_attachment_ref,
	};

Each subpass is equipped with a list of what attachments it reads from, writes to, and uses as a depth/stencil buffer.

Finally, the dependencies indicate how/when subpass computations can be overlapped, and how the computations for this renderpass can interleave with work that goes before or comes after it.

in Tutorial.cpp
//in Tutorial::Tutorial:
	//TODO: dependencies
	//this defers the image load actions for the attachments:
	std::array< VkSubpassDependency, 2 > dependencies {
		VkSubpassDependency{
			.srcSubpass = VK_SUBPASS_EXTERNAL,
			.dstSubpass = 0,
			.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
			.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
			.srcAccessMask = 0,
			.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
		},
		VkSubpassDependency{
			.srcSubpass = VK_SUBPASS_EXTERNAL,
			.dstSubpass = 0,
			.srcStageMask = VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT,
			.dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT,
			.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
			.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
		}
	};

These dependencies are actually surprisingly tricky to parse (and get right!), so I'll attempt to walk through them carefully. In each case, a memory dependency (which is a happens-before relationship plus a guarantee that the writes from before can be read by reads after) is created between operations that do certain types of accesses in certain pipeline stages in certain subpasses (or VK_SUBPASS_EXTERNAL, which just means "everything before" / "everything after" this render pass).

An important note is that the layout transitions between the initial layout, VkAttachmentDescription::initialLayout; per-subpass layout, VkAttachmentReference::layout; and final layout, VkAttachmentDescription::finalLayout for each attachment happen at times given by subpass dependencies. Recall how we used a VkImageMemoryBarrier to transition the layout of our textures during upload. This is similar.

Each subpass dependency that involves an image layout transition positions the image layout transition at the synchronization point of the dependency. So anything that happens before the dependency (that is, happens in the stages in srcStageMask or transitively before something in those stages) happens before the transition and anything that happens after the dependency (i.e., executes in the stages in dstStageMask or transitively after something in those stages) happens after the transition. For memory synchronization, any accesses in the source access scope (srcAccessMask) performed in the source stages resolve before the layout transition, and any accesses in the destination access scope (dstAccessMask) in the destination stages resolve after the layout transition.

With this in mind, the first dependency is saying "finish all work in the color attachment output stage, then do the layout transition, then start work in the color attachment output stage again". Further, it doesn't force any memory operations to complete before the transition, but it does make sure the layout transition [and load, since this is the first transition] is visible to operations that write to the image.

The second dependency in the list is the easier one to talk about. It establishes that all existing work finishes the late fragment tests stage (the last point in the pipeline that touches the depth buffer), then the layout transition for the depth image happens, and that transition must finish before subpass zero of this render pass can do operations in its early fragment tests stage (the earliest stage that touches the depth buffer). The memory dependencies ensure that earlier depth writes resolve before the layout transition (and that the layout transition finishes before the image's load operation writes to it to clear it).
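
The "in these stages or transitively before/after them" rule for stage masks can be sketched with a toy model. This is a deliberate simplification and not Vulkan code: it assumes a strictly linear pipeline with only five stages and a single stage per mask, and the names `Stage`, `must_finish_before`, and `must_wait_until_after` are hypothetical.

```cpp
#include <cassert>

// Toy model: a simplified, strictly linear graphics pipeline. Real Vulkan has more
// stages and uses bitmasks; these names mirror VK_PIPELINE_STAGE_* but are not the Vulkan enums.
enum Stage : int {
	VERTEX_SHADER,
	EARLY_FRAGMENT_TESTS,
	FRAGMENT_SHADER,
	LATE_FRAGMENT_TESTS,
	COLOR_ATTACHMENT_OUTPUT,
};

// Work submitted before the dependency must finish first if it runs in the
// source stage or in any stage that comes earlier in the pipeline.
bool must_finish_before(Stage op_stage, Stage src_stage) {
	return op_stage <= src_stage;
}

// Work submitted after the dependency must wait if it runs in the
// destination stage or in any stage that comes later in the pipeline.
bool must_wait_until_after(Stage op_stage, Stage dst_stage) {
	return op_stage >= dst_stage;
}
```

Applied to the depth dependency (source stage late fragment tests, destination stage early fragment tests), the model says earlier fragment shading must finish before the transition, while vertex shading in the new subpass is free to start before it.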

Cleaning Up

The clean-up code is a lot more straightforward:

in Tutorial.cpp
//in Tutorial::~Tutorial:
	//REMOVED: refsol::Tutorial_destructor(rtg, &render_pass, &command_pool);
	//TODO: destroy command pool

	if (render_pass != VK_NULL_HANDLE) {
		vkDestroyRenderPass(rtg.device, render_pass, nullptr);
		render_pass = VK_NULL_HANDLE;
	}

The Command Pool

In the last section, we left ourselves some TODOs relating to command pool creation and destruction. But we already wrote the code we need; we can just copy (with some minor renaming) from Helpers::create and Helpers::destroy:

in Tutorial.cpp
//in Tutorial::Tutorial:
	{ //create command pool
		VkCommandPoolCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
			.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
			.queueFamilyIndex = rtg.graphics_queue_family.value(),
		};
		VK( vkCreateCommandPool(rtg.device, &create_info, nullptr, &command_pool) );
	}

And the destruction code:

in Tutorial.cpp
//in Tutorial::~Tutorial:
	//TODO: destroy command pool
	if (command_pool != VK_NULL_HANDLE) {
		vkDestroyCommandPool(rtg.device, command_pool, nullptr);
		command_pool = VK_NULL_HANDLE;
	}

And now, once again, the code should build and run without any validation errors or warnings.

The Swapchain Views and Framebuffers

The list of all images that might be rendered to is called the swapchain. In our code, the RTG class is responsible for maintaining these images, but it is up to our Tutorial code to maintain any additional buffers, and to package the swapchain images together with these additional buffers into VkFramebuffers to point the render pass at.

Whenever our code needs to re-build these framebuffers (e.g., if the window is resized), the Tutorial::on_swapchain function is called by the RTG's run function. Let's first take a look at the RTG::SwapchainEvent structure it receives as a parameter:

in RTG.hpp
//parameters passed to Application::on_swapchain() when swapchain is [re]created:
// (these can also be accessed on the rtg directly but the package puts them in a convenient spot)
struct SwapchainEvent {
	VkExtent2D const &extent; //swapchain extent
	std::vector< VkImage > const &images; //swapchain images
	std::vector< VkImageView > const &image_views; //swapchain image views
};

Let's sketch out the basic function of the on_swapchain function:

in Tutorial.cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//[re]create framebuffers:
	//REMOVED: refsol::Tutorial_on_swapchain(rtg, swapchain, depth_format, render_pass, &swapchain_depth_image, &swapchain_depth_image_view, &swapchain_framebuffers);
	//TODO: clean up existing framebuffers

	//TODO: allocate depth image for framebuffers to share

	//TODO: create an image view of the depth image

	//TODO: create framebuffers pointing to each swapchain image view and the shared depth image view
}

This function's job is pretty simple: it re-makes the depth buffer at the correct size, gets a view of it (image views refer to parts of images, but in this case we're just taking the whole image), and then packages it up with each of the swapchain images to make a set of framebuffers -- one for each possible swapchain image.

We'll do the framebuffer clean-up by calling destroy_framebuffers, a function we are about to re-write:

in Tutorial.cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//clean up existing framebuffers (and depth image):
	if (swapchain_depth_image.handle != VK_NULL_HANDLE) {
		destroy_framebuffers();
	}

	//...
}

Depth image creation uses our image creation helper and the depth format we already determined during startup:

in Tutorial.cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//...

	//Allocate depth image for framebuffers to share:
	swapchain_depth_image = rtg.helpers.create_image(
		swapchain.extent,
		depth_format,
		VK_IMAGE_TILING_OPTIMAL,
		VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT,
		VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
		Helpers::Unmapped
	);

	//...
}

The depth image view references the entire depth image as a 2D texture with depth values:

in Tutorial.cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//...

	{ //create depth image view:
		VkImageViewCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
			.image = swapchain_depth_image.handle,
			.viewType = VK_IMAGE_VIEW_TYPE_2D,
			.format = depth_format,
			.subresourceRange{
				.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT,
				.baseMipLevel = 0,
				.levelCount = 1,
				.baseArrayLayer = 0,
				.layerCount = 1
			},
		};

		VK( vkCreateImageView(rtg.device, &create_info, nullptr, &swapchain_depth_image_view) );
	}

	//...
}

The actual VkFramebuffer creation is handled by a loop over the swapchain image views supplied in the parameter. (Notice the use of std::vector::assign as a handy way to set the size and contents of the swapchain_framebuffers container.)

in Tutorial.cpp
void Tutorial::on_swapchain(RTG &rtg_, RTG::SwapchainEvent const &swapchain) {
	//...

	//Make framebuffers for each swapchain image:
	swapchain_framebuffers.assign(swapchain.image_views.size(), VK_NULL_HANDLE);
	for (size_t i = 0; i < swapchain.image_views.size(); ++i) {
		std::array< VkImageView, 2 > attachments{
			swapchain.image_views[i],
			swapchain_depth_image_view,
		};
		VkFramebufferCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
			.renderPass = render_pass,
			.attachmentCount = uint32_t(attachments.size()),
			.pAttachments = attachments.data(),
			.width = swapchain.extent.width,
			.height = swapchain.extent.height,
			.layers = 1,
		};

		VK( vkCreateFramebuffer(rtg.device, &create_info, nullptr, &swapchain_framebuffers[i]) );
	}
}
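
As a small aside on std::vector::assign: it replaces whatever the vector held with the requested number of copies of a value, so stale handles from a previous swapchain can't linger. A minimal sketch, where `Handle` and `reset_handles` are hypothetical names and `void *` stands in for a Vulkan handle type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Handle = void *; //hypothetical stand-in for a Vulkan handle such as VkFramebuffer

// Resize `handles` to `count` entries and reset every entry to a null handle.
void reset_handles(std::vector< Handle > &handles, std::size_t count) {
	//assign discards the old contents, so every element ends up as nullptr:
	handles.assign(count, nullptr);
}
```

This differs from resize, which would keep any existing elements and only value-initialize the newly added ones.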

At this point the code should compile and run, though we really should get that clean-up function written too.

in Tutorial.cpp
void Tutorial::destroy_framebuffers() {
	//REMOVED: refsol::Tutorial_destroy_framebuffers(rtg, &swapchain_depth_image, &swapchain_depth_image_view, &swapchain_framebuffers);

	for (VkFramebuffer &framebuffer : swapchain_framebuffers) {
		assert(framebuffer != VK_NULL_HANDLE);
		vkDestroyFramebuffer(rtg.device, framebuffer, nullptr);
		framebuffer = VK_NULL_HANDLE;
	}
	swapchain_framebuffers.clear();

	assert(swapchain_depth_image_view != VK_NULL_HANDLE);
	vkDestroyImageView(rtg.device, swapchain_depth_image_view, nullptr);
	swapchain_depth_image_view = VK_NULL_HANDLE;

	rtg.helpers.destroy_image(std::move(swapchain_depth_image));

}

The theme of clean-up code being easier to write continues.
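
One detail in these clean-up paths is worth calling out: the framebuffer and image view handles are reset to VK_NULL_HANDLE after they are destroyed, and on_swapchain checks the depth image's handle against VK_NULL_HANDLE to decide whether there is anything to tear down. A minimal sketch of that destroy-then-null pattern, using a hypothetical `Resource` struct with an integer handle in place of real Vulkan objects:

```cpp
#include <cassert>

using Handle = int; //hypothetical handle type; 0 plays the role of VK_NULL_HANDLE
constexpr Handle NullHandle = 0;

struct Resource {
	Handle handle = NullHandle;
	int destroy_calls = 0; //counts real destructions, for illustration only

	void create() {
		handle = 42; //pretend allocation (42 is an arbitrary non-null value)
	}

	void destroy() {
		if (handle == NullHandle) return; //nothing allocated, so nothing to do
		destroy_calls += 1; //a real version would call a vkDestroy* function here
		handle = NullHandle; //reset so later checks see "not allocated"
	}
};
```

Because the handle is reset, calling destroy twice (or before create) does no harm in this sketch, which is the property the `!= VK_NULL_HANDLE` checks in the destructor rely on.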

Celebrate!

And that's it. We can remove the refsol header as a final check:

in Tutorial.cpp
//...

#include "Tutorial.hpp"
//REMOVED: #include "refsol.hpp"

//...

The code should continue to compile and run.

The only remaining refsol:: usage is in RTG.cpp, and that's what we'll tackle in the next step.