Rendering a Grid

Now that we have an interesting background layer, let's actually render something in the foreground. Nothing says classic rasterized graphics quite like a grid, so we'll render one of those.

This will require building a pipeline that renders lines, as well as handling the memory transfers required to get vertices for those lines to the GPU. We'll finish up by also sending a clip-from-local transformation matrix to our shader so we can fly a camera around our lines.

The Pipeline

We're going to base the lines pipeline on our existing background pipeline. Before we copy the background pipeline, let's replace the refsol:: code being used in the BackgroundPipeline::destroy() function:

in Tutorial-BackgroundPipeline.cpp
//...
#include "refsol.hpp"
//...

void Tutorial::BackgroundPipeline::destroy(RTG &rtg) {
	if (layout != VK_NULL_HANDLE) {
		vkDestroyPipelineLayout(rtg.device, layout, nullptr);
		layout = VK_NULL_HANDLE;
	}

	if (handle != VK_NULL_HANDLE) {
		vkDestroyPipeline(rtg.device, handle, nullptr);
		handle = VK_NULL_HANDLE;
	}
}

Build and run the code now and things should behave just as before (and with no complaints in the console from the validation layer about things not being properly destroyed).

Copying the Pipeline Declaration

Let's start by copying and modifying the Tutorial::BackgroundPipeline structure to make a new Tutorial::LinesPipeline structure. Put it just under the background pipeline declaration in Tutorial.hpp:

in Tutorial.hpp
	struct LinesPipeline {
		//no descriptor set layouts (yet)

		//no push constants

		VkPipelineLayout layout = VK_NULL_HANDLE;

		//no vertex bindings (yet)

		VkPipeline handle = VK_NULL_HANDLE;

		void create(RTG &, VkRenderPass render_pass, uint32_t subpass);
		void destroy(RTG &);
	} lines_pipeline;
Declaration for our new pipeline, complete with foreshadowing comments.

And go ahead and add calls to the create and destroy functions in Tutorial.cpp:

in Tutorial.cpp
//in Tutorial::Tutorial:

	background_pipeline.create(rtg, render_pass, 0);
	lines_pipeline.create(rtg, render_pass, 0);

//in Tutorial::~Tutorial:

	background_pipeline.destroy(rtg);
	lines_pipeline.destroy(rtg);

Compiling the source at this point should work, but linking should fail because of the missing function definitions.

Copying the Pipeline Definition

Edit the Maekfile.js to build our soon-to-be-created lines pipeline:

Copy Tutorial-BackgroundPipeline.cpp to Tutorial-LinesPipeline.cpp, and add it to the build:

in Maekfile.js
//uncomment to build lines shaders and pipeline:
const lines_shaders = [
	maek.GLSLC('lines.vert'),
	maek.GLSLC('lines.frag'),
];
main_objs.push( maek.CPP('Tutorial-LinesPipeline.cpp', undefined, { depends:[...lines_shaders] } ) );

Edit Tutorial-LinesPipeline.cpp to adapt it to our new purposes.

Load the correct shaders (hmm, we should write those soon):

in Tutorial-LinesPipeline.cpp
static uint32_t vert_code[] =
#include "spv/lines.vert.inl"
;

static uint32_t frag_code[] =
#include "spv/lines.frag.inl"
;

Change the create function's structure name:

in Tutorial-LinesPipeline.cpp
void Tutorial::LinesPipeline::create(RTG &rtg, VkRenderPass render_pass, uint32_t subpass) {

We're eventually going to use a descriptor set to pass data to this pipeline. For now, update the pipeline layout to not include any push constants:

in Tutorial-LinesPipeline.cpp
	{ //create pipeline layout:
		VkPipelineLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
			.setLayoutCount = 0,
			.pSetLayouts = nullptr,
			.pushConstantRangeCount = 0,
			.pPushConstantRanges = nullptr,
		};

		VK( vkCreatePipelineLayout(rtg.device, &create_info, nullptr, &layout) );
	}

Update the input assembly state to reflect the fact that the lines pipeline will draw lines:

in Tutorial-LinesPipeline.cpp
		//this pipeline will draw lines:
		VkPipelineInputAssemblyStateCreateInfo input_assembly_state{
			.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
			.topology = VK_PRIMITIVE_TOPOLOGY_LINE_LIST,
			.primitiveRestartEnable = VK_FALSE
		};

Enable the depth test:

in Tutorial-LinesPipeline.cpp
		//depth test will be less, and stencil test will be disabled:
		VkPipelineDepthStencilStateCreateInfo depth_stencil_state{
			.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
			.depthTestEnable = VK_TRUE,
			.depthWriteEnable = VK_TRUE,
			.depthCompareOp = VK_COMPARE_OP_LESS,
			.depthBoundsTestEnable = VK_FALSE,
			.stencilTestEnable = VK_FALSE,
		};

And, finally, remember to change the structure name for the destroy function:

in Tutorial-LinesPipeline.cpp
void Tutorial::LinesPipeline::destroy(RTG &rtg) {

The Shaders

Now the only thing standing between us and compiling the code is the lack of shader programs. So let's write those.

The lines pipeline vertex shader will copy the position supplied as a vertex attribute into the gl_Position output and pass the color supplied as a vertex attribute onward to the fragment shader:

in lines.vert (new file)
#version 450

layout(location=0) in vec3 Position;
layout(location=1) in vec4 Color;

layout(location=0) out vec4 color;

void main() {
	gl_Position = vec4(Position, 1.0);
	color = Color;
}

Recall from the background pipeline that layout(location=N) decorations are how varying values are matched up between the vertex and fragment shaders. The same mechanism applies to vertex shader inputs -- the location decorators are how our CPU-side code will assign streams of data to each of the vertex shader inputs.

By the way, in the shaders for this class, I'll tend to use Uppercase variables for attributes (vertex shader stream inputs), lowercase variables for varyings (vertex shader outputs / fragment shader inputs), and -- with some exceptions -- SHOUTYCASE variables for uniforms (global inputs). Most of the time the shaders will be so simple it won't particularly matter, but it's nice to have a convention for better at-a-glance understanding of code.

The fragment shader will write its input color to the output:

in lines.frag (new file)
#version 450

layout(location=0) in vec4 color;

layout(location=0) out vec4 outColor;

void main() {
	outColor = color;
}

Compiling and running the code now should result in the Vulkan validation layer warning in the console about missing vertex attribute descriptions in the vertex input state. This is because our vertex shader expects inputs, but our pipeline creation code hasn't provided any information about how to get those inputs.

A Vertex

A vertex can hold whatever data you want. In our background pipeline, it held nothing at all. For this pipeline -- as you've already seen in the shader code -- the vertex will have both position and color attributes.

Let's define a vertex structure. You'll generally be using the same vertex formats across different pipelines, so we're going to make a new header and C++ file for this vertex structure.

We'll begin with the layout itself -- a 3-vector of floating point numbers for the position and a 4-vector of 8-bit unsigned integers to store an RGBA color.

in PosColVertex.hpp (new file)
#pragma once

#include <vulkan/vulkan_core.h>

#include <cstdint>

struct PosColVertex {
	struct { float x,y,z; } Position;
	struct { uint8_t r,g,b,a; } Color;
};

static_assert(sizeof(PosColVertex) == 3*4 + 4*1, "PosColVertex is packed.");

The static_assert is here just to make sure that the structure's layout in memory is as we expect (no padding).
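
If you want an even stronger guarantee, you could also assert the member offsets (an optional addition; offsetof lives in <cstddef>, which you'd need to include):

static_assert(offsetof(PosColVertex, Position) == 0, "Position comes first.");
static_assert(offsetof(PosColVertex, Color) == 3*4, "Color immediately follows Position.");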

We'll also go ahead and associate a VkPipelineVertexInputStateCreateInfo structure with this vertex type to make it easier to instantiate a pipeline that uses a stream of PosColVertex as input:

in PosColVertex.hpp
struct PosColVertex {
	struct { float x,y,z; } Position;
	struct { uint8_t r,g,b,a; } Color;
	//a pipeline vertex input state that works with a buffer holding a PosColVertex[] array:
	static const VkPipelineVertexInputStateCreateInfo array_input_state;
};

static_assert(sizeof(PosColVertex) == 3*4 + 4*1, "PosColVertex is packed.");

Now make a PosColVertex.cpp so we can define array_input_state:

in PosColVertex.cpp (new file)
#include "PosColVertex.hpp"

#include <array>
#include <cstddef> //for offsetof

static std::array< VkVertexInputBindingDescription, 1 > bindings{
	VkVertexInputBindingDescription{
		.binding = 0,
		.stride = sizeof(PosColVertex),
		.inputRate = VK_VERTEX_INPUT_RATE_VERTEX,
	}
};

static std::array< VkVertexInputAttributeDescription, 2 > attributes{
	VkVertexInputAttributeDescription{
		.location = 0,
		.binding = 0,
		.format = VK_FORMAT_R32G32B32_SFLOAT,
		.offset = offsetof(PosColVertex, Position),
	},
	VkVertexInputAttributeDescription{
		.location = 1,
		.binding = 0,
		.format = VK_FORMAT_R8G8B8A8_UNORM,
		.offset = offsetof(PosColVertex, Color),
	},
};

const VkPipelineVertexInputStateCreateInfo PosColVertex::array_input_state{
	.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
	.vertexBindingDescriptionCount = uint32_t(bindings.size()),
	.pVertexBindingDescriptions = bindings.data(),
	.vertexAttributeDescriptionCount = uint32_t(attributes.size()),
	.pVertexAttributeDescriptions = attributes.data(),
};

This defines a pipeline vertex input that takes data from one binding (location in GPU memory). There are two attribute streams that are assembled from that binding: the stream at location 0 is made of 3-vectors of 32-bit floating point values; and the stream at location 1 is made of 4-vectors of 8-bit unsigned integer values, which will be treated as "normalized values" (values between 0.0 and 1.0) by dividing by 255.0 .
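
In other words, the conversion that the vertex-fetch hardware applies for VK_FORMAT_R8G8B8A8_UNORM is numerically equivalent to this (illustrative only, not part of the tutorial code):

#include <cstdint>

//each 8-bit channel becomes a float in [0,1]:
inline float unorm8_to_float(uint8_t v) { return v / 255.0f; }
//e.g.: 0x00 -> 0.0f, 0x80 -> 0.50196f, 0xff -> 1.0f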

Add the PosColVertex files to the Maekfile.js:

in Maekfile.js
const main_objs = [
	maek.CPP('Tutorial.cpp'),
	maek.CPP('PosColVertex.cpp'),
	maek.CPP('RTG.cpp'),
	maek.CPP('Helpers.cpp'),
	maek.CPP('main.cpp'),
];

Now is a good time to check that everything builds okay. Once you've sorted out any typos, let's tell our lines pipeline about the vertex format it will be using. We'll do this with a using (a type alias) in the LinesPipeline structure:

in Tutorial.hpp
#pragma once

#include "PosColVertex.hpp"

#include "RTG.hpp"

//...

	struct LinesPipeline {
		//no descriptor set layouts (yet)

		//no push constants

		VkPipelineLayout layout = VK_NULL_HANDLE;

		using Vertex = PosColVertex;
		
		VkPipeline handle = VK_NULL_HANDLE;

		void create(RTG &, VkRenderPass render_pass, uint32_t subpass);
		void destroy(RTG &);
	} lines_pipeline;

//...

And we'll update the lines pipeline creation code to use the input state structure we've conveniently already created:

in Tutorial-LinesPipeline.cpp
//...
		//all of the above structures get bundled together into one very large create_info:
		VkGraphicsPipelineCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
			.stageCount = uint32_t(stages.size()),
			.pStages = stages.data(),
			.pVertexInputState = &Vertex::array_input_state,
			.pInputAssemblyState = &input_assembly_state,
			.pViewportState = &viewport_state,
			.pRasterizationState = &rasterization_state,
			.pMultisampleState = &multisample_state,
			.pDepthStencilState = &depth_stencil_state,
			.pColorBlendState = &color_blend_state,
			.pDynamicState = &dynamic_state,
			.layout = layout,
			.renderPass = render_pass,
			.subpass = subpass,
		};
//...
Amazingly enough, an edit that makes our pipeline definition (slightly) shorter.

If you compile and run the code now, you'll see that the warning about vertex attributes is no longer displayed, since our code now supplies an input state definition that matches the vertex shader's inputs.

A List of Vertices

We know the type of our vertices, but we haven't actually created any vertices yet.

Vulkan gives us a lot of flexibility in how we choose to store and transfer vertices. What method we choose depends on how we plan to write and render the vertices.

For this grid, we're going to generate the vertices CPU-side every frame and stream them to the GPU for drawing. This would be inefficient for large, static objects, but it is exactly what we'd want if we were (e.g.) making a list of lines every frame to display debugging info in our 3D scene.

Generating a Grid

Let's start with code to generate the vertex stream. We'll need a vector< PosColVertex > to write vertices into every frame:

in Tutorial.hpp
	//--------------------------------------------------------------------
	//Resources that change when time passes or the user interacts:

	virtual void update(float dt) override;
	virtual void on_input(InputEvent const &) override;

	float time = 0.0f;

	std::vector< LinesPipeline::Vertex > lines_vertices;

And we'll put code in Tutorial::update to generate some vertices -- in this case, a simple "x" shape for testing:

in Tutorial.cpp
void Tutorial::update(float dt) {
	time = std::fmod(time + dt, 60.0f);

	//make an 'x':
	lines_vertices.clear();
	lines_vertices.reserve(4);
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x = -1.0f, .y = -1.0f, .z = 0.0f },
		.Color{ .r = 0xff, .g = 0xff, .b = 0xff, .a = 0xff }
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x =  1.0f, .y =  1.0f, .z = 0.0f },
		.Color{ .r = 0xff, .g = 0x00, .b = 0x00, .a = 0xff }
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x = -1.0f, .y =  1.0f, .z = 0.0f },
		.Color{ .r = 0x00, .g = 0x00, .b = 0xff, .a = 0xff }
	});
	lines_vertices.emplace_back(PosColVertex{
		.Position{ .x =  1.0f, .y = -1.0f, .z = 0.0f },
		.Color{ .r = 0x00, .g = 0x00, .b = 0xff, .a = 0xff }
	});
	assert(lines_vertices.size() == 4);
}

Now to actually get these lines into GPU memory.

Memory Wrangling

Vulkan has a nuanced way of talking about memory. Not all memory in Vulkan is equivalent. Instead, Vulkan gives you a way to ask for different types of memory with different features (and, potentially, allocated from different memory heaps).
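
You can see exactly which memory types and heaps a device offers by querying its properties -- a quick sketch, assuming you have a VkPhysicalDevice called physical_device:

//physical_device: an already-selected VkPhysicalDevice (assumed):
VkPhysicalDeviceMemoryProperties props;
vkGetPhysicalDeviceMemoryProperties(physical_device, &props);
for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
	VkMemoryType const &type = props.memoryTypes[i];
	//type.propertyFlags is a mask of VK_MEMORY_PROPERTY_* bits
	// (DEVICE_LOCAL, HOST_VISIBLE, HOST_COHERENT, ...);
	//type.heapIndex selects one of props.memoryHeaps, each of which has a size:
	VkDeviceSize heap_size = props.memoryHeaps[type.heapIndex].size;
	(void)heap_size; //(just illustrating the lookup)
}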

When you've allocated memory through Vulkan you receive a VkDeviceMemory handle. This is an opaque handle that is useless to both the CPU and the GPU until you do one (or more) of three things to get a more useful view of the memory: bind it to a buffer (vkBindBufferMemory), bind it to an image (vkBindImageMemory), or map it into the CPU's address space (vkMapMemory).

As such, memory allocation in Vulkan is generally a multi-step process. First, you figure out what you want to do with the memory; you use that to figure out how much device memory of what type you need; you allocate that memory (ideally from a larger slab you've already allocated -- it's not efficient to ask the Vulkan driver to manage many small allocations); and finally you bind and/or map it as needed.
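
Concretely, for a buffer, the underlying sequence of API calls looks something like this (a minimal sketch with error handling omitted; memory_type_index stands in for the result of the memory-type search described above):

//given a VkDevice device and an already-created VkBuffer buffer:
VkMemoryRequirements req;
vkGetBufferMemoryRequirements(device, buffer, &req); //size, alignment, allowed memory types

VkMemoryAllocateInfo alloc_info{
	.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
	.allocationSize = req.size,
	.memoryTypeIndex = memory_type_index, //chosen via the search above; must have its bit set in req.memoryTypeBits
};
VkDeviceMemory memory = VK_NULL_HANDLE;
vkAllocateMemory(device, &alloc_info, nullptr, &memory);

vkBindBufferMemory(device, buffer, memory, 0); //bind: buffer now has backing memory

void *mapped = nullptr;
vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mapped); //map: the CPU can now write through 'mapped'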

In our case, we have some helper functions and structures to manage this process in Helpers.hpp, as you'll see in a moment. (And, yes, you'll re-write these eventually.)

Actually talking about our code now

To get our vertices from a CPU-side heap-allocated array into GPU device memory we're going to do a pair of copies. First, we'll have the CPU copy the vertices into a VkBuffer whose backing VkDeviceMemory is mapped into the CPU's address space; then we'll have the GPU copy the memory into a VkBuffer in device memory that is suitable for using as a vertex attribute source.

We'll allocate these buffers per-workspace to avoid race conditions. (Where, e.g., the CPU is computing vertices for the next frame into a buffer that the GPU hasn't finished copying out of for the previous frame.)

in Tutorial.hpp
	//workspaces hold per-render resources:
	struct Workspace {
		VkCommandBuffer command_buffer = VK_NULL_HANDLE; //from the command pool above; reset at the start of every render.
		
		//location for lines data: (streamed to GPU per-frame)
		Helpers::AllocatedBuffer lines_vertices_src; //host coherent; mapped
		Helpers::AllocatedBuffer lines_vertices; //device-local

	};

We'll write code to actually allocate these buffers later, but to start with let's make sure they get cleaned up when the application is finished, by adding calls to Helpers::destroy_buffer in the per-Workspace part of Tutorial's destructor:

in Tutorial.cpp
//in Tutorial::~Tutorial:
for (Workspace &workspace : workspaces) {
	refsol::Tutorial_destructor_workspace(rtg, command_pool, &workspace.command_buffer);

	if (workspace.lines_vertices_src.handle != VK_NULL_HANDLE) {
		rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices_src));
	}
	if (workspace.lines_vertices.handle != VK_NULL_HANDLE) {
		rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices));
	}
}
workspaces.clear();
Cleaning up the per-workspace lines buffers in Tutorial::~Tutorial.

Now we'll add code in the render function to resize the lines buffers if needed. To start with, we'll just compute how many bytes of buffer are needed and -- if that number is exceeded -- a nice size to reallocate to:

in Tutorial.cpp
//in Tutorial::render:

{ //begin recording:
	//...
}

if (!lines_vertices.empty()) { //upload lines vertices:
	//[re-]allocate lines buffers if needed:
	size_t needed_bytes = lines_vertices.size() * sizeof(lines_vertices[0]);
	if (workspace.lines_vertices_src.handle == VK_NULL_HANDLE || workspace.lines_vertices_src.size < needed_bytes) {
		//round to next multiple of 4k to avoid re-allocating continuously if vertex count grows slowly:
		size_t new_bytes = ((needed_bytes + 4096) / 4096) * 4096;

		//TODO

		std::cout << "Re-allocated lines buffers to " << new_bytes << " bytes." << std::endl;
	}

	assert(workspace.lines_vertices_src.size == workspace.lines_vertices.size);
	assert(workspace.lines_vertices_src.size >= needed_bytes);
}

{ //render pass
	//...
}

Now add clean-up code for the buffers, in case they were already allocated:

in Tutorial.cpp
//round to next multiple of 4k to avoid re-allocating continuously if vertex count grows slowly:
size_t new_bytes = ((needed_bytes + 4096) / 4096) * 4096;
if (workspace.lines_vertices_src.handle) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices_src));
}
if (workspace.lines_vertices.handle) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices));
}

//TODO

std::cout << "Re-allocated lines buffers to " << new_bytes << " bytes." << std::endl;

And, finally, the actual allocation. Notice that we're supplying the allocation helper with both a VkBufferUsageFlags to indicate what we will do with each buffer and a VkMemoryPropertyFlags to indicate properties of the memory to allocate it in.

in Tutorial.cpp
size_t new_bytes = ((needed_bytes + 4096) / 4096) * 4096;
if (workspace.lines_vertices_src.handle) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices_src));
}
if (workspace.lines_vertices.handle) {
	rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices));
}
workspace.lines_vertices_src = rtg.helpers.create_buffer(
	new_bytes,
	VK_BUFFER_USAGE_TRANSFER_SRC_BIT, //going to have GPU copy from this memory
	VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, //host-visible memory, coherent (no special sync needed)
	Helpers::Mapped //get a pointer to the memory
);
workspace.lines_vertices = rtg.helpers.create_buffer(
	new_bytes,
	VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, //going to use as vertex buffer, also going to have GPU copy into this memory
	VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //GPU-local memory
	Helpers::Unmapped //don't get a pointer to the memory
);

std::cout << "Re-allocated lines buffers to " << new_bytes << " bytes." << std::endl;

We are allocating lines_vertices_src to use as a "staging buffer" -- the buffer that we copy a frame's lines data into using the CPU, before having the GPU transfer the data to the other buffer. Therefore, we pass VK_BUFFER_USAGE_TRANSFER_SRC_BIT for buffer usage (we plan to have the GPU copy data from it); we request memory that is both VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (the memory can be mapped from the CPU side) and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT (the memory doesn't require special flush operations to make host writes available) so that it will be easy for us to copy into the memory on the CPU side; and we ask the allocation helper to map the memory (put it somewhere in the CPU address space) by passing Helpers::Mapped as the last parameter.

We are allocating lines_vertices to use as the GPU-side vertex buffer, and to receive a copy of the data held in the lines src buffer. Therefore, we pass both VK_BUFFER_USAGE_VERTEX_BUFFER_BIT (use as a vertex buffer) and VK_BUFFER_USAGE_TRANSFER_DST_BIT (use as the target of a memory copy) for usage flags; we request memory that is VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (actually on the GPU); and we don't ask the allocation helper to map the memory.

When you compile and run this code, you'll notice that the message Re-allocated lines buffers to 4096 bytes. is printed twice. Think about this for a moment to make sure you understand why this is not a bug.

Copying Vertices to the GPU

Now that our buffers are large enough, it's time to copy data to the GPU. To do this we will first use the CPU to copy from the lines_vertices vector to the workspace.lines_vertices_src staging buffer; and then record a command to have the GPU copy the data from the staging buffer to the workspace.lines_vertices buffer.

The CPU-side copy is easy since the staging buffer is mapped. We can just use std::memcpy:

in Tutorial.cpp
//...
	std::cout << "Re-allocated lines buffers to " << new_bytes << " bytes." << std::endl;
	}

	assert(workspace.lines_vertices_src.size == workspace.lines_vertices.size);
	assert(workspace.lines_vertices_src.size >= needed_bytes);

	//host-side copy into lines_vertices_src:
	assert(workspace.lines_vertices_src.allocation.mapped);
	std::memcpy(workspace.lines_vertices_src.allocation.data(), lines_vertices.data(), needed_bytes);

}

The command to have the GPU do the host-to-GPU copy is also refreshingly straightforward:

in Tutorial.cpp
	//host-side copy into lines_vertices_src:
	assert(workspace.lines_vertices_src.allocation.mapped);
	std::memcpy(workspace.lines_vertices_src.allocation.data(), lines_vertices.data(), needed_bytes);

	//device-side copy from lines_vertices_src -> lines_vertices:
	VkBufferCopy copy_region{
		.srcOffset = 0,
		.dstOffset = 0,
		.size = needed_bytes,
	};
	vkCmdCopyBuffer(workspace.command_buffer, workspace.lines_vertices_src.handle, workspace.lines_vertices.handle, 1, &copy_region);
}

Note that the srcOffset and dstOffset members of the VkBufferCopy are offsets into the buffers, not their allocations. This, despite our aside about offsets earlier.

Making Sure The Copy Finishes

You might be thinking at this point that copying data to the GPU is remarkably uncomplicated for a Vulkan task. And you're partly right. Starting the copy is very straightforward! Making sure the copy finishes before the GPU runs other commands that depend on it is more complicated.

Add this code before the render pass begins:

in Tutorial.cpp
if (!lines_vertices.empty()) { //upload lines vertices:
	//...
	vkCmdCopyBuffer(workspace.command_buffer, workspace.lines_vertices_src.handle, workspace.lines_vertices.handle, 1, &copy_region);
}

{ //memory barrier to make sure copies complete before rendering happens:
	VkMemoryBarrier memory_barrier{
		.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
		.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT,
		.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
	};

	vkCmdPipelineBarrier( workspace.command_buffer,
		VK_PIPELINE_STAGE_TRANSFER_BIT, //srcStageMask
		VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, //dstStageMask
		0, //dependencyFlags
		1, &memory_barrier, //memoryBarriers (count, data)
		0, nullptr, //bufferMemoryBarriers (count, data)
		0, nullptr //imageMemoryBarriers (count, data)
	);
}

{ //render pass
	std::array< VkClearValue, 2 > clear_values{
	//...

The vkCmdPipelineBarrier command establishes a memory dependency between any operation in the srcStageMask doing any memory operation in the srcAccessMask before the barrier command and any operation in the dstStageMask doing any memory operation in the dstAccessMask after the command. In this case, this means that any memory writes done by transfer commands before the barrier (like the copy we just wrote!) must be visible to any memory reads in the vertex input stage of any pipelines run after the barrier (like the draw we're about to write!).
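
Our barrier is intentionally broad -- all transfer writes, all vertex-input reads. If you wanted to scope the dependency to just the one buffer, you could use a VkBufferMemoryBarrier instead (a sketch, not something this tutorial needs):

VkBufferMemoryBarrier buffer_barrier{
	.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
	.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
	.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
	.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED, //no queue ownership transfer
	.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.buffer = workspace.lines_vertices.handle,
	.offset = 0,
	.size = VK_WHOLE_SIZE,
};

vkCmdPipelineBarrier( workspace.command_buffer,
	VK_PIPELINE_STAGE_TRANSFER_BIT, //srcStageMask
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, //dstStageMask
	0, //dependencyFlags
	0, nullptr, //memoryBarriers (count, data)
	1, &buffer_barrier, //bufferMemoryBarriers (count, data)
	0, nullptr //imageMemoryBarriers (count, data)
);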

Drawing

We've got some lines uploaded to the GPU, but we haven't actually asked the GPU to do anything with them yet. Let's fix that.

in Tutorial.cpp
	{ //draw with the background pipeline:
		vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, background_pipeline.handle);

		{ //push time:
			BackgroundPipeline::Push push{
				.time = float(time),
			};
			vkCmdPushConstants(workspace.command_buffer, background_pipeline.layout, VK_SHADER_STAGE_FRAGMENT_BIT, 0, sizeof(push), &push);
		}

		vkCmdDraw(workspace.command_buffer, 3, 1, 0, 0);
	}

	{ //draw with the lines pipeline:
		vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, lines_pipeline.handle);

		{ //use lines_vertices (offset 0) as vertex buffer binding 0:
			std::array< VkBuffer, 1 > vertex_buffers{ workspace.lines_vertices.handle };
			std::array< VkDeviceSize, 1 > offsets{ 0 };
			vkCmdBindVertexBuffers(workspace.command_buffer, 0, uint32_t(vertex_buffers.size()), vertex_buffers.data(), offsets.data());
		}

		//draw lines vertices:
		vkCmdDraw(workspace.command_buffer, uint32_t(lines_vertices.size()), 1, 0, 0);
	}

	vkCmdEndRenderPass(workspace.command_buffer);

This code is very similar to the commands used to run the background pipeline. Of course, we aren't sending any push constants; and we have a vertex buffer, so we record the state command vkCmdBindVertexBuffers in order to bind it for use in the pipeline. Finally, we record a draw command with the number of vertices we've uploaded.

Compiling and running the code, we get to see our "x" in pixels:

tutorial app showing colorful background and boring 'x' of lines
These foreground lines are not keeping up with our fancy background.

Of course this simple "x" isn't doing much except showing us that (-1,-1) is indeed the upper left of normalized device coordinate space. Let's see if we can do a bit more and test that depth buffering is working:

in Tutorial.cpp
//in Tutorial::update

{ //make some crossing lines at different depths:
	lines_vertices.clear();
	constexpr size_t count = 2 * 30 + 2 * 30;
	lines_vertices.reserve(count);
	//horizontal lines at z = 0.5f:
	for (uint32_t i = 0; i < 30; ++i) {
		float y = (i + 0.5f) / 30.0f * 2.0f - 1.0f;
		lines_vertices.emplace_back(PosColVertex{
			.Position{.x = -1.0f, .y = y, .z = 0.5f},
			.Color{ .r = 0xff, .g = 0xff, .b = 0x00, .a = 0xff},
		});
		lines_vertices.emplace_back(PosColVertex{
			.Position{.x = 1.0f, .y = y, .z = 0.5f},
			.Color{ .r = 0xff, .g = 0xff, .b = 0x00, .a = 0xff},
		});
	}
	//vertical lines at z = 0.0f (near) through 1.0f (far):
	for (uint32_t i = 0; i < 30; ++i) {
		float x = (i + 0.5f) / 30.0f * 2.0f - 1.0f;
		float z = (i + 0.5f) / 30.0f;
		lines_vertices.emplace_back(PosColVertex{
			.Position{.x = x, .y =-1.0f, .z = z},
			.Color{ .r = 0x44, .g = 0x00, .b = 0xff, .a = 0xff},
		});
		lines_vertices.emplace_back(PosColVertex{
			.Position{.x = x, .y = 1.0f, .z = z},
			.Color{ .r = 0x44, .g = 0x00, .b = 0xff, .a = 0xff},
		});
	}
	assert(lines_vertices.size() == count);
}

Examining the output, we can see that the depth buffer appears to be working properly (the blue lines pass in front of the yellow lines on the left and behind them on the right):

tutorial app showing a grid of yellow and blue lines
Notice that the blue and yellow lines have different crossing orders on the left and the right.

Let's Do 3D: a clip-from-local matrix

Drawing lines is fine but wouldn't it be cool if we could see them in 3D?

Yes. Yes, it would.

To see our lines in 3D we need to write down a viewing and perspective transform between the local coordinate system of the lines and clip space. Conveniently, all the transforms we need can be represented and concatenated as linear functions on 4D homogeneous coordinates. And linear functions on 4D coordinates can be tabulated as 4x4 matrices.

So let's write a quick 4x4 matrix math library. We're going to write this for demonstration and learning purposes but -- in production -- you should probably move to glm, or at least write your own library making use of SIMD intrinsics (x64; arm).

We start by defining a mat4 as an array of 16 floats and a vec4 as an array of 4 floats:

in mat4.hpp (new file)
#pragma once

//A *small* matrix math library for 4x4 matrices only.

#include <array>
#include <cmath>
#include <cstdint>

//NOTE: column-major storage order (like in OpenGL / GLSL):
using mat4 = std::array< float, 16 >;
static_assert(sizeof(mat4) == 16*4, "mat4 is exactly 16 32-bit floats.");

using vec4 = std::array< float, 4 >;
static_assert(sizeof(vec4) == 4*4, "vec4 is exactly 4 32-bit floats.");

We will think of our matrices as stored in column major order; in other words, the elements will be stored with columns written contiguously. This means that the element at row r and column c is stored in mat[c * 4 + r].
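
As a quick illustration of that indexing convention (just to fix the idea -- this isn't part of mat4.hpp), here is the identity matrix and an element accessor:

//each line below is one *column* of the matrix:
constexpr mat4 identity{
	1.0f, 0.0f, 0.0f, 0.0f,
	0.0f, 1.0f, 0.0f, 0.0f,
	0.0f, 0.0f, 1.0f, 0.0f,
	0.0f, 0.0f, 0.0f, 1.0f,
};

//element at row r, column c:
inline float element(mat4 const &m, uint32_t r, uint32_t c) { return m[c * 4 + r]; }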

Let's start with applying the linear function tabulated in a matrix to a vector; i.e., doing a matrix-vector multiply:

in mat4.hpp
inline vec4 operator*(mat4 const &A, vec4 const &b) {
	vec4 ret;
	//compute ret = A * b:
	for (uint32_t r = 0; r < 4; ++r) {
		ret[r] = A[0 * 4 + r] * b[0];
		for (uint32_t k = 1; k < 4; ++k) {
			ret[r] += A[k * 4 + r] * b[k];
		}
	}
	return ret;
}

And we'll follow that up with the composition of two linear functions tabulated in matrices; i.e., matrix-matrix multiplication:

in mat4.hpp
inline mat4 operator*(mat4 const &A, mat4 const &B) {
	mat4 ret;
	//compute ret = A * B:
	for (uint32_t c = 0; c < 4; ++c) {
		for (uint32_t r = 0; r < 4; ++r) {
			ret[c * 4 + r] = A[0 * 4 + r] * B[c * 4 + 0];
			for (uint32_t k = 1; k < 4; ++k) {
				ret[c * 4 + r] += A[k * 4 + r] * B[c * 4 + k];
			}
		}
	}
	return ret;
}
I've written this function so many times I've lost count, and I've rarely gotten it right on the first try. Running some test multiplications through it to debug is highly recommended -- just throw them in the top of your main function and delete them when you're convinced your function actually works.
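
For example, something like this at the top of main (hypothetical throwaway test code; needs <cassert>):

//a scale by 2 followed by a translation by (1,2,3):
mat4 T{ //column-major: the last column holds the translation
	1.0f, 0.0f, 0.0f, 0.0f,
	0.0f, 1.0f, 0.0f, 0.0f,
	0.0f, 0.0f, 1.0f, 0.0f,
	1.0f, 2.0f, 3.0f, 1.0f,
};
mat4 S{ //uniform scale by 2:
	2.0f, 0.0f, 0.0f, 0.0f,
	0.0f, 2.0f, 0.0f, 0.0f,
	0.0f, 0.0f, 2.0f, 0.0f,
	0.0f, 0.0f, 0.0f, 1.0f,
};
vec4 p = (T * S) * vec4{ 1.0f, 1.0f, 1.0f, 1.0f };
assert(p == (vec4{ 3.0f, 4.0f, 5.0f, 1.0f })); //scale (1,1,1) to (2,2,2), then translate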

Now let's write a function to compute a perspective matrix:

in mat4.hpp
//perspective projection matrix.
// - vfov is fov *in radians*
// - near maps to 0, far maps to 1
// looks down -z with +y up and +x right
inline mat4 perspective(float vfov, float aspect, float near, float far) {
	//as per https://www.terathon.com/gdc07_lengyel.pdf
	// (with modifications for Vulkan-style coordinate system)
	//  notably: flip y (vulkan device coords are y-down)
	//       and rescale z (vulkan device coords are z-[0,1])
	const float e = 1.0f / std::tan(vfov / 2.0f);
	const float a = aspect;
	const float n = near;
	const float f = far;
	return mat4{ //note: column-major storage order!
		e/a,  0.0f,                      0.0f, 0.0f,
		0.0f,   -e,                      0.0f, 0.0f,
		0.0f, 0.0f,-0.5f - 0.5f * (f+n)/(f-n),-1.0f,
		0.0f, 0.0f,             - (f*n)/(f-n), 0.0f,
	};
}

Note that -- as per convention since time immemorial -- this does perspective projection for a camera looking down the \( -z \) axis with \( +x \) right and \( +y \) up.

One can do some quick sanity checks by substituting in \( z = -n \) and verifying that the output \( z \) coordinate (after homogeneous divide) is \( 0 \); and, similarly, that points with \( z = -f \) map to results with \( z = 1 \). Further, setting the vertical fov to \( \pi/2 \) radians and the near plane distance to 1 unit, we know that \( (\pm 1, \pm 1, -1) \) should map to \( (\pm 1, \mp 1, 0) \) after transformation and homogeneous divide.
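
In code, that last check might look like this (again, hypothetical throwaway test code; needs <cassert> and <cmath>):

mat4 P = perspective(float(M_PI) / 2.0f, 1.0f, 1.0f, 10.0f); //vfov = pi/2, aspect = 1, near = 1
vec4 r = P * vec4{ 1.0f, 1.0f, -1.0f, 1.0f }; //corner of the near plane

//expect (1, -1, 0) after homogeneous divide:
assert(std::abs(r[0] / r[3] - 1.0f) < 1e-5f);
assert(std::abs(r[1] / r[3] + 1.0f) < 1e-5f);
assert(std::abs(r[2] / r[3] - 0.0f) < 1e-5f);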

Now we've got a way to look at things through a perspective camera, but we don't have a way to move the camera. So let's write a function to compute a "look at" matrix. If you want a good check on your vector math intuition, just read the comments and try to write the code from those alone.

in mat4.hpp
//look at matrix:
// makes a camera-space-from-world matrix for a camera at eye looking toward
// target with up-vector pointing (as-close-as-possible) along up.
// That is, it maps:
//  - eye_xyz to the origin
//  - the unit length vector from eye_xyz to target_xyz to -z
//  - an as-close-as-possible unit-length vector to up to +y
inline mat4 look_at(
	float eye_x, float eye_y, float eye_z,
	float target_x, float target_y, float target_z,
	float up_x, float up_y, float up_z ) {

	//NOTE: this would be a lot cleaner with a vec3 type and some overloads!

	//compute vector from eye to target:
	float in_x = target_x - eye_x;
	float in_y = target_y - eye_y;
	float in_z = target_z - eye_z;

	//normalize 'in' vector:
	float inv_in_len = 1.0f / std::sqrt(in_x*in_x + in_y*in_y + in_z*in_z);
	in_x *= inv_in_len;
	in_y *= inv_in_len;
	in_z *= inv_in_len;

	//make 'up' orthogonal to 'in':
	float in_dot_up = in_x*up_x + in_y*up_y +in_z*up_z;
	up_x -= in_dot_up * in_x;
	up_y -= in_dot_up * in_y;
	up_z -= in_dot_up * in_z;

	//normalize 'up' vector:
	float inv_up_len = 1.0f / std::sqrt(up_x*up_x + up_y*up_y + up_z*up_z);
	up_x *= inv_up_len;
	up_y *= inv_up_len;
	up_z *= inv_up_len;

	//compute 'right' vector as 'in' x 'up'
	float right_x = in_y*up_z - in_z*up_y;
	float right_y = in_z*up_x - in_x*up_z;
	float right_z = in_x*up_y - in_y*up_x;

	//compute dot products of right, in, up with eye:
	float right_dot_eye = right_x*eye_x + right_y*eye_y + right_z*eye_z;
	float up_dot_eye = up_x*eye_x + up_y*eye_y + up_z*eye_z;
	float in_dot_eye = in_x*eye_x + in_y*eye_y + in_z*eye_z;

	//final matrix: computes ( right . (v - eye), up . (v - eye), -in . (v - eye), v.w ):
	return mat4{ //note: column-major storage order
		right_x, up_x, -in_x, 0.0f,
		right_y, up_y, -in_y, 0.0f,
		right_z, up_z, -in_z, 0.0f,
		-right_dot_eye, -up_dot_eye, in_dot_eye, 1.0f,
	};
}

A Rotating Camera

Let's compute a matrix through which to view our grid of lines. We'll call it CLIP_FROM_WORLD to indicate it will be used to transform between world space and clip space.

in Tutorial.hpp
#pragma once

#include "PosColVertex.hpp"
#include "mat4.hpp"

#include "RTG.hpp"

struct Tutorial : RTG::Application {
//...
	float time = 0.0f;

	mat4 CLIP_FROM_WORLD;

	std::vector< LinesPipeline::Vertex > lines_vertices;
//...
};
Yes, I will persist in using SHOUTYCASE for identifiers that will [eventually be] used as global variables in shaders.

We'll compute CLIP_FROM_WORLD in update using the functions we've already written.

in Tutorial.cpp
void Tutorial::update(float dt) {
	time = std::fmod(time + dt, 60.0f);

	{ //camera orbiting the origin:
		float ang = float(M_PI) * 2.0f * 10.0f * (time / 60.0f);
		CLIP_FROM_WORLD = perspective(
			60.0f * float(M_PI) / 180.0f, //vfov
			rtg.swapchain_extent.width / float(rtg.swapchain_extent.height), //aspect
			0.1f, //near
			1000.0f //far
		) * look_at(
			3.0f * std::cos(ang), 3.0f * std::sin(ang), 1.0f, //eye
			0.0f, 0.0f, 0.5f, //target
			0.0f, 0.0f, 1.0f //up
		);
	}
//...
}

And, just to make sure everything is working, let's transform our lines vertices by the CLIP_FROM_WORLD linear function on the CPU:

in Tutorial.cpp
void Tutorial::update(float dt) {
	time = std::fmod(time + dt, 60.0f);

	{ //make some crossing lines at different depths:
		//...
	}

	//HACK: transform vertices on the CPU(!)
	for (PosColVertex &v : lines_vertices) {
		vec4 res = CLIP_FROM_WORLD * vec4{v.Position.x, v.Position.y, v.Position.z, 1.0f};
		v.Position.x = res[0] / res[3];
		v.Position.y = res[1] / res[3];
		v.Position.z = res[2] / res[3];
	}
}
Note that this will not handle vertices behind the camera properly -- there is no clipping. But our scene (or, at least, my scene) fits nicely in the \( [-1,1] \times [-1,1] \times [0,1] \) box, which this orbiting camera always keeps in front of itself.

Compile and run and you should have a new perspective on your lines:

3D view of lines
Now we can see the lines in 3D with an orbiting view.

Moving the matrix to the GPU: Descriptors

GPUs are -- more-or-less -- built to multiply matrices and rasterize triangles. And, at present, we're all out of triangles. So let's transfer our CLIP_FROM_WORLD matrix to the GPU so it can do the multiplication work.

To start with, let's stop doing the multiplication on the CPU:

in Tutorial.cpp
void Tutorial::update(float dt) {
	//...
	//HACK: transform vertices on the CPU(!)
	for (PosColVertex &v : lines_vertices) {
		vec4 res = CLIP_FROM_WORLD * vec4{v.Position.x, v.Position.y, v.Position.z, 1.0f};
		v.Position.x = res[0] / res[3];
		v.Position.y = res[1] / res[3];
		v.Position.z = res[2] / res[3];
	}
}
That didn't last long.

We could use push constants -- our matrix is only 64 bytes and we've got at least 128 bytes of push constants available -- but for educational purposes let's use a different way of giving our shader access to the matrix: a uniform block.

Shader programs can access GPU memory through different pieces of special- and general-purpose hardware. In shader code, you select the memory access path for each piece of global data by specifying its type (and, sometimes, with layout decorators). For example, a uniform sampler2D TEX will read data through a texture unit with filtering and interpolation; while a buffer Particles { vec4 PARTICLES[100]; } provides read/write access to main GPU memory through a more conventional cache; and a uniform Camera { mat4 CLIP_FROM_LOCAL; } will provide fast read-only access to data that is copied into core-local scratchpad memory.

When using Vulkan to run a shader program, you provide data for each global data access location in the shader via a descriptor. A descriptor is a pointer to a resource in GPU memory. Descriptors are typed: the type of the descriptor specifies how the shader can use the resource it points to. For example, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE allows a shader program to read and write pixels of a VkImage (well, through a VkImageView), whereas VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER allows a shader program to use that same image (again through a view) as a sampled texture. Further, the type of a descriptor must correspond to the type of the global resource in the shader that it provides (though the correspondence is not exactly 1-1).

Descriptors are stored in sets (handle type: VkDescriptorSet) to provide a mechanism to coordinate swapping out many descriptors at the same time. Descriptor "set" is somewhat of a misnomer though -- these sets are actually ordered lists of bindings where each binding is an array of descriptors of the same type. Shaders indicate the descriptor supplying a global resource by using a layout decorator to assign it a descriptor set index and a binding index within that set -- e.g., layout(set=2,binding=1) uniform sampler2D textures[2] will connect to an array of two descriptors stored in set two's binding one.

When recording a command buffer, your code binds descriptor sets to specific set indices in order to switch out blocks of global resources. Dividing descriptors into sets allows your code to only re-bind descriptors that it needs to change between draw calls, while leaving other descriptors unchanged. This is an optimization because the process of binding a descriptor set may involve, e.g., the GPU needing to reconfigure a texture unit, or to copy data from main GPU memory into local scratchpad memory.

Let's make our vertex shader expect a uniform buffer descriptor which will supply our camera information, and use the matrix to transform the input position.

in lines.vert
#version 450

layout(set=0, binding=0, std140) uniform Camera {
	mat4 CLIP_FROM_WORLD;
};

//...

void main() {
	gl_Position = CLIP_FROM_WORLD * vec4(Position, 1.0);
	color = Color;
}

Note that the std140 in the layout decorator indicates how the data will be organized in memory. The OpenGL Specification, section 7.6.2.2 gives the layout algorithm for std140 interface blocks as well as std430 blocks (which are mostly the same but pack data in arrays more tightly). In this case, the layout algorithm says that our mat4 CLIP_FROM_WORLD will be stored at offset zero and without any padding.
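
The std140 rules only start to matter once a block has more members. For example (an illustrative aside with hypothetical names, not part of the tutorial code), a std140 vec3 is 16-byte aligned but holds only 12 bytes of data, so a CPU-side mirror structure needs care:

//GLSL: layout(std140) uniform World { vec3 SKY_DIRECTION; float SKY_BRIGHTNESS; };
struct WorldStd140 {
	float SKY_DIRECTION[3]; //offset 0 (a std140 vec3 is 16-byte aligned...)
	float SKY_BRIGHTNESS;   //offset 12 (...but a following scalar may pack into the remaining 4 bytes)
};
static_assert(sizeof(WorldStd140) == 16, "matches the std140 layout");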

If you compile and run the code now, the validation layer will complain loudly that you are trying to use a pipeline with a shader whose descriptor set layout doesn't match the layout of the pipeline. And, indeed, this is the case. When we created our pipeline layout, we said it didn't include any descriptors; but our shader says there must be a single descriptor in set zero at binding zero that has type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER. So we need to create an appropriate descriptor set layout (handle type: VkDescriptorSetLayout) and add it to our pipeline layout.

We'll want the descriptor set layout handle later for creating descriptor sets as well, so we'll add it to the LinesPipeline structure:

in Tutorial.hpp
	struct LinesPipeline {
		//descriptor set layouts:
		VkDescriptorSetLayout set0_Camera = VK_NULL_HANDLE;

		//types for descriptors:
		struct Camera {
			mat4 CLIP_FROM_WORLD;
		};
		static_assert(sizeof(Camera) == 16*4, "camera buffer structure is packed");

//...
	} lines_pipeline;

And we can add code to Tutorial-LinesPipeline.cpp to create, supply to the pipeline layout, and destroy the descriptor set layout:

in Tutorial-LinesPipeline.cpp
void Tutorial::LinesPipeline::create(RTG &rtg, VkRenderPass render_pass, uint32_t subpass) {
	VkShaderModule vert_module = rtg.helpers.create_shader_module(vert_code);
	VkShaderModule frag_module = rtg.helpers.create_shader_module(frag_code);

	{ //the set0_Camera layout holds a Camera structure in a uniform buffer used in the vertex shader:
		std::array< VkDescriptorSetLayoutBinding, 1 > bindings{
			VkDescriptorSetLayoutBinding{
				.binding = 0,
				.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
				.descriptorCount = 1,
				.stageFlags = VK_SHADER_STAGE_VERTEX_BIT
			},
		};
		
		VkDescriptorSetLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
			.bindingCount = uint32_t(bindings.size()),
			.pBindings = bindings.data(),
		};

		VK( vkCreateDescriptorSetLayout(rtg.device, &create_info, nullptr, &set0_Camera) );
	}

	{ //create pipeline layout:
		std::array< VkDescriptorSetLayout, 1 > layouts{
			set0_Camera,
		};

		VkPipelineLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
			.setLayoutCount = uint32_t(layouts.size()),
			.pSetLayouts = layouts.data(),
			.pushConstantRangeCount = 0,
			.pPushConstantRanges = nullptr,
		};

		VK( vkCreatePipelineLayout(rtg.device, &create_info, nullptr, &layout) );
	}
//...
}

//...

void Tutorial::LinesPipeline::destroy(RTG &rtg) {
	if (set0_Camera != VK_NULL_HANDLE) {
		vkDestroyDescriptorSetLayout(rtg.device, set0_Camera, nullptr);
		set0_Camera = VK_NULL_HANDLE;
	}
//...
}

The code here is pretty self-explanatory. We're making a DSL with a single binding that has descriptorCount = 1 descriptors of type descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER which can be accessed only in the vertex shader. One quirk here is that the order of bindings in VkDescriptorSetLayoutCreateInfo::pBindings does not mean anything -- instead each VkDescriptorSetLayoutBinding specifies which binding index it is filling in with its .binding field. The same is NOT true for the order of DSLs in VkPipelineLayoutCreateInfo::pSetLayouts -- the n-th element of that array is the layout that the pipeline will expect in its n-th set index.

Now that we've done the work of properly updating the type information for our pipeline, we can get to the business of actually creating descriptor sets and (finally) pointing the descriptors in them to the data we want the shader to access. We'll be uploading our CLIP_FROM_WORLD matrix to the GPU every frame, so we'll need a buffer and a staging buffer for it in each workspace; further, we'll need a descriptor set that points to the buffer.

You would think that we could just create a VkDescriptorSet (the handle for a descriptor set) immediately using our VkDescriptorSetLayout but... no. In Vulkan, descriptor sets are allocated from a pool (handle type VkDescriptorPool). The idea is that -- for some descriptor sets -- you might want to create them transiently as your code draws, and then free them all at once (which you can do by resetting the pool). In our case, we don't need to reconfigure our descriptor sets, so we'll allocate them once from a pool and free the pool at the end of the program.

Since we want these buffers and descriptor sets to exist per-workspace, we'll add them as members of Tutorial::Workspace (along with a descriptor pool, next to our command pool, to allocate the descriptor sets from):

in Tutorial.hpp
//...
	//pools from which per-workspace things are allocated:
	VkCommandPool command_pool = VK_NULL_HANDLE;
	VkDescriptorPool descriptor_pool = VK_NULL_HANDLE;

	//workspaces hold per-render resources:
	struct Workspace {
		VkCommandBuffer command_buffer = VK_NULL_HANDLE; //from the command pool above; reset at the start of every render.

		//location for lines data: (streamed to GPU per-frame)
		Helpers::AllocatedBuffer lines_vertices_src; //host coherent; mapped
		Helpers::AllocatedBuffer lines_vertices; //device-local

		//location for LinesPipeline::Camera data: (streamed to GPU per-frame)
		Helpers::AllocatedBuffer Camera_src; //host coherent; mapped
		Helpers::AllocatedBuffer Camera; //device-local
		VkDescriptorSet Camera_descriptors; //references Camera

	};
	std::vector< Workspace > workspaces;
//...

Now let's write create and destroy code for everything, starting with the descriptor set pool:

in Tutorial.cpp
//...
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
//...
	lines_pipeline.create(rtg, render_pass, 0);

	{ //create descriptor pool:
		uint32_t per_workspace = uint32_t(rtg.workspaces.size()); //for easier-to-read counting

		std::array< VkDescriptorPoolSize, 1> pool_sizes{
			//we only need uniform buffer descriptors for the moment:
			VkDescriptorPoolSize{
				.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
				.descriptorCount = 1 * per_workspace, //one descriptor per set, one set per workspace
			},
		};
		
		VkDescriptorPoolCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
			.flags = 0, //because CREATE_FREE_DESCRIPTOR_SET_BIT isn't included, *can't* free individual descriptors allocated from this pool
			.maxSets = 1 * per_workspace, //one set per workspace
			.poolSizeCount = uint32_t(pool_sizes.size()),
			.pPoolSizes = pool_sizes.data(),
		};

		VK( vkCreateDescriptorPool(rtg.device, &create_info, nullptr, &descriptor_pool) );
	}

	workspaces.resize(rtg.workspaces.size());
//...
}

Tutorial::~Tutorial() {
//...
	workspaces.clear();

	if (descriptor_pool != VK_NULL_HANDLE) {
		vkDestroyDescriptorPool(rtg.device, descriptor_pool, nullptr);
		descriptor_pool = VK_NULL_HANDLE;
		//(this also frees the descriptor sets allocated from the pool)
	}

	refsol::Tutorial_destructor(rtg, &render_pass, &command_pool);
}
//...

Notice that you have to pre-size the pool. You have to specify both the maximum number of sets you can allocate as well as the number of descriptors of each type you can allocate. We're allocating exactly one set holding exactly one descriptor from the pool per workspace, so it's not so hard for us to count.

Now let's handle the per-workspace resources. The creation and destruction of the buffer and staging buffer for the camera descriptor are similar to those for the line vertices, but we don't need to dynamically resize them, and we set the usage flags differently (VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT instead of VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, for reasons that are probably self-evident).

in Tutorial.cpp
//...
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
//...
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		refsol::Tutorial_constructor_workspace(rtg, command_pool, &workspace.command_buffer);

		workspace.Camera_src = rtg.helpers.create_buffer(
			sizeof(LinesPipeline::Camera),
			VK_BUFFER_USAGE_TRANSFER_SRC_BIT, //going to have GPU copy from this memory
			VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, //host-visible memory, coherent (no special sync needed)
			Helpers::Mapped //get a pointer to the memory
		);
		workspace.Camera = rtg.helpers.create_buffer(
			sizeof(LinesPipeline::Camera),
			VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, //going to use as a uniform buffer, also going to have GPU copy into this memory
			VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //GPU-local memory
			Helpers::Unmapped //don't get a pointer to the memory
		);

		//TODO: descriptor set

		//TODO: descriptor write

	}
}

Tutorial::~Tutorial() {
//...
	for (Workspace &workspace : workspaces) {
		refsol::Tutorial_destructor_workspace(rtg, command_pool, &workspace.command_buffer);

		if (workspace.lines_vertices_src.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices_src));
		}
		if (workspace.lines_vertices.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.lines_vertices));
		}

		if (workspace.Camera_src.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.Camera_src));
		}
		if (workspace.Camera.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.Camera));
		}

		//Camera_descriptors freed when pool is destroyed.
	}
	workspaces.clear();

	refsol::Tutorial_destructor(rtg, &render_pass, &command_pool);
}
//...

To allocate the descriptor set, we just need to specify the layout. We don't need to free the descriptor set -- just free the pool that it was allocated from.

in Tutorial.cpp
//...
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
//...
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		//...

		//TODO: descriptor set
		{ //allocate descriptor set for Camera descriptor
			VkDescriptorSetAllocateInfo alloc_info{
				.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
				.descriptorPool = descriptor_pool,
				.descriptorSetCount = 1,
				.pSetLayouts = &lines_pipeline.set0_Camera,
			};

			VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &workspace.Camera_descriptors) );
		}

		//TODO: descriptor write

	}
}
//...

Finally, we need to write a descriptor for (reference to) the workspace.Camera buffer into the descriptor set:

in Tutorial.cpp
//...
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
//...
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		//...

		//TODO: descriptor write
		{ //point descriptor to Camera buffer:
			VkDescriptorBufferInfo Camera_info{
				.buffer = workspace.Camera.handle,
				.offset = 0,
				.range = workspace.Camera.size,
			};

			std::array< VkWriteDescriptorSet, 1 > writes{
				VkWriteDescriptorSet{
					.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
					.dstSet = workspace.Camera_descriptors,
					.dstBinding = 0,
					.dstArrayElement = 0,
					.descriptorCount = 1,
					.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
					.pBufferInfo = &Camera_info,
				},
			};

			vkUpdateDescriptorSets(
				rtg.device, //device
				uint32_t(writes.size()), //descriptorWriteCount
				writes.data(), //pDescriptorWrites
				0, //descriptorCopyCount
				nullptr //pDescriptorCopies
			);
		}
	}
}
//...

Using The Descriptor Sets

All of our various buffers and pointers are ready, so let's put them into service in the render function.

To start with, let's actually bind them during drawing. This will get rid of the validation layer's complaints.

in Tutorial.cpp
//in Tutorial::render
{ //draw with the lines pipeline:
	vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, lines_pipeline.handle);

	{ //use lines_vertices (offset 0) as vertex buffer binding 0:
		std::array< VkBuffer, 1 > vertex_buffers{ workspace.lines_vertices.handle };
		std::array< VkDeviceSize, 1 > offsets{ 0 };
		vkCmdBindVertexBuffers(workspace.command_buffer, 0, uint32_t(vertex_buffers.size()), vertex_buffers.data(), offsets.data());
	}

	{ //bind Camera descriptor set:
		std::array< VkDescriptorSet, 1 > descriptor_sets{
			workspace.Camera_descriptors, //0: Camera
		};
		vkCmdBindDescriptorSets(
			workspace.command_buffer, //command buffer
			VK_PIPELINE_BIND_POINT_GRAPHICS, //pipeline bind point
			lines_pipeline.layout, //pipeline layout
			0, //first set
			uint32_t(descriptor_sets.size()), descriptor_sets.data(), //descriptor sets count, ptr
			0, nullptr //dynamic offsets count, ptr
		);
	}

	//draw lines vertices:
	vkCmdDraw(workspace.command_buffer, uint32_t(lines_vertices.size()), 1, 0, 0);
}

Notice that the vkCmdBindDescriptorSets function allows you to bind several contiguous sets, and that you can select where that binding starts. This means that you can (e.g.) have a set 0 that gets bound once per frame and stays bound, and then have sets 1 and 2 that you bind new descriptors to per draw call. Also -- and this is kinda wild -- you can actually re-bind pipelines and keep the same descriptor set N bound, as long as the layouts for descriptor sets 0 .. N match between the pipelines. So you can have some global sets that stay the same even between, e.g., materials.
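
To make that concrete, re-binding only the later sets looks something like this (a hypothetical sketch -- our tutorial only has set 0, so there's nothing like this in our code; material_descriptors and pipeline_layout are made-up names):

//assuming set 0 (e.g., camera data) was bound earlier, and the new pipeline's
//layout is compatible with the old one for set 0, bind per-material data as set 1:
std::array< VkDescriptorSet, 1 > sets{ material_descriptors };
vkCmdBindDescriptorSets(
	workspace.command_buffer, //command buffer
	VK_PIPELINE_BIND_POINT_GRAPHICS, //pipeline bind point
	pipeline_layout, //pipeline layout
	1, //first set index -- set 0 stays bound
	uint32_t(sets.size()), sets.data(), //descriptor sets count, ptr
	0, nullptr //dynamic offsets count, ptr
);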

If you build and run the code now, those ugly Vulkan validation layer warnings will be gone. But you probably won't see any lines. That's because we aren't uploading the matrix data yet. To do that, we need to set up copies, just below where we copy the lines vertices:

in Tutorial.cpp
//in Tutorial::render
//...
		vkCmdCopyBuffer(workspace.command_buffer, workspace.lines_vertices_src.handle, workspace.lines_vertices.handle, 1, &copy_region);
	}

	{ //upload camera info:
		LinesPipeline::Camera camera{
			.CLIP_FROM_WORLD = CLIP_FROM_WORLD
		};
		assert(workspace.Camera_src.size == sizeof(camera));

		//host-side copy into Camera_src:
		std::memcpy(workspace.Camera_src.allocation.data(), &camera, sizeof(camera));

		//add device-side copy from Camera_src -> Camera:
		assert(workspace.Camera_src.size == workspace.Camera.size);
		VkBufferCopy copy_region{
			.srcOffset = 0,
			.dstOffset = 0,
			.size = workspace.Camera_src.size,
		};
		vkCmdCopyBuffer(workspace.command_buffer, workspace.Camera_src.handle, workspace.Camera.handle, 1, &copy_region);
	}

	{ //memory barrier to make sure copies complete before rendering happens:
		VkMemoryBarrier memory_barrier{
//...
}

And just like that, we're back to rendering our lines (and saving a lot of floating point work on the CPU).

3D lines, now transformed on the GPU
Now with GPU-accelerated matrix multiplication.