Solid 3D Objects

Lines are fine, but surfaces are perfect. So let's rasterize some triangles.

The Pipeline

We'll call our new pipeline the "objects pipeline", rather than -- say -- a triangles pipeline, because we're going to specifically design it to transform, light, and draw instanced vertex data -- just the sort of thing you'd use to draw a bunch of objects in a scene.

As you might expect, we're going to base our objects pipeline on our lines pipeline.

Copying the Pipeline Declaration

Let's start by copying and modifying the Tutorial::LinesPipeline structure to make a new Tutorial::ObjectsPipeline structure. Put it just under the lines pipeline declaration in Tutorial.hpp:

in Tutorial.hpp
struct ObjectsPipeline {
	//descriptor set layouts:
	VkDescriptorSetLayout set0_Camera = VK_NULL_HANDLE;

	//types for descriptors:
	using Camera = LinesPipeline::Camera;

	//no push constants

	VkPipelineLayout layout = VK_NULL_HANDLE;

	using Vertex = PosColVertex;
		
	VkPipeline handle = VK_NULL_HANDLE;

	void create(RTG &, VkRenderPass render_pass, uint32_t subpass);
	void destroy(RTG &);
} objects_pipeline;
Declaration for our new pipeline.

As before, add calls to the create and destroy functions in Tutorial.cpp:

in Tutorial.cpp
//in Tutorial::Tutorial:

	background_pipeline.create(rtg, render_pass, 0);
	lines_pipeline.create(rtg, render_pass, 0);
	objects_pipeline.create(rtg, render_pass, 0);

//in Tutorial::~Tutorial:

	background_pipeline.destroy(rtg);
	lines_pipeline.destroy(rtg);
	objects_pipeline.destroy(rtg);

Just as before, compiling should work at this point but linking should fail.

Copying the Pipeline Definition

Copy Tutorial-LinesPipeline.cpp to Tutorial-ObjectsPipeline.cpp, and add it to the build:

in Maekfile.js
//uncomment to build objects shaders and pipeline:
const objects_shaders = [
	maek.GLSLC('objects.vert'),
	maek.GLSLC('objects.frag'),
];
main_objs.push( maek.CPP('Tutorial-ObjectsPipeline.cpp', undefined, { depends:[...objects_shaders] } ) );

Now edit Tutorial-ObjectsPipeline.cpp to switch it over to drawing triangles with our new shaders.

Load the correct shaders:

in Tutorial-ObjectsPipeline.cpp
static uint32_t vert_code[] =
#include "spv/objects.vert.inl"
;

static uint32_t frag_code[] =
#include "spv/objects.frag.inl"
;

Update the structure names:

in Tutorial-ObjectsPipeline.cpp
void Tutorial::ObjectsPipeline::create(RTG &rtg, VkRenderPass render_pass, uint32_t subpass) {
//...
void Tutorial::ObjectsPipeline::destroy(RTG &rtg) {

And switch the pipeline to drawing triangles:

in Tutorial-ObjectsPipeline.cpp
		//this pipeline will draw triangles:
		VkPipelineInputAssemblyStateCreateInfo input_assembly_state{
			.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
			.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,
			.primitiveRestartEnable = VK_FALSE
		};

Copy the Shaders

Copy lines.vert to objects.vert and lines.frag to objects.frag. We'll edit these later, but for now copying will let us build the code.

Speaking of building the code, go ahead and do so now. Your new code should build and run, though it won't do anything different yet.

A Static Vertex Buffer

Though streaming vertices from the CPU to the GPU is useful -- especially for transient debug information -- it's inefficient when the vertices being drawn don't actually change frame-to-frame. For example, if we're moving a camera through a scene made of objects that only change under easy-to-encode-in-a-matrix transformations, sending their vertex data every frame would be redundant and wasteful.

So, for our objects pipeline, instead of creating and uploading a vertex buffer every frame, we will compute a vertex buffer and upload it once at the start of the program.

Start by adding a data member to Tutorial to hold our static vertex buffer:

in Tutorial.hpp
//...
//-------------------------------------------------------------------
//static scene resources:

Helpers::AllocatedBuffer object_vertices;

//...

Add appropriate creation and destruction code to Tutorial's constructor and destructor:

in Tutorial.cpp
//in Tutorial::Tutorial:
	for (Workspace &workspace : workspaces) {
		//...
	}
	{ //create object vertices
		std::vector< PosColVertex > vertices;
		
		//TODO: replace with more interesting geometry
		//A single triangle:
		vertices.emplace_back(PosColVertex{
			.Position{ .x = 0.0f, .y = 0.0f, .z = 0.0f },
			.Color{ .r = 0xff, .g = 0xff, .b = 0xff, .a = 0xff  },
		});
		vertices.emplace_back(PosColVertex{
			.Position{ .x = 1.0f, .y = 0.0f, .z = 0.0f },
			.Color{ .r = 0xff, .g = 0x00, .b = 0x00, .a = 0xff },
		});
		vertices.emplace_back(PosColVertex{
			.Position{ .x = 0.0f, .y = 1.0f, .z = 0.0f },
			.Color{ .r = 0x00, .g = 0xff, .b = 0x00, .a = 0xff  },
		});

		size_t bytes = vertices.size() * sizeof(vertices[0]);

		object_vertices = rtg.helpers.create_buffer(
			bytes,
			VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
			VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
			Helpers::Unmapped
		);

		//copy data to buffer:
		rtg.helpers.transfer_to_buffer(vertices.data(), bytes, object_vertices);
	}
}

//in Tutorial::~Tutorial:
	rtg.helpers.destroy_buffer(std::move(object_vertices));

	if (swapchain_depth_image.handle != VK_NULL_HANDLE) {
		destroy_framebuffers();
	}
//...

Notice that we're using the Helpers::transfer_to_buffer function to upload the data outside of our rendering function. I wonder how that works? (Foreshadowing!)

It's only a single triangle, but let's draw it:

in Tutorial.cpp
//in Tutorial::render:
{ //draw with the lines pipeline:
	//...
}

{ //draw with the objects pipeline:
	vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, objects_pipeline.handle);

	{ //use object_vertices (offset 0) as vertex buffer binding 0:
		std::array< VkBuffer, 1 > vertex_buffers{ object_vertices.handle };
		std::array< VkDeviceSize, 1 > offsets{ 0 };
		vkCmdBindVertexBuffers(workspace.command_buffer, 0, uint32_t(vertex_buffers.size()), vertex_buffers.data(), offsets.data());
	}

	//Camera descriptor set is still bound(!)

	//draw all vertices:
	vkCmdDraw(workspace.command_buffer, uint32_t(object_vertices.size / sizeof(ObjectsPipeline::Vertex)), 1, 0, 0);
}

Notice two things about this new code. First, we didn't need to re-bind the camera descriptor set -- we can leave it bound because set 0 of the lines pipeline and set 0 of the objects pipeline use the same layout, so the binding remains compatible. Second, notice the somewhat awkward way we're computing the number of vertices to draw -- we'll fix this shortly.

For now, compile and run the code and you should have a fancy new triangle in your scene:

demo application showing some lines and a single triangle
Our new triangle is rendering in the scene.

Transferring Vertices without refsol::

Let's open up that mysterious Helpers::transfer_to_buffer command:

in Helpers.cpp
void Helpers::transfer_to_buffer(void *data, size_t size, AllocatedBuffer &target) {
	refsol::Helpers_transfer_to_buffer(rtg, data, size, &target);
}
You probably expected this.

More refsol:: code?! Well, I guess we'll need to replace that with our own code.

To start with, if we're going to run transfer commands on the GPU we're going to need a command buffer, and to make a command buffer we're going to need a command pool. So let's add data members to Helpers to hold those:

in Helpers.hpp
	//-----------------------
	//CPU -> GPU data transfer:

	// NOTE: synchronizes *hard* against the GPU; inefficient to use for streaming data!
	void transfer_to_buffer(void *data, size_t size, AllocatedBuffer &target);
	void transfer_to_image(void *data, size_t size, AllocatedImage &image); //NOTE: image layout after call is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

	VkCommandPool transfer_command_pool = VK_NULL_HANDLE;
	VkCommandBuffer transfer_command_buffer = VK_NULL_HANDLE;

	//-----------------------

And let's have Helpers create and destroy those in its create and destroy functions:

in Helpers.cpp
void Helpers::create() {
	VkCommandPoolCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
		.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
		.queueFamilyIndex = rtg.graphics_queue_family.value(),
	};
	VK( vkCreateCommandPool(rtg.device, &create_info, nullptr, &transfer_command_pool) );

	VkCommandBufferAllocateInfo alloc_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
		.commandPool = transfer_command_pool,
		.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
		.commandBufferCount = 1,
	};
	VK( vkAllocateCommandBuffers(rtg.device, &alloc_info, &transfer_command_buffer) );
}

void Helpers::destroy() {
	//technically not needed since freeing the pool will free all contained buffers:
	if (transfer_command_buffer != VK_NULL_HANDLE) {
		vkFreeCommandBuffers(rtg.device, transfer_command_pool, 1, &transfer_command_buffer);
		transfer_command_buffer = VK_NULL_HANDLE;
	}

	if (transfer_command_pool != VK_NULL_HANDLE) {
		vkDestroyCommandPool(rtg.device, transfer_command_pool, nullptr);
		transfer_command_pool = VK_NULL_HANDLE;
	}
}

Now that we've got a command buffer to work with, it's time to start on our transfer code. To begin with, we'll create a transfer source buffer and sketch out the rest of the transfer code:

in Helpers.cpp
void Helpers::transfer_to_buffer(void *data, size_t size, AllocatedBuffer &target) {
	//NOTE: could let this stick around and use it for all uploads, but this function isn't for performant transfers anyway:
	AllocatedBuffer transfer_src = create_buffer(
		size,
		VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
		VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
		Mapped
	);

	//TODO: copy data to transfer buffer

	//TODO: record CPU->GPU transfer to command buffer

	//TODO: run command buffer

	//TODO: wait for command buffer to finish

	//don't leak buffer memory:
	destroy_buffer(std::move(transfer_src));
}

Copying the data to the source buffer is a simple memcpy:

in Helpers.cpp
//copy data into transfer buffer:
std::memcpy(transfer_src.allocation.data(), data, size);

The command buffer recording looks like the first part of our rendering command buffer, but without a render pass or complicated synchronization commands:

in Helpers.cpp
{ //record command buffer that does CPU->GPU transfer:
	VK( vkResetCommandBuffer(transfer_command_buffer, 0) );

	VkCommandBufferBeginInfo begin_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
		.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, //will record again every submit
	};

	VK( vkBeginCommandBuffer(transfer_command_buffer, &begin_info) );

	VkBufferCopy copy_region{
		.srcOffset = 0,
		.dstOffset = 0,
		.size = size
	};
	vkCmdCopyBuffer(transfer_command_buffer, transfer_src.handle, target.handle, 1, &copy_region);

	VK( vkEndCommandBuffer(transfer_command_buffer) );
}

Running the command buffer is as simple as submitting it to the graphics queue:

in Helpers.cpp
{ //run command buffer
	VkSubmitInfo submit_info{
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.commandBufferCount = 1,
		.pCommandBuffers = &transfer_command_buffer
	};

	VK( vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, VK_NULL_HANDLE) );
}

And, to wait until the transfer is finished, we wait until the graphics queue is idle:

in Helpers.cpp
//wait for command buffer to finish
VK( vkQueueWaitIdle(rtg.graphics_queue) );

With that, everything should compile and run (and show the triangle); and your code is free of one more refsol:: call.
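
As an aside: vkQueueWaitIdle is the bluntest possible way to wait, since it stalls until everything submitted to the queue has finished. If this helper ever needed to be less disruptive, a fence would let us wait on just this one submission. A minimal sketch of that alternative (not part of the tutorial code; it assumes the same rtg, transfer_command_buffer, and VK macro used above):

//sketch: wait on a fence for this submission only, instead of idling the whole queue:
VkFenceCreateInfo fence_info{
	.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
};
VkFence transfer_done = VK_NULL_HANDLE;
VK( vkCreateFence(rtg.device, &fence_info, nullptr, &transfer_done) );

VkSubmitInfo submit_info{
	.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
	.commandBufferCount = 1,
	.pCommandBuffers = &transfer_command_buffer,
};
VK( vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, transfer_done) ); //fence signals when this submission finishes

//block only until *this* submission completes (not the whole queue):
VK( vkWaitForFences(rtg.device, 1, &transfer_done, VK_TRUE, UINT64_MAX) );
vkDestroyFence(rtg.device, transfer_done, nullptr);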

Normals and TexCoords, oh my!

Before we dive into making some fancier solid objects, let's update our vertex structure. A position and a color are fine for some simple lines, but for surfaces we want something fancier. In particular, if we want to compute lighting we need normals (surface orientations); and while we're at it, we might as well add texture coordinates so we can get sub-triangle-level color detail as well.

A PosNorTexVertex Structure

Make a header and cpp file for our new vertex type by copying PosColVertex.hpp to PosNorTexVertex.hpp and PosColVertex.cpp to PosNorTexVertex.cpp. Then edit the header as follows:

in PosNorTexVertex.hpp
//...
struct PosNorTexVertex {
	struct { float x,y,z; } Position;
	struct { float x,y,z; } Normal;
	struct { float s,t; } TexCoord;
	//a pipeline vertex input state that works with a buffer holding a PosNorTexVertex[] array:
	static const VkPipelineVertexInputStateCreateInfo array_input_state;
};

static_assert(sizeof(PosNorTexVertex) == 3*4 + 3*4 + 2*4, "PosNorTexVertex is packed.");

And update the cpp file to reflect the new layout of the vertex (and the new structure name):

in PosNorTexVertex.cpp
#include "PosNorTexVertex.hpp"

//...

static std::array< VkVertexInputBindingDescription, 1 > bindings{
	VkVertexInputBindingDescription{
		.binding = 0,
		.stride = sizeof(PosNorTexVertex),
		.inputRate = VK_VERTEX_INPUT_RATE_VERTEX,
	}
};


static std::array< VkVertexInputAttributeDescription, 3 > attributes{
	VkVertexInputAttributeDescription{
		.location = 0,
		.binding = 0,
		.format = VK_FORMAT_R32G32B32_SFLOAT,
		.offset = offsetof(PosNorTexVertex, Position),
	},
	VkVertexInputAttributeDescription{
		.location = 1,
		.binding = 0,
		.format = VK_FORMAT_R32G32B32_SFLOAT,
		.offset = offsetof(PosNorTexVertex, Normal),
	},
	VkVertexInputAttributeDescription{
		.location = 2,
		.binding = 0,
		.format = VK_FORMAT_R32G32_SFLOAT,
		.offset = offsetof(PosNorTexVertex, TexCoord),
	},
};

const VkPipelineVertexInputStateCreateInfo PosNorTexVertex::array_input_state{
	//...
};

Notice that we're putting Position at location 0, Normal at location 1, and TexCoord at location 2. We'll need to remember those for when we update our shader.

Now add to Maekfile.js so the new vertex type will be included in the build:

in Maekfile.js
const main_objs = [
	maek.CPP('Tutorial.cpp'),
	maek.CPP('PosColVertex.cpp'),
	maek.CPP('PosNorTexVertex.cpp'),
	maek.CPP('RTG.cpp'),
	maek.CPP('Helpers.cpp'),
	maek.CPP('main.cpp'),
];

Building and running at this point should work (but won't do anything different because our pipeline isn't using the new vertex type yet).

Updating the Pipeline

Updating the pipeline is surprisingly easy, thanks to the fact that we used using to make a local vertex definition:

in Tutorial.hpp
#pragma once

#include "PosColVertex.hpp"
#include "PosNorTexVertex.hpp"
//...
	struct ObjectsPipeline {
		//...
		using Vertex = PosNorTexVertex;
		//...
	} objects_pipeline;
//...

Running now will produce a validation warning about our vertex input state supplying an attribute that isn't consumed by the shader. Also, our triangle probably won't show up; and, in fact, our shader is probably reading past the end of the vertex buffer, because we're still trying to feed it from the older, smaller vertex format.

Updating the Vertex Buffer

Let's change the vertex buffer over to the new vertex format:

in Tutorial.cpp
{ //create object vertices
	std::vector< PosNorTexVertex > vertices;
		
	//TODO: replace with more interesting geometry
	//A single triangle:
	vertices.emplace_back(PosNorTexVertex{
		.Position{ .x = 0.0f, .y = 0.0f, .z = 0.0f },
		.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f },
		.TexCoord{ .s = 0.0f, .t = 0.0f },
	});
	vertices.emplace_back(PosNorTexVertex{
		.Position{ .x = 1.0f, .y = 0.0f, .z = 0.0f },
		.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f },
		.TexCoord{ .s = 1.0f, .t = 0.0f },
	});
	vertices.emplace_back(PosNorTexVertex{
		.Position{ .x = 0.0f, .y = 1.0f, .z = 0.0f },
		.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f },
		.TexCoord{ .s = 0.0f, .t = 1.0f },
	});

	size_t bytes = vertices.size() * sizeof(vertices[0]);
	//...
}

Note that I'm setting the texture coordinate to the \( (x,y) \) position so we can check later that it's coming through to the shader.

If you compile and run the code now, the triangle is back, but it's blue for some reason:

a blue triangle in our 3D scene.
We have skipped both "two" and "red" and proceeded, instead, directly from one triangle to blue triangle.

Update the shaders

Why is the triangle blue? That's because our shader is reading its Color attribute from location 1, which is now fed from the Normal member of our structure and is, therefore, always set to \( (0,0,1) \). (The alpha value still comes out as one because "short" attributes are expanded with values from \( (0,0,0,1) \) -- see Conversion to RGBA, as per Vertex Input Extraction.)

So let's fix that by getting our shaders re-written for the new vertex format. We'll have the vertex shader pass the position, normal, and texCoord onward to the fragment shader:

in objects.vert
//...
layout(location=0) in vec3 Position;
layout(location=1) in vec3 Normal;
layout(location=2) in vec2 TexCoord;

layout(location=0) out vec3 position;
layout(location=1) out vec3 normal;
layout(location=2) out vec2 texCoord;

void main() {
	gl_Position = CLIP_FROM_WORLD * vec4(Position, 1.0);
	position = Position;
	normal = Normal;
	texCoord = TexCoord;
}

And we'll update the fragment shader to accept these values and (for now) display the texCoord as a color:

in objects.frag
#version 450

layout(location=0) in vec3 position;
layout(location=1) in vec3 normal;
layout(location=2) in vec2 texCoord;

layout(location=0) out vec4 outColor;

void main() {
	outColor = vec4(fract(texCoord), 0.0, 1.0);
}

Note that I'm using fract on texCoord so it's easier to see where textures will repeat.

These updates get us back to seeing our triangle with a colorful gradient:

a red-to-green gradient triangle in our 3D scene.
The triangle now displays the texture coordinate attribute we've sent along with the vertices.

Meshes

Now that we've got a good vertex format, let's make a few meshes.

We're going to put the data for all of these meshes into the same vertex buffer, so let's go ahead and create a little wrapper to package the index information together for each particular mesh:

in Tutorial.hpp
	//-------------------------------------------------------------------
	//static scene resources:

	Helpers::AllocatedBuffer object_vertices;
	struct ObjectVertices {
		uint32_t first = 0;
		uint32_t count = 0;
	};
	ObjectVertices plane_vertices;
	ObjectVertices torus_vertices;

The ObjectVertices structure stores the index of the first vertex and the count of vertices (exactly the parameters used by vkCmdDraw, in fact) for each mesh whose vertices are stored in our object_vertices array. For now, that'll just be two meshes we generate with code, but one can imagine loading a whole library of meshes from disk into one vertex buffer and building a std::unordered_map< std::string, ObjectVertices > to track the location of each within the larger buffer. This would be a performant way of storing static vertex data for a general scene (saves on vertex buffer re-binding commands).
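
To illustrate that idea, here's a rough sketch of what such a mesh library could look like (illustration only -- append_vertices_from_disk is a hypothetical helper, not something this tutorial provides, and the relevant standard headers are assumed to be included):

//sketch: pack several named meshes into one vertex array and remember where each one landed:
std::unordered_map< std::string, ObjectVertices > mesh_library;
std::vector< PosNorTexVertex > vertices;

for (std::string name : {"plane", "torus", "rock", "tree"}) {
	ObjectVertices range;
	range.first = uint32_t(vertices.size());
	append_vertices_from_disk(name + ".mesh", &vertices); //hypothetical: appends PosNorTexVertex data
	range.count = uint32_t(vertices.size()) - range.first;
	mesh_library.emplace(name, range);
}
//...then upload 'vertices' once, exactly as in the "create object vertices" block,
//and look up (e.g.) mesh_library.at("torus") when recording draws.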

Let's go ahead and build those meshes:

in Tutorial.cpp
{ //create object vertices
	std::vector< PosNorTexVertex > vertices;
		
	//TODO: replace with more interesting geometry
	{ //A [-1,1]x[-1,1]x{0} quadrilateral:
		plane_vertices.first = uint32_t(vertices.size());
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = -1.0f, .y = -1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f },
			.TexCoord{ .s = 0.0f, .t = 0.0f },
		});
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = 1.0f, .y = -1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f},
			.TexCoord{ .s = 1.0f, .t = 0.0f },
		});
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = -1.0f, .y = 1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f},
			.TexCoord{ .s = 0.0f, .t = 1.0f },
		});
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = 1.0f, .y = 1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f },
			.TexCoord{ .s = 1.0f, .t = 1.0f },
		});
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = -1.0f, .y = 1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f},
			.TexCoord{ .s = 0.0f, .t = 1.0f },
		});
		vertices.emplace_back(PosNorTexVertex{
			.Position{ .x = 1.0f, .y = -1.0f, .z = 0.0f },
			.Normal{ .x = 0.0f, .y = 0.0f, .z = 1.0f},
			.TexCoord{ .s = 1.0f, .t = 0.0f },
		});

		plane_vertices.count = uint32_t(vertices.size()) - plane_vertices.first;
	}

	{ //A torus:
		torus_vertices.first = uint32_t(vertices.size());

		//TODO: torus!

		torus_vertices.count = uint32_t(vertices.size()) - torus_vertices.first;
	}

	//...
}

And we can go ahead and generate a torus by writing loops to iterate around the major and minor angles. Note the use of a helper function to avoid writing the vertex computation in more than one place:

in Tutorial.cpp
{ //A torus:
	torus_vertices.first = uint32_t(vertices.size());

	//TODO: torus!
	//will parameterize with (u,v) where:
	// - u is angle around main axis (+z)
	// - v is angle around the tube

	constexpr float R1 = 0.75f; //main radius
	constexpr float R2 = 0.15f; //tube radius

	constexpr uint32_t U_STEPS = 20;
	constexpr uint32_t V_STEPS = 16;

	//texture repeats around the torus:
	constexpr float V_REPEATS = 2.0f;
	constexpr float U_REPEATS = std::ceil(V_REPEATS / R2 * R1);

	auto emplace_vertex = [&](uint32_t ui, uint32_t vi) {
		//convert steps to angles:
		// (doing the mod since trig on 2 M_PI may not exactly match 0)
		float ua = (ui % U_STEPS) / float(U_STEPS) * 2.0f * float(M_PI);
		float va = (vi % V_STEPS) / float(V_STEPS) * 2.0f * float(M_PI);

		vertices.emplace_back( PosNorTexVertex{
			.Position{
				.x = (R1 + R2 * std::cos(va)) * std::cos(ua),
				.y = (R1 + R2 * std::cos(va)) * std::sin(ua),
				.z = R2 * std::sin(va),
			},
			.Normal{
				.x = std::cos(va) * std::cos(ua),
				.y = std::cos(va) * std::sin(ua),
				.z = std::sin(va),
			},
			.TexCoord{
				.s = ui / float(U_STEPS) * U_REPEATS,
				.t = vi / float(V_STEPS) * V_REPEATS,
			},
		});
	};

	for (uint32_t ui = 0; ui < U_STEPS; ++ui) {
		for (uint32_t vi = 0; vi < V_STEPS; ++vi) {
			emplace_vertex(ui, vi);
			emplace_vertex(ui+1, vi);
			emplace_vertex(ui, vi+1);

			emplace_vertex(ui, vi+1);
			emplace_vertex(ui+1, vi);
			emplace_vertex(ui+1, vi+1);
		}
	}

	torus_vertices.count = uint32_t(vertices.size()) - torus_vertices.first;
}

Compiling and running now, you get to see all of the geometry stacked up together in one spot:

our scene with all meshes drawn at the origin
Our plane and torus, hanging out together at the origin.

Lighting (briefly)

Since we went to all the trouble to define a vertex normal, we might as well use it for something. In your fragment shader, normalize the interpolated vertex normal and then use the basic hemisphere light equation to shade your meshes:

in objects.frag
void main() {
	vec3 n = normalize(normal);
	vec3 l = vec3(0.0, 0.0, 1.0);
	vec3 albedo = vec3(fract(texCoord), 0.0);

	//hemisphere lighting from direction l:
	vec3 e = vec3(0.5 * dot(n,l) + 0.5);

	outColor = vec4(e * albedo, 1.0);
}

It's actually relatively hard to see the shading with the albedo set to the texture coordinate, but if you set the albedo to all 1's (i.e., albedo = vec3(1.0)), you get this:

Meshes, now with hemisphere lighting.
The meshes after modifying the fragment shader to do lighting (and, for the purposes of this screenshot, setting albedo = vec3(1.0)).

Objects

To get our geometry unstacked, we need a way of positioning individual instances of our meshes in the scene.

We are going to do this by sending transformation matrices to our vertex shader, and moving the objects by these matrices. But how to send the matrices to the shader? We've already used push constants, and we've already used uniform blocks, so we're going to try something a bit different: storage buffers.

A Storage Buffer in the Vertex Shader

Storage buffers can be both read from and written to in shaders (though our code will only read from them). Like uniform blocks, they hold global data for the shader; but storage buffers are accessed through a cache hierarchy (instead of from fast local memory) and -- thus -- while slightly slower, can hold much, much more data than uniforms. (GPUs are allowed to provide as little as 16 KB of uniforms, while storage buffers can be allocated up to the size of device memory. If each object needs two 4x4 float matrices, a uniform buffer might only be able to hold 128 object transforms.)
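
To put numbers on that claim (using Vulkan's guaranteed minimum maxUniformBufferRange of 16384 bytes; our actual Transform structure below uses three matrices, so the limit is even tighter):

//back-of-the-envelope comparison (a sketch, not part of the tutorial code):
constexpr size_t mat4_bytes = 16 * sizeof(float);   //64 bytes
constexpr size_t per_object = 2 * mat4_bytes;       //128 bytes for two matrices
static_assert(16384 / per_object == 128, "a minimum-size uniform range holds only 128 transforms");
static_assert(134217728 / per_object == 1048576, "a 128 MB storage buffer holds over a million");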

Because storage buffers can be so big, we can send a whole batch of matrices to the GPU at once and have each draw read only the ones it needs. This will save us descriptor set binds later. (And it also gives us a good excuse to set up a different kind of descriptor than we've used before.)

This is how we declare a storage buffer in our vertex shader (and, while we're at it, get rid of the camera uniform; we don't need that any more; it's baked into our per-object CLIP_FROM_LOCAL matrix):

in objects.vert
//the old Camera uniform block is gone -- we don't need CLIP_FROM_WORLD any more; it's baked into each object's CLIP_FROM_LOCAL

struct Transform {
	mat4 CLIP_FROM_LOCAL;
	mat4 WORLD_FROM_LOCAL;
	mat4 WORLD_FROM_LOCAL_NORMAL;
};

layout(set=1, binding=0, std140) readonly buffer Transforms {
	Transform TRANSFORMS[];
};

layout(location=0) in vec3 Position;
layout(location=1) in vec3 Normal;
layout(location=2) in vec2 TexCoord;

Particularly, the set of transforms we'll package for each object will be the transformation that gets to clip space directly from the object's local space -- CLIP_FROM_LOCAL -- and transformations that get to world space (by which I really mean: "the space we'll do lighting computations in") from local positions -- WORLD_FROM_LOCAL -- and normals -- WORLD_FROM_LOCAL_NORMAL.

And we might as well go ahead and actually use the transforms in the shader as well:

in objects.vert
void main() {
	gl_Position = TRANSFORMS[gl_InstanceIndex].CLIP_FROM_LOCAL * vec4(Position, 1.0);
	position = mat4x3(TRANSFORMS[gl_InstanceIndex].WORLD_FROM_LOCAL) * vec4(Position, 1.0);
	normal = mat3(TRANSFORMS[gl_InstanceIndex].WORLD_FROM_LOCAL_NORMAL) * Normal;
	texCoord = TexCoord;
}

As you can see from how we actually use the transforms, it would have been more efficient to make WORLD_FROM_LOCAL a mat4x3 and WORLD_FROM_LOCAL_NORMAL a mat3. However, that would slightly complicate our data assembly on the CPU side so we don't.

Note, also, the use of gl_InstanceIndex here -- this is set by the last parameter of vkCmdDraw and is a sneaky way of getting a 32-bit index into your shader without using a push constant. (At least it's sneaky if you aren't drawing more than one instance; if you are using instanced rendering then it's just the expected way of getting an index into the vertex shader.)

A Storage Buffer Descriptor

Hmm, the shader is accessing memory. You know what this means: we need a buffer to hold the data the shader is accessing, a descriptor to point to that buffer, a descriptor set to hold the descriptor, and a descriptor set layout to describe the type of the descriptor set.

Let's start with the descriptor set layout:

in Tutorial.hpp
struct ObjectsPipeline {
	//descriptor set layouts:
	//VkDescriptorSetLayout set0_Camera = VK_NULL_HANDLE; //<-- we'll get back to set0
	VkDescriptorSetLayout set1_Transforms = VK_NULL_HANDLE;

	//types for descriptors:
	using Camera = LinesPipeline::Camera;
	struct Transform {
		mat4 CLIP_FROM_LOCAL;
		mat4 WORLD_FROM_LOCAL;
		mat4 WORLD_FROM_LOCAL_NORMAL;
	};
	static_assert(sizeof(Transform) == 16*4 + 16*4 + 16*4, "Transform is the expected size.");

	//no push constants
	//...
} objects_pipeline;

Which, of course, we still need to properly create and destroy:

in Tutorial-ObjectsPipeline.cpp
// in Tutorial::ObjectsPipeline::create :
	{ //the set1_Transforms layout holds an array of Transform structures in a storage buffer used in the vertex shader:
		std::array< VkDescriptorSetLayoutBinding, 1 > bindings{
			VkDescriptorSetLayoutBinding{
				.binding = 0,
				.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
				.descriptorCount = 1,
				.stageFlags = VK_SHADER_STAGE_VERTEX_BIT
			},
		};
		
		VkDescriptorSetLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
			.bindingCount = uint32_t(bindings.size()),
			.pBindings = bindings.data(),
		};

		VK( vkCreateDescriptorSetLayout(rtg.device, &create_info, nullptr, &set1_Transforms) );
	}

	{ //create pipeline layout:
		std::array< VkDescriptorSetLayout, 2 > layouts{
			set1_Transforms, //we'd like to say "VK_NULL_HANDLE" here (set 0 is unused), but that's not valid without an extension
			set1_Transforms,
		};
		//...
	}

//...

// in Tutorial::ObjectsPipeline::destroy :
	if (set1_Transforms != VK_NULL_HANDLE) {
		vkDestroyDescriptorSetLayout(rtg.device, set1_Transforms, nullptr);
		set1_Transforms = VK_NULL_HANDLE;
	}

Then we'll need a descriptor set and a buffer to point it at. We'll stream the transformations per-frame, so let's go ahead and define these in Workspace:

in Tutorial.hpp
struct Workspace {
	//...

	//location for ObjectsPipeline::Transforms data: (streamed to GPU per-frame)
	Helpers::AllocatedBuffer Transforms_src; //host coherent; mapped
	Helpers::AllocatedBuffer Transforms; //device-local
	VkDescriptorSet Transforms_descriptors; //references Transforms
};

Taking a cue from what we did with the lines data, we'll dynamically re-allocate the buffers in our render function as needed. But we should still write allocation code for the descriptor set (which also will require us to adjust the limits on our descriptor pool):

in Tutorial.cpp
//in Tutorial::Tutorial

	{ //create descriptor pool:
		uint32_t per_workspace = uint32_t(rtg.workspaces.size()); //for easier-to-read counting

		std::array< VkDescriptorPoolSize, 2> pool_sizes{
			//we need uniform buffer descriptors (for the Camera) and storage buffer descriptors (for the Transforms):
			VkDescriptorPoolSize{
				.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
				.descriptorCount = 1 * per_workspace, //one descriptor per set, one set per workspace
			},
			VkDescriptorPoolSize{
				.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
				.descriptorCount = 1 * per_workspace, //one descriptor per set, one set per workspace
			},
		};
		
		VkDescriptorPoolCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
			.flags = 0, //because CREATE_FREE_DESCRIPTOR_SET_BIT isn't included, *can't* free individual descriptors allocated from this pool
			.maxSets = 2 * per_workspace, //two sets per workspace
			.poolSizeCount = uint32_t(pool_sizes.size()),
			.pPoolSizes = pool_sizes.data(),
		};

		VK( vkCreateDescriptorPool(rtg.device, &create_info, nullptr, &descriptor_pool) );
	}
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		refsol::Tutorial_constructor_workspace(rtg, command_pool, &workspace.command_buffer);

		//...
		{ //allocate descriptor set for Camera descriptor
			//...
		}

		{ //allocate descriptor set for Transforms descriptor
			VkDescriptorSetAllocateInfo alloc_info{
				.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
				.descriptorPool = descriptor_pool,
				.descriptorSetCount = 1,
				.pSetLayouts = &objects_pipeline.set1_Transforms,
			};

			VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &workspace.Transforms_descriptors) );
			//NOTE: will fill in this descriptor set in render when buffers are [re-]allocated
		}

And, of course, clean-up code for everything:

in Tutorial.cpp
//in Tutorial::~Tutorial
	for (Workspace &workspace : workspaces) {
		refsol::Tutorial_destructor_workspace(rtg, command_pool, &workspace.command_buffer);

		//...

		if (workspace.Transforms_src.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.Transforms_src));
		}
		if (workspace.Transforms.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.Transforms));
		}
		//Transforms_descriptors freed when pool is destroyed.
	}

Transforms and Objects

How should we fill the transforms buffer? Let's make a CPU-side list of objects to draw -- storing the vertex range and transform for each. We can fill it up in update and copy the transforms portion to the GPU in render.

First, the structure:

in Tutorial.hpp
	//--------------------------------------------------------------------
	//Resources that change when time passes or the user interacts:

	//...

	std::vector< LinesPipeline::Vertex > lines_vertices;

	struct ObjectInstance {
		ObjectVertices vertices;
		ObjectsPipeline::Transform transform;
	};
	std::vector< ObjectInstance > object_instances;

Now some test transformations:

in Tutorial.cpp
//in Tutorial::update
	{ //make some crossing lines at different depths:
		//...
	}

	{ //make some objects:
		object_instances.clear();

		{ //plane translated +x by one unit:
			mat4 WORLD_FROM_LOCAL{
				1.0f, 0.0f, 0.0f, 0.0f,
				0.0f, 1.0f, 0.0f, 0.0f,
				0.0f, 0.0f, 1.0f, 0.0f,
				1.0f, 0.0f, 0.0f, 1.0f,
			};

			object_instances.emplace_back(ObjectInstance{
				.vertices = plane_vertices,
				.transform{
					.CLIP_FROM_LOCAL = CLIP_FROM_WORLD * WORLD_FROM_LOCAL,
					.WORLD_FROM_LOCAL = WORLD_FROM_LOCAL,
					.WORLD_FROM_LOCAL_NORMAL = WORLD_FROM_LOCAL,
				},
			});
		}
		{ //torus translated -x by one unit and rotated CCW around +y:
			float ang = time / 60.0f * 2.0f * float(M_PI) * 10.0f;
			float ca = std::cos(ang);
			float sa = std::sin(ang);
			mat4 WORLD_FROM_LOCAL{
				  ca, 0.0f,  -sa, 0.0f,
				0.0f, 1.0f, 0.0f, 0.0f,
				  sa, 0.0f,   ca, 0.0f,
				-1.0f,0.0f, 0.0f, 1.0f,
			};

			object_instances.emplace_back(ObjectInstance{
				.vertices = torus_vertices,
				.transform{
					.CLIP_FROM_LOCAL = CLIP_FROM_WORLD * WORLD_FROM_LOCAL,
					.WORLD_FROM_LOCAL = WORLD_FROM_LOCAL,
					.WORLD_FROM_LOCAL_NORMAL = WORLD_FROM_LOCAL,
				},
			});
		}
	}

Note that to properly transform normal vectors, the upper left 3x3 of WORLD_FROM_LOCAL_NORMAL (i.e., the only part of the matrix our shader uses) should be the inverse transpose of the upper left 3x3 of WORLD_FROM_LOCAL. However, since our matrices are orthonormal, the inverse transpose is simply the matrix itself. (And, thus, we avoid having to write a matrix inverse helper function in our library code.)
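
If you ever do add non-uniform scale, the upper-left 3x3 is no longer orthonormal and you would need a real inverse transpose. As a reference, here is one way to compute it (a standalone sketch with its own little vector type, not part of the tutorial's math code; it assumes the matrix is invertible):

//sketch: the normal matrix (inverse transpose of the upper-left 3x3 of WORLD_FROM_LOCAL).
//For columns c0,c1,c2 of that 3x3, the result's columns are cross products divided by the determinant:
struct V3 { float x, y, z; };

static V3 cross(V3 a, V3 b) { return V3{ a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(V3 a, V3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static V3 scale(V3 a, float s) { return V3{ a.x*s, a.y*s, a.z*s }; }

static void normal_matrix(V3 c0, V3 c1, V3 c2, V3 *n0, V3 *n1, V3 *n2) {
	V3 r0 = cross(c1, c2);
	V3 r1 = cross(c2, c0);
	V3 r2 = cross(c0, c1);
	float inv_det = 1.0f / dot(c0, r0); //determinant of the 3x3 (assumed non-zero)
	*n0 = scale(r0, inv_det);
	*n1 = scale(r1, inv_det);
	*n2 = scale(r2, inv_det);
}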

Next, we have to actually get the transform data into a buffer on the GPU. To do this we can just copy exactly what we did with lines_vertices[_src] and make a few changes to the names of things (and usage flags):

in Tutorial.cpp
//in Tutorial::render
	{ //upload camera info:
		//...
	}

	if (!object_instances.empty()) { //upload object transforms:
		size_t needed_bytes = object_instances.size() * sizeof(ObjectsPipeline::Transform);
		if (workspace.Transforms_src.handle == VK_NULL_HANDLE || workspace.Transforms_src.size < needed_bytes) {
			//round to next multiple of 4k to avoid re-allocating continuously if the transform count grows slowly:
			size_t new_bytes = ((needed_bytes + 4096) / 4096) * 4096;
			if (workspace.Transforms_src.handle) {
				rtg.helpers.destroy_buffer(std::move(workspace.Transforms_src));
			}
			if (workspace.Transforms.handle) {
				rtg.helpers.destroy_buffer(std::move(workspace.Transforms));
			}
			workspace.Transforms_src = rtg.helpers.create_buffer(
				new_bytes,
				VK_BUFFER_USAGE_TRANSFER_SRC_BIT, //going to have GPU copy from this memory
				VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, //host-visible memory, coherent (no special sync needed)
				Helpers::Mapped //get a pointer to the memory
			);
			workspace.Transforms = rtg.helpers.create_buffer(
				new_bytes,
				VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, //going to use as a storage buffer, and also have the GPU copy into this memory
				VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //GPU-local memory
				Helpers::Unmapped //don't get a pointer to the memory
			);

			//update the descriptor set:
			VkDescriptorBufferInfo Transforms_info{
				.buffer = workspace.Transforms.handle,
				.offset = 0,
				.range = workspace.Transforms.size,
			};

			std::array< VkWriteDescriptorSet, 1 > writes{
				VkWriteDescriptorSet{
					.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
					.dstSet = workspace.Transforms_descriptors,
					.dstBinding = 0,
					.dstArrayElement = 0,
					.descriptorCount = 1,
					.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
					.pBufferInfo = &Transforms_info,
				},
			};

			vkUpdateDescriptorSets(
				rtg.device,
				uint32_t(writes.size()), writes.data(), //descriptorWrites count, data
				0, nullptr //descriptorCopies count, data
			);

			std::cout << "Re-allocated object transforms buffers to " << new_bytes << " bytes." << std::endl;
		}

		assert(workspace.Transforms_src.size == workspace.Transforms.size);
		assert(workspace.Transforms_src.size >= needed_bytes);

		{ //copy transforms into Transforms_src:
			assert(workspace.Transforms_src.allocation.mapped);
			ObjectsPipeline::Transform *out = reinterpret_cast< ObjectsPipeline::Transform * >(workspace.Transforms_src.allocation.data()); // Strict aliasing violation, but it doesn't matter
			for (ObjectInstance const &inst : object_instances) {
				*out = inst.transform;
				++out;
			}
		}

		//device-side copy from Transforms_src -> Transforms:
		VkBufferCopy copy_region{
			.srcOffset = 0,
			.dstOffset = 0,
			.size = needed_bytes,
		};
		vkCmdCopyBuffer(workspace.command_buffer, workspace.Transforms_src.handle, workspace.Transforms.handle, 1, &copy_region);
	}
Note that changes are indicated assuming you copied the body of the block from the lines vertices upload code.

Two interesting changes in this block. First, notice that the list of transforms is built directly into the mapped transforms source memory, avoiding any additional copies. Second, notice that a descriptor set write is included so that the descriptor set stays up to date with the re-allocated buffer.

Right, let's (finally) retrofit our drawing code to bind the descriptor set as set 1 and draw each object instance with a proper instance ID:

in Tutorial.cpp
//in Tutorial::render
if (!object_instances.empty()) { //draw with the objects pipeline:
	vkCmdBindPipeline(workspace.command_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS, objects_pipeline.handle);

	{ //use object_vertices (offset 0) as vertex buffer binding 0:
		//...
	}

	{ //bind Transforms descriptor set:
		std::array< VkDescriptorSet, 1 > descriptor_sets{
			workspace.Transforms_descriptors, //1: Transforms
		};
		vkCmdBindDescriptorSets(
			workspace.command_buffer, //command buffer
			VK_PIPELINE_BIND_POINT_GRAPHICS, //pipeline bind point
			objects_pipeline.layout, //pipeline layout
			1, //first set
			uint32_t(descriptor_sets.size()), descriptor_sets.data(), //descriptor sets count, ptr
			0, nullptr //dynamic offsets count, ptr
		);
	}

	//Camera descriptor set is still bound, but unused(!)

	//draw all instances:
	for (ObjectInstance const &inst : object_instances) {
		uint32_t index = uint32_t(&inst - &object_instances[0]);
		vkCmdDraw(workspace.command_buffer, inst.vertices.count, 1, inst.vertices.first, index);
	}
}

Notice how we used the firstSet parameter to vkCmdBindDescriptorSets to make sure our descriptor set got bound as set 1, not set 0.

With all this done, compiling and running should produce no validation errors or warnings, and will display the plane and torus, with the torus spinning:

a scene with a spinning torus and adjacent plane
Our scene, now with instanced objects moving the way we want.

Some quick testing on a system with an AMD Ryzen 7950X CPU and NVIDIA GeForce RTX 3080 GPU (with debug turned on, running under Linux) suggests that this method of drawing allows pushing something like 64,000 torus instances at 60fps; and upwards of 125,000 torus instances with the added optimization of using a single draw call to draw all instances of the same mesh (by setting instanceCount to the actual count of instances!). It even maintains ~15fps on 700,000+ instances.
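
That single-draw-call optimization isn't part of the tutorial code, but the idea is straightforward: if object_instances is kept sorted so that instances sharing a mesh are adjacent (and their transforms are therefore contiguous in the Transforms buffer), each run can be drawn with one call. A sketch, assuming that grouping:

//sketch: one vkCmdDraw per run of instances that share the same mesh (assumes object_instances is grouped by mesh):
uint32_t first = 0;
while (first < uint32_t(object_instances.size())) {
	ObjectVertices const &vertices = object_instances[first].vertices;
	uint32_t last = first + 1;
	while (last < uint32_t(object_instances.size())
	    && object_instances[last].vertices.first == vertices.first
	    && object_instances[last].vertices.count == vertices.count) {
		++last;
	}
	//gl_InstanceIndex will run over [first, last), indexing the matching transforms:
	vkCmdDraw(workspace.command_buffer, vertices.count, last - first, vertices.first, first);
	first = last;
}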

Textures

We've been pushing texture coordinates around for a while now; let's actually use them to draw a texture.

Sampling a Texture in a Shader

Let's start at the place our texture is used: the fragment shader. Edit objects.frag to include a sampler2D uniform and to read from it:

in objects.frag
#version 450

layout(set=2,binding=0) uniform sampler2D TEXTURE;

layout(location=0) in vec3 position;
//...
	vec3 albedo = texture(TEXTURE, texCoord).rgb;
//...
Adding a 2D texture and reading the albedo from that texture.

Attaching an Image to a sampler2D

Just like any other uniform, we need to bind a descriptor to tell the Vulkan driver where TEXTURE should point. Unlike uniform blocks or storage buffers, a sampler2D is an "opaque descriptor", which means we direct it to image data (along with some sampler state), rather than a block of memory in a known format.

We'd like our application to support many different textures, with the texture choosable per object draw. So let's go ahead and make data structures to hold the texture images along with everything else we need to make the descriptors:

in Tutorial.hpp
	//-------------------------------------------------------------------
	//static scene resources:

	Helpers::AllocatedBuffer object_vertices;
	struct ObjectVertices {
		uint32_t first = 0;
		uint32_t count = 0;
	};
	ObjectVertices plane_vertices;
	ObjectVertices torus_vertices;

	std::vector< Helpers::AllocatedImage > textures;
	std::vector< VkImageView > texture_views;
	VkSampler texture_sampler = VK_NULL_HANDLE;
	VkDescriptorPool texture_descriptor_pool = VK_NULL_HANDLE;
	std::vector< VkDescriptorSet > texture_descriptors; //allocated from texture_descriptor_pool

In the code we just wrote, textures holds handles to the actual image data, texture_views are references to portions of the textures (in this case: the whole texture), texture_sampler gives the sampler state (wrapping, interpolation, etc.) used when reading from the textures, texture_descriptor_pool is the pool from which we allocate texture descriptor sets, and, finally, texture_descriptors holds a descriptor set for each of our textures.

The reason we have a separate descriptor pool for just the textures here is so you could -- conceivably -- re-allocate the pool if your code loaded a texture in the middle of rendering frames, without disturbing any of the other descriptors used by our code. It also makes our texture creation code a bit more self-contained.

And we'll add an index to our ObjectInstance structure to indicate which texture descriptor to bind when drawing each instance:

in Tutorial.hpp
	struct ObjectInstance {
		ObjectVertices vertices;
		ObjectsPipeline::Transform transform;
		uint32_t texture = 0;
	};

Now we update the drawing code to bind the correct descriptor set:

in Tutorial.cpp
//draw all instances:
for (ObjectInstance const &inst : object_instances) {
	uint32_t index = uint32_t(&inst - &object_instances[0]);

	//bind texture descriptor set:
	vkCmdBindDescriptorSets(
		workspace.command_buffer, //command buffer
		VK_PIPELINE_BIND_POINT_GRAPHICS, //pipeline bind point
		objects_pipeline.layout, //pipeline layout
		2, //firstSet: bind as set number 2
		1, &texture_descriptors[inst.texture], //descriptor sets count, ptr
		0, nullptr //dynamic offsets count, ptr
	);

	vkCmdDraw(workspace.command_buffer, inst.vertices.count, 1, inst.vertices.first, index);
}

Since this code will access past-the-end of the texture descriptors array if we compile and run now, it's probably a good idea for us to actually make some texture descriptors.

Making Some Texture Descriptors (Descriptor Set Layout)

To make a descriptor set, we first need a descriptor set layout. So let's update our ObjectsPipeline structure definition:

in Tutorial.hpp
struct ObjectsPipeline {
	//descriptor set layouts:
	//VkDescriptorSetLayout set0_Camera = VK_NULL_HANDLE; //<-- we'll get back to set0
	VkDescriptorSetLayout set1_Transforms = VK_NULL_HANDLE;
	VkDescriptorSetLayout set2_TEXTURE = VK_NULL_HANDLE;

	//...
} objects_pipeline;
Yes, we will still get back to set zero.

And we can also write the code to create the descriptor set layout and add it to the pipeline layout:

in Tutorial-ObjectsPipeline.cpp
//in ObjectsPipeline::create
	{ //the set1_Transforms layout holds an array of Transform structures in a storage buffer used in the vertex shader:
		//...
	}

	{ //the set2_TEXTURE layout has a single descriptor for a sampler2D used in the fragment shader:
		std::array< VkDescriptorSetLayoutBinding, 1 > bindings{
			VkDescriptorSetLayoutBinding{
				.binding = 0,
				.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
				.descriptorCount = 1,
				.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT
			},
		};
		
		VkDescriptorSetLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
			.bindingCount = uint32_t(bindings.size()),
			.pBindings = bindings.data(),
		};

		VK( vkCreateDescriptorSetLayout(rtg.device, &create_info, nullptr, &set2_TEXTURE) );
	}

	{ //create pipeline layout:
		std::array< VkDescriptorSetLayout, 3 > layouts{
			set1_Transforms, //we'd like to say VK_NULL_HANDLE here, but that's not valid without an extension
			set1_Transforms,
			set2_TEXTURE,
		};

Notice that the type of the descriptor in this set is VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, because a GLSL sampler2D references both an image and the parameters for how to sample from that image. If we wanted to have a separate descriptor just for how to sample from an image, we'd use a VK_DESCRIPTOR_TYPE_SAMPLER descriptor and a sampler-type uniform in GLSL. If we wanted to have a separate descriptor just for an image that could be sampled from, we'd use a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE descriptor and the type texture2D in GLSL. (Note that these split sampler/texture types only exist in GLSL meant to be compiled to Vulkan, and are not available in OpenGL GLSL.)
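
For comparison, a layout that split them apart would just declare two bindings; a sketch of what that would look like (we won't use this in the tutorial):

//sketch: separate sampler and sampled-image bindings instead of one combined binding:
std::array< VkDescriptorSetLayoutBinding, 2 > bindings{
	VkDescriptorSetLayoutBinding{
		.binding = 0,
		.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLER, //'uniform sampler SMP;' in GLSL
		.descriptorCount = 1,
		.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
	},
	VkDescriptorSetLayoutBinding{
		.binding = 1,
		.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, //'uniform texture2D TEX;' in GLSL
		.descriptorCount = 1,
		.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
	},
};
//the fragment shader would then combine them at sampling time: texture(sampler2D(TEX, SMP), texCoord)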

Let's not forget to destroy our descriptor set layout:

in Tutorial-ObjectsPipeline.cpp
void Tutorial::ObjectsPipeline::destroy(RTG &rtg) {
	if (set2_TEXTURE != VK_NULL_HANDLE) {
		vkDestroyDescriptorSetLayout(rtg.device, set2_TEXTURE, nullptr);
		set2_TEXTURE = VK_NULL_HANDLE;
	}

//...

Actually Making Some Texture Descriptors

Now that we've got a descriptor set layout for our texture descriptors, we can actually write the code that makes them. But, hey, why don't we write the clean-up code first:

in Tutorial.cpp
Tutorial::~Tutorial() {
	//just in case rendering is still in flight, don't destroy resources:
	//(not using VK macro to avoid throw-ing in destructor)
	if (VkResult result = vkDeviceWaitIdle(rtg.device); result != VK_SUCCESS) {
		std::cerr << "Failed to vkDeviceWaitIdle in Tutorial::~Tutorial [" << string_VkResult(result) << "]; continuing anyway." << std::endl;
	}

	if (texture_descriptor_pool) {
		vkDestroyDescriptorPool(rtg.device, texture_descriptor_pool, nullptr);
		texture_descriptor_pool = VK_NULL_HANDLE;

		//this also frees the descriptor sets allocated from the pool:
		texture_descriptors.clear();
	}

	if (texture_sampler) {
		vkDestroySampler(rtg.device, texture_sampler, nullptr);
		texture_sampler = VK_NULL_HANDLE;
	}

	for (VkImageView &view : texture_views) {
		vkDestroyImageView(rtg.device, view, nullptr);
		view = VK_NULL_HANDLE;
	}
	texture_views.clear();

	for (auto &texture : textures) {
		rtg.helpers.destroy_image(std::move(texture));
	}
	textures.clear();

	//...
}

Now we'll start with a general plan for our creation code, at the end of Tutorial::Tutorial:

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...

	{ //TODO: make some textures
	}

	{ //TODO: make image views for the textures
	}

	{ //TODO: make a sampler for the textures
	}
		
	{ //TODO: create the texture descriptor pool
	}

	{ //TODO: allocate and write the texture descriptor sets
	}
}

In keeping with the theme of our tutorial, we'll fill in this plan backwards. Once everything else is created, all we need to do is allocate the descriptor sets and then do descriptor writes for each texture into its descriptor set.

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...

	{ //allocate and write the texture descriptor sets

		//allocate the descriptors (using the same alloc_info):
		VkDescriptorSetAllocateInfo alloc_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
			.descriptorPool = texture_descriptor_pool,
			.descriptorSetCount = 1,
			.pSetLayouts = &objects_pipeline.set2_TEXTURE,
		};
		texture_descriptors.assign(textures.size(), VK_NULL_HANDLE);
		for (VkDescriptorSet &descriptor_set : texture_descriptors) {
			VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &descriptor_set) );
		}

		//TODO: write descriptors for textures
	}
}

This is the same way we've allocated descriptor sets in the past, with the added twist that we just re-use the exact same alloc_info repeatedly.

The descriptor set writes, however, are a bit more complicated than we've seen previously, because we're going to do them in a batch, but we need a different VkDescriptorImageInfo per-write. We don't want the addresses of the image info structures to change so we pre-size the vector that holds them (we could also have reserve'd enough space and emplace_back'd the structures, but I figured this made things clearer).

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...

	{ //allocate and write the texture descriptor sets
		//...

		//write descriptors for textures:
		std::vector< VkDescriptorImageInfo > infos(textures.size());
		std::vector< VkWriteDescriptorSet > writes(textures.size());

		for (Helpers::AllocatedImage const &image : textures) {
			size_t i = &image - &textures[0];
			
			infos[i] = VkDescriptorImageInfo{
				.sampler = texture_sampler,
				.imageView = texture_views[i],
				.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
			};
			writes[i] = VkWriteDescriptorSet{
				.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
				.dstSet = texture_descriptors[i],
				.dstBinding = 0,
				.dstArrayElement = 0,
				.descriptorCount = 1,
				.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
				.pImageInfo = &infos[i],
			};
		}

		vkUpdateDescriptorSets( rtg.device, uint32_t(writes.size()), writes.data(), 0, nullptr );
	}
}

Note that part of the image info for a descriptor is the layout we are promising the image will be in when the descriptor is used. Image layouts are the way that Vulkan talks about how an image is organized in memory. We talked about this a bit back when making a render pass; and we're going to talk about it a bit more in the rest of this section, since a big part of dealing with textures in Vulkan is making sure they undergo the right layout transitions to be in the state we need them in when, e.g., sampling from them in a fragment shader.

For now, make a mental note that we had better make sure our textures are in the layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL by the time our fragment shader runs.
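
To give a flavor of what a layout transition looks like, here is a sketch of the kind of barrier a transfer helper can record after copying into an image and before the image is sampled (illustrative only -- Helpers::transfer_to_image already takes care of this for us, and the exact stage/access masks depend on where the image is used):

//sketch: transition an image from TRANSFER_DST_OPTIMAL to SHADER_READ_ONLY_OPTIMAL after an upload:
VkImageMemoryBarrier barrier{
	.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
	.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
	.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
	.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
	.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
	.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	.image = image.handle, //the AllocatedImage being uploaded
	.subresourceRange{
		.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
		.baseMipLevel = 0, .levelCount = 1,
		.baseArrayLayer = 0, .layerCount = 1,
	},
};
vkCmdPipelineBarrier(
	transfer_command_buffer,
	VK_PIPELINE_STAGE_TRANSFER_BIT, //wait for the copy to finish...
	VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, //...before any fragment shader reads
	0, //dependency flags
	0, nullptr, //memory barriers
	0, nullptr, //buffer memory barriers
	1, &barrier //image memory barriers
);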

Let's back up and make the descriptor pool. Nothing complicated here; more-or-less the same as the code that made descriptor_pool -- just with a different descriptor type, descriptor count, and max sets. We know how many sets and descriptors are needed because we know how many textures we have.

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ // create the texture descriptor pool
		uint32_t per_texture = uint32_t(textures.size()); //for easier-to-read counting

		std::array< VkDescriptorPoolSize, 1> pool_sizes{
			VkDescriptorPoolSize{
				.type = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
				.descriptorCount = 1 * 1 * per_texture, //one descriptor per set, one set per texture
			},
		};
		
		VkDescriptorPoolCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
			.flags = 0, //because CREATE_FREE_DESCRIPTOR_SET_BIT isn't included, *can't* free individual descriptors allocated from this pool
			.maxSets = 1 * per_texture, //one set per texture
			.poolSizeCount = uint32_t(pool_sizes.size()),
			.pPoolSizes = pool_sizes.data(),
		};

		VK( vkCreateDescriptorPool(rtg.device, &create_info, nullptr, &texture_descriptor_pool) );
	}

	//...
}

Taking another step back, we come to creating the sampler. We already used this in the descriptor writes -- it contains all the information that controls how the GPU will read from a texture:

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ // make a sampler for the textures
		VkSamplerCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO,
			.flags = 0,
			.magFilter = VK_FILTER_NEAREST,
			.minFilter = VK_FILTER_NEAREST,
			.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST,
			.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT,
			.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT,
			.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT,
			.mipLodBias = 0.0f,
			.anisotropyEnable = VK_FALSE,
			.maxAnisotropy = 0.0f, //doesn't matter if anisotropy isn't enabled
			.compareEnable = VK_FALSE,
			.compareOp = VK_COMPARE_OP_ALWAYS, //doesn't matter if compare isn't enabled
			.minLod = 0.0f,
			.maxLod = 0.0f,
			.borderColor = VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK,
			.unnormalizedCoordinates = VK_FALSE,
		};
		VK( vkCreateSampler(rtg.device, &create_info, nullptr, &texture_sampler) );
	}
	//...
}

A few things to notice here are that anisotropic sampling is supported out of the box (OpenGL relegates this to an extension, IIRC); you control how the texture repeats (or doesn't) with the "addressing modes"; and mip mapping is controlled separately from the minification and magnification filtering modes (OpenGL combines these together).

The settings we've used basically turn off mip-mapping (clamp to level zero and only sample the nearest level); if you do want mip-mapping in Vulkan you need to compute and upload your texture mip levels yourself.
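
For reference, if you did compute and upload a mip chain, the sampler is where you would turn filtering back on; something like this sketch (values are illustrative -- in real code, enable the samplerAnisotropy device feature and clamp maxAnisotropy to the device limit):

//sketch: a trilinear, anisotropic sampler for a texture with a full mip chain:
VkSamplerCreateInfo create_info{
	.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO,
	.magFilter = VK_FILTER_LINEAR,
	.minFilter = VK_FILTER_LINEAR,
	.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR, //blend between mip levels
	.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT,
	.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT,
	.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT,
	.anisotropyEnable = VK_TRUE,
	.maxAnisotropy = 8.0f, //clamp to VkPhysicalDeviceLimits::maxSamplerAnisotropy
	.minLod = 0.0f,
	.maxLod = VK_LOD_CLAMP_NONE, //allow sampling the whole mip chain
};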

And what is actually getting sampled with the sampler? If you recall the descriptors we wrote: image views (handle type VkImageView)! These are references to particular aspects (color, depth, stencil) of particular ranges of the mip levels and array slices of an image, interpreted in a certain format, and presented in a certain arrangement (2D, 3D, cubemap, ...). You can't crop an image in an image view, but you can view the same image as (e.g.) a texture array or a cube map; or as depth data or RGBA bytes; or even as SRGB-encoded or linearly-encoded data.

So let's make those, using some convenience members of AllocatedImage to get the information we need:

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ //make image views for the textures
		texture_views.reserve(textures.size());
		for (Helpers::AllocatedImage const &image : textures) {
			VkImageViewCreateInfo create_info{
				.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
				.flags = 0,
				.image = image.handle,
				.viewType = VK_IMAGE_VIEW_TYPE_2D,
				.format = image.format,
				// .components sets swizzling and is fine when zero-initialized
				.subresourceRange{
					.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
					.baseMipLevel = 0,
					.levelCount = 1,
					.baseArrayLayer = 0,
					.layerCount = 1,
				},
			};

			VkImageView image_view = VK_NULL_HANDLE;
			VK( vkCreateImageView(rtg.device, &create_info, nullptr, &image_view) );

			texture_views.emplace_back(image_view);
		}
		assert(texture_views.size() == textures.size());
	}
	//...
}

These are unexciting image views: they just take the color aspect of the first mip level of the first array layer of the image, in the same format we created the image with, and present it as a 2D image.

To wrap things up, let's build and upload some textures. We start by making a 128x128 checkerboard texture (with a red blob at the origin so we know where that is):

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ //make some textures
		textures.reserve(2);

		{ //texture 0 will be a dark grey / light grey checkerboard with a red square at the origin.
			//actually make the texture:
			uint32_t size = 128;
			std::vector< uint32_t > data;
			data.reserve(size * size);
			for (uint32_t y = 0; y < size; ++y) {
				float fy = (y + 0.5f) / float(size);
				for (uint32_t x = 0; x < size; ++x) {
					float fx = (x + 0.5f) / float(size);
					//highlight the origin:
					if      (fx < 0.05f && fy < 0.05f) data.emplace_back(0xff0000ff); //red
					else if ( (fx < 0.5f) == (fy < 0.5f)) data.emplace_back(0xff444444); //dark grey
					else data.emplace_back(0xffbbbbbb); //light grey
				}
			}
			assert(data.size() == size*size);

			//TODO: make a place for the texture to live on the GPU

			//TODO: transfer data
		}

		{ //TODO: texture 1 will be a classic 'xor' texture
		}
	}
	//...
}

To actually get it to the GPU we use Helpers::create_image and Helpers::transfer_to_image:

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ //make some textures
		//...
		{ //texture 0 will be a dark grey / light grey checkerboard with a red square at the origin.
			//...

			//make a place for the texture to live on the GPU:
			textures.emplace_back(rtg.helpers.create_image(
				VkExtent2D{ .width = size , .height = size }, //size of image
				VK_FORMAT_R8G8B8A8_UNORM, //how to interpret image data (in this case, linearly-encoded 8-bit RGBA)
				VK_IMAGE_TILING_OPTIMAL,
				VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT, //will sample and upload
				VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //should be device-local
				Helpers::Unmapped
			));

			//transfer data:
			rtg.helpers.transfer_to_image(data.data(), sizeof(data[0]) * data.size(), textures.back());
		}
		//...
	}
	//...
}

We're using VK_FORMAT_R8G8B8A8_UNORM as the format for the image. This means that there is no SRGB decoding of the data: our 0x44/0xbb checkerboard values will decode linearly to 0.2666 and 0.7333 when sampled in the shader.
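
Had we picked VK_FORMAT_R8G8B8A8_SRGB instead, the hardware would apply the SRGB transfer function while sampling, and the same bytes would come out noticeably darker. A tiny standalone sketch (using the standard SRGB decode formula; not part of the tutorial code) makes the difference concrete:

#include <cmath>
#include <cstdint>
#include <cstdio>

//decode an SRGB-encoded byte to a linear value:
static float srgb_to_linear(uint8_t v) {
	float c = v / 255.0f;
	return (c <= 0.04045f) ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

int main() {
	std::printf("0x44 -> UNORM %.4f, SRGB %.4f\n", 0x44 / 255.0f, srgb_to_linear(0x44)); //0.2667 vs ~0.0578
	std::printf("0xbb -> UNORM %.4f, SRGB %.4f\n", 0xbb / 255.0f, srgb_to_linear(0xbb)); //0.7333 vs ~0.4969
}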

Also, recalling our mental note about image layouts, notice this comment in Helpers.hpp:

in Helpers.hpp
void transfer_to_image(void *data, size_t size, AllocatedImage &image); //NOTE: image layout after call is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

So, in fact, everything is ready to go right now and we could compile and run. Let's do it!

scene with checkerboard-textured torus and plane
Everything ends up with texture zero because that's the default value for the texture index in our object instance structure. Also, it's the only texture we've made so far.

Okay, now for a bit of a victory lap. Let's use the classic demoscene trick of xor-ing the x- and y- coordinates together to make a texture with an interesting binary noise pattern:

in Tutorial.cpp
Tutorial::Tutorial(RTG &rtg_) : rtg(rtg_) {
	//...
	{ //texture 1 will be a classic 'xor' texture:
		//actually make the texture:
		uint32_t size = 256;
		std::vector< uint32_t > data;
		data.reserve(size * size);
		for (uint32_t y = 0; y < size; ++y) {
			for (uint32_t x = 0; x < size; ++x) {
				uint8_t r = uint8_t(x) ^ uint8_t(y);
				uint8_t g = uint8_t(x + 128) ^ uint8_t(y);
				uint8_t b = uint8_t(x) ^ uint8_t(y + 27);
				uint8_t a = 0xff;
				data.emplace_back( uint32_t(r) | (uint32_t(g) << 8) | (uint32_t(b) << 16) | (uint32_t(a) << 24) );
			}
		}
		assert(data.size() == size*size);

		//make a place for the texture to live on the GPU:
		textures.emplace_back(rtg.helpers.create_image(
			VkExtent2D{ .width = size , .height = size }, //size of image
			VK_FORMAT_R8G8B8A8_SRGB, //how to interpret image data (in this case, SRGB-encoded 8-bit RGBA)
			VK_IMAGE_TILING_OPTIMAL,
			VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT, //will sample and upload
			VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, //should be device-local
			Helpers::Unmapped
		));

		//transfer data:
		rtg.helpers.transfer_to_image(data.data(), sizeof(data[0]) * data.size(), textures.back());
	}
	//...
}

I've decided that the colors in this texture should be interpreted as if they are SRGB-encoded, and have set the format appropriately. This is probably the right format for albedo texture images you load from disk as well, since color images are generally authored and stored in the SRGB color space. But using it for, say, normal map images is definitely a trap!
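
For the curious, a hypothetical normal-map upload would reuse the same helper but keep a linear format, since its bytes encode direction components rather than colors. A sketch (not part of this tutorial's scene; the size variable and the decode convention are assumptions):

//sketch: tangent-space normals stored as bytes, decoded in the shader as n = texel.xyz * 2.0 - 1.0
textures.emplace_back(rtg.helpers.create_image(
	VkExtent2D{ .width = size, .height = size },
	VK_FORMAT_R8G8B8A8_UNORM, //NOT _SRGB -- these bytes aren't colors
	VK_IMAGE_TILING_OPTIMAL,
	VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT,
	VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
	Helpers::Unmapped
));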

Now you can update your scene creation code to apply texture one to a few objects, and things will get a lot more colorful:

in Tutorial.cpp
	object_instances.emplace_back(ObjectInstance{
		.vertices = plane_vertices,
		.transform{
			.CLIP_FROM_LOCAL = CLIP_FROM_WORLD * WORLD_FROM_LOCAL,
			.WORLD_FROM_LOCAL = WORLD_FROM_LOCAL,
			.WORLD_FROM_LOCAL_NORMAL = WORLD_FROM_LOCAL,
		},
		.texture = 1,
	});
scene with checkerboard-textured torus and xor-textured plane
I've set the plane to use our xor texture.

Just One More Thing: transfer_to_image

Before we wrap up textures, we should really take a look at what that Helpers::transfer_to_image function is doing:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	refsol::Helpers_transfer_to_image(rtg, data, size, &target);
}
Okay, you're probably not surprised to find refsol code here.

If we want to understand how data gets copied into a VkImage, we should re-write this to remove the reference code.

Let's start with a framework that's very similar to what we did for transfer_to_buffer:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//TODO: check data is the right size [new]

	//TODO: create a host-coherent source buffer

	//TODO: copy image data into the source buffer

	//TODO: begin recording a command buffer

	//TODO: put the receiving image in destination-optimal layout [new]

	//TODO: copy the source buffer to the image [new]

	//TODO: transition the image memory to shader-read-only-optimal layout [new]

	//TODO: end and submit the command buffer

	//TODO: wait for command buffer to finish executing

	//TODO: destroy the source buffer
}

The interesting new things here are the two layout transitions -- each adds a command that tells the GPU to re-arrange the image in memory -- and the use of a different copy command (since the one we used before copies buffer-to-buffer). The way we check that the data is the right size is also a bit different than before, but less interesting.

Let's get to it:

in Helpers.cpp
//...
#include "refsol.hpp"

#include <vulkan/utility/vk_format_utils.h> //useful for byte counting

#include <utility>
//...
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	assert(target.handle != VK_NULL_HANDLE); //target image should be allocated already

	//check data is the right size:
	size_t bytes_per_pixel = vkuFormatElementSize(target.format);
	assert(size == target.extent.width * target.extent.height * bytes_per_pixel);
	//...
}

The only way to figure out how many bytes are needed for each pixel in an image is to read the spec and make a big table that maps from format constants to bytes. Thankfully, the vkuFormatElementSize function, part of the vk_format_utils.h header included with the SDK, already does that, so we don't need to write that function ourselves.
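
If you want to convince yourself it does what you expect, a tiny standalone check against the same header might look like this (a sketch, not part of the tutorial code):

#include <vulkan/vulkan.h>
#include <vulkan/utility/vk_format_utils.h>
#include <cassert>

int main() {
	assert(vkuFormatElementSize(VK_FORMAT_R8G8B8A8_UNORM) == 4); //four 8-bit channels
	assert(vkuFormatElementSize(VK_FORMAT_R32G32B32A32_SFLOAT) == 16); //four 32-bit channels
}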

Creating the source buffer and copying the data into the source buffer proceed exactly the same way as in the other transfer function:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	//create a host-coherent source buffer
	AllocatedBuffer transfer_src = create_buffer(
		size,
		VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
		VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
		Mapped
	);

	//copy image data into the source buffer
	std::memcpy(transfer_src.allocation.data(), data, size);
	//...
}

Nothing exciting about starting the command buffer recording, either:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	//begin recording a command buffer
	VK( vkResetCommandBuffer(transfer_command_buffer, 0) );

	VkCommandBufferBeginInfo begin_info{
		.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
		.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, //will record again every submit
	};

	VK( vkBeginCommandBuffer(transfer_command_buffer, &begin_info) );
	//...
}

To tell the GPU to put the image in a specific layout, we use a pipeline barrier command with a VkImageMemoryBarrier structure. This is a synchronization primitive that requires that every command before the barrier (in a certain pipeline stage, doing a certain memory operation) must happen before the layout transition, and that every command after the barrier (in a certain pipeline stage, doing a certain memory operation) must happen after the layout transition.

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	VkImageSubresourceRange whole_image{
		.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
		.baseMipLevel = 0,
		.levelCount = 1,
		.baseArrayLayer = 0,
		.layerCount = 1,
	};

	{ //put the receiving image in destination-optimal layout
		VkImageMemoryBarrier barrier{
			.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
			.srcAccessMask = 0,
			.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
			.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED, //throw away old image
			.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
			.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
			.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
			.image = target.handle,
			.subresourceRange = whole_image,
		};

		vkCmdPipelineBarrier(
			transfer_command_buffer, //commandBuffer
			VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, //srcStageMask
			VK_PIPELINE_STAGE_TRANSFER_BIT, //dstStageMask
			0, //dependencyFlags
			0, nullptr, //memory barrier count, pointer
			0, nullptr, //buffer memory barrier count, pointer
			1, &barrier //image memory barrier count, pointer
		);
	}

	//...
}

Specifically, by setting srcAccessMask to zero this barrier doesn't place any conditions on earlier commands, but the dstAccessMask (write) and dstStageMask (transfer) indicate that the transition must complete before any transfers write data to the image.

These constraints make sense in context because this transition is taking the image from VK_IMAGE_LAYOUT_UNDEFINED (which means "throw away any image contents") to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (which is "whatever layout is best for receiving data").

Now that the image is in a good layout to copy into, we can record the copy command:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	{ // copy the source buffer to the image
		VkBufferImageCopy region{
			.bufferOffset = 0,
			.bufferRowLength = target.extent.width,
			.bufferImageHeight = target.extent.height,
			.imageSubresource{
				.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
				.mipLevel = 0,
				.baseArrayLayer = 0,
				.layerCount = 1,
			},
			.imageOffset{ .x = 0, .y = 0, .z = 0 },
			.imageExtent{
				.width = target.extent.width,
				.height = target.extent.height,
				.depth = 1
			},
		};

		vkCmdCopyBufferToImage(
			transfer_command_buffer,
			transfer_src.handle,
			target.handle,
			VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
			1, &region
		);

		//NOTE: if image had mip levels, would need to copy as additional regions here.
	}
	//...
}

This is a different copy function, but it shouldn't be too daunting. The region describes what part of the image to copy into, while the other parameters indicate the buffer and image to copy between and the current layout of the image. Frustratingly, the imageSubresource field of VkBufferImageCopy is a VkImageSubresourceLayers, not a VkImageSubresourceRange, otherwise we could have re-used our convenient whole_image structure from above.
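
To make that NOTE about mip levels concrete, here is a hedged sketch of the multi-region version. It assumes a hypothetical mip_levels count, source data packed tightly largest-level-first in transfer_src, and that the image and the barriers' subresource ranges cover all of the levels; it also needs <vector> and <algorithm>:

//sketch only -- not the tutorial's code path:
std::vector< VkBufferImageCopy > regions;
VkDeviceSize offset = 0;
uint32_t w = target.extent.width;
uint32_t h = target.extent.height;
for (uint32_t level = 0; level < mip_levels; ++level) {
	regions.emplace_back(VkBufferImageCopy{
		.bufferOffset = offset,
		.bufferRowLength = 0, //zero means "tightly packed"
		.bufferImageHeight = 0,
		.imageSubresource{
			.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
			.mipLevel = level,
			.baseArrayLayer = 0,
			.layerCount = 1,
		},
		.imageOffset{ .x = 0, .y = 0, .z = 0 },
		.imageExtent{ .width = w, .height = h, .depth = 1 },
	});
	offset += VkDeviceSize(w) * VkDeviceSize(h) * bytes_per_pixel;
	w = std::max(1u, w / 2);
	h = std::max(1u, h / 2);
}
vkCmdCopyBufferToImage(
	transfer_command_buffer,
	transfer_src.handle,
	target.handle,
	VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
	uint32_t(regions.size()), regions.data()
);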

Now another image layout transition, this time to the optimal-to-read-from-in-a-shader format:

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	{ // transition the image memory to shader-read-only-optimal layout:
		VkImageMemoryBarrier barrier{
			.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
			.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
			.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
			.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
			.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
			.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
			.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
			.image = target.handle,
			.subresourceRange = whole_image,
		};

		vkCmdPipelineBarrier(
			transfer_command_buffer, //commandBuffer
			VK_PIPELINE_STAGE_TRANSFER_BIT, //srcStageMask
			VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, //dstStageMask
			0, //dependencyFlags
			0, nullptr, //memory barrier count, pointer
			0, nullptr, //buffer memory barrier count, pointer
			1, &barrier //image memory barrier count, pointer
		);
	}
	//...
}

Notice that the access masks and stage flags here are different than the previous barrier. In this case, the barrier waits until all transfer writes are complete, then transitions the image, then allows fragment shader reads to proceed. (This second part is not strictly necessary in this function because we're going to wait for the queue to drain at the end of the function; but if you were doing layout changes as part of rendering you'd definitely want to force texture reads to wait like this.)

The final steps in the function are just as we've done before: finish the command buffer, submit it to the queue, and wait until the queue finishes running.

in Helpers.cpp
void Helpers::transfer_to_image(void *data, size_t size, AllocatedImage &target) {
	//...
	//end and submit the command buffer
	VK( vkEndCommandBuffer(transfer_command_buffer) );

	VkSubmitInfo submit_info{
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.commandBufferCount = 1,
		.pCommandBuffers = &transfer_command_buffer
	};

	VK( vkQueueSubmit(rtg.graphics_queue, 1, &submit_info, VK_NULL_HANDLE) );

	//wait for command buffer to finish executing
	VK( vkQueueWaitIdle(rtg.graphics_queue) );

	//destroy the source buffer
	destroy_buffer(std::move(transfer_src));
}

And with that we've eliminated another refsol call and learned a bit more about how to wrangle images on the GPU.

If you compile and run now, the code should work exactly as it did before, and you should feel a sense of pride that you now understand the texture uploading process.

Lighting (pt2)

To wrap up our solid object drawing, let's revisit our lighting computation. Particularly, we'll set up a basic sun (directional light) + sky (hemisphere light) and make the parameters of the lights adjustable from the CPU.

A reminder, first, of the current lighting setup:

in objects.frag
vec3 l = vec3(0.0, 0.0, 1.0);
//...
vec3 e = vec3(0.5 * dot(n,l) + 0.5);

outColor = vec4(e * albedo, 1.0);
The current lighting computation models a hemisphere directly above the scene that contributes one unit each of incoming red, green, and blue energy to a point with a directly-upward-facing normal.

Lights and Colors

Let's start by creating a uniform block to hold our new light parameters:

in objects.frag
#version 450

layout(set=0,binding=0,std140) uniform World {
	vec3 SKY_DIRECTION;
	vec3 SKY_ENERGY; //energy supplied by sky to a surface patch with normal = SKY_DIRECTION
	vec3 SUN_DIRECTION;
	vec3 SUN_ENERGY; //energy supplied by sun to a surface patch with normal = SUN_DIRECTION
};

layout(set=2,binding=0) uniform sampler2D TEXTURE;
//...

It's actually a bit redundant to have a direction for both the sun and the sky because we're doing lighting in world space, so we could re-orient world space so that (e.g.) the sky was always directly upward. But that seems likely to end up confusing things if we ever wanted to make world-position-dependent shaders; or if we wanted to have some of our object transforms remain static instead of uploading all of them each frame.

Let's go ahead and use these values in our lighting computation:

in objects.frag
//...
	vec3 albedo = texture(TEXTURE, texCoord).rgb;

	//hemisphere sky + directional sun:
	vec3 e = SKY_ENERGY * (0.5 * dot(n,SKY_DIRECTION) + 0.5)
	       + SUN_ENERGY * max(0.0, dot(n,SUN_DIRECTION)) ;
//...

Notice the difference in how the dot product is used in the hemisphere light (only reaches zero energy when the normal is exactly opposite the light direction) and the directional light (reaches zero energy when the normal is perpendicular to the lighting direction).
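
A tiny standalone sketch of just the two falloff terms (not part of the tutorial code) shows this concretely:

#include <algorithm>
#include <cstdio>

int main() {
	//cos_theta stands in for dot(n, light_direction) with unit vectors:
	const float tests[] = { 1.0f, 0.0f, -1.0f };
	for (float cos_theta : tests) {
		float sky = 0.5f * cos_theta + 0.5f;   //hemisphere term: 1.0, 0.5, 0.0
		float sun = std::max(0.0f, cos_theta); //directional term: 1.0, 0.0, 0.0
		std::printf("cos_theta=%+.1f  sky=%.2f  sun=%.2f\n", cos_theta, sky, sun);
	}
}

In particular, a surface facing sideways (cos_theta of zero) still receives half of the sky energy but none of the sun's.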

Compiling and running the code now produces a bunch of validation errors (and -- at least for me -- solid black objects). But that isn't surprising -- the descriptor set that's bound at 0 when running the objects pipeline (i.e., the camera descriptor set for our lines pipeline) isn't compatible with the layout of the descriptor set our shader is expecting.

our scene, but the objects are solid black
The lighting doesn't appear to be working yet.

A World Struct and Descriptor Set Layout

Let's get the CPU-side type information for our descriptor sorted out first:

in Tutorial.hpp
struct ObjectsPipeline {
	//descriptor set layouts:
	//VkDescriptorSetLayout set0_Camera = VK_NULL_HANDLE; //<-- we'll get back to set0
	VkDescriptorSetLayout set0_World = VK_NULL_HANDLE;
//...

	//types for descriptors:
	struct World {
		struct { float x, y, z, padding_; } SKY_DIRECTION;
		struct { float r, g, b, padding_; } SKY_ENERGY;
		struct { float x, y, z, padding_; } SUN_DIRECTION;
		struct { float r, g, b, padding_; } SUN_ENERGY;
	};
	static_assert(sizeof(World) == 4*4 + 4*4 + 4*4 + 4*4, "World is the expected size.");

	struct Transform {
		//...
	};
	//...
} objects_pipeline;

Notice the padding included after the vec3 members of the structure. This is required by the std140 layout, which aligns vec3s to 16-byte (four-float) boundaries.
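
If you'd like a compile-time check that the C++ structure really lands on those std140 offsets, a short sketch like the following (placed anywhere the World type is visible, with <cstddef> included) would do it:

static_assert(offsetof(Tutorial::ObjectsPipeline::World, SKY_DIRECTION) ==  0, "std140 offset");
static_assert(offsetof(Tutorial::ObjectsPipeline::World, SKY_ENERGY)    == 16, "std140 offset");
static_assert(offsetof(Tutorial::ObjectsPipeline::World, SUN_DIRECTION) == 32, "std140 offset");
static_assert(offsetof(Tutorial::ObjectsPipeline::World, SUN_ENERGY)    == 48, "std140 offset");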

Now we write the code to create the descriptor set layout for set 0. If you copy-paste your code for set1_Transforms make sure to change the descriptor type and stage flags:

in Tutorial-ObjectsPipeline.cpp
void Tutorial::ObjectsPipeline::create(RTG &rtg, VkRenderPass render_pass, uint32_t subpass) {
	VkShaderModule vert_module = rtg.helpers.create_shader_module(vert_code);
	VkShaderModule frag_module = rtg.helpers.create_shader_module(frag_code);
	
	{ //the set0_World layout holds world info in a uniform buffer used in the fragment shader:
		std::array< VkDescriptorSetLayoutBinding, 1 > bindings{
			VkDescriptorSetLayoutBinding{
				.binding = 0,
				.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
				.descriptorCount = 1,
				.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT
			},
		};
		
		VkDescriptorSetLayoutCreateInfo create_info{
			.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
			.bindingCount = uint32_t(bindings.size()),
			.pBindings = bindings.data(),
		};

		VK( vkCreateDescriptorSetLayout(rtg.device, &create_info, nullptr, &set0_World) );
	}

	{ //the set1_Transforms layout holds an array of Transform structures in a storage buffer used in the vertex shader:
		//...
	}
	//...
}

And what we create we must also destroy:

in Tutorial-ObjectsPipeline.cpp
void Tutorial::ObjectsPipeline::destroy(RTG &rtg) {
	//...
	if (set1_Transforms != VK_NULL_HANDLE) {
		//...
	}
	if (set0_World != VK_NULL_HANDLE) {
		vkDestroyDescriptorSetLayout(rtg.device, set0_World, nullptr);
		set0_World = VK_NULL_HANDLE;
	}

	//...
}

And, now that we have the descriptor set layout, we can add it to the pipeline layout:

in Tutorial-ObjectsPipeline.cpp
//in Tutorial::ObjectsPipeline::create :
{ //create pipeline layout:
	std::array< VkDescriptorSetLayout, 3 > layouts{
		set0_World,
		set1_Transforms,
		set2_TEXTURE,
	};
	//...
}

Building and running now should work, but we'll still see a stream of validation errors because we don't yet have any descriptor set to bind at set 0.

Buffers and Sets

We're going to stream world information per-frame. So let's add the appropriate CPU- and GPU-side buffers to our Workspace:

in Tutorial.hpp
struct Workspace {
	//...

	//location for ObjectsPipeline::World data: (streamed to GPU per-frame)
	Helpers::AllocatedBuffer World_src; //host coherent; mapped
	Helpers::AllocatedBuffer World; //device-local
	VkDescriptorSet World_descriptors; //references World

	//location for ObjectsPipeline::Transforms data: (streamed to GPU per-frame)
	Helpers::AllocatedBuffer Transforms_src; //host coherent; mapped
	Helpers::AllocatedBuffer Transforms; //device-local
	VkDescriptorSet Transforms_descriptors; //references Transforms

};

Notice that this is the same setup as for the other uniforms and storage buffers. In fact, for the creation code, we can copy-paste-modify the code we used for the lines pipeline's Camera uniform block:

in Tutorial.cpp
//in Tutorial::Tutorial:
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		//...
		{ //allocate descriptor set for Camera descriptor
			//...
		}
		workspace.World_src = rtg.helpers.create_buffer(
			sizeof(ObjectsPipeline::World),
			VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
			VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
			Helpers::Mapped
		);
		workspace.World = rtg.helpers.create_buffer(
			sizeof(ObjectsPipeline::World),
			VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
			VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
			Helpers::Unmapped
		);

		{ //allocate descriptor set for World descriptor
			VkDescriptorSetAllocateInfo alloc_info{
				.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
				.descriptorPool = descriptor_pool,
				.descriptorSetCount = 1,
				.pSetLayouts = &objects_pipeline.set0_World,
			};

			VK( vkAllocateDescriptorSets(rtg.device, &alloc_info, &workspace.World_descriptors) );
			//NOTE: will actually fill in this descriptor set just a bit lower
		}
		//...
	}

And we can even use the same vkUpdateDescriptorSets call to point binding zero in the world descriptor set to its associated buffer:

in Tutorial.cpp
//in Tutorial::Tutorial:
	workspaces.resize(rtg.workspaces.size());
	for (Workspace &workspace : workspaces) {
		//...
		{ //point descriptor to Camera buffer:

			VkDescriptorBufferInfo Camera_info{
				//...
			};

			VkDescriptorBufferInfo World_info{
				.buffer = workspace.World.handle,
				.offset = 0,
				.range = workspace.World.size,
			};

			std::array< VkWriteDescriptorSet, 2 > writes{
				VkWriteDescriptorSet{
					//...
				},
				VkWriteDescriptorSet{
					.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
					.dstSet = workspace.World_descriptors,
					.dstBinding = 0,
					.dstArrayElement = 0,
					.descriptorCount = 1,
					.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
					.pBufferInfo = &World_info,
				},
			};

			vkUpdateDescriptorSets(
				rtg.device, //device
				uint32_t(writes.size()), //descriptorWriteCount
				writes.data(), //pDescriptorWrites
				0, //descriptorCopyCount
				nullptr //pDescriptorCopies
			);
		}
	}

Two bits of book-keeping to do, though. First, we should remember to destroy these resources:

in Tutorial.cpp
//in Tutorial::~Tutorial:
	for (Workspace &workspace : workspaces) {
		//...
		//Camera_descriptors freed when pool is destroyed.

		if (workspace.World_src.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.World_src));
		}
		if (workspace.World.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.World));
		}
		//World_descriptors freed when pool is destroyed.

		if (workspace.Transforms_src.handle != VK_NULL_HANDLE) {
			rtg.helpers.destroy_buffer(std::move(workspace.Transforms_src));
		}
		//...
	}

And, second, we need to size our descriptor pool properly to account for the new descriptor set we're allocating:

in Tutorial.cpp
//in Tutorial::Tutorial:
{ //create descriptor pool:
	uint32_t per_workspace = uint32_t(rtg.workspaces.size()); //for easier-to-read counting

	std::array< VkDescriptorPoolSize, 2> pool_sizes{
		VkDescriptorPoolSize{
			.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
			.descriptorCount = 2 * per_workspace, //one descriptor per set, two sets per workspace
		},
		//...
	};
		
	VkDescriptorPoolCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
		.flags = 0, //because CREATE_FREE_DESCRIPTOR_SET_BIT isn't included, *can't* free individual descriptors allocated from this pool
		.maxSets = 3 * per_workspace, //three sets per workspace
		.poolSizeCount = uint32_t(pool_sizes.size()),
		.pPoolSizes = pool_sizes.data(),
	};

	VK( vkCreateDescriptorPool(rtg.device, &create_info, nullptr, &descriptor_pool) );
}

Compiling and running at this point will, again, work -- but there are still a ton of per-frame validation warnings because we haven't bound our new descriptor set.

Transfers and Bindings

Okay, time to get data into our buffer. Let's add a scene variable to hold the sun and sky parameters:

in Tutorial.hpp
	//--------------------------------------------------------------------
	//Resources that change when time passes or the user interacts:

	//...
	std::vector< LinesPipeline::Vertex > lines_vertices;

	ObjectsPipeline::World world;

	//...

And actually set the variables to something in the update function:

in Tutorial.cpp
void Tutorial::update(float dt) {
	time = std::fmod(time + dt, 60.0f);

	{ //camera orbiting the origin:
		//...
	}

	{ //static sun and sky:
		world.SKY_DIRECTION.x = 0.0f;
		world.SKY_DIRECTION.y = 0.0f;
		world.SKY_DIRECTION.z = 1.0f;

		world.SKY_ENERGY.r = 0.1f;
		world.SKY_ENERGY.g = 0.1f;
		world.SKY_ENERGY.b = 0.2f;

		world.SUN_DIRECTION.x = 6.0f / 23.0f;
		world.SUN_DIRECTION.y = 13.0f / 23.0f;
		world.SUN_DIRECTION.z = 18.0f / 23.0f;

		world.SUN_ENERGY.r = 1.0f;
		world.SUN_ENERGY.g = 1.0f;
		world.SUN_ENERGY.b = 0.9f;
	}
	//...
}

Now to the render function to upload the data (a slightly simpler version of what we did for the camera):

in Tutorial.cpp
//in Tutorial::render
	//...
	{ //upload camera info:
		//...
	}

	{ //upload world info:
		assert(workspace.World_src.size == sizeof(world));

		//host-side copy into World_src:
		memcpy(workspace.World_src.allocation.data(), &world, sizeof(world));

		//add device-side copy from World_src -> World:
		assert(workspace.World_src.size == workspace.World.size);
		VkBufferCopy copy_region{
			.srcOffset = 0,
			.dstOffset = 0,
			.size = workspace.World_src.size,
		};
		vkCmdCopyBuffer(workspace.command_buffer, workspace.World_src.handle, workspace.World.handle, 1, &copy_region);
	}
	//...

And, finally, we can add code to bind the descriptor set:

in Tutorial.cpp
//in Tutorial::render
	//...
	{ //bind World and Transforms descriptor sets:
		std::array< VkDescriptorSet, 2 > descriptor_sets{
			workspace.World_descriptors, //0: World
			workspace.Transforms_descriptors, //1: Transforms
		};
		vkCmdBindDescriptorSets(
			workspace.command_buffer, //command buffer
			VK_PIPELINE_BIND_POINT_GRAPHICS, //pipeline bind point
			objects_pipeline.layout, //pipeline layout
			0, //first set
			uint32_t(descriptor_sets.size()), descriptor_sets.data(), //descriptor sets count, ptr
			0, nullptr //dynamic offsets count, ptr
		);
	}
	//...

Compiling and running, we finally have no validation errors, and our new lighting shows up in the scene.

our scene, now with sun+sky lighting
Our scene with the new lighting. Notice the harder shadow terminator from the sun light and the subtle blue shift in the shadows.