Memory Management

At this point, you've created a fully-functional Vulkan renderer which can draw hundreds of thousands of instanced, textured, lit objects at real-time rates. So it's time to start cleaning up your code's dependence on refsol:: functions, and learning a bit more about the boilerplate required to use Vulkan as we do so.

Currently, by my count, there are 24 refsol:: function calls in the codebase that need to be replaced. In this step, we're going to go after calls in Helpers.cpp, which are all (more or less) about managing memory.

The Easy One: Creating a Shader Module

Before we get into more complicated calls, we have one easy case to clean up:

in Helpers.cpp

VkShaderModule Helpers::create_shader_module(uint32_t const *code, size_t bytes) const {
	VkShaderModule shader_module = VK_NULL_HANDLE;
	//refsol::Helpers_create_shader_module(rtg, code, bytes, &shader_module); //(removed: replaced by the code below)
	VkShaderModuleCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
		.codeSize = bytes,
		.pCode = code
	};
	VK( vkCreateShaderModule(rtg.device, &create_info, nullptr, &shader_module) );
	return shader_module;
}

This is straightforward -- we want to create a shader module from some SPIR-V bytecode, so we wrap that pointer in an appropriate CreateInfo and pass it to vkCreateShaderModule.

The only reasons this was even pulled out as a helper function in the first place were (1) we use this when creating pipelines and there was already way too much boilerplate code there; and (2) to support that convenient templated version in the header that automatically determines the array size.
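The templated version in the header isn't shown in this step, so here's a rough sketch of the idea rather than the actual declaration. It uses a hypothetical stand-in name (create_shader_module_sketch) and makes no Vulkan calls; the point is just that a reference-to-array parameter lets the compiler deduce the element count, so the byte size can be computed for you:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the pointer-and-size version: it only reports the
// byte count it was handed instead of creating a VkShaderModule.
inline size_t create_shader_module_sketch(uint32_t const *code, size_t bytes) {
	assert(code != nullptr);
	return bytes;
}

// Array-reference overload: N is deduced from the argument's type, so the
// caller never has to spell out the size of the SPIR-V array.
template< size_t N >
size_t create_shader_module_sketch(uint32_t const (&arr)[N]) {
	return create_shader_module_sketch(arr, sizeof(uint32_t) * N);
}
```

Calling create_shader_module_sketch(spirv) on a three-word uint32_t array forwards a size of 12 bytes to the two-argument version.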

Do a quick compile-and-run -- things should continue to work -- and we're down to 23 reference solution calls, just like that.

Buffers

Now let's get into some memory wrangling. We'll start with buffers, because we don't want to think about format and layout just yet.

The general shape of making a buffer in Vulkan is as follows: first, you create a buffer with your desired parameters, but without any memory associated with it. Then, you allocate a block of memory of the correct type to associate with the buffer. Finally, you bind the memory to the buffer to create something you can actually use.

This create-then-allocate-then-bind dance is why our code calls them AllocatedBuffer objects, by the way.

Creating the Buffer

We'll start with creating the buffer, which goes exactly as you probably expect by now:

in Helpers.cpp

Helpers::AllocatedBuffer Helpers::create_buffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
	AllocatedBuffer buffer;
	//refsol::Helpers_create_buffer(rtg, size, usage, properties, (map == Mapped), &buffer); //(removed: replaced by the code below)
	VkBufferCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
		.size = size,
		.usage = usage,
		.sharingMode = VK_SHARING_MODE_EXCLUSIVE,
	};
	VK( vkCreateBuffer(rtg.device, &create_info, nullptr, &buffer.handle) );
	buffer.size = size;

	//TODO: determine memory requirements

	//TODO: allocate memory

	//TODO: bind memory
	return buffer;
}

Now we need to determine the memory requirements, which we do as follows:

in Helpers.cpp

Helpers::AllocatedBuffer Helpers::create_buffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
	//...
	//determine memory requirements:
	VkMemoryRequirements req;
	vkGetBufferMemoryRequirements(rtg.device, buffer.handle, &req);

	//...
}

Strangely enough, vkGetBufferMemoryRequirements is one of the very rare Vulkan functions that cannot return an error.

The VkMemoryRequirements structure is now filled with the requirements for the memory to be bound to the buffer. The members of this structure are a size (in bytes), an alignment (in bytes) -- both of which are what you expect -- and a memoryTypeBits, which is a bitfield of which memory types from the physical device are supported for the backing memory.
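If the bitfield reading of memoryTypeBits feels abstract, here's a small Vulkan-free sketch of how to decode it: bit i being set means memory type i is acceptable. The helper name (allowed_memory_types) is made up for illustration, and the 32 in the loop matches Vulkan's limit on the number of memory types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Return the indices of the memory types allowed by a memoryTypeBits-style
// mask: bit i set means memory type i is acceptable.
inline std::vector< uint32_t > allowed_memory_types(uint32_t memory_type_bits) {
	std::vector< uint32_t > indices;
	for (uint32_t i = 0; i < 32; ++i) {
		if ((memory_type_bits & (uint32_t(1) << i)) != 0) indices.emplace_back(i);
	}
	return indices;
}
```

For example, a mask of 0xB (binary 1011) allows memory types 0, 1, and 3.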

We'll pass all of this information on to the memory allocation function (which, yes, we need to write):

in Helpers.cpp

Helpers::AllocatedBuffer Helpers::create_buffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
	//...
	//allocate memory:
	buffer.allocation = allocate(req, properties, map);
	//...
}

And, to wrap up, we'll tell Vulkan to bind the allocated memory to the already-created buffer:

in Helpers.cpp

Helpers::AllocatedBuffer Helpers::create_buffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
	//...
	//bind memory:
	VK( vkBindBufferMemory(rtg.device, buffer.handle, buffer.allocation.handle, buffer.allocation.offset) );

	return buffer;
}

Compiling now will reveal the missing prototype for the memory allocator. So let's fill that in (two versions!) along with a free function for good measure:

in Helpers.hpp

//...
	enum MapFlag {
		Unmapped = 0,
		Mapped = 1,
	};

	//allocate a block of requested size and alignment from a memory with the given type index:
	Allocation allocate(VkDeviceSize size, VkDeviceSize alignment, uint32_t memory_type_index, MapFlag map = Unmapped);

	//allocate a block that works for a given VkMemoryRequirements and VkMemoryPropertyFlags:
	Allocation allocate(VkMemoryRequirements const &requirements, VkMemoryPropertyFlags memory_properties, MapFlag map = Unmapped);

	//free an allocated block:
	void free(Allocation &&allocation);

	//specializations that also create a buffer or image (respectively):
//...

Now the code will compile but not link because we need to fill in the allocation functions.

Buffer Destruction

We might as well write the buffer destroy function as well (especially since there's no guarantee that our allocator will behave the same way as the refsol's, so calling its destroy_buffer may act strangely).

in Helpers.cpp

void Helpers::destroy_buffer(AllocatedBuffer &&buffer) {
	//refsol::Helpers_destroy_buffer(rtg, &buffer); //(removed: replaced by the code below)
	vkDestroyBuffer(rtg.device, buffer.handle, nullptr);
	buffer.handle = VK_NULL_HANDLE;
	buffer.size = 0;

	this->free(std::move(buffer.allocation));
}

The buffer gets destroyed, and then we pass the memory allocation to our free function to take care of releasing it as well. (We do the destruction in this order because we don't want to deallocate the memory while the buffer still references it.)

Another refsol call eliminated, and the code should continue to compile (but not link).

A Memory Allocator

Now it's time to fill in our memory allocation functions. As you've probably figured out from context, memory on Vulkan devices is not all the same. Instead, each physical device supports an array of different memory types, which have different properties and are allocated from (potentially) different heaps.

To make this notion a bit more concrete, let's write the version of allocate that takes a VkMemoryRequirements:

in Helpers.cpp

Helpers::Allocation::~Allocation() {
	if (!(handle == VK_NULL_HANDLE && offset == 0 && size == 0 && mapped == nullptr)) {
		std::cerr << "Destructing a non-empty Allocation; device memory will leak." << std::endl;
	}
}

//----------------------------

Helpers::Allocation Helpers::allocate(VkMemoryRequirements const &req, VkMemoryPropertyFlags properties, MapFlag map) {
	return allocate(req.size, req.alignment, find_memory_type(req.memoryTypeBits, properties), map);
}

//----------------------------

Helpers::AllocatedBuffer Helpers::create_buffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
//...

This is -- apparently -- not the complicated version of the function.

This version of our allocate function delegates twice. The work of actually allocating the memory goes to the other overload. The work of finding a memory type that is both in the memoryTypeBits bit set and has the memory properties requested in properties goes to a function called find_memory_type.

So that's the next function we need to write. We'll start with the prototype:

in Helpers.hpp

	//-----------------------
	//Misc utilities:

	//for selecting memory types (used by allocate, above):
	VkPhysicalDeviceMemoryProperties memory_properties{};
	uint32_t find_memory_type(uint32_t type_filter, VkMemoryPropertyFlags flags) const;

	//for selecting image formats:
	VkFormat find_image_format(std::vector< VkFormat > const &candidates, VkImageTiling tiling, VkFormatFeatureFlags features) const;

The VkPhysicalDeviceMemoryProperties structure is the description of the physical device's memory types and heaps that our memory-type-finding function will use when enumerating the types. We'll ask Vulkan for it in Helpers::create:

in Helpers.cpp

void Helpers::create() {
	//...

	vkGetPhysicalDeviceMemoryProperties(rtg.physical_device, &memory_properties);
}

The VkPhysicalDeviceMemoryProperties structure is now filled with information about how many types of memory the device has, what properties each type has, and what heap it is allocated from. The structure also tells us how many heaps there are and the approximate size of these heaps.

For your submission for this step you should write some debug code to print all this information out:

in Helpers.cpp

void Helpers::create() {
	//...

	vkGetPhysicalDeviceMemoryProperties(rtg.physical_device, &memory_properties);

	if (rtg.configuration.debug) {
		std::cout << "Memory types:\n";
		for (uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i) {
			VkMemoryType const &type = memory_properties.memoryTypes[i];
			std::cout << " [" << i << "] heap " << type.heapIndex << ", flags: " << string_VkMemoryPropertyFlags(type.propertyFlags) << '\n';
		}
		std::cout << "Memory heaps:\n";
		for (uint32_t i = 0; i < memory_properties.memoryHeapCount; ++i) {
			VkMemoryHeap const &heap = memory_properties.memoryHeaps[i];
			std::cout << " [" << i << "] " << heap.size << " bytes, flags: " << string_VkMemoryHeapFlags( heap.flags ) << '\n';
		}
		std::cout.flush();
	}
}

Take a screenshot of the output this prints and put it in the check-in thread!

With this information in hand we can write the find_memory_type function:

in Helpers.cpp

//----------------------------

uint32_t Helpers::find_memory_type(uint32_t type_filter, VkMemoryPropertyFlags flags) const {
	for (uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i) {
		VkMemoryType const &type = memory_properties.memoryTypes[i];
		if ((type_filter & (uint32_t(1) << i)) != 0
		 && (type.propertyFlags & flags) == flags) {
			return i;
		}
	}
	throw std::runtime_error("No suitable memory type found.");
}

VkFormat Helpers::find_image_format(std::vector< VkFormat > const &candidates, VkImageTiling tiling, VkFormatFeatureFlags features) const {

As you can see, the function walks through each memory type, attempting to find one that both appears in the type_filter bit-set and has all of the requested flags set in its propertyFlags. (Recall that VkMemoryPropertyFlagBits include things like VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, which means no special cache control is needed for the host to access the memory; and VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, which means the memory is efficient for device access.)
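To see the selection rule in action without a GPU, here's a sketch that runs the same loop over a made-up device. Everything here is a stand-in: the flag constants mirror the numeric values of the real Vulkan bits, but the four memory types are invented for illustration and don't describe any particular card:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-ins for the Vulkan flag bits (same numeric values as the real enums):
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;

// A made-up device: each entry is the propertyFlags of one memory type.
inline std::vector< uint32_t > const &example_types() {
	static std::vector< uint32_t > const types{
		0,                                           // [0] no special properties
		DEVICE_LOCAL,                                // [1] GPU-only memory
		HOST_VISIBLE | HOST_COHERENT,                // [2] host-side staging memory
		DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT, // [3] GPU memory visible to the host
	};
	return types;
}

// Same rule as find_memory_type: return the first type that is allowed by
// type_filter and has every requested flag set.
inline uint32_t find_type_sketch(uint32_t type_filter, uint32_t flags) {
	std::vector< uint32_t > const &types = example_types();
	for (uint32_t i = 0; i < types.size(); ++i) {
		if ((type_filter & (uint32_t(1) << i)) != 0
		 && (types[i] & flags) == flags) {
			return i;
		}
	}
	throw std::runtime_error("No suitable memory type found.");
}
```

Notice that the first match wins: with every type allowed, asking for host-visible, host-coherent memory picks type 2 even though type 3 would also qualify. If the filter only allows types 0 and 3, the same request lands on type 3.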

Time to tackle the actual allocation function. In this tutorial we're going to ask Vulkan for each and every piece of memory we use; but it would generally be substantially more efficient to allocate a large block of memory of each type we might need, and hand that block out in smaller pieces to the individual allocation calls. To this end, the Allocation structure includes an offset member, and if you use it properly (as we did in the buffer creation function above) it should be possible to retrofit a fancier allocator of your choosing into the helper functions here.

in Helpers.cpp

//----------------------------

Helpers::Allocation Helpers::allocate(VkDeviceSize size, VkDeviceSize alignment, uint32_t memory_type_index, MapFlag map) {
	Helpers::Allocation allocation;

	VkMemoryAllocateInfo alloc_info{
		.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
		.allocationSize = size,
		.memoryTypeIndex = memory_type_index
	};

	VK( vkAllocateMemory( rtg.device, &alloc_info, nullptr, &allocation.handle ) );

	allocation.size = size;
	allocation.offset = 0;

	if (map == Mapped) {
		//TODO: map memory
	}

	return allocation;
}

Helpers::Allocation Helpers::allocate(VkMemoryRequirements const &req, VkMemoryPropertyFlags properties, MapFlag map) {

Everything seems straightforward enough, but notice one oddity: the memory alignment parameter is just ignored! This is because vkAllocateMemory is specified to return memory that is aligned enough to meet "any alignment requirement of the implementation." (Of course, if you were rolling your own slab-based allocator, you'd need to take alignment into account when finding the next free portion of the current slab.)
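For a sense of what "taking alignment into account" might look like, here's a minimal sketch of a bump allocator that hands out offsets from one big block. The names (align_up, Slab, take) are hypothetical and this is nowhere near a full allocator (there's no freeing, for one thing); it only shows the round-up-then-advance step, and it relies on alignments being powers of two, which is what Vulkan reports:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Round offset up to the next multiple of alignment.
// Assumes alignment is a nonzero power of two.
inline uint64_t align_up(uint64_t offset, uint64_t alignment) {
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator over one large block of memory:
struct Slab {
	uint64_t capacity = 0; // total bytes in the block
	uint64_t next = 0;     // first byte not yet handed out

	// Returns the offset of a new sub-block, or throws if it doesn't fit.
	uint64_t take(uint64_t size, uint64_t alignment) {
		uint64_t offset = align_up(next, alignment);
		if (offset > capacity || size > capacity - offset) {
			throw std::runtime_error("Slab is full.");
		}
		next = offset + size;
		return offset;
	}
};
```

The returned offset is the sort of value you'd store in the offset member of an Allocation, with the handle referring to the shared block.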

The last little piece of this is mapping the memory into the host address space if requested. To do this, we just call vkMapMemory. This only works on memory types that have VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set, but for those it provides a very nice way to quickly write data into the memory: we can use plain-old memory-writing functions.

in Helpers.cpp

//TODO: map memory
VK( vkMapMemory(rtg.device, allocation.handle, 0, allocation.size, 0, &allocation.mapped) );

Mapped memory provides another reason to replace this code with an allocator that hands out subsets of large pieces -- the number of different memory-mapped ranges one can have active is relatively small on some devices (though the spec doesn't give a number or guarantee a minimum, as far as I can tell). Better to have all of your source buffers in one big chunk and take up only one mapping slot than to have them all taking a mapping slot and potentially run out.
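As a sketch of the "one big chunk" idea: if a whole block is mapped once, each sub-allocation can be written with an ordinary memcpy at its offset from the block's base pointer. The function name (write_mapped) is made up, and here an ordinary host array can stand in for the pointer vkMapMemory would return:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy bytes into a sub-range of a mapped block.
// mapped_base plays the role of the pointer returned when mapping the whole
// block; offset is where this particular allocation starts within it.
inline void write_mapped(void *mapped_base, size_t offset, void const *src, size_t bytes) {
	assert(mapped_base != nullptr && src != nullptr);
	std::memcpy(static_cast< char * >(mapped_base) + offset, src, bytes);
}
```

Only one mapping exists no matter how many sub-ranges get written this way.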

Let's finish things off with a free function:

in Helpers.cpp

Helpers::Allocation Helpers::allocate(VkMemoryRequirements const &req, VkMemoryPropertyFlags properties, MapFlag map) {
	return allocate(req.size, req.alignment, find_memory_type(req.memoryTypeBits, properties), map);
}

void Helpers::free(Helpers::Allocation &&allocation) {
	if (allocation.mapped != nullptr) {
		vkUnmapMemory(rtg.device, allocation.handle);
		allocation.mapped = nullptr;
	}

	vkFreeMemory(rtg.device, allocation.handle, nullptr);

	allocation.handle = VK_NULL_HANDLE;
	allocation.offset = 0;
	allocation.size = 0;
}

//----------------------------

This function unmaps the memory with vkUnmapMemory (if it is mapped), and returns the memory to Vulkan with vkFreeMemory. It also sets the allocation back to the "empty" state so that it won't complain in its destructor about leaking memory.
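The "empty state" convention is easy to try out in isolation. This is a hypothetical, Vulkan-free stand-in for the Allocation type (a plain integer replaces the VkDeviceMemory handle, and the real struct may well differ); it shows that std::move at the call site is only a cast signalling that the caller is giving the allocation up, and that the function resets the caller's object in place:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for Helpers::Allocation:
struct AllocationSketch {
	uint64_t handle = 0; // stand-in for VkDeviceMemory (0 plays VK_NULL_HANDLE)
	uint64_t offset = 0;
	uint64_t size = 0;
	void *mapped = nullptr;

	// The same test the destructor uses to decide whether to complain:
	bool empty() const {
		return handle == 0 && offset == 0 && size == 0 && mapped == nullptr;
	}
};

// Stand-in for Helpers::free: the real function would unmap and free the
// device memory before resetting the fields.
inline void free_sketch(AllocationSketch &&allocation) {
	allocation.mapped = nullptr;
	allocation.handle = 0;
	allocation.offset = 0;
	allocation.size = 0;
}
```

After free_sketch(std::move(a)), a.empty() is true again, which is exactly the condition the destructor checks before warning about a leak.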

At this point, everything should compile and run as before.

Take a moment to scroll through the output and find the information about your device's various memory types. (It will probably be printed twice if you did the aside above because I have the refsol code do it at some point as well.)

It's interesting to see what different memory types and heaps your GPU has. On the computer where I'm currently writing, the NVIDIA GeForce RTX 3080 advertises five memory types from two heaps. The sizes of the heaps look like the size of main memory and the size of the GPU's onboard memory, respectively. Interestingly enough, one of the advertised memory types is both host-coherent and device-local (and comes from the GPU's heap) -- likely, some onboard GPU memory that can be made visible in the host address space via PCI's memory-mapping capability.

This memory would offer a fewer-copy upload path to the GPU, but might actually end up being slower to upload because of CPU-driven PCI memory operations (vs having the GPU run the copy). To find out if this is the case, we'd need to create a version of our code that targeted this alternative memory and benchmark it.

Images

We are down to just three reference solution calls in Helpers.cpp; and all three of the remaining ones have to do with VkImages.

Find Image Format

First up: find_image_format, a helper that doesn't actually get used from the image creation functions directly, or -- in fact -- by any code we've written so far in the tutorial. (But we'll need it when we get to creating swapchain-related resources.)

This function is sort of like the find_memory_type function, but for image formats. The caller asks for features they want (as a bitwise OR of VkFormatFeatureFlagBits) and the function returns the first format (among the candidates the caller supplies) that supports all of those features:

in Helpers.cpp

VkFormat Helpers::find_image_format(std::vector< VkFormat > const &candidates, VkImageTiling tiling, VkFormatFeatureFlags features) const {
	//return refsol::Helpers_find_image_format(rtg, candidates, tiling, features); //(removed: replaced by the loop below)
	for (VkFormat format : candidates) {
		VkFormatProperties props;
		vkGetPhysicalDeviceFormatProperties(rtg.physical_device, format, &props);
		if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
			return format;
		} else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
			return format;
		}
	}
	throw std::runtime_error("No supported format matches request.");
}

One thing to notice is that features can vary depending on the tiling (arrangement of texels in memory) of the image, something we can specify with a VkImageTiling value both in this function and when creating an image.
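Here's a small sketch of how tiling changes the answer, using stand-in types instead of the Vulkan ones. FormatPropsSketch, Tiling, and first_supported are all hypothetical names, and the feature masks are arbitrary bit patterns; the function returns the index of the first candidate rather than a VkFormat:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class Tiling { Linear, Optimal };

// Stand-in for VkFormatProperties: one feature mask per tiling.
struct FormatPropsSketch {
	uint32_t linear_features;
	uint32_t optimal_features;
};

// Return the index of the first candidate whose feature mask for the
// requested tiling contains every requested feature bit.
inline size_t first_supported(std::vector< FormatPropsSketch > const &candidates, Tiling tiling, uint32_t features) {
	for (size_t i = 0; i < candidates.size(); ++i) {
		uint32_t have = (tiling == Tiling::Linear
			? candidates[i].linear_features
			: candidates[i].optimal_features);
		if ((have & features) == features) return i;
	}
	throw std::runtime_error("No supported format matches request.");
}
```

With candidates {0x1, 0x3} and {0x3, 0x7}, a request for features 0x3 picks the first candidate under optimal tiling but has to fall through to the second under linear tiling. The order of the candidate list acts as a preference order.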

Create Image

Now on to the big one: Helpers::create_image. The pattern is the same as for buffers -- create the VkImage, ask how much memory it needs, allocate the memory, and bind the memory.

in Helpers.cpp

Helpers::AllocatedImage Helpers::create_image(VkExtent2D const &extent, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, MapFlag map) {
	AllocatedImage image;
	//refsol::Helpers_create_image(rtg, extent, format, tiling, usage, properties, (map == Mapped), &image); //(removed: replaced by the code below)
	image.extent = extent;
	image.format = format;

	VkImageCreateInfo create_info{
		.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
		.imageType = VK_IMAGE_TYPE_2D,
		.format = format,
		.extent{
			.width = extent.width,
			.height = extent.height,
			.depth = 1
		},
		.mipLevels = 1,
		.arrayLayers = 1,
		.samples = VK_SAMPLE_COUNT_1_BIT,
		.tiling = tiling,
		.usage = usage,
		.sharingMode = VK_SHARING_MODE_EXCLUSIVE,
		.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
	};

	VK( vkCreateImage(rtg.device, &create_info, nullptr, &image.handle) );

	VkMemoryRequirements req;
	vkGetImageMemoryRequirements(rtg.device, image.handle, &req);

	image.allocation = allocate(req, properties, map);

	VK( vkBindImageMemory(rtg.device, image.handle, image.allocation.handle, image.allocation.offset) );

	return image;
}

When you want to support mip-maps, non-2D textures (like cubemaps), or texture arrays, you'll need to add another parameter to this function to set the appropriate members of VkImageCreateInfo; and you'll probably want to add more members to AllocatedImage to hold them so that, e.g., transfer_to_image can do the right thing.

Note, also, that the initial layout is VK_IMAGE_LAYOUT_UNDEFINED; if you wanted to specify images directly by writing data to mapped memory (instead of copying from a buffer) you'd instead set this to VK_IMAGE_LAYOUT_PREINITIALIZED and the tiling to VK_IMAGE_TILING_LINEAR, which together would guarantee a known image layout.
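One wrinkle if you do go the write-to-mapped-memory route: even with linear tiling, the rows of an image aren't guaranteed to be tightly packed, so you'd query the row pitch with vkGetImageSubresourceLayout and copy row by row. Here's a Vulkan-free sketch of that copy; copy_rows is a made-up helper, and the pitch is a plain parameter standing in for the rowPitch member of VkSubresourceLayout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy tightly-packed source rows into a destination whose rows start
// row_pitch bytes apart. In real code, row_pitch (and the starting offset
// of dst within the mapping) would come from vkGetImageSubresourceLayout.
inline void copy_rows(void *dst, size_t row_pitch, void const *src, size_t row_bytes, size_t rows) {
	assert(dst != nullptr && src != nullptr);
	assert(row_pitch >= row_bytes);
	for (size_t y = 0; y < rows; ++y) {
		std::memcpy(
			static_cast< char * >(dst) + y * row_pitch,
			static_cast< char const * >(src) + y * row_bytes,
			row_bytes
		);
	}
}
```

Any padding bytes between rows in the destination are left untouched.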

Destroy Image

Destroying images proceeds much like destroying buffers: destroy the VkImage handle, then free the allocated memory.

in Helpers.cpp

void Helpers::destroy_image(AllocatedImage &&image) {
	//refsol::Helpers_destroy_image(rtg, &image); //(removed: replaced by the code below)
	vkDestroyImage(rtg.device, image.handle, nullptr);

	image.handle = VK_NULL_HANDLE;
	image.extent = VkExtent2D{.width = 0, .height = 0};
	image.format = VK_FORMAT_UNDEFINED;

	this->free(std::move(image.allocation));
}

We also take the time to set the metadata we're storing about the AllocatedImage back to the default values, just to keep things tidy.

Declare Victory!

And that's it! No more refsol calls in Helpers.cpp. And that means you can finally delete the refsol include:

in Helpers.cpp

#include "RTG.hpp"
#include "VK.hpp"
//#include "refsol.hpp" //(delete this line)