Headless Mode
Now that you've implemented all of the harness code, we can make it do something different: render without a window (or, indeed, a windowing system).
In headless mode your harness will create a fake "swapchain" that it manages itself (without any WSI extensions); render frames based on events read from std::cin; and, optionally, save the rendered frames to files.
What good is a headless mode?
You can use it for benchmarking and testing -- you can set up a consistent workload (our events will include specified time deltas for update) and run it multiple times to test different rendering approaches.
You can even save and check the rendered frames to ensure your code really is producing the same results.
But you could also imagine using a headless mode like this to create frames of a scene on-demand for training data or as input for a video encoder.
Flags and Arguments
Our code will know it is meant to be in headless mode by checking a boolean in the Configuration structure:
RTG.hpp
struct Configuration {
//...
//how many "workspaces" (frames that can currently be being worked on by the CPU or GPU) to use:
uint32_t workspaces = 2;
//run without a window, read events from stdin:
bool headless = false;
//...
};
The --headless command-line argument will turn this flag on:
RTG.cpp
void RTG::Configuration::parse(int argc, char **argv) {
//...
} else if (arg == "--drawing-size") {
//...
} else if (arg == "--headless") {
headless = true;
} else {
//...
}
And let's add this flag to usage so it will be documented when the user requests --help:
RTG.cpp
void RTG::Configuration::usage(std::function< void(const char *, const char *) > const &callback) {
//...
callback("--headless", "Don't create a window; read events from stdin.");
}
At this point everything should compile and run, but the --headless flag will do nothing -- despite what the printed usage message says!
A Fake Swapchain
In headless mode, our code will manage its own "swapchain" made of plain old VkImages.
Our fake swapchain's version of "presenting" images will be to copy them from device memory to host memory (from whence they can, optionally, be saved to a file).
So, for each swapchain image, we'll need an on-GPU image, an on-CPU buffer to copy to, a command buffer to do the copy, and a fence to signal once the copy completes.
We'll wrap this all up in a struct HeadlessSwapchainImage and make a vector of them:
RTG.hpp
struct RTG {
//...
VkSwapchainKHR swapchain = VK_NULL_HANDLE; //in non-headless mode, swapchain images are managed by this object; in headless mode this will be null
//in headless mode, we maintain our own swapchain:
VkCommandPool headless_command_pool = VK_NULL_HANDLE;
struct HeadlessSwapchainImage {
Helpers::AllocatedImage image; //on-GPU rendering target
Helpers::AllocatedBuffer buffer; //host memory to copy image to after rendering
VkCommandBuffer copy_command = VK_NULL_HANDLE; //copy image -> buffer
VkFence image_presented = VK_NULL_HANDLE; //fence to signal after copy finishes
};
std::vector< HeadlessSwapchainImage > headless_swapchain;
//...
};
I also added a VkCommandPool, from which we'll allocate the per-image copy command buffers.
Creating the Fake Swapchain
Let's re-work recreate_swapchain to make a fake swapchain in headless mode by first moving all the real swapchain code to its own branch of a conditional statement:
RTG.cpp
void RTG::recreate_swapchain() {
//...
if (configuration.headless) {
assert(surface == VK_NULL_HANDLE); //headless, so must not have a surface
//make a fake swapchain:
//TODO: set extent from configuration
//TODO: set number of images to 3
//TODO: create headless_command_pool
//TODO: create headless_swapchain
//TODO: fill in swapchain_images
} else {
assert(surface != VK_NULL_HANDLE); //not headless, so must have a surface
//request a swapchain from the windowing system:
//determine size, image count, and transform for swapchain:
VkSurfaceCapabilitiesKHR capabilities;
VK( vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physical_device, surface, &capabilities) );
//...
{ //get the swapchain images:
//...
}
}
//create views for swapchain images:
//...
}
Eventually, our headless mode will run without the VK_KHR_swapchain extension or any of the platform-specific WSI extensions.
So we must ensure that none of the functions from these extensions are called in headless mode.
(In this particular function, that means making sure vkGetPhysicalDeviceSurfaceCapabilitiesKHR, vkCreateSwapchainKHR, and vkGetSwapchainImagesKHR don't run in headless mode.)
To create our headless swapchain, we'll follow much the same creation order as the real swapchain. Since we don't have any surface to tell us its size, we use the requested extent from the configuration. Also, we'll be emulating FIFO presentation mode, so 3 images will be sufficient for our swapchain.
RTG.cpp
//TODO: set extent from configuration
swapchain_extent = configuration.surface_extent;
//TODO: set number of images to 3
uint32_t requested_count = 3; //enough for FIFO-style presentation
The command pool will be created in the same way we've created command pools before.
Since we're going to record these commands once and never reset them, we don't pass VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT when creating the command pool.
RTG.cpp
{ //create command pool for the headless image copy command buffers:
assert(headless_command_pool == VK_NULL_HANDLE);
VkCommandPoolCreateInfo create_info{
.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
.flags = 0,
.queueFamilyIndex = graphics_queue_family.value(),
};
VK( vkCreateCommandPool(device, &create_info, nullptr, &headless_command_pool) );
}
The headless swapchain itself is more resource creation.
Notice that the vector is pre-allocated with reserve to avoid re-allocations as more headless swapchain images are appended to the vector.
RTG.cpp
//TODO: create headless_swapchain
assert(headless_swapchain.empty());
headless_swapchain.reserve(requested_count);
for (uint32_t i = 0; i < requested_count; ++i) {
//add a headless "swapchain" image:
HeadlessSwapchainImage &h = headless_swapchain.emplace_back();
//TODO: allocate image data
//TODO: allocate buffer data
//TODO: create and record copy command
//TODO: create fence
}
Image and buffer creation can be done with helper functions. We mark the image as being used both as a color attachment (so we can render to it) and a transfer source (so we can copy from it), and request that it is placed in device memory.
RTG.cpp
//allocate image data: (on-GPU, will be rendered to)
h.image = helpers.create_image(
swapchain_extent,
surface_format.format,
VK_IMAGE_TILING_OPTIMAL,
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
);
We set up the buffer where the image will be received as a transfer destination, and ask that it be placed in host-visible, host-coherent, mapped memory so it's easy for us to extract the image for saving later.
RTG.cpp
//allocate buffer data: (on-CPU, will be copied to)
h.buffer = helpers.create_buffer(
swapchain_extent.width * swapchain_extent.height * vkuFormatTexelBlockSize(surface_format.format) / vkuFormatTexelsPerBlock(surface_format.format),
VK_BUFFER_USAGE_TRANSFER_DST_BIT,
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
Helpers::Mapped
);
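The size arithmetic above can be checked in isolation. The following is a minimal sketch, not part of the harness: packed_image_bytes is a hypothetical helper, and its block_bytes and texels_per_block parameters stand in for whatever vkuFormatTexelBlockSize and vkuFormatTexelsPerBlock report for the chosen format (4 and 1 for a format like VK_FORMAT_B8G8R8A8_SRGB).

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold a tightly packed width x height image.
// block_bytes and texels_per_block are hypothetical parameters standing in for
// the values the format utility functions would report for a given format.
uint64_t packed_image_bytes(uint32_t width, uint32_t height, uint32_t block_bytes, uint32_t texels_per_block) {
	assert(texels_per_block != 0);
	uint64_t texels = uint64_t(width) * uint64_t(height);
	//same order of operations as the expression above: multiply first, then divide
	return texels * block_bytes / texels_per_block;
}
```

For example, an 800x540 image at 4 bytes per texel needs 1,728,000 bytes. The sketch widens to 64 bits before multiplying, which is a choice made here (not something the harness code does) to sidestep overflow for very large extents.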
Hark! In order to figure out how many bytes per texel are needed to store the image, this code uses a pair of functions from the vk_format_utils.h header, so we need to remember to include it:
RTG.cpp
//...
#include <vulkan/vk_enum_string_helper.h> //useful for debug output
#include <vulkan/utility/vk_format_utils.h> //for getting format sizes
#include <GLFW/glfw3.h>
//...
The copy command buffer is almost identical to the one used in Helpers::transfer_to_image, except that we're copying image-to-buffer, not buffer-to-image.
Also, we're going to submit the command buffer more than once, so we don't specify VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT in the begin_info flags.
RTG.cpp
{ //create and record copy command:
VkCommandBufferAllocateInfo alloc_info{
.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
.commandPool = headless_command_pool,
.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
.commandBufferCount = 1,
};
VK( vkAllocateCommandBuffers(device, &alloc_info, &h.copy_command) );
//record:
VkCommandBufferBeginInfo begin_info{
.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
.flags = 0,
};
VK( vkBeginCommandBuffer(h.copy_command, &begin_info) );
VkBufferImageCopy region{
.bufferOffset = 0,
.bufferRowLength = swapchain_extent.width,
.bufferImageHeight = swapchain_extent.height,
.imageSubresource{
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.mipLevel = 0,
.baseArrayLayer = 0,
.layerCount = 1,
},
.imageOffset{ .x = 0, .y = 0, .z = 0 },
.imageExtent{
.width = swapchain_extent.width,
.height = swapchain_extent.height,
.depth = 1
},
};
vkCmdCopyImageToBuffer(
h.copy_command,
h.image.handle,
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
h.buffer.handle,
1, &region
);
VK( vkEndCommandBuffer(h.copy_command) );
}
Notice that the call to vkCmdCopyImageToBuffer specifies that the source image must be in the VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL layout -- we'll need to make sure the image is transitioned to this layout when rendering finishes.
(But let's finish with the swapchain creation for now.)
RTG.cpp
{ //create fence to signal when image is done being "presented" (copied to host memory):
VkFenceCreateInfo create_info{
.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
.flags = VK_FENCE_CREATE_SIGNALED_BIT, //start signaled, because all images are available to start with
};
VK( vkCreateFence(device, &create_info, nullptr, &h.image_presented) );
}
For an actual swapchain, extracting references to the images requires calling vkGetSwapchainImagesKHR; with our fake swapchain, we just copy the images' handles:
RTG.cpp
//copy image references into swapchain_images:
assert(swapchain_images.empty());
swapchain_images.assign(requested_count, VK_NULL_HANDLE);
for (uint32_t i = 0; i < requested_count; ++i) {
swapchain_images[i] = headless_swapchain[i].image.handle;
}
At this point, the code should compile, but running in headless mode will trigger an assertion because we haven't updated the initialization code to not create a surface yet.
Destroying the Fake Swapchain
Destroying the fake swapchain's resources proceeds much like destroying any other Vulkan resources. To start with, we'll move the non-headless code into its own branch of a conditional statement:
RTG.cpp
void RTG::destroy_swapchain() {
//...
if (configuration.headless) {
//TODO: destroy headless_swapchain images
//TODO: destroy headless_command_pool
} else {
//deallocate the swapchain (and thus) its images:
if (swapchain != VK_NULL_HANDLE) {
vkDestroySwapchainKHR(device, swapchain, nullptr);
swapchain = VK_NULL_HANDLE;
}
}
}
To destroy the headless swapchain's images, we loop through each image, destroying its resources; then we clear the vector:
RTG.cpp
//destroy the headless swapchain:
for (auto &h : headless_swapchain) {
helpers.destroy_image(std::move(h.image));
helpers.destroy_buffer(std::move(h.buffer));
h.copy_command = VK_NULL_HANDLE; //pool deallocated below
vkDestroyFence(device, h.image_presented, nullptr);
h.image_presented = VK_NULL_HANDLE;
}
headless_swapchain.clear();
Notice that we don't bother to free the individual command buffers, since we're about to destroy the whole pool anyway:
RTG.cpp
//free all of the copy command buffers by destroying the pool from which they were allocated:
vkDestroyCommandPool(device, headless_command_pool, nullptr);
headless_command_pool = VK_NULL_HANDLE;
And that's fake swapchain cleanup done. Again, the code should compile and run; but attempting to run in headless mode will still run into that assertion about the surface existing.
Using the Fake Swapchain
Before we dive into how to use our fake swapchain, let's take a moment to recall how a real swapchain is used in RTG::run.
Our current run-loop code works as follows:
- First, it waits for the next workspace to become available (via vkWaitForFences on the workspace's workspace_available fence); it needs to do this to make sure there isn't any GPU code still using the workspace.
- Then it gets the index of the next image to render into via vkAcquireNextImageKHR; this image may still be in use until the image_available semaphore is signaled.
- The harness code now calls into the application code (via application.render); the application is responsible for queueing GPU work:
  - the work that renders to the swapchain image must happen after the image_available semaphore is signaled, and must (in turn) signal the image_done semaphore;
  - further, any work that depends on the workspace must complete before the workspace_available fence is signaled.
- Finally, the harness code calls vkQueuePresentKHR to let the windowing system know which semaphore to wait on before displaying the swapchain image.
To operate in headless mode, we need to replace the calls to vkAcquireNextImageKHR and vkQueuePresentKHR with manipulations of our fake swapchain.
Since our headless-mode equivalent of "displaying the image" is "copying the image to host memory", that means we'll have a setup like this:
[Figure: headless-mode synchronization diagram.] In the diagram, the box that signals image_available indicates where our fake swapchain code will queue GPU work to do the signalling, while the "image_available already signaled" box indicates where the application's rendering work will wait on that signal.
Image Acquisition
We will emulate FIFO presentation mode with our fake swapchain. So acquiring the next image is as simple as incrementing an index to get the least-recently-used image; though we do need to do a few more steps to make sure the previous image is not still copying. We can place all of this code in a conditional statement that wraps the "real swapchain" acquire function:
RTG.cpp
void RTG::run(Application &application) {
//...
uint32_t headless_next_image = 0;
std::chrono::high_resolution_clock::time_point before = std::chrono::high_resolution_clock::now();
while (!glfwWindowShouldClose(window)) {
//...
uint32_t image_index = -1U;
if (configuration.headless) {
assert(swapchain == VK_NULL_HANDLE);
//acquire the least-recently-used headless swapchain image:
assert(headless_next_image < uint32_t(headless_swapchain.size()));
image_index = headless_next_image;
headless_next_image = (headless_next_image + 1) % uint32_t(headless_swapchain.size());
//TODO: wait for image to be done copying to buffer
//TODO: save buffer, if needed
//TODO: mark next copy as pending
//TODO: signal GPU that image is "available for rendering to"
} else {
retry:
//Ask the swapchain for the next image index -- note careful return handling:
if (VkResult result = vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, workspaces[workspace_index].image_available, VK_NULL_HANDLE, &image_index);
result == VK_ERROR_OUT_OF_DATE_KHR) {
//...
}
}
//...
}
//...
}
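The index bookkeeping in the headless branch can be tried out on its own, without any Vulkan objects. This is a small sketch under that assumption; acquisition_order is a hypothetical helper that only records which slot each frame would acquire.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: list the slot index acquired by each of `frames`
// consecutive frames for a fake swapchain with `image_count` images.
std::vector< uint32_t > acquisition_order(uint32_t image_count, uint32_t frames) {
	assert(image_count > 0);
	std::vector< uint32_t > order;
	order.reserve(frames);
	uint32_t next_image = 0;
	for (uint32_t f = 0; f < frames; ++f) {
		//take the least-recently-used slot, then advance (wrapping around):
		order.emplace_back(next_image);
		next_image = (next_image + 1) % image_count;
	}
	return order;
}
```

With three images, five frames visit slots 0, 1, 2, 0, 1: each slot is reused only after every other slot has had a turn, which is the first-in-first-out behavior we are emulating.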
Our fake swapchain uses the image_presented fence to synchronize with the copy-to-host GPU work, so let's fill in the appropriate wait and reset commands:
(we'll come back later for the image saving code)
RTG.cpp
//wait for image to be done copying to buffer
VK( vkWaitForFences(device, 1, &headless_swapchain[image_index].image_presented, VK_TRUE, UINT64_MAX) );
//TODO: save buffer, if needed
//mark next copy as pending
VK( vkResetFences(device, 1, &headless_swapchain[image_index].image_presented) );
The application code is expecting to wait on an image_available semaphore, so we need to submit some GPU work to signal it:
RTG.cpp
//signal GPU that image is "available for rendering to"
VkSubmitInfo submit_info{
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.signalSemaphoreCount = 1,
.pSignalSemaphores = &workspaces[workspace_index].image_available
};
VK( vkQueueSubmit(graphics_queue, 1, &submit_info, nullptr) );
Note that if we were writing a headless-only application this would be overkill -- we know that the GPU work with the current image is done because we just synchronized the CPU with the copy finishing using a fence. But since our application code also needs to work with a swapchain (which doesn't necessarily synchronize CPU and GPU work during image acquisition), we need to write this code.
Image Presentation
On the presentation side, we just need to submit GPU work that waits for the image to be done rendering, kicks off the copy command buffer, and signals the copy-finished fence: (and, as before, we'll move the real-swapchain code to a branch in a conditional)
RTG.cpp
//queue the work for presentation:
if (configuration.headless) {
//in headless mode, submit the copy command we recorded previously:
//will wait in the transfer stage for image_done to be signaled:
VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info{
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.waitSemaphoreCount = 1,
.pWaitSemaphores = &swapchain_image_dones[image_index],
.pWaitDstStageMask = &wait_stage,
.commandBufferCount = 1,
.pCommandBuffers = &headless_swapchain[image_index].copy_command,
};
VK( vkQueueSubmit(graphics_queue, 1, &submit_info, headless_swapchain[image_index].image_presented) );
} else {
VkPresentInfoKHR present_info{
/* ... */
};
//...
if (VkResult result = vkQueuePresentKHR(present_queue, &present_info);
result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR) {
//...
}
}
Event Handling
In headless mode, we want to present a deterministic workload to our application -- this means that instead of getting user inputs from GLFW and handling the passage of time with std::chrono, we need some way of reading all of this information from std::cin.
In fact, we're just going to ignore all user input; but we still need a way of letting our harness know how many frames to render and what dt values to pass to application.update between renders.
So let's design a simple format for headless events. The input will be utf8-encoded text. Each line of the input is a single event, and is a sequence of whitespace-separated tokens. The first non-whitespace token of an event is the event type; by convention it will be written in all-capital letters. The tokens following the event type and their meaning will depend on the event type.
In this tutorial we're only going to define one event type, AVAILABLE, which starts rendering of the next frame.
The remaining tokens in the line will be a floating-point number to use as the dt value for the application's update function and, optionally, an image filename (without whitespace!) to save to.
AVAILABLE 0.01666 frame1.ppm
AVAILABLE 0.01666
AVAILABLE 0.01666 frame3.ppm
Where, before, our application would register and wait for events from GLFW, now it should read events in this format from standard input.
Let's start by avoiding all GLFW calls when headless mode is active.
This means no event handlers being registered or removed, no polling GLFW for events, and no asking GLFW if the window should close.
(This is also important because our code won't create a window in headless mode, so any GLFW call with a window parameter will segfault.)
First, we make the code that tells GLFW to send input events to our handler functions conditional on not being in headless mode:
RTG.cpp
void RTG::run(Application &application) {
//...
//setup event handling:
std::vector< InputEvent > event_queue;
if (!configuration.headless) {
glfwSetWindowUserPointer(window, &event_queue);
glfwSetCursorPosCallback(window, cursor_pos_callback);
glfwSetMouseButtonCallback(window, mouse_button_callback);
glfwSetScrollCallback(window, scroll_callback);
glfwSetKeyCallback(window, key_callback);
}
//...
}
And, since we never installed those event handlers in headless mode, we shouldn't remove them either:
RTG.cpp
void RTG::run(Application &application) {
//...
//tear down event handling:
if (!configuration.headless) {
glfwSetMouseButtonCallback(window, nullptr);
glfwSetCursorPosCallback(window, nullptr);
glfwSetScrollCallback(window, nullptr);
glfwSetKeyCallback(window, nullptr);
glfwSetWindowUserPointer(window, nullptr);
}
}
Finally, we avoid calling glfwWindowShouldClose by using short-circuit evaluation.
We also only ask GLFW for events in non-headless mode, and make ourselves a place to add event handling code.
Of course, this change also means that our headless-mode event handling code better have some way of calling break, otherwise our run-loop will never finish execution!
RTG.cpp
void RTG::run(Application &application) {
//...
while (configuration.headless || !glfwWindowShouldClose(window)) {
//event handling:
if (configuration.headless) {
//TODO: read events from stdin
} else {
glfwPollEvents();
}
//...
}
//...
}
The event handling code will read lines from stdin (via the std::cin stream)
and try to parse each line as an event.
If any event fails to parse, we'll have the code warn and continue (facilitating, e.g., manually typing events into the console for testing).
If we run out of input, that's when the run loop needs to exit.
RTG.cpp
//read events from stdin
std::string line;
while (std::getline(std::cin, line)) {
//TODO: parse event from line
}
//if we've run out of events, stop running the main loop:
if (!std::cin) break;
The easiest way to read whitespace-separated tokens from a line is via an input stream.
So let's go ahead and use that for our parsing code.
We'll also wrap everything in a try ... catch to make it easier to handle errors.
RTG.cpp
//TODO: parse event from line
try {
std::istringstream iss(line);
iss.imbue(std::locale::classic()); //ensure floating point numbers always parse with '.' as the separator
//TODO: read type
//TODO: type-specific parsing
} catch (std::exception &e) {
std::cerr << "WARNING: failed to parse event (" << e.what() << ") from: \"" << line << "\"; ignoring it." << std::endl;
}
The imbue call is setting the locale on the string stream to make sure that floating point numbers parse the same regardless of language settings.
Oh, and since we just used a string stream, let's pull in the string stream and file stream (we'll use it later) headers:
RTG.cpp
//...
#include <cstring>
#include <iostream>
#include <sstream>
#include <fstream>
#include <set>
//...
Okay, back to the parsing code. We can begin to see how exception handling here has made it a lot easier to get nice, granular error messages without a lot of code.
RTG.cpp
//TODO: read type
std::string type;
if (!(iss >> type)) throw std::runtime_error("failed to read event type");
//TODO: type-specific parsing
if (type == "AVAILABLE") { //AVAILABLE dt [save.ppm]
//TODO: read dt
//TODO: check for save file name
//TODO: check for trailing junk
//TODO: stop parsing events so a frame can draw
} else {
throw std::runtime_error("unrecognized type");
}
We've also made a basic skeleton for how to parse the AVAILABLE event.
But before we fill out that skeleton we'll need a place to put both the delta-time and save values so they can be used later when updating and acquiring a swapchain image.
Since both of those use locations are later in the body of the run loop, we introduce these as local variables at the top of the loop:
RTG.cpp
while (configuration.headless || !glfwWindowShouldClose(window)) {
float headless_dt = 0.0f;
std::string headless_save = "";
//event handling:
//...
}
To get the delta-time from the event, we use >> and throw an exception on failure:
(we also check for out-of-range time deltas and complain)
RTG.cpp
//TODO: read dt
if (!(iss >> headless_dt)) throw std::runtime_error("failed to read dt");
if (headless_dt < 0.0f) throw std::runtime_error("dt less than zero");
For the optional save path, we do a bit of conditional error checking: (we're going to save frames in portable pixmap format, so we require the file have the ".ppm" extension)
RTG.cpp
//TODO: check for save file name
if (iss >> headless_save) {
if (!headless_save.ends_with(".ppm")) throw std::runtime_error("output filename (\"" + headless_save + "\") must end with .ppm");
}
That should be it for the event line, so we do a final check to see if there is any extra "junk" after the event. (Silently ignoring extra data can be confusing -- users generally want a program to pay attention to all of the input data they pass to it.)
RTG.cpp
//TODO: check for trailing junk
char junk;
if (iss >> junk) throw std::runtime_error("trailing junk in event line");
Finally, we break out of the event parsing loop:
RTG.cpp
//stop parsing events so a frame can draw
break;
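For experimenting with the event format outside the harness, the parsing steps above can be gathered into one standalone function. This is a sketch rather than the harness code: AvailableEvent, parse_available, and is_valid_available are hypothetical names introduced here, and the extension check is spelled with compare (instead of ends_with) so it also builds with pre-C++20 compilers.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container for one parsed AVAILABLE event.
struct AvailableEvent {
	float dt = 0.0f;
	std::string save_to; //empty if no filename was given
};

// Parse one "AVAILABLE dt [save.ppm]" line; throws std::runtime_error on malformed input.
AvailableEvent parse_available(std::string const &line) {
	std::istringstream iss(line);
	iss.imbue(std::locale::classic()); //'.' is always the decimal separator

	std::string type;
	if (!(iss >> type)) throw std::runtime_error("failed to read event type");
	if (type != "AVAILABLE") throw std::runtime_error("unrecognized type");

	AvailableEvent event;
	if (!(iss >> event.dt)) throw std::runtime_error("failed to read dt");
	if (event.dt < 0.0f) throw std::runtime_error("dt less than zero");

	//optional filename; if the read fails the stream stays in a failed state,
	//so the junk check below also reads nothing and the line is accepted:
	if (iss >> event.save_to) {
		std::string const ext = ".ppm";
		bool has_ext = event.save_to.size() >= ext.size()
			&& event.save_to.compare(event.save_to.size() - ext.size(), ext.size(), ext) == 0;
		if (!has_ext) throw std::runtime_error("output filename must end with .ppm");
	}

	char junk;
	if (iss >> junk) throw std::runtime_error("trailing junk in event line");

	assert(event.dt >= 0.0f); //postcondition: negative time deltas were rejected above
	return event;
}

// Convenience wrapper in the spirit of "warn and continue": true if the line parses.
bool is_valid_available(std::string const &line) {
	try {
		parse_available(line);
		return true;
	} catch (std::runtime_error const &) {
		return false;
	}
}
```

Under these assumptions, "AVAILABLE 0.01666" and "AVAILABLE 0.5 frame1.ppm" are accepted, while an empty line, a negative dt, a non-.ppm filename, or extra tokens after the filename are rejected.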
This leaves us with a delta-time value and a filename to save to.
Let's handle the delta-time value first by overriding the dt value passed to application.update.
This ensures that, while running in headless mode, the application always advances time by the same amount, so the exact same frames are rendered.
RTG.cpp
{ //elapsed time handling:
std::chrono::high_resolution_clock::time_point after = std::chrono::high_resolution_clock::now();
float dt = float(std::chrono::duration< double >(after - before).count());
before = after;
dt = std::min(dt, 0.1f); //lag if frame rate dips too low
//in headless mode, override dt:
if (configuration.headless) dt = headless_dt;
application.update(dt);
}
Saving Images
Dealing with headless_save presents us with a bit of a conundrum -- we get the filename to save to when the image is made available;
but our code won't actually have an image to save until after all of the rendering work for the frame finishes (i.e., three frames from now when our code finishes waiting on the image_presented fence for this fake swapchain image).
Therefore, we'll modify HeadlessSwapchainImage so each swapchain image remembers where it should be saved:
RTG.hpp
struct HeadlessSwapchainImage {
Helpers::AllocatedImage image; //on-GPU rendering target
Helpers::AllocatedBuffer buffer; //host memory to copy image to after rendering
VkFence image_presented = VK_NULL_HANDLE; //fence to signal after copy finishes
VkCommandBuffer copy_command = VK_NULL_HANDLE; //copy image -> buffer
std::string save_to = ""; //(if non-"") file to save to
void save() const; //save buffer to save_to
};
Now we can write a bit of code to handle writing the previous frame and setting save_to for the next frame:
RTG.cpp
//wait for image to be done copying to buffer
VK( vkWaitForFences(device, 1, &headless_swapchain[image_index].image_presented, VK_TRUE, UINT64_MAX) );
//save buffer, if needed:
if (headless_swapchain[image_index].save_to != "") {
headless_swapchain[image_index].save();
headless_swapchain[image_index].save_to = "";
}
//remember if next frame should be saved:
headless_swapchain[image_index].save_to = headless_save;
//mark next copy as pending
VK( vkResetFences(device, 1, &headless_swapchain[image_index].image_presented) );
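To see when each file actually gets written, the save_to bookkeeping can be simulated with plain strings. This is a sketch only: simulate_saves is a hypothetical function that replaces the fence wait and the save() call with appending to a list, and it assumes one acquisition per event.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simulate the deferred-save bookkeeping with strings only (no Vulkan).
// requests[f] is the filename requested by frame f ("" for none).
// Returns the filenames in the order they would be written.
std::vector< std::string > simulate_saves(std::size_t image_count, std::vector< std::string > const &requests) {
	assert(image_count > 0);
	std::vector< std::string > save_to(image_count); //pending filename per slot
	std::vector< std::string > written;
	std::size_t next_image = 0;
	for (std::string const &request : requests) {
		std::size_t image_index = next_image;
		next_image = (next_image + 1) % image_count;
		//the slot's previous frame has finished copying; write it out if requested:
		if (!save_to[image_index].empty()) {
			written.emplace_back(save_to[image_index]);
		}
		//remember where the frame about to be rendered should be saved:
		save_to[image_index] = request;
	}
	return written;
}
```

In this simulation, with three images and five events requesting a.ppm, b.ppm, nothing, d.ppm, and e.ppm, only a.ppm and b.ppm have been written by the time the events run out; d.ppm and e.ppm are still pending in their slots.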
And, of course, we need to write the HeadlessSwapchainImage::save helper function.
This function will save the image in binary portable pixmap format ("P6" format). This is a very simple raster image format used by the netpbm utilities and widely supported elsewhere. The file data consists of the string "P6", a newline, the width and height of the image as space-separated ASCII integers, a newline, the maximum color value as an ASCII integer, a newline, and the pixel data as RGB bytes starting from the upper left.
RTG.cpp
void RTG::HeadlessSwapchainImage::save() const {
if (save_to == "") return;
if (image.format == VK_FORMAT_B8G8R8A8_SRGB) {
//get a pointer to the image data copied to the buffer:
char const *bgra = reinterpret_cast< char const * >(buffer.allocation.data());
//TODO: convert bgra -> rgb data
//TODO: write ppm file
} else {
std::cerr << "WARNING: saving format " << string_VkFormat(image.format) << " not supported." << std::endl;
}
}
Note for the pedantic: strict aliasing is not violated by this reinterpret_cast because char is a character type.
To convert BGRA data to RGB data we need to re-order the first three bytes of every pixel and discard the last byte.
We make a temporary std::vector to hold the converted data and size it appropriately.
RTG.cpp
//convert bgra -> rgb data:
std::vector< char > rgb(image.extent.height * image.extent.width * 3);
for (uint32_t y = 0; y < image.extent.height; ++y) {
for (uint32_t x = 0; x < image.extent.width; ++x) {
rgb[(y * image.extent.width + x) * 3 + 0] = bgra[(y * image.extent.width + x) * 4 + 2];
rgb[(y * image.extent.width + x) * 3 + 1] = bgra[(y * image.extent.width + x) * 4 + 1];
rgb[(y * image.extent.width + x) * 3 + 2] = bgra[(y * image.extent.width + x) * 4 + 0];
}
}
With this converted data in hand, we can just save the image using a file stream.
Importantly, we use a stream in std::ios::binary mode -- otherwise any \n bytes will be expanded into \r\n on Windows, causing color and alignment shifts in the output file.
RTG.cpp
//write ppm file:
std::ofstream ppm(save_to, std::ios::binary);
ppm << "P6\n"; //magic number + newline
ppm << image.extent.width << " " << image.extent.height << "\n"; //image size + newline
ppm << "255\n"; //max color value + newline
ppm.write(rgb.data(), rgb.size()); //rgb data in row-major order, starting from the top left
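The conversion and the file layout can also be exercised together on a tiny image without touching the file system. This is a minimal sketch, assuming tightly packed 8-bit BGRA input; encode_ppm_from_bgra is a hypothetical helper that returns the bytes of the P6 file as a string instead of writing them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode tightly packed BGRA bytes as the contents of a binary PPM ("P6") file.
std::string encode_ppm_from_bgra(uint32_t width, uint32_t height, std::vector< char > const &bgra) {
	std::size_t const pixels = std::size_t(width) * height;
	assert(bgra.size() == pixels * 4);
	//header: magic number, image size, maximum color value (each followed by a newline):
	std::string ppm = "P6\n" + std::to_string(width) + " " + std::to_string(height) + "\n255\n";
	//pixel data: reorder B,G,R to R,G,B and drop alpha:
	ppm.reserve(ppm.size() + pixels * 3);
	for (std::size_t i = 0; i < pixels; ++i) {
		ppm += bgra[i * 4 + 2]; //red
		ppm += bgra[i * 4 + 1]; //green
		ppm += bgra[i * 4 + 0]; //blue (the alpha byte at +3 is discarded)
	}
	return ppm;
}
```

To produce a file from the result, one could write the returned string to a std::ofstream opened with std::ios::binary, for the same reason given above.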
And that's nearly it -- if you were to comment out the assert(swapchain == VK_NULL_HANDLE); and assert(surface == VK_NULL_HANDLE); you could run a headless mode but you'd still have a window.
Doing Without a Surface
We've done a tremendous amount of work to ensure that our rendering doesn't call any GLFW functions and works without a surface,
but our code is nonetheless still initializing GLFW, requesting WSI extensions, and making a surface in RTG::RTG.
So let's go through and add more conditionals.
In headless mode, our code doesn't need to initialize GLFW or ask it for a list of extensions (which end up being the platform-specific WSI extensions):
RTG.cpp
RTG::RTG(Configuration const &configuration_) : helpers(*this) {
//...
if (!configuration.headless) { //add extensions needed by glfw:
//...
}
//...
}
In headless mode, our code doesn't need a window or surface:
RTG.cpp
RTG::RTG(Configuration const &configuration_) : helpers(*this) {
//...
if (!configuration.headless) { //create the `window` and `surface` (where things get drawn):
//...
}
//...
}
The code used to select surface_format and present_mode is part of the WSI extensions, so that also gets placed into a conditional.
In this case, we do actually need to replace it with something to set our swapchain format and make sure the user has requested FIFO present mode:
RTG.cpp
RTG::RTG(Configuration const &configuration_) : helpers(*this) {
//...
//select the `surface_format` and `present_mode` which control how colors are represented on the surface and how new images are supplied to the surface:
if (configuration.headless) {
//in headless mode, just use the first requested format:
if (configuration.surface_formats.empty()) {
throw std::runtime_error("No surface formats requested.");
}
surface_format = configuration.surface_formats[0];
//headless mode will always use VK_PRESENT_MODE_FIFO_KHR, so make sure that's an option:
bool have_fifo = false;
for (auto const &mode : configuration.present_modes) {
if (mode == VK_PRESENT_MODE_FIFO_KHR) {
have_fifo = true;
break;
}
}
if (!have_fifo) {
throw std::runtime_error("Configured present modes do not contain VK_PRESENT_MODE_FIFO_KHR.");
}
present_mode = VK_PRESENT_MODE_FIFO_KHR;
} else {
std::vector< VkSurfaceFormatKHR > formats;
std::vector< VkPresentModeKHR > present_modes;
//...
}
//...
}
Our code also shouldn't be looking for a queue to present on in headless mode, since the call to check for presentation support is part of the WSI extensions:
RTG.cpp
//...in the "look up queue indices" section:
if (!configuration.headless) {
//if it has present support, set the present queue family:
VkBool32 present_support = VK_FALSE;
VK( vkGetPhysicalDeviceSurfaceSupportKHR(physical_device, i, surface, &present_support) );
if (present_support == VK_TRUE) {
if (!present_queue_family) present_queue_family = i;
}
}
In fact, our fake swapchain will "present" on the graphics queue:
RTG.cpp
//...after the loop over queue families in the "look up queue indices" section:
//in headless mode, "present" (copy-to-host) on the graphics queue:
if (configuration.headless) {
present_queue_family = graphics_queue_family;
}
if (!graphics_queue_family) {
throw std::runtime_error("No queue with graphics support.");
}
//...
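To see the two queue-selection changes working together, here is a minimal stand-alone sketch with no Vulkan calls. It assumes the queue family indices are held in std::optional< uint32_t > (which is what the `if (!present_queue_family)` tests suggest); FamilyInfo, QueueChoice, and choose_queues are hypothetical names, and the "No queue with present support." error is an assumption about what the windowed path does.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

//Hypothetical summary of what one queue family supports:
struct FamilyInfo {
	bool graphics;
	bool present;
};

//Hypothetical result type holding the selected family indices:
struct QueueChoice {
	std::optional< uint32_t > graphics_queue_family;
	std::optional< uint32_t > present_queue_family;
};

QueueChoice choose_queues(std::vector< FamilyInfo > const &families, bool headless) {
	QueueChoice choice;
	for (uint32_t i = 0; i < uint32_t(families.size()); ++i) {
		//take the first family with graphics support:
		if (families[i].graphics && !choice.graphics_queue_family) choice.graphics_queue_family = i;
		//only look for present support when there is a surface to present to:
		if (!headless && families[i].present && !choice.present_queue_family) choice.present_queue_family = i;
	}
	//in headless mode, "present" (copy-to-host) on the graphics queue:
	if (headless) choice.present_queue_family = choice.graphics_queue_family;

	if (!choice.graphics_queue_family) throw std::runtime_error("No queue with graphics support.");
	if (!choice.present_queue_family) throw std::runtime_error("No queue with present support.");
	return choice;
}
```

Note that `!choice.graphics_queue_family` tests whether the optional holds a value, not whether the index is zero, so family 0 is handled correctly.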
And we shouldn't be requesting the swapchain extension in headless mode:
RTG.cpp
if (!configuration.headless) {
//Add the swapchain extension:
device_extensions.emplace_back(VK_KHR_SWAPCHAIN_EXTENSION_NAME);
}
And that's it for initialization and destruction -- it turns out we already wrote RTG::~RTG defensively enough that it deals gracefully with a missing surface and window.
Fixing Two Bugs
Let's try running this code in headless mode.
Compile, run the code with the --headless flag, and manually type an event into the console (e.g., AVAILABLE 1.0 test.ppm) and press enter.
Type it in a few more times and -- when you are done supplying events -- indicate end-of-input on the terminal with CTRL-D (or CTRL-Z followed by enter in a Windows console).
As you run through events you will see between several and many complaints from the Vulkan validation layer about image layouts, referencing VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Further, you may or may not actually end up with any saved ppm files.
Fixing the Layout Problem
The complaints about VK_IMAGE_LAYOUT_PRESENT_SRC_KHR arise because this layout is provided by an extension we are no longer requesting, so none of our code should be using it in headless mode.
Unfortunately, the code that uses it isn't even in the harness: Tutorial::render_pass uses it as the final layout of its color attachment.
In other words, we need a way for the harness code to tell the application code what presentation layout to use.
We already tell the application about the selected surface_format and present_mode, so we can add another variable nearby to serve this function:
RTG.hpp
//The surface is where rendered images are shown:
VkSurfaceKHR surface = VK_NULL_HANDLE; //null in headless mode
VkSurfaceFormatKHR surface_format{};
VkPresentModeKHR present_mode{};
VkImageLayout present_layout = VK_IMAGE_LAYOUT_UNDEFINED; //layout to put images in after render
Then we can modify the code that sets the presentation mode and surface format to also set the presentation layout:
RTG.cpp
//select the `surface_format` and `present_mode` which control how colors are represented on the surface and how new images are supplied to the surface:
if (configuration.headless) {
//...
present_mode = VK_PRESENT_MODE_FIFO_KHR;
present_layout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
} else {
//...
present_mode = [&](){
//...
}();
present_layout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
}
We use VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in headless mode because that's what our code does with "presented" images -- it uses them as a transfer source.
And, finally, we modify our render pass creation in Tutorial::Tutorial to use this new information from the harness:
Tutorial.cpp
VkAttachmentDescription{ //0 - color attachment:
.format = rtg.surface_format.format,
.samples = VK_SAMPLE_COUNT_1_BIT,
.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
.storeOp = VK_ATTACHMENT_STORE_OP_STORE,
.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
.finalLayout = rtg.present_layout,
},
At this point the code should compile and run in headless mode without any validation layer complaints. But it still doesn't seem to save all of the requested images.
Draining the Pipeline
The code, as it is now, doesn't save every image you request it to save. In fact, it will always miss the last three images. Hmm, three images... that's exactly the size of the fake swapchain. So what's going on?
Remember that we only save an image when we're about to recycle it for use in rendering a new frame! In other words, we're running frames in a pipelined manner and we haven't written the code that collects the last few frames to drain the pipeline.
So let's write code after the loop in RTG::run that waits on the remaining frames to finish, and saves them if requested:
RTG.cpp
void RTG::run(Application &application) {
//...
while (configuration.headless || !glfwWindowShouldClose(window)) {
//...
}
//wait for any in-flight "headless" frames marked for saving to finish:
if (configuration.headless) {
for (size_t i = 0; i < headless_swapchain.size(); ++i) {
uint32_t image_index = headless_next_image;
headless_next_image = (headless_next_image + 1) % uint32_t(headless_swapchain.size());
//block until the image is finished being "presented" (copied-to-host):
VK( vkWaitForFences(device, 1, &headless_swapchain[image_index].image_presented, VK_TRUE, UINT64_MAX) );
//save if requested:
if (headless_swapchain[image_index].save_to != "") {
headless_swapchain[image_index].save();
headless_swapchain[image_index].save_to = "";
}
}
}
//tear down event handling:
if (!configuration.headless) {
//...
}
}
A few subtleties here.
First, this works even if some of the images in the fake swapchain have never been made available, since a freshly created fake swapchain image has its image_presented fence set to signalled and its save_to set to "".
Second, we go through the swapchain in FIFO order.
As far as I know, this isn't strictly necessary (we could probably wait for the frames to finish in any order -- none of the frames should depend on these fences, so we aren't going to deadlock), but we might as well not do things in a different order after the run loop than we would have done them during the run loop.
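To make the pipelining argument concrete, here is a minimal stand-alone simulation of the recycle-then-save logic with no Vulkan calls. Slot, simulate, and the drain parameter are hypothetical names; each slot only remembers the filename it was asked to save to, and fences are left out entirely since the simulation is synchronous.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

//Hypothetical stand-in for a fake swapchain image: just remembers where (if anywhere) to save.
struct Slot {
	std::string save_to = ""; //empty means "nothing to save"
};

//"Render" one frame per entry of `requests`, saving a slot's previous frame only when
// that slot is recycled. If `drain` is true, also flush every slot in FIFO order afterward.
//Returns the filenames in the order they were "saved".
std::vector< std::string > simulate(std::vector< std::string > const &requests, size_t slot_count, bool drain) {
	assert(slot_count > 0);
	std::vector< Slot > slots(slot_count);
	std::vector< std::string > saved;
	size_t next = 0;

	//save (and clear) a slot's pending request, if it has one:
	auto flush = [&saved](Slot &slot) {
		if (slot.save_to != "") {
			saved.emplace_back(slot.save_to);
			slot.save_to = "";
		}
	};

	for (auto const &request : requests) {
		Slot &slot = slots[next];
		next = (next + 1) % slots.size();
		flush(slot); //the previous frame in this slot gets saved only now
		slot.save_to = request; //the new frame takes over the slot
	}

	if (drain) {
		//same traversal order as the run loop; never-used slots have nothing to flush:
		for (size_t i = 0; i < slots.size(); ++i) {
			flush(slots[next]);
			next = (next + 1) % slots.size();
		}
	}
	return saved;
}
```

With five requests and three slots, the call without draining returns only the first two filenames, while the draining call returns all five in request order. A run with a single request and three slots also drains cleanly, matching the first subtlety above.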