How 3D Game Rendering Works: Vertex Processing
In this first part of our deeper look at 3D game rendering, we’ll be focusing entirely on the vertex stage of the process. This means dragging out our math textbooks, brushing up on a spot of linear algebra, matrices, and trigonometry — oh yeah!
We’ll power through how 3D models are transformed and how light sources are accounted for. The differences between vertex and geometry shaders will be thoroughly explored, and you’ll get to see where tessellation fits in. To help with the explanations, we’ll use diagrams and code examples to demonstrate how the math and numbers are handled in a game.
If you’re not ready for all of this, don’t worry: you can get started with our 3D Game Rendering 101. But once you’re set, read on for our first closer look at the world of 3D graphics.
What’s the point?
In the world of math, a point is simply a location within a geometric space. There’s nothing smaller than a point, as it has no size, so they can be used to clearly define where objects such as lines, planes, and volumes start and end.
For 3D graphics, this information is crucial for setting out how everything will look because everything displayed is a collection of lines, planes, etc. The image below is a screenshot from Bethesda’s 2015 release Fallout 4:
It might be a bit hard to see how this is all just a big pile of points and lines, so we’ll show you how the same scene looks in ‘wireframe’ mode. Set like this, the 3D rendering engine skips textures and effects done in the pixel stage, and draws nothing but the colored lines connecting the points together.
Everything looks very different now, but we can see all of the lines that go together to make up the various objects, environment, and background. Some are just a handful of lines, such as the rocks in the foreground, whereas others have so many lines that they appear solid.
Every point at the start and end of each line has been processed by doing a whole bunch of math. Some of these calculations are very quick and easy to do; others are much harder. There are significant performance gains to be made by working on groups of points together, especially in the form of triangles, so let’s begin a closer look with these.
So what’s needed for a triangle?
The name triangle tells us that the shape has 3 interior angles; to have this, we need 3 corners and 3 lines joining the corners together. The proper name for a corner is a vertex (vertices being the plural word) and each one is described by a point. Since we’re based in a 3D geometrical world, we use the Cartesian coordinate system for the points. This is commonly written in the form of 3 values together, for example (1, 8, -3), or more generally (x, y, z).
From here, we can add in two more vertices to get a triangle:
Note that the lines shown aren’t really necessary – we can just have the points and tell the system that these 3 vertices make a triangle. All of the vertex data is stored in a contiguous block of memory called a vertex buffer; the information about the shape they will make is either directly coded into the rendering programme or stored in another block of memory called an index buffer.
In the case of the former, the different shapes that can be formed from the vertices are called primitives and Direct3D offers lists, strips, and fans in the form of points, lines, and triangles. Used correctly, triangle strips use vertices for more than one triangle, helping to boost performance. In the example below, we can see that only 4 vertices are needed to make 2 triangles joined together – if they were separate, we’d need 6 vertices.
If you want to handle a larger collection of vertices, e.g. an in-game NPC model, then it’s best to use something called a mesh – this is another block of memory but it consists of multiple buffers (vertex, index, etc) and the texture resources for the model. Microsoft provides a quick introduction to the use of this buffer in their online documents resource.
For now, let’s concentrate on what gets done to these vertices in a 3D game, every time a new frame is rendered (if you’re not sure what that means, have a quick scan again of our rendering 101). Put simply, one or two things are done to them:
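To make the saving concrete, here is a minimal Python sketch of how 4 shared vertices can describe 2 triangles, either as a triangle strip or through an index buffer. The vertex positions and the function names are made up for illustration; a real renderer would hand these buffers to the graphics API rather than unpack them itself.

```python
# Hypothetical vertex buffer: one (x, y, z) position per vertex.
vertex_buffer = [
    (0.0, 0.0, 0.0),  # vertex 0
    (1.0, 0.0, 0.0),  # vertex 1
    (0.0, 1.0, 0.0),  # vertex 2
    (1.0, 1.0, 0.0),  # vertex 3
]

def strip_to_triangles(vertex_count):
    """Expand a triangle strip into index triples, one per triangle.

    Every run of three consecutive vertices forms a triangle. The first two
    indices of each odd-numbered triangle are swapped so that every triangle
    keeps the same winding order.
    """
    triangles = []
    for i in range(vertex_count - 2):
        a, b, c = i, i + 1, i + 2
        if i % 2 == 1:
            a, b = b, a
        triangles.append((a, b, c))
    return triangles

def list_to_triangles(index_buffer):
    """Read an index buffer three entries at a time (a triangle list)."""
    return [tuple(index_buffer[i:i + 3]) for i in range(0, len(index_buffer), 3)]

strip_triangles = strip_to_triangles(len(vertex_buffer))

# The same two triangles written out explicitly in an index buffer.
index_buffer = [0, 1, 2, 2, 1, 3]
list_triangles = list_to_triangles(index_buffer)

print(strip_triangles)  # [(0, 1, 2), (2, 1, 3)]
```

Either way, only 4 positions are stored; vertices 1 and 2 are simply referenced twice.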
- Move the vertex into a new position
- Change the color of the vertex
Ready for some math? Good! Because this is how these things get done.
Enter the vector
Imagine you have a triangle on the screen and you push a key to move it to the left. You’d naturally expect the (x, y, z) numbers for each vertex to change accordingly and they do; however, how this is done may seem a little unusual. Rather than simply change the coordinates, the vast majority of 3D graphics rendering systems use a specific mathematical tool to get the job done: we’re talking about vectors.
A vector can be thought of as an arrow that points towards a particular location in space and can be of any length required. Vertices are actually described using vectors, based on the Cartesian coordinates, in this manner:
Notice how the blue arrow starts at one location (in this case, the origin) and stretches out to the vertex. We’ve used what’s called column notation to describe this vector, but row notation works just as well. You’ll have spotted that there is also one extra value – the 4th number is commonly labelled as the w-component and it is used to state whether the vector is being used to describe the location of a vertex (called a position vector) or describing a general direction (a direction vector). In the case of the latter, it would look like this:
This vector points in the same direction and has the same length as the previous position vector, so the (x, y, z) values will be the same; however, the w-component is zero, rather than 1. The uses of direction vectors will become clear later on in this article but for now, let’s just take stock of the fact that all of the vertices in the 3D scene will be described this way. Why? Because in this format, it becomes a lot easier to start moving them about.
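One way to see why the w-component is useful is to do a little arithmetic with it. The short Python sketch below uses made-up coordinates: subtracting one position from another leaves w = 0 (a direction), while adding a direction to a position keeps w = 1 (still a position).

```python
def add(a, b):
    """Component-wise sum of two 4-component vectors."""
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    """Component-wise difference of two 4-component vectors."""
    return tuple(x - y for x, y in zip(a, b))

# Two position vectors (w = 1). The coordinates are arbitrary examples.
vertex_a = (1.0, 8.0, -3.0, 1.0)
vertex_b = (4.0, 2.0, 5.0, 1.0)

# Position minus position: the w-components cancel, giving a direction.
edge = subtract(vertex_b, vertex_a)

# Position plus direction: w stays at 1, so the result is a position again.
moved = add(vertex_a, edge)

print(edge)   # (3.0, -6.0, 8.0, 0.0)
print(moved)  # (4.0, 2.0, 5.0, 1.0)
```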
Math, math, and more math
Remember that we have a basic triangle and we want to move it to the left. Each vertex is described by a position vector, so the ‘moving math’ we need to do (known as transformations) has to work on these vectors. Enter the next tool: matrices (or matrix for one of them). This is an array of values written out a bit like an Excel spreadsheet, in rows and columns.
For each type of transformation we want to do, there is an associated matrix to go with it, and it’s simply a case of multiplying the transformation matrix and the position vector together. We won’t go through the specific details of how and why this happens, but we can see what it looks like.
Moving a vertex about in a 3D space is called a translation and the calculation required is this:
The x0, etc values represent the original coordinates of the vertex; the delta–x values represent how much the vertex needs to be moved by. The matrix-vector calculation results in the two being simply added together (note that the w component remains untouched, so the final answer is still a position vector).
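Using the symbols described here, and assuming the column notation mentioned earlier, the translation can be written out in the following standard form:

```latex
\begin{bmatrix}
1 & 0 & 0 & \Delta x \\
0 & 1 & 0 & \Delta y \\
0 & 0 & 1 & \Delta z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \\ z_0 \\ 1 \end{bmatrix}
=
\begin{bmatrix} x_0 + \Delta x \\ y_0 + \Delta y \\ z_0 + \Delta z \\ 1 \end{bmatrix}
```

If the vector had a w-component of 0 instead, the final column of the matrix would be multiplied by zero and the translation would have no effect, which is what we’d want for a direction vector.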
As well as moving things about, we might want to rotate the triangle or scale it bigger or smaller in size – there are transformations for both of these.
We can use the WebGL-powered graphics tool at the Real-Time Rendering website to visualize these calculations on an entire shape. Let’s start with a cuboid in a default position:
In this online tool, the model point refers to the position vector, the world matrix is the transformation matrix, and the world-space point is the position vector for the transformed vertex.
Now let’s apply a variety of transformations to the cuboid:
In the above image, the shape has been translated by 5 units in every direction. We can see these values in the large matrix in the middle, in the final column. The original position vector (4, 5, 3, 1) remains the same, as it should, but the transformed vertex has now been translated to (9, 10, 8, 1).
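The numbers quoted here are easy to check by hand or with a few lines of code. This is a minimal Python sketch, with the matrix stored as a list of rows and function names of our own choosing:

```python
def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix (a list of rows) by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def translation(dx, dy, dz):
    """Build a translation matrix; the offsets sit in the final column."""
    return [
        [1, 0, 0, dx],
        [0, 1, 0, dy],
        [0, 0, 1, dz],
        [0, 0, 0, 1],
    ]

model_point = [4, 5, 3, 1]           # original position vector
world_matrix = translation(5, 5, 5)  # move 5 units along every axis
world_point = mat_vec(world_matrix, model_point)

print(world_point)  # [9, 10, 8, 1]
```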
In this transformation, everything has been scaled by a factor of 2: the cuboid now has sides twice as long. The final example to look at is a spot of rotation:
The cuboid has been rotated through an angle of 45° but the matrix is using the sine and cosine of that angle. A quick check on any scientific calculator will show us that sin(45°) = 0.7071… which rounds to the value of 0.71 shown. We get the same answer for the cosine value.
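Scaling and rotation follow the same multiply-a-matrix-by-a-vector pattern. In the Python sketch below, the rotation is assumed to be about the z-axis (the text does not say which axis the tool used) and the sample points are our own.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix (a list of rows) by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def scaling(sx, sy, sz):
    """Scale factors sit on the diagonal of the matrix."""
    return [
        [sx, 0, 0, 0],
        [0, sy, 0, 0],
        [0, 0, sz, 0],
        [0, 0, 0, 1],
    ]

def rotation_z(degrees):
    """Rotation about the z-axis, built from the sine and cosine of the angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [
        [c, -s, 0, 0],
        [s, c, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]

scaled = mat_vec(scaling(2, 2, 2), [4, 5, 3, 1])
print(scaled)  # [8, 10, 6, 1]

rotate_45 = rotation_z(45)
print(round(rotate_45[1][0], 2))  # 0.71, the rounded value of sin(45 degrees)

rotated = mat_vec(rotate_45, [1, 0, 0, 1])  # a point on the x-axis
```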
Matrices and vectors don’t have to be used; a common alternative, especially for handling complex rotations, involves the use of complex numbers and quaternions. This math is a sizeable step up from vectors, so we’ll move on from transformations.
The power of the vertex shader
At this stage we should take stock of the fact that all of this needs to be figured out by the folks programming the rendering code. If a game developer is using a third-party engine (such as Unity or Unreal), then this will have already been done for them, but anyone making their own, from scratch, will need to work out what calculations need to be done to which vertices.
But what does this look like, in terms of code?
To help with this, we’ll use examples from the excellent website Braynzar Soft. If you want to get started in 3D programming yourself, it’s a great place to learn the basics, as well as some more advanced stuff…
This example is an ‘all-in-one transformation’. It creates the respective transformation matrices based on a keyboard input, and then applies them to the original position vector in a single operation. Note that this is always done in a set order (scale – rotate – translate), as any other way would totally mess up the outcome.
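To see why the order matters, here is a small Python sketch (not the Braynzar Soft code) that combines three matrices into one. With column vectors, the matrix nearest the vector is applied first, so scale – rotate – translate is written as T × R × S. The specific values (scale by 2, rotate 90° about z, translate 5 units along x) are arbitrary choices for illustration.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

scale = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
rotate = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 90 degrees about z
translate = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

vertex = [1, 0, 0, 1]

# Scale first, then rotate, then translate.
world = mat_mul(translate, mat_mul(rotate, scale))
correct = mat_vec(world, vertex)

# The same three matrices combined in the opposite order.
reversed_world = mat_mul(scale, mat_mul(rotate, translate))
wrong = mat_vec(reversed_world, vertex)

print(correct)  # [5, 2, 0, 1]
print(wrong)    # [0, 12, 0, 1]
```

This is plain Python for illustration only; in a game, the equivalent math would be written in the engine or in shader code.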
Such blocks of code are called vertex shaders and they can vary enormously in terms of what they do, their size and complexity. At their simplest, they take the vertex information and just pass it straight onto the next stage in the rendering process. A more complicated shader would maybe transform it in the 3D space, work out how it will all appear to the scene’s camera, and then pass that data on to the next stage in the rendering process.
They can be used for so much more, of course, and every time you play a game rendered in 3D just remember that all of the motion you can see is worked out by the graphics processor, following the instructions in vertex shaders.
This wasn’t always the case, though. If we go back in time to the mid to late 1990s, graphics cards of that era had no capability to process vertices and primitives themselves; this was all done entirely on the CPU.
One of the first processors to provide dedicated hardware acceleration for this kind of process was Nvidia’s original GeForce, the GeForce 256, released in late 1999, and this capability was labelled Hardware Transform and Lighting (or Hardware TnL, for short). The processes that this hardware could handle were very rigid and fixed in terms of commands, but this rapidly changed as newer graphics chips were released. Today, there is no separate hardware for vertex processing and the same units process everything: points, primitives, pixels, textures, etc.
Speaking of lighting, it’s worth noting that everything we see, of course, is because of light, so let’s see how this can be handled at the vertex stage. To do this, we’ll use something that we mentioned earlier in this article.
Lights, camera, action!
Picture this scene: the player stands in a dark room, lit by a single light source off to the right. In the middle of the room, there is a giant, floating, chunky teapot. Okay, so we’ll probably need a little help visualising this, so let’s use the Real-Time Rendering website, to see something like this in action:
Now, don’t forget that this object is a collection of flat triangles stitched together; this means that the plane of each triangle will be facing in a particular direction. Some are facing towards the camera, some facing the other way, and others are skewed. The light from the source will hit each plane and bounce off at a certain angle.
Depending on where the light heads off to, the color and brightness of the plane will vary, and to ensure that the object’s color looks correct, this all needs to be calculated and accounted for.
To begin with, we need to know which way the plane is facing and for that, we need the normal vector of the plane. This is another arrow but unlike the position vector, its size doesn’t matter (in fact, they are always scaled down after calculation, so that they are exactly 1 unit in length) and it is always perpendicular (at a right angle) to the plane.
The normal of each triangle’s plane is calculated by working out the vector product of the two direction vectors (p and q shown above) that form the sides of the triangle. It’s actually better to work it out for each vertex, rather than for each individual triangle, but given that there will always be more of the former, compared to the latter, it’s quicker just to do it for the triangles.
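As a rough sketch of that calculation in Python: take two sides of the triangle as direction vectors, work out their vector (cross) product, and scale the result to a length of 1. The triangle coordinates are made up, and the direction the normal points in depends on the order in which the corners are listed.

```python
import math

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(p, q):
    """Vector (cross) product of two 3-component vectors."""
    return (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )

def normalize(v):
    """Scale a vector so that it is exactly 1 unit long."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def triangle_normal(a, b, c):
    p = subtract(b, a)  # first side of the triangle
    q = subtract(c, a)  # second side of the triangle
    return normalize(cross(p, q))

# A triangle lying flat in the x-y plane: its normal points along z.
normal = triangle_normal((0, 0, 0), (2, 0, 0), (0, 3, 0))
print(normal)  # (0.0, 0.0, 1.0)
```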
Once you have the normal of a surface, you can start to account for the light source and the camera. Lights can be of varying types in 3D rendering but for the purpose of this article, we’ll only consider lights that point in a particular direction, e.g. a spotlight. Like the plane of a triangle, the spotlight and camera will be pointing in a particular direction, maybe something like this:
The light’s vector and the normal vector can be used to work out the angle that the light hits the surface at (using the relationship between the dot product of the vectors and the product of their sizes). The triangle’s vertices will carry additional information about their color and material — in the case of the latter, it will describe what happens to the light when it hits the surface.
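The relationship mentioned in brackets is that the dot product of two vectors equals the product of their lengths multiplied by the cosine of the angle between them. A minimal Python sketch, using a made-up normal and a made-up vector pointing from the surface towards the light:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def length(v):
    return math.sqrt(dot(v, v))

def angle_between(a, b):
    """Angle in degrees, from cos(angle) = (a . b) / (|a| |b|)."""
    cosine = dot(a, b) / (length(a) * length(b))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))

normal = (0.0, 0.0, 1.0)    # surface faces straight along z
to_light = (0.0, 1.0, 1.0)  # hypothetical direction towards the light

angle = angle_between(normal, to_light)
print(round(angle, 1))  # 45.0
```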
A smooth, metallic surface will reflect almost all of the incoming light off at the same angle it came in at, and will barely change the color. By contrast, a rough, dull material will scatter the light in a less predictable way and subtly change the color. To account for this, vertices need to have extra values:
- Original base color
- Ambient material attribute – a value that determines how much ‘background’ light the vertex can absorb and reflect
- Diffuse material attribute – another value but this time indicating how ‘rough’ the vertex is, which in turns affects how much scattered light is absorbed and reflected
- Specular material attributes – two values giving us a measure of how ‘shiny’ the vertex is
Different lighting models will use various math formulae to group all of this together, and the calculation produces a vector for the outgoing light. When this is combined with the camera’s vector, the overall appearance of the triangle can be determined.
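As one possible illustration, the Python sketch below follows the classic Phong reflection model, which is just one of many lighting models and not necessarily what any particular game uses. The parameter names and all of the values are our own assumptions; the two specular values are taken to be a strength and a ‘shininess’ exponent.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_color(base_color, ambient, diffuse, specular, shininess,
                normal, to_light, to_camera):
    """Combine the material values into a final color (Phong-style sketch)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_camera)
    n_dot_l = max(dot(n, l), 0.0)  # zero when the light is behind the surface
    # Mirror the light direction about the normal to get the outgoing light.
    reflected = tuple(2.0 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    # The highlight is strongest when the outgoing light heads at the camera.
    highlight = max(dot(reflected, v), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    brightness = ambient + diffuse * n_dot_l
    return tuple(min(c * brightness + specular * highlight, 1.0)
                 for c in base_color)

orange = (1.0, 0.5, 0.0)
facing = (0.0, 0.0, 1.0)

# Light and camera both directly in front of the surface.
lit = phong_color(orange, 0.1, 0.6, 0.2, 32, facing, facing, facing)

# Light directly behind the surface: only the ambient term remains.
unlit = phong_color(orange, 0.1, 0.6, 0.2, 32, facing, (0.0, 0.0, -1.0), facing)
```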
We’ve skipped through much of the finer detail here and for good reason: grab any textbook on 3D rendering and you’ll see entire chapters dedicated to this single process. However, modern games generally perform the bulk of the lighting calculations and material effects in the pixel processing stage, so we’ll revisit this topic in another article.
All of what we’ve covered so far is done using vertex shaders and it might seem that there is almost nothing they can’t do; unfortunately, there is. Vertex shaders can’t make new vertices and each shader has to work on every single vertex. It would be handy if there was some way of using a bit of code to make more triangles, in between the ones we’ve already got (to improve the visual quality) and have a shader that works on an entire primitive (to speed things up). Well, with modern graphics processors, we can do this!
Please sir, I want some more (triangles)
The latest graphics chips are immensely powerful, capable of performing millions of matrix-vector calculations each second; they’re easily capable of powering through a huge pile of vertices in no time at all. On the other hand, it’s very time consuming making highly detailed models to render and if the model is going to be some distance away in the scene, all that extra detail will be going to waste.
What we need is a way of telling the processor to break up a larger primitive, such as the single flat triangle we’ve been looking at, into a collection of smaller triangles, all bound inside the original big one. The name for this process is tessellation and graphics chips have been able to do this for a good while now; what has improved over the years is the amount of control programmers have over the operation.
To see this in action, we’ll use Unigine’s Heaven benchmark tool, as it allows us to apply varying amounts of tessellation to specific models used in the test.
To begin with, let’s take a location in the benchmark and examine it with no tessellation applied. Notice how the cobbles in the ground look very fake – the texture used is effective but it just doesn’t look right. Let’s apply some tessellation to the scene; the Unigine engine only applies it to certain parts but the difference is dramatic.
The ground, building edges, and doorway all now look far more realistic. We can see how this has been achieved if we run the process again, but this time with the edges of the primitives all highlighted (aka, wireframe mode):
We can clearly see why the ground looks so odd – it’s completely flat! The doorway is flush with the walls, too, and the building edges are nothing more than simple cuboids.
In Direct3D, primitives can be split up into a group of smaller parts (a process called sub-division) by running a 3-stage sequence. First, programmers write a hull shader; essentially, this code creates something called a geometry patch. Think of this as being a map telling the processor where the new points and lines are going to appear inside the starting primitive.
Then, the tessellator unit inside the graphics processor applies the patch to the primitive. Finally, a domain shader is run, which calculates the positions of all the new vertices. This data can be fed back into the vertex buffer, if needed, so that the lighting calculations can be done again, but this time with better results.
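The hull shader, tessellator, and domain shader run on the GPU, but the basic idea of sub-division can be sketched on the CPU. The Python below uses simple midpoint sub-division, splitting each triangle into 4 smaller ones; this is a stand-in chosen for clarity and not the pattern a real tessellator unit produces.

```python
def midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))

def subdivide(triangle):
    """Split one triangle into four by joining the midpoints of its sides."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [
        (a, ab, ca),   # corner triangle at a
        (ab, b, bc),   # corner triangle at b
        (ca, bc, c),   # corner triangle at c
        (ab, bc, ca),  # centre triangle
    ]

def tessellate(triangles, levels):
    """Apply the sub-division repeatedly; each level multiplies the count by 4."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles

flat_triangle = ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0))
detailed = tessellate([flat_triangle], 3)
print(len(detailed))  # 64
```

All of the new vertices here still lie in the plane of the original triangle; it is the later step of repositioning them that turns the extra triangles into visible detail.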
So what does this look like? Let’s fire up the wireframe version of the tessellated scene:
Truth be told, we set the level of tessellation to a rather extreme level, to aid with the explanation of the process. As good as modern graphics chips are, it’s not something you’d want to do in every game — take the lamp post near the door, for example.
In the non-wireframed images, you’d be pushed to tell the difference at this distance, and you can see that this level of tessellation has piled on so many extra triangles, it’s hard to separate some of them. Used appropriately, though, this function of vertex processing can give rise to some fantastic visual effects, especially when trying to simulate soft body collisions.
She can’nae handle it, Captain!
Remember the point about vertex shaders and that they’re always run on every single vertex in the scene? It’s not hard to see how tessellation can make this a real problem. And there are lots of visual effects where you’d want to handle multiple versions of the same primitive, but without wanting to create lots of them at the start; hair, fur, grass, and exploding particles are all good examples of this.
Fortunately, there is another shader just for such things – the geometry shader. It’s a more restrictive version of the vertex shader, but can be applied to an entire primitive, and coupled with tessellation, gives programmers greater control over large groups of vertices.
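A common use of this idea is turning a single point, such as a particle, into a small square made of two triangles. Real geometry shaders are written in a shading language and run on the GPU; the Python below is only a CPU-side imitation of that pattern, with a hypothetical function name and made-up particle positions.

```python
def expand_point_to_quad(center, half_size):
    """Take one point primitive and emit two triangles forming a square."""
    x, y, z = center
    h = half_size
    bottom_left = (x - h, y - h, z)
    bottom_right = (x + h, y - h, z)
    top_left = (x - h, y + h, z)
    top_right = (x + h, y + h, z)
    return [
        (bottom_left, bottom_right, top_left),
        (top_left, bottom_right, top_right),
    ]

# Only the particle centres need to be stored up front.
particles = [(0.0, 0.0, 0.0), (3.0, 1.0, 0.0), (-2.0, 4.0, 1.0)]

triangles = []
for particle in particles:
    triangles.extend(expand_point_to_quad(particle, 0.5))

print(len(triangles))  # 6
```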
Direct3D, like all the modern graphics APIs, permits a vast array of calculations to be performed on vertices. The finalized data can either be sent onto the next stage in the rendering process (rasterization) or fed back into the memory pool, so that it can be processed again or read by the CPU for other purposes. This can be done as a data stream, as highlighted in Microsoft’s Direct3D documentation:
The stream output stage isn’t required, especially since it can only feed entire primitives (and not individual vertices) back through the rendering loop, but it’s useful for effects involving lots of particles everywhere. The same trick can be done using a changeable or dynamic vertex buffer, but it’s better to keep input buffers fixed as there is a performance hit if they need to be ‘opened up’ for changing.
Vertex processing is a critical part of rendering, as it sets out how the scene is arranged from the perspective of the camera. Modern games can use millions of triangles to create their worlds, and every single one of those vertices will have been transformed and lit in some way.
Handling all of this math and data might seem like a logistical nightmare, but graphics processors (GPUs) and APIs are designed with all of this in mind — picture a smoothly running factory, firing one item at a time through a sequence of manufacturing stages, and you’ll have a good sense of it.
Experienced 3D game rendering programmers have a thorough grounding in advanced math and physics; they use every trick and tool in the trade to optimize the operations, squashing the vertex processing stage down into just a few milliseconds of time. And that’s just the start of making a 3D frame — next there’s the rasterization stage, and then the hugely complex pixel and texture processing, before it gets anywhere near your monitor.
Now you’ve reached the end of this article, we hope you’ve gained a deeper insight into the journey of a vertex as it’s processed for a 3D frame. We didn’t cover everything (that would be an enormous article!) and we’re sure you’ll have plenty of questions about vectors, matrices, lights and primitives. Fire them our way in the comments section and we’ll do our best to answer them all.