Eidos-Montréal’s Lead Rendering Programmer goes into detail on how their engine renders a single frame in Deus Ex: Mankind Divided, along with how many steps are necessary to create the final image.
For more on the Making of Deus Ex: Mankind Divided, enjoy the following articles:
- How Eidos-Montréal brought the Deus Ex DNA to 2016
- Player’s Choice in a Deus Ex Game
- The Story Behind Eidos-Montréal’s Dawn Engine
- The Design Principles Behind Deus Ex
- How to Create a Believable Cyberpunk World
One of the greatest feelings in video game development is to witness the different visual elements coming to life, one layer at a time, to create a rich, entertaining scene within a couple of milliseconds. The underlying equations and mathematical models working together to simulate what we see with our own eyes, but reproduced inside a little metallic box for pure enjoyment, greatly nourish one’s curiosity and sense of wonder. On top of that, when used accordingly, the same rendering steps, put in the hands of artists, can convey a message and touch the millions of people who hunger for those final images.
Those are the different challenges, and excitement, I was looking for when I started my career and studies in rendering. Being on the mathematical side of the spectrum, I had to fully understand the different ingredients that went into the creation of those pretty images we see in the digital world. In this article, we embark on a very short journey … 33 milliseconds long to be precise. We would essentially like to provide you with an overview of a typical Deus Ex: Mankind Divided (DXMD) rendered frame. We will describe the different render passes, in chronological order from beginning to end, that went into creating the final image seen on screen, providing a description of each pass and its purpose. We will also add some inside information regarding the history of DXMD’s production, along with how it influenced what you ended up seeing on screen. Ready? Go!
At First, There Was Geometry
As we all know, the backbone of any image rendered in real-time is a lot of triangles. The efficient processing of those triangles is essential to reach our performance goals, without sacrificing the visual quality of our 3D models. The first part of our frame is used to compute different kinds of geometric information. While most of the game assets are static elements, some of them use different animation technologies (clothing, hair, empirical animation, skinning). Also, there’s a great deal of information that we need to acquire from the geometry itself to feed the lighting passes later down the pipeline. First, we launch a couple of compute shader jobs to process noisy animations and hair. The noisy animation system is a very lightweight animation tool, designed to simulate natural, noisy movements that we witness in nature. The basic idea behind it is to have a sum of sinusoidal waves with different frequencies, phases, and amplitudes to move vertices around, giving the appearance of natural oscillation. Of course, these parameters can be modulated with geometric data, like per-vertex weight, to mimic hooks and attach constraints. While very simplistic, our artists used this tool to imitate wind and other disturbances on fabric-like models.
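The sum-of-sines idea can be condensed into a few lines. This is only an illustrative sketch; the function name, wave parameters, and per-vertex weight convention below are assumptions, not the engine’s actual code:

```python
import math

def noisy_offset(t, weight, waves):
    """Sum-of-sines displacement for one vertex at time t.

    waves: list of (amplitude, frequency_hz, phase) tuples. The per-vertex
    weight scales the result to mimic hooks and attach constraints:
    weight 0 pins the vertex, weight 1 lets it swing freely.
    """
    s = sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in waves)
    return weight * s

# Three octaves of "wind" noise on a fabric vertex (made-up values).
waves = [(0.05, 0.8, 0.0), (0.02, 2.3, 1.1), (0.01, 5.0, 0.4)]
pinned = noisy_offset(1.5, 0.0, waves)   # attached corner: never moves
free = noisy_offset(1.5, 1.0, waves)     # free-hanging corner: full motion
```

A compute shader would evaluate this per vertex in parallel; the offset is simply added along a chosen displacement direction.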
Hair simulation is an in-house technology, based on AMD’s TressFX, which decouples the hair simulation from the actual rendered hair strands. While noise simulation for hair has been used before in video games, a real hair simulation algorithm provides much better results for our characters. Our engine uses a fat g-buffer, without a depth pre-pass, to find the visible surfaces that need to be shaded. We chose to decouple the processing of the aforementioned vertex jobs from our main geometry pass, mainly because we wanted more granularity in jobs, in effect making better use of the asynchronous nature of modern GPUs. Our traditional skinning of geometry still happens during the vertex shader stage of the g-buffer pass, but we’re exploring ways to reduce the vertex shader stage in favor of more compute shaders. Our g-buffer contains all the information required by our lighting model to shade the pixels: depth, normal, albedo, reflectance, roughness, etc. Our main objective when going from IO Interactive’s Glacier 2 to Eidos-Montréal’s Dawn Engine was to improve physically based shading. While we changed the lighting algorithm (more details on that later), we kept the g-buffer approach. Each object is rendered into multiple render targets, encoding the required data on the fly.
While most of the data can be precisely and directly stored in our render targets, midway through production we noticed a lot of artifacts in our lighting using a naïve normal encoding strategy, mainly because of numerical imprecision and the use of the wrong numerical range. To fix that, we switched to octahedral normal encoding. It’s a clever trick to maximize bit usage when storing unit vectors: essentially, we transform the vector into a different basis, mapping each normal to the faces of an octahedron instead of a sphere.
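As an illustration, a minimal (unquantized) octahedral encode/decode pair might look like the following; a real g-buffer path would additionally snap the two components to a fixed-point range:

```python
import math

def oct_encode(n):
    """Map a unit normal (x, y, z) onto the octahedron and unfold it
    into a 2D point in [-1, 1]^2, ready for quantization."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)      # project onto the octahedron
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0), \
               (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0)
    return x, y

def oct_decode(e):
    """Inverse mapping: recover a unit normal from the 2D encoding."""
    u, v = e
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = (1.0 - abs(v)) * (1.0 if u >= 0.0 else -1.0), \
               (1.0 - abs(u)) * (1.0 if v >= 0.0 else -1.0)
    l = math.sqrt(u * u + v * v + z * z)
    return u / l, v / l, z / l
```

The payoff is that the two stored components use their full bit range everywhere on the sphere, which is exactly what fixes the kind of precision artifacts described above.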
In opaque geometry rendering, we need to render front to back to populate the depth buffer and to reject occluded pixels at an early stage. Furthermore, an important portion of the screen is covered by opaque first person geometry, like the gun, so we render that part first, using the stencil test to reject geometry that is otherwise hidden behind it.
Our semi-transparent HUD prevents us from using the same strategy with user interface elements, but it’s something to consider for future production, depending on specific needs.
Finally, our g-buffer pass is used to blend and modulate the content of existing geometry, increasing visual complexity without raising the cost of geometric processing. Good examples of this are decals added on top of geometry, where either the diffuse albedo is changed, or the normal is changed to simulate variation in the underlying surface, all without paying the price of fully rendering it again (or tessellating it to the extent of the decal details). The last type of blending we implemented is something we call soft intersection. It is used when we want to blend very fine details, like granular materials (e.g. sand or snow), on top of opaque geometry, without using alpha blending. In real alpha blending, we would light both the ground material and the top layer, then blend the result. Instead, to achieve a similar effect, we blend the overlaying matter with the destination g-buffer content based on distance, and only light the resulting pixel in the lighting pass. The result is that, when the granular material is “far” from the main environment geometry, we use its full information for shading. However, as it gets closer, we gradually blend its shading info with the g-buffer material properties for final shading.
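The soft intersection blend described above amounts to a distance-driven lerp over g-buffer channels. A minimal sketch, with made-up channel names and a hypothetical fade range:

```python
def soft_intersect(top, gbuf, dist, fade_range):
    """Blend a granular material's g-buffer values with the opaque
    destination's, based on the distance to the underlying geometry:
    0 at contact (inherit the ground's shading info), 1 when far
    (use the granular material's own). Materials are dicts of
    channel -> value, standing in for the g-buffer channels."""
    t = min(max(dist / fade_range, 0.0), 1.0)
    return {k: gbuf[k] + (top[k] - gbuf[k]) * t for k in top}

# Hypothetical snow layer settling onto a ground material.
snow = {"albedo": 0.9, "roughness": 0.7}
ground = {"albedo": 0.3, "roughness": 0.5}
contact = soft_intersect(snow, ground, 0.0, 0.1)   # pure ground material
far = soft_intersect(snow, ground, 1.0, 0.1)       # pure snow material
```

Because the blend happens before lighting, the lighting pass shades a single, already-merged material per pixel instead of two lit layers.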
Before we move on to the actual lighting system, we must add three other layers of information that are closer to our production needs. First of all, we render the velocity of objects during the current frame into another buffer. The velocity is used for motion blur on our PC version, but is also used during our anti-aliasing pass to project the pixels of the current frame into the previous frame buffers. The last two pieces of information are driven by characters. To get an approximation of skin thickness, we render the skin portion a second time; this is used in subsurface scattering. Finally, all pixels covered by characters are specially flagged to be artificially lit by a dim aura, adding a subtle contrast between dark sections of the game and the characters themselves, mainly for gameplay reasons.
All these geometry passes could eat a huge portion of our 33 millisecond budget, but thanks to efficient visibility culling, we only spend precious clock cycles on visible geometry. In DXMD, visibility culling is performed by Umbra’s Occlusion Culling SDK. Taking the static environment as input, the software computes a data structure that can be queried at runtime to know which models are visible. Therefore, only the visible objects are sent to the GPU pipeline. Our runtime queries use the static scene, along with a combination of bounding volumes, for dynamic geometries. The dynamic models are sorted into a coarse spatial structure, and these high-level cells can be checked for visibility against the low-resolution occlusion buffer provided by Umbra.
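A cell-versus-occlusion-buffer query of this kind can be sketched as follows. The buffer layout and the conservative test are simplifications for illustration, not Umbra’s actual API:

```python
def cell_visible(occlusion_buffer, rect, nearest_depth):
    """Conservative visibility test of a coarse spatial cell against a
    low-resolution occlusion (depth) buffer. rect = (x0, y0, x1, y1) is
    the texel range the cell's bounds project to; nearest_depth is the
    closest depth the cell could have. The cell is visible if it could
    be in front of the stored occluder depth anywhere in that range."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if nearest_depth < occlusion_buffer[y][x]:
                return True   # cell may poke out in front of occluders
    return False              # fully behind stored depth: cull it
```

Objects inside a culled cell are never submitted to the GPU, which is where the budget savings come from.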
Let There Be Light
Now that we know all about the visible geometry in the scene, it’s time to add some lighting effects and shade the pixels. As previously mentioned, physically based rendering was very important to us in nailing the material tangibility required by our game. To achieve this, we implemented a collection of specialized lighting shaders (shading models), splitting incoming light between direct lighting and indirect lighting. In rendering, we often approximate lighting interaction with a linear combination of different phenomena. Energy emits from light sources, then it interacts with surfaces by being reflected, absorbed, or transmitted until we measure it using a sensor. In our engine, we categorized all light interactions as either direct or indirect lighting. The lighting pass accumulates all light that reaches the eye in an additive buffer, the stored value being loosely based on radiance, split into RGB channels. Direct lighting is the light that has exactly one interaction between emission and measurement (or zero interactions when the camera is looking straight at an emissive object). In DXMD, direct lighting for opaque objects is implemented with a tiled lighting approach, with specialized shaders being used for particles, skin, hair, and alpha blended objects.
The tiled lighting algorithm consists of creating 3D frustums of space, bounded by 8 by 8 pixel wide tiles on screen and the min/max depths of the tile in view space. Each frustum is a little bounding box of geometric content in the world, and we can perform intersection tests against the light sources in the game with them. Each tile collects all light sources that affect it in a gathering phase. Then, in the shading phase, we use all of the lighting information to shade the visible pixels within the tile. Since this pass is only used for opaque geometry, we use a simple shading model based on the Cook-Torrance BRDF, using material roughness and reflectance to compute the radiance reaching the eye. Skin is the only opaque exception to this rule, since it uses a more complex BRDF; it will be described later.
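A toy version of the gathering phase is shown below, with each tile frustum approximated as a view-space bounding box (an approximation chosen for brevity; real implementations typically test the frustum planes, and all names here are illustrative):

```python
def sphere_aabb_overlap(center, radius, box_min, box_max):
    # squared distance from the sphere center to the box, axis by axis
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return d2 <= radius * radius

def gather_tile_lights(tile_boxes, lights):
    """Gathering phase of tiled lighting: each tile (its 8x8 pixel
    footprint plus min/max depth, reduced here to a view-space AABB)
    collects the indices of all point lights whose bounding spheres
    touch it. lights is a list of (center, radius) pairs."""
    return [[i for i, (c, r) in enumerate(lights)
             if sphere_aabb_overlap(c, r, lo, hi)]
            for lo, hi in tile_boxes]
```

The shading phase then loops only over each tile’s short light list instead of every light in the scene.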
The data structure we build for opaque tiled lighting is used again to create two extra sets of light collections. The first one creates frustums in the same fashion, but splits them in the z direction with multiple planes. We end up with more frustums covering the full eye frustum in 3D, and can use them to light alpha blended objects. Each alpha blended material has an extra alpha component to simulate light transmission. Before we reflect visible light on an alpha blended object, we remove the proportion of light, (1 − alpha) percent, that passes through. The balance is used as incoming radiance for the same BRDF as opaque objects. Of course, light transmission is computed both ways, and light entering from behind and hitting the camera is approximated by the traditional alpha blending equation, with a back to front sort. The second data structure for light queries is a very low resolution grid of spherical harmonic probes, used to light particles and rain. The net effect is that, in a 3D grid covering the whole world, very dense particles can use a fast lookup to get an approximation of the incident light.
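The transmission bookkeeping for one alpha-blended surface can be written as a tiny compositing function (scalar radiance for brevity; the names and the single-reflectance BRDF stand-in are assumptions, not engine code):

```python
def composite_alpha_layer(alpha, incoming_radiance, brdf_reflectance, behind):
    """Two-way transmission for an alpha-blended surface: a (1 - alpha)
    share of the incident light passes through and is not reflected,
    while light arriving from behind is attenuated by the classic
    back-to-front over operator. RGB would work channel-wise."""
    reflected = alpha * incoming_radiance * brdf_reflectance
    transmitted = (1.0 - alpha) * behind
    return reflected + transmitted
```

At alpha = 1 the surface behaves like an opaque one; at alpha = 0 it simply passes the background radiance through.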
Skin and hair are given special treatment in the engine. During the opaque pass, skin shading simulates subsurface scattering by replacing the Lambertian term of the BRDF with a precomputed function that integrates shades of red on surfaces that are not pointing directly toward the light. Also, to further create attraction points for the player’s eyes, characters are artificially lit with specular highlights from non-existent lights. While this completely breaks energy conservation, it makes characters more interesting from a gameplay point of view. Comparatively, careful integration of hair had to be done in the scene because of its very fine nature. Hair polygons are rendered, but in order to be lit, only the first two hair strands per pixel are kept in a data structure. The assumption here is that the most important part of hair lighting happens between those two layers, and the rest can be approximated with a constant. Finally, the two shaded pieces of hair can be blended within a pixel to reduce aliasing. Just like skin, hair shading uses its own BRDF, augmented with an explicit anisotropic term.
When sampling a light source to compute lighting, we have to use a visibility term to prevent light that doesn’t reach the surface from contributing to its shading (shadowing). In DXMD, we decoupled light sources from shadow casters, allowing artists to bind many light sources to one shadow-casting transformation. While it distorts the shadow, it’s a very efficient optimization, simulating many shadow-casting lights with only one shadow map. To optimize further, we kept static shadow maps in a special cache and reused them to update the dynamic occluders, frame after frame, without recomputation. This allows us to easily integrate shadow map updates into our LOD system, preventing the dynamic update of shadows that are far away from the player. For indirect lighting, the occlusion term is usually approximated with some form of ambient occlusion, losing directional information (i.e. it’s integrated over the hemisphere above the surface). Our screen-space ambient occlusion (SSAO) algorithm removes some of the indirect lighting in the scene while keeping the direct light contribution intact. A common complaint from artists was the lack of contact between some objects and the surfaces supporting them, especially when lit by a non-shadow-casting light (for performance reasons). To reduce these artifacts, we created a directional occlusion that could be applied to direct lights, and used the SSAO on indirect light. In the end, we decided to combine the two occlusion passes and apply the result to all light sources, which helped to reduce GPU costs. Objects feel more grounded with this cheat, even though it removes energy where it shouldn’t.
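The combined-occlusion cheat boils down to one multiplier applied to everything. A schematic version, with illustrative names rather than the engine’s actual shading code:

```python
def shade_with_combined_occlusion(direct_lights, indirect, occlusion):
    """Apply a single combined occlusion term (directional occlusion
    merged with SSAO) to all light sources, direct and indirect alike.
    Each direct light carries its own shadow-map visibility in [0, 1];
    occlusion is the screen-space term in [0, 1]."""
    direct = sum(l["radiance"] * l["visibility"] for l in direct_lights)
    return occlusion * (direct + indirect)
```

Physically, the occlusion term should only darken indirect light; multiplying direct light too is exactly the energy-removing shortcut the paragraph describes.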
We categorize light as indirect when it has undergone more than one interaction before reaching the sensor (eye, camera). To simplify the lighting equations, we again split the indirect lighting into diffuse (or ambient) and specular lighting, mirroring the two parts of our main BRDF. Diffuse indirect lighting is baked into a collection of high-density spherical harmonic probes. Offline, we automatically populate the scene with thousands of cube maps, convolve them with a perfectly diffuse BRDF, and project them into spherical harmonic space. At runtime, each visible pixel queries the spatial data structure of probes, identifies the closest ones, and then interpolates its diffuse indirect lighting term using its normal as a parameter.
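At lookup time, probe interpolation and SH evaluation look roughly like this (first-order SH with scalar coefficients for brevity; shipped probes would use a higher order and RGB coefficients):

```python
def interpolate_probes(probes, weights):
    # blend the SH coefficient vectors of the nearest probes
    return [sum(w * p[i] for p, w in zip(probes, weights))
            for i in range(4)]

def eval_sh_l1(sh, normal):
    """Evaluate a first-order (4-coefficient) spherical-harmonic probe
    in the direction of the surface normal: one constant band plus
    three linear bands."""
    c0 = 0.282095   # Y_0^0 constant band
    c1 = 0.488603   # Y_1^{-1}, Y_1^0, Y_1^1 linear bands
    x, y, z = normal
    return sh[0] * c0 + sh[1] * c1 * y + sh[2] * c1 * z + sh[3] * c1 * x
```

A probe with only the constant band set returns the same value for every normal, which is why the linear (and higher) bands are what carry the directional variation.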
The specular highlights are collected from four different subsystems that, from a spatial point of view, are increasingly coherent. With a custom cube map applied on the material, very important objects can completely override the indirect specular reflections. Unfortunately, this breaks the dependency between the material’s final look and the lighting conditions, but in highly controlled scenes where the reflections are very important, it may be desirable.
If the material doesn’t override specular reflection, we rely on other algorithms to compute indirect specular reflections. Screen-space reflection (SSR) uses a clever ray-tracing algorithm in screen space to try and fetch the correct indirect reflected color. In our implementation of SSR, we shoot many rays in a cone, with a width dependent on the roughness of the material, and average their hit colors using the BRDF of the material. While this approach can lead to some noise (reduced by the temporal anti-aliasing), it has the advantage of reducing false intersections. In the event that SSR fails to provide a hit, we fall back on a cube map at the pixel location, fetched in the mirror direction of the view vector. Cube maps are placed in the scene by the artists, and different mip-map levels encode pre-filtered versions of the environment, corresponding to different roughness values. The final indirect specular light sources placed in the scene are impostor reflectors, forcing some reflections to appear independently of the scene content.
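The cube-map fallback reduces to a mirror reflection vector plus a roughness-driven mip selection. The linear roughness-to-mip mapping below is an assumption for illustration; engines typically use a perceptual curve instead:

```python
def reflect(v, n):
    """Mirror the view vector v about the unit surface normal n:
    r = 2 (n . v) n - v."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(2.0 * d * ni - vi for vi, ni in zip(v, n))

def cube_map_mip(roughness, mip_count):
    """Pre-filtered cube-map lookup: rougher materials sample blurrier
    mip levels, since each mip was convolved for a wider lobe."""
    return roughness * (mip_count - 1)
```

A mirror-smooth material (roughness 0) samples the sharp base level; a fully rough one samples the most blurred mip.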
The algorithms we implemented were a series of compromises between energy conservation and artistic visual goals. Until now, we always assumed that radiance was constant in space, and that no energy was lost or scattered between emitter and surfaces. An important aspect of DXMD’s art direction was light scattering within the environment. During production, we implemented many different algorithms to support single scattering of light (i.e. emitted light scattering only once toward the sensor in a participating medium). The early prototypes relied on analytical solutions, but were too constraining. Rather than computing the scattering itself, we also tried a shadow volume approach that extrudes light occlusions, simulating the “absence of scattering”. Results were interesting, but suffered from artifacts due to the tessellation of the extruded volumes. The final two approaches we tried were a raymarching approach in a pixel shader, in which the scattered light is integrated numerically, and a light diffusion inside a camera-aligned 3D grid, similar to light propagation volumes for real-time global illumination. The raymarching steps N times through the participating medium, adding the contribution of in-scattered light and removing absorbed light all along the ray. While its results were extremely precise, following the real single-scattering equations, the trade-off was to drop both this technique and some precision in favor of having more scattering lights. We ended up using the light diffusion algorithm, as its GPU costs were lower for a greater number of lights.
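For reference, the kind of pixel-shader raymarcher that was eventually dropped can be condensed to a loop like this (homogeneous medium and hypothetical parameter names; a real version would march in the shader and shadow-test each sample):

```python
import math

def raymarch_scatter(steps, length, sigma_s, sigma_a, in_scatter):
    """Numerically integrate single scattering along a view ray through
    a homogeneous medium: each step adds in-scattered light attenuated
    by the transmittance accumulated so far (Beer-Lambert extinction).
    in_scatter(t) returns the light scattered toward the eye at
    distance t along the ray."""
    dt = length / steps
    sigma_t = sigma_s + sigma_a   # extinction = scattering + absorption
    transmittance = 1.0
    radiance = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt        # sample at the midpoint of the step
        radiance += transmittance * sigma_s * in_scatter(t) * dt
        transmittance *= math.exp(-sigma_t * dt)
    return radiance, transmittance
```

The cost is the N steps per pixel, which is exactly what made the cheaper grid-diffusion approach win for scenes with many scattering lights.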
Let’s manipulate those Pixels
If we stopped the rendering of our frame here, we would show the player a radiance buffer in energy space, without any alteration. The Dawn Engine offers multiple tools to manipulate the final displayed color on screen, based on the energy computed during lighting. Some of these tools add fake effects on screen, others manipulate the frequencies of the final image, and the last category adds extra information for the player.
Before it can be displayed, the image has to go through a tone mapping operator, which serves the purpose of remapping the wide range of energy in the scene into a displayable range on regular monitors, or even HDR TVs on more recent hardware. This step converts the energy values into colors that can be manipulated with color LUTs. Also, depending on the energy values, we artificially add lens flares, glares, chromatic aberration, and other lens-based artifacts that augment the feeling of seeing the game through a camera. These manipulations increase immersion from a cinematographic point of view, but also influence the emotional response of the player when, based on the game state, contrast or tints are added to the scene (i.e. high intensity leads to more contrast or warmer tones).
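The article doesn’t name the engine’s actual tone mapping operator, so the classic Reinhard curve stands in here purely to show the shape of the remapping:

```python
def tonemap_reinhard(radiance):
    """Remap scene radiance in [0, inf) into the displayable [0, 1)
    range. Illustrative stand-in only: the Dawn Engine's real operator
    is not specified in the article."""
    return radiance / (1.0 + radiance)
```

Any operator with this general shape compresses the unbounded energy values gracefully: low radiance maps almost linearly, while very bright values asymptotically approach white instead of clipping.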
Because we don’t use hardware multisampling, the image needs proper in-house anti-aliasing. Our implementation is a temporal anti-aliasing solution: every pixel of the current frame is projected back into previous frames and then blended with a confidence weight, according to how much we’re blending spatially coherent data rather than wrong values. For example, wrong values could appear when fetching the color of a surface that wasn’t visible in previous frames. The heart of our temporal anti-aliasing resides in a good metric to decide how much to blend the current pixel color (which is always valid) with the previous frame’s pixel value (which could potentially be wrong). All this blending removes the very high frequencies from the frame to prevent undersampling and aliasing artifacts, but it can be very aggressive. To counterbalance it, we added a little sharpening filter to boost the high frequencies that are still visible in the anti-aliased frame.
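Schematically, the resolve and the counteracting sharpen look like the following; the 0.9 history weight and the 1D unsharp mask are illustrative choices, not the engine’s tuned values:

```python
def taa_resolve(current, history, confidence):
    """Blend the reprojected history pixel into the current one.
    confidence in [0, 1] measures how much we trust the history sample
    (0 when reprojection fetched a surface that wasn't visible last
    frame, so the always-valid current pixel wins outright)."""
    blend = 0.9 * confidence
    return current * (1.0 - blend) + history * blend

def sharpen_1d(signal, amount=0.5):
    """Unsharp mask on a 1D row of pixels: boost whatever a small box
    blur removes, restoring high frequencies the TAA blend softened.
    Border samples are left untouched for simplicity."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        blurred = (signal[i - 1] + signal[i] + signal[i + 1]) / 3.0
        out[i] = signal[i] + amount * (signal[i] - blurred)
    return out
```

Zero confidence degenerates to no history at all, which is the correct behavior for disoccluded pixels.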
The final image manipulation we add on top is solely used for gameplay purposes. Some effects are needed to guide the player through the many options that DXMD offers. For example, depending on the augmentations the player unlocks during their playthrough, we add UI elements on screen, such as outlines around objects that can be interacted with (e.g. ammo, cabinets, etc.). We even transform the colors of objects, based on their types, to clearly show where the threats are on screen, or where the weak points of enemies are located.
There we go! It was a long short journey from simple polygon descriptions to final pixel colors on screen. Remember that most of the choices were guided by artistic decisions and performance constraints, all in service of delivering the best possible experience to our fans within the 33 millisecond rendering budget. To keep creating the great images our players expect to see, we continue to upgrade our rendering technology for use in our future projects. Of course, when we play and enjoy the game, we don’t think about and analyze all these details in real time; one can simply sit back, enjoy the final result, and play the game, as that, in the end, is what matters most. However, we hope you enjoyed the ride with us as we went down the GPU pipeline of one frame of Deus Ex: Mankind Divided.
About the author:
Jean-François is Lead Rendering Programmer at Eidos-Montréal.
Jean-François is passionate about 3D rendering and simulation, and has a strong background in computer graphics, including rendering, modeling, and image processing/post-processing effects for movies and video games. Before joining Eidos-Montréal, he was active in industrial and academic R&D, developing techniques for offline rendering. He joined the team at Eidos-Montréal in 2011 as a Senior Graphics Programmer and has been Lead Rendering Programmer since 2012.