[Spectating a Tech Rant] Deferred Shading Shines. Deferred Lighting? Not So Much. (reposted)

Reposted from http://gameangst.com/?p=141

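Editor's note: The author explains the concepts of deferred shading and deferred lighting in detail and compares their differences, strengths, and weaknesses, taking direct aim at the recently much-hyped CryEngine3. Drawing on his own experience on the Xbox 360 and PS3, he examines deferred shading and deferred lighting in depth in terms of render target memory usage, batch count, bandwidth consumption, and flexibility. His views are offered here for spectators only. (OK, really he just wants to rant about the PS3.)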

Keywords: Deferred shading, Deferred Lighting, Xbox 360, PS3, CryEngine3, EDRAM, RenderTargets, Phong Lighting

As I indicate in the subtitle of
this blog, there is no single way to develop games.  The techniques
used in game development are as many and as varied as the games
themselves; what’s best for one game is not necessarily best for
another.  The phrase YMMV (your mileage may vary) is pretty much a
staple of game technology discussions.  On the other hand, few teams
have the money or stamina to try every technology on every game, so I
hope people won’t hold it against me when I choose to take sides.

I’ve noticed an increase recently in game developers promoting a technique called deferred lighting.  Unfortunately this technique is old enough that not everyone remembers it by that name.  Wolfgang Engel reintroduced it in ShaderX7 under the name light pre-pass rendering,
and for many that name seems to be sticking.  The most recent advocate
of deferred lighting is Crytek.  Martin Mittring divulged in a presentation at the Triangle Games Conference that Crytek will be utilizing deferred lighting in version 3 of CryENGINE.

Now I get to tell you why that’s a bad idea.

Deferred lighting is similar to a better-known technique called deferred shading.
 In deferred shading all the attributes necessary to completely shade a
3-D scene are rendered into off-screen textures called a G-Buffer.  The
G-Buffer for a scene contains, per pixel, things like the surface
normal, material albedos, and Phong specular exponent.  Shading can then
be done in screen-space per light by reading back the necessary data
from the G-Buffer.  This has the distinct advantage of decoupling the
geometry processing in the scene from the lighting and shading
calculations.  It is generally assumed that one can construct the
G-Buffer in a single pass over the scene’s geometry and that one can
constrain the light rendering in such a way that no more pixels are
processed for a given light than are actually affected by the light.
 From an algorithmic complexity standpoint this sounds great.  Meshes
are rendered only once and no extraneous lighting or shadowing
calculations are performed.  There is a drawback however.  The G-Buffer
can be quite heavyweight, containing all those shading attributes, and
consequently deferred shading consumes a lot of memory and bandwidth in
constructing and reading back the G-Buffer.  Deferred lighting attempts
to address that problem.
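
As a rough illustration of the data flow, here is a minimal CPU-side sketch of deferred shading in Python. It is not how a GPU renderer is structured; it assumes a hypothetical G-Buffer texel that stores position directly (a real renderer reconstructs it from depth), point lights with a hard cutoff radius standing in for light-volume culling, and plain Phong reflection. GBufferTexel, PointLight, shade_pixel, and deferred_shading are illustrative names only.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


@dataclass
class GBufferTexel:
    """Everything needed to shade one pixel, written once by the geometry pass."""
    position: Vec3          # assumption: stored directly instead of depth
    normal: Vec3            # unit-length surface normal
    spec_exponent: float    # Phong specular exponent
    diffuse_albedo: Vec3
    specular_albedo: Vec3


@dataclass
class PointLight:
    position: Vec3
    color: Vec3
    radius: float           # hard cutoff standing in for light-volume culling


def shade_pixel(g: GBufferTexel, lights: list[PointLight], eye: Vec3) -> Vec3:
    """Deferred pass: read one G-Buffer texel and add each covering light's radiance."""
    radiance = [0.0, 0.0, 0.0]
    view = normalize(sub(eye, g.position))
    for light in lights:
        to_light = sub(light.position, g.position)
        if math.sqrt(dot(to_light, to_light)) > light.radius:
            continue  # the light does not cover this pixel, so no work is done
        light_dir = normalize(to_light)
        n_dot_l = dot(g.normal, light_dir)
        if n_dot_l <= 0.0:
            continue  # surface faces away from the light
        # Phong: reflect the light direction about the normal, compare with the view.
        r = tuple(2.0 * n_dot_l * g.normal[i] - light_dir[i] for i in range(3))
        spec = max(0.0, dot(r, view)) ** g.spec_exponent
        for i in range(3):
            # Albedos come straight from the G-Buffer, so the output is final radiance.
            radiance[i] += light.color[i] * (
                g.diffuse_albedo[i] * n_dot_l + g.specular_albedo[i] * spec
            )
    return (radiance[0], radiance[1], radiance[2])


def deferred_shading(gbuffer: list[GBufferTexel], lights: list[PointLight],
                     eye: Vec3) -> list[Vec3]:
    # Meshes were rasterized into `gbuffer` once; lighting never touches them again.
    return [shade_pixel(g, lights, eye) for g in gbuffer]


if __name__ == "__main__":
    texel = GBufferTexel((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 16.0,
                         (0.5, 0.5, 0.5), (0.25, 0.25, 0.25))
    near = PointLight((0.0, 0.0, 2.0), (1.0, 1.0, 1.0), radius=10.0)
    print(deferred_shading([texel], [near], eye=(0.0, 0.0, 5.0)))  # [(0.75, 0.75, 0.75)]
```

The property worth noticing is that deferred_shading never sees mesh data: the cost of each light depends only on how many G-Buffer texels it covers.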

In deferred lighting only the lighting, not the shading,
computations are deferred.  In the initial pass over the scene geometry
only the attributes necessary to compute per-pixel lighting (irradiance)
are written to the G-Buffer.  The screen-space, “deferred” pass then
outputs only diffuse and specular lighting data, so a second pass must
be made over the scene to read back the lighting data and output the
final per-pixel shading (radiant exitance).  The apparent advantage of
deferred lighting is a dramatic reduction in the size of the G-Buffer.
 The obvious cost, of course, is the need to render the scene meshes
twice instead of once.  An additional cost is that the deferred pass in
deferred lighting must output diffuse and specular irradiance
separately, whereas the deferred pass in deferred shading need only
output a single combined radiance value.
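
Why the deferred pass in deferred lighting needs two outputs can be seen from the Phong sum itself. In the notation below (introduced here for illustration, not taken from the original), ρ_d and ρ_s are the per-pixel diffuse and specular albedos, c_i is the color of light i, n, l_i, r_i and v are the normal, light, reflection and view vectors, p is the specular exponent, and ⊙ is per-channel multiplication:

```latex
% Deferred shading: albedos are in the G-Buffer, so each light adds final radiance.
L_o = \sum_{i} \left( \rho_d \odot E_{d,i} + \rho_s \odot E_{s,i} \right),
\qquad
E_{d,i} = c_i \max(0,\ \mathbf{n}\cdot\mathbf{l}_i),
\qquad
E_{s,i} = c_i \max(0,\ \mathbf{r}_i\cdot\mathbf{v})^{p}

% Deferred lighting: the albedos do not depend on i, so they factor out of the sums,
% but they are unknown during the deferred pass, so both sums must be stored.
L_o = \rho_d \odot \underbrace{\sum_{i} E_{d,i}}_{E_d}
    + \rho_s \odot \underbrace{\sum_{i} E_{s,i}}_{E_s}
```

Both forms give the same L_o; the difference is only in what has to be written to memory between passes: one accumulated value for deferred shading, two (E_d and E_s) for deferred lighting.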

Five years ago, when I was designing the renderer for the Despair Engine,
I thought deferred lighting was the ideal choice.  Details on the
Playstation 3 were sketchy at that time, but we already knew that
render target memory on the Xbox 360 would be severely limited.  The
G-Buffer for a deferred shading system wouldn’t fit in EDRAM and
consequently it would have to be rendered in two tiles.  With deferred
shading on the Xbox 360 requiring two passes over the scene meshes, the
primary disadvantage of deferred lighting appeared nullified.

Despair Engine utilized deferred lighting for over two years, and we
were generally very happy with the results.  It was implemented
initially on the Xbox 360 and PC, but when the Playstation 3 was
released it was extended to that platform as well.  Unfortunately our
initial implementation on the Playstation 3 yielded significantly worse
performance than we were seeing on the Xbox 360.  We had multiple
projects well into development at that point, however, so scaling back
our expectations on the content side wasn’t a viable option.  Instead
the performance deficit on the Playstation 3 motivated our very
talented PS3 programmer, Chris McCue, to look for alternate solutions.
 From extensive profiling he identified two bottlenecks unique to the
Playstation 3.  First, the PS3 struggled far more with vertex
processing costs and consequently both the attributes and shading
stages of deferred lighting were more frequently vertex bound on the
PS3 than on the other platforms.  Second, the PS3 was sometimes ROP
bound during the deferred lighting pass itself, a problem that is all
but impossible on the Xbox 360 due to the massive bandwidth to EDRAM.

Based on this data, Chris proposed to switch to classical deferred
shading on the Playstation 3.  Deferred shading would reduce the number
of geometry passes from two to one and reduce the output bandwidth
during the deferred pass.  I agreed, and sure enough the move to
deferred shading was a success.  It helped narrow the gap between the
Playstation 3 and the Xbox 360 to the point where we could ship the
same content on both platforms and provide nearly identical play
experiences on each.

The move to deferred shading on the PS3 prompted me to take a closer
look at my decision to use deferred lighting on the other platforms.
 If deferred shading was a win on the PS3, it seemed likely to have
some advantages on the PC and maybe even the Xbox 360.  Although I’ve
never been a proponent of settling for the least-common-denominator in
cross-platform development, if we could move all platforms to the same
deferred process without sacrificing performance, I knew it would save
us some headaches in maintaining platform compatibility later on.

I implemented deferred shading on the Xbox 360 and PC a few months
later and profiled the results.  On the Xbox 360, much to my surprise,
deferred shading performed within a few percent of deferred lighting.
I could literally toggle back and forth between the two techniques and
barely notice the difference in GPU utilization.  Deferred lighting was
a few percent faster in that initial implementation, but considering
that we’d been optimizing the deferred lighting pipeline for years, I
wasn’t about to quibble over less than a millisecond of GPU time.
 Doing head-to-head comparisons on the PC is a little more difficult
because of the wide range of PC graphics hardware, but on the high-end
DX9 cards and the low-end DX10 cards that I had access to at the time,
the difference in rendering performance between the two techniques on
the PC was similarly small.  More importantly, on the PC we suffered
far more from CPU-side batch overhead and deferred shading handily cut
that cost in half.

Having lived with deferred shading for a couple years now, I’ve come
to appreciate the many ways in which it is superior to deferred
lighting.  Although deferred lighting sounds great in theory, it can’t
quite deliver in practice.  It does, in my experience, offer marginal
GPU performance advantages on some hardware, but it does so at the
expense of a lot of CPU performance and some noteworthy feature
flexibility.  To understand this, consider the implementation of a
traditional Phong lighting pipeline under deferred shading and deferred lighting.

Deferred shading consists of two stages, the “attributes stage” and the “deferred stage.”

  • The attributes stage:
    • Reads material color textures
    • Reads material normal maps
    • Writes depth to a D24S8 target
    • Writes surface normal and specular exponent to an A8R8G8B8 target
    • Writes diffuse albedo to an X8R8G8B8 target
    • Writes specular albedo to an X8R8G8B8 target
    • Writes emissive to an X8R8G8B8 target
  • The deferred stage:
    • Reads depth, surface normal, specular exponent, diffuse albedo, and specular albedo
    • Blends exit radiance additively into an X16R16G16B16 target

Deferred lighting, on the other hand, consists of three stages: the
“attributes stage”, the “deferred stage,” and the “shading stage.”

  • The attributes stage:
    • Reads material normal maps
    • Writes depth to a D24S8 target
    • Writes surface normal and specular exponent to an A8R8G8B8 target
  • The deferred stage:
    • Reads depth, surface normal, and specular exponent
    • Blends specular irradiance additively into an X16R16G16B16 target
    • Blends diffuse irradiance additively into an X16R16G16B16 target
  • The shading stage:
    • Reads material color textures
    • Reads diffuse and specular irradiance
    • Writes exit radiance into an X16R16G16B16 target
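
For comparison, the deferred-lighting stages listed above can be sketched in the same minimal CPU-side style. As before, this is an illustration under assumptions, not a renderer: LightingTexel, PointLight, accumulate_irradiance, and shade are hypothetical names, position stands in for depth, lights have a hard cutoff radius, and the lighting model is plain Phong. Note that the deferred stage keeps two accumulators because the albedos are not known until the shading stage.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


@dataclass
class LightingTexel:
    """Attributes stage output: only what the irradiance calculation needs."""
    position: Vec3        # assumption: stands in for the D24S8 depth value
    normal: Vec3          # unit-length surface normal
    spec_exponent: float


@dataclass
class PointLight:
    position: Vec3
    color: Vec3
    radius: float         # hard cutoff standing in for light-volume culling


def accumulate_irradiance(t: LightingTexel, lights: list[PointLight],
                          eye: Vec3) -> tuple[Vec3, Vec3]:
    """Deferred stage: blend diffuse and specular irradiance into two separate targets."""
    diffuse = [0.0, 0.0, 0.0]
    specular = [0.0, 0.0, 0.0]
    view = normalize(sub(eye, t.position))
    for light in lights:
        to_light = sub(light.position, t.position)
        if math.sqrt(dot(to_light, to_light)) > light.radius:
            continue
        light_dir = normalize(to_light)
        n_dot_l = dot(t.normal, light_dir)
        if n_dot_l <= 0.0:
            continue
        r = tuple(2.0 * n_dot_l * t.normal[i] - light_dir[i] for i in range(3))
        spec = max(0.0, dot(r, view)) ** t.spec_exponent
        for i in range(3):
            diffuse[i] += light.color[i] * n_dot_l   # no albedo available yet
            specular[i] += light.color[i] * spec
    return (tuple(diffuse), tuple(specular))


def shade(diffuse_albedo: Vec3, specular_albedo: Vec3,
          diffuse_irr: Vec3, specular_irr: Vec3) -> Vec3:
    """Shading stage: the second mesh pass reads material colors and both irradiance targets."""
    return tuple(diffuse_albedo[i] * diffuse_irr[i] + specular_albedo[i] * specular_irr[i]
                 for i in range(3))


if __name__ == "__main__":
    texel = LightingTexel((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 16.0)
    near = PointLight((0.0, 0.0, 2.0), (1.0, 1.0, 1.0), radius=10.0)
    d, s = accumulate_irradiance(texel, [near], eye=(0.0, 0.0, 5.0))
    print(shade((0.5, 0.5, 0.5), (0.25, 0.25, 0.25), d, s))  # (0.75, 0.75, 0.75)
```

The shading function is where the second pass over the scene meshes would happen in a real engine, which is the source of the extra batches discussed later.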

First let’s consider the memory requirements of the two techniques.
 Deferred shading uses a G-Buffer that is 20 bytes per pixel and a
radiance target that is 8 bytes per pixel for a total of 28 bytes per
pixel.  Deferred lighting requires only 8 bytes per pixel for the
G-Buffer and 8 bytes per pixel for the radiance target, but it also
requires 16 bytes per pixel for two irradiance targets, for a total of 32
bytes per pixel.  So in this configuration deferred lighting actually
requires 4 bytes more memory per pixel.  I am assuming that both approaches are using appropriate
bit-depth targets for high dynamic range rendering with tone
reproduction handled as a post-processing step.  If you assume LDR
rendering instead, I would argue that deferred lighting still requires
deeper than 8-bit targets for irradiance, because the range of values
for irradiance in a scene is typically far greater than the range of
values for exit radiance.  In any case, there are a few variations on
the layout described above and a number of options for overlapping or
reusing targets on the more flexible console architectures that reduce
the per-pixel costs of each technique to an equivalent 20-24 bytes per pixel.
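
The per-pixel totals can be checked with a few lines of Python. This sketch assumes 4 bytes for each 32-bit format and 8 bytes for X16R16G16B16, as in the layouts above, and applies no target reuse; the dictionary keys and the 1280x720 example resolution are illustrative choices, not part of the original.

```python
# Bytes per pixel for the render-target formats named in the layouts above.
FORMAT_BYTES = {"D24S8": 4, "A8R8G8B8": 4, "X8R8G8B8": 4, "X16R16G16B16": 8}

DEFERRED_SHADING = {
    "depth": "D24S8",
    "normal_and_spec_exponent": "A8R8G8B8",
    "diffuse_albedo": "X8R8G8B8",
    "specular_albedo": "X8R8G8B8",
    "emissive": "X8R8G8B8",
    "exit_radiance": "X16R16G16B16",
}

DEFERRED_LIGHTING = {
    "depth": "D24S8",
    "normal_and_spec_exponent": "A8R8G8B8",
    "diffuse_irradiance": "X16R16G16B16",
    "specular_irradiance": "X16R16G16B16",
    "exit_radiance": "X16R16G16B16",
}


def bytes_per_pixel(targets: dict[str, str]) -> int:
    """Sum the size of every render target in a layout, with no target reuse."""
    return sum(FORMAT_BYTES[fmt] for fmt in targets.values())


def layout_mib(targets: dict[str, str], width: int, height: int) -> float:
    """Total render-target memory for a full-screen layout, in MiB."""
    return bytes_per_pixel(targets) * width * height / (1024 * 1024)


if __name__ == "__main__":
    print(bytes_per_pixel(DEFERRED_SHADING))          # 28
    print(bytes_per_pixel(DEFERRED_LIGHTING))         # 32
    print(layout_mib(DEFERRED_SHADING, 1280, 720))    # 24.609375
    print(layout_mib(DEFERRED_LIGHTING, 1280, 720))   # 28.125
```

Overlapping or reusing targets, as described above, would lower both figures; the sketch only reproduces the naive totals.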

Now let’s take a look at bandwidth usage.  The bandwidth required
for “material color textures” and “material normal maps” is content
dependent, but it is also exactly the same between the two techniques
so I can conveniently factor it out of my calculations.  Looking at the
layout described above, bandwidth consumed during the attributes and
shading stages is measured per pixel and bandwidth consumed during the
deferred stages is measured per lit pixel.  Adding everything up except
the material color textures and normal maps, we see deferred shading
writes 20 bytes per pixel plus an additional 8 bytes per lit pixel and
reads 24 bytes per lit pixel.  Deferred lighting, however, writes 16
bytes per pixel plus an additional 16 bytes per lit pixel and reads 16
bytes per pixel plus an additional 24 bytes per lit pixel.  What this
means is that if the average number of lights affecting a pixel is
greater than 0.5, deferred lighting consumes more write bandwidth than
deferred shading.  Furthermore, no matter how many lights affect each
pixel, deferred shading consumes 16 fewer bytes of read bandwidth per pixel.
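
The bandwidth accounting can be written as two small functions of the average number of lights per pixel. This sketch reproduces the totals given above; the split of each per-light read into G-Buffer fetches plus an additive-blend read-back of the destination target is an assumption about where those totals come from, and material texture reads are excluded as in the text.

```python
def deferred_shading_bytes(lights: float) -> tuple[float, float]:
    """Return (written, read) bytes per pixel for the deferred shading layout."""
    # Attributes stage writes the 20-byte G-Buffer once;
    # each light blends into the 8-byte radiance target.
    written = 20 + 8 * lights
    # Assumption: each light reads 16 bytes of G-Buffer plus the 8-byte blend destination.
    read = (16 + 8) * lights
    return written, read


def deferred_lighting_bytes(lights: float) -> tuple[float, float]:
    """Return (written, read) bytes per pixel for the deferred lighting layout."""
    # Attributes stage writes 8 bytes and the shading stage writes 8 bytes of radiance;
    # each light blends into two 8-byte irradiance targets.
    written = 8 + 8 + 16 * lights
    # Shading stage reads both irradiance targets (16 bytes). Assumption: each light
    # reads 8 bytes of G-Buffer plus the two 8-byte blend destinations.
    read = 16 + (8 + 16) * lights
    return written, read


if __name__ == "__main__":
    for n in (0.25, 0.5, 1, 2, 4):
        ds_w, ds_r = deferred_shading_bytes(n)
        dl_w, dl_r = deferred_lighting_bytes(n)
        print(f"{n} lights/pixel: writes {ds_w} vs {dl_w}, reads {ds_r} vs {dl_r}")
```

At 0.5 lights per pixel both techniques write 24 bytes, and the read gap stays at 16 bytes for every light count, matching the statements above.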

The last thing to consider when comparing the two techniques is
feature flexibility.  So far I’ve looked at how traditional Phong
lighting might be implemented using the rival deferred techniques.
 Proponents of deferred lighting will sometimes argue that handling
only the irradiance calculation in screen-space affords more
flexibility in the choice of lighting models.  Once the diffuse and
specular irradiance buffers have been constructed, each material is
free to use them however it sees fit.  Unfortunately there isn’t as
much freedom in that as one would like.  Most of the interesting
variations in lighting occur in the irradiance calculation, not in the
exit radiance calculation.  Anisotropic lighting, light transmission,
and subsurface scattering all require additional attributes in the
G-Buffer.  They can’t simply be achieved by custom processing in the
shading stage.  When you consider the cost of adding additional
attributes to each technique, the advantages of deferred shading really
come to light.  The 8-byte G-Buffer layout for deferred lighting is
completely full.  There is no room for an ambient occlusion or
transmissive term without adding an additional render target at the
cost of at least 4 bytes per pixel.  The deferred shading layout I’m
using for this comparison, however, has unused channels in both the
diffuse and specular albedo targets that can be read and written
without adding anything to the space and bandwidth calculations above.
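
To illustrate what a spare channel buys, the sketch below packs a hypothetical ambient occlusion term into the unused X byte of an X8R8G8B8 diffuse albedo texel. The byte order (X in the top byte, then R, G, B) and the 8-bit unsigned-normalized quantization are assumptions for the example; the point is that the texel stays 32 bits either way.

```python
def quantize_unorm8(value: float) -> int:
    """Map a value in [0, 1] to an 8-bit unsigned normalized integer."""
    return max(0, min(255, round(value * 255)))


def pack_x8r8g8b8(rgb: tuple[float, float, float], spare: float = 0.0) -> int:
    """Pack an RGB albedo plus one extra scalar into a single 32-bit texel."""
    r, g, b = (quantize_unorm8(c) for c in rgb)
    return (quantize_unorm8(spare) << 24) | (r << 16) | (g << 8) | b


def unpack_x8r8g8b8(texel: int) -> tuple[tuple[float, float, float], float]:
    """Recover the RGB albedo and the extra scalar from a packed texel."""
    r = (texel >> 16) & 0xFF
    g = (texel >> 8) & 0xFF
    b = texel & 0xFF
    spare = (texel >> 24) & 0xFF
    return (r / 255, g / 255, b / 255), spare / 255


if __name__ == "__main__":
    # Diffuse albedo plus a hypothetical ambient occlusion term in the spare byte.
    texel = pack_x8r8g8b8((1.0, 0.5, 0.0), spare=0.25)
    print(hex(texel))  # 0x40ff8000
    print(unpack_x8r8g8b8(texel))
```

Reading the extra term back costs nothing beyond the fetch the deferred stage already performs for the albedo.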

To be fair, there is one important detail I should mention.  Most
proponents of deferred lighting recognize the excessive cost in
generating separate diffuse and specular irradiance buffers and
consequently adopt a compromise to the Phong lighting model.  They
assume that specular irradiance is either monochromatic or a scalar
factor of diffuse irradiance, and consequently it can be stored in the
alpha channel of the diffuse irradiance target instead of requiring a
full target of its own.  This configuration dramatically improves the
results calculated above.  Again in the interests of fairness, when
evaluating this form of deferred lighting, a similar compromise should
be made for deferred shading.  The specular albedo can be considered
monochromatic or a scalar factor of diffuse albedo (or both with
sufficient packing).  With these modifications to both techniques
deferred lighting does, indeed, have an advantage.  Deferred lighting
will now require as little as 16 bytes of memory per pixel on some
platforms whereas deferred shading will require 20.  Deferred lighting
also ends up having equal write bandwidth requirements to deferred
shading and lower read bandwidth requirements as long as the average
number of lights per pixel is greater than 2.
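
The compromise variants can be run through the same kind of accounting. This is a reconstruction rather than figures stated in the article: it assumes deferred shading drops the specular albedo target (a 16-byte G-Buffer, with 12 bytes of it read per light), deferred lighting keeps a single 8-byte irradiance target with specular in alpha, and additive blending reads back its destination. Under those assumptions it reproduces the stated results: equal writes, and lower reads for deferred lighting above two lights per pixel.

```python
def compromise_deferred_shading(lights: float) -> tuple[float, float]:
    """(written, read) bytes per pixel with specular albedo folded into diffuse albedo."""
    written = 16 + 8 * lights      # depth, normal, albedo, emissive; radiance blend per light
    read = (12 + 8) * lights       # depth, normal, albedo; blend read-back per light
    return written, read


def compromise_deferred_lighting(lights: float) -> tuple[float, float]:
    """(written, read) bytes per pixel with specular irradiance stored in diffuse alpha."""
    written = 8 + 8 + 8 * lights   # attributes, final radiance; one irradiance blend per light
    read = 8 + (8 + 8) * lights    # shading reads irradiance; per light G-Buffer + read-back
    return written, read


def read_crossover() -> float:
    """Lights per pixel at which both compromise variants read the same number of bytes."""
    # 20n == 8 + 16n  ->  n = 8 / (20 - 16)
    return 8 / (20 - 16)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, compromise_deferred_shading(n), compromise_deferred_lighting(n))
    print(read_crossover())  # 2.0
```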

Nevertheless, the differences are never huge, and ultimately there
are a number of subtleties regarding how the bandwidth is distributed
across the various stages and whether the stages are typically
bandwidth bound that further muddy the waters.  The most damning
evidence against deferred lighting remains that in a direct comparison
across the content of two games and three platforms it only provided at
best a few percent GPU performance advantage over deferred shading at
the cost of nearly doubling the CPU-side batch count.  If further
evidence is needed, consider that Killzone 2 experimented with deferred
lighting early on in its development and also ultimately settled on a
classical deferred shading architecture.

So as I said at the start, YMMV, but I for one don’t expect to be returning to deferred lighting anytime soon.
