Posts Tagged ‘Tile based’

A new era of forward shading is coming?

2012年01月30日 1 comment

A new era of forward shading is coming?!  –A glance at forward shading for massive lights with computer shader light culling from HD7970 demo ‘Leo’.

The past years is dominated by Deferred shading. It seems everyone likes lights, massive lights. From STALKER to KillZone and StarcraftII. Crytek said they are “a bit more deferred” in CE3, even Unreal is doing deferred shading in their ‘Samaritan’ demo. DICE the exploreres, did a good job in doing a ‘Tile based’ deferred shading with Computer shaders in their Frostbite 2 engine. However, they didn’t realize that this could cause an end of the era of Defered shading.

In the recent released demo for their new HD 7970 card, AMD managed to do a forward shading with massive lights, with compute shader. After downloading their demo and tried with a existing piece of code from Intel, I seem to get the point, so lets share the idea. For those guys didn’t quite familiar with computer shader tile-based deferred shading, I’d suggest you look at Intel’s article first. My test is also based on their code too.

My implementation is quite similar to DICE’s tile based deferred shading. However, instead of writing g-buffer in the geometry pass, I just disabled color writing and did a Z-only pass. Then, run the tile-based compute shader, this shader do such steps:

1. Sample Z buffer,

2. Calculate the min-Z and max-Z for each 16*16 tile (a thread group) with interlocked operations, store them in group-shared memory. ( However, you could just use near and far clip instead of above steps, to do a pure ‘forward’ way ).

3. For each light, calculate if the light intersect with current frustum for the tile. (this is parallel on each thread in the group, so if you have 1024 lights, each thread only do 4 intersection calculations).

(So far it is the same with original tile-based deferred shading).

4. Interlocked increase the group-shared light-counter, write the light index to LightIndexUAV ( Unordered access view ), at the location calulated from light-counter. The LightIndexUAV is the same size as viewport (probably need an edge extension), with format R16_UINT, so there are 16*16 texels for us to store the light index for each tile.

Then, unbind the UAV and set as SRV, run forward shading for each renderable object. This time we continuously sample from the LightIndexSRV for the tile the current shading pixel lays in, untill we arrive the clear color ( e.g. 0xffff in my case), for each sampled LightIndex, sample the actual light data buffer with this value, and accumulate the lighting as normal forward shading.

At last we get a shaded scene: (there’s no difference right? but notice the ‘No cull forward’)

Performance: ( in fps, Power plant , camera at init position, 720p, 1024 lights, on radeon hd6870 )

MSAA           off         2x           4x           8x

Forward      161        139         122          91

Deferred     262       159         123          69

One thing that makes me unhappy is the performance is not beating deferred shading when using no or low MSAA. We still need to optimize the way. However, it do give us a posibility to use native MSAA and more complex shading models, right?

Code is attached here if you want to try or help me with optimization. (Plz rename it to zip). You need to download intel’s demo and replace these files. Media files are just to big to upload here, Sorry.

Btw, This is not to involve some war between deferred and forward. Myself is a fan of deferred shading, LOL.

At last, is the ‘Leo’ Demo making fun of somebody?