nVidia 8800 Ultra Bottleneck Investigation
Conclusion
The biggest performance difference clearly comes from the core clock: in some games, performance scales almost 1:1 with it. I expected the shader clock to make the biggest difference, but that clearly isn’t the case with the 8800 Ultra.
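To make "almost 1:1" concrete, here is a minimal sketch of how a scaling ratio could be computed from two benchmark runs. The function name and the frame rates are hypothetical, chosen purely for illustration; they are not measurements from this investigation. Only the 612 MHz stock core clock comes from the article.

```python
def scaling_ratio(base_clock_mhz, new_clock_mhz, base_fps, new_fps):
    """Return the fraction of a clock change that shows up as a frame-rate change.

    A result near 1.0 means performance tracks the clock almost 1:1;
    a result near 0.0 means that clock is not the bottleneck.
    """
    if new_clock_mhz == base_clock_mhz:
        raise ValueError("the two runs must use different clocks")
    clock_delta = (new_clock_mhz - base_clock_mhz) / base_clock_mhz
    fps_delta = (new_fps - base_fps) / base_fps
    return fps_delta / clock_delta


# Hypothetical example: core clock dropped 15% from the stock 612 MHz,
# and an assumed frame rate that dropped 14% (100.0 -> 86.0 fps).
ratio = scaling_ratio(612.0, 520.2, 100.0, 86.0)
print(f"scaling ratio: {ratio:.2f}")  # roughly 0.93, i.e. close to 1:1
```

Running the same calculation for the shader and memory clocks in turn would show which of the three a given game leans on most.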
My theory is that the Ultra’s 128 SPs have plenty of shader power to burn, as even a 1224 MHz shader clock is double the stock core clock (612 MHz). Ramping up the shader clock was a smart move on nVidia’s part, especially since I doubt it affects yields too much.
Enabling AF + AA moves the bottleneck away from the shaders and onto the memory and core. I expected this, as these features hit the ROPs, TMUs and memory bandwidth harder. Note that even though the TMUs are tied to the SPs on the G80/G92, they run at the core clock, not the shader clock.
These figures may differ across generations of cards, so results may vary, but it’s clear that if nVidia wanted to improve the performance of an 8800 Ultra without resorting to a GT200 core, the simplest way would be to ramp up the core clock.
Needless to say, if you’re looking for extra performance from overclocking an 8800 Ultra, raise the core clock as high as possible.