nVidia GTX470 Bottleneck Investigation
Introduction
This article is a continuation of my ongoing investigation of bottlenecking in video cards. The crux of it is to underclock each of a video card’s clocks individually to see which has the biggest performance impact. Here are the past articles I’ve written on the same subject:
From the 8xxx series onwards it was possible to control the core, shader and memory clocks independently on nVidia’s cards. With the 4xx series that has changed, and now the shader clock controls all non-memory clocks. More specifically, the core clock is always half of the shader clock.
Since there’s no real limit to how far we can underclock, I’ve chosen a nice round figure of 20%. I’ll individually underclock the shader and memory (leaving the other at stock) to see which has the most performance impact. This equates to a 972 MHz shader clock and a 1339 MHz memory clock.
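As a quick check of those figures, here is a minimal sketch of the arithmetic. The stock clocks of 1215 MHz (shader) and 1674 MHz (memory) are assumptions taken from the reference GTX470 specification rather than from this article, and the function name is just for illustration.

```python
# Assumed reference GTX470 stock clocks in MHz (not stated in the article).
STOCK_SHADER_MHZ = 1215
STOCK_MEMORY_MHZ = 1674
UNDERCLOCK = 0.20  # the 20% reduction used in the tests


def underclocked(stock_mhz, fraction=UNDERCLOCK):
    """Return the clock after reducing it by the given fraction, rounded to whole MHz."""
    return round(stock_mhz * (1 - fraction))


shader = underclocked(STOCK_SHADER_MHZ)  # 972
memory = underclocked(STOCK_MEMORY_MHZ)  # 1339
core = shader / 2  # on the 4xx series the core runs at half the shader clock

print(f"shader {shader} MHz, core {core:.0f} MHz, memory {memory} MHz")
```

Note that underclocking the shader also drags the core clock down with it, so the "shader" test is really a combined core/shader test.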
The tests will be done on a single GTX470, and this card just happens to have about 16% less memory bandwidth than a GTX285, but has more pixel fillrate and shader performance. In light of this, it’ll be interesting to see whether the card is castrated by the reduced bandwidth.
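The 16% figure can be reproduced from bus width and effective memory data rate. The sketch below assumes the reference specifications for both cards (GTX470: 320-bit bus at 3348 MT/s; GTX285: 512-bit bus at 2484 MT/s); those numbers come from the public spec sheets, not from this article.

```python
def bandwidth_gb_s(bus_width_bits, effective_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * effective_mt_s / 1000


# Assumed reference specifications (bus width in bits, effective data rate in MT/s).
gtx470 = bandwidth_gb_s(320, 3348)
gtx285 = bandwidth_gb_s(512, 2484)

deficit = 1 - gtx470 / gtx285
print(f"GTX470 {gtx470:.1f} GB/s, GTX285 {gtx285:.1f} GB/s, deficit {deficit:.1%}")
```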
1920×1200 will be used with both 2xAA and 4xAA, because these are quite realistic settings for someone gaming on a single GTX470.
Thanks, BFG10K.. it’s nice to see which games are most affected by the bandwidth.
In AvP and Riddick:DA, the penalty for reducing the bandwidth is nearly identical to that of reducing the core/shader clock, which is rather revealing of the bottleneck. Only in Stalker:CoP is the bandwidth clearly sufficient, probably because shader performance is the absolute bottleneck there.
Reducing the core/shader clock by 20% while leaving the memory at stock should “loosen” whatever bandwidth bottleneck exists, which is most likely why the penalty is a bit smaller when the bandwidth is reduced instead. Say, for example, you have a 1GHz core and 1GHz memory: drop the core to 800MHz and you have a surplus of memory bandwidth. Leave the core at 1GHz but drop the memory to 800MHz, and the performance hit is almost identical to reducing the actual “work” by 20%. Plus, if bandwidth were plentiful, dropping the core by 20% should theoretically have resulted in a full 20% reduction.
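The reasoning above can be illustrated with a toy model. This is only a sketch under a strong assumption that is not from the article: each frame's time splits into a core-bound share and a memory-bound share, and each share scales inversely with its own clock. The `core_share` parameter is hypothetical and not something measured in these tests.

```python
def relative_fps(core_share, core_scale=1.0, mem_scale=1.0):
    """Frame rate relative to stock under a simple additive frame-time model.

    core_share: assumed fraction of stock frame time limited by core/shader clock.
    core_scale, mem_scale: clock multipliers (0.8 means a 20% underclock).
    """
    mem_share = 1.0 - core_share
    frame_time = core_share / core_scale + mem_share / mem_scale
    return 1.0 / frame_time


for share in (1.0, 0.75, 0.5):
    core_hit = 1 - relative_fps(share, core_scale=0.8)
    mem_hit = 1 - relative_fps(share, mem_scale=0.8)
    print(f"core share {share:.2f}: core -20% costs {core_hit:.1%}, "
          f"memory -20% costs {mem_hit:.1%}")
```

In this model a game that is entirely core-bound loses the full 20% from the core underclock and nothing from the memory underclock, while a game split evenly between the two loses the same amount either way, which is the pattern described for AvP and Riddick:DA.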
Thanks again for the data.. yummy!
Yet, in Stalker:CoP, there appears to be a problem with the optimization of the architecture/drivers/etc., which seem unable to scale very well with increased GPU speed. Could it be poorly optimized for the system memory/CPU, or something else?
AvP and Riddick only get close at 4xAA, and that’s expected since 4xAA loads the memory more. In some ways using higher AA modes makes the test more “synthetic”.
Stalker could be running out of VRAM capacity. If I ever pull the trigger on a GTX480, I can see if the extra memory makes a difference there.
Thanks for the test! What about “Core 2” and “Core i5/i7” CPU scaling with the Fermi? Pleeeeeeease 😀
Matrixfan:
i5 750 tested with 2 vs 4 cores: http://alienbabeltech.com/main/?p=19601
I don’t have a Core 2 anymore, but is there something else you’d like to see from my i5 750 + GTX470?