nVidia GTX470 Bottleneck Investigation
Conclusion
The GTX470 does not appear to be primarily limited by its memory bandwidth, even though it has about 16% less bandwidth than the GTX285. Even at 4xAA the shader clock makes a much bigger difference overall, which suggests nVidia has equipped the GTX470 with the bandwidth it needs.
Needless to say, if you’re overclocking a GTX470, raise the shader clock as high as possible.
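For readers who want to check the 16% figure, here is a minimal sketch of the arithmetic. The memory data rates and bus widths are the commonly published reference specifications for the two cards and are assumptions here, not numbers taken from this article; the function name is only illustrative.

```python
def bandwidth_gb_s(effective_rate_mt_s, bus_width_bits):
    """Peak memory bandwidth in GB/s from effective data rate (MT/s) and bus width (bits)."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumed reference specs (not from the article):
# GTX470: 3348 MT/s effective GDDR5 on a 320-bit bus
# GTX285: 2484 MT/s effective GDDR3 on a 512-bit bus
gtx470 = bandwidth_gb_s(3348, 320)
gtx285 = bandwidth_gb_s(2484, 512)

# Percentage by which the GTX470 falls short of the GTX285
reduction_percent = (1 - gtx470 / gtx285) * 100

print(f"GTX470: {gtx470:.1f} GB/s, GTX285: {gtx285:.1f} GB/s")
print(f"Reduction: {reduction_percent:.1f}%")
```

With those assumed specs the shortfall comes out a little under 16%, in line with the figure quoted above.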
Thanks, BFG10K.. it’s nice to see which games are most affected by the bandwidth.
In AvP and Riddick:DA, the penalty for reducing the bandwidth is nearly identical to that of reducing the core/shader clock, which is rather revealing of the bottleneck. Only in Stalker:CoP is the bandwidth clearly sufficient, probably because shader performance is the overriding bottleneck there.
Reducing the core/shader clock by 20% while leaving the memory at stock should “loosen” the bandwidth bottleneck, which is most likely why reducing the bandwidth instead shows a bit less of a penalty. Say, for example, you have a 1GHz core and 1GHz memory. If you drop the core to 800MHz, you have a surplus of memory bandwidth. If you leave the core at 1GHz but drop the memory to 800MHz, the performance hit is almost identical to cutting the actual “work” by 20%. Plus, if bandwidth were plentiful, dropping the core by 20% should in theory have resulted in a full 20% reduction.
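To put rough numbers on that reasoning, here is a toy model, not anything measured in the article. It assumes frame time is simply the sum of a core-limited part and a memory-limited part; the function names and the `core_share` parameter are hypothetical and only there for illustration.

```python
def relative_performance(core_scale, mem_scale, core_share):
    """Toy model (an assumption, not a measurement).

    core_scale / mem_scale: clock as a fraction of stock (0.8 = 20% lower).
    core_share: fraction of stock frame time limited by the core/shader clock;
                the remainder is treated as limited by memory bandwidth.
    Returns performance relative to stock (1.0 = stock).
    """
    if not 0.0 <= core_share <= 1.0:
        raise ValueError("core_share must be between 0 and 1")
    frame_time = core_share / core_scale + (1.0 - core_share) / mem_scale
    return 1.0 / frame_time


def penalty_percent(core_scale, mem_scale, core_share):
    """Performance lost relative to stock, in percent."""
    return (1.0 - relative_performance(core_scale, mem_scale, core_share)) * 100.0


# Purely core-bound game: a 20% core drop costs the full 20%,
# and a 20% memory drop costs nothing.
print(penalty_percent(0.8, 1.0, core_share=1.0))
print(penalty_percent(1.0, 0.8, core_share=1.0))

# Evenly split game: dropping either clock by 20% costs the same,
# and noticeably less than 20%.
print(penalty_percent(0.8, 1.0, core_share=0.5))
print(penalty_percent(1.0, 0.8, core_share=0.5))
```

In this simplified picture, near-identical penalties from the core and memory reductions point to a game that leans on both about equally, while a core drop that costs close to the full 20% points to bandwidth to spare.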
Thanks again for the data.. yummy!
Yet in Stalker:CoP there appears to be an optimization problem somewhere (architecture, drivers, etc.), since performance does not scale very well with increased GPU speed. Could it be poorly optimized for the system memory/CPU, or is it something else?
AvP and Riddick only get close at 4xAA, and that’s expected since 4xAA loads the memory more. In some ways using higher AA modes makes the test more “synthetic”.
Stalker could be running out of VRAM capacity. If I ever pull the trigger on a GTX480, I can see if the extra memory makes a difference there.
Thanks for the test! What about “Core 2” and “Core i5/i7” CPU scaling with the Fermi? Pleeeeeeease 😀
Matrixfan:
i5 750 tested with 2 vs 4 cores: http://alienbabeltech.com/main/?p=19601
I don’t have a Core 2 anymore, but is there something else you’d like to see from my i5 750 + GTX470?