The GTX 750 Ti arrives as Energy-efficient 28nm Maxwell
Maxwell Architecture
This new GTX 750/750 Ti chip uses Nvidia’s brand-new Maxwell architecture on the same 28nm process as the Kepler architecture. However, this time Maxwell is designed as ‘mobile first’, which means that mobile configurations and power considerations are primary. Nvidia is introducing Maxwell with its most entry-level (“cut-down”) GPU, codenamed “GM107” (“M” for Maxwell), which was primarily designed for limited-power applications like notebooks and small form factor (SFF) PCs, including those of the Steam Machine initiative.
Maxwell is a brand-new architecture that continues to build on Kepler and Fermi with a new Streaming Multiprocessor (SM) that improves performance per watt and performance per die area. Maxwell’s improvements over Kepler include control logic partitioning, workload balancing, finer clock-gating granularity, compiler-based scheduling, and more instructions issued per clock cycle (IPC). Together, these allow the Maxwell SM (SMM) to exceed the efficiency of the Kepler SM (SMX).
The GM107 GPU contains one Graphics Processing Cluster (GPC), five Maxwell Streaming Multiprocessors (SMM), and two 64-bit memory controllers (128-bit total). Just as with Kepler, Maxwell implements multiple SM units within a GPC, and each SM includes a Polymorph Engine and Texture Units, while each GPC includes a Raster Engine.
ROPs are still aligned with L2 cache slices and Memory Controllers. Internally, all the units and crossbar structures have been redesigned, data flows optimized, and power management significantly improved. The SM scheduler architecture and algorithms have been rewritten to be more intelligent and to avoid unnecessary stalls, while further reducing the energy per instruction required for scheduling.
This is the full implementation of the chip in the form of the GeForce GTX 750 Ti. Nvidia has increased the number of Maxwell SMs (GTX 750 Ti) to five in GM107, compared to two SMs in GK107 (GTX 650) with a very modest 25% increase in die size.
The organization of the SM has also changed. Each SM is now partitioned into four separate processing blocks, each with its own instruction buffer, scheduler and 32 CUDA cores. The Kepler approach of having a non-power-of-two number of CUDA cores, with some that are shared, has been eliminated. This partitioning simplifies the design and scheduling logic, saving area and power, and reduces computation latency.
Pairs of processing blocks share four texture filtering units and a texture cache. The compute L1 cache function has now been combined with the texture cache function, and shared memory is a separate unit that is shared across all four blocks.
With this new Maxwell design, each SM is significantly smaller while delivering about 90% of the performance of a Kepler SM. Comparing GK107 versus GM107 total SM related metrics, GM107 has five versus two SMs, 25% more peak texture performance, 1.7 times more CUDA cores, and about 2.3 times more delivered shader performance.
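The ratios quoted above can be roughly reproduced from the per-SM figures. The sketch below is only a back-of-the-envelope check: the Maxwell per-SM numbers come from the description above (four blocks of 32 CUDA cores, two pairs of blocks with four texture units each), while the Kepler SMX numbers (192 CUDA cores, 16 texture units) are assumptions that this article does not state. It also ignores clock speed differences between the two cards.

```python
# Per-SM figures. Maxwell values follow the article's description of the SMM.
# Kepler SMX values are ASSUMPTIONS, not taken from the article.
MAXWELL_SMM = {"cores": 4 * 32, "texture_units": 2 * 4}
KEPLER_SMX = {"cores": 192, "texture_units": 16}  # assumed

GM107_SM_COUNT = 5
GK107_SM_COUNT = 2
SMM_RELATIVE_PERF = 0.90  # "about 90% of the performance of a Kepler SM"


def chip_total(sm_count, per_sm, key):
    """Total number of units on a chip: SM count times units per SM."""
    return sm_count * per_sm[key]


gm107_cores = chip_total(GM107_SM_COUNT, MAXWELL_SMM, "cores")        # 640
gk107_cores = chip_total(GK107_SM_COUNT, KEPLER_SMX, "cores")         # 384
gm107_tex = chip_total(GM107_SM_COUNT, MAXWELL_SMM, "texture_units")  # 40
gk107_tex = chip_total(GK107_SM_COUNT, KEPLER_SMX, "texture_units")   # 32

core_ratio = gm107_cores / gk107_cores
texture_ratio = gm107_tex / gk107_tex
# Delivered shader performance, counted in "Kepler SM equivalents".
shader_ratio = (GM107_SM_COUNT * SMM_RELATIVE_PERF) / GK107_SM_COUNT

print(f"CUDA cores:     {gm107_cores} vs {gk107_cores} ({core_ratio:.2f}x)")
print(f"Texture units:  {gm107_tex} vs {gk107_tex} ({texture_ratio:.2f}x)")
print(f"Shader perf:    {shader_ratio:.2f}x")
```

Under these assumptions the core count and texture ratios line up with the 1.7 times and 25% figures, and the shader estimate lands close to the quoted 2.3 times.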
Memory System
GM107’s internal memory system bandwidth was increased along with improvements in the efficiency of the design. In addition, GM107’s larger 2048KB L2 cache (versus 256KB in GK107) means that fewer requests have to go out to the vRAM, which increases performance and reduces overall board power. As a result, a GM107 GTX 750 Ti is expected to deliver two times the performance per watt of Kepler’s GK107 (GTX 650, non-Ti).
Unfortunately, we do not have a GTX 650, but will use the more powerful GTX 650 Ti (GK106) which launched at GTX 750 Ti’s (GM107) same $149 pricing. Interestingly, Nvidia is now able to use their most basic chip to fill the $149 price point.
Detailed architecture and power savings mean a 60W TDP for the GTX 750 Ti (!!) versus 110W for the GTX 650 Ti (GK106) and 150W for the R7 265
Although the GK107 in the form of the GTX 650 is able to manage with a 64W TDP, the GTX 750 Ti is a much stronger GPU that can manage with less than 60W. And looking back, the GTX 550 Ti gives 1/2 the performance per watt that the GTX 650 Ti gives, which in turn is 1/2 the performance per watt of the GTX 750 Ti. Amazingly, performance per watt doubled from Kepler to Maxwell on the same 28nm process, whereas the doubling from Fermi to Kepler came with the transition from 40nm to 28nm.
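As a small illustration of how these figures compound, the sketch below chains the two "1/2 the performance per watt" steps and compares the TDP numbers quoted above. The normalisation to the GTX 550 Ti and the helper name `power_saving` are illustrative choices, not anything published by Nvidia.

```python
# Relative performance per watt, normalised to the GTX 550 Ti (Fermi).
# Each generation is taken to double the previous one, as the text states.
GENERATIONAL_GAIN = 2.0

perf_per_watt = {"GTX 550 Ti": 1.0}
perf_per_watt["GTX 650 Ti"] = perf_per_watt["GTX 550 Ti"] * GENERATIONAL_GAIN
perf_per_watt["GTX 750 Ti"] = perf_per_watt["GTX 650 Ti"] * GENERATIONAL_GAIN

# TDP figures quoted in the article, in watts.
tdp_watts = {"GTX 750 Ti": 60, "GTX 650 Ti": 110, "R7 265": 150}


def power_saving(card, baseline):
    """Fraction of the baseline card's TDP that the given card saves."""
    return 1 - tdp_watts[card] / tdp_watts[baseline]


print(f"GTX 750 Ti vs GTX 550 Ti perf/W: {perf_per_watt['GTX 750 Ti']:.0f}x")
for rival in ("GTX 650 Ti", "R7 265"):
    saving = power_saving("GTX 750 Ti", rival)
    print(f"GTX 750 Ti TDP is {saving:.0%} lower than the {rival}")
```

Taken at face value, two successive doublings put the GTX 750 Ti at four times the performance per watt of the GTX 550 Ti.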
New Video Capabilities and ShadowPlay recording with Maxwell
Just one optimize button customizes the settings for a “best playable experience” with the GeForce Experience. ShadowPlay is a better alternative to Fraps when it comes to recording video. The compression is much better and the overhead from recording one’s own game is much lower.
ShadowPlay is available to Kepler and now Maxwell owners, and we will bring you our experiences with it compared to AMD’s Gaming Evolved app in an upcoming evaluation. Nvidia also offers “G-SYNC”, which promises to revolutionize gaming by syncing the display to an Nvidia GPU.
One of Kepler’s key innovations over prior GeForce GPUs was its hardware-based H.264 video encoder, NVENC. By integrating dedicated hardware circuitry for video encoding/decoding (rather than using the GeForce GPU’s CUDA Cores), NVENC provides a significant performance speedup for H.264 encoding while consuming less power.
Nvidia used Kepler’s NVENC encoder to introduce ShadowPlay to owners of GeForce GTX 600 series and GTX 700 series cards last autumn, allowing them to record their favorite gaming moments. Since ShadowPlay launched, over 3 million videos have been captured, with gamers posting them to YouTube or even streaming their gameplay footage live over Twitch.
To improve video performance, Maxwell features an improved NVENC block that provides faster encode (6-8X real-time for H.264 vs. 4X real-time for Kepler) and 8-10X faster decode. A new local decoder cache also gives higher memory efficiency per stream for video decoding, resulting in lower power for video decode.
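To put the real-time multiples in perspective, here is a minimal sketch of what they would mean for an offline encode of a recorded clip. The ten-minute clip length and the function name `encode_seconds` are hypothetical, and the calculation assumes the encoder sustains the quoted multiple for the whole clip.

```python
def encode_seconds(clip_seconds, realtime_multiple):
    """Time to encode a clip at a given multiple of real-time speed."""
    return clip_seconds / realtime_multiple


CLIP_SECONDS = 10 * 60  # hypothetical ten-minute gameplay clip

kepler = encode_seconds(CLIP_SECONDS, 4)        # Kepler: 4X real-time
maxwell_slow = encode_seconds(CLIP_SECONDS, 6)  # Maxwell, low end of 6-8X
maxwell_fast = encode_seconds(CLIP_SECONDS, 8)  # Maxwell, high end of 6-8X

print(f"Kepler NVENC:  {kepler:.0f} s")
print(f"Maxwell NVENC: {maxwell_fast:.0f} to {maxwell_slow:.0f} s")
print(f"Speedup:       {kepler / maxwell_slow:.1f}x to {kepler / maxwell_fast:.1f}x")
```

In other words, the quoted figures amount to a 1.5 to 2 times faster H.264 encode than Kepler.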
Maxwell architecture also features a new GC5 power state designed to reduce the GPU’s power consumption specifically for light workload cases like video playback. GC5 is a low power sleep state that provides considerable power savings over prior Nvidia GPUs for low-power uses.
SFF (Small Form Factor) PCs
Gamers with home theater and other small form factor PCs no longer have to compromise to get a good gaming experience at 1080p since the GeForce GTX 750 Ti fits into a wider range of basic and OEM PCs without the need for upgrading the power supply. Because the GeForce GTX 750 Ti consumes so little power, it runs extremely quiet and generates very little heat, making it ideal for use in a home theater PC. Nvidia is claiming the GTX 750 Ti as the world’s fastest graphics card that doesn’t require a power connector.
GTX 750 Ti/750/650 Ti Specifications compared
The base clock speed of the GeForce GTX 750 Ti is guaranteed to be a minimum of 1020MHz. The typical Boost Clock speed is 1085MHz. The Boost Clock speed is based on the average GeForce GTX 750 Ti card running a wide variety of games and applications. Note that the actual Boost clock will vary from game-to-game depending on actual system conditions. Our own sample of the GTX 750 Ti reached a maximum boost of 1284MHz!
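For a sense of scale, the clock figures above can be expressed as percentages. This is just arithmetic on the three numbers quoted in the paragraph; the helper name `percent_over` is illustrative.

```python
BASE_MHZ = 1020                 # guaranteed minimum base clock
TYPICAL_BOOST_MHZ = 1085        # typical Boost Clock
OBSERVED_MAX_BOOST_MHZ = 1284   # maximum boost seen on our sample


def percent_over(clock_mhz, reference_mhz):
    """How far a clock sits above a reference clock, in percent."""
    return (clock_mhz - reference_mhz) / reference_mhz * 100


typical_over_base = percent_over(TYPICAL_BOOST_MHZ, BASE_MHZ)
observed_over_base = percent_over(OBSERVED_MAX_BOOST_MHZ, BASE_MHZ)
observed_over_typical = percent_over(OBSERVED_MAX_BOOST_MHZ, TYPICAL_BOOST_MHZ)

print(f"Typical boost over base:     {typical_over_base:.1f}%")
print(f"Observed boost over base:    {observed_over_base:.1f}%")
print(f"Observed boost over typical: {observed_over_typical:.1f}%")
```

Our sample's peak boost works out to roughly a quarter above the guaranteed base clock.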
The GTX 750 saves 5W of TDP, dropping from the Ti’s 60W to 55W, by being slightly cut down from 640 to 512 CUDA cores; the base and boost clocks are the same. We did not receive a GTX 750 from Nvidia.
As a comparison, here are the specifications for the GK106 GTX 650 Ti which also launched at $149:
From comparing the specifications of the new GTX 750 Ti to the GTX 650 Ti, it is quite apparent that the new GPU is on the new power-efficient Maxwell architecture. It has the same number of ROPs, the same 128-bit memory interface and 2GB of GDDR5 vRAM at 5400MHz versus 1GB for the 650 Ti. Yet the GTX 750 Ti requires no external power connector and can boost far higher with significantly better performance, all the while using only 60W to the GTX 650 Ti’s 110W! The card is also physically small at 5.7″ long and will fit into most motherboards and cases.
The GTX 650 Ti and Ti BOOST were Nvidia’s replacement for the GTX 550 Ti which launched April, 2011 at $149. The regular GTX 750 Ti, like the GTX 650 Ti before it, follows the pricing tradition of the GTS 450, which also debuted at $149. 1080p has become the most popular gaming resolution due to low LCD prices, and Nvidia built the GeForce GTX 750 Ti to deliver best-in-class performance for this HD resolution.
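The memory figures above also determine the card's peak memory bandwidth. The sketch below assumes the quoted "5400MHz" is the effective GDDR5 data rate (transfers per second per pin), which is how such figures are usually stated but is not spelled out here; the function name is illustrative.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, effective_rate_mhz):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes effective_rate_mhz is the effective per-pin data rate,
    i.e. millions of transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz / 1000


BUS_WIDTH_BITS = 128       # two 64-bit memory controllers
EFFECTIVE_RATE_MHZ = 5400  # assumed to be the effective data rate

bandwidth = peak_bandwidth_gb_per_s(BUS_WIDTH_BITS, EFFECTIVE_RATE_MHZ)
print(f"Peak memory bandwidth: {bandwidth:.1f} GB/s")
```

This is a theoretical ceiling; delivered bandwidth depends on access patterns and on how many requests the L2 cache absorbs.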
The GTX 750 Ti’s Performance Competition – the R7 260X and the R7 265
As with the GTX 650 Ti and GTX 650 Ti BOOST, Nvidia is aiming for slightly less than GTX 660 performance with the GTX 750/750 Ti, positioning them as slightly better than “entry-level” video cards that compete directly with AMD’s just-above-entry-level gaming cards, the R7 260 and 260X. The R7 260 is a repackaged and slightly slower Bonaire HD 7790, while the $129 260X is a slightly faster 7790. The R7 265 is a cut-down Pitcairn R9 270 (or a faster HD 7850) which is supposed to retail for $149, but may become a pricing victim of the crypto-currency mining fad. From just looking at the specifications, we can expect the GTX 750 Ti to easily outperform the R7 260X.
The GTX 750 Ti’s display outputs include two dual-link DVIs and one mini-HDMI. Three displays may be driven simultaneously from the reference design. No PCIe power connector is required for the 750 Ti’s operation although some partner designs may include it for overclocking.
How does the $149 GTX 750 Ti compare with the GTX 650 Ti it replaces?
This evaluation attempts to analyze and compare the performance of the GTX 750 Ti, the GTX 650 Ti, and the GTX 480. We also include the factory-overclocked Sapphire Vapor-X HD 7770 as well as the R9 270X, although the latter is in a much higher price range: it was $199 when it was introduced, and in the USA it now sits at the $250 pricing of the GTX 760 because of mining. We want to see what this new Nvidia Maxwell entry-level gaming GPU brings to the table for $149.
Since we do not want any chance of our CPU “bottlenecking” our graphics, we are testing all of our graphics cards by using our Haswell Intel Core i7-4770K at 4.0GHz, 8GB of Kingston PC2800 DDR3 and ECS’s Z87 golden motherboard. We have also equalized performance between Haswell and Ivy Bridge, so we can confidently bring you our Bigger Picture, including adding more video cards right up to the R9 290X and the GTX 780 Ti.
Before we do performance testing, let’s take a look at our test configuration as well as Power Draw, temperatures and overclocking.