ATi Radeon 4000 Series Anti-Aliasing Investigation
Introduction
In a past article, I compared the image quality of ATi’s and nVidia’s latest hardware, but since there’s a lot left to cover with each vendor’s anti-aliasing, I decided to investigate the issue further.
This article will focus on the ATi 4000 series of cards (whose anti-aliasing algorithms are identical to those of the 2000 and 3000 series), while a future installment will cover current nVidia hardware. For this particular article I will use my trusty Radeon 4850.
Hardware
- Intel Core 2 Duo E6850 (reference 3 GHz clock).
- ATi Radeon 4850 (512 MB, ATi reference clocks).
- 4 GB DDR2-800 RAM (4×1 GB, dual-channel).
- Gigabyte GA-G33M-DS2R (Intel G33 chipset, F7 BIOS).
- Creative X-Fi XtremeMusic.
- 19” Sony CPD-G420 CRT (maximum resolution 1920×1440).
Software
- Windows XP 32-bit SP3.
- ATi Catalyst 9.4, high quality mip-map setting.
- DirectX March 2009.
- All games patched to their latest versions.
Settings
- 16xAF & AA forced in the driver, vsync forced off in the driver.
- Since I’m on XP, all DX10 titles were run under DX9 render paths.
- Highest quality sound (stereo) used in all games.
- All results show an average framerate.
Anti-Aliasing Introduction
In 3D graphics, aliasing occurs because of rasterization, the process of converting a scene from vector primitives (essentially infinite resolution) to screen space (finite resolution). Typical aliasing with 3D scenes includes shimmering, jagged edges, line crawling, wiggling surfaces, and similar.
A higher resolution helps to some degree because there are more pixels to interpolate across, but it’s still quite poor at reducing aliasing compared to current hardware anti-aliasing techniques.
Current 3D hardware AA operates on the basic premise of sampling multiple times for a pixel (as opposed to using a single sample), and blending said samples to form the final pixel. The resulting smoother gradients reduce aliasing.
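To make the blending step concrete, here is a minimal sketch that averages the samples of a single pixel with equal weights (a plain box filter). The function name `resolve_pixel` and the sample colors are made up for illustration; this is not how any particular driver or GPU implements the resolve.

```python
def resolve_pixel(samples):
    """Blend a list of (r, g, b) samples into one pixel color.

    Every sample gets the same weight, i.e. a simple box filter.
    """
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


# Hypothetical edge pixel: three samples land on a white polygon,
# one lands on the black background behind it.
edge_pixel = [(1.0, 1.0, 1.0)] * 3 + [(0.0, 0.0, 0.0)]
print(resolve_pixel(edge_pixel))  # (0.75, 0.75, 0.75)
```

With a single sample the pixel would be either fully white or fully black; with four samples it becomes an intermediate grey, which is the smoother gradient described above.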
Multi-Sampling Theory (MSAA)
Multi-sampling first debuted on nVidia’s GeForce 3, and is the most popular hardware AA technique currently used in consumer space. This method takes the concept of super-sampling but decouples the shader and texture samples, thereby saving fillrate.
It operates on the concept of rendering only the geometry at a higher resolution (as opposed to super-sampling which renders everything at a higher resolution), then calculating the final color based on whether multiple depth (Z-buffer) values fall on the edge of a polygon or not.
The disadvantage of this method is that it can’t address texture or shader aliasing (since it only anti-aliases polygon edges), but it’s extremely fast compared to super-sampling, and still shows huge improvements in image quality.
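The decoupling of geometry samples from shader samples can be sketched roughly as follows. This is a simplified model under my own assumptions: `msaa_pixel`, the triangle labels and the `shade` callback are hypothetical, and the model shades each triangle once per pixel and copies that color to every geometry sample the triangle covers.

```python
def msaa_pixel(coverage, shade):
    """Simplified MSAA model for one pixel.

    coverage: one triangle id per geometry sample (which triangle won the
              depth test at that sample).
    shade:    function mapping a triangle id to an (r, g, b) color; it is
              called once per distinct triangle, not once per sample.
    Returns (final_color, number_of_shader_calls).
    """
    shaded = {}
    for tri in coverage:
        if tri not in shaded:
            shaded[tri] = shade(tri)
    n = len(coverage)
    color = tuple(sum(shaded[t][c] for t in coverage) / n for c in range(3))
    return color, len(shaded)


# Hypothetical colors: triangle "A" is red, triangle "B" is blue.
colors = {"A": (1.0, 0.0, 0.0), "B": (0.0, 0.0, 1.0)}

# Interior pixel: all four samples hit the same triangle, one shader call.
print(msaa_pixel(["A", "A", "A", "A"], colors.get))  # ((1.0, 0.0, 0.0), 1)

# Edge pixel: two triangles meet, so two shader calls and a blended color.
print(msaa_pixel(["A", "A", "A", "B"], colors.get))  # ((0.75, 0.0, 0.25), 2)
```

In this model 4x super-sampling would need four shader calls for every pixel, whereas the interior pixel here needs only one. It also shows why texture and shader aliasing inside a polygon is untouched: the interior pixel still comes from a single shader sample.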
Generally speaking, unless you run a very slow GPU setup, you should try to run at least 2xAA, even if it means dropping the resolution a bit. Even 2xAA can provide a substantial improvement in image quality over no AA.
ATi’s MSAA
ATi’s parts first gained MSAA with the 9700 Pro, which offered 2x and 4x rotated grids, along with 6x sparse grid MSAA. 6xAA was noteworthy as the first six-sample MSAA mode in consumer space, and for its pseudo-random sampling grid.
Starting with the 2000 series, 2xAA and 4xAA remained unchanged, but 6xAA was dropped in favor of 8xAA, also utilizing a sparse grid. Since the 2000 series, ATi refers to these modes as box modes because all samples come from within a single pixel’s area, just like traditional MSAA.
The box modes were performed on the shaders on the 2000 and 3000 series, which is why they had poor MSAA performance. The 4000 series corrects this deficiency by performing the box modes on the ROPs, and also by packing ROPs that are twice as fast in most situations. ROPs are the units responsible for converting the output from the shaders into pixels.
Here are the sample patterns from a 4850 (these are identical across the entire 2000, 3000 and 4000 product lines).
The green dots are shader/texture samples, and in each case there’s only one sampled at the pixel’s center, as is typical with MSAA. The blue dots are the geometry samples and you can see two, four and eight arranged in rotated and sparse grids. This placement of samples ensures optimal coverage (maximum effective edge resolution).
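To illustrate why sample placement matters for effective edge resolution, the sketch below sweeps a vertical edge across one pixel and counts how many distinct coverage levels a sample layout can produce. The coordinates are assumed, textbook-style layouts chosen for the example; they are not the positions read back from the 4850.

```python
# Illustrative 4-sample layouts inside a unit pixel (assumed positions,
# not the actual Radeon patterns).
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_levels(samples, steps=16):
    """Sweep a vertical edge from x=0 to x=1 and collect coverage fractions.

    A sample counts as covered when it lies to the left of the edge.
    """
    n = len(samples)
    levels = set()
    for step in range(steps + 1):
        edge_x = step / steps
        covered = sum(1 for x, _ in samples if x < edge_x)
        levels.add(covered / n)
    return sorted(levels)


print(coverage_levels(ORDERED_GRID))  # [0.0, 0.5, 1.0]
print(coverage_levels(ROTATED_GRID))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the ordered grid stacks its samples in two columns, a near-vertical edge only ever produces one intermediate shade. The rotated layout gives every sample its own column (and row), so the same four samples yield three intermediate shades.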
Custom Filter Anti-Aliasing (CFAA)
The 2000 series also gained CFAA, an extension of MSAA, and it’s also supported on the 3000 and 4000 series. CFAA uses MSAA as a base pattern, but also takes additional samples from adjacent pixels.
The advantage of CFAA is that it gains additional quality over regular MSAA without rendering the scene with more samples per pixel, thus saving memory storage and bandwidth. Unfortunately, narrow and wide tent take one and two samples from each adjacent pixel respectively, and this can cause blurring because they’re blending the internals of polygons, not just their edges.
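The blurring effect of sampling from adjacent pixels can be shown with a generic tent (linear falloff) kernel in one dimension. This is only a sketch: the weights ATi actually uses are not documented here, and `tent_resolve`, the sample positions and the radii are all assumptions chosen for the example.

```python
def tent_resolve(samples, center, radius):
    """Weighted average of (position, value) samples with a tent kernel.

    The weight falls linearly from 1 at the pixel center to 0 at `radius`.
    """
    weighted = [(max(0.0, 1.0 - abs(pos - center) / radius), val)
                for pos, val in samples]
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total


# Three neighbouring pixels with two samples each. The middle pixel holds
# a bright texture detail (1.0); its neighbours are dark (0.0).
row = [(0.25, 0.0), (0.75, 0.0),   # left pixel
       (1.25, 1.0), (1.75, 1.0),   # middle pixel, centered at x = 1.5
       (2.25, 0.0), (2.75, 0.0)]   # right pixel

# Radius 0.5 stays inside the middle pixel: the detail is preserved.
print(tent_resolve(row, center=1.5, radius=0.5))  # 1.0

# Radius 1.0 reaches one sample in each neighbour: the detail is dimmed.
print(tent_resolve(row, center=1.5, radius=1.0))  # 0.75
```

No polygon edge is involved in this example, yet the wider filter still changes the result. That is the polygon-interior blurring described above.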
Edge detect corrects this problem by using the shader to detect polygon edges and only sampling adjacent pixels in those situations. This increases the quality of polygon edges without any scene blurring. Additionally, there are more samples sourced from outside the pixel boundary than either narrow or wide tent use.
There are two edge detect modes: 12xAA, which is 4xMSAA + edge detect, and 24xAA, which is 8xMSAA + edge detect.
Unfortunately I’m unable to show you the sample patterns for CFAA; however this technical slide from ATi should give you an idea as to what is happening:
Most games cannot activate these modes directly, but there are two ways you can use them. Firstly, you can force them directly from the control panel by simply using 4xAA or 8xAA along with the edge detect setting. This should work in most games where forced AA works. The other way is to set the control panel’s AA level to application preference, but set the filter to edge detect. Then by setting the game to use 4x or 8x, you should get 12x or 24x respectively in most cases.
Note that it’s important to be using edge detect in the control panel in order to use these modes. If you use either of the tent modes, the scene will experience blurring because the blending won’t be restricted to polygon edges.
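The relationship between the control panel settings and the resulting mode can be summarized as a small lookup. The names `EDGE_DETECT_MODES` and `effective_mode` are hypothetical, and only the combinations stated in this article are included; anything else returns `None` rather than a guess.

```python
# Edge detect combinations stated in this article: base MSAA level -> mode.
EDGE_DETECT_MODES = {4: 12, 8: 24}


def effective_mode(base_samples, cfaa_filter):
    """Return the advertised AA level for a base MSAA level and filter.

    Returns None for combinations this article does not cover
    (for example the tent filters).
    """
    if cfaa_filter == "box":
        return base_samples          # plain MSAA, no extra samples
    if cfaa_filter == "edge detect":
        return EDGE_DETECT_MODES.get(base_samples)
    return None


print(effective_mode(4, "edge detect"))  # 12
print(effective_mode(8, "edge detect"))  # 24
print(effective_mode(8, "box"))          # 8
```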
Adaptive Anti-Aliasing (AAA)
As I covered in my image quality comparison article, AAA is a method of anti-aliasing alpha (transparent) textures (e.g. vegetation, wire fences, etc), something regular MSAA cannot do. It works by resubmitting the texture multiple times into the pipeline and shifting the location of the sample each time, then blending the samples into one.
From the findings of my last article, I suspect ATi is selectively applying multi-sampling or super-sampling depending on the scene and game. This explains the “fattened” vegetation and the fact that only the edges of the transparent areas are impacted, as is typical with multi-sampling (nVidia’s multi-sampling TrAA did something similar in the past). However since multi-sampling doesn’t work in every game, ATi likely falls back to super-sampling in such titles.
ATi has two AAA modes, performance and quality. Quality uses the full base multi-sampling pattern while performance only uses half the samples (so 4xAA with performance AAA only uses two samples on alpha textures).
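As a quick worked example of the sample counts, the sketch below follows the rule just described. The function name `aaa_samples` is made up for illustration.

```python
def aaa_samples(base_msaa, aaa_mode):
    """Samples applied to alpha textures for a given base MSAA level."""
    if aaa_mode == "quality":
        return base_msaa        # full base multi-sampling pattern
    if aaa_mode == "performance":
        return base_msaa // 2   # half the base samples
    raise ValueError("aaa_mode must be 'quality' or 'performance'")


for level in (2, 4, 8):
    print(level, aaa_samples(level, "performance"), aaa_samples(level, "quality"))
# 2 1 2
# 4 2 4
# 8 4 8
```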
Image Quality (MSAA/CFAA)
For the image quality comparison, I have chosen an outside scene from the original Far Cry. Since AA’s main benefit comes during movement, these images have been zoomed in order to better show differences. I would suggest loading the images into separate browser tabs and quickly switching between them. Also ensure browser image scaling is disabled.
2xAA provides a small improvement over 0xAA in areas 1 and 4, but not much with 2 and 3, probably because the sampling angle is similar to those surfaces’ angles.
4xAA provides a small improvement over 2xAA with surface 3, but large improvements with the other three surfaces.
8xAA provides slight improvements over 4xAA along all four edges.
12xAA is better than 8xAA on surfaces 1 and 4, but it’s slightly worse on the other two surfaces.
24xAA is the best overall. It’s even better than 12xAA on surfaces 1 and 4, while being just as good as 8xAA on the other two surfaces.
Notice how the lonely tree in the top-left is not getting any AA no matter what mode is being used? I’m going to check that tree in more detail in the next section.
Image Quality (AAA)
The tree is actually a transparent texture, likely a result of the game’s LOD system optimizing performance by reducing workload at distances. Because MSAA and CFAA (edge detect) only affect polygon edges, none of them impact the tree. So now I’ll enable AAA and see how it affects the tree. Note that I could’ve used another scene with more alpha content, but by using the same scene, it becomes clearer how these AA techniques all come together as a total package.
Even with just the thumbnails you can clearly see changes to the color gradients at each level of AAA, but feel free to load the larger images if you like. Furthermore, the internals of the tree are being impacted, and the edges aren’t being fattened up like we saw in the last article. This would suggest that in this instance, ATi is applying super-sampling AAA.
Interestingly, there’s a difference between 24xAA and 8xAA when there shouldn’t be, given that the base level of MSAA is the same in both modes (8x). This suggests that edge detect CFAA may be sampling adjacent pixels with alpha textures when AAA is enabled.
Benchmarks (Part 1)
Since the edge detect modes are seldom playable in new titles, I’ve picked a selection of titles from 2004 and 2005 to test the various AA modes. When viewing the benchmarks, keep in mind that a 4890 is around 40%-50% faster than the 4850, especially with heavy AA, so you can use that to estimate what it would score.
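If you want to turn that rule of thumb into numbers, a rough estimate looks like this. The 40%-50% range is the figure quoted above; the function name and the example framerate are hypothetical, and the result is a ballpark, not a measurement.

```python
def estimate_4890_fps(fps_4850, low=1.40, high=1.50):
    """Scale a 4850 average framerate by the 40%-50% range quoted above."""
    return fps_4850 * low, fps_4850 * high


# Hypothetical example: a 4850 result of 60 FPS.
low_fps, high_fps = estimate_4890_fps(60.0)
print(f"{low_fps:.0f} to {high_fps:.0f} FPS")  # 84 to 90 FPS
```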
Also, I’m not going to benchmark AAA for the simple reason that the performance hit depends entirely on the scene. The odd fence or grating won’t impact performance much, but large amounts of vegetation will incur a steep performance hit. For this reason, you’ll quickly know if your AAA setting is usable or not just by playing your game.
In Call of Duty 2, 12xAA is actually faster than 8xAA.
In Doom 3, 12xAA is about the same speed as 8xAA, and even a 4850 can comfortably run this game at 24xAA.
In Far Cry, the edge detect modes incur a steep performance hit over the box modes.
Benchmarks (Part 2)
In Fear, 12xAA is dramatically faster than 8xAA, while 24xAA is barely slower than 8xAA.
The 4850 breezes through these Quake 4 and Half-Life 2 benchmarks when running 24xAA.
Benchmarks (Part 3)
In Riddick, 12xAA is substantially faster than 8xAA.
The 4850 breezes through UT2004, even with 24xAA.
Conclusion
Some interesting findings have been made in this investigation. Firstly, the difference between 8xAA and 12xAA isn’t as clear-cut as simply assuming 12x is better (and therefore slower) than 8x. In some cases, 8xAA provides better image quality than 12xAA. Also sometimes 12xAA is faster than 8xAA; significantly so on occasion. If you find you have spare performance at 4xAA but 8xAA is too slow, try 12xAA as it might actually run better.
The second interesting finding is that edge detect CFAA seems to impact the image quality of AAA in situations where the base multi-sampling mode is the same. This suggests that edge detect CFAA is capable of sampling outside the pixel’s boundary on alpha textures when AAA is engaged.
And finally, while 4xAA definitely remains the sweet-spot for AA, don’t ignore the higher modes if you have enough performance, as they can provide a visible benefit over 4xAA.
Hopefully this article will encourage everyone to experiment a bit more with their ATi cards when gaming. Stay tuned for the nVidia equivalent of this article sometime in the future.