Memory Bandwidth and Interface
No matter how astounding the GPU's number-crunching feats, they're wasted if video-card memory is a bottleneck. Most current graphics cards use DDR memory; a higher DDR clock speed means more bandwidth. Another option is to upgrade the type of memory, as both Nvidia and ATI have with DDR-2 models and as Nvidia is now pursuing with GDDR-3.
An even more important factor in overall memory performance is a GPU's memory interface. Current high-end implementations are substantially more sophisticated than the basic dual-channel DDR controllers seen in today's PC chipsets. ATI's Radeon 9800, for instance, features a 256-bit memory interface broken up into four channels that operate independently, with sequencer logic to control the back-and-forth flow of 64-bit blocks of data. Nvidia offers a similar 256-bit crossbar memory controller; both are much more efficient than simply passing data back and forth 256 bits at a time.
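As a rough illustration of why both bus width and clock speed matter, theoretical peak bandwidth can be estimated as bus width in bytes, times memory clock, times transfers per clock (two for DDR). The Python sketch below uses a hypothetical helper name and an assumed 340MHz memory clock chosen only for the arithmetic; neither comes from a vendor spec sheet, and real-world throughput depends heavily on the controller efficiency described above.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed example clock of 340 MHz, DDR (two transfers per clock).
wide = peak_bandwidth_gb_s(256, 340)
narrow = peak_bandwidth_gb_s(128, 340)
print(f"256-bit: {wide:.2f} GB/s, 128-bit: {narrow:.2f} GB/s")
```

At the same clock, doubling the interface width doubles the theoretical peak, which is why the interface figure gets so much attention.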
Another way to boost effective memory bandwidth is to avoid spending it on work no one will see (e.g., not bothering to draw a beautifully textured background object when an ugly alien warrior is standing in front of it). This is achieved through compression and culling techniques, including frame-buffer and Z compression, hidden-surface removal, and data prediction, all designed to squeeze the most out of every bit of bandwidth.
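The savings from hidden-surface removal can be sketched by counting texture fetches. This is a deliberately simplified Python model, not how any particular GPU works: the function names and the tiny 4-by-4 scene are hypothetical, and real hardware uses its own proprietary schemes.

```python
def texture_fetches_naive(fragments):
    """Texture every fragment that passes the depth test when it arrives."""
    depth = {}
    fetches = 0
    for x, y, z in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            fetches += 1  # fetched even if a nearer fragment overwrites it later
    return fetches


def texture_fetches_culled(fragments):
    """Resolve visibility first, then texture only the nearest fragment per pixel."""
    nearest = {}
    for x, y, z in fragments:
        if z < nearest.get((x, y), float("inf")):
            nearest[(x, y)] = z
    return len(nearest)


# Hypothetical scene: a distant background fills a 4x4 tile,
# then a nearer "alien" covers the 2x2 center. Smaller z is closer.
background = [(x, y, 0.9) for x in range(4) for y in range(4)]
alien = [(x, y, 0.2) for x in (1, 2) for y in (1, 2)]
fragments = background + alien

print(texture_fetches_naive(fragments), texture_fetches_culled(fragments))
```

In this toy scene the naive path textures four background pixels that the alien then covers; culling skips them, and the gap widens as scenes gain more overlapping layers.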
Software Secrets
Besides lavishing attention on their GPU hardware, Nvidia and ATI boast about the skill of their programming staff. While Microsoft waits years between Windows versions, the graphics warriors release new, ever-more-optimized software drivers more or less on a monthly basis. Drivers take the commands an application issues through an application programming interface (API) -- increasingly Microsoft's DirectX for consumer titles, though the older OpenGL, which grew out of the computer-aided design (CAD) universe, continues to be important -- and translate them into hardware-specific commands to achieve the best overall results.
This translation matters most when dealing with programmable elements such as vertex and pixel shaders; how well the driver handles them can make or break overall 3D speed and visual quality, and it also affects application stability and compatibility. Constant driver updates squash bugs and glitches, tweak performance, and add new features (such as selective GPU overclocking or core-speed adjustments for hardcore gamers, usually coupled with thermal monitoring that throttles the chip back if overheating is imminent).
Microsoft sets the pace here, releasing new versions of DirectX well before GPU makers implement the new features in hardware or game developers ship titles that take advantage of them. (Game vendors are eternally torn between their desire to show off the best-looking, most realistic cinematic experiences possible and their desire to sell software to a large audience instead of the relative handful of hardcore players who spend freely for the newest, fastest hardware.)
DirectX 9, for example, opened the door to floating-point math with 128-bit precision (32 bits for each of four color channels) versus its predecessor's 48-bit integer arithmetic, permitting a much broader and smoother spectrum of light and shadow, with a million instead of a few hundred gradations available between blinding light and pitch black. This steady advance keeps the pressure on GPU sellers to implement more registers, pipelines, and other components to win both Microsoft's and gamers' seals of approval.
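The effect of extra precision on gradations can be sketched by pushing one smooth brightness ramp through two channel formats and counting how many distinct levels survive. The formats here (an 8-bit integer channel and a 32-bit floating-point channel) and the helper names are assumptions chosen for illustration; they are not meant to reproduce the exact internal formats of any DirectX version.

```python
import struct


def to_int8(value):
    """Quantize a 0.0-1.0 brightness to an 8-bit integer channel (assumed format)."""
    return round(value * 255)


def to_float32(value):
    """Round a Python float to 32-bit floating point (assumed format)."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A smooth ramp of 10,000 brightness steps from black to white.
ramp = [i / 9999 for i in range(10000)]
int_levels = len({to_int8(v) for v in ramp})
float_levels = len({to_float32(v) for v in ramp})
print(int_levels, float_levels)
```

The integer channel collapses the ramp into a few hundred steps, which is where visible banding comes from, while the floating-point channel keeps every step distinct.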
Keeping It Real
While game fanatics still brag about sheer speed, few users can detect the difference between -- and no current monitors can keep up with -- a 1,600 by 1,200-pixel image delivered at 150 frames per second instead of 130. So the 3D market, having achieved satisfactory speed, is increasingly turning to improved image quality. The current hot spot here is the GPU's antialiasing unit, which is responsible for maintaining detail while smoothing lines and curves in the rendered image.
Older graphics cards relied on supersampling -- rendering the scene at a higher resolution than the display and scaling it back down -- which worked well but took a severe toll on performance. The next step was hardware multisampling, in which several sample points within each pixel are tested before final pixel data is generated; multisampling algorithms and formats differ between vendors and GPUs as far as how, where, and in what pattern the samples are taken. The number of samples per pixel is usually selectable between 2X and 8X; the higher the number, the cleaner the image and the greater the performance penalty.
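A minimal Python sketch of the supersampling idea follows: each pixel is evaluated at several sub-pixel positions and the results are averaged. The scene function, the regular sampling grid, and the simple box filter are all assumptions made for illustration; actual hardware sample patterns and filters vary.

```python
def inside(x, y):
    """Hypothetical scene: everything below the diagonal line y = x is white."""
    return 1.0 if y < x else 0.0


def render(width, height, factor=1):
    """Render with factor x factor samples per pixel, averaged (box filter)."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample positions form a regular grid inside the pixel.
                    x = px + (sx + 0.5) / factor
                    y = py + (sy + 0.5) / factor
                    total += inside(x, y)
            row.append(total / (factor * factor))
        image.append(row)
    return image


aliased = render(4, 4, factor=1)   # one sample per pixel: hard stair-steps
smoothed = render(4, 4, factor=2)  # four samples per pixel: softened edge
print(aliased[0][0], smoothed[0][0])
```

With one sample per pixel, every pixel is fully black or fully white; with four, pixels along the edge take an intermediate shade. The cost is just as visible: the 2X-by-2X render evaluates four times as many samples.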
Anisotropic filtering (along with bilinear and trilinear filtering) is another popular method of improving image quality. This technique takes multiple texture samples to smooth out various texture artifacts, especially as objects fade into the distance. The higher the AF setting (usually 2X to 16X), the more samples used. Anisotropic filtering is processed through the pixel engine or shader, and does not have as great a performance impact as antialiasing -- but doesn't have as great an image-quality impact, either.
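The multiple-sample idea can be sketched in Python as averaging several taps spread along the direction in which a pixel's footprint is stretched across the texture. This is a heavily simplified, one-axis model: the striped texture, the function names, and the evenly spaced taps are hypothetical and do not describe any vendor's actual filter.

```python
def texel(u, v):
    """Hypothetical texture: fine stripes along u, alternating every texel."""
    return 1.0 if int(u) % 2 == 0 else 0.0


def sample_anisotropic(u, v, footprint_len, num_samples):
    """Average num_samples taps spread along u across footprint_len texels."""
    total = 0.0
    for i in range(num_samples):
        offset = ((i + 0.5) / num_samples - 0.5) * footprint_len
        total += texel(u + offset, v)
    return total / num_samples


# A distant, steeply angled surface: one screen pixel spans 8 texels along u.
single_tap = sample_anisotropic(8.5, 0.0, footprint_len=8, num_samples=1)
eight_taps = sample_anisotropic(8.5, 0.0, footprint_len=8, num_samples=8)
print(single_tap, eight_taps)
```

A single tap lands on whichever stripe happens to sit under the pixel center, so the result flickers as the view moves; eight taps settle on the average of the stripes the pixel actually covers.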
As shown by the most recent Nvidia announcement, the GPU landscape continues to move ahead at breakneck speed. Later this year, PCI Express is expected to supplant AGP as the performance interface of choice, offering even more bandwidth to high-end GPUs. Microsoft hasn't set a date for DirectX 10, but is already leaking new pixel- and vertex-shader specifications. Meanwhile, pixel-pipeline counts and memory and core speeds continue to climb. The goal is nothing short of virtual reality, and that'll take all the processing power vendors have to give.