The Linux 3Dfx HOWTO: Graphics Accelerator Technology

2. Graphics Accelerator Technology

2.1 Basics

This section gives a very cursory overview of computer graphics accelerator technology, to help you understand the concepts used later in the document. To learn more, consult a book on OpenGL, for example.

3D computer graphics often requires a lot of calculations for every single pixel on the screen. This is especially true if the application has to render a polygon world for many frames of an interactive animation. Even at low resolutions like 320x200, this can consume more processing power than the most powerful PC could deliver.

To overcome that bottleneck, several companies have designed, manufactured and sold processors dedicated to the operations needed for 3D computer graphics. So far, virtually none of the boards manufactured have offered any Linux support. Fortunately, 3Dfx, the manufacturer of the Voodoo Graphics (tm) and Voodoo Rush (tm) chipsets, decided to support the use of Voodoo Graphics (tm) based boards with Linux. The purpose of this document is to describe the support currently available.

2.2 Hardware configurations (add-on)

Graphics accelerators come in two flavors: either as a separate PCI board that passes through the video signal of a (possibly 2D or video accelerated) VGA board, or as a PCI board that does both VGA and 3D graphics (effectively replacing older VGA controllers). The 3Dfx boards based on the Voodoo Graphics (tm) chipset belong to the former category. We will get into this again later.

If there is no address conflict, any 3D accelerator board can be present in a machine running Linux without interfering with it, but in order to access the accelerator, you will need a driver.

2.3 Performance limitations

Fill bound

The performance of hardware accelerated graphics is bounded in several ways. A typical bottleneck is fill rate: the total number of pixels that the hardware can draw within a given time interval under optimal conditions, e.g. about 40 Mpixels/second. Given a 640x480 screen resolution (307,200 pixels) and zero overdraw, the hardware cannot deliver more than about 130 frames/second.

The amount of overdraw depends on the actual depth complexity of the scene (how many polygons a ray through a pixel would intersect) and on the efficiency of the visible surface determination algorithm used by the application. An overdraw of 1 (drawing each pixel twice) means 65 frames/second; an overdraw of 2 (drawing each pixel three times) gets you down to about 43 frames/second.
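The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration of the fill rate bound; the function name and its parameters are made up for this example, and the 40 Mpixels/second figure is the example value used above, not a measured one.

```python
def fill_bound_fps(fill_rate_pixels_per_s, width, height, overdraw=0):
    """Upper bound on frames/second when only the fill rate limits rendering.

    overdraw follows the convention of the text: 0 means every pixel is
    drawn once, 1 means twice, 2 means three times.
    """
    pixels_per_frame = width * height * (1 + overdraw)
    return fill_rate_pixels_per_s / pixels_per_frame


# Example figures from the text: 40 Mpixels/second at 640x480.
for overdraw in (0, 1, 2):
    fps = fill_bound_fps(40e6, 640, 480, overdraw)
    print(f"overdraw {overdraw}: {fps:.1f} frames/second")
```

Real applications will stay below this bound, since the hardware rarely runs under optimal conditions.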

Missing refresh

Next, you will probably render with double buffering, swapping back and front buffer as soon as the frame is completed. Here the refresh rate of the display comes into play: you will only switch buffers during refresh. If your application misses a 60Hz refresh on every single frame, your effective frame rate will drop to 30Hz (every second refresh). Missing two refreshes gets you down to 20Hz.
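The effect of missed refreshes can be modelled roughly as follows. The sketch assumes that a buffer swap always waits for the next refresh and that every frame takes the same time to render; the function name and the render times in the example are hypothetical.

```python
import math


def effective_frame_rate_hz(refresh_hz, render_time_s):
    """Frame rate actually displayed when buffer swaps wait for a refresh.

    A frame that takes longer than one refresh interval occupies the
    display for a whole number of intervals, so the rate drops in steps.
    """
    intervals_per_frame = max(1, math.ceil(render_time_s * refresh_hz))
    return refresh_hz / intervals_per_frame


# On a 60Hz display: 10 ms fits into one refresh interval, 17 ms just
# misses one refresh, and 34 ms misses two.
for render_time_ms in (10, 17, 34):
    rate = effective_frame_rate_hz(60, render_time_ms / 1000)
    print(f"{render_time_ms} ms per frame: {rate:.0f}Hz")
```

Note how a frame that is only slightly too slow (17 ms instead of 16.7 ms) already halves the displayed frame rate.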

Primitive bound

If the scene is not very detailed (only a few polygons, but those very large, with lots of overdraw), your application will probably be fill bound - it is possible to throw more primitives (lines, triangles, polygons) at the hardware, but the pixel pipeline can't go any faster anyway.

However, if your application insists on rendering a lot of small triangles or polygons, you might end up primitive bound. Given a PCI bandwidth of 33MHz times 32bit, or 132 MB/sec, a per-triangle dataset of 3 vertices (9 coordinates, 16bit each, plus 3 colors, 24bit each, for a total of 27 bytes), and a frame rate of 20Hz, you can transfer at most about 240K triangles/frame - and that bound ignores the bandwidth and time needed for texture data, disk access, and other operations.
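The same estimate, written out as a small Python sketch. The constant and function names are made up for this example, and the calculation assumes that the full theoretical PCI bandwidth is available for triangle data, which will not be the case in practice.

```python
# 33MHz times 32 bit (4 bytes) = 132 MB/sec
PCI_BYTES_PER_S = 33_000_000 * 4

# 9 coordinates at 16 bit (2 bytes) plus 3 colors at 24 bit (3 bytes)
BYTES_PER_TRIANGLE = 9 * 2 + 3 * 3


def triangles_per_frame(bus_bytes_per_s, bytes_per_triangle, frame_rate_hz):
    """Upper bound on triangles that fit through the bus in one frame."""
    return bus_bytes_per_s / bytes_per_triangle / frame_rate_hz


limit = triangles_per_frame(PCI_BYTES_PER_S, BYTES_PER_TRIANGLE, 20)
print(f"{BYTES_PER_TRIANGLE} bytes/triangle, "
      f"about {limit / 1000:.0f}K triangles/frame")
```

Larger vertex formats (e.g. with texture coordinates) lower the bound accordingly.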

2.4 Hardware accelerated features

The rendering operations usually supported by useful hardware accelerators are:

Perspective correct texture mapping
Alpha-blending, Fog
Anti-aliasing
Bi-linear and advanced texture filtering
Level of detail (LOD) MIP mapping
Sub-pixel correction
Polygonal-based Gouraud shading and texture modulation
Double buffering
Depth buffering, stencil buffer

Hardware acceleration usually allows increased screen resolution (software-only rendering being limited to about 320x200 pixels for interactive frame rates), advanced filtering, true alpha channel translucency, and the use of 16bpp or 24bpp frame buffers.

2.5 A bit of Voodoo Graphics (tm) architecture

Usually, accessing texture memory and the frame/depth buffer is a major bottleneck. For each pixel on the screen, there are one (nearest), four (bi-linear), or eight (tri-linear mipmapped) read accesses to texture memory, plus a read/write to the depth buffer, and a read/write to frame buffer memory.
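To get a feeling for the numbers, the accesses per pixel can be added up as in the following Python sketch. It assumes that each "read/write" counts as one read plus one write, which is a simplification; the names used are invented for this example.

```python
# Texture reads per pixel for each filter mode, as listed above.
TEXTURE_READS = {"nearest": 1, "bilinear": 4, "trilinear": 8}


def accesses_per_pixel(filter_mode):
    """Memory accesses needed to render one pixel.

    Assumption: the depth buffer and the frame buffer each cost one read
    and one write per pixel.
    """
    depth_accesses = 2
    frame_accesses = 2
    return TEXTURE_READS[filter_mode] + depth_accesses + frame_accesses


for mode in TEXTURE_READS:
    print(mode, accesses_per_pixel(mode))
```

Under this assumption, texture reads account for half of all accesses with bi-linear filtering and for two thirds with tri-linear mipmapping.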

The Voodoo Graphics (tm) architecture separates texture memory from frame/depth buffer memory by introducing two separate rendering stages, with two corresponding units (pixelfx and texelfx), each having a separate memory interface to dedicated memory. This gives an above-average fill rate, paid for with restrictions in memory management (e.g. unused framebuffer memory cannot be used for texture caching).

Moreover, a Voodoo Graphics (tm) can use two TMUs (texture management or texelfx units), and finally, two Voodoo Graphics (tm) can be combined, sharing the same RAMDAC, with a mechanism called Scan-Line Interleaving (SLI). SLI essentially means that each pixelfx unit provides only every second scanline, which decreases the bandwidth impact on each pixelfx unit's framebuffer memory.
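The division of work under SLI can be pictured with a short Python sketch. It only illustrates the idea of alternating scanlines; which unit actually takes the even or the odd lines is an assumption made here, and the function name is hypothetical.

```python
def scanlines_for_unit(unit, height, units=2):
    """Scanlines a given pixelfx unit provides when lines are interleaved.

    Assumption: unit 0 takes scanlines 0, 2, 4, ... and unit 1 takes
    scanlines 1, 3, 5, ...
    """
    return range(unit, height, units)


height = 480
for unit in (0, 1):
    lines = scanlines_for_unit(unit, height)
    print(f"unit {unit}: {len(lines)} of {height} scanlines, "
          f"starting with {list(lines[:3])}")
```

Each unit has to fill only half of the scanlines of every frame, which is where the reduced load on each framebuffer memory comes from.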