Published: 2007-01-25 17:34:19

Introduction / Vertex Shaders, Hardware vertex shaders support



  Throughout the history of 3D-accelerators, engineers at the world's leading manufacturers have extended the functionality of 3D-chips while raising their speed. The first 3D-accelerators could only rasterize triangles and supported the simplest color blending modes. Intense competition drove rapid development in this field. Today, 3D-accelerators can transform and light triangles, blend several textures in complex ways, and execute shaders - short programs that operate on vertices and pixels.

  Shader technology is widely used in 3D-rendering programs (like Renderman) and is a powerful and flexible means of describing surfaces. Today, this technology is being introduced into 3D-accelerators. To get the full picture, however, we have to see the difference between Renderman shaders and those used in DirectX/OpenGL. First of all, the former are meant for relatively slow rendering, while the latter are aimed purely at real-time rendering.

DirectX shader | Renderman shader
Aimed at real-time rendering | Aimed at relatively slow rendering
Uses Assembler-like language | Uses C-like language
Short shaders, no branching and loops, limited number of registers, works with 4D-vectors | A large number of syntactical constructions, including loops and branching, unlimited shader length and number of available variables, works with matrices and has a wide range of built-in functions
Shader types: vertex and pixel | Shader types: surface, light source, displacement, transformation, volume, image

  Before shaders appeared, the T&L (transformation and lighting) engine handled geometrical processing of vertices, while the multitexturing (MT) engine was in charge of rasterization. Vertex and pixel shaders have emerged as the conceptual successors to these engines. All new 3D-accelerators with hardware shader support will also include the T&L and MT engines, allowing game developers to choose between the new and the old methods of rendering (also called programmable and fixed-function, respectively).

  As the shader specification keeps evolving, DirectX introduced version numbers for pixel and vertex shaders. This article is mostly based on materials about the GeForce3, which corresponds to Vertex Shaders 1.1 and Pixel Shaders 1.1.

DirectX 8.0: Vertex Shaders 1.0 (obsolete), 1.1; Pixel Shaders 0.5 (obsolete), 1.0 (obsolete), 1.1
DirectX 8.1: Pixel Shaders 1.2, 1.3, 1.4
Vertex Shaders

  A vertex shader consists of two parts: the shader function and the shader declaration. The shader function defines the operations to execute on a single vertex. Vertices are read from the vertex data stream and processed sequentially by the shader function. Data can be supplied either by the vertex buffer or by the primitive tessellation engine. Initial data (vertex coordinates, normal, etc.) are loaded into the input registers v0..v15. The specification sets no strict correspondence between a register number and its function; for example, vertex coordinates can be loaded into either v0 or v15. In fact, when no hardware tessellation is used, the shader is unaware of what data are loaded in the input registers. The shader declaration tells the accelerator which data to load into each input register. This avoids loading redundant data and so uses the bandwidth of the video memory and of the AGP bus as efficiently as possible.

  The shader function also has at its disposal a set of temporary registers r0..r11 that can be read and written, and a set of read-only constant registers c0..c95 that hold the data the shader requires (also loaded using the shader declaration). Each register stores a four-component vector whose components are single-precision floating-point numbers. The result of the shader's work is a fully described vertex: x,y,z,w-coordinates, colors, texture coordinates, fog intensity and point size. Of course, a shader is not required to initialize every output register.
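The register model described above can be sketched in a few lines of Python. This is only an illustrative emulation, not DirectX code: the names RegisterFile, DECLARATION, load_vertex and run_shader are hypothetical, the register counts (16 input, 12 temporary, 96 constant) are taken from the register table, and the one-instruction shader function is an invented example.

```python
# Illustrative model only. RegisterFile, DECLARATION, load_vertex and
# run_shader are hypothetical names, not part of DirectX.

NUM_INPUT, NUM_TEMP, NUM_CONST = 16, 12, 96   # counts from the register table
ZERO = (0.0, 0.0, 0.0, 0.0)                   # every register is a 4-vector


class RegisterFile:
    def __init__(self, constants):
        self.v = [ZERO] * NUM_INPUT   # input registers, read-only to the shader
        self.r = [ZERO] * NUM_TEMP    # temporary registers, read/write
        # Constant registers, read-only to the shader; unused ones stay zero.
        self.c = list(constants) + [ZERO] * (NUM_CONST - len(constants))


# Assumed declaration: which vertex field is loaded into which input register.
# The mapping is arbitrary, since the specification fixes no register roles.
DECLARATION = {0: "position", 3: "normal"}


def load_vertex(regs, declaration, vertex):
    """Load only the declared fields of one vertex into the input registers."""
    for index, field in declaration.items():
        regs.v[index] = vertex[field]


def add(a, b):
    """Component-wise add of two 4-vectors."""
    return tuple(x + y for x, y in zip(a, b))


def shader_function(regs):
    """Invented one-instruction shader, in the spirit of 'add r0, v0, c0'."""
    regs.r[0] = add(regs.v[0], regs.c[0])
    return {"position": regs.r[0]}


def run_shader(vertices, constants):
    """Process the vertex stream sequentially, one vertex at a time."""
    regs = RegisterFile(constants)
    results = []
    for vertex in vertices:
        load_vertex(regs, DECLARATION, vertex)
        results.append(shader_function(regs))
    return results
```

For example, calling run_shader with a single vertex at position (1, 2, 3, 1) and the offset (10, 0, 0, 0) in c0 yields an output position of (11, 2, 3, 1). The declaration is kept separate from the shader function on purpose, mirroring the two-part structure described in the text.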

  The table below shows the registers available to the vertex shader. Note the column "number of ports": the maximum number of references to the same register inside a single instruction. The more ports a register has, the more complex the instructions the vertex shader can execute. For example, in version 1.1 of the shader the instruction "add r0, v0, v0" will not work, because it contains two references to the register v0 while only one port is available. On the other hand, the instruction "add r0, r0, r0" is valid, since three ports are available for the temporary registers.

Register | Number of registers | Access | Number of ports | Purpose
an | 0 (1 in version 1.1) | Write/Use | 1 | Address register
cn | 96 | Read | 1 | Constant register
rn | 12 | Read/Write | 3 | Temporary register
vn | 16 | Read | 1 (2 in version 1.1) | Input register

Hardware vertex shaders support

  The size of a vertex shader in DirectX 8 is limited to 128 instructions. The instruction set for vertex shaders is quite small - it contains only 17 instructions that work with vector and scalar quantities. Programmers have to implement their ideas within these strict limits.

  Today, two 3D-accelerators have hardware support for vertex shaders, namely the GeForce3 and the Radeon. The GeForce3 executes a vertex shader at a rate of 1 instruction per clock cycle (only two instructions - RCP and RSQ - take more clock cycles). The longer the shader, the slower it executes, and therefore the lower the vertex processing speed. There are several ways to increase the speed:


· Coding complex operations with various tricks. For example, the cross product of two vectors takes only two instructions, MUL and MAD, with the order of the vector components rearranged:
MUL R1, R0.zxyw, R2.yzxw;
MAD R1, R0.yzxw, R2.zxyw, -R1;

· Increasing the number of vertex pipelines. The Radeon 8500 and the X-GPU (the graphics processor to be used in the X-Box) will have two vertex pipelines, which will double their polygon-processing speed.
· Superscalarity - executing in parallel several instructions that do not depend on each other (just as in a CPU). The abovementioned X-GPU can execute 2-3 simple instructions per clock cycle in one vertex pipeline.
· Shader caching. The GeForce3 can cache several shaders on the chip at once in order to load them faster. DirectX gives no explicit control over where compiled shaders are stored; the driver controls shader caching. In OpenGL, the developers of the GF3 drivers made it possible to indicate which shaders must be "resident".
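The cross-product trick from the first item can be checked with a small Python emulation of MUL, MAD and component swizzling. This is a sketch under stated assumptions, not shader code: the helper names are hypothetical, and vertices_per_second is a rough estimate that assumes one instruction per clock cycle per pipeline, as the text states for the GeForce3, and ignores every other bottleneck.

```python
# Hypothetical helpers. Only the MUL/MAD sequence and its swizzles come
# from the article; everything else is an illustrative assumption.


def swizzle(reg, pattern):
    """Reorder the components of a 4-vector, e.g. pattern 'zxyw'."""
    return tuple(reg["xyzw".index(ch)] for ch in pattern)


def neg(a):
    """Source negation modifier, as in '-R1'."""
    return tuple(-x for x in a)


def mul(a, b):
    """MUL: component-wise product."""
    return tuple(x * y for x, y in zip(a, b))


def mad(a, b, c):
    """MAD: component-wise a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


def cross_product(r0, r2):
    """Emulate the two-instruction sequence from the article."""
    r1 = mul(swizzle(r0, "zxyw"), swizzle(r2, "yzxw"))            # MUL R1, R0.zxyw, R2.yzxw
    r1 = mad(swizzle(r0, "yzxw"), swizzle(r2, "zxyw"), neg(r1))   # MAD R1, R0.yzxw, R2.zxyw, -R1
    return r1


def vertices_per_second(clock_hz, instructions, pipelines=1):
    """Rough estimate: one instruction per clock per pipeline (assumption)."""
    return clock_hz * pipelines / instructions
```

With these helpers, cross_product((1, 0, 0, 0), (0, 1, 0, 0)) gives (0, 0, 1, 0), the z axis, as expected for x cross y. Under the same assumptions, a hypothetical 200 MHz chip running a 20-instruction shader would process about 10 million vertices per second on one pipeline and twice that on two, which is why both shorter shaders and extra pipelines raise the vertex rate.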
