Coordinate systems

As we have seen, the coordinates of the vertex we're processing arrive in the vertex shader as gl_Vertex and have to leave as gl_Position. Technically these are vec4 data types, and since they represent positions we may assume their last coordinate is a 1. But what coordinate system are they in?

Confusingly, there are multiple different coordinate systems used in rendering, so let's spend some time picking them apart.

Model coordinates

The most common case is probably that you have created a mesh using a 3d modeling tool like Blender and load it into the renderer. In this case, the content of gl_Vertex is whatever coordinates you used in the modeling tool.

That's right - you might have the feeling you have arranged a scene by moving things around when setting up the pipeline, but the vertex shader doesn't know about this yet. In a flightsim, regardless of where the flight dynamics code currently places the plane on Earth, if you created the plane's 3d model with the coordinate origin at the nose of the plane, that's where the origin is as far as gl_Vertex is concerned.

This also means that if you have two models in the scene, each has its own coordinate origin as far as the vertex shader is concerned - at this stage, our coordinate system is a patchwork of different meshes, each with its own origin! This can be a blessing or a curse, depending on what you want to do. You can exploit it if you create the models with a particular coordinate convention in mind and use that information later. For instance, you can create all models such that the z-axis points upward - then you can rely on the fact that gl_Vertex.z is the upward-pointing component.
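As an illustrative sketch of that convention (assuming all models in the scene were authored with the z-axis pointing up, and with made-up snow-line heights), a vertex shader could derive a snow blending factor directly from the model-space height:

```glsl
// sketch, assuming models are authored with the z-axis pointing up;
// the snow-line heights (in model units) are hypothetical values
float snowFactor = smoothstep(100.0, 120.0, gl_Vertex.z);
```

Such a factor could then be passed on to the fragment shader to blend in a snow texture on high-altitude geometry.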

Eye coordinates

The vertex is expected to leave the vertex shader as gl_Position (note that ftransform() sets this automagically) - but this vertex is not in model coordinates, it is in projected eye coordinates.

In eye coordinates (before projection), the camera (or eye) of the renderer is at the coordinate origin, x and y span the plane which will later become the screen, and z is the depth into the screen. If you consider that the stage following the vertex shader involves the projection into screen coordinates, eye coordinates make a lot of sense, because projecting into screen space becomes dead simple - after the perspective division you essentially just drop the z-coordinate and everything ends up in the same plane.

Now, a change of coordinate system can mathematically be written as a matrix which multiplies a vector. So can rotations around an arbitrary axis and scale transformations. If you use 4-component vectors, you can even write translations as matrices. Thus, all the moving around and rotating of your model to position it where it belongs in the scene can be written as a long chain of matrices, and the beauty of math is that to get the end result, you can simply multiply them all together. And you can fold the transformation into eye coordinates into that chain as well.

Technically, all the positioning, scaling and the transition to eye space is hence one single matrix, which is made available to the renderer as gl_ModelViewMatrix (technically this is a uniform mat4 data type, but we don't need to declare it as it's a natively available matrix - that's different in later versions of GLSL). Thus, to go from model space to eye space, we could do

vec4 eyePos = gl_ModelViewMatrix * gl_Vertex;
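Once you are in eye space, anything that depends on the distance to the viewer becomes cheap to express, since the eye sits at the origin. As a small sketch (the fog density constant here is a made-up value for illustration):

```glsl
// sketch: per-vertex fog based on eye-space distance;
// the density constant is a hypothetical value
vec4 eyePos = gl_ModelViewMatrix * gl_Vertex;
float dist = length(eyePos.xyz);      // distance from the eye at the origin
float fogFactor = exp(-0.001 * dist); // 1.0 at the eye, falling off with distance
```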

To go from there into projected space, we need to apply a projection matrix in addition:

gl_Position = gl_ProjectionMatrix * gl_ModelViewMatrix * gl_Vertex;

or simply use the pre-defined version

gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

which (simplifying a bit) is what ftransform() does.

So after you have made the transition to eye coordinates, you have a continuous coordinate system across all objects in the scene, because you have moved everything to where it belongs. But in this coordinate system, you have a hard time figuring out where the 'up' direction is, because all coordinate axes change meaning as you look around in the scene, and coordinate values change as you move the eye point. So, how to go back from eye coordinates to model coordinates? Sane transformation matrices (those that don't destroy information by doing a projection) have an inverse, which is available as gl_ModelViewMatrixInverse - you just multiply your vector in eye coordinates by that, and you'll be back in model coordinates.

Say you want to pull the whole scene two units closer to your eye and then change back to model coordinates. Pulling things closer is complicated in model coordinates, as you don't easily know where the eye is and what 'closer' means. It's trivial in eye coordinates, because you know the z-coordinate is the depth from the eye plane. So you can just do

vec4 vEye = gl_ModelViewMatrix * gl_Vertex;     // to eye space
vEye.z -= 2.0;                                  // pull two units closer in depth
vec4 vModel = gl_ModelViewMatrixInverse * vEye; // back to model space

and you're done.

World coordinates

World coordinates are probably closest to how you yourself think about the scene. For instance, in a flightsim, you place a model of a tower at San Francisco airport. That's a defined location in latitude, longitude and altitude above sea level.

Now, that's a valid coordinate system, but it's a spherical coordinate system, and those have their quirks - rendering by and large prefers Cartesian orthonormal systems. So you might think of converting the spherical coordinates into a system where the origin is at the center of the Earth, the z-axis points towards the north pole and the x-axis towards the zero longitude line. This is the system in which you define where things are, because it is continuous across all objects rather than a patchwork like model coordinates, and it doesn't change whenever you move the view like eye coordinates do.
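As a sketch of that conversion (treating the Earth as a perfect sphere and ignoring the ellipsoid shape, with latitude and longitude in radians), a helper function could look like this:

```glsl
// sketch: spherical coordinates to Earth-centered Cartesian coordinates,
// treating the Earth as a perfect sphere (no ellipsoid correction)
vec3 sphericalToCartesian(float lat, float lon, float alt)
{
    float r = 6371000.0 + alt; // mean Earth radius in meters plus altitude
    return r * vec3(cos(lat) * cos(lon),  // x towards the zero longitude line
                    cos(lat) * sin(lon),  // y completes the right-handed system
                    sin(lat));            // z towards the north pole
}
```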

Unfortunately OpenGL does not supply access to such a system by default - you have to arrange for the world coordinate transformation matrices to be appended to the state set yourself. If you have, for instance, the transformation from world space to eye space as osg_ViewMatrix and its inverse, you can do

uniform mat4 osg_ViewMatrixInverse;

vec3 worldPos = (osg_ViewMatrixInverse * gl_ModelViewMatrix * gl_Vertex).xyz;

to reach world coordinates (note the use of swizzling to get a 3-vector out of what is really a 4-vector).

Why don't we habitually render in this useful system? Because the world is a huge place, and we have only floating point precision - outside the renderer, osg_ViewMatrix can be kept in double precision and works well enough, but on the graphics card we get numerical jitter if we expect fine resolution of a few meters. So depending on how large your world is, world coordinates may or may not be that useful.

Using coordinate systems

Matrix multiplications are something the GPU does really well, so we can basically change from one system to another at lightning speed. The idea is then to organize all computations such that your life is easy. Mathematically it doesn't matter: if a computation is possible in one coordinate system, it is possible in all equivalent ones. Just the work involved may be drastically different.

To give an example, say we want to pull the whole scene closer as above - we've seen that this is simple in eye space. In model space it's slightly more involved to think through: first we need to find the view axis in model space, then we need to move the vertex along this view axis. We do know the view axis in eye coordinates, so this is doable, and it looks like

vec3 viewAxis = (gl_ModelViewMatrixInverse * vec4(0.0, 0.0, 1.0, 0.0)).xyz;
vec4 vModel = gl_Vertex;
vModel.xyz -= 2.0 * viewAxis;

Not a real roadblock perhaps, just different.

For many use cases, it doesn't really matter much what coordinate system you're in, because what matters are relative positions. There are often only three interesting points in the scene - the vertex you're looking at, the eye that is looking, and the light source that's illuminating the vertex. To compute illumination, you don't necessarily need to know where the vertex is located in absolute coordinates, you just want to know where it is relative to the eye and the light. For that, it doesn't really matter that model coordinates are not continuous across different meshes, or that eye coordinates change as you change the view - you just obtain the relative vectors in one system and use them consistently.

Say you have the position of the light source in eye coordinates (the usual arrangement). The two relative vectors, eye-to-vertex and vertex-to-light, in eye coordinates are

vec4 relVec = gl_ModelViewMatrix * gl_Vertex;         // eye-to-vertex (the eye sits at the origin)
vec4 lightVec = gl_LightSource[0].position - relVec;  // vertex-to-light

In model coordinates the same vectors are

vec4 eyePos = gl_ModelViewMatrixInverse * vec4(0.0, 0.0, 0.0, 1.0); // eye position in model space
vec4 relVec = gl_Vertex - eyePos;                                   // eye-to-vertex
vec4 lightPos = gl_ModelViewMatrixInverse * gl_LightSource[0].position;
vec4 lightVec = lightPos - gl_Vertex;                               // vertex-to-light

As you can see from the relative length of the two snippets, GLSL is geared towards computing in eye space - but if you're interested in detailed atmosphere effects, that is unfortunately not optimal, because there the up direction is special, so I have structured many of my shader computations to operate in model space.

Continue with Blinn-Phong Shading.


Created by Thorsten Renk 2016 - see the disclaimer and contact information.