Friday, April 15, 2011

Bit-Packing: Depth and Normals

When I discovered that CUDA wouldn't let me register a texture of the GL_DEPTH_COMPONENT type, I decided that I needed to write the depth values into a texture in my g-buffer shader. I had also been neglecting the task of actually packing the 16-bit normal components into two different channels of an 8-bit RGBA texture. After some searching, I came across the following code for "packing" 16- or 32-bit information into 8-bit channels.

// Packing and unpacking code courtesy of:
// http://www.ozone3d.net/blogs/lab/20080604/glsl-float-to-rgba8-encoder/
// http://olivers.posterous.com/linear-depth-in-glsl-for-real

vec4 packFloatTo4x8(in float val) {
 const vec4 bitSh = vec4(256.0f*256.0f*256.0f, 256.0f*256.0f, 256.0f, 1.0f);
 const vec4 bitMsk = vec4(0.0f, 1.0f/256.0f, 1.0f/256.0f, 1.0f/256.0f);
 vec4 result = fract(val * bitSh);
 result -= result.xxyz * bitMsk;
 return result;
}

vec4 pack2FloatTo4x8(in vec2 val) {
 const vec2 bitSh = vec2(256.0f, 1.0f);
 const vec2 bitMsk = vec2(0.0f, 1.0f/256.0f);
 vec2 res1 = fract(val.x * bitSh);
 res1 -= res1.xx * bitMsk;
 vec2 res2 = fract(val.y * bitSh);
 res2 -= res2.xx * bitMsk;
 return vec4(res1.x,res1.y,res2.x,res2.y);
}

float unpack4x8ToFloat(in vec4 val) {
 const vec4 unshift = vec4(1.0f/(256.0f*256.0f*256.0f), 1.0f/(256.0f*256.0f), 1.0f/256.0f, 1.0f);
 return dot(val, unshift);
}

vec2 unpack4x8To2Float(in vec4 val) {
 const vec2 unshift = vec2(1.0f/256.0f, 1.0f);
 return vec2(dot(val.xy, unshift), dot(val.zw, unshift));
}

The problem with this code is that it doesn't preserve the exact bit pattern of the 32-bit float. Some precision is lost, which leads to an obvious difference between forward and deferred rendering, as shown below. The picture on the left is forward, and the picture on the right is deferred.


If you look at the white spot of light in the center of each region, you can see that the one on the left is less sharp, indicating that the associated light source might have been further away.
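
One way to look at the loss directly is a debug pass that packs a value, emulates the rounding an RGBA8 render target applies to each channel, unpacks it, and displays the difference. The following is only a sketch under assumptions: `vDepth`, `fragColor`, `quantizeToRGBA8` and the `1000.0` scale factor are hypothetical names and values, and the pack/unpack functions are copied from the listing above.

```glsl
#version 330

// Hypothetical debug shader: visualizes the round-trip error of the
// fract-based packing after an emulated RGBA8 store.
in float vDepth;      // assumed: depth in [0, 1) passed from the vertex shader
out vec4 fragColor;   // assumed output name

vec4 packFloatTo4x8(in float val) {
    const vec4 bitSh = vec4(256.0*256.0*256.0, 256.0*256.0, 256.0, 1.0);
    const vec4 bitMsk = vec4(0.0, 1.0/256.0, 1.0/256.0, 1.0/256.0);
    vec4 result = fract(val * bitSh);
    result -= result.xxyz * bitMsk;
    return result;
}

float unpack4x8ToFloat(in vec4 val) {
    const vec4 unshift = vec4(1.0/(256.0*256.0*256.0), 1.0/(256.0*256.0), 1.0/256.0, 1.0);
    return dot(val, unshift);
}

// Emulates an 8-bit normalized channel: each component is rounded to the
// nearest multiple of 1/255.
vec4 quantizeToRGBA8(in vec4 c) {
    return floor(clamp(c, 0.0, 1.0) * 255.0 + 0.5) / 255.0;
}

void main() {
    float restored = unpack4x8ToFloat(quantizeToRGBA8(packFloatTo4x8(vDepth)));
    float err = abs(restored - vDepth);
    // The scale is arbitrary; it only makes small errors visible on screen.
    fragColor = vec4(vec3(err * 1000.0), 1.0);
}
```

A black image would mean the round trip is exact for the depths on screen; any gray shows where the restored value differs from the input.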

GLSL 4.0 (and earlier versions with the proper extensions enabled) supports direct bit manipulation and provides convenient functions for packing/unpacking floats into integer formats. This allows you to preserve the bits exactly. Unfortunately, I couldn't figure out how to read or output texture data as anything but normalized floating-point numbers, even when I requested an unsigned int internal type. I assume that it is possible to do otherwise; I just couldn't figure out how. Anyway, I was able to develop packing/unpacking methods that seem to do a perfect bit pack/unpack, but I'm a little wary due to the (forced) conversion to normalized floats. The code I ended up using is:

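vec4 packFloatTo4x8(in float val) {
 // Reinterpret the 32 float bits as an unsigned integer.
 uint a = floatBitsToUint(val);
 // One byte per channel, least significant first. An 8-bit normalized
 // channel stores round(x*255), so dividing by 255 keeps every byte exact.
 return vec4(float(bitfieldExtract(a,0,8))/255.0f,
    float(bitfieldExtract(a,8,8))/255.0f,
    float(bitfieldExtract(a,16,8))/255.0f,
    float(bitfieldExtract(a,24,8))/255.0f);
}

float unpack4x8ToFloat(in vec4 val) {
 // Round each channel back to its byte, then reassemble the 32-bit word.
 uint a = uint(val.x*255.0f+.5f) +
 uint(val.y*255.0f+.5f)*256u +
 uint(val.z*255.0f+.5f)*256u*256u+
 uint(val.w*255.0f+.5f)*256u*256u*256u;
 return uintBitsToFloat(a);
}

vec4 pack2FloatTo4x8(in vec2 val) {
 // Two values in [0, 1] become two 16-bit fields of one 32-bit word.
 uint a = packUnorm2x16(val);
 return vec4(float(bitfieldExtract(a,0,8))/255.0f,
    float(bitfieldExtract(a,8,8))/255.0f,
    float(bitfieldExtract(a,16,8))/255.0f,
    float(bitfieldExtract(a,24,8))/255.0f);
}

vec2 unpack4x8To2Float(in vec4 val) {
 // Same byte reassembly as above, then split into two 16-bit values.
 uint a = uint(val.x*255.0f+.5f) +
 uint(val.y*255.0f+.5f)*256u +
 uint(val.z*255.0f+.5f)*256u*256u+
 uint(val.w*255.0f+.5f)*256u*256u*256u;
 return unpackUnorm2x16(a);
}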

The only problem with these functions is that they are a bit slower than the less accurate ones posted above. Initially, I thought that they were much slower, but apparently I changed the lighting calculations somewhere between my 3.3 version shaders and my 4.1 version shaders. If anyone comes across this blog and has a more efficient way of packing/unpacking 32- and 16-bit floats into 8-bit RGBA channels, please leave something in the comments.
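
One candidate, offered as an unbenchmarked sketch, is to let the GLSL 4.00 built-ins `unpackUnorm4x8` and `packUnorm4x8` do the byte splitting and reassembly. They scale by 255, which matches how an 8-bit normalized channel is stored, and they place the least significant byte in `.x`. The function names mirror the ones used in this post; `fragColor` and the use of `gl_FragCoord.z` in `main` are assumptions added only to make the shader complete.

```glsl
#version 400

out vec4 fragColor;   // assumed output name

vec4 packFloatTo4x8(in float val) {
    // Split the 32-bit pattern into four bytes, each returned as byte/255.0.
    return unpackUnorm4x8(floatBitsToUint(val));
}

float unpack4x8ToFloat(in vec4 val) {
    // Round each channel*255.0 back to a byte and rebuild the 32-bit word.
    return uintBitsToFloat(packUnorm4x8(val));
}

vec4 pack2FloatTo4x8(in vec2 val) {
    // Two values in [0, 1] become two 16-bit fields, then four bytes.
    return unpackUnorm4x8(packUnorm2x16(val));
}

vec2 unpack4x8To2Float(in vec4 val) {
    return unpackUnorm2x16(packUnorm4x8(val));
}

void main() {
    // Example use: write the window-space depth into an RGBA8 target.
    fragColor = packFloatTo4x8(gl_FragCoord.z);
}
```

Whether this is actually faster depends on the driver and hardware, so it would need to be timed against the versions above.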

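The integer route mentioned earlier, where the shader reads and writes unsigned integers instead of normalized floats, does exist in OpenGL 3.0 and later. It needs an integer internal format on the texture, an integer output type in the writing shader, and an integer sampler in the reading shader. The sketch below assumes an attachment created with an internal format such as GL_RG32UI (pixel format GL_RG_INTEGER, type GL_UNSIGNED_INT) and GL_NEAREST filtering; `vNormal`, `outPacked`, `gbufferTex` and `fragColor` are hypothetical names, and whether CUDA can register such a texture is not verified here.

```glsl
#version 400

// G-buffer pass: writes raw bit patterns to an unsigned-integer attachment.
in vec3 vNormal;                            // assumed varying from the vertex shader

layout(location = 0) out uvec2 outPacked;   // assumed to be bound to a GL_RG32UI attachment

void main() {
    vec3 n = normalize(vNormal);
    outPacked = uvec2(floatBitsToUint(gl_FragCoord.z),    // depth bits, unchanged
                      packUnorm2x16(n.xy * 0.5 + 0.5));   // two 16-bit normal components
}
```

The reading pass would then fetch the texels through a `usampler2D`, again as a sketch:

```glsl
#version 400

uniform usampler2D gbufferTex;   // integer sampler for the GL_RG32UI texture
out vec4 fragColor;

void main() {
    // texelFetch reads the unfiltered integer texel at this pixel.
    uvec2 p = texelFetch(gbufferTex, ivec2(gl_FragCoord.xy), 0).xy;
    float depth = uintBitsToFloat(p.x);
    vec2 nxy = unpackUnorm2x16(p.y) * 2.0 - 1.0;
    fragColor = vec4(nxy, depth, 1.0);   // placeholder output for inspection
}
```
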
4 comments:

  1. The only speedup I see is to separate out the vector multiply: first unpack the bytes into one vec, then multiply instead of dividing.

    A const vec4 with all elements set to 1 / 256, for example, should be faster than the separate computations. I mean, constVec * Bytes seems like it would be.

  2. glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width(), height(), 0, GL_RGBA, GL_UNSIGNED_INT, NULL);

    In shader:
    #extension GL_ARB_shading_language_packing : enable

    Render to texture:
    gl_FragColor=uvec4(packHalf2x16(vec2(0.5,0.5)),0,0,1);

    Eval Texture
    uvec4 temp = uvec4(texture2D(texGeom,texCoord));
    gl_FragColor = vec4(unpackHalf2x16(temp.x),0,1);

    Regards
