I ran across this:
http://cache-www.intel.com/cd/00/00/01/76/17699_code_zohar.pdf
It is a math library that uses SSE2 to do fast math operations. I spent a lot of time upgrading his code to be more suitable for games, for example by using a transpose to invert an orthonormal matrix. But after profiling, I found his code was slower than the equivalent D3DX math functions, even for matrix multiply.
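For anyone wondering what the transpose trick looks like, here is a rough sketch. It is not the code from the Intel library. `Mat4`, `Multiply`, and `InverseRigid` are made-up names, and it assumes a D3D-style row-vector layout with the translation in the fourth row, a pure rotation in the upper-left 3x3 block, and no scaling.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4x4 row-major matrix (not D3DXMATRIX itself).
struct Mat4 {
    float m[4][4];  // m[row][col]
};

// Plain scalar 4x4 multiply: returns a * b.
inline Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};  // zero-initialized
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out.m[r][c] += a.m[r][k] * b.m[k][c];
    return out;
}

// Inverse of a rotation + translation matrix without a general inversion.
// Assumption: the 3x3 block is orthonormal and the last column is (0,0,0,1).
inline Mat4 InverseRigid(const Mat4& in) {
    assert(in.m[0][3] == 0.0f && in.m[1][3] == 0.0f &&
           in.m[2][3] == 0.0f && in.m[3][3] == 1.0f);

    Mat4 out{};
    // The inverse of an orthonormal 3x3 block is its transpose.
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out.m[r][c] = in.m[c][r];

    // New translation is -t * R^T, where t is the old fourth row.
    for (int c = 0; c < 3; ++c)
        out.m[3][c] = -(in.m[3][0] * in.m[c][0] +
                        in.m[3][1] * in.m[c][1] +
                        in.m[3][2] * in.m[c][2]);
    out.m[3][3] = 1.0f;
    return out;
}
```

Multiplying a matrix by the result of `InverseRigid` should give back the identity, which is an easy sanity check before trusting it in a game.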
D3DXMatrixMultiply 10000000 times:
diffGP=564 milliseconds diffDX=396 milliseconds
Bleh.
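A timing loop for this kind of comparison could look roughly like the sketch below. This is not the harness that produced the numbers above. `Mat4`, `MultiplyScalar`, `TimeMs`, and `BenchmarkMultiply` are made-up names, and a plain scalar multiply stands in for whichever routine is being measured. The result is fed back into the next iteration and returned as a checksum so the compiler is less likely to throw the loop away.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for a 4x4 row-major matrix (not D3DXMATRIX itself).
struct Mat4 {
    float m[4][4];  // m[row][col]
};

inline Mat4 Identity() {
    Mat4 out{};
    for (int i = 0; i < 4; ++i) out.m[i][i] = 1.0f;
    return out;
}

// Plain scalar multiply used as the routine under test: out = a * b.
inline void MultiplyScalar(Mat4& out, const Mat4& a, const Mat4& b) {
    Mat4 tmp{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                tmp.m[r][c] += a.m[r][k] * b.m[k][c];
    out = tmp;
}

// Calls fn() `iterations` times and returns the elapsed wall time in ms.
template <typename Fn>
double TimeMs(Fn fn, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) fn();
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times repeated multiplies; `checksum` receives a value derived from the
// final matrix so the work cannot be trivially optimized out.
inline double BenchmarkMultiply(std::size_t iterations, float& checksum) {
    Mat4 a = Identity();
    Mat4 b = Identity();
    Mat4 out = Identity();
    b.m[3][0] = 1.0f;  // translate by 1 in x so the result changes each pass

    const double ms = TimeMs([&] {
        MultiplyScalar(out, a, b);
        a = out;  // chain the result into the next iteration
    }, iterations);

    checksum = out.m[3][0];
    return ms;
}
```

To compare two implementations, run the same loop once per implementation with the same iteration count, in a release build, and compare the two millisecond totals.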
I also tried this: http://www.cs.nmsu.edu/CSWS/techRpt/2003-003.ps
It appears that D3DX already does better than this as well:
Theirs=695 milliseconds Mine=801 milliseconds
The version without scaling was 100 milliseconds faster, but that is such a special case that it’s not worth leaving in.
So kudos to Direct3D, because their math functions are very fast!
By the way, here is something I was able to figure out while experimenting: how to pull the at, up, and right vectors out of a matrix. In every library I’ve ever used, except the one at The Collective, this was very unclear. I think they used to always store the matrices transposed to make them easier to use or something.
[code]
// Each helper reads one column of the upper-left 3x3 block of the matrix.

// At (forward) vector: third column.
inline D3DXVECTOR3 * GetAtVec(D3DXVECTOR3 *out, D3DXMATRIX *in)
{
    out->x=in->_13;
    out->y=in->_23;
    out->z=in->_33;
    return out;
}
// Up vector: second column.
inline D3DXVECTOR3 * GetUpVec(D3DXVECTOR3 *out, D3DXMATRIX *in)
{
    out->x=in->_12;
    out->y=in->_22;
    out->z=in->_32; // was _23, which belongs to the third column
    return out;
}
// Right vector: first column.
inline D3DXVECTOR3 * GetRightVec(D3DXVECTOR3 *out, D3DXMATRIX *in)
{
    out->x=in->_11;
    out->y=in->_21;
    out->z=in->_31;
    return out;
}
// Return-by-value versions of the same three helpers.
inline D3DXVECTOR3 GetAtVec(D3DXMATRIX *in)
{
    return D3DXVECTOR3(in->_13,in->_23,in->_33);
}
inline D3DXVECTOR3 GetUpVec(D3DXMATRIX *in)
{
    return D3DXVECTOR3(in->_12,in->_22,in->_32); // was _23
}
inline D3DXVECTOR3 GetRightVec(D3DXMATRIX *in)
{
    return D3DXVECTOR3(in->_11,in->_21,in->_31);
}
[/code]
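One case where reading columns gives the camera axes is a view matrix laid out the way the D3DXMatrixLookAtLH documentation describes, with the right, up, and at axes stored as the first three columns. The sketch below rebuilds that layout without D3DX so it can be checked on its own. `Vec3`, `Mat4`, `LookAtLH`, and `Column` are made-up stand-ins, and whether the helpers above were meant for view matrices is my assumption, not something stated here.

```cpp
#include <cmath>

// Hypothetical stand-ins for D3DXVECTOR3 and D3DXMATRIX.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // m[row][col]; m[0][2] corresponds to _13

inline Vec3 Sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}
inline float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}
inline Vec3 Normalize(const Vec3& v) {
    const float len = std::sqrt(Dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Left-handed look-at view matrix, following the layout documented for
// D3DXMatrixLookAtLH: the camera axes end up in the columns.
inline Mat4 LookAtLH(const Vec3& eye, const Vec3& at, const Vec3& up) {
    const Vec3 zaxis = Normalize(Sub(at, eye));        // at / forward
    const Vec3 xaxis = Normalize(Cross(up, zaxis));    // right
    const Vec3 yaxis = Cross(zaxis, xaxis);            // up
    Mat4 out = {{{xaxis.x, yaxis.x, zaxis.x, 0.0f},
                 {xaxis.y, yaxis.y, zaxis.y, 0.0f},
                 {xaxis.z, yaxis.z, zaxis.z, 0.0f},
                 {-Dot(xaxis, eye), -Dot(yaxis, eye), -Dot(zaxis, eye), 1.0f}}};
    return out;
}

// Reads column c of the upper-left 3x3 block, like the helpers above do:
// c = 0 is right, c = 1 is up, c = 2 is at.
inline Vec3 Column(const Mat4& mat, int c) {
    return {mat.m[0][c], mat.m[1][c], mat.m[2][c]};
}
```

For a camera at (2, 3, 4) looking down +x with +y up, `Column(view, 2)` comes back as (1, 0, 0) and `Column(view, 0)` as (0, 0, -1), which is what a left-handed camera facing +x should report for at and right.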