Archive

Posts Tagged ‘programming’

A Scalable Solution Part Two: World Reference Frame

In part one of this series, I talked about scaling objects in the local reference frame and why that process is relatively simple. In this post, I will be introducing scaling in the world reference frame as well as detailing some of the differences between the two types of scaling. I will also be detailing some of the complexities as well as a possible implementation.

World reference frame scaling is somewhat more complex when compared to scaling in the local reference frame. As the name implies, this method scales an object against the world reference frame. This is not as simple as it sounds as there are some subtle complexities.

As an operation, scaling in the world reference frame is lossy if other operations such as move and rotate are performed between the scale operations. This is somewhat similar to rotating an object in 3D. In 3D rotation, rotating an object about a set of axes in some order produces a certain result where changing that order would change the final result. When scaling against the world reference frame, one needs to keep in mind that the object may rotate or scale imbetween operations while the world reference frame remains fixed. When scaling groups of objects against the world reference frame, it is also expected that the difference between their positions and the position of the centroid of the group will also scale in the same way as the objects themselves. This does not map very well to the independent transform controls typically offered by most 3D engines today; instead, we will most likely have to create and maintain our own transformation matrix, feeding that to the rendering engine on render time.

We would like to allow the user to scale an object or groups of objects in local and in world reference frames. A relatively simple way to implement this is by storing all transformations in a single transformation matrix. To retrieve the axes for scaling in local reference frame, the transformation matrix is decomposed and the rotation component is used to compute the three axial vectors. In world reference frame, decomposition is not required as the axial vectors are simply the unit vectors of the world’s reference frame ([1,0,0], [0,1,0], [0,0,1] respectively). In either case, the scale is applied by calculating a scale matrix from the axial vectors and some specified scale amount. The scale matrix is then applied to the current transform matrix thereby giving us the object’s new transformation matrix.

This is how an object is scaled in the world reference frame.

This implementation requires that scale and rotation be stored as a combined transform matrix. A side-effect is that scaling in the world reference frame may change rotation. I was against this idea initially as it felt like the “purity” of independent scale and rotation values is being lost, not to mention the performance hit of the frequent matrix compositions and decompositions. Having said that, I don’t think that there is another way for this to be done. Not to mention that scaling in the world reference frame against an arbitrarily rotated object is essentially a skew operation. Comments from more experienced hands on this subject would certainly be appreciated.

Advertisements

Euler Quaternions and the Trouble with Y-Up Space, Part 1

May 25, 2012 1 comment

Yesterday I did something that I didn’t think I was capable of. It involved quaternions, matrices and a decent amount of math. Let’s start with the problem definition:

  • We have a system that tracks rotations using quaternions
  • We would like people to be able to use Euler coordinates (Tait-Bryan angles if you’re picky)
  • Our world is Y-UP

Doesn’t sound too hard. In fact, half of it was fairly easy. Turning a set of Euler coordinates into a quaternion is as simple as creating three quaternions using angle axis pairs derived from the incoming Euler rotation. For us, the three Quaternions were as follows:

  • [Angle: Yaw, Axis: [0, 1, 0]]
  • [Angle: Pitch, Axis: [0, 0, 1]]
  • [Angle: Roll, Axis: [1, 0, 0]]

We create a combined rotation quaternion by conjugating the three quaternions together in the following order:

Roll * (Pitch * (Yaw) * Pitch-1) * Roll-1

This is all that is required to produce a quaternion from a set of Euler coordinates. Now, we also had to retrieve Euler coordinates from a given quaternion. This is more difficult than taking Euler coordinates and converting them into a quaternion as the task requires a fairly solid understanding of the math. This too would be simple, if it weren’t for the fact that the entire Internet agrees that Z-Up is the way to go for Euler/Quaternion conversions. From what I could tell, the equations for converting a rotation quaternion into a Y-Up Euler coordinate do not exist! (On the internet).

To my shock, horror and dismay, this meant that I had to derive the equations myself. Needless to say, this was a daunting task. But, after an arm and a leg… and the other arm… and the other leg… and parts of my face/body and an assortment of pieces from whichever organs remain… I was victorious. My glorious victory over the maths was quite euphoric.

First, let’s talk about why Y-Up Euler coordinates are so different from Z-Up Euler coordinates. The idea behind Euler coordinates is that given three angular values, you apply three rotational transformations in a predetermined order around three dynamic or constant axes (depending on convention) and arrive at a destination rotation.

Rotational transformations are not commutative, that means that applying them in a different order will produce a different rotation all-together. I won’t get into why that is but it has to do with the fact that the space of all 3D rotations is something akin to the surface of a 4 dimensional hypersphere and applying a rotation is like traveling from one point on the surface of that sphere to another surface point while always remaining on the surface. (Feel like going for a dip in some advanced math? Read here).

The effect is that a rotation quaternion composed from rotations around axes in the order ZYX is fundamentally different from one composed from rotations in the order YZX, which is what we want. So, the equations must be derived. I won’t delve very much into why the following works as that could take several blogs. First, we must build a rotation matrix that will compose our Euler coordinates in the order that we want. This is as simple as getting the angle-axis matrix formula and plugging some numbers into it. That formula is the following:

Where c is cos(θ), s is sin(θ) and C is 1-c, θ is the angle of rotation and (x,y,z) are the vector components of the axis. This is taken from here . By plugging in axis and angle, we construct three rotation matrices:

Yaw, about [0, 1, 0], angle sign Ψ (Psi):


Pitch, about [0, 0, 1], angle sign Θ (Theta):


Roll, about [1,0,0], angle sign Φ (Phi):


Now, we multiply the matrices together in the order (Roll)(Pitch)(Yaw), producing the following:


Next step is to refer to the Quaternion -> Rotation matrix formula:


This formula was taken from here. The quaternion components are [w, x, y, z] -> [0, 1, 2, 3]

Finally, to retrieve the Euler coordinates we must match the matrix form of the rotation quaternion to our RPY rotation matrix exploiting cells where the values are easy to retrieve. In this case, the values are as follows (Notation, Matrix Row Column where row and column are in the range [1,3]):

  • Φ, roll: atan2(RPY32, RPY22) = atan2(2(q0q1 + q2q3), q0q0 – q1q1 + q2q2 – q3q3)
  • Θ, pitch: asin(-RPY12) = asin(2(q0q3 – q1q2))
  • Ψ, yaw: atan2(RPY13, RPY11) = atan2(2(q0q2 + q1q3), q0q0 + q1q1 – q2q2 – q3q3)

This is almost enough. What is left is to test for pitch singularities (+90, -90) and treat them specially. Since this blog has gone on for far too long, I will stop now as what has been given is certainly enough to get a similar system working. In my next blog, I will detail how we could cheaply test for the singularities and what we could do in that case. You can do some advance reading on that here.

A Scalable Solution Part One: Local Reference Frame

May 10, 2012 1 comment

Blog Reading Requirement: you need to know what a reference frame is, from wiki:

“A frame of reference in physics, may refer to a coordinate system or set of axes within which to measure the position, orientation, and other properties of objects in it, or it may refer to an observational reference frame tied to the state of motion of an observer. It may also refer to both an observational reference frame and an attached coordinate system as a unit…”

Read the source article here.

In this series of blog posts, I will be talking about scaling 3D objects in their local reference frame and in the world reference frame. In part one, I will be introducing the topic and detailing scaling in the local reference frame. In part two, I will be introducing scaling in the world reference frame as well as detailing some of the differences between the two types of scaling. I will also be detailing some of the complexities as well as a possible implementation.

As an artist, the ability to scale groups of objects is fairly important when working with 3D models. When scaling an object, it is done against a known axial reference frame; typically, the two most useful reference frames are the world reference frame and the local reference frame and they are vastly different. The local reference frame changes as the object is rotated while the world reference frame never changes. I will take this blog to elaborate on some of the challenges and requirements involved in implementing these manipulations as well as giving a possible implementation.

Local reference frame scaling is the simpler of the two as it is performed against the local reference frame of the object that is being scaled. With groups of objects, it is expected that each object in the group is scaled against its own local reference frame as opposed to the reference frame of the group. It is also expected that the positions of the objects in the group will not be affected by the scaling operation.

This scaling method is simple because it only involves applying a scaling transform against the 3 known axes of any given object. This means that the scale parameters can be stored as 3 floats. During frame rendering, the transform could be applied after position and rotation have been applied; this fits in with the rendering pipeline of most 3D engines and means that one could simply alter the scale parameter present on a renderable object in most 3D engines.

In part two, I will be introducing world reference frame scaling and what complexities may arise from allowing the combination of the two methods. Thank you for reading.

Release 0.3!

November 23, 2011 Leave a comment

Whew! It’s been a while but my 0.3 has now been released! In this release, I’ve been working on Paladin/configurator subsystem. By now, it has become quite mature and achieves all required functionality. Also being used is IndexedDB as the load/store backend. Also important is that the codebase has now been brought up-to-date with the rest of the engine.

Changelog:

  • Added comments to all of the Configurator tests that didn’t have any. Improved some of the comments that were there before as well as improving some test structures slightly.
  • Removed Cookie.js from the engine. Cookies are no longer used as load/store backend, now replaced with IndexedDB.
  • Many functions had vague local variable names; their names have been improved.
  • load/store are now both asynchronous.
  • Added config/default.js for local configurations that need to be baked into the engine.
  • Configurator now uses gameID to load/save configurations.
  • Brought Configurator codebase up-to-date.

commit: a94ea288daabd84ea222a5a08da78ec4095479b3

issue thread: https://github.com/alankligman/gladius/issues/21

pull request: https://github.com/alankligman/gladius/pull/52

I also have some work to do for XSS Me but that has taken a back-seat to the work on Paladin. Currently, the plan is to dedicate most of 0.4 towards XSS Me. We’ll see how this all pans out.

Done and Done, Pt. 2

November 13, 2011 5 comments

Success! I’ve managed to alter the reported location of the cursor! Here’s what I did to get it to work:

The file that I had to modify was content/events/src/nsDOMMouseEvent.cpp, this is the implementation file of the DOM MouseEvent in firefox.
The updates were as follows:

// ... snip ... line 223

NS_METHOD nsDOMMouseEvent::GetScreenX(PRInt32* aScreenX)
{
  NS_ENSURE_ARG_POINTER(aScreenX);
  //*aScreenX = GetScreenPoint().x;
  *aScreenX = 17000;
  return NS_OK;
}

NS_IMETHODIMP
nsDOMMouseEvent::GetScreenY(PRInt32* aScreenY)
{
  NS_ENSURE_ARG_POINTER(aScreenY);
  //*aScreenY = GetScreenPoint().y;
  *aScreenY = 18000;
  return NS_OK;
}

NS_METHOD nsDOMMouseEvent::GetClientX(PRInt32* aClientX)
{
  NS_ENSURE_ARG_POINTER(aClientX);
  //*aClientX = GetClientPoint().x;
  *aClientX = 7000;
  return NS_OK;
}

NS_IMETHODIMP
nsDOMMouseEvent::GetClientY(PRInt32* aClientY)
{
  NS_ENSURE_ARG_POINTER(aClientY);
  //*aClientY = GetClientPoint().y;
  *aClientY = 8000;
  return NS_OK;
}

// ... /snip ...

As you can see, I forced the reported values for the cursor’s position to be 17000/18000 for the screenX/screenY members and 7000/8000 for the clientX/clientY members. This is a pretty naive way to do it but it proved to produce results. Check it out!

This is what happens now when a user attempts to right click anywhere in the page. As you can see, the context menu appears far off into the right bottom corner ( since the coordinates go off screen ).

Interestingly enough, the changes to the event class do not affect the native windows right click context menu ( can be triggered by right clicking in the same vertical space as the minimize/maximize/close buttons ), as can be seen here. Also not affected is text selection and clicking buttons/links and forms.

Also interesting is that no assertion errors are usually thrown. I have managed to provoke an assertion error however simply by middle clicking ( click the wheel of your mouse if you have a mouse wheel ) somewhere inside the page; the page being used should have sufficient text for the quick drag middle click popup to appear, for an example this page should work. Scroll around a bit if you don’t first succeed.

In conclusion, it seems that it is not immediately obvious how MouseEvent is used internally. On the one hand, the values reported through the event to the js engine are the only way for the js to figure out where the cursor is; so as far as the js is concerned, whatever we return here will be where the js believes the cursor to be. On the other hand, some internals seem to use the values returned from the getter functions of the mouse event object as well; this may make updating the values to our whims possibly difficult due to the sheer size of firefox’s internals and the fact that we don’t *exactly* know what will be affected by our meddling.


P.S. The MouseEvent class also defines some longs (screenX, screenY, clientX, clientY); from what I could tell, these are only updated during initialization:

// ... snip ... line 106

NS_IMETHODIMP
nsDOMMouseEvent::InitMouseEvent(const nsAString & aType, PRBool aCanBubble, PRBool aCancelable,
                                nsIDOMAbstractView *aView, PRInt32 aDetail, PRInt32 aScreenX, 
                                PRInt32 aScreenY, PRInt32 aClientX, PRInt32 aClientY, 
                                PRBool aCtrlKey, PRBool aAltKey, PRBool aShiftKey, 
                                PRBool aMetaKey, PRUint16 aButton, nsIDOMEventTarget *aRelatedTarget)
{
  nsresult rv = nsDOMUIEvent::InitUIEvent(aType, aCanBubble, aCancelable, aView, aDetail);
  NS_ENSURE_SUCCESS(rv, rv);

  switch(mEvent->eventStructType)
  {
    case NS_MOUSE_EVENT:
    case NS_MOUSE_SCROLL_EVENT:
    case NS_DRAG_EVENT:
    case NS_SIMPLE_GESTURE_EVENT:
    case NS_MOZTOUCH_EVENT:
    {
       static_cast(mEvent)->relatedTarget = aRelatedTarget;
       static_cast(mEvent)->button = aButton;
       nsInputEvent* inputEvent = static_cast(mEvent);
       inputEvent->isControl = aCtrlKey;
       inputEvent->isAlt = aAltKey;
       inputEvent->isShift = aShiftKey;
       inputEvent->isMeta = aMetaKey;
       mClientPoint.x = aClientX;
       mClientPoint.y = aClientY;
       inputEvent->refPoint.x = aScreenX;
       inputEvent->refPoint.y = aScreenY;

       if (mEvent->eventStructType == NS_MOUSE_EVENT) {
         nsMouseEvent* mouseEvent = static_cast(mEvent);
         mouseEvent->clickCount = aDetail;
       }
       break;
    }
    default:
       break;
  }

  return NS_OK;
} 

// ... /snip ...

It may also be interesting to play around with these values; I have not touched these values for this update.

Game Development for the Student Enthusiast/Entrepreneur and Why Threads Aren’t Always Great

November 1, 2011 Leave a comment

This is the story of one student’s quest to code up a massively parallel game engine. It all started about 6 months ago when my group-mates and I, one of which I now co-own a startup game company with, decided to write up a game engine for our PRJ666 project requirement. At the end of PRJ666, we had a fully functioning 3D game engine. Complete with physics, shaders, sound, input, etc… Our home-made engine possessed the absolute essentials for someone to be able to create a game. However, our engine did suffer from a few problems; the one I will be talking about today is performance.

I will explain how the engine operates in order to give you an idea of why we’re even running into a performance bottleneck.
The starting point is a highly serial and synchronous game engine. We constructed this engine on the backs of many open source back-end systems; systems like OGRE for graphics, ODE for physics, Lua for scripting, etc…

The idea was that a time keeping core and a list of objects, together with a number of specialized subsystems, could be made to represent a game world by continually updating each object at each tick. The picture above shows what a typical tick cycle looks like. Typically, each subsystem will iterate over all objects and update them as required then pass control onto the next subsystem. The Lua subsystem does a bit more work as it iterates over each object in the object list and calls the object’s update function, passing it the amount of time that had passed.

This architecture proved to work but it suffered from heavy performance issues, mostly during the execution of the Lua subsystem.

After butting heads for a while with my partner and a quick consultation with my father, we came up with what we thought to be the perfect solution: a massively parallelized architecture. The goal here was not just to be subsystem-parallel but to be object-parallel as well. That meant that all objects in the world would be updating in parallel and all subsystems would be updating in parallel, although all subsystem updates (except for scripting/Lua) had to happen before any object updates were started. The idea was that by dividing each and every update step into discrete read and write steps, we could run many things in parallel and gain a massive performance boost.

To assess this architecture, we decided to ask for some good-old-fashioned academic review. In this case, the victims were David Humphrey and Chris Szalwinski; my “Open Source Development” and “Game Engine Design” professors, respectively. Having lured them in with the promise of coffee and donuts, they were going to give me feedback on this proposed architecture.

My meeting with them was today. Walking into the meeting, I honestly had no idea where to start. By the end of the meeting, I had learned quite a lot about threading and about where some of the problems of our current system lie. Here’s a summary:

  1. Parallelism breaks down if the unit of work is too small; in this case, an object’s update function is far too small to offset the costs of thread synchronization.
  2. Cause and effect become ambiguous as object updates lose their sequential nature.
  3. The number of bugs and the complexity of bugs dramatically increase as micro-threading issues enter the fray.
  4. Threads are a band-aid solution that should be used with caution.

Before anyone jumps on my head for implying that threads may not always be the best idea, let me elaborate. The first concern is perfectly legitimate and would take sound design decisions to avoid; for young developers such as ourselves, going down this route so early is premature.

The second concern is interesting as real-world philosophical problems have suddenly jump into the synthetic game world when objects begin updating in parallel, even if only the read steps are parallel. On its own, this is not so bad; it simply implies that our programming practices would have to adapt to this new environment with its set of constraints. Mixed with the third concern however, this is a recipe for disaster.

The third concern is of course, heisenbugs. Everyone knows that threading is sometimes tricky, and that’s on large systems with obvious and static threading routines. On a system like ours, where all objects thread dynamically ( assign parallel tasks to a dispatcher on the fly ) and interact in non-obvious ways, this is a nightmare.

Really it was the fourth concern which finally turned me away from going down the threads route so quickly. The reason why threads should be considered a band-aid solution is because we are using hardware potential to solve a design problem. The design problem in this case is that the system does not update efficiently. The reason for this is that all objects are treated as equals. Due to this, it is impossible for the system to cull objects from an update cycle. This is the most basic and fundamental problem of this system.

The solution then is not to treat all objects as equals. One way to gain the ability to filter objects is by using context scopes to isolate and group groups of related objects. These structures can then be used to traverse trees of relevance. This really is the key point; the system must be able to detect the relevance of a given object and to filter it from some or all update activities given the current game state. A key tenet here is that objects in one context should not be able to directly change objects in parent or sibling contexts.

Furthermore, this divergence of contexts can later be threaded where appropriate and the performance boosts will then be obvious and controlled.

In order to break large lists of objects into contexts however, some concrete data must be used to govern the design choices. To that end, I will re-iterate what many big businesses already know: automated testing is king. A test harness full of performance and integration tests, coupled with a fixed time demo that puts the engine through some typical scenarios while monitoring performance, can be used to generate graph after graph of data. This data can be used to find where the system is behaving inefficiently and influence later design decisions. This also made the need for some kind of automated build system obvious.

The benefits of a system that could build the codebase given a particular changeset target, automatically run tests against it, collect test data and then publish the data somewhere should be obvious. Such a system could be used on a daily basis by developers to gauge the performance of their code on the fly. This is crucial as this way, it becomes clear what broke the system down, in which way, when and to what effect.

With this in mind, I went to the library, took out some books on simulating the natural processes of the world on a computer and set to reading. Today was quite useful and inspirational and I had hoped to share as much of it as possible with others. I hope this hasn’t been *too* boring of a read and wish you luck on your journey in software development.

Configuration Registry 0.2

October 29, 2011 Leave a comment

I am now calling the 0.2 version of the configuration registry released! The configuration registry is now almost completely useable. In this revision, I’ve implemented loading/storing to local browser storage ( ie a cookie ); as well, clearing and JSON serialization were implemented in this revision. The new features are backed up by rigorous ( more or less 🙂 ) unit tests.

The only thing left to implement is loading using xhr. Due to the lack of a good way to unit test xhr, I’ve been hesitant to start writing this feature in. Version 0.3 of this bug will include xhr loading.

Since only xhr loading is left, I plan to start working on other bugs/other projects for the 0.3 release.

I look forward to my code being reviewed. If you have the time, come over to the pull request thread and drop a word or two 🙂

Pull request: https://github.com/alankligman/gladius/pull/52#issuecomment-2563950
Issue thread: https://github.com/alankligman/gladius/issues/21