Game Development for the Student Enthusiast/Entrepreneur and Why Threads Aren’t Always Great

This is the story of one student’s quest to code up a massively parallel game engine. It all started about six months ago when my group-mates and I, one of whom I now co-own a startup game company with, decided to write a game engine for our PRJ666 project requirement. At the end of PRJ666, we had a fully functioning 3D game engine, complete with physics, shaders, sound, input, etc. Our home-made engine possessed the absolute essentials for someone to be able to create a game. However, our engine did suffer from a few problems; the one I will be talking about today is performance.

I will explain how the engine operates in order to give you an idea of why we’re even running into a performance bottleneck.
The starting point is a highly serial and synchronous game engine. We built this engine on top of many open source back-end systems, such as OGRE for graphics, ODE for physics and Lua for scripting.

The idea was that a time-keeping core and a list of objects, together with a number of specialized subsystems, could be made to represent a game world by continually updating each object at each tick. The picture above shows what a typical tick cycle looks like. Typically, each subsystem iterates over all objects, updates them as required and then passes control on to the next subsystem. The Lua subsystem does a bit more work, as it iterates over each object in the object list and calls the object’s update function, passing it the amount of time that has passed.
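To make the tick cycle concrete, here is a minimal sketch of that serial loop in Python. It is not our engine's code: `GameObject`, `CountingSubsystem`, `ScriptSubsystem` and `Core` are hypothetical names, and the `update` method merely stands in for an object's Lua update function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameObject:
    """Hypothetical game object with a trivial piece of state."""
    name: str
    position: float = 0.0
    velocity: float = 1.0

    def update(self, dt: float) -> None:
        # Stand-in for the object's Lua update function.
        self.position += self.velocity * dt


class CountingSubsystem:
    """Placeholder for physics, graphics, sound, etc.: it only counts visits."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.visits = 0

    def tick(self, objects: List[GameObject], dt: float) -> None:
        for _ in objects:
            self.visits += 1


class ScriptSubsystem:
    """Calls every object's update function, passing the elapsed time."""

    def tick(self, objects: List[GameObject], dt: float) -> None:
        for obj in objects:
            obj.update(dt)


class Core:
    """Time-keeping core: runs each subsystem in turn over the whole list."""

    def __init__(self, subsystems, objects: List[GameObject]) -> None:
        self.subsystems = subsystems
        self.objects = objects

    def run_tick(self, dt: float) -> None:
        # Strictly serial: one subsystem finishes before the next starts.
        for subsystem in self.subsystems:
            subsystem.tick(self.objects, dt)
```

Note that every subsystem touches every object on every tick, whether or not the object has anything to do.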

This architecture proved to work but it suffered from heavy performance issues, mostly during the execution of the Lua subsystem.

After butting heads for a while with my partner and a quick consultation with my father, we came up with what we thought to be the perfect solution: a massively parallelized architecture. The goal here was not just to be subsystem-parallel but to be object-parallel as well. That meant that all objects in the world would be updating in parallel and all subsystems would be updating in parallel, although all subsystem updates (except for scripting/Lua) had to happen before any object updates were started. The idea was that by dividing each and every update step into discrete read and write steps, we could run many things in parallel and gain a massive performance boost.
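A rough sketch of the read/write split, under my own assumptions: each object reads only a frozen snapshot of the world and proposes its next state (the read step, which can run in parallel), and the results are committed afterwards on a single thread (the write step). The names `State`, `compute_next` and `tick`, and the "steer toward the centre" rule, are made up for illustration. In CPython this shows the structure only; CPU-bound threads will not actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class State:
    """Immutable per-object state, so the read step cannot modify it."""
    position: float
    velocity: float


def compute_next(name: str, snapshot: Dict[str, State], dt: float) -> Tuple[str, State]:
    """Read step: look only at the snapshot and return the proposed state."""
    me = snapshot[name]
    # Illustrative interaction: steer toward the average position of all objects.
    centre = sum(s.position for s in snapshot.values()) / len(snapshot)
    velocity = me.velocity + (centre - me.position) * dt
    return name, replace(me, position=me.position + velocity * dt, velocity=velocity)


def tick(world: Dict[str, State], dt: float, pool: ThreadPoolExecutor) -> Dict[str, State]:
    """Run all read steps in parallel, then commit the results in one place."""
    snapshot = dict(world)  # no object sees another's half-finished update
    results = pool.map(lambda n: compute_next(n, snapshot, dt), snapshot)
    return dict(results)    # write step: build the next world state


if __name__ == "__main__":
    world = {"a": State(0.0, 0.0), "b": State(2.0, 0.0)}
    with ThreadPoolExecutor() as pool:
        world = tick(world, 1.0, pool)
    print(world)
```

Because every read step sees the same snapshot, the result does not depend on the order in which the threads happen to run.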

To assess this architecture, we decided to ask for some good-old-fashioned academic review. In this case, the victims were David Humphrey and Chris Szalwinski, my “Open Source Development” and “Game Engine Design” professors, respectively. Having lured them in with the promise of coffee and donuts, I was going to get their feedback on this proposed architecture.

My meeting with them was today. Walking into the meeting, I honestly had no idea where to start. By the end of the meeting, I had learned quite a lot about threading and about where some of the problems of our current system lie. Here’s a summary:

  1. Parallelism breaks down if the unit of work is too small; in this case, an object’s update function is far too small to offset the costs of thread synchronization.
  2. Cause and effect become ambiguous as object updates lose their sequential nature.
  3. The number of bugs and the complexity of bugs dramatically increase as micro-threading issues enter the fray.
  4. Threads are a band-aid solution that should be used with caution.

Before anyone jumps on my head for implying that threads may not always be the best idea, let me elaborate. The first concern is perfectly legitimate and would take sound design decisions to avoid; for young developers such as ourselves, going down this route so early is premature.

The second concern is interesting, as real-world philosophical problems suddenly jump into the synthetic game world when objects begin updating in parallel, even if only the read steps are parallel. On its own, this is not so bad; it simply implies that our programming practices would have to adapt to this new environment with its set of constraints. Mixed with the third concern, however, this is a recipe for disaster.

The third concern is, of course, heisenbugs. Everyone knows that threading is sometimes tricky, and that’s on large systems with obvious and static threading routines. On a system like ours, where all objects thread dynamically (assigning parallel tasks to a dispatcher on the fly) and interact in non-obvious ways, this is a nightmare.

Really, it was the fourth concern that finally turned me away from going down the threads route so quickly. Threads should be considered a band-aid solution here because we would be using hardware potential to solve a design problem. The design problem in this case is that the system does not update efficiently, and the reason is that all objects are treated as equals. Because of this, it is impossible for the system to cull objects from an update cycle. This is the most basic and fundamental problem of this system.

The solution, then, is not to treat all objects as equals. One way to gain the ability to filter objects is to use context scopes to isolate and group related objects. These structures can then be used to traverse trees of relevance. This really is the key point: the system must be able to detect the relevance of a given object and to filter it from some or all update activities given the current game state. A key tenet here is that objects in one context should not be able to directly change objects in parent or sibling contexts.
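One way this could look, as a minimal sketch: a tree of contexts where an irrelevant context is skipped together with everything beneath it. `Entity`, `Context` and the `active` flag are hypothetical; a real engine would decide relevance from the game state rather than from a hand-set boolean.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical game object that counts how often it was updated."""
    name: str
    updates: int = 0

    def update(self, dt: float) -> None:
        self.updates += 1


@dataclass
class Context:
    """A scope that groups related entities and nested child contexts."""
    name: str
    active: bool = True  # assumed relevance flag, set from game state
    entities: List[Entity] = field(default_factory=list)
    children: List["Context"] = field(default_factory=list)

    def update(self, dt: float) -> int:
        """Update this subtree and return how many entities were visited."""
        if not self.active:
            return 0  # cull the whole subtree in one check
        visited = 0
        for entity in self.entities:
            entity.update(dt)
            visited += 1
        for child in self.children:
            visited += child.update(dt)
        return visited


if __name__ == "__main__":
    player = Entity("player")
    far_npc = Entity("npc in another level")
    world = Context("world", children=[
        Context("current level", entities=[player]),
        Context("other level", active=False, entities=[far_npc]),
    ])
    print(world.update(1 / 60), player.updates, far_npc.updates)
```

Entities only hold references within their own context here, which is one simple way to respect the rule that a context does not reach into its parent or siblings.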

Furthermore, this divergence of contexts can later be threaded where appropriate and the performance boosts will then be obvious and controlled.

In order to break large lists of objects into contexts, however, some concrete data must be used to govern the design choices. To that end, I will reiterate what many big businesses already know: automated testing is king. A test harness full of performance and integration tests, coupled with a fixed time demo that puts the engine through some typical scenarios while monitoring performance, can be used to generate graph after graph of data. This data can be used to find where the system is behaving inefficiently and to influence later design decisions. This also made the need for some kind of automated build system obvious.
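As a sketch of what the fixed time demo part could look like: run a fixed number of ticks with a fixed time step and record how long each subsystem takes on each tick. `run_fixed_demo` and `summarize` are hypothetical helpers, and the subsystems are assumed to be plain callables that take the time step.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


def run_fixed_demo(subsystems: Dict[str, Callable[[float], None]],
                   ticks: int, dt: float = 1 / 60) -> Dict[str, List[float]]:
    """Run a fixed scenario and collect per-tick wall time for each subsystem."""
    samples: Dict[str, List[float]] = defaultdict(list)
    for _ in range(ticks):
        for name, step in subsystems.items():
            start = time.perf_counter()
            step(dt)
            samples[name].append(time.perf_counter() - start)
    return dict(samples)


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reduce the raw samples to numbers that are easy to graph or compare."""
    return {
        name: {"ticks": len(times), "total_s": sum(times), "worst_s": max(times)}
        for name, times in samples.items()
    }


if __name__ == "__main__":
    demo = {
        "physics": lambda dt: None,              # placeholder workload
        "script": lambda dt: sum(range(10_000)),  # placeholder workload
    }
    for name, stats in summarize(run_fixed_demo(demo, ticks=100)).items():
        print(name, stats)
```

Because the scenario and the time step are fixed, runs against different changesets can be compared directly, although wall-clock numbers will still vary somewhat from machine to machine.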

The benefits of a system that could build the codebase given a particular changeset target, automatically run tests against it, collect test data and then publish the data somewhere should be obvious. Such a system could be used on a daily basis by developers to gauge the performance of their code on the fly. This is crucial as this way, it becomes clear what broke the system down, in which way, when and to what effect.

With this in mind, I went to the library, took out some books on simulating the natural processes of the world on a computer and set to reading. Today was quite useful and inspirational, and I wanted to share as much of it as possible with others. I hope this hasn’t been *too* boring of a read and wish you luck on your journey in software development.
