Context, context, context - and locality - how not to lose your mind in parallel times

Alex Champandard from AiGameDev.com ranted about the negative sides of Singletons on Twitter:

Been thinking about the best argument against Singletons. It's down to assumptions, they make an ass out of your code now and bite me later.

There have been lots of discussions and battles regarding the pros and cons of Singletons and I don't want to incite them here again. Singleton's are a tool to use and as every tool they have their positive and negative sides.

It started with a Singleton

From a "parallelization of legacy code" perspective, Singleton's are problematic - like most globals.

When looking at code - data structures, classes, (member) functions - you can't spot the usage of globals and Singletons just from the interface. Their use is hidden inside the implementation, buried deep in the bowels of code.

Why is that bad? Just by looking at functions and methods I can't determine if they have side effects or not. I can't easily see and understand the data-relationships - if and when data is read and written. I have no idea if data is shared. But sharing needs synchronization, synchronization hinders scalability and can lead to contention, and contention kills performance (read Joe Duffy's Some performance implications of CAS operations and Herb Sutter's Sharing is the Root of All Contention).

C and C++, the programming languages most prevalent in game programming, don't offer direct support for contracts or aspect-oriented programming like other languages might do - therefore contracts or guarantees in which way data is shared, or for pureness (side-effect freeness), or explanations how to use the functions and data concurrently, need to be expressed by hand and coding style.

Ok, this is my reason for disliking Singletons: they hide manipulation of (shared) global state which can easily introduce race conditions under concurrent execution.

Embrace context and locality

And now to the core point - and question - of this post: How to live without Singletons or globals?

Hidden sharing and side effects

In my experience, easy to parallelize code (the combination of data and instructions) typically shows the context under which it runs very explicitly. All data manipulated by functions or methods is passed into it as arguments - the user controls and introduces sharing explicitly and there are no hidden side effects on global or even instance state.

"Easy" parallelization in this context (pun intended, oh well...) means parallelizing applications without producing hard to find errors or errors that wait a long time till they strike in the worst moment possible (parallel or threading errors are like sleepers...).

By the way: calling C's rand, or IO functions, or even malloc, free, or C++'s new and delete, all fall into the category of global application state side effects. This category also includes object-oriented code with non-abstract super classes that carry state - because from a fast look at the interface of an inherited class it isn't clear if a method call depends on state stored somewhere deep down in the inheritance hierarchy.

Context parameters versus enormous parameter lists

But..., but code that receives all the data and state via arguments has unwieldy numbers of arguments... and everything needs to be passed into it - data, allocators, IO functionality, logging and tracing facilities - such code isn't easy to read or write (even with code completion in todays editors)...

Currently I can think of three - no - four ways to mitigate this parameter-count-increase problem:

  1. Group data, allocators, functions, etc. in context structures - Allan Kelly described this pattern and coined it Encapsulated Context . The se context groups reduce the number of function signature parameters.

    I used this technique when refactoring OpenSteer for parallelization with OpenMP (ParCo2007 - paper and slides) to gain more control over the data the functions and methods were reading and writing. Even more I learned from reading about its use in Insomniac Game's SPU shaders, see Joe Valenzuela's slides about SPU gameplay from GDC 2009 and look out for struct global_funcs_t.

  2. Do less per function call and minimize the need for context. For example, ask yourself:
    • Does a function really need to allocate memory or could the function caller allocate the memory and hand it to the function?
    • Do I need to actually store values in a concurrent container data structure or could I write a lookaside data structure that controls concurrency in combination with a non-concurrent container that holds the actual values?
    • Do I really want to call rand which alters the global sequence of random numbers and leads to non-determinism under concurrency as the order of calls isn't controllable anymore?
    Minimization of context means to solve only very clear-cut problems per function and to model the data like a glove that fits the real problem to solve.

    The less context a function or a member function needs, the less aspects (memory management, file IO, concurrency-control, access to global or singleton service facilities, access and manipulation of instance data) it straddles, the easier it is to understand and the less likely it hides side effects.

    The data feed into such a function or method can be controlled locally - the way it works can be understood locally - I don't need to understand a whole cluster of systems and its dependencies and internal data flow. The result: the user can control the context of execution as locally or as globally as needed - local, like in: local to a block inside a function, or local to a thread calling the function directly or indirectly.

  3. Keep the function callstack shallow - the less functions or methods are called directly or indirectly from inside another function, the less parameters needed by the nested function calls need to be passed to them, the less parameters the encapsulating function needs.

    One way to keep callstacks shallow is by deferring computations. Don't run every calculation directly but collect commands and (batch) execute them at a later point in time - typically at a point in time where the callstack is less deep than inside the function that created the deferred computation commands.

    But this is a topic for its own article(s).

  4. Ok, the fourth solution: ask yourself, do I really understand the problem and how to solve it, or am I creating overly complex abstractions which need a lot of facilities, services, data for its context because I really don't grasp the problem at hand?

    Reads and feels like I am dodging the question, doesn't it. Well, this is taken nearly one-to-one from presentations of Mike Acton and from discussions with him - and I am still struggling and fighting to truly grasp its impact and fully deploy its invaluable insight.

The return of instance data

Even with all of these techniques often many context parameters are needed. Calling one function directly or indirectly from inside another function results in the need to add parameters to the encapsulating function to hand the context down (at least if the context is truly global).

I carve in - an objects instance fields define state and context, too. Instance field use isn't directly visible from the signature of class member functions, therefore only introduce object state with thought and ponder its pros and cons:

  • The number of member function parameters can be minimized, versus
  • manipulation of instance state isn't save under concurrency and needs to be synchronized - or mindfully controlled.

Qualify member functions that don't have side effects on their instance state via const. And, honestly, don't use mutable - mutable data can hide behind const-proclaiming interfaces, but truly needs synchronization - which isn't obvious from the const-qualified method signature you call.

Concurrency needs control of context

Ok, one last point - I promise - under sequential execution, a context (be it controlled via function arguments or via object state) is used from the point it is set until it is changed again. Therefore contexts are used in serial intervals inside sequential apps. Contexts can be stacked, can build hierarchies, e.g. a certain context (memory allocators, resource managers) is set for a whole computer game. It is refined for the currently loaded level (level assets, game logic), and the context for a specific entity inside this level is added on top (entity state).

In concurrent or parallel applications there are many streams of instructions at once - so to say: there are multi-dimensional intervals of context-use. Here we need to differentiate:

  • context per local block/scope,
  • context of (thread-)local function calls,
  • context per instance lifetime (globally per app, or inside a local thread scope),
  • global context used by all instruction streams/threads,
  • context per thread.

All these different global and local context can form hierarchies - even graphs - too.

Matteo Frigo from CilkArts (just bought by Intel) describes a related phenomenon in The thorny problem of the cactus stack.

I am still searching and experimenting with ways to explicitly express and control the scope and kind of context of (member) functions, to keep sharing and side effects under control while minimizing the need for frivolously long function parameter lists.

Perhaps Haskell or other functional programming languages with explicit constructs to introduce non-determinism and side effects, or contract- and aspect-oriented programming systems offer techniques that can be mapped to C and C++.

The journey goes on - with context and locality as my guides.


Bjoern

 

PS: When I started working with C - and departing from C++ with its templates, classes, operator overloading, etc. - I always wondered how to write reusable containers without resorting to void* all the time. Then I stumbled over Mike Acton's Multi-threaded Optimization by Example slides which describe a single thread reader, single thread writer look-aside helper structure for bounded ring buffer's - and I saw a glimpse of the solution: store your data exactly like you want/need it and coordinate concurrent access and manipulation of your data with the look-aside control structure. Decoupling at its best.