Lobby code redesign

My current Lobby system for RakNet separates the database calls from the C++ calls as follows:

1. Lobby operation is via a generalized operation, such as changing the status variable of a row. There is a unit test in a separate project for each lobby operation.
2. C++ operation is higher level, and uses the lobby operation in a more specific manner, such as setting your status to be a clan member.
3. Client code serializes the operation, does some local checks, and sends it to the server.
4. Server code deserializes the operation, performs as many C++ checks as possible (returning error codes on failure), and performs the lobby operation in a thread.
5. When the lobby operation completes, server serializes the result back to the calling client, updates internal state variables (such as creating a room), and sends notifications to other clients if needed.
6. Lastly, the client gets the serialized result, updates its own state variables (such as now in a room), and calls a callback that the user registered.

It’s a lot of work to even type all that, and every operation takes a HUGE time investment. I mean like half an hour of solid nonstop typing just for one operation, and there are over 50 operations. That doesn’t even account for time spent debugging and documenting.

The problems I’ve been having are that there is too much code, tons of copy/paste, the system is somewhat buggy due to so much code, the same data can be duplicated as many as 3 times (database, server, and client), and it’s very hard and painful to change or write anything new. The system IS quite efficient though: many operations can be performed in C++ without ever accessing the database.

I am thinking of changing this to use a stored procedure for each specific operation. ALL commands (login, create room, etc.) are encapsulated into structures. The same structures are used for both input and output data. The structure is a functor, meaning it can be operated on in a thread. Therefore, all the client has to do is allocate a command, fill in the input details, and send it to the client interface. The client interface does nothing but call the serialize function, which sends the command to the server, which automatically adds it to a database processing thread. The thread will query and get results from a stored procedure on the database, serialize the results, and send them back to the client in the same structure. Lastly, a callback of the same name, generated automatically through macros, is called.
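As a rough sketch of that idea, assuming nothing about the real implementation: the block below uses a hypothetical `ByteStream` (a stand-in, not RakNet's own stream class), a hypothetical `CreateRoomCommand` with invented fields, and a faked database result in place of a stored procedure call. It only illustrates one structure carrying both input and output, one serialize function used in both directions, and a functor call that a database thread could run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a network stream; not RakNet's API.
class ByteStream {
public:
    void Write(const void* src, std::size_t n) {
        const auto* p = static_cast<const std::uint8_t*>(src);
        data_.insert(data_.end(), p, p + n);
    }
    bool Read(void* dst, std::size_t n) {
        if (readPos_ + n > data_.size()) return false;
        if (n > 0) std::memcpy(dst, data_.data() + readPos_, n);
        readPos_ += n;
        return true;
    }
private:
    std::vector<std::uint8_t> data_;
    std::size_t readPos_ = 0;
};

// One helper handles both directions, so sender and receiver share a layout.
template <typename T>
bool SerializeField(bool writing, ByteStream& bs, T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "plain data only");
    if (writing) { bs.Write(&value, sizeof(T)); return true; }
    return bs.Read(&value, sizeof(T));
}

// Strings are sent as a 32-bit length followed by the bytes.
bool SerializeField(bool writing, ByteStream& bs, std::string& s) {
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    if (!SerializeField(writing, bs, len)) return false;
    if (writing) { bs.Write(s.data(), len); return true; }
    s.resize(len);
    return len == 0 || bs.Read(&s[0], len);
}

// Hypothetical command: the same struct carries input and output.
struct CreateRoomCommand {
    // Input, filled in by the client.
    std::string roomName;
    std::uint32_t maxPlayers = 0;
    // Output, filled in on the server.
    std::int32_t resultCode = -1;
    std::uint32_t roomId = 0;

    bool Serialize(bool writing, ByteStream& bs) {
        return SerializeField(writing, bs, roomName) &&
               SerializeField(writing, bs, maxPlayers) &&
               SerializeField(writing, bs, resultCode) &&
               SerializeField(writing, bs, roomId);
    }

    // Functor call: what a database thread would run. A real version would
    // call a stored procedure; this one fakes the result for illustration.
    void operator()() {
        resultCode = roomName.empty() ? 1 : 0;
        roomId = (resultCode == 0) ? 42u : 0u;
    }
};

// Simulates client -> server -> client with no real networking.
CreateRoomCommand RoundTrip(const std::string& name, std::uint32_t maxPlayers) {
    CreateRoomCommand request;
    request.roomName = name;
    request.maxPlayers = maxPlayers;
    ByteStream toServer;
    request.Serialize(true, toServer);

    CreateRoomCommand onServer;
    bool ok = onServer.Serialize(false, toServer);
    assert(ok);
    onServer();  // would run on the database thread

    ByteStream toClient;
    onServer.Serialize(true, toClient);
    CreateRoomCommand reply;
    ok = reply.Serialize(false, toClient);
    assert(ok);
    (void)ok;
    return reply;
}
```

Calling `RoundTrip("lobby", 8)` returns a command whose input fields are unchanged and whose output fields were filled in by the simulated server step. Sending unused output fields in the request wastes a few bytes, but it keeps a single serialize function per command.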

I think this would cut the total amount of code in half. The lobby client and lobby server would do little more than call stored procedures. I would no longer have problems with complex database operations getting out of sync. And the system becomes scalable for free, because database operations are scalable to begin with. All I’d really need to handle on the server is network connectivity.

The bulk of our game servers are monolithic, single-threaded code that has to use a callback API for any kind of asynchronous operation. We also have a (mostly legacy) background process which specifically handles synchronous database queries (e.g. write a score change to a persona record and then return changes to the persona caused by possible promotion).

Our callback API was fairly efficient but just the process of creating and registering all the callbacks was a major chore.

The fact that we use bi-directional (serializing) pack functions for our structures was always a huge workload reducer, meaning you shared a single pack function between client and server, but that required a couple of extra stub functions. So you still had to write 8-10 functions for a two-way conversation: invoker, send, pack, receiver, recipient; return invoker, return send, [return pack], return receiver, final destination. And then we had big, unwieldy functions for registering all of the callbacks.

Our original coders were very anti-C++ so it was all done in C. Throughout most of our old C code there was a penchant for writing “hand off functions”: basically a ctor for a C struct that takes arguments, populates a structure and invokes the network API with a callback ID. This upped the amount of work but meant that the host/network guys could provide the user (client team) with a simple function to call. And 95%+ of them are one-time usage. Yet the network API itself is largely structure passing, so the register functions, for instance, were 1000s of lines of repopulating the same structure over and over to pass “struct CALLBACK_INIT” to the registry function.

It was glaringly obvious to me that the “user-friendly” line in the API was just a level further down than it needed to be. I’m fairly sure it would have been obvious to the network guy too if he’d taken a week off and come back to it.

I initially stitched most of this up with some Macros reminiscent of MFC.

Then it took me a day to add the extra layer, 2 days to replace all of the existing callback registrations, and a couple of days to write some wrapper/object code that replaced a couple of the previous middle layers. Since then I’ve probably saved a month of coding time that I would otherwise have spent writing all this glue code.
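To make the “extra layer” idea concrete, here is a minimal sketch of what such a layer might look like. The fields of `CALLBACK_INIT`, the `CallbackRegistry` class, the `REGISTER_CALLBACK` macro and the handler names are all hypothetical, since the comment does not show the real ones. The point is only that the structure gets populated in one place, so each registration shrinks to a single line.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical layout; the real struct's fields are not given in the text.
struct CALLBACK_INIT {
    int id;
    const char* name;
    std::function<int(const std::string&)> handler;
};

class CallbackRegistry {
public:
    // Low-level, structure-passing entry point.
    void Register(const CALLBACK_INIT& init) { table_[init.id] = init; }

    // Thin convenience layer: fills in the structure once, here, instead of
    // at every call site.
    void Register(int id, const char* name,
                  std::function<int(const std::string&)> handler) {
        assert(handler && "handler must be callable");
        CALLBACK_INIT init{id, name, std::move(handler)};
        Register(init);
    }

    // Returns the handler's result, or -1 if no handler is registered.
    int Dispatch(int id, const std::string& payload) const {
        auto it = table_.find(id);
        return it == table_.end() ? -1 : it->second.handler(payload);
    }

private:
    std::map<int, CALLBACK_INIT> table_;
};

// One line per registration; the function name doubles as the debug name.
#define REGISTER_CALLBACK(registry, id, fn) (registry).Register((id), #fn, (fn))

// Hypothetical callback IDs and handlers, for illustration only.
enum { CB_LOGIN = 1, CB_SCORE = 2 };

int OnLogin(const std::string& payload) {
    return static_cast<int>(payload.size());
}

int OnScore(const std::string& payload) {
    return payload.empty() ? 0 : 100;
}

CallbackRegistry MakeRegistry() {
    CallbackRegistry registry;
    REGISTER_CALLBACK(registry, CB_LOGIN, OnLogin);
    REGISTER_CALLBACK(registry, CB_SCORE, OnScore);
    return registry;
}
```

With this in place, `MakeRegistry().Dispatch(CB_LOGIN, "abc")` routes the payload to `OnLogin`, and an unknown ID falls through to the `-1` result rather than crashing.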

This translated rather nicely to database queries too, which I encapsulate either through a simple threadDBQuery("UPDATE player SET last_login = NOW()"), for instance, or by inheriting the callback class and overriding the necessary members. The class definition and packer are shared between client and host, reducing code overhead; only the implementations are specific to which side of the connection you are on. I even have a variant which deals with throws to make it fairly clean.

Just avoid the temptation to think PSQL lets you simply farm it off to the database. The database isn’t a magical beastie that absorbs all comers and PSQL isn’t just another bit of coding. It’s a 4GL scripting language with its own scalability, locking and race-condition concerns. Sure, it’s the same stuff you have to factor in at the C/C++/C#/Java level ordinarily, but only on the surface. If you wind up being database heavy you’re going to wind up needing someone who thinks in database terms on a regular basis, and more specifically, in terms of your database.

For instance, a short while ago I introduced a fairly nasty bug into our game. When a player completes a sortie and despawns, their score is calculated, and the sortie is summarized and written to the database. The lobby server equivalent then resumes their session and the client returns them to their ready room, ready for their next sortie. I tweaked some database indexes and added an innocent-looking extra field to one of our database calls.

A while afterwards we started getting odd claims of players gaining a rank but then losing it. At first there was absolutely no evidence to support this. It took a while, but I finally recreated the issue. Even so, despite my significant experience with our database engine, it took a DBA to explain to me what was causing the problem. The extra tweaks actually made the scoring saves more efficient, but they also made certain reads of the player’s character data significantly faster. Part of resuming the player’s lobby session was re-reading their character data, allowing for external events that might have promoted, demoted or otherwise changed the player’s data independently of their last sortie.

Despite everything else the lobby server had to do, the scoring save was now occasionally taking just long enough on the production servers, with all the other things the database was doing in the background, that the read happened first, demoting the player in memory so that if they continued to a next sortie without exiting completely, their previous advancements/score would be undone.
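The ordering problem can be reproduced in miniature. The sketch below is a toy simulation, not the actual game code: `FakeAsyncDb`, its simulated latencies and the rank values are all invented for illustration. It also shows one possible way to avoid the race, which is to issue the re-read only after the save has completed; the comment does not say how the real bug was fixed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for an asynchronous database: each job finishes at a
// simulated time, and jobs run in completion order, not submission order.
class FakeAsyncDb {
public:
    void Submit(int latency, std::function<void()> work) {
        assert(latency >= 0);
        pending_.push(Job{now_ + latency, seq_++, std::move(work)});
    }

    void RunAll() {
        while (!pending_.empty()) {
            Job job = pending_.top();
            pending_.pop();
            now_ = job.doneAt;
            job.work();  // may submit follow-up jobs
        }
    }

    int rank = 1;  // the persisted player rank

private:
    struct Job {
        int doneAt;
        int seq;  // tie-breaker: earlier submissions finish first
        std::function<void()> work;
        bool operator>(const Job& other) const {
            return doneAt != other.doneAt ? doneAt > other.doneAt
                                          : seq > other.seq;
        }
    };
    std::priority_queue<Job, std::vector<Job>, std::greater<Job>> pending_;
    int now_ = 0;
    int seq_ = 0;
};

// Racy version: the score save and the session re-read are issued back to
// back, so whichever finishes first wins.
int ResumeRacy(int saveLatency, int readLatency) {
    FakeAsyncDb db;
    int rankInMemory = 0;
    db.Submit(saveLatency, [&] { db.rank = 2; });             // promotion saved
    db.Submit(readLatency, [&] { rankInMemory = db.rank; });  // session resume
    db.RunAll();
    return rankInMemory;
}

// Ordered version: the re-read is only issued once the save has completed.
int ResumeOrdered(int saveLatency, int readLatency) {
    FakeAsyncDb db;
    int rankInMemory = 0;
    db.Submit(saveLatency, [&] {
        db.rank = 2;
        db.Submit(readLatency, [&] { rankInMemory = db.rank; });
    });
    db.RunAll();
    return rankInMemory;
}
```

When the save is the slower of the two operations, `ResumeRacy` hands back the stale rank while `ResumeOrdered` still sees the promotion. The cost of the ordered version is that the resume waits for the save, which is the kind of trade-off that has to be weighed against how the database actually behaves under load.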

In short: if you start writing a lot of PSQL you have to treat the database, as a whole, as part of your codebase, and incorporate and understand, directly or by proxy, its considerations and implications.

One reply on “Lobby code redesign”
