GameServer multiple versions

Hi.
I’m trying to find a way to deploy a new version of our GameServer (C#) without downtime and with less GameLift infrastructure management (builds, fleets, queues, etc.), while also supporting multiple GameServer (and client) versions in parallel without launching a new set of GameLift infrastructure for each version.
The idea is to use the GameServer binary (server process) to load the required version’s code from a DLL.
The versioned DLLs will be stored in S3 together with a JSON index file. The advantage is that for a new version I just need to upload the new DLL to S3 and update the JSON index file; the server process then picks up the changes and does the rest.
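A minimal sketch of what that JSON index and the version lookup could look like (Python for illustration; the file format, field names, and version numbers are assumptions, not an established schema):

```python
import json

# Hypothetical index format: maps version strings to the S3 key and checksum
# of the matching GameServer DLL. Field names and versions are illustrative.
EXAMPLE_INDEX = """
{
  "latest": "1.2.0",
  "versions": {
    "1.1.0": {"key": "dlls/GameServer-1.1.0.dll", "sha256": "abc..."},
    "1.2.0": {"key": "dlls/GameServer-1.2.0.dll", "sha256": "def..."}
  }
}
"""

def resolve_dll(index_json, requested_version=None):
    """Return the S3 key/checksum entry for the requested version,
    falling back to whatever the index marks as latest."""
    index = json.loads(index_json)
    version = requested_version or index["latest"]
    return index["versions"][version]
```

A server process would fetch this index from S3 at startup (or on session start), resolve the entry for the version named in the GameSession data, and then download the referenced DLL.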

Actual flow:
When a GameServer EC2 instance launches, it starts the 50 server processes, which check in with GameLift as part of InitSDK() and then wait for a GameSession/onStartGameSession. Each server process reads, from the GameSession data, which version DLL it should load and downloads it from S3. The server process then runs the onStartGameSession callback. When it is ready to accept player connections, the server process calls ActivateGameSession() and waits for player connections.

Will this flow work?
Thank you.

Hi @Lucian_Gutu

Interesting idea. My hunch is that this shouldn’t be used in production, since it goes against the practices I’ve seen in large software development. However, I don’t see why it couldn’t be achieved.

First and foremost, if you are just trying to speed up development iteration or testing different versions of build, then there is a workaround to replace the build directly on the instance. Here is a post on this: How to SSH to instance and replace build in-place to iterate quickly

If you are thinking about doing this in production, here are some thoughts that I have about this approach:

  1. You’ll probably want to cache the DLL and avoid downloading the same DLL twice, to save cost. However, this will certainly increase complexity. What happens if two processes start at the same time? Or if one process hasn’t finished downloading and another process starts up requiring the same DLL? You’ll need some orchestration and locking mechanism.
  2. Even if you cache at the instance level, every single instance will initiate at least one S3 download per version. Depending on the number of instances you have, the sizes of the DLLs, and the cadence of your updates, the S3 download costs may add up. See the S3 pricing plan. This is a scalability concern.
  3. I’m curious to know why you think deploying a new version of GameServer would cause downtime. The recommended practice is to create new fleets and put them in a new queue (easy if you have proper automation set up), then swap the queue ARN in your backend service so that StartGameSessionPlacement/StartMatchmaking now calls the new queue with the new build. Is it possible that you are storing the queue name in the client directly? If that’s the case, then we’d recommend adding a backend service to proxy between clients and GameLift. Not only does this give you more flexibility, like hot-swapping the queue ARN, it also improves security (i.e., you no longer need to store AWS credentials on the client and risk bad actors denying you GameLift access via excessive calls that trigger throttling). See this example of a game backend service: https://github.com/aws-samples/aws-gamelift-and-serverless-backend-sample. You can also use the GameLift Unity Plugin to create a backend within minutes: https://github.com/aws/amazon-gamelift-plugin-unity
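To illustrate the orchestration concern in point 1, here is a hedged sketch (Python, POSIX-only via fcntl, so it assumes a Linux instance) of a per-instance cache with a cross-process lock. `ensure_dll` and the `download` callback are hypothetical names; in practice the callback would wrap an S3 download such as boto3’s `download_file`:

```python
import os
import fcntl
import tempfile

CACHE_DIR = "/tmp/gameserver-dll-cache"  # illustrative cache location


def ensure_dll(key, download):
    """Download the DLL at most once per instance. Concurrent server
    processes block on an advisory file lock until the first download
    finishes; `download(key, dest)` stands in for the actual S3 call."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    cached = os.path.join(CACHE_DIR, os.path.basename(key))
    lock_path = cached + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize across processes
        try:
            if not os.path.exists(cached):
                # Download to a temp file, then publish atomically so other
                # processes never observe a partially written DLL.
                fd, tmp = tempfile.mkstemp(dir=CACHE_DIR)
                os.close(fd)
                download(key, tmp)
                os.rename(tmp, cached)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return cached
```

The atomic rename matters as much as the lock: a crashed download leaves only an orphaned temp file, never a corrupt cached DLL.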

Hi.
Thank you for the reply. Valuable information in there. I didn’t know that you can replace the binary in an active fleet. I thought that the GameLift Java service would complain and restart the instance.

1 & 2: Yes, a cache mechanism will be in place so that the DLL is downloaded only once per instance (with validation such as a SHA checksum).
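For the SHA validation mentioned above, a small sketch of verifying a downloaded file against the digest stored in the JSON index (Python for illustration; `verify_sha256` is a hypothetical helper name):

```python
import hashlib


def verify_sha256(path, expected_hex):
    """Compare the file's SHA-256 digest (streamed in chunks, so large
    DLLs are not read into memory at once) against the expected value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```

A failed check would trigger a re-download rather than loading the DLL.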

  3. Actually, we are not changing the queue ARN. We are using the same queue and just replacing/updating the fleets in that queue. The client doesn’t have any information about the queue or anything GameLift-related; the client calls Lambdas for matchmaking Start/Status/Stop.

The reason for the initial post was to explore other possibilities of doing A/B testing, blue-green/canary deployments, staged rollouts for GameServers in GameLift.

Got it, your setup sounds great. Please let us know how well it works if you decide to go with this approach 🙂

You might already know this, but just leaving the more conventional alternatives here:

For A/B testing, you can create two MM configurations with different queues containing different builds (it’s not advisable to put fleets with different builds in the same queue, since you cannot guarantee the percentage of distribution), and use a Lambda to randomly select a configuration to call StartMatchmaking on.
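A minimal sketch of that random selection, as it might look inside a Lambda handler (the configuration names and the NEW_BUILD_PERCENT variable are made up for illustration; in practice the percentage would come from a Lambda environment variable):

```python
import random

# Illustrative MM configuration names, one per build.
OLD_CONFIG = "mm-config-build-old"
NEW_CONFIG = "mm-config-build-new"

# Share of traffic routed to the new build; dial this up or down
# during a staged rollout.
NEW_BUILD_PERCENT = 25


def pick_configuration(rng=random):
    """Randomly route a matchmaking request to one of two MM configs,
    sending roughly NEW_BUILD_PERCENT percent to the new build."""
    return NEW_CONFIG if rng.random() * 100 < NEW_BUILD_PERCENT else OLD_CONFIG
```

The chosen name would then be passed as the configuration for the StartMatchmaking call.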

The above can also be applied to staged rollouts, where your Lambda gradually dials down the MM configuration with the older build, and dials it back up to roll back if issues occur with the new build. If you set up auto scaling on your fleets correctly, instances on the underutilized fleets should eventually be scaled in, saving cost.

Regarding your comment on staged rollouts, do you have any example of the Lambda that would handle that? I’m curious how the Lambda keeps track of the percentage and where it gets the information about the current MM configurations to send players to. I was thinking that a microservice would be better suited for this kind of task.

Auto scaling is configured together with a buffer to compensate for slow instance starts.
Thank you.

Either Lambda (serverless) or a microservice (I assume you mean your own managed resources) would work here. I only mentioned Lambda because that’s what you said your clients are currently calling.

With Lambda, if implementing the A/B testing logic yourself, off the top of my head one way would be to create a hashing function for your player IDs, for instance hashing to values uniformly distributed between 0 and 100, and assign a THRESHOLD integer (e.g. 25) in a Lambda environment variable: if the hash is greater than the threshold, use logic A; if the hash is less than the threshold, use logic B.
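That hashing idea could be sketched as follows (Python for illustration; THRESHOLD, `bucket`, and `choose_logic` are hypothetical names, not an established API):

```python
import hashlib

THRESHOLD = 25  # in practice, read from a Lambda environment variable


def bucket(player_id):
    """Hash a player ID to a stable, roughly uniform value in [0, 100)."""
    digest = hashlib.sha256(player_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_logic(player_id):
    """Same player always lands in the same bucket, so their assigned
    variant is stable across sessions, unlike purely random selection."""
    return "B" if bucket(player_id) < THRESHOLD else "A"
```

Using a cryptographic hash rather than Python’s built-in `hash()` matters here: `hash()` is salted per process, so it would assign the same player to different buckets on different Lambda invocations.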

If you want to use a managed solution within AWS, this could be helpful: A/B testing on AWS CloudFront with Lambda@Edge | by Lorenzo Nicora | buildit | Medium


Great stuff. Appreciate your responses. Cheers!