Fleets leaking game sessions

Hi,

We’ve noticed our fleets are leaking game sessions. The issue seems to have started around the time AWS experienced outages in us-east-1, around Nov 25th, though we’re unsure whether it’s related. We haven’t made any application code changes, and we were not experiencing this issue before then. We took a quick look at some of the instances, and there don’t seem to be any immediately obvious issues when calling processEnding.

We have auto-scaling enabled, and the fleet will eventually spin up instances until it hits the max. Our last fleet spun up ~6 instances and maintained them even when there were close to 0 player sessions at some odd hour of the night. Looking at the console, though, there were many game sessions that had 0 player sessions and days of uptime.

We’ve since phased out this fleet (and continue to do so). Here’s a fleet we’ve removed from our queue with 0 player sessions, 28 dangling game sessions, and 3 active instances. Each instance is set up to allow for 35 concurrent processes – why are there 3 active instances? I was hoping a GameLift engineer might take a look at our fleet and illuminate what the problem is:

Region: us-east-1
FleetId: fleet-1188ab57-ac39-42cb-b0a1-29f4020012f4

Cheers

Hi Kip!

A small note on the support level for these forums: due to the public nature of posts here, we can’t share specific customer details, which limits our ability to respond to the question here. If you need help with specific debugging of your resources, I would recommend reaching out to AWS Support.

With that disclaimer, I’ve looked at the following metrics, graphed on the same chart over a one-week period and all using the Sum statistic: ActiveHosts, IdleHosts, AvailableGameSessions, and ActiveGameSessions. At five-minute granularity there is a regular traffic pattern in ActiveGameSessions, with AvailableGameSessions decreasing correspondingly. Good so far! Now let’s address your questions:

  1. "~6 instances and maintained them even when there were close to 0 player sessions at some odd hour of the night." - The metric to look at here is actually game sessions, not player sessions. Game sessions are what dictate whether a fleet should scale, as they are a measure of how many hosts are needed to serve the current sessions as well as keep a buffer for future sessions.
  2. "queue with 0 player sessions, 28 dangling game sessions, and 3 active instances." - Looking at the graph, you are correct that there are no player sessions; however, there are currently ActiveGameSessions. When a host has an active game session, GameLift will not spin it down, as an active game session is a signal to GameLift that there are players in the session and spinning it down would interrupt that game. You’ll need to identify why processEnding is not being called for those active game sessions and ensure they end when there are no players in them.
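
One way to spot the sessions described in point 2 is to filter the fleet's active game sessions for those with no players and a long uptime. Below is a minimal sketch in Python, assuming session records shaped like the entries returned by the GameLift `DescribeGameSessions` API (`GameSessionId`, `Status`, `CurrentPlayerSessionCount`, `CreationTime`). The function name `find_dangling_sessions`, the one-day threshold, and the sample data are hypothetical, and age since creation is only a rough proxy for how long a session has actually been empty.

```python
from datetime import datetime, timedelta, timezone


def find_dangling_sessions(sessions, now, max_empty_age=timedelta(days=1)):
    """Return IDs of ACTIVE game sessions with zero players that are older than max_empty_age."""
    dangling = []
    for session in sessions:
        is_active = session.get("Status") == "ACTIVE"
        is_empty = session.get("CurrentPlayerSessionCount", 0) == 0
        age = now - session["CreationTime"]
        if is_active and is_empty and age > max_empty_age:
            dangling.append(session["GameSessionId"])
    return dangling


# Made-up sample records for illustration. In practice they could come from
# boto3's GameLift client, e.g. describe_game_sessions(FleetId="<your fleet id>",
# StatusFilter="ACTIVE")["GameSessions"].
now = datetime(2020, 12, 10, tzinfo=timezone.utc)
sample = [
    {"GameSessionId": "gs-old-empty", "Status": "ACTIVE",
     "CurrentPlayerSessionCount": 0, "CreationTime": now - timedelta(days=3)},
    {"GameSessionId": "gs-old-busy", "Status": "ACTIVE",
     "CurrentPlayerSessionCount": 2, "CreationTime": now - timedelta(days=3)},
    {"GameSessionId": "gs-new-empty", "Status": "ACTIVE",
     "CurrentPlayerSessionCount": 0, "CreationTime": now - timedelta(minutes=5)},
]

dangling_ids = find_dangling_sessions(sample, now)
print(dangling_ids)
```

Any session this reports is a candidate for investigation on the server side: the process hosting it is still alive and has not ended its session, which is what keeps the instance from scaling down.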

Hey @AlexE-aws,

I decided to kill all the node processes on those instances and was happy to see Active game sessions drop to 0. So, there must be something wrong on our end. Killing processes that have been running far too long is a fine stopgap until we identify what’s wrong :upside_down_face:
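
A longer-term alternative to killing processes by hand would be for each server process to notice when its session has sat empty for a while and then shut itself down. The following is only a sketch of that bookkeeping, written in Python for illustration (the servers in this thread run on Node). The class `IdleShutdown`, the 300-second timeout, and the `on_idle` callback are all hypothetical; in a real server the callback would be the place to end the session and call processEnding.

```python
class IdleShutdown:
    """Fires on_idle once the session has been empty for idle_timeout seconds."""

    def __init__(self, idle_timeout, on_idle, now):
        self.idle_timeout = idle_timeout
        self.on_idle = on_idle
        self.players = 0
        self.empty_since = now  # a fresh session starts out empty
        self.fired = False

    def player_joined(self):
        self.players += 1
        self.empty_since = None  # session is no longer empty

    def player_left(self, now):
        self.players = max(0, self.players - 1)
        if self.players == 0:
            self.empty_since = now  # start counting idle time

    def tick(self, now):
        """Call periodically; returns True once the shutdown callback has fired."""
        if (not self.fired and self.empty_since is not None
                and now - self.empty_since >= self.idle_timeout):
            self.fired = True
            self.on_idle()
        return self.fired


# Usage with plain numbers standing in for a clock (seconds).
events = []
watchdog = IdleShutdown(idle_timeout=300, on_idle=lambda: events.append("shutdown"), now=0)
watchdog.player_joined()
watchdog.tick(now=1000)         # a player is present, so nothing happens
watchdog.player_left(now=1000)  # session becomes empty
watchdog.tick(now=1200)         # empty for 200 s, still below the timeout
watchdog.tick(now=1300)         # empty for 300 s, the callback fires once
```

Passing the clock in explicitly keeps the logic easy to test; a real server would call `tick` from a timer with the current time.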

Thanks for your time!

Glad to hear that solved the issue!