Making sense of the GameLift CPU utilization graph

Hi all!

I’m hoping someone can help me make sense of this graph. I’m trying to see how CPU utilization changes as more game sessions get squeezed onto the same server instance. Here, I opened about 50 sessions over roughly 15 minutes.

The red line is game sessions, and the brown line is “CPU Utilization.” The brown line is what’s confusing me.

Usually, when I think of CPU utilization, I think of it being basically 0% when nothing is happening, and then it creeps up to 100% as more things happen. And, in fact, that is what I saw last night when I did a similar test using a different instance type.

But here, it starts around 80% and drops to 30% as we reach 50 game sessions. Then it goes back up to 80% the moment the game sessions end. That seems to directly contradict what I saw last time.

I suppose one possibility is that by CPU utilization, Amazon really means CPU “availability,” which might explain why the value seems inversely related to the number of game sessions. But if that were the case, why does it hover between 80% and 100% when it’s sitting idle with no game sessions? And why is this the opposite of what I saw during my previous test?

Thanks so much in advance for any light you can shed on this issue!

Here’s the data from that earlier test I mentioned.

As you can see, CPU utilization goes up as game sessions increase, flatlining at 100%. This makes more sense to me.

It’s kinda weird though that at 0 game sessions, CPU utilization is already at 50%. Why not 0%?

Thanks for reaching out. I’ll cut a ticket for the GameLift team to investigate. Is the second graph, the one that seems correct, from your CloudWatch console?

Oh excellent, thank you!!

It’s captured from the “Metrics” tab in my Fleet console in GameLift. I don’t believe I have CloudWatch set up just yet.

Would you mind providing your fleet id to help us investigate?

Also, you don’t actually need to set up CloudWatch. Just select the region your fleet is in, open the CloudWatch console (search for “CloudWatch” in the AWS console), and the GameLift metrics should be there. It’s very handy if you want to build a custom dashboard, view metrics from outside the AWS Console, etc.


Oh, shoot, I terminated the fleet before I went to bed to avoid racking up the bill! I don’t think I have a record of the fleet ID either.

I’ll start a new fleet now and redo the test. Then I’ll let you know once it’s done!


@JamesM_Aws Thanks for the CloudWatch explanation! I’ll take a look.

I just did another test with a new fleet and got basically the same behavior, so at least we know it’s consistent. I’ll PM you the fleet ID in a minute.

Once again, CPU utilization starts at 80% and then drops to around 30% as game sessions reach the max. Then, at the 11:56 mark when I stop all game sessions, CPU utilization immediately shoots back up to 80%.

@JamesM_Aws Shoot, is there a way to send DMs on this site? :sweat_smile:

I guess I need to unlock the ability to DM :thinking:

@JamesM_Aws I’ve just been given DM privileges, so I sent you the fleet info. I did a test a few minutes ago with the same results as above.

Thanks, I got it, sorry for the trouble. I’ve added the fleet ID to the investigation ticket.

P48493534


Fantastic, thank you!!

Hey @TheoTowers ,

The CPU metrics for the fleet you provided come directly from the EC2 instance your game servers are running on. Unfortunately, I can’t see a breakdown of what’s causing the high CPU usage without access to the instance itself.

Are you able to SSH onto the instance and run the top command to see which processes are taking up all the CPU? (See “Remotely access GameLift fleet instances” in the Amazon GameLift documentation.)

It may just be that the 50 processes running on the fleet are too much for the given instance type, but you’re right that it’s strange that CPU decreases with more active game sessions.

Hi @TheoTowers,

I also took a look at the metrics on our end, and I agree with Nathan’s assessment:

  • I’ve verified the CPU Utilization metrics from both EC2 and Auto Scaling, and they both show the same trend: ~77% at 0 active game sessions and 30% at ~46 active game sessions. So there is no bug in the console; we are relaying the exact metrics from EC2.
  • Try to SSH into the instance (see the recommended approach in the “Unity server built with IL2CPP fails” thread, post #2 by JamesM_Aws) and run top to see what is consuming so much CPU even without active game sessions. (NOTE: even though your instance has no ACTIVE game sessions, it still runs 50 processes that are waiting for game sessions to be started on them.)
  • It seems like a process uses less CPU once a game session is activated on it. Are the processes doing anything compute-heavy while waiting for game sessions, e.g. short polling something?
  • My other suspicion was that your processes were crashing as game sessions connected to them, which would have reduced overall CPU utilization. However, that doesn’t seem to be the case: the ActiveGameSession metrics were stable and I didn’t see any AbnormalProcessTermination metrics.
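To make the short-polling question concrete: a process that repeatedly checks “do I have a session yet?” in a tight loop keeps a core busy the whole time it is idle, while one that blocks on an event sleeps in the OS until work arrives. The following is a generic Python sketch, not Unity or GameLift SDK code; the function names are hypothetical and only contrast the two waiting styles.

```python
from __future__ import annotations

import threading
import time


def wait_by_polling(session_ready: threading.Event, interval: float = 0.0) -> int:
    """Short polling: keep checking a flag. With interval=0 this spins and
    burns CPU for the entire wait. Returns how many checks were made."""
    checks = 0
    while not session_ready.is_set():
        checks += 1
        if interval:
            time.sleep(interval)
    return checks


def wait_by_blocking(session_ready: threading.Event, timeout: float | None = None) -> bool:
    """Blocking wait: the thread sleeps until the event is set (or the
    timeout expires). Returns True if the event fired, False on timeout."""
    return session_ready.wait(timeout)


if __name__ == "__main__":
    ready = threading.Event()
    # Simulate a game session being assigned 0.2 s from now.
    threading.Timer(0.2, ready.set).start()
    start = time.process_time()
    checks = wait_by_polling(ready)
    print(f"polling: {checks} checks, {time.process_time() - start:.2f}s CPU")

    ready.clear()
    threading.Timer(0.2, ready.set).start()
    start = time.process_time()
    fired = wait_by_blocking(ready)
    print(f"blocking: fired={fired}, {time.process_time() - start:.2f}s CPU")
```

Run as a script, the polling version should report CPU time close to the full wait, and the blocking version close to zero; exact figures depend on the machine. Multiply the polling cost by 50 idle processes and a high idle CPU reading is easy to get.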

Thanks @JamesM_Aws and @Nathan! I will take a look at what’s going on using SSH – and while I’m at it I should look at what’s happening in the Unity profiler. I’ll report back with my findings.

Here’s what I’m seeing when I use the top command. ‘spoopy-server’ is the name of my build, so I’m guessing my build is eating up CPU while idle for no good reason. I’m going to check this out in the Unity profiler next.
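With 50 copies of the same build running, top lists each process on its own line, so it can help to add them up per command name. Here’s a small Python sketch for that, an assumption on my part rather than anything GameLift provides. It parses the output of `ps -eo comm,%cpu` (a header row, then one `<command> <cpu>` line per process). Note that `ps` reports %CPU averaged over each process’s lifetime, unlike top’s instantaneous value.

```python
from __future__ import annotations

from collections import defaultdict


def cpu_by_command(ps_output: str) -> dict[str, float]:
    """Sum %CPU per command name from `ps -eo comm,%cpu` style output.

    The first line is treated as a header and skipped. The CPU value is the
    last whitespace-separated field; everything before it is the command,
    so names containing spaces still parse.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip header row
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        command, cpu = parts
        try:
            totals[command.strip()] += float(cpu)
        except ValueError:
            continue  # non-numeric CPU field
    # Highest total first, so the heaviest program is at the top.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

On the instance you could feed it live data with `subprocess.run(["ps", "-eo", "comm,%cpu"], capture_output=True, text=True).stdout`.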


Looking at my code again and reading your response above, @JamesM_Aws, I’m thinking I may have had a complete misunderstanding about the lifetime of each server process.

I was under the impression that a process started when a client created a game session, and then should be terminated once the game session is complete. However, it seems that’s not how it’s supposed to work. Instead, a process starts up at the beginning and gets used for sessions indefinitely?

If that’s correct, it would make more sense from a resource-usage perspective. It would also explain why my processes are using so much CPU: my code that tries to prevent zombie sessions from being created keeps starting and stopping them!
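The reuse model described above can be sketched as one long-lived process that blocks until a session request arrives, runs it, resets its state, and waits again, instead of exiting and relaunching. This is a hypothetical Python model, not the GameLift Server SDK: `ReusableServerProcess`, `start_game_session`, and `run_session` are illustrative names, and whether GameLift expects processes to be reused or recycled after each session is worth confirming in the Server SDK documentation.

```python
from __future__ import annotations

import queue
from typing import Callable, Optional


class ReusableServerProcess:
    """Hypothetical model of a server process that serves game sessions
    one after another instead of restarting for each one."""

    def __init__(self, run_session: Callable[[str], None]) -> None:
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._run_session = run_session
        self.sessions_served = 0

    def start_game_session(self, session_id: str) -> None:
        """Hand a new session to this process (stands in for a real callback)."""
        self._requests.put(session_id)

    def shutdown(self) -> None:
        """Ask the loop to exit after any already-queued sessions."""
        self._requests.put(None)  # sentinel value

    def serve_forever(self) -> None:
        while True:
            session_id = self._requests.get()  # blocks, using no CPU, while idle
            if session_id is None:
                break
            self._run_session(session_id)
            self.sessions_served += 1
            # Reset per-session state here rather than restarting the process.
```

In a real server, `serve_forever` would typically run on its own thread, with session requests arriving from elsewhere; the key point is that idle time is spent blocked, not spinning or relaunching.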

Ok, so after making the corrections, my graph now looks like this:

Which is infinitely better haha. It’s odd, though, that we’re still hovering around 30% just waiting for sessions, but I can keep investigating to find the cause.

Thanks so much for helping me realize the fatal mistake I was making!!

Hey Theo,

Glad you figured out the issue! Let us know if there’s stuff we can help with for the 30%, best of luck.