Install node_exporter on GameLift instances

Hi.
I’m trying to configure node_exporter (Prometheus) to start at instance boot, using the install.sh script from GameLift, but I’m unable to do so.

My install.sh snippet for node_exporter looks like this:

##############################
# Node Exporter
##############################
wget -q https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar xzf node_exporter-1.0.1.linux-amd64.tar.gz
sudo cp node_exporter-1.0.1.linux-amd64/node_exporter /usr/local/bin/node_exporter
sudo sh -c 'nohup /usr/local/bin/node_exporter --web.listen-address=:9100 --collector.diskstats.ignored-devices="^(ram|loop|fd)\\d+$" &> /var/log/node_exporter.log &'

/var/log/node_exporter.log after booting:

level=info ts=2020-12-09T12:12:09.713Z caller=node_exporter.go:177 msg="Starting node_exporter" version="(version=1.0.1, branch=HEAD, revision=3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)"
level=info ts=2020-12-09T12:12:09.713Z caller=node_exporter.go:178 msg="Build context" build_context="(go=go1.14.4, user=root@1f76dbbcfa55, date=20200616-12:44:12)"
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:105 msg="Enabled collectors"
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=arp
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=bcache
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=bonding
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=btrfs
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=conntrack
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=cpu
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=cpufreq
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=diskstats
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=edac
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=entropy
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=filefd
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=filesystem
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=hwmon
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=infiniband
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=ipvs
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=loadavg
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=mdadm
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=meminfo
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=netclass
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=netdev
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=netstat
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=nfs
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=nfsd
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=powersupplyclass
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=pressure
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=rapl
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=schedstat
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=sockstat
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=softnet
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=stat
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=textfile
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=thermal_zone
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=time
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=timex
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=udp_queues
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=uname
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=vmstat
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=xfs
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:112 collector=zfs
level=info ts=2020-12-09T12:12:09.714Z caller=node_exporter.go:191 msg="Listening on" address=:9100
level=info ts=2020-12-09T12:12:09.714Z caller=tls_config.go:170 msg="TLS is disabled and it cannot be enabled on the fly." http2=false

But the service is not started. It seems that for a brief moment it starts and then is killed.

If I run the command below manually from the instance console, node_exporter starts just fine:

[gl-user-remote@ip-10-174-165-154 ~]$ sudo sh -c 'nohup /usr/local/bin/node_exporter --web.listen-address=:9100 --collector.diskstats.ignored-devices="^(ram|loop|fd)\\d+$" &> /var/log/node_exporter.log &'
[gl-user-remote@ip-10-174-165-154 ~]$ sudo netstat -tulpn | grep node
tcp        0      0 :::9100                     :::*                        LISTEN      6930/node_exporter 

GameLift Events for Running installer at /local/game/install.sh (FLEET_CREATION_RUNNING_INSTALLER) finished with:

Installer Exit Code: 0

I’ve tried different methods to start node_exporter (using an init service, without nohup, etc.) before posting here.

Please help.

Is the process killed or does it quit? What exit code do you have? Can you provide the section of your install.sh logs from the FleetConsole?

The instance will be stopped at some point to make a snapshot so I wonder if you are seeing that behavior. Do you see your process running again once you get a server up and running?

The job of install.sh is to install things. GameLift then makes a copy of the instance state and uses that as a template to start new instances.

So it seems like it gets installed and is runnable. As a short-term fix, you could have your server check whether node_exporter is running when the server is launched, and start the process if it isn't.
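A minimal sketch of that kind of check, in case it helps. The function name `ensure_running` and the PID-file approach are my own illustration, not anything GameLift provides; it only starts the command if the PID recorded on a previous start is no longer alive.

```shell
#!/bin/sh
# Hypothetical helper (not part of GameLift or node_exporter):
# ensure_running <pidfile> <logfile> <command> [args...]
# Starts the command in the background unless the PID recorded
# in <pidfile> still belongs to a live process.
ensure_running() {
    pidfile=$1
    logfile=$2
    shift 2

    # kill -0 sends no signal; it only tests whether the PID exists
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running"
        return 0
    fi

    # nohup keeps the process alive after the launching shell exits
    nohup "$@" >>"$logfile" 2>&1 &
    echo $! >"$pidfile"
    echo "started"
}
```

From a server launch script this could be called as `ensure_running /tmp/node_exporter.pid /tmp/node_exporter.log /usr/local/bin/node_exporter --web.listen-address=:9100`. The PID and log paths here are placeholders and need to be writable by whichever user launches the server.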

I'm going to do some digging into how launching a third-party process should be handled on GameLift, as I know it's come up before.

Thank you for the fast reply.

I will try to capture the exit code, but I'm not sure how to do that. Maybe I could write a binary myself and put it in place of node_exporter.
In /var/log/node_exporter.log there is nothing other than what I’ve posted above.
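For what it's worth, capturing the exit code shouldn't need a custom binary. A small shell wrapper can run the real command in the foreground and append its exit status to the log. This is only a sketch: `run_and_log_exit` is a made-up name, and it relies on the usual shell convention that a status above 128 means the process was terminated by signal (status - 128).

```shell
#!/bin/sh
# Hypothetical wrapper: run_and_log_exit <logfile> <command> [args...]
# Runs the command in the foreground, then records how it ended.
run_and_log_exit() {
    logfile=$1
    shift

    "$@" >>"$logfile" 2>&1
    status=$?

    stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    if [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means "terminated by signal N"
        echo "$stamp $1 killed by signal $((status - 128))" >>"$logfile"
    else
        echo "$stamp $1 exited with status $status" >>"$logfile"
    fi
    return "$status"
}
```

Starting node_exporter through such a wrapper (itself backgrounded with nohup) would show in the log whether the process quit on its own or was killed, for example by SIGTERM (15) during a shutdown.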

No, I don’t see the node_exporter process running.

I also have the CloudWatch Agent installed (from install.sh) on the instance (amazon-cloudwatch-agent-ctl) and that process is running fine. Maybe GameLift doesn’t like other processes to start listening on network ports, except the game server process?

OK, I’ve figured it out with your help.
I didn’t know about this part of the GameLift process:
"The instance will be stopped at some point to make a snapshot so I wonder if you are seeing that behavior."

Since the instance gets rebooted and install.sh is executed only once, not at every boot, node_exporter had no way to be started as a service. I'm not sure why my attempts with init didn’t work the first time, but now it works with some adjustments.
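For anyone landing here later: the exact init configuration isn't shown in this thread, so the following is only a sketch of one possible shape. It assumes a SysV-style init with chkconfig on the instance image, which may not match yours. The function name `write_init_script` and the PID file path are illustrative; the binary path, log path, and listen address are the ones from the install.sh snippet above.

```shell
#!/bin/sh
# Hypothetical helper for install.sh: write_init_script <destination>
# Writes a minimal SysV-style init script for node_exporter.
# Assumption: the image uses SysV init with chkconfig.
write_init_script() {
    dest=$1
    # Quoted 'EOF' keeps $1, $! and $0 literal inside the generated script
    cat >"$dest" <<'EOF'
#!/bin/sh
# chkconfig: 2345 95 05
# description: Prometheus node_exporter
BIN=/usr/local/bin/node_exporter
LOG=/var/log/node_exporter.log
PIDFILE=/var/run/node_exporter.pid

case "$1" in
  start)
    nohup "$BIN" --web.listen-address=:9100 >>"$LOG" 2>&1 &
    echo $! >"$PIDFILE"
    ;;
  stop)
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  status)
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 2
    ;;
esac
EOF
    chmod 755 "$dest"
}
```

In install.sh this would be followed by something like `sudo sh -c '. ./helpers.sh; write_init_script /etc/init.d/node_exporter'` (with `helpers.sh` being wherever the function lives) and `sudo chkconfig --add node_exporter`, so the service is registered to start on every boot rather than only during the one-time install. Extra flags from the original command, such as the diskstats filter, can be appended to the `start` line.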

All good now.

Thank you.