Linux Dedicated Server Debug Symbols Missing in 1.26

Hello friends,

My team has encountered some issues with crashes on our dedicated server, which are affecting our progress.

Lumberyard calls the glibc function backtrace_symbols_fd() (in CrySystem\SystemInit.cpp) to safely dump stack trace information to a file in the event of a crash or segmentation fault.

However, this backtrace is lacking information about file and line numbers. This is a blocker for us. How are other teams getting backtrace information on linux server builds?

Notes:

  • I am aware that _WAF_/settings/platforms/common.clang.json declares the compile flag -fvisibility=hidden, which I understand hides symbols in shared objects to avoid conflicts between the declared names in one object and another. Disabling this flag causes the game to lock up after loading gems, so that appears to be a bad idea.

  • Using -fvisibility=protected introduces a number of compile errors whose fixes would be a hassle for us to maintain across updates.

  • Using the -rdynamic and -Wl,--export-dynamic flags has not helped.

  • The linked libunwind library is only applicable for Android devices.

  • Prior to 1.22, the file waf-1.7.13/lmbrwaflib/compile_settings_linux.py contained a task to separate out the debug symbols into separate files:

    def run(self):
        exec_abs_path = self.inputs[0].abspath()
        os.system('objcopy --only-keep-debug {0} {0}.debug'.format(exec_abs_path))
        os.system('objcopy --strip-debug {0}'.format(exec_abs_path))
        os.system('objcopy --add-gnu-debuglink={0}.debug {0}'.format(exec_abs_path))
        return False

This approach of separate symbol files (much like Visual Studio’s .pdbs) worked for us in the past, and I have written a python script to reintroduce that behaviour. This also does not work.
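
For anyone wanting to try the same thing outside of waf, here is a minimal sketch of a standalone version of that task. It assumes objcopy (from binutils) is on the PATH; the function names split_debug_commands and split_debug_symbols are made up for illustration and are not part of Lumberyard or waf.

```python
import subprocess
import sys


def split_debug_commands(binary_path):
    """Return the three objcopy invocations used by the old waf task.

    The commands mirror the pre-1.22 task: copy the debug info into
    <binary>.debug, strip it from the binary, then record a debug link.
    """
    debug_path = binary_path + ".debug"
    return [
        ["objcopy", "--only-keep-debug", binary_path, debug_path],
        ["objcopy", "--strip-debug", binary_path],
        ["objcopy", "--add-gnu-debuglink=" + debug_path, binary_path],
    ]


def split_debug_symbols(binary_path):
    """Run the objcopy commands in order, stopping at the first failure."""
    for command in split_debug_commands(binary_path):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Usage (hypothetical file name): python split_debug.py path/to/binary_or_so [...]
    for path in sys.argv[1:]:
        split_debug_symbols(path)
```

Passing argument lists to subprocess.run instead of a formatted string to os.system avoids shell quoting problems with paths that contain spaces, and check=True surfaces a failing objcopy call instead of silently continuing. Separate .debug files are picked up by debuggers such as gdb; backtrace_symbols_fd() itself only resolves names from the dynamic symbol table and does not print file or line information.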

We’ve done about as much homework on this as we can by ourselves. What am I missing?

Thanks,
Softmints

Hi @Softmints, were you able to get any further with this? In the past I’ve debugged a running Linux dedicated server by running it under WSL (Windows Subsystem for Linux) and attaching the debugger from Visual Studio/Visual Studio Code https://code.visualstudio.com/docs/remote/wsl-tutorial

However, that is live debugging rather than debugging core dumps, which is what you want. I recommend getting yourself set up to debug core dumps with a known stack/setup first by doing something like this:

  1. Install WSL or hop on an Ubuntu machine
  2. Compile a debug build of the Linux dedicated server, which should leave symbols intact
  3. Transfer over any assets you need so you can run the Linux dedicated server (sounds like you already have these)
  4. Run the dedicated server manually and force a crash using the sys_crashtest console command. This way you know what the call stack should look like and the location of the code.

That should give you a build that has debug symbols and a core to debug with gdb directly or try with Visual Studio/Visual Studio Code. Supposedly VS Code has a coreDumpPath launch config setting you can point at a core dump, but I haven’t used that yet.
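
As a rough sketch of the gdb route, something like the following could wrap those steps. It assumes gdb is installed and that the kernel actually writes a core file to a path you can find (where it lands depends on the machine's core_pattern setting, which I have not checked under WSL). The helper names and the command-line layout are hypothetical.

```python
import subprocess
import sys


def gdb_backtrace_command(executable, core_file):
    """Build a non-interactive gdb call that prints every thread's backtrace."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", executable, core_file]


def allow_core_dumps():
    """Raise the soft core-file size limit to the hard limit (Unix only)."""
    import resource  # imported here because the module only exists on Unix

    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_server(server_command):
    """Run the server with core dumps enabled and return its exit code."""
    completed = subprocess.run(server_command, preexec_fn=allow_core_dumps)
    return completed.returncode  # negative when the process died from a signal


if __name__ == "__main__":
    # Usage: python core_bt.py <server executable> <core file>
    executable, core_file = sys.argv[1], sys.argv[2]
    subprocess.run(gdb_backtrace_command(executable, core_file), check=False)
```

Running the server through run_server() and then triggering sys_crashtest should leave a core file behind, provided the hard limit allows it; the __main__ part then prints the backtrace for that core without opening an interactive gdb session.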

Let me know how it goes, I’ve been meaning to lay out some exact steps, but have been caught up in Windows land.

Thanks for the prompt reply @petrocket.

We’ve been using the debug build on linux throughout, and had no problem reproducing crashes. The new information I can share is that we managed to get information out of the raw call stack using a 13-year-old python script based on addr2line, with some modifications to handle .so output in the backtrace. (We run this on the /user/log/crash.log file.)

# After line 114:
is_so = text.find('.so') > 0
if is_so:
    self.addr = text[f+1:b-1] 

Therefore, the debug symbols exist (as one would expect from the -g compiler flag), but backtrace_symbols_fd() can’t grab them to output the function names and line numbers in a single convenient step. Official documentation suggests -rdynamic or clang equivalent would be enough—and I stress that it was enough in LY 1.12—but not in 1.22+.
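
To give an idea of what the workaround does, here is a minimal sketch (not the script we actually use) of resolving raw backtrace lines with addr2line. It assumes the log lines are in the plain glibc format, e.g. "/path/libFoo.so(+0x1a2b3c) [0x7f12345a2b3c]", with no prefix in front of them, and that addr2line is on the PATH. The names parse_frame and symbolize are made up for this example.

```python
import re
import subprocess
import sys

# Matches glibc backtrace lines such as:
#   /path/libFoo.so(+0x1a2b3c) [0x7f12345a2b3c]
#   ./Launcher(main+0x15) [0x4006d6]
FRAME_RE = re.compile(
    r"^(?P<module>.*?)\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\s*\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return (module, address_for_addr2line), or None if the line is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    if not match.group("symbol") and match.group("offset"):
        # "(+0x...)" is an offset from the module's load base, which is
        # what addr2line expects for shared objects.
        return match.group("module"), match.group("offset")
    # Assumption: the bracketed runtime address is only meaningful to
    # addr2line for a non-PIE executable; treat other results with care.
    return match.group("module"), match.group("address")


def symbolize(module, address):
    """Ask addr2line for the demangled function name and file:line."""
    result = subprocess.run(
        ["addr2line", "-e", module, "-f", "-C", "-p", address],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Usage: python symbolize.py path/to/crash.log
    with open(sys.argv[1]) as log:
        for line in log:
            frame = parse_frame(line)
            print(symbolize(*frame) if frame else line.rstrip())
```

Lines that do not look like stack frames are passed through unchanged, so the output reads like the original log with the frames replaced by function names and file:line pairs wherever addr2line could resolve them.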

We have used that python script to uncover and solve the immediate bugs affecting us. I don’t consider it a solution, but it is a workaround.

I don’t know how other LY customers are handling the situation but if there were an easy fix for clean backtrace information I’m sure it would be a value add. Perhaps someone on the GameLift team knows more?

I have seen some linux server crashes which left neither logging nor backtrace, so WSL + VSCode might be an option for chasing those and I appreciate the suggestion. Thanks again for your time.

Hi @Softmints, thanks for the update! Did you try setting ‘-fvisibility=default’ in common.clang.json, or possibly in just the launcher, like we do for Android in dev/Code/LauncherUnified/Platform/Android\wscript?

 android_cflags              = ['-fvisibility=default'],

Hi @petrocket,

We did try removing that flag, which appeared to result in the main thread of the server process locking up after gems had loaded (according to the server log). As I understand it, removal of that flag is equivalent to declaring it as default on linux.

It didn’t work for us, but perhaps it would work with default LY. There is always the possibility that something about our setup is interfering.