The Microsoft blog recently published an article by Bill Randolph, a senior software engineer at Blizzard working on Diablo IV. The article describes some aspects of development on Diablo IV and, in particular, explains how to use Visual Studio to debug code built for Linux. Below is a translation of that material.
While working on Diablo IV, we write all of our code on Windows and then compile it for different platforms. This also applies to our servers, which run Linux. (The code includes conditional compilation directives and, where necessary, fragments written for a specific platform.) We work this way for many reasons. For starters, our team's core professional skills are in Windows development. Even our server programmers are most familiar with Windows. We value being able to have every programmer on the team use the same tools and the same knowledge base.
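Conditional compilation of this kind usually means preprocessor checks on predefined platform macros. The following is a minimal sketch, assuming the standard _WIN32 and __linux__ macros; the function names CurrentProcessId and PlatformName are hypothetical and not taken from Blizzard's code.

```cpp
#include <string>

// Hypothetical example: one source file compiled on Windows for local
// development and on Linux for the deployed servers.
#if defined(_WIN32)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
#include <unistd.h>
#endif

// Returns the ID of the current process using each platform's native API.
unsigned long CurrentProcessId() {
#if defined(_WIN32)
    return static_cast<unsigned long>(::GetCurrentProcessId());
#else
    return static_cast<unsigned long>(::getpid());
#endif
}

// Returns a short platform tag, e.g. for naming log files.
std::string PlatformName() {
#if defined(_WIN32)
    return "windows";
#elif defined(__linux__)
    return "linux";
#else
    return "other";
#endif
}
```

Only the branch matching the target platform is compiled, so the Windows build never sees unistd.h and the Linux build never sees windows.h.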
Another reason, and the most important one, for developing on Windows is that it lets us use the powerful and reliable toolset that Visual Studio provides. Even if we were developing on Linux, I can say that nothing in the Linux world compares with Visual Studio.
However, this way of working creates some difficulties when a server crashes and we need to debug the resulting memory dump. We can log in remotely to the VM (or, more precisely, the container) that failed and run gdb to find out what went wrong. But this approach has many drawbacks. For example, we don't deploy the source code together with the binaries, so the source code is not available in a gdb session inside the VM or container.
Another complication lies in gdb itself. Unless you use this tool regularly, you cannot master it to the level we need. Simply put, our developers would much rather use familiar debugging tools. Since only two or three of our developers know gdb really well, they are the ones who hunt for the problem whenever something goes wrong. That is not an optimal distribution of the workload among programmers.
We have always wanted a way to debug Linux code that is intuitive and easy to understand. That is why we are so excited about the new Visual Studio feature that lets us do exactly that in a familiar environment! It is no exaggeration to say that, thanks to it, our dream has come true.
Debugging Linux code in Visual Studio is possible only if the Windows Subsystem for Linux (WSL) is installed on the system, or if the Connection Manager is configured to connect to a Linux machine. All of our server developers have WSL installed with the same distribution that we deploy our project on. We run a script I wrote that installs all the development tools and supporting libraries needed to build our server in WSL.
(A brief digression from the main topic: we have concluded that WSL is the best available environment for developers to test changes in Linux builds. This workflow is extremely convenient: open WSL, use the cd command to enter the shared directory containing the code, and build the project directly from there. This is a much better solution than using a VM or even a container. If you build projects with CMake, you can also use Visual Studio's built-in WSL support.)
A few words about our builds. We develop code on Windows, and we have a Windows version of our server designed to run on that OS. This serves us well for day-to-day work on the project. But we deploy our server code in a Linux environment, which requires Linux builds. These are produced on a build farm, using a build system that runs on a Linux machine. It builds our server project and the corresponding container, which is then deployed. Linux executables are deployed only in containers, and developers usually don't have access to those containers.
When one of our infrastructure servers crashes, a special script notifies us, and the dump files are written to a shared network folder. To debug these files, either on Linux or in Visual Studio, you need the program that was actually running. It is also useful to debug with exactly the same shared libraries that were used in the deployed container. To get these files, we use another script. First, we copy the dump to the local machine; then we run the script and pass it information about the dump. The script loads the Docker container that was built for the code version in question and extracts from it our server executables as well as certain shared runtime libraries. All of this is needed for gdb. (It avoids compatibility issues in gdb that can arise if the Linux version in WSL is not exactly the same as the deployed one.) While setting up the debugging session, the script writes to ~/.gdbinit to specify that these extracted shared libraries should be used as the system libraries.
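The article does not show the script itself, so the following is only a sketch of the same idea under explicit assumptions. It builds the docker CLI commands (docker create, docker cp, docker rm) that copy files out of an image without running it, preserving their absolute paths under a local directory, and produces a ~/.gdbinit line that uses gdb's "set sysroot" so the libraries named in the core dump are resolved from that directory. The DumpInfo structure, the container name "dump-extract", and the choice of "set sysroot" as the way to mark the libraries as system libraries are all assumptions.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical description of one crash dump; every field is an assumption.
struct DumpInfo {
    std::string image;                    // Docker image built for the crashed code version
    std::string serverBinary;             // absolute path of the server executable inside the image
    std::vector<std::string> sharedLibs;  // absolute paths of runtime libraries inside the image
    std::string outputDir;                // local extraction directory, as seen from WSL
};

// Returns the directory part of an absolute path ("/a/b/c" -> "/a/b").
static std::string ParentDir(const std::string& path) {
    const auto pos = path.rfind('/');
    return pos == std::string::npos ? std::string() : path.substr(0, pos);
}

// Builds shell commands that copy files out of the image without running it.
// Files keep their absolute paths under outputDir, so outputDir can serve as
// gdb's sysroot. Paths are assumed to contain no spaces (no shell quoting).
std::vector<std::string> ExtractionCommands(const DumpInfo& info) {
    const std::string container = "dump-extract";  // hypothetical container name
    std::vector<std::string> cmds;
    cmds.push_back("docker create --name " + container + " " + info.image);

    std::vector<std::string> files{info.serverBinary};
    files.insert(files.end(), info.sharedLibs.begin(), info.sharedLibs.end());
    for (const auto& file : files) {
        cmds.push_back("mkdir -p " + info.outputDir + ParentDir(file));
        cmds.push_back("docker cp " + container + ":" + file + " " + info.outputDir + file);
    }

    cmds.push_back("docker rm " + container);
    return cmds;
}

// Contents for ~/.gdbinit: "set sysroot" makes gdb look up the shared
// libraries recorded in the core dump under the extracted directory instead
// of the WSL distribution's own /lib and /usr/lib.
std::string GdbInitContents(const std::string& outputDir) {
    return "set sysroot " + outputDir + "\n";
}

// Overwrites the given gdbinit file; returns false if it cannot be written.
bool WriteGdbInit(const std::string& gdbinitPath, const std::string& outputDir) {
    std::ofstream out(gdbinitPath, std::ios::trunc);
    out << GdbInitContents(outputDir);
    return static_cast<bool>(out);
}
```

A wrapper would run these commands in order and then call WriteGdbInit with the expanded path of ~/.gdbinit. Note that overwriting that file discards any personal gdb settings it contained.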
Then we switch to Visual Studio, where the fun begins. We load the solution that builds the Windows version of our servers. Then we open a new debugging dialog with the command Debug -> Other Debug Targets -> Debug Linux Core Dump with Native Only. We select the Debug on WSL check box and enter the paths to the dump and to the server binaries (the Linux binaries extracted for WSL!). After that, we just click the Debug button and watch what happens.
Figure: Starting a debugging session in Visual Studio
Visual Studio launches gdb in WSL on its own. After some disk activity, the call stack of the crashed program is displayed and the instruction pointer is positioned at the corresponding line of code. This is truly a brave new world!
Next, we identify the failure itself. We have a failure handler that intercepts the crash event to perform some housekeeping. As a result, on a single-threaded server, the information about the crash itself lies deeper in the call stack, below the handler's frames. But some of our servers are multithreaded, and a crash can occur in any of their threads. The failure handler logs the name of the source file and the line number where the failure occurred, so these data give us the first clue: we look for the place in the call stack that corresponds to the execution of that code.
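To illustrate why the handler's frames sit above the faulting code, here is a minimal sketch of a failure path that records and logs the source file and line before aborting. It covers only explicit failures reported through a macro; intercepting hardware faults such as SIGSEGV would additionally require a signal handler, which is not shown. The names FailureInfo, HandleFailure, and SERVER_FAIL are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical failure record. The article only says that the failure
// handler logs the source file and line number of the failure.
struct FailureInfo {
    const char* file;
    int line;
    std::string message;
};

// One record per thread, so concurrent failures on a multithreaded server
// do not overwrite each other before the process terminates.
thread_local FailureInfo g_lastFailure{nullptr, 0, std::string()};

// Formats the log line, e.g. "world.cpp:42: null entity".
std::string FormatFailure(const FailureInfo& f) {
    return std::string(f.file ? f.file : "<unknown>") + ":" +
           std::to_string(f.line) + ": " + f.message;
}

// Records and logs where the failure happened, then aborts; if core dumps
// are enabled, the OS writes one. Because this function is still executing
// when the process dies, its frames sit on top of the failing thread's
// stack, and the code that detected the failure is deeper down.
[[noreturn]] void HandleFailure(const char* file, int line, const char* message) {
    g_lastFailure.file = file;
    g_lastFailure.line = line;
    g_lastFailure.message = message ? message : "";
    std::fprintf(stderr, "FATAL %s\n", FormatFailure(g_lastFailure).c_str());
    std::fflush(stderr);
    std::abort();
}

// Hypothetical macro that captures the call site automatically.
#define SERVER_FAIL(msg) HandleFailure(__FILE__, __LINE__, (msg))
```

The logged "file:line" is exactly the clue described above: in the debugger, one looks for the frame below HandleFailure whose source location matches it.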
In the old days, that is, a few weeks ago, we would have used gdb to get a backtrace of all threads and then searched the resulting list for the thread most likely to have crashed. For example, a thread that was sleeping probably didn't crash. We look for a stack that contains more than a few frames and shows no sign of being a sleeping thread. Next, we examine the code to understand what the problem is. If it's something simple, you can see it right in the code. If the problem is more complex, we have to resort to gdb's capabilities for inspecting the state of the process.
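The manual triage described above can be expressed as a small filter over already-parsed backtraces. This is a minimal sketch, assuming each thread's frames are available as plain function names (innermost first, as gdb prints them); the list of wait functions, the threshold of three frames, and the names ThreadStack and CrashCandidates are hypothetical choices, not the team's actual tooling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One thread's backtrace, innermost frame first.
struct ThreadStack {
    int id;
    std::vector<std::string> frames;
};

// Hypothetical heuristic: a thread whose innermost frame is a sleep or wait
// call is parked and most likely did not crash.
bool LooksAsleep(const ThreadStack& t) {
    static const char* const kWaitFns[] = {"nanosleep", "pthread_cond_wait",
                                           "pthread_cond_timedwait", "epoll_wait"};
    if (t.frames.empty()) return true;
    for (const char* fn : kWaitFns)
        if (t.frames.front().find(fn) != std::string::npos) return true;
    return false;
}

// Returns IDs of threads worth inspecting: not sleeping and with more than
// minFrames frames. Threads whose stack contains the failure handler come
// first, since that is where the crash was intercepted.
std::vector<int> CrashCandidates(const std::vector<ThreadStack>& threads,
                                 const std::string& handlerFn,
                                 std::size_t minFrames = 3) {
    std::vector<int> withHandler, others;
    for (const auto& t : threads) {
        if (LooksAsleep(t) || t.frames.size() <= minFrames) continue;
        bool hasHandler = false;
        for (const auto& f : t.frames)
            if (f.find(handlerFn) != std::string::npos) { hasHandler = true; break; }
        (hasHandler ? withHandler : others).push_back(t.id);
    }
    withHandler.insert(withHandler.end(), others.begin(), others.end());
    return withHandler;
}
```

A heuristic like this only narrows the list; deciding which candidate actually crashed still requires reading the code, as the paragraph above says.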
But Visual Studio gives us much more powerful capabilities than we had before. In a multithreaded environment, you can open the Threads window during a debugging session and click on threads to view their stacks. This, however, is very similar to the gdb approach, so if you need to examine, say, 50 threads, it can become a rather time-consuming and tedious task. Fortunately, Visual Studio has a tool that makes this much easier: the Parallel Stacks window.
I'll admit that most of us didn't know about Parallel Stacks until Erica Sweet and her team told us about it. If you run the command Debug -> Windows -> Parallel Stacks during a debugging session, a new window opens that shows the call stack of every thread in the process under investigation. It is something like a bird's-eye view of the entire process. You can double-click any stack frame of any thread, and Visual Studio will jump to that frame in both the source code window and the call stack window. This saves us a lot of time.
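Part of what makes a merged view faster than clicking through 50 threads one by one is that threads with the same stack collapse into a single entry. The sketch below illustrates only that basic idea by grouping threads with identical call stacks; it is not how Visual Studio implements Parallel Stacks, which also merges partially shared stacks into a graph. ThreadStack and GroupByStack are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct ThreadStack {
    int id;
    std::vector<std::string> frames;  // innermost frame first
};

// Groups threads whose call stacks are identical, keyed by the stack itself.
// Fifty idle workers waiting on the same queue become one entry, leaving the
// unusual stacks easy to spot.
std::map<std::vector<std::string>, std::vector<int>>
GroupByStack(const std::vector<ThreadStack>& threads) {
    std::map<std::vector<std::string>, std::vector<int>> groups;
    for (const auto& t : threads) groups[t.frames].push_back(t.id);
    return groups;
}
```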
Once we see the code near the crash site, we can examine variables by hovering over them with the mouse, with QuickWatch, or with any of Visual Studio's many other tools. Of course, in release builds many variables are optimized away, but many are not! Using the Visual Studio interface, we can identify the problem much faster than we could with gdb.