Debugging GKE unprivileged containers with gdb and nsenter

While I was operating a GKE cluster running busy Puma (Rails) servers, our alerting system reported an incident to my team. The alerts showed that our Ruby processes had unexpectedly stopped doing any work.

What’s the problem?

A stuck process is an infrequent but familiar problem for many people, especially with high-traffic applications. When you need to dig deeply into a process, GDB (or any other tool built on ptrace) can help.

However, for a process running inside a container in a Pod, it is not as simple as attaching gdb from the node to the container process's host PID, or attaching to the target process from within the container. The first attempt produces a warning and the second is denied:

❯❯❯ gcloud beta compute ssh --zone "asia-northeast1-a" "gke-cluster-n1-standard-8-po-a0cea807-5327" --project "hogehoge"
$ gcore 3866
...
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable
❯❯❯ kubectl exec -it api-8bf8f6474-slvkz --container api -- sh
# apk add gdb
# gcore 1
...
Permission denied

Intuitive solution

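The naive solution that comes to mind is to add a capability to the container so that gdb is allowed to attach. This resolves the Permission denied error. In a Pod spec the capability goes under the container's securityContext (the equivalent of Docker's cap_add):

securityContext:
  capabilities:
    add:
    - SYS_PTRACE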

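There’s another way to force a running process to dump a core file: send it SIGQUIT (signal 3). Unless the program installs its own handler, the default action for SIGQUIT is to terminate the process and produce a core dump, so unlike gcore it does not leave the process running.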

# kill -3 23
# ls
core.23.1611217516
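
If no core file appears after the signal, one thing worth checking is the core size limit of the target process. The following is a minimal sketch, assuming a Linux /proc filesystem; core_soft_limit is a hypothetical helper name, not part of any tool used above.

```shell
#!/bin/sh
# Sketch: read the soft "Max core file size" limit of a process from
# /proc/<pid>/limits. A value of 0 means no core file will be written.

core_soft_limit() {
    # Columns: "Max core file size" <soft> <hard> <units>; the soft limit is field 5.
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Demonstrated on the current shell; inside the container, pass the target PID instead.
core_soft_limit "$$"

# Where the kernel writes core files (a path pattern, or a pipe to a helper program):
#   cat /proc/sys/kernel/core_pattern
```

A limit of 0 would explain a missing core file even though the signal was delivered.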

How to solve the problem?

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable

So, let’s tackle this warning. It says that gdb, running on the host, and the target process, running in the container, belong to different PID namespaces.
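
One way to confirm the mismatch is to compare the namespace identifiers that Linux exposes under /proc. This is a minimal sketch; same_pid_ns is a hypothetical helper name, and reading another process's ns entry generally needs root on the node.

```shell
#!/bin/sh
# Sketch: check whether two processes share a PID namespace by comparing
# the /proc/<pid>/ns/pid symlink targets (for example "pid:[4026531836]").

same_pid_ns() {
    # readlink prints the namespace identifier that the symlink points to.
    [ "$(readlink "/proc/$1/ns/pid")" = "$(readlink "/proc/$2/ns/pid")" ]
}

# A process always shares a PID namespace with itself.
if same_pid_ns "$$" "$$"; then
    echo "same PID namespace"
else
    echo "different PID namespaces"
fi

# On the node, as root, compare the shell with the container process:
#   same_pid_ns "$$" 3866 || echo "different PID namespaces"
```

Run on the node against the container process's host PID, the comparison would be expected to fail, which matches what gdb reports.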

nsenter is a command that runs a program in the namespaces of another process. With it, we can start a command from the host and have it execute inside the container's namespaces.
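
As a rough sketch of what this looks like, the helper below builds an nsenter command line that runs gcore inside the container's PID and mount namespaces. The function name nsenter_gcore_cmd is hypothetical, and the PIDs are the ones from the transcripts above, used here only for illustration. Because the mount namespace is entered as well, this assumes gdb is already installed in the container image.

```shell
#!/bin/sh
# Sketch: build the nsenter command line that runs gcore inside a container's
# PID and mount namespaces. The command is printed, not executed, so it can
# be reviewed before running it as root on the node.

nsenter_gcore_cmd() {
    host_pid=$1    # PID of the container process as seen on the node (e.g. 3866)
    inner_pid=$2   # PID of the same process as seen inside the container (e.g. 1)
    echo "nsenter --target $host_pid --pid --mount gcore $inner_pid"
}

nsenter_gcore_cmd 3866 1

# To execute on the node as root:
#   eval "$(nsenter_gcore_cmd 3866 1)"
```

Entering the mount namespace together with the PID namespace matters because gdb reads thread information from /proc, and that has to be the container's /proc for the PIDs to line up.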

Wrapping up

Using nsenter from the node to run gdb against the process in the container is the most desirable approach, since it works on the running container without changing the Pod spec. The explanation in debugging-docker-containers-with-gdb-and-nsenter helped me debug this way even in the GKE environment.
