Debugging GKE unprivileged containers with gdb and nsenter

Apr 4, 2021

While I was operating a GKE cluster running busy Puma (Rails) servers, our alert system reported an incident to my team: our Ruby processes had unexpectedly stopped doing any work.

What’s the problem?

A stuck process is rare, but it is a well-known problem for many people, especially with high-traffic applications. When it comes to debugging a process in depth, a debugger such as GDB (or any other tool that uses ptrace) can help.

However, when the process runs inside a container of a Pod, debugging is harder. Neither attaching gdb to the container process's host PID nor attaching to the target process from within the container works as you might expect.

If you log in to the node and run gcore (a script shipped with gdb that attaches gdb to dump a core) against the host PID of the container process (here, 3866), it prints the following warning:

❯❯❯ gcloud beta compute ssh --zone "asia-northeast1-a" "gke-cluster-n1-standard-8-po-a0cea807-5327" --project "hogehoge"
$ gcore 3866
...
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable
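The same process has one PID on the host and a different PID inside its container. This is a minimal sketch of how to see the mapping on the node. It assumes a Linux 4.1+ kernel, which exposes an NSpid: line in /proc/<pid>/status. The helper name nspids is hypothetical:

```shell
#!/bin/sh
# nspids PID: print the PID of process PID as seen from each PID namespace
# it belongs to. The leftmost value is relative to the namespace that mounted
# /proc (the host, when run on the node); the rightmost is the PID inside the
# innermost namespace (the container).
# Hypothetical helper; the NSpid: field requires Linux 4.1 or later.
nspids() {
  # Drop the "NSpid:" label and print the remaining PIDs space-separated
  awk '/^NSpid:/ { $1 = ""; sub(/^ /, ""); print }' "/proc/$1/status"
}

# Example on the node: for a container's main process this would be
# expected to print something like "3866 1".
# nspids 3866
```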

You can install gdb in the container and attach it to the process directly. However, this won't work in an unprivileged container or in one that lacks the CAP_SYS_PTRACE capability. It fails with the following error:

❯❯❯ kubectl exec -it api-8bf8f6474-slvkz --container api -- sh
# apk add gdb
# gcore 1
...
Permission denied
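To confirm from inside the container that CAP_SYS_PTRACE is really missing, you can decode the CapEff hex mask in /proc/<pid>/status. Here is a minimal sketch for the container's sh. The function names are hypothetical. CAP_SYS_PTRACE is capability number 19 in linux/capability.h:

```shell
#!/bin/sh
# capset_has HEX BIT: succeed if capability number BIT is set in the hex
# mask HEX (the format of CapEff/CapPrm/CapBnd in /proc/<pid>/status).
capset_has() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# has_sys_ptrace [PID]: check the effective capability set of PID
# (default: the current shell) for CAP_SYS_PTRACE (number 19).
has_sys_ptrace() {
  capeff=$(awk '/^CapEff:/ { print $2 }' "/proc/${1:-$$}/status")
  [ -n "$capeff" ] || return 1   # no such process or unreadable status
  capset_has "$capeff" 19
}

# Inside an unprivileged container this would be expected to print "no":
# has_sys_ptrace && echo yes || echo no
```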

Intuitive solution

The naive solution that comes to mind is to grant the container the additional capability, CAP_SYS_PTRACE, so that gdb can attach. This resolves the Permission denied error.

The problem is that it requires lifting a reasonable security restriction and restarting the container. This is simple, but not desirable.

securityContext:
  capabilities:
    add:
      - SYS_PTRACE

There's another way: force the running process to dump core. SIGQUIT is the conventional signal to send to a program when you want it to produce a core dump.

I gave up on this approach because a core dump produced this forceful way can be partially broken, and sometimes I couldn't get a Ruby backtrace out of it.

# kill -3 23
# ls
core.23.1611217516

How to solve the problem?

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable

So, let's tackle this warning. It says that the target process (in the container) and the debugger (on the host) belong to different PID namespaces.

Linux namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. Namespaces underpin container systems such as Docker, on which our Kubernetes cluster depends.
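Every namespace a process belongs to is exposed as a symlink under /proc/<pid>/ns/. In practice, two processes share a namespace when their links resolve to the same "type:[inode]" value. This is a minimal sketch of that check, which is what gdb's warning is about for the pid type. The helper names ns_id and same_ns are hypothetical:

```shell
#!/bin/sh
# ns_id PID TYPE: print the namespace identifier, e.g. "pid:[4026531836]",
# for namespace TYPE (pid, mnt, net, ...) of process PID.
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# same_ns PID1 PID2 TYPE: succeed if both processes are in the same
# namespace of the given TYPE.
same_ns() {
  [ "$(ns_id "$1" "$3")" = "$(ns_id "$2" "$3")" ]
}

# On the node (as root), comparing your shell with the container process
# would be expected to report different PID namespaces:
# same_ns $$ 3866 pid || echo "different PID namespaces"
```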

nsenter is a command that runs a program in the namespaces of another process. With it, we can start a command from the host and have it run inside the container's namespaces.

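Note that running gdb this way requires entering the mount, network, and PID namespaces of the target process. Each namespace wraps a particular global system resource in an abstraction. For example, the PID namespace lets gdb see the same process IDs as the target. The mount namespace gives gdb the container's filesystem view, including its /proc and the binaries and libraries that gdb reads symbols from.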
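Putting the pieces together, an nsenter invocation from the node might look like the helper below. It is a sketch, not necessarily the exact command I ran. It assumes root on the node, util-linux nsenter, and gdb (which provides gcore) installed inside the container image, as in the apk add gdb step. The function name gcore_in_container is hypothetical. After the mount namespace is entered, gcore is resolved from the container's filesystem. After the PID namespace is entered, PID 1 refers to the container's main process. nsenter does not enter a user namespace here, so it keeps the caller's capabilities. The container's missing CAP_SYS_PTRACE should therefore not get in the way.

```shell
#!/bin/sh
# gcore_in_container HOST_PID [CONTAINER_PID]
# Hypothetical helper: dump a core of a containerized process from the node.
gcore_in_container() {
  host_pid="$1"          # PID of the container process as seen on the node
  ctr_pid="${2:-1}"      # the same process as seen inside the container
  if [ -z "$host_pid" ] || [ ! -d "/proc/$host_pid" ]; then
    echo "usage: gcore_in_container HOST_PID [CONTAINER_PID]" >&2
    return 1
  fi
  # Enter the target's mount, network, and PID namespaces. With --pid,
  # nsenter forks by default, so gcore itself becomes a member of the
  # container's PID namespace and "$ctr_pid" refers to the target there.
  nsenter --target "$host_pid" --mount --net --pid -- gcore "$ctr_pid"
}

# Example, from a root shell on the node:
# gcore_in_container 3866
```

Source the function in a root shell (for example, after sudo -i), because sudo cannot call a shell function directly. Swapping gcore for gdb -p "$ctr_pid" gives an interactive session instead of a core file.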

Wrapping up

Using nsenter to run gdb against the process in the container turned out to be the most desirable approach: it needs neither an extra capability nor a container restart. The explanation in debugging-docker-containers-with-gdb-and-nsenter helped me debug even in the GKE environment.

Written by Yohei Yoshimuta