Debugging namespace deletion issue in Kubernetes
I got a support ticket from a developer about their job breaking. When I checked, the issue was a namespace stuck in the Terminating state forever. A namespace stuck in termination is one of the most classic problems faced by every Kubernetes engineer out there. Yet the reasons are not always the same.
This article is about my experience debugging a namespace deletion issue in Kubernetes. Just one of many fun moments in the day-to-day life of a Kubernetes engineer :)
This article has 3 sections:
- Quick workaround
- Debugging the underlying issue
- Namespace deletion under the hood
Workaround
When I exported the namespace, its status showed that there was an issue with one of the objects:
unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
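To make "the status showed" a bit more concrete, here is a minimal sketch of pulling the active conditions out of an exported namespace. The sample document is hypothetical and trimmed down (it only mimics the JSON printed by `kubectl get namespace <name> -o json`), and `failing_conditions` is a helper name I made up for illustration.

```python
import json


def failing_conditions(namespace: dict) -> list:
    """Return (type, message) for every condition whose status is "True"."""
    conditions = namespace.get("status", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") == "True"
    ]


# Hypothetical, trimmed-down export of a stuck namespace.
exported = json.loads("""
{
  "metadata": {"name": "demo"},
  "spec": {"finalizers": ["kubernetes"]},
  "status": {
    "phase": "Terminating",
    "conditions": [
      {"type": "NamespaceDeletionDiscoveryFailure", "status": "True",
       "message": "unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request"},
      {"type": "NamespaceDeletionContentFailure", "status": "False",
       "message": ""}
    ]
  }
}
""")

for cond_type, message in failing_conditions(exported):
    print(f"{cond_type}: {message}")
```

Only conditions with status "True" are reported, since those are the ones that describe something still blocking the deletion.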
I tried to find out which service uses, creates, or manages this API using:
~ k api-resources
The results didn't include the external.metrics.k8s.io/v1beta1 resource, and they also carried a message that the complete list could not be retrieved.
Since this was blocking the developer's pipeline, I had to use a cheap workaround and delete the namespace forcefully with a small script.
I first tried patching/editing the spec to remove the finalizers, but that didn't work. Finally, the script worked for me. Essentially, what it did was remove the finalizers from the namespace object. More about finalizers here.
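The script itself is not reproduced in this text, so what follows is only a sketch of a common form of this workaround, not necessarily what I ran. It assumes the usual approach of sending the namespace, with `spec.finalizers` emptied, to the namespace's `finalize` subresource. The helper names `without_finalizers` and `finalize_path` and the sample namespace are hypothetical.

```python
import copy
import json


def without_finalizers(namespace: dict) -> dict:
    """Return a copy of the namespace manifest with spec.finalizers emptied."""
    stripped = copy.deepcopy(namespace)
    stripped.setdefault("spec", {})["finalizers"] = []
    return stripped


def finalize_path(name: str) -> str:
    """API path of the namespace's finalize subresource."""
    return f"/api/v1/namespaces/{name}/finalize"


# Hypothetical stuck namespace, trimmed to the relevant fields.
stuck = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "demo"},
    "spec": {"finalizers": ["kubernetes"]},
}

payload = without_finalizers(stuck)
print(finalize_path(payload["metadata"]["name"]))
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and sent to the printed path, for example with `kubectl replace --raw <path> -f <file>`. Going through the subresource rather than a normal patch would be consistent with plain patching/editing having no effect. Keep in mind that this skips the cleanup the finalizer was waiting for, so leftover objects are possible.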
Debugging
Now, moving on to fix the actual issue...
After some brief ducking (DuckDuckGo FTW), I found that this API is installed by an external metrics adapter. Found this: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/deploy/manifests/custom-metrics-apiservice.yaml#L37 We were testing external metric sources, and my colleagues had tried out Keda on the cluster as well. This also made me realise why k api-resources didn't list it: it is served by an aggregated API server. The error itself was saying "unable to retrieve the complete list of server APIs" 🤦‍♂️
~ k get APIService v1beta1.external.metrics.k8s.io
NAME                              SERVICE                                AVAILABLE                  AGE
v1beta1.external.metrics.k8s.io   keda/keda-operator-metrics-apiserver   False (MissingEndpoints)   13h
This made it clear that the issue was with the Keda controller deployed in the keda namespace. Upon checking, there was a problem with its deployment, and once the Keda controller was fixed, the namespace deletion issue vanished.
~ k get APIService v1beta1.external.metrics.k8s.io
NAME                              SERVICE                                AVAILABLE   AGE
v1beta1.external.metrics.k8s.io   keda/keda-operator-metrics-apiserver   True        13h
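In hindsight, checking every APIService at once would have pointed at the culprit faster. Below is a small sketch that scans an APIService list (shaped like the JSON from `kubectl get apiservice -o json`) and reports the ones whose Available condition is not "True". The sample data is hypothetical and trimmed, and `unavailable_apiservices` is a made-up helper name.

```python
def unavailable_apiservices(apiservice_list: dict) -> list:
    """Return (name, backing service, reason) for APIServices that are not Available."""
    broken = []
    for item in apiservice_list.get("items", []):
        name = item["metadata"]["name"]
        # Built-in APIs have no backing service; kubectl shows them as "Local".
        svc = item.get("spec", {}).get("service")
        backing = f'{svc["namespace"]}/{svc["name"]}' if svc else "Local"
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Available" and cond.get("status") != "True":
                broken.append((name, backing, cond.get("reason", "")))
    return broken


# Hypothetical, trimmed-down APIService list.
sample = {
    "items": [
        {
            "metadata": {"name": "v1.apps"},
            "spec": {},
            "status": {"conditions": [
                {"type": "Available", "status": "True", "reason": "Local"},
            ]},
        },
        {
            "metadata": {"name": "v1beta1.external.metrics.k8s.io"},
            "spec": {"service": {"namespace": "keda",
                                 "name": "keda-operator-metrics-apiserver"}},
            "status": {"conditions": [
                {"type": "Available", "status": "False",
                 "reason": "MissingEndpoints"},
            ]},
        },
    ]
}

for name, backing, reason in unavailable_apiservices(sample):
    print(f"{name} -> {backing} ({reason})")
```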
Peek under the hood
How does namespace deletion work under the hood? 🤔
This took me to the namespace package in the Kubernetes code repo:
kubernetes/pkg/controller/namespace
.
├── OWNERS
├── config
│   ├── OWNERS
│   ├── doc.go
│   ├── types.go
│   ├── v1alpha1
│   │   ├── conversion.go
│   │   ├── defaults.go
│   │   ├── doc.go
│   │   ├── register.go
│   │   ├── zz_generated.conversion.go
│   │   └── zz_generated.deepcopy.go
│   └── zz_generated.deepcopy.go
├── deletion
│   ├── namespaced_resources_deleter.go # File of our interest :p
│   ├── namespaced_resources_deleter_test.go
│   ├── status_condition_utils.go
│   └── status_condition_utils_test.go
├── doc.go
└── namespace_controller.go # Main controller

3 directories, 17 files
namespace_controller.go spins up workers that read namespaces from a queue. Each worker picks a namespace from the queue and syncs its status (think control loops), and keeps doing so until it receives a stop signal on its control channel.
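The worker pattern described above can be sketched as a toy model in Python. This is not the controller's Go code: a `threading.Event` stands in for the stop channel, a plain `queue.Queue` stands in for the controller's work queue, and retries on failure are left out. All names here (`worker`, `run`, the namespace keys) are made up for illustration.

```python
import queue
import threading


def worker(work_queue, stop, sync, synced):
    """Pull namespace keys off the queue and sync them until told to stop."""
    while not stop.is_set():
        try:
            key = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing queued right now; re-check the stop signal
        try:
            sync(key)
            synced.append(key)
        finally:
            work_queue.task_done()


def run(keys, workers=2, sync=lambda key: None):
    """Start the workers, wait for the queue to drain, then signal stop."""
    work_queue = queue.Queue()
    stop = threading.Event()  # stands in for the Go stop channel
    synced = []
    for key in keys:
        work_queue.put(key)
    threads = [
        threading.Thread(target=worker, args=(work_queue, stop, sync, synced))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    work_queue.join()  # block until every queued key has been processed
    stop.set()
    for t in threads:
        t.join()
    return sorted(synced)


print(run(["team-a", "team-b", "team-c"]))
```

Each worker only exits between items, so a sync that is already in flight is allowed to finish before the worker notices the stop signal.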
There are multiple namespace conditions defined in deletion_controller as below:
The worker calls the deletion controller to determine the status of the namespace and sets its conditions accordingly on every sync cycle. The entire call stack is roughly as below:
The issue in my case was that when the deletion controller tried to discover all the resources, the aggregated API server (Keda in this case) didn't respond with its resources, leading to a discovery failure.
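Here is a toy model of that failure mode, under my own simplifying assumptions: discovery asks each API group for its resources, and a group that cannot answer is recorded as a failure instead of aborting the whole pass. The function names, the group table, and the shape of the returned condition are illustrative and not copied from the Kubernetes source. Only the condition type name `NamespaceDeletionDiscoveryFailure` is a real Kubernetes identifier.

```python
def discover(groups):
    """Ask every API group for its resources; collect failures instead of aborting."""
    resources, failed = [], {}
    for group_version, fetch in groups.items():
        try:
            resources.extend(fetch())
        except Exception as err:
            failed[group_version] = str(err)
    return resources, failed


def delete_all_content(groups):
    """Return (resources that can be cleaned up, conditions to set on the namespace)."""
    resources, failed = discover(groups)
    conditions = []
    if failed:
        conditions.append({
            "type": "NamespaceDeletionDiscoveryFailure",
            "status": "True",
            "message": "; ".join(f"{gv}: {msg}" for gv, msg in failed.items()),
        })
    return resources, conditions


def unavailable():
    # Stands in for an aggregated API server with no healthy endpoints.
    raise RuntimeError("the server is currently unable to handle the request")


# Hypothetical discovery table: two healthy groups and one broken aggregated API.
groups = {
    "v1": lambda: ["pods", "configmaps"],
    "apps/v1": lambda: ["deployments"],
    "external.metrics.k8s.io/v1beta1": unavailable,
}

resources, conditions = delete_all_content(groups)
print(resources)
print(conditions[0]["message"])
```

In this model, a non-empty conditions list means the namespace cannot be considered fully cleaned up, which matches what I saw: the namespace stayed in Terminating until the broken API group became available again.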
Wow, this code exploration felt very good 🥳 Ending this article with the hope that I'll be writing more of these on various Kubernetes objects.