ArgoCD — Custom actions on resources
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.
For the uninitiated, Argo CD is a continuous delivery tool that deploys Kubernetes manifests from Git to a cluster. It integrates with Kubernetes tooling such as Helm and Kustomize, as well as bare manifests, which makes it one of the best CD tools for applications deployed into Kubernetes.
We use Argo CD to deploy applications onto the cluster. Argo CD has a very good UI, which we also provide to devs so they can visualise their resources and perform basic actions such as restarting their deployments when needed. BTW, Argo also supports RBAC, which helps to give granular permissions to logged-in users.
This is all good, but we had a problem with the restart action: the time it took to rotate all pods. The restart action updates an annotation on the pod spec, which means a new ReplicaSet is created and new pods come up under it. This rollout is governed by two parameters in the Deployment spec, maxSurge and maxUnavailable. By default maxSurge is 25%, meaning the controller creates at most 25% extra pods at a time during rotation. For production deployments with a large number of pods and conservative health check thresholds, a full rotation therefore takes a long time. But we can't simply set maxSurge to 100% permanently, because a bad deployment would then produce a large number of failing pods at once.
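For reference, here is a minimal sketch of where these two parameters live in a Deployment manifest. The name example-app and the replica count are hypothetical, the 25% values are the Kubernetes defaults written out explicitly, and the rest of the manifest (selector, pod template) is left out.

```yaml
# Fragment of a Deployment manifest; selector and pod template omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app        # hypothetical name
spec:
  replicas: 40             # hypothetical size
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # Kubernetes default; 10 extra pods for 40 replicas
      maxUnavailable: 25%  # Kubernetes default; 10 pods for 40 replicas
```

With these values, at most 10 new pods can exist above the desired count at any moment, so the rotation proceeds in several waves, each gated by readiness checks.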
We wanted a way to rotate pods quickly when needed, but without permanently altering the default rollout behaviour. Since devs are already familiar with the Argo UI, it would be a plus if the solution had the same UX as the existing restart action. Welcome custom resource actions 🥳
From https://argo-cd.readthedocs.io/en/stable/operator-manual/resource_actions/#custom-resource-actions
Argo CD supports custom resource actions written in Lua. This is useful if you:
- Have a custom resource for which Argo CD does not provide any built-in actions.
- Have a commonly performed manual task that might be error prone if executed by users via kubectl.
You can define your own custom resource actions in the argocd-cm ConfigMap.
So we ended up trying out a custom restart action on the Deployment resource, which does the following:
- Sets the maxSurge value to 100%
- Updates an annotation in the pod spec
How is this different from directly setting the value in Git? Because we set this value from the action, the Argo app goes out of sync: there is a diff between the spec defined in Git and the live spec.
- If you have Argo app self heal enabled, it automatically sets this back to the value defined in Git.
- If you do not have self heal enabled, the next Argo app sync triggered by GitHub overrides the value we set.
The beauty of this process is that the sync that reverts the diff doesn't restart the deployment again, because that change doesn't touch the pod spec.
There are two parts to defining a custom action on a resource: discovery and action.
- discovery.lua: declares which actions are available for the given resource type.
- action.lua: defines the logic for the declared action.
The actual deployment of this action is simple. Just edit the argocd-cm ConfigMap in your Argo CD installation and add the discovery and action scripts under the key resource.customizations.actions.apps_Deployment, which follows the format resource.customizations.actions.<apiGroup_Kind>.
In our case, the kind is Deployment and it belongs to the apps apiGroup. If you are unsure about this, you can always get this data from kubectl:
kubectl api-resources | grep -i Deployment
deployments deploy apps/v1 true Deployment
or through the api reference doc for your cluster version.
The discovery.lua script must return a table where the key name represents the action name. You can optionally include logic to enable or disable certain actions based on the current object state. Each action name must be represented in the list of definitions with an accompanying action.lua script to control the resource modifications. The obj is a global variable which contains the resource. Each action script must return an optionally modified version of the resource.
In our case, we added a new action named restart-100%-in-1-batch. Discovery returns it, and the action logic sets the Deployment rollout strategy's maxSurge to the value of our choice and adds the annotation that initiates the restart.
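A minimal sketch of what such an argocd-cm entry could look like is shown here. It is not the exact script from our setup: the argocd namespace, the kubectl.kubernetes.io/restartedAt annotation key (the one kubectl rollout restart uses) and the nil checks are assumptions, and the script assumes the Deployment uses the RollingUpdate strategy.

```yaml
# Sketch only; adjust namespace and merge into your existing argocd-cm data.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd   # assumption: default install namespace
data:
  resource.customizations.actions.apps_Deployment: |
    discovery.lua: |
      actions = {}
      actions["restart-100%-in-1-batch"] = {}
      return actions
    definitions:
    - name: restart-100%-in-1-batch
      action.lua: |
        local os = require("os")
        -- Assumes a RollingUpdate strategy; create missing tables first.
        if obj.spec.strategy == nil then
          obj.spec.strategy = {}
        end
        if obj.spec.strategy.rollingUpdate == nil then
          obj.spec.strategy.rollingUpdate = {}
        end
        obj.spec.strategy.rollingUpdate.maxSurge = "100%"
        -- Assumed annotation key: the same one kubectl rollout restart sets.
        if obj.spec.template.metadata == nil then
          obj.spec.template.metadata = {}
        end
        if obj.spec.template.metadata.annotations == nil then
          obj.spec.template.metadata.annotations = {}
        end
        obj.spec.template.metadata.annotations["kubectl.kubernetes.io/restartedAt"] = os.date("!%Y-%m-%dT%XZ")
        return obj
```

The timestamp changes on every invocation, so each click on the action changes the pod template and triggers a fresh rollout.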
Once you update the argocd-cm ConfigMap, restart argocd-server so that the changes are picked up. You should then see the extra action on all Deployment resources in the UI.
NOTE: There is an active bug in Argo that makes any new custom actions override the upstream-defined ones. It is being tracked here.
To work around this, we need to add all the upstream defined actions on resources along with our custom actions. You can find an example full config reference here for deployment.
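As a rough sketch of that workaround, the discovery script can list the upstream restart action next to the custom one, with a matching definition for each. The restart body below is an assumption modeled on the kubectl rollout restart annotation; compare it with the upstream definitions in the Argo CD repository for your version, and add any other upstream actions you rely on.

```yaml
# Fragment of the argocd-cm data section (sketch, not a full ConfigMap).
resource.customizations.actions.apps_Deployment: |
  discovery.lua: |
    actions = {}
    actions["restart"] = {}                   -- upstream action, re-declared
    actions["restart-100%-in-1-batch"] = {}   -- custom action
    return actions
  definitions:
  - name: restart
    action.lua: |
      local os = require("os")
      if obj.spec.template.metadata == nil then
        obj.spec.template.metadata = {}
      end
      if obj.spec.template.metadata.annotations == nil then
        obj.spec.template.metadata.annotations = {}
      end
      -- Assumed annotation key, as set by kubectl rollout restart.
      obj.spec.template.metadata.annotations["kubectl.kubernetes.io/restartedAt"] = os.date("!%Y-%m-%dT%XZ")
      return obj
  # The custom action's definition stays in this list as another item.
```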
Also check out the official Argo project that helps extend multiple core functions in Argo CD: https://github.com/argoproj-labs/argocd-extensions