Node crash recovery

To speed up node crash recovery process, edit kube-controller-manager pod:

kubectl get -n kube-system pod | grep kube-controller-manager | cut -f1 -d ' ' | xargs kubectl edit -n kube-system pod

Add/edit the following parameters:

  • --pod-eviction-timeout=30s (default 5m0s)

  • --node-monitor-period=2s (default 5s)

  • --node-monitor-grace-period=16s (default 40s)

  • --pod-eviction-timeout=30s (default 5m)

  • --node-status-update-frequency

And, of course, you can always have your deployments with replica 2 and service will be up even if one node goes down.

References

https://stackoverflow.com/questions/47317682/kubernetes-node-shutdown-crash-recovery

https://github.com/kubernetes/kubernetes/issues/55713

Last updated