Monitoring
K8s monitoring.
metrics-server
Clone metrics-server.
git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
Edit resource-reader.yaml.
nano deploy/1.8+/resource-reader.yaml
Edit the resources section as follows:
...
resources:
- pods
- nodes
- namespaces
- nodes/stats
...
Edit metrics-server-deployment.yaml.
nano deploy/1.8+/metrics-server-deployment.yaml
Edit as follows:
...
containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.3
  command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
  imagePullPolicy: Always
...
Deploy it.
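Deploying amounts to applying the manifests in the directory edited above. A minimal sketch, assuming you are still in the root of the cloned metrics-server repository and that kubectl points at the target cluster:

```shell
# Apply every manifest in the 1.8+ directory, including the two edited files.
kubectl apply -f deploy/1.8+/
```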
Wait a few minutes and run:
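A likely check at this point is kubectl top, which reads from the metrics API that metrics-server provides. A minimal sketch, assuming kubectl is configured for the cluster:

```shell
# Both commands return an error until metrics-server has collected its first samples.
kubectl top nodes
kubectl top pods --all-namespaces
```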
References
https://medium.com/@cagri.ersen/kubernetes-metrics-server-installation-d93380de008
https://github.com/kubernetes-incubator/metrics-server/issues/247
http://d0o0bz.cn/2018/12/deploying-metrics-server-for-kubernetes/
Rancher
Add a cluster in Rancher, then apply the manifest it generates on your cluster.
Also check: https://github.com/rancher/fleet
Audit
SSH to your master node.
Create a policy file:
Paste:
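As an illustration of what such a policy file can contain, here is a minimal sketch that logs request metadata for every request. The file path is a hypothetical default (on a master node something like /etc/kubernetes/audit-policy.yaml is common), and the single catch-all rule is an assumption, not necessarily the policy originally used here.

```shell
# Hypothetical location; override POLICY_FILE to write somewhere else.
POLICY_FILE="${POLICY_FILE:-./audit-policy.yaml}"

# Minimal policy: record metadata (user, verb, resource, timestamp) for every
# request, without request or response bodies.
cat > "$POLICY_FILE" <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
EOF
```

The API server reads this file through its --audit-policy-file flag, and --audit-log-path sets where events are written. The audit.k8s.io/v1 API requires Kubernetes 1.12 or later; older clusters use audit.k8s.io/v1beta1.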
Edit K8s API server config file:
Add:
Restart kubelet:
Tail the log file:
References
https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v4/audit/
Prometheus
Create namespace
Create Prometheus config
Paste:
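For orientation, a minimal sketch of a Prometheus config file follows. The file name prometheus.yml, the 15s intervals, and the single self-scrape job are assumptions chosen for illustration, not the original contents.

```shell
# Hypothetical file name; override CONFIG_FILE to write somewhere else.
CONFIG_FILE="${CONFIG_FILE:-./prometheus.yml}"

cat > "$CONFIG_FILE" <<'EOF'
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often rules are evaluated

scrape_configs:
  # Prometheus scraping its own /metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF
```

Further scrape jobs, such as the exporters covered later on this page, go under scrape_configs.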
Create a ConfigMap from the config file:
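A minimal sketch of this step. The ConfigMap name cm-prometheus is inferred from the cm-prometheus-new name used further down; the namespace monitoring and the file name prometheus.yml are assumptions.

```shell
# Assumed names: namespace "monitoring", config file "prometheus.yml".
# The file name becomes the key inside the ConfigMap.
kubectl -n monitoring create configmap cm-prometheus --from-file=prometheus.yml
```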
If you need to update the ConfigMap...
Edit the file:
Update the ConfigMap:
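One common way to update a ConfigMap in place is to render it locally and replace the live object. A minimal sketch, reusing the assumed names from above (namespace monitoring, ConfigMap cm-prometheus, file prometheus.yml):

```shell
# Render the ConfigMap from the edited file without creating it,
# then replace the existing object with the rendered one.
kubectl -n monitoring create configmap cm-prometheus --from-file=prometheus.yml \
  --dry-run=client -o yaml | kubectl -n monitoring replace -f -
```

The --dry-run=client form needs kubectl 1.18 or later; older releases accept a bare --dry-run.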
Now we need to roll out the new ConfigMap. At the time of this writing (2019-02-15), this is a little tricky. Some options are listed below:
Roll out ConfigMap: option 1 - scale deployment
This is the only way that will "always" work, although there will be a few seconds of downtime:
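A minimal sketch of the scale-down and scale-up, assuming the deployment is named prometheus, lives in a namespace called monitoring, and normally runs one replica:

```shell
# Scaling to zero removes the running pod; scaling back up starts a fresh one
# that mounts the updated ConfigMap.
kubectl -n monitoring scale deployment prometheus --replicas=0
kubectl -n monitoring scale deployment prometheus --replicas=1
```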
Roll out ConfigMap: option 2 - patch the deployment
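Patching works because any change to the pod template makes the Deployment controller roll out new pods. A minimal sketch that builds such a patch, using a made-up annotation key (config-rolled-at) and a hypothetical output file name:

```shell
# A timestamp annotation is a harmless change to spec.template that still
# triggers a rollout. "config-rolled-at" is an illustrative key, not a
# Kubernetes convention.
PATCH=$(printf '{"spec":{"template":{"metadata":{"annotations":{"config-rolled-at":"%s"}}}}}' "$(date +%s)")

# Keep the payload in a file so it can be inspected before use.
printf '%s\n' "$PATCH" > ./rollout-patch.json
```

The payload could then be applied with kubectl -n monitoring patch deployment prometheus -p "$(cat rollout-patch.json)", where the namespace and deployment name are assumptions.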
Roll out ConfigMap: option 3 - create a new ConfigMap
Create a new ConfigMap:
Edit the deployment:
Edit volumes.configMap.name and use cm-prometheus-new. The change will force K8s to create new pods with the new config.
Deploy Prometheus
SSH to the node which will host Prometheus and create a directory to persist its data:
Deploy Prometheus:
Expose Prometheus
Test the deployment
On your workstation access http://YOUR.CLUSTER.IP:30909
References
https://sysdig.com/blog/kubernetes-monitoring-prometheus/
https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
Manifest example
https://gist.github.com/philips/7ddeeb2fdab2ff4e4f8a035fc567f3d0
Grafana
Create namespace
Create Grafana config
Paste:
Create a ConfigMap from the config file:
Create Grafana secrets
Generate base64 strings:
Create Secret:
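The two steps above can be sketched together: encode the credentials, then write them into a Secret manifest. The credentials, the Secret name grafana-credentials, the key names, and the namespace are all placeholders chosen for illustration.

```shell
# printf avoids the trailing newline that a plain echo would add to the encoded value.
# Placeholder credentials; replace them before real use.
GRAFANA_USER_B64=$(printf '%s' 'admin' | base64)
GRAFANA_PASS_B64=$(printf '%s' 'changeme' | base64)

# Unquoted EOF so the two variables are expanded into the manifest.
cat > ./grafana-secret.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana-credentials   # hypothetical name
  namespace: monitoring       # assumed namespace
type: Opaque
data:
  admin-user: ${GRAFANA_USER_B64}
  admin-password: ${GRAFANA_PASS_B64}
EOF
```

The manifest could then be loaded with kubectl apply -f grafana-secret.yaml. Base64 is an encoding, not encryption, so the file should be treated as sensitive.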
Deploy Grafana
SSH to the node which will host Grafana and create a directory to persist its data:
Deploy Grafana:
Expose Grafana
Test the deployment
On your workstation access http://YOUR.CLUSTER.IP:30000
Dashboards
https://grafana.com/dashboards/2115
Prometheus exporters
node-exporter
Create a DaemonSet to ensure all nodes have node-exporter:
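A minimal sketch of such a DaemonSet, written to a local manifest file. The namespace, labels, file name, and image tag are assumptions. Host filesystem mounts are left out for brevity, so filesystem metrics would reflect the container rather than the node.

```shell
cat > ./node-exporter-daemonset.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring        # assumed namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true        # expose port 9100 on each node's own IP
      hostPID: true            # let the exporter see host processes
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.3.1   # example tag
        ports:
        - containerPort: 9100
          name: metrics
EOF
```

It could then be deployed with kubectl apply -f node-exporter-daemonset.yaml.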
Add node-exporter scraper to Prometheus
Edit Prometheus config file:
Add the scraper:
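One possible scrape job is sketched below, written to a separate snippet file whose contents would be pasted under scrape_configs in the Prometheus config. It assumes node-exporter listens on port 9100 on every node's own IP and that Prometheus has RBAC permission to discover nodes. The job name and file name are illustrative.

```shell
# Quoted 'EOF' keeps ${1} literal; it is a Prometheus capture-group reference,
# not a shell variable.
cat > ./node-exporter-scrape.yml <<'EOF'
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Node discovery yields the kubelet address (port 10250);
      # rewrite it to the node-exporter port.
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
EOF
```

After adding the job, the ConfigMap needs to be updated and rolled out as described in the Prometheus section.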
Grafana dashboard
ID: 1860
https://grafana.com/dashboards/1860
kube-state-metrics
Deploy dependencies:
Expose kube-state-metrics
Add Prometheus scraper
Update the ConfigMap:
Roll out ConfigMap:
Grafana dashboard
Dashboard ID: 7249
https://grafana.com/dashboards/7249
Dashboard ID: 747
https://grafana.com/dashboards/747
Grafana panels
nvidia-gpu-exporter
Label your nodes:
Deploy it:
Prometheus scraper:
Update the ConfigMap following the instructions in the Prometheus section above.
Grafana dashboard:
References
https://github.com/mindprince/nvidia_gpu_prometheus_exporter
https://github.com/andreyvelich/nvidia_gpu_prometheus_exporter
prometheus-operator
These instructions do not work properly due to Persistent Volume issues. They are kept here only as a reference.
Install
Create a StorageClass
Create the manifest:
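A minimal sketch of a StorageClass manifest, assuming statically provisioned local volumes are intended. The name local-storage and the file name are hypothetical, and the original manifest may have used a different provisioner.

```shell
cat > ./storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                       # hypothetical name
provisioner: kubernetes.io/no-provisioner   # volumes are created by hand
volumeBindingMode: WaitForFirstConsumer     # bind only once a pod is scheduled
EOF
```

With no dynamic provisioner, a matching PersistentVolume has to exist for every claim, which is one common source of Persistent Volume problems with this kind of setup.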
Deploy it:
Install the helm chart
Create the manifest:
Make sure you have helm and tiller set up.
Deploy prometheus-operator:
Check statuses:
Port forwarding
Prometheus:
Alert manager:
Grafana:
References
https://github.com/helm/charts/tree/master/stable/prometheus-operator
https://github.com/coreos/prometheus-operator
https://akomljen.com/get-kubernetes-cluster-metrics-with-prometheus-in-5-minutes/
https://www.sachsenhofer.io/setup-prometheus-operator-kube-prometheus-kubernetes-cluster/
Uninstall
To uninstall/delete:
Custom values
Check:
https://github.com/helm/charts/tree/master/stable/prometheus-operator#configuration
https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md