Monitoring

Notes on monitoring a Kubernetes (K8s) cluster: metrics-server, audit logging, Prometheus, Grafana, Prometheus exporters, and the prometheus-operator Helm chart.

metrics-server

Clone metrics-server.

git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server

Edit resource-reader.yaml.

nano deploy/1.8+/resource-reader.yaml

Edit the resources section as follows:

...
resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
...

Edit metrics-server-deployment.yaml.

nano deploy/1.8+/metrics-server-deployment.yaml

Edit as follows:

...
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
...

Deploy it.
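A hedged example, assuming the edited manifests under deploy/1.8+/ are applied as-is:

kubectl apply -f deploy/1.8+/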

Wait a few minutes and run:
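For example, to confirm the metrics API is serving data (kubectl top fails until metrics-server is ready):

kubectl top nodes
kubectl top pods --all-namespaces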

References

https://medium.com/@cagri.ersen/kubernetes-metrics-server-installation-d93380de008

https://github.com/kubernetes-incubator/metrics-server/issues/247

http://d0o0bz.cn/2018/12/deploying-metrics-server-for-kubernetes/

Rancher

In the Rancher UI, add a cluster and run the manifest it generates on your cluster.

Also check: https://github.com/rancher/fleet

Audit

SSH to your master node.

Create a policy file:
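For example, assuming the policy will live at /etc/kubernetes/audit-policy.yaml (the path is an assumption):

sudo nano /etc/kubernetes/audit-policy.yaml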

Paste:
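A minimal sketch of a policy, not the exact one from the reference below; it logs request metadata for everything:

# On clusters older than v1.12, use apiVersion: audit.k8s.io/v1beta1
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log who did what to which resource, without request/response bodies.
  - level: Metadata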

Edit K8s API server config file:
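On a kubeadm cluster the API server runs as a static pod, so its manifest is assumed to be at:

sudo nano /etc/kubernetes/manifests/kube-apiserver.yaml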

Add:
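A sketch of what to add (flag values and host paths are assumptions; the policy file and log directory must also be mounted into the pod):

# Under spec.containers[0].command:
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --audit-log-path=/var/log/kubernetes/audit/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100

# Under spec.containers[0].volumeMounts:
    - name: audit-policy
      mountPath: /etc/kubernetes/audit-policy.yaml
      readOnly: true
    - name: audit-logs
      mountPath: /var/log/kubernetes/audit

# Under spec.volumes:
  - name: audit-policy
    hostPath:
      path: /etc/kubernetes/audit-policy.yaml
      type: File
  - name: audit-logs
    hostPath:
      path: /var/log/kubernetes/audit
      type: DirectoryOrCreate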

Restart kubelet:
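For example:

sudo systemctl restart kubelet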

If the changes did not take effect, stop the API server docker container (it will be started automatically):
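For example (the kubelet recreates the static pod):

docker ps | grep kube-apiserver
docker stop CONTAINER_ID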

Tail the log file:
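Assuming the log path configured above:

sudo tail -f /var/log/kubernetes/audit/audit.log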

References

https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v4/audit/

Prometheus

Create namespace
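For example, assuming a namespace named monitoring (the name is an assumption reused in the sketches below):

kubectl create namespace monitoring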

Create Prometheus config

Paste:
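A minimal sketch of prometheus.yml (the file name is an assumption); the scrape jobs for the exporters are added in later sections:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Prometheus scraping itself.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']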

Create a ConfigMap from the config file:
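A hedged example, assuming the prometheus.yml file above and a ConfigMap named cm-prometheus (consistent with the cm-prometheus-new used in option 3 below):

kubectl create configmap cm-prometheus --from-file=prometheus.yml -n monitoring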

If you need to update the ConfigMap...

Edit the file:

Update the ConfigMap:
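For example, regenerating the ConfigMap in place (names as above):

kubectl create configmap cm-prometheus --from-file=prometheus.yml -n monitoring \
  --dry-run -o yaml | kubectl apply -f -

# On newer kubectl, use --dry-run=client instead of --dry-run.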

Now we need to roll out the new ConfigMap. At the time of this writing (2019-02-15), this subject seems to be a little tricky. Please find some options below:

Roll out ConfigMap: option 1 - scale deployment

This is the only way that will "always" work, although there will be a few seconds of downtime:
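For example, assuming the deployment is named prometheus in the monitoring namespace:

kubectl -n monitoring scale deployment prometheus --replicas=0
kubectl -n monitoring scale deployment prometheus --replicas=1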

Roll out ConfigMap: option 2 - patch the deployment
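A sketch of the patch trick from the Stack Overflow thread referenced below: bump a dummy pod-template annotation so the deployment rolls new pods (deployment and namespace names are assumptions):

kubectl -n monitoring patch deployment prometheus -p '{"spec":{"template":{"metadata":{"annotations":{"configmap-version":"'"$(date +%s)"'"}}}}}'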

Roll out ConfigMap: option 3 - create a new ConfigMap

Create a new ConfigMap:
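For example, using the names from the sketches above:

kubectl create configmap cm-prometheus-new --from-file=prometheus.yml -n monitoring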

Edit the deployment:
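For example:

kubectl -n monitoring edit deployment prometheus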

Edit volumes.configMap.name and use cm-prometheus-new. The change will force K8s to create new pods with the new config.

If for any reason you deployed Prometheus with hostNetwork: true, options 2 and 3 will return this error:

0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.

In this case, use option 1.

If you need more info regarding rolling out ConfigMaps, please refer to: https://stackoverflow.com/questions/37317003/restart-pods-when-configmap-updates-in-kubernetes

https://github.com/kubernetes/kubernetes/issues/22368

Deploy Prometheus

SSH to the node which will host Prometheus and create a directory to persist its data:
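For example, assuming a hypothetical /data/prometheus directory:

sudo mkdir -p /data/prometheus
# The prom/prometheus image runs as user nobody (UID 65534).
sudo chown -R 65534:65534 /data/prometheus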

Deploy Prometheus:
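A hedged sketch of the deployment object (names, image tag, node selector and host path are assumptions; see the manifest example in the references for a fuller version). Save it as prometheus-deployment.yaml and apply it with kubectl apply -f prometheus-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      # Pin to the node holding the hostPath data directory.
      nodeSelector:
        kubernetes.io/hostname: YOUR_PROMETHEUS_NODE
      containers:
      - name: prometheus
        image: prom/prometheus:v2.7.1
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: data
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: cm-prometheus
      - name: data
        hostPath:
          path: /data/prometheus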

Expose Prometheus
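A NodePort service sketch matching the 30909 port used in the test below (names are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30909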

Test the deployment

On your workstation, access http://YOUR.CLUSTER.IP:30909

Alternatively you can port forward:
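For example, assuming the service created above:

kubectl -n monitoring port-forward svc/prometheus 9090:9090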

Then access http://localhost:9090

References

https://sysdig.com/blog/kubernetes-monitoring-prometheus/

https://sysdig.com/blog/kubernetes-monitoring-with-prometheus-alertmanager-grafana-pushgateway-part-2/

https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/

Manifest example

https://gist.github.com/philips/7ddeeb2fdab2ff4e4f8a035fc567f3d0

Grafana

Create namespace
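The sketches in this section reuse the monitoring namespace from the Prometheus section (an assumption). Create it if it does not exist yet:

kubectl create namespace monitoring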

Create Grafana config

Paste:
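The original config is not reproduced here; one plausible sketch is a datasource provisioning file (datasources.yml, an assumed name) that points Grafana at the Prometheus service from the previous section:

apiVersion: 1
datasources:
  # Assumes the prometheus service in the monitoring namespace sketched above.
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.monitoring.svc:9090
    isDefault: true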

Create a ConfigMap from the config file:
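A hedged example, assuming the datasources.yml file above and a ConfigMap named cm-grafana (an assumed name):

kubectl create configmap cm-grafana --from-file=datasources.yml -n monitoring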

Create Grafana secrets

Generate base64 strings:
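For example (the values are placeholders):

echo -n 'admin' | base64
echo -n 'YOUR_GRAFANA_PASSWORD' | base64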

Create Secret:
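A sketch, assuming a Secret named grafana-credentials and the base64 strings generated above:

apiVersion: v1
kind: Secret
metadata:
  name: grafana-credentials
  namespace: monitoring
type: Opaque
data:
  admin-username: YWRtaW4=
  admin-password: YOUR_BASE64_PASSWORD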

To retrieve admin username and password, run:
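For example, assuming the Secret sketched above:

kubectl -n monitoring get secret grafana-credentials -o jsonpath='{.data.admin-username}' | base64 --decode; echo
kubectl -n monitoring get secret grafana-credentials -o jsonpath='{.data.admin-password}' | base64 --decode; echo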

Deploy Grafana

SSH to the node that will host Grafana and create a directory to persist its data:
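For example, assuming a hypothetical /data/grafana directory:

sudo mkdir -p /data/grafana
# The grafana/grafana image runs as UID 472.
sudo chown -R 472:472 /data/grafana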

Deploy Grafana:
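A hedged sketch of the deployment, wiring in the ConfigMap, Secret and host path from the previous steps (names and image tag are assumptions). Save it as grafana-deployment.yaml and apply it with kubectl apply -f grafana-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      nodeSelector:
        kubernetes.io/hostname: YOUR_GRAFANA_NODE
      containers:
      - name: grafana
        image: grafana/grafana:6.0.0
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_USER
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: admin-username
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: admin-password
        volumeMounts:
        - name: datasources
          mountPath: /etc/grafana/provisioning/datasources
        - name: data
          mountPath: /var/lib/grafana
      volumes:
      - name: datasources
        configMap:
          name: cm-grafana
      - name: data
        hostPath:
          path: /data/grafana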

Expose Grafana
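A NodePort service sketch matching the 30000 port used in the test below:

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 30000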

Test the deployment

On your workstation, access http://YOUR.CLUSTER.IP:30000

Alternatively you can port forward:
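For example, assuming the service created above:

kubectl -n monitoring port-forward svc/grafana 3000:3000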

Then access http://localhost:3000

Dashboards

https://grafana.com/dashboards/2115

Prometheus exporters

node-exporter

Create a DaemonSet to ensure all nodes have node-exporter:
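A minimal sketch of such a DaemonSet (names and image tag are assumptions; extra mounts for /proc and /sys are omitted for brevity):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.17.0
        ports:
        - containerPort: 9100
          hostPort: 9100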

Add node-exporter scraper to Prometheus

Edit Prometheus config file:

Add the scraper:
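A hedged scrape job for prometheus.yml, assuming node-exporter listens on host port 9100 as in the DaemonSet above and that Prometheus runs with a service account allowed to list nodes; it discovers the nodes and rewrites the kubelet port (10250) to 9100:

  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__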

Grafana dashboard

ID: 1860

https://grafana.com/dashboards/1860

kube-state-metrics

Deploy dependencies:
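A hedged example using the upstream manifests (the manifest directory name varies between releases):

git clone https://github.com/kubernetes/kube-state-metrics.git
cd kube-state-metrics
kubectl apply -f kubernetes/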

Expose kube-state-metrics
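If needed, one option (assuming the upstream manifests put the service in kube-system) is to switch the service to NodePort:

kubectl -n kube-system patch svc kube-state-metrics -p '{"spec": {"type": "NodePort"}}'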

Add Prometheus scraper
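A hedged scrape job for prometheus.yml, assuming the upstream service name and namespace and the default metrics port 8080:

  - job_name: 'kube-state-metrics'
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc:8080']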

Update the ConfigMap:

Roll out ConfigMap:

Grafana dashboard

Dashboard ID: 7249

https://grafana.com/dashboards/7249

Dashboard ID: 747

https://grafana.com/dashboards/747

Grafana panels

nvidia-gpu-exporter

Label your nodes:
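For example; the label key/value are assumptions and must match the nodeSelector used by the exporter's DaemonSet:

kubectl label nodes YOUR_GPU_NODE hardware-type=NVIDIAGPU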

Deploy it:
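The DaemonSet manifest itself is not reproduced here; see the exporter repositories in the references below. Assuming it was saved locally (file name is an assumption):

kubectl apply -f nvidia-gpu-exporter-daemonset.yaml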

Prometheus scraper:
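A hedged scrape job, assuming the exporter's default port 9445 and a host-networked DaemonSet on the GPU node:

  - job_name: 'nvidia-gpu-exporter'
    static_configs:
      - targets: ['YOUR_GPU_NODE_IP:9445']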

Update the ConfigMap following the instructions above.

Grafana dashboard:

References

https://github.com/mindprince/nvidia_gpu_prometheus_exporter

https://github.com/andreyvelich/nvidia_gpu_prometheus_exporter

prometheus-operator

Install

Create a StorageClass

Create the manifest:
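A sketch; the provisioner depends on your cluster, so a no-provisioner class for manually created local PVs is only one plausible choice (storageclass.yaml is an assumed file name):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer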

Deploy it:
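For example:

kubectl apply -f storageclass.yaml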

Install the helm chart

Create the manifest:
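A sketch of a custom-values.yaml with a few common overrides (keys follow the stable/prometheus-operator chart; the values are assumptions):

grafana:
  adminPassword: CHANGE_ME
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 20Gi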

Make sure you have Helm and Tiller set up.

Deploy prometheus-operator:
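For example (Helm 2 syntax, matching the Tiller note above; the release name and namespace are assumptions):

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring -f custom-values.yaml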

If you need to update the custom values, run:
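For example:

helm upgrade prometheus-operator stable/prometheus-operator -f custom-values.yaml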

Check statuses:
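For example:

kubectl -n monitoring get pods
kubectl -n monitoring get prometheuses,alertmanagers,servicemonitors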

Port forwarding

Prometheus:
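A hedged example; the service names below follow the prometheus-operator release name assumed above (run kubectl -n monitoring get svc to list the real ones):

kubectl -n monitoring port-forward svc/prometheus-operator-prometheus 9090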

Alert manager:
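Likewise, assuming the same release name:

kubectl -n monitoring port-forward svc/prometheus-operator-alertmanager 9093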

Grafana:
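And for Grafana (the chart exposes it on service port 80):

kubectl -n monitoring port-forward svc/prometheus-operator-grafana 3000:80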

References

https://github.com/helm/charts/tree/master/stable/prometheus-operator

https://github.com/coreos/prometheus-operator

https://akomljen.com/get-kubernetes-cluster-metrics-with-prometheus-in-5-minutes/

https://www.sachsenhofer.io/setup-prometheus-operator-kube-prometheus-kubernetes-cluster/

Uninstall

To uninstall/delete:
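For example (Helm 2); the chart README notes the CRDs are left behind and have to be removed separately:

helm delete prometheus-operator --purge
kubectl delete crd prometheuses.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com \
  alertmanagers.monitoring.coreos.com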

Custom values

Check:

https://github.com/helm/charts/tree/master/stable/prometheus-operator#configuration

https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
