# Monitoring

## metrics-server

Clone metrics-server.

```bash
git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
```

Edit resource-reader.yaml.

```bash
nano deploy/1.8+/resource-reader.yaml
```

Edit the resources section as follows:

```
...
resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
...
```

Edit metrics-server-deployment.yaml.

```bash
nano deploy/1.8+/metrics-server-deployment.yaml
```

Edit as follows:

```
...
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
...
```

Deploy it.

```bash
kubectl apply -f deploy/1.8+/
```

Wait a few minutes and run:

```bash
kubectl top node
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/YOUR-NAMESPACE/pods" | jq
```
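The metrics API reports memory in kibibytes (values like `16305556Ki`). A quick sanity check of such a value, converting Ki to GiB with `awk` (the sample value is illustrative):

```bash
# Convert a kubelet memory reading from Ki to GiB (sample value assumed).
mem_ki=16305556
awk -v ki="$mem_ki" 'BEGIN { printf "%.1f GiB\n", ki / 1048576 }'
# prints: 15.6 GiB
```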

### References

<https://medium.com/@cagri.ersen/kubernetes-metrics-server-installation-d93380de008>

<https://github.com/kubernetes-incubator/metrics-server/issues/247>

<http://d0o0bz.cn/2018/12/deploying-metrics-server-for-kubernetes/>

## Rancher

```bash
docker run \
  -tid \
  --name=rancher \
  --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  rancher/rancher:latest
```

Add a cluster and apply the manifest it generates on your cluster.

Also check: <https://github.com/rancher/fleet>

## Audit

SSH to your master node.

Create a policy file:

```bash
mkdir /etc/kubernetes/policies
nano /etc/kubernetes/policies/audit-policy.yaml
```

Paste:

```
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
```
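Logging everything at `Metadata` keeps the log compact. If you later need more detail for specific resources, rules can mix levels; a sketch (the first matching rule wins, so the catch-all goes last):

```
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Never log secret payloads; metadata only.
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Full request/response bodies for pods.
- level: RequestResponse
  resources:
  - group: ""
    resources: ["pods"]
# Everything else at Metadata.
- level: Metadata
```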

Edit the K8s API server manifest:

```bash
nano /etc/kubernetes/manifests/kube-apiserver.yaml
```

Add:

```
...
spec:
  containers:
  - command:
    - kube-apiserver
...
    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
    - --audit-log-path=/var/log/apiserver/audit.log
    - --audit-log-format=json
...
    volumeMounts:
...
    - mountPath: /etc/kubernetes/policies
      name: policies
      readOnly: true
...
  volumes:
...
  - hostPath:
      path: /etc/kubernetes/policies
      type: DirectoryOrCreate
    name: policies
```

Restart kubelet:

```bash
systemctl restart kubelet
```

{% hint style="info" %}
If the changes did not take effect, stop the API server Docker container (it will be restarted automatically):

```bash
docker stop $(docker ps | grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}')
```

{% endhint %}

Tail the log file:

```bash
docker exec -it $(docker ps |grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}') tail -f /var/log/apiserver/audit.log
```
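The `docker ps | grep | awk '{print $1}'` pipeline above simply takes the first column (the container ID) of the matching row; a minimal reproduction with simulated output (ID and names illustrative):

```bash
# Simulated `docker ps` line; the real one has more columns, but the
# container ID is always first, which is all awk needs.
ps_line='0a1b2c3d4e5f   k8s.gcr.io/kube-apiserver   k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system'
echo "$ps_line" | grep "k8s_kube-apiserver" | awk '{print $1}'
# prints: 0a1b2c3d4e5f
```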

### References

<https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v4/audit/>

## Prometheus

### Create namespace

```bash
kubectl create namespace monitoring
```

### Create Prometheus config

```bash
nano prometheus.yml
```

Paste:

```
global:
  scrape_interval:     15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:

  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
```

{% hint style="info" %}
Prometheus config file example: <https://github.com/prometheus/prometheus/blob/master/docs/getting_started.md>
{% endhint %}

Create a ConfigMap from the config file:

```bash
kubectl -n monitoring create configmap cm-prometheus --from-file prometheus.yml
```

### If you need to update the ConfigMap...

Edit the file:

```bash
nano prometheus.yml
```

Update the ConfigMap:

```bash
kubectl -n monitoring \
  create configmap cm-prometheus \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
```

Now we need to roll out the new ConfigMap. At the time of this writing (2019-02-15), this subject is a little tricky. Some options are below:

**Roll out ConfigMap: option 1 - scale deployment**

This is the only way that will "always" work, although there will be a few seconds of downtime:

```bash
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
```

**Roll out ConfigMap: option 2 - patch the deployment**

```bash
kubectl -n monitoring \
  patch deployment prometheus \
  -p '{"spec":{"template":{"metadata":{"labels":{"date":"2019-02-15"}}}}}'
```
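Since the patch only triggers a rollout when the label value actually changes, you can generate the date at run time so repeated invocations always differ (a small wrapper around the command above; the `date` label name is just the one used in option 2):

```bash
# Build the patch payload with today's date; any change to the pod
# template labels forces Kubernetes to roll new pods.
patch=$(printf '{"spec":{"template":{"metadata":{"labels":{"date":"%s"}}}}}' "$(date +%F)")
echo "$patch"
# then: kubectl -n monitoring patch deployment prometheus -p "$patch"
```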

**Roll out ConfigMap: option 3 - create a new ConfigMap**

Create a new ConfigMap:

```bash
kubectl -n monitoring \
  create configmap cm-prometheus-new \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
```

Edit the deployment:

```bash
export EDITOR=nano
kubectl -n monitoring edit deployments prometheus
```

Edit `volumes.configMap.name` and use `cm-prometheus-new`. The change will force K8s to create new pods with the new config.

{% hint style="info" %}
If for any reason you deployed Prometheus with `hostNetwork: true`, options 2 and 3 will return this error:

`0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.`

In this case, use option 1.

If you need more info regarding rolling out ConfigMaps, please refer to: <https://stackoverflow.com/questions/37317003/restart-pods-when-configmap-updates-in-kubernetes>

<https://github.com/kubernetes/kubernetes/issues/22368>
{% endhint %}

### Deploy Prometheus

SSH to the node that will host Prometheus and create a directory to persist its data:

```bash
mkdir -p /storage/storage-001/mnt-prometheus
chown -R nobody:nogroup /storage/storage-001/mnt-prometheus
```

Deploy Prometheus:

```bash
kubectl create -f - <<EOF

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:latest
                   
        ports:
        - containerPort: 9090
        
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --web.console.libraries=/usr/share/prometheus/console_libraries
        - --web.console.templates=/usr/share/prometheus/consoles
        - --storage.tsdb.retention.time=90d

        volumeMounts:
          - name: config-volume
            mountPath: /etc/prometheus/prometheus.yml
            subPath: prometheus.yml
              
          - name: mnt-prometheus
            mountPath: /prometheus

      volumes:
        - name: config-volume
          configMap:
           name: cm-prometheus
           
        - name: mnt-prometheus
          hostPath:
            path: /storage/storage-001/mnt-prometheus
            
      nodeSelector:
        kubernetes.io/hostname: k8snode

EOF
```

### Expose Prometheus

```bash
kubectl create -f - <<EOF
        
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-prometheus
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30909
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort

EOF
```

#### Test the deployment

On your workstation access `http://YOUR.CLUSTER.IP:30909`

{% hint style="info" %}
Alternatively you can port forward:

```bash
export NAMESPACE=monitoring
kubectl port-forward \
  -n $NAMESPACE \
  $(kubectl -n $NAMESPACE get pods |grep "prometheus-" | awk '{print $1}') \
  9090
```

Then access <http://localhost:9090>
{% endhint %}
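The `grep`/`awk` pipeline in the hint resolves the generated pod name (the hash suffix changes on every rollout); with simulated `kubectl get pods` output (pod name illustrative):

```bash
# Simulated `kubectl get pods` row; awk prints column 1, the pod name.
pods_line='prometheus-7c8fd5b9d6-xk2lp   1/1   Running   0   5m'
echo "$pods_line" | grep "prometheus-" | awk '{print $1}'
# prints: prometheus-7c8fd5b9d6-xk2lp
```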

{% hint style="info" %}
If you need info about exposing a service, please refer to: <https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/>
{% endhint %}

### References

<https://sysdig.com/blog/kubernetes-monitoring-prometheus/>

<https://sysdig.com/blog/kubernetes-monitoring-with-prometheus-alertmanager-grafana-pushgateway-part-2/>

<https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/>

#### **Manifest example**

<https://gist.github.com/philips/7ddeeb2fdab2ff4e4f8a035fc567f3d0>

## Grafana

### Create namespace

```bash
kubectl create namespace monitoring
```

### Create Grafana config

```bash
nano grafana.ini
```

Paste:

```
# ConfigMap
##################### Grafana Configuration Example #####################
#
# Everything has defaults so you only need to uncomment things you want to
# change

# possible values : production, development
;app_mode = production

# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
;instance_name = ${HOSTNAME}

#################################### Paths ####################################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana

# Temporary files in `data` directory older than given duration will be removed
;temp_data_lifetime = 24h

# Directory where grafana can store logs
;logs = /var/log/grafana

# Directory where grafana will automatically scan and look for plugins
;plugins = /var/lib/grafana/plugins

# folder that contains provisioning config files that grafana will apply on startup and while running.
;provisioning = conf/provisioning

#################################### Server ####################################
[server]
# Protocol (http, https, socket)
;protocol = http

# The ip address to bind to, empty will bind to all interfaces
;http_addr =

# The http port  to use
;http_port = 3000

# The public facing domain name used to access grafana from a browser
;domain = localhost

# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
;enforce_domain = false

# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000

# Log web requests
;router_logging = false

# the path relative working path
;static_root_path = public

# enable gzip
;enable_gzip = false

# https certs & key file
;cert_file =
;cert_key =

# Unix socket path
;socket =

#################################### Database ####################################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as on string using the url properties.

# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =

# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
;url =

# For "postgres" only, either "disable", "require" or "verify-full"
;ssl_mode = disable

# For "sqlite3" only, path relative to data_path setting
;path = grafana.db

# Max idle conn setting default is 2
;max_idle_conn = 2

# Max conn setting default is 0 (mean not set)
;max_open_conn =

# Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
;conn_max_lifetime = 14400

# Set to true to log the sql calls and execution times.
log_queries =

#################################### Session ####################################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
;provider = file

# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
;provider_config = sessions

# Session cookie name
;cookie_name = grafana_sess

# If you use session in https only, default is false
;cookie_secure = false

# Session life time, default is 86400
;session_life_time = 86400

#################################### Data proxy ###########################
[dataproxy]

# This enables data proxy logging, default is false
;logging = false

#################################### Analytics ####################################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
;reporting_enabled = true

# Set to false to disable all checks to https://grafana.net
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to http://grafana.com to get latest versions
;check_for_updates = true

# Google Analytics universal tracking code, only enabled if you specify an id here
;google_analytics_ua_id =

#################################### Security ####################################
[security]
# default admin user, created on startup
;admin_user = admin

# default admin password, can be changed before first start of grafana,  or in profile settings
;admin_password = admin

# used for signing
;secret_key = SW2YcwTIb9zpOOhoPsMm

# Auto-login remember days
;login_remember_days = 7
;cookie_username = grafana_user
;cookie_remember_name = grafana_remember

# disable gravatar profile images
;disable_gravatar = false

# data source proxy whitelist (ip_or_domain:port separated by spaces)
;data_source_proxy_whitelist =

# disable protection against brute force login attempts
;disable_brute_force_login_protection = false

#################################### Snapshots ###########################
[snapshots]
# snapshot sharing options
;external_enabled = true
;external_snapshot_url = https://snapshots-origin.raintank.io
;external_snapshot_name = Publish to snapshot.raintank.io

# remove expired snapshot
;snapshot_remove_expired = true

#################################### Dashboards History ##################
[dashboards]
# Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
;versions_to_keep = 20

#################################### Users ###############################
[users]
# disable user signup / registration
;allow_sign_up = true

# Allow non admin users to create organizations
;allow_org_create = true

# Set to true to automatically assign new users to the default organization (id 1)
;auto_assign_org = true

# Default role new users will be automatically assigned (if disabled above is set to true)
;auto_assign_org_role = Viewer

# Background text for the user field on the login page
;login_hint = email or username

# Default UI theme ("dark" or "light")
;default_theme = dark

# External user management, these options affect the organization users view
;external_manage_link_url =
;external_manage_link_name =
;external_manage_info =

# Viewers can edit/inspect dashboard settings in the browser. But not save the dashboard.
;viewers_can_edit = false

[auth]
# Set to true to disable (hide) the login form, useful if you use OAuth, defaults to false
;disable_login_form = false

# Set to true to disable the signout link in the side menu. useful if you use auth.proxy, defaults to false
;disable_signout_menu = false

# URL to redirect the user to after sign out
;signout_redirect_url =

# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
;oauth_auto_login = false

#################################### Anonymous Auth ##########################
[auth.anonymous]
# enable anonymous access
;enabled = false

# specify organization name that should be used for unauthenticated users
;org_name = Main Org.

# specify role for unauthenticated users
;org_role = Viewer

#################################### Github Auth ##########################
[auth.github]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://github.com/login/oauth/authorize
;token_url = https://github.com/login/oauth/access_token
;api_url = https://api.github.com/user
;team_ids =
;allowed_organizations =

#################################### Google Auth ##########################
[auth.google]
;enabled = false
;allow_sign_up = true
;client_id = some_client_id
;client_secret = some_client_secret
;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
;auth_url = https://accounts.google.com/o/oauth2/auth
;token_url = https://accounts.google.com/o/oauth2/token
;api_url = https://www.googleapis.com/oauth2/v1/userinfo
;allowed_domains =

#################################### Generic OAuth ##########################
[auth.generic_oauth]
;enabled = false
;name = OAuth
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://foo.bar/login/oauth/authorize
;token_url = https://foo.bar/login/oauth/access_token
;api_url = https://foo.bar/user
;team_ids =
;allowed_organizations =
;tls_skip_verify_insecure = false
;tls_client_cert =
;tls_client_key =
;tls_client_ca =

#################################### Grafana.com Auth ####################
[auth.grafana_com]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email
;allowed_organizations =

#################################### Auth Proxy ##########################
[auth.proxy]
;enabled = false
;header_name = X-WEBAUTH-USER
;header_property = username
;auto_sign_up = true
;ldap_sync_ttl = 60
;whitelist = 192.168.1.1, 192.168.2.1
;headers = Email:X-User-Email, Name:X-User-Name

#################################### Basic Auth ##########################
[auth.basic]
;enabled = true

#################################### Auth LDAP ##########################
[auth.ldap]
;enabled = false
;config_file = /etc/grafana/ldap.toml
;allow_sign_up = true

#################################### SMTP / Emailing ##########################
[smtp]
;enabled = false
;host = localhost:25
;user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
;from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com

[emails]
;welcome_email_on_sign_up = false

#################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and  file
# Use space to separate multiple modes, e.g. "console file"
;mode = console file

# Either "debug", "info", "warn", "error", "critical", default is "info"
;level = info

# optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
;filters =

# For "console" mode only
[log.console]
;level =

# log line format, valid options are text, console and json
;format = console

# For "file" mode only
[log.file]
;level =

# log line format, valid options are text, console and json
;format = text

# This enables automated log rotate(switch of following options), default is true
;log_rotate = true

# Max line number of single file, default is 1000000
;max_lines = 1000000

# Max size shift of single file, default is 28 means 1 << 28, 256MB
;max_size_shift = 28

# Segment log daily, default is true
;daily_rotate = true

# Expired days of log file(delete after max days), default is 7
;max_days = 7

[log.syslog]
;level =

# log line format, valid options are text, console and json
;format = text

# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
;network =
;address =

# Syslog facility. user, daemon and local0 through local7 are valid.
;facility =

# Syslog tag. By default, the process' argv[0] is used.
;tag =

#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
;enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
;execute_alerts = true

# Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
;error_or_timeout = alerting

# Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
;nodata_or_nullvalues = no_data

# Alert notifications can include images, but rendering many images at the same time can overload the server
# This limit will protect the server from render overloading and make sure notifications are sent out quickly
;concurrent_render_limit = 5

#################################### Explore #############################
[explore]
# Enable the Explore section
;enabled = false

#################################### Internal Grafana Metrics ##########################
# Metrics available at HTTP API Url /metrics
[metrics]
# Disable / Enable internal metrics
;enabled           = true

# Publish interval
;interval_seconds  = 10

# Send internal metrics to Graphite
[metrics.graphite]
# Enable by setting the address setting (ex localhost:2003)
;address =
;prefix = prod.grafana.%(instance_name)s.

#################################### Distributed tracing ############
[tracing.jaeger]
# Enable by setting the address sending traces to jaeger (ex localhost:6831)
;address = localhost:6831
# Tag that will always be included in when creating new spans. ex (tag1:value1,tag2:value2)
;always_included_tag = tag1:value1
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
;sampler_type = const
# jaeger samplerconfig param
# for "const" sampler, 0 or 1 for always false/true respectively
# for "probabilistic" sampler, a probability between 0 and 1
# for "rateLimiting" sampler, the number of spans per second
# for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
;sampler_param = 1

#################################### Grafana.com integration  ##########################
# Url used to import dashboards directly from Grafana.com
[grafana_com]
;url = https://grafana.com

#################################### External image storage ##########################
[external_image_storage]
# Used for uploading images to public servers so they can be included in slack/email messages.
# you can choose between (s3, webdav, gcs, azure_blob, local)
;provider =

[external_image_storage.s3]
;bucket =
;region =
;path =
;access_key =
;secret_key =

[external_image_storage.webdav]
;url =
;public_url =
;username =
;password =

[external_image_storage.gcs]
;key_file =
;bucket =
;path =

[external_image_storage.azure_blob]
;account_name =
;account_key =
;container_name =

[external_image_storage.local]
# does not require any configuration

[rendering]
# Options to configure external image rendering server like https://github.com/grafana/grafana-image-renderer
;server_url =
;callback_url =

[enterprise]
# Path to a valid Grafana Enterprise license.jwt file
;license_path =

```

Create a ConfigMap from the config file:

```bash
kubectl -n monitoring create configmap cm-grafana --from-file grafana.ini
```

### Create Grafana secrets

Generate `base64` strings:

```bash
# This will be the admin-username. Copy the output.
echo -n 'admin' | base64

# This will be the admin-password. Copy the output.
echo -n 'PUT-YOUR-PASSWORD-HERE' | base64
```
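The `-n` flag matters: without it, `echo` appends a newline that gets encoded into the secret, and the resulting password will silently contain a trailing `\n`:

```bash
echo -n 'admin' | base64                 # prints: YWRtaW4=  (5 bytes, correct)
echo 'admin' | base64                    # prints: YWRtaW4K  (6 bytes: newline encoded)
echo -n 'YWRtaW4=' | base64 --decode     # round-trips back to: admin
```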

Create Secret:

```bash
kubectl create -f - <<EOF

apiVersion: v1
kind: Secret
metadata:
  name: grafana
  namespace: monitoring
type: Opaque
data:
  admin-username: PASTE admin-username base64 HERE
  admin-password: PASTE admin-password base64 HERE
  
EOF
```

{% hint style="info" %}
To retrieve admin username and password, run:

```bash
kubectl -n monitoring \
  get secret grafana \
  -o jsonpath="{.data.admin-username}" \
  | base64 --decode ; echo

kubectl -n monitoring \
  get secret grafana \
  -o jsonpath="{.data.admin-password}" \
  | base64 --decode ; echo
```

{% endhint %}

### Deploy Grafana

SSH to the node that will host Grafana and create a directory to persist its data:

```bash
mkdir -p /storage/storage-001/mnt-grafana
chown -R nobody:nogroup /storage/storage-001/mnt-grafana
```

Deploy Grafana:

```bash
kubectl create -f - <<EOF

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 65534 #nobody
        fsGroup: 65534 #nogroup
      containers:
      - name: grafana
        image: grafana/grafana
        
        ports:
        - containerPort: 3000
        
        env:
          - name: GF_AUTH_BASIC_ENABLED
            value: "true"
            
          - name: GF_SECURITY_ADMIN_USER
            #value: "admin"
            valueFrom:
              secretKeyRef:
                name: grafana
                key: admin-username
            
          - name: GF_SECURITY_ADMIN_PASSWORD
            #value: "PLAIN-PWD"
            valueFrom:
              secretKeyRef:
                name: grafana
                key: admin-password
            
          #- name: GF_AUTH_ANONYMOUS_ENABLED
          #  value: "false"
          
          # If you want to allow anonymous admin access, use the
          # following config instead
          #- name: GF_AUTH_BASIC_ENABLED
          #  value: "false"
          #- name: GF_AUTH_ANONYMOUS_ENABLED
          #  value: "true"
          #- name: GF_AUTH_ANONYMOUS_ORG_ROLE
          #  value: Admin
          
        volumeMounts:
          - name: config-volume
            mountPath: /etc/grafana/grafana.ini
            subPath: grafana.ini
            
          - name: mnt-grafana
            mountPath: /var/lib/grafana
            
      volumes:
        - name: config-volume
          configMap:
           name: cm-grafana
           
        - name: mnt-grafana
          hostPath:
            path: /storage/storage-001/mnt-grafana

      nodeSelector:
        kubernetes.io/hostname: k8snode

EOF
```

### Expose Grafana

```bash
kubectl create -f - <<EOF
        
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: srv-grafana
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30000
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  sessionAffinity: None
  type: NodePort

EOF
```

#### Test the deployment

On your workstation access `http://YOUR.CLUSTER.IP:30000`

{% hint style="info" %}
Alternatively you can port forward:

```bash
export NAMESPACE=monitoring
kubectl port-forward \
  -n $NAMESPACE \
  $(kubectl -n $NAMESPACE get pods |grep "grafana-" | awk '{print $1}') \
  3000
```

Then access <http://localhost:3000>
{% endhint %}

{% hint style="info" %}
If you need info about exposing a service, please refer to: <https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/>
{% endhint %}

### Dashboards

<https://grafana.com/dashboards/2115>

## Prometheus exporters

### node-exporter

Create a DaemonSet to ensure all nodes have node-exporter:

```bash
kubectl create -f - <<EOF

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
         prometheus.io/scrape: "true"
         prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - ports:
            - containerPort: 9100
              protocol: TCP
          resources:
            requests:
              cpu: 0.15
          securityContext:
            privileged: true
          image: prom/node-exporter:latest
          args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - '^/(sys|proc|dev|host|etc)($|/)'
          name: node-exporter
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
    
EOF
```
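node-exporter serves the plain-text Prometheus exposition format on port 9100 (`curl http://NODE-IP:9100/metrics`); each line is a metric name followed by its value, so output is easy to spot-check with `awk` (sample line illustrative):

```bash
# A sample line from /metrics; field 2 is the current value.
metric_line='node_load1 0.52'
echo "$metric_line" | awk '{print $2}'
# prints: 0.52
```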

#### Add node-exporter scraper to Prometheus

Edit Prometheus config file:

```bash
nano prometheus.yml
```

Add the scraper:

```
  - job_name: 'node_exporter_test'
    static_configs:
      - targets: ['YOUR-NODE-IP:9100']
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: instance
    #    replacement: "NEW-LABEL"
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: __address__
    #    replacement: k8snode:9100
    #metric_relabel_configs:
    #  - source_labels: ["__name__"]
    #    target_label: "job"
    #    replacement: "job"

```

#### Grafana dashboard

ID: 1860

<https://grafana.com/dashboards/1860>

### kube-state-metrics

Deploy kube-state-metrics and its dependencies:

```bash
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/kubernetes/
```

#### Expose kube-state-metrics

```bash
kubectl create -f - <<EOF
        
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-custom-kube-state-metrics
  namespace: kube-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32767
    name: metrics
    port: 8080
    protocol: TCP
    targetPort: 8080
  - nodePort: 32766
    name: telemetry
    port: 8081
    protocol: TCP
    targetPort: 8081
  selector:
    k8s-app: kube-state-metrics
  sessionAffinity: None
  type: NodePort

EOF
```

#### Add Prometheus scraper

```
  - job_name: 'kube-state-metrics-metrics'
    static_configs:
    - targets: ['NODE.IP:32767']
    
  - job_name: 'kube-state-metrics-telemetry'
    static_configs:
    - targets: ['NODE.IP:32766']
```

Update the ConfigMap:

```bash
kubectl -n monitoring \
  create configmap cm-prometheus \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
```

Roll out `ConfigMap`:

```bash
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
```

#### Grafana dashboard

Dashboard ID: 7249

<https://grafana.com/dashboards/7249>

Dashboard ID: 747

<https://grafana.com/dashboards/747>

#### Grafana panels

```
{
  "columns": [],
  "fontSize": "100%",
  "gridPos": {
    "h": 9,
    "w": 12,
    "x": 0,
    "y": 0
  },
  "id": 2,
  "links": [],
  "pageSize": null,
  "scroll": true,
  "showHeader": true,
  "sort": {
    "col": 2,
    "desc": true
  },
  "styles": [
    {
      "alias": "Time",
      "dateFormat": "YYYY-MM-DD HH:mm:ss",
      "pattern": "Time",
      "type": "date"
    },
    {
      "alias": "",
      "colorMode": null,
      "colors": [
        "rgba(245, 54, 54, 0.9)",
        "rgba(237, 129, 40, 0.89)",
        "rgba(50, 172, 45, 0.97)"
      ],
      "decimals": 2,
      "pattern": "/.*/",
      "thresholds": [],
      "type": "number",
      "unit": "short"
    }
  ],
  "targets": [
    {
      "expr": "sum(kube_pod_container_status_restarts_total{namespace=~\"^$namespace$\",pod=~\"^$pod$\"}) by (pod)",
      "format": "table",
      "intervalFactor": 1,
      "refId": "A"
    }
  ],
  "title": "Pod restart history",
  "transform": "table",
  "type": "table"
}
```

```
{
  "columns": [],
  "fontSize": "100%",
  "gridPos": {
    "h": 9,
    "w": 12,
    "x": 0,
    "y": 0
  },
  "id": 2,
  "links": [],
  "pageSize": null,
  "scroll": true,
  "showHeader": true,
  "sort": {
    "col": 5,
    "desc": true
  },
  "styles": [
    {
      "alias": "Time",
      "dateFormat": "YYYY-MM-DD HH:mm:ss",
      "pattern": "Time",
      "type": "date"
    },
    {
      "alias": "",
      "colorMode": null,
      "colors": [
        "rgba(245, 54, 54, 0.9)",
        "rgba(237, 129, 40, 0.89)",
        "rgba(50, 172, 45, 0.97)"
      ],
      "decimals": 2,
      "pattern": "/.*/",
      "thresholds": [],
      "type": "number",
      "unit": "short"
    }
  ],
  "targets": [
    {
      "expr": "sum(kube_pod_container_status_restarts_total{namespace=~\"^$namespace$\",pod=~\"^$pod$\"}) by (namespace, pod, container, job)",
      "format": "table",
      "intervalFactor": 1,
      "refId": "A",
      "legendFormat": "",
      "interval": "",
      "instant": false
    }
  ],
  "title": "Pod restart history",
  "transform": "table",
  "type": "table"
}
```
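Both panels rely on the same PromQL shape, `sum(<counter>) by (<labels>)`: drop all labels except those listed, then sum the values within each remaining group. The grouping semantics can be illustrated with a toy sketch (made-up samples, not a Prometheus client):

```python
from collections import defaultdict

# Toy samples mimicking kube_pod_container_status_restarts_total: (labels, value)
samples = [
    ({"namespace": "default", "pod": "web-0", "container": "app"}, 3.0),
    ({"namespace": "default", "pod": "web-0", "container": "sidecar"}, 1.0),
    ({"namespace": "default", "pod": "web-1", "container": "app"}, 0.0),
]

def sum_by(samples, keys):
    """Emulate PromQL sum(...) by (keys): group on the listed labels, sum values."""
    out = defaultdict(float)
    for labels, value in samples:
        out[tuple(labels[k] for k in keys)] += value
    return dict(out)

print(sum_by(samples, ["pod"]))  # {('web-0',): 4.0, ('web-1',): 0.0}
```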

### nvidia-gpu-exporter

Label your nodes:

```bash
kubectl label nodes PUT-YOUR-NODE-HERE hardware-type=NVIDIAGPU
```

Deploy it:

```bash
kubectl create -f - <<EOF

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-gpu-exporter
  namespace: monitoring
  labels:
    app: nvidia-gpu-exporter
    component: nvidia-gpu-exporter
spec:
  selector:
    matchLabels:
      app: prometheus
      component: gpu-exporter
  template:
    metadata:
      name: nvidia-gpu-exporter
      labels:
        app: prometheus
        component: gpu-exporter
    spec:
      containers:
      - image: swiftdiaries/gpu_prom_metrics
        name: nvidia-gpu-exporter
        ports:
        - name: prom-gpu-exp
          containerPort: 9445
          hostPort: 9445
      hostNetwork: true
      nodeSelector:
        hardware-type: "NVIDIAGPU"

EOF
```

Prometheus scraper:

```
  - job_name: 'gpu'
    static_configs:
    - targets: ['NODE.IP:9445']
```

Update the ConfigMap and roll it out as described in the kube-state-metrics section above.

Grafana dashboard:

```
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 3,
  "iteration": 1553034339729,
  "links": [],
  "panels": [
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 7,
        "w": 2,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "nvidia_gpu_num_devices{instance=\"$node:9445\"}",
          "format": "time_series",
          "intervalFactor": 1,
          "refId": "A"
        }
      ],
      "thresholds": "",
      "timeFrom": null,
      "timeShift": null,
      "title": "GPUs",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "N/A",
          "value": "null"
        }
      ],
      "valueName": "avg"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "prometheus-k8s",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": true,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 7,
        "w": 5,
        "x": 2,
        "y": 0
      },
      "id": 10,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "nvidia_gpu_temperature_celsius{instance=\"$node:9445\",minor_number=\"$gpu\"}",
          "format": "time_series",
          "intervalFactor": 1,
          "refId": "A"
        }
      ],
      "thresholds": "33,66,100",
      "title": "Temperature (C)",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "N/A",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "fill": 1,
      "gridPos": {
        "h": 7,
        "w": 19,
        "x": 0,
        "y": 7
      },
      "id": 4,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "paceLength": 10,
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "nvidia_gpu_memory_total_bytes{instance=\"$node:9445\",minor_number=\"$gpu\"}/100000000",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Total",
          "refId": "A"
        },
        {
          "expr": "nvidia_gpu_memory_used_bytes{instance=\"$node:9445\",minor_number=\"$gpu\"}/100000000",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Used",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Memory (MB)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "fill": 1,
      "gridPos": {
        "h": 7,
        "w": 19,
        "x": 0,
        "y": 14
      },
      "id": 6,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "paceLength": 10,
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "nvidia_gpu_duty_cycle{instance=\"$node:9445\",minor_number=\"$gpu\"}",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Total",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "GPU Utilisation (%)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "fill": 1,
      "gridPos": {
        "h": 7,
        "w": 19,
        "x": 0,
        "y": 21
      },
      "id": 8,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "paceLength": 10,
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "nvidia_gpu_power_usage_milliwatts{instance=\"$node:9445\",minor_number=\"$gpu\"}/1000",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Total",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Power Usage (watts)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "schemaVersion": 18,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "hamilton.cv-prod-nz-air-new-zealand.com",
          "value": "hamilton.cv-prod-nz-air-new-zealand.com"
        },
        "datasource": "prometheus-k8s",
        "definition": "nvidia_gpu_power_usage_milliwatts",
        "hide": 0,
        "includeAll": false,
        "label": "Host:",
        "multi": false,
        "name": "node",
        "options": [],
        "query": "nvidia_gpu_power_usage_milliwatts",
        "refresh": 1,
        "regex": "/.*instance=\"([^\"]*):.*/",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {
          "tags": [],
          "text": "1",
          "value": "1"
        },
        "datasource": "prometheus-k8s",
        "definition": "nvidia_gpu_temperature_celsius",
        "hide": 0,
        "includeAll": false,
        "label": "GPU:",
        "multi": false,
        "name": "gpu",
        "options": [],
        "query": "nvidia_gpu_temperature_celsius",
        "refresh": 1,
        "regex": "/minor_number=\"(.*?)\"/",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "GPU",
  "uid": "oaFpztCmk",
  "version": 8
}
```
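The panels above derive friendlier units from the raw gauges with simple divisions (milliwatts to watts via `/1000`; memory bytes via `/100000000`). The conversions are trivial but easy to get wrong; a quick sketch (metric names match the exporter, values are made up):

```python
# Raw gauges as exported by nvidia_gpu_prometheus_exporter (made-up values):
raw = {
    "nvidia_gpu_power_usage_milliwatts": 215000.0,
    "nvidia_gpu_memory_used_bytes": 4_294_967_296.0,  # 4 GiB
    "nvidia_gpu_temperature_celsius": 61.0,
}

watts = raw["nvidia_gpu_power_usage_milliwatts"] / 1000       # as in the dashboard
mib_used = raw["nvidia_gpu_memory_used_bytes"] / (1024 ** 2)  # binary megabytes

print(f"{watts} W, {mib_used} MiB, {raw['nvidia_gpu_temperature_celsius']} C")
```

Note that the dashboard JSON divides memory bytes by 100000000 (1e8), which yields units of 100 MB rather than true MB or MiB; adjust the divisor if exact units matter.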

#### References

<https://github.com/mindprince/nvidia_gpu_prometheus_exporter>

<https://github.com/andreyvelich/nvidia_gpu_prometheus_exporter>

## prometheus-operator

{% hint style="danger" %}
These instructions are not working properly due to Persistent Volume issues. They are kept here for reference only.
{% endhint %}

### Install

#### Create a StorageClass

Create the manifest:

```bash
cat > storage-class.yml <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

EOF
```

Deploy it:

```bash
kubectl create -f storage-class.yml
```
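`kubernetes.io/no-provisioner` means nothing is created for you: each PersistentVolumeClaim can only bind to a `local` PersistentVolume you create by hand, which is a likely source of the Persistent Volume issues mentioned above. An example PV for this class (name, path, and hostname are placeholders):

```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prometheus-local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - minikube
```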

#### Install the helm chart

Create the manifest:

```bash
cat > custom-values.yaml <<EOF

# Depending on which DNS solution you have installed in your cluster enable the right exporter
coreDns:
  enabled: false

kubeDns:
  enabled: true

alertmanager:
  alertmanagerSpec:
    nodeSelector:
      kubernetes.io/hostname: minikube
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: local-storage
          resources:
            requests:
              storage: 10Gi

prometheus:
  prometheusSpec:
    nodeSelector:
      kubernetes.io/hostname: minikube
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: local-storage
          resources:
            requests:
              storage: 10Gi

prometheusOperator:
  nodeSelector:
    kubernetes.io/hostname: minikube

grafana:
  adminPassword: "YourPass123#"
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
    hosts:
      - grafana.test.akomljen.com
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.test.akomljen.com
  persistence:
    enabled: true
    accessModes: ["ReadWriteOnce"]
    storageClassName: local-storage
    size: 10Gi

EOF
```

Make sure you have [helm and tiller set up](https://devops-buzz.gitbook.io/public/kubernetes/helm#helm-init-secure-tls).

Deploy prometheus-operator:

```bash
helm install \
  --tls \
  --name prom \
  --namespace monitoring \
  -f custom-values.yaml \
  stable/prometheus-operator
```

{% hint style="info" %}
If you need to update the custom values, run:

```bash
helm upgrade -f custom-values.yaml \
  prom stable/prometheus-operator
```

{% endhint %}

Check statuses:

```bash
kubectl --namespace monitoring get pods -l "release=prom"
```

#### Port forwarding

Prometheus:

```bash
kubectl port-forward \
  -n monitoring \
  prometheus-prom-prometheus-operator-prometheus-0 9090
```

Alert manager:

```bash
kubectl port-forward \
  -n monitoring alertmanager-prom-prometheus-operator-alertmanager-0 9093
```

Grafana:

```bash
kubectl port-forward \
  -n monitoring \
  $(kubectl -n monitoring get pods |grep "prom-grafana" | awk '{print $1}') \
  3000:3000
```
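With the port-forward running, Prometheus answers on `http://localhost:9090`. Instant queries go to `/api/v1/query` and return a small JSON envelope; a sketch of unpacking one (the payload is a hardcoded sample in the documented response shape, not a live call):

```python
import json

# Shape of a Prometheus /api/v1/query response (sample, not a live call):
payload = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"job": "kube-state-metrics-metrics", "instance": "10.0.0.5:32767"},
       "value": [1553034339.729, "1"]}
    ]
  }
}
""")

assert payload["status"] == "success"
for r in payload["data"]["result"]:
    ts, value = r["value"]  # [unix_timestamp, string_value]
    print(r["metric"]["job"], float(value))
```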

### References

<https://github.com/helm/charts/tree/master/stable/prometheus-operator>

<https://github.com/coreos/prometheus-operator>

<https://akomljen.com/get-kubernetes-cluster-metrics-with-prometheus-in-5-minutes/>

<https://www.sachsenhofer.io/setup-prometheus-operator-kube-prometheus-kubernetes-cluster/>

### Uninstall

To uninstall/delete:

```bash
helm delete --purge prom
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete namespace monitoring
```

### Custom values

Check:

<https://github.com/helm/charts/tree/master/stable/prometheus-operator#configuration>

<https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md>

