Monitoring
K8s monitoring.
metrics-server
Clone metrics-server.
git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
Edit resource-reader.yaml.
nano deploy/1.8+/resource-reader.yaml
Edit the resources section as follows:
...
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
...
Edit metrics-server-deployment.yaml
nano deploy/1.8+/metrics-server-deployment.yaml
Edit as follows:
...
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
...
Deploy it.
kubectl apply -f deploy/1.8+/
Wait a few minutes and run:
kubectl top node
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" |jq
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/YOUR-NAMESPACE/pods" |jq
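The "wait a few minutes" step can also be scripted as a polling loop. The following is a minimal sketch, not part of metrics-server; the `retry` helper name and its arguments are assumptions made for illustration.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# (Hypothetical helper, not provided by kubectl or metrics-server.)
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (requires a cluster): poll every 15 seconds, up to 20 times.
# retry 20 15 kubectl top node
```

The command's own output is left untouched, so the first successful `kubectl top node` prints the node table as usual.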
References
https://medium.com/@cagri.ersen/kubernetes-metrics-server-installation-d93380de008
https://github.com/kubernetes-incubator/metrics-server/issues/247
http://d0o0bz.cn/2018/12/deploying-metrics-server-for-kubernetes/
Rancher
docker run \
-tid \
--name=rancher \
--restart=unless-stopped \
-p 80:80 -p 443:443 \
rancher/rancher:latest
Add a cluster, then apply the manifest that Rancher generates to your cluster.
Also check: https://github.com/rancher/fleet
Audit
SSH to your master node.
Create a policy file:
mkdir /etc/kubernetes/policies
nano /etc/kubernetes/policies/audit-policy.yaml
Paste:
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
Edit K8s API server config file:
nano /etc/kubernetes/manifests/kube-apiserver.yaml
Add:
...
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
    - --audit-log-path=/var/log/apiserver/audit.log
    - --audit-log-format=json
    ...
    volumeMounts:
    ...
    - mountPath: /etc/kubernetes/policies
      name: policies
      readOnly: true
  ...
  volumes:
  ...
  - hostPath:
      path: /etc/kubernetes/policies
      type: DirectoryOrCreate
    name: policies
Restart kubelet:
systemctl restart kubelet
If the changes did not take effect, stop the API server docker container (it will be started automatically):
docker stop $(docker ps | grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}')
Tail the log file:
docker exec -it $(docker ps |grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}') tail -f /var/log/apiserver/audit.log
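With the JSON log format, each audit event is written as one line, so a plain `grep` is often enough to narrow the stream. The sketch below assumes the events are compact JSON with no space after the colon (as in `"verb":"delete"`); the `audit_by_verb` name is hypothetical.

```shell
# audit_by_verb VERB [FILE]
# Prints audit events whose "verb" field equals VERB.
# Reads FILE, or standard input when FILE is omitted.
# Assumption: events are single-line compact JSON.
audit_by_verb() {
  verb=$1
  shift
  grep -F "\"verb\":\"$verb\"" "$@"
}

# Example (on the master node, reusing the container lookup from above):
# docker exec CONTAINER-ID cat /var/log/apiserver/audit.log | audit_by_verb delete
```

If the log turns out to be formatted differently, a JSON-aware tool such as `jq` is the safer choice.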
References
https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v4/audit/
Prometheus
Create namespace
kubectl create namespace monitoring
Create Prometheus config
nano prometheus.yml
Paste:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Prometheus config file example: https://github.com/prometheus/prometheus/blob/master/docs/getting_started.md
Create a ConfigMap from the config file:
kubectl -n monitoring create configmap cm-prometheus --from-file prometheus.yml
If you need to update the ConfigMap...
Edit the file:
nano prometheus.yml
Update the ConfigMap:
kubectl -n monitoring \
create configmap cm-prometheus \
--from-file=prometheus.yml \
-o yaml --dry-run | kubectl apply -f -
Now roll out the new ConfigMap. As of this writing (2019-02-15), this is a little tricky: the Prometheus Deployment mounts the file with subPath, so running pods do not receive ConfigMap updates and must be recreated. Some options follow:
Roll out ConfigMap: option 1 - scale deployment
This is the only way that will "always" work, although there will be a few seconds of downtime:
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
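The two scale commands can be wrapped in one function that also waits for the new pod. This is a sketch only; the `restart_by_scaling` name is an assumption, and it relies on `kubectl rollout status` to block until the Deployment is ready.

```shell
# restart_by_scaling NAMESPACE DEPLOYMENT
# Scales the Deployment to 0 and back to 1, then waits for the rollout.
# (Hypothetical helper; expect a few seconds of downtime, as noted above.)
restart_by_scaling() {
  ns=$1
  deploy=$2
  kubectl -n "$ns" scale "deployment/$deploy" --replicas=0 &&
  kubectl -n "$ns" scale "deployment/$deploy" --replicas=1 &&
  kubectl -n "$ns" rollout status "deployment/$deploy"
}

# Example:
# restart_by_scaling monitoring prometheus
```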
Roll out ConfigMap: option 2 - patch the deployment
kubectl -n monitoring \
patch deployment prometheus \
-p '{"spec":{"template":{"metadata":{"labels":{"date":"2019-02-15"}}}}}'
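A hard-coded date only triggers a rollout once: patching the same label value again changes nothing, so no new pods are created. A minimal sketch that generates the patch with a timestamp instead (the `rollout_patch` name is hypothetical):

```shell
# rollout_patch [STAMP]
# Prints a patch that sets the pod template label "date" to STAMP.
# STAMP defaults to the current date and time, so each call differs.
rollout_patch() {
  stamp=${1:-$(date +%Y-%m-%d-%H%M%S)}
  printf '{"spec":{"template":{"metadata":{"labels":{"date":"%s"}}}}}\n' "$stamp"
}

# Example:
# kubectl -n monitoring patch deployment prometheus -p "$(rollout_patch)"
```

On kubectl 1.15 and later, `kubectl -n monitoring rollout restart deployment/prometheus` should achieve a similar effect without a hand-written patch.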
Roll out ConfigMap: option 3 - create a new ConfigMap
Create a new ConfigMap:
kubectl -n monitoring \
create configmap cm-prometheus-new \
--from-file=prometheus.yml \
-o yaml --dry-run | kubectl apply -f -
Edit the deployment:
export EDITOR=nano
kubectl -n monitoring edit deployments prometheus
Edit volumes.configMap.name and use cm-prometheus-new. The change will force K8s to create new pods with the new config.
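One way to avoid inventing a new ConfigMap name by hand each time is to derive it from the file content. The sketch below is an illustration, not something kubectl does for you: `configmap_name` is a hypothetical helper, and it assumes `sha256sum` is available (GNU coreutils).

```shell
# configmap_name PREFIX FILE
# Prints PREFIX followed by the first 8 hex digits of the file's SHA-256,
# so a changed file always yields a different ConfigMap name.
configmap_name() {
  prefix=$1
  file=$2
  hash=$(sha256sum "$file" | cut -c1-8)
  printf '%s-%s\n' "$prefix" "$hash"
}

# Example:
# kubectl -n monitoring create configmap \
#   "$(configmap_name cm-prometheus prometheus.yml)" \
#   --from-file=prometheus.yml
```

The generated name still has to be set in volumes.configMap.name, as described above.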
If for any reason you deployed Prometheus with hostNetwork: true, options 2 and 3 will return this error:
0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
In this case, use option 1.
If you need more info regarding rolling out ConfigMaps, please refer to: https://stackoverflow.com/questions/37317003/restart-pods-when-configmap-updates-in-kubernetes
Deploy Prometheus
SSH to the node which will host Prometheus and create a directory to persist its data:
mkdir -p /storage/storage-001/mnt-prometheus
chown -R nobody:nogroup /storage/storage-001/mnt-prometheus
Deploy Prometheus:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        ports:
        - containerPort: 9090
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --web.console.libraries=/usr/share/prometheus/console_libraries
        - --web.console.templates=/usr/share/prometheus/consoles
        - --storage.tsdb.retention.time=90d
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
        - name: mnt-prometheus
          mountPath: /prometheus
      volumes:
      - name: config-volume
        configMap:
          name: cm-prometheus
      - name: mnt-prometheus
        hostPath:
          path: /storage/storage-001/mnt-prometheus
      nodeSelector:
        kubernetes.io/hostname: k8snode
EOF
Expose Prometheus
kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-prometheus
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30909
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort
EOF
Test the deployment
On your workstation access http://YOUR.CLUSTER.IP:30909
Alternatively you can port forward:
export NAMESPACE=monitoring
kubectl port-forward \
-n $NAMESPACE \
$(kubectl -n $NAMESPACE get pods |grep "prometheus-" | awk '{print $1}') \
9090
Then access http://localhost:9090
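The `grep "prometheus-"` lookup returns several names if more than one pod matches. A sketch of an alternative that selects the pod by the `app: prometheus` label set in the Deployment (the `first_pod` name is an assumption):

```shell
# first_pod NAMESPACE LABEL_SELECTOR
# Prints the name of the first pod matching the label selector.
first_pod() {
  kubectl -n "$1" get pods -l "$2" -o jsonpath='{.items[0].metadata.name}'
}

# Example:
# kubectl -n monitoring port-forward "$(first_pod monitoring app=prometheus)" 9090
```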
If you need info about exposing a service, please refer to: https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
References
https://sysdig.com/blog/kubernetes-monitoring-prometheus/
https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
Manifest example
https://gist.github.com/philips/7ddeeb2fdab2ff4e4f8a035fc567f3d0
Grafana
Create namespace
kubectl create namespace monitoring
Create Grafana config
nano grafana.ini
Paste:
# ConfigMap
##################### Grafana Configuration Example #####################
#
# Everything has defaults so you only need to uncomment things you want to
# change
# possible values : production, development
;app_mode = production
# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
;instance_name = ${HOSTNAME}
#################################### Paths ####################################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana
# Temporary files in `data` directory older than given duration will be removed
;temp_data_lifetime = 24h
# Directory where grafana can store logs
;logs = /var/log/grafana
# Directory where grafana will automatically scan and look for plugins
;plugins = /var/lib/grafana/plugins
# folder that contains provisioning config files that grafana will apply on startup and while running.
;provisioning = conf/provisioning
#################################### Server ####################################
[server]
# Protocol (http, https, socket)
;protocol = http
# The ip address to bind to, empty will bind to all interfaces
;http_addr =
# The http port to use
;http_port = 3000
# The public facing domain name used to access grafana from a browser
;domain = localhost
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
;enforce_domain = false
# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000
# Log web requests
;router_logging = false
# the path relative working path
;static_root_path = public
# enable gzip
;enable_gzip = false
# https certs & key file
;cert_file =
;cert_key =
# Unix socket path
;socket =
#################################### Database ####################################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as one string using the url properties.
# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
;url =
# For "postgres" only, either "disable", "require" or "verify-full"
;ssl_mode = disable
# For "sqlite3" only, path relative to data_path setting
;path = grafana.db
# Max idle conn setting default is 2
;max_idle_conn = 2
# Max conn setting default is 0 (mean not set)
;max_open_conn =
# Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
;conn_max_lifetime = 14400
# Set to true to log the sql calls and execution times.
log_queries =
#################################### Session ####################################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
;provider = file
# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
;provider_config = sessions
# Session cookie name
;cookie_name = grafana_sess
# If you use session in https only, default is false
;cookie_secure = false
# Session life time, default is 86400
;session_life_time = 86400
#################################### Data proxy ###########################
[dataproxy]
# This enables data proxy logging, default is false
;logging = false
#################################### Analytics ####################################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
;reporting_enabled = true
# Set to false to disable all checks to https://grafana.net
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to http://grafana.com to get latest versions
;check_for_updates = true
# Google Analytics universal tracking code, only enabled if you specify an id here
;google_analytics_ua_id =
#################################### Security ####################################
[security]
# default admin user, created on startup
;admin_user = admin
# default admin password, can be changed before first start of grafana, or in profile settings
;admin_password = admin
# used for signing
;secret_key = SW2YcwTIb9zpOOhoPsMm
# Auto-login remember days
;login_remember_days = 7
;cookie_username = grafana_user
;cookie_remember_name = grafana_remember
# disable gravatar profile images
;disable_gravatar = false
# data source proxy whitelist (ip_or_domain:port separated by spaces)
;data_source_proxy_whitelist =
# disable protection against brute force login attempts
;disable_brute_force_login_protection = false
#################################### Snapshots ###########################
[snapshots]
# snapshot sharing options
;external_enabled = true
;external_snapshot_url = https://snapshots-origin.raintank.io
;external_snapshot_name = Publish to snapshot.raintank.io
# remove expired snapshot
;snapshot_remove_expired = true
#################################### Dashboards History ##################
[dashboards]
# Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
;versions_to_keep = 20
#################################### Users ###############################
[users]
# disable user signup / registration
;allow_sign_up = true
# Allow non admin users to create organizations
;allow_org_create = true
# Set to true to automatically assign new users to the default organization (id 1)
;auto_assign_org = true
# Default role new users will be automatically assigned (if disabled above is set to true)
;auto_assign_org_role = Viewer
# Background text for the user field on the login page
;login_hint = email or username
# Default UI theme ("dark" or "light")
;default_theme = dark
# External user management, these options affect the organization users view
;external_manage_link_url =
;external_manage_link_name =
;external_manage_info =
# Viewers can edit/inspect dashboard settings in the browser. But not save the dashboard.
;viewers_can_edit = false
[auth]
# Set to true to disable (hide) the login form, useful if you use OAuth, defaults to false
;disable_login_form = false
# Set to true to disable the signout link in the side menu. useful if you use auth.proxy, defaults to false
;disable_signout_menu = false
# URL to redirect the user to after sign out
;signout_redirect_url =
# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
;oauth_auto_login = false
#################################### Anonymous Auth ##########################
[auth.anonymous]
# enable anonymous access
;enabled = false
# specify organization name that should be used for unauthenticated users
;org_name = Main Org.
# specify role for unauthenticated users
;org_role = Viewer
#################################### Github Auth ##########################
[auth.github]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://github.com/login/oauth/authorize
;token_url = https://github.com/login/oauth/access_token
;api_url = https://api.github.com/user
;team_ids =
;allowed_organizations =
#################################### Google Auth ##########################
[auth.google]
;enabled = false
;allow_sign_up = true
;client_id = some_client_id
;client_secret = some_client_secret
;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
;auth_url = https://accounts.google.com/o/oauth2/auth
;token_url = https://accounts.google.com/o/oauth2/token
;api_url = https://www.googleapis.com/oauth2/v1/userinfo
;allowed_domains =
#################################### Generic OAuth ##########################
[auth.generic_oauth]
;enabled = false
;name = OAuth
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://foo.bar/login/oauth/authorize
;token_url = https://foo.bar/login/oauth/access_token
;api_url = https://foo.bar/user
;team_ids =
;allowed_organizations =
;tls_skip_verify_insecure = false
;tls_client_cert =
;tls_client_key =
;tls_client_ca =
#################################### Grafana.com Auth ####################
[auth.grafana_com]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email
;allowed_organizations =
#################################### Auth Proxy ##########################
[auth.proxy]
;enabled = false
;header_name = X-WEBAUTH-USER
;header_property = username
;auto_sign_up = true
;ldap_sync_ttl = 60
;whitelist = 192.168.1.1, 192.168.2.1
;headers = Email:X-User-Email, Name:X-User-Name
#################################### Basic Auth ##########################
[auth.basic]
;enabled = true
#################################### Auth LDAP ##########################
[auth.ldap]
;enabled = false
;config_file = /etc/grafana/ldap.toml
;allow_sign_up = true
#################################### SMTP / Emailing ##########################
[smtp]
;enabled = false
;host = localhost:25
;user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
;from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
[emails]
;welcome_email_on_sign_up = false
#################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and file
# Use space to separate multiple modes, e.g. "console file"
;mode = console file
# Either "debug", "info", "warn", "error", "critical", default is "info"
;level = info
# optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
;filters =
# For "console" mode only
[log.console]
;level =
# log line format, valid options are text, console and json
;format = console
# For "file" mode only
[log.file]
;level =
# log line format, valid options are text, console and json
;format = text
# This enables automated log rotate(switch of following options), default is true
;log_rotate = true
# Max line number of single file, default is 1000000
;max_lines = 1000000
# Max size shift of single file, default is 28 means 1 << 28, 256MB
;max_size_shift = 28
# Segment log daily, default is true
;daily_rotate = true
# Expired days of log file(delete after max days), default is 7
;max_days = 7
[log.syslog]
;level =
# log line format, valid options are text, console and json
;format = text
# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
;network =
;address =
# Syslog facility. user, daemon and local0 through local7 are valid.
;facility =
# Syslog tag. By default, the process' argv[0] is used.
;tag =
#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
;enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
;execute_alerts = true
# Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
;error_or_timeout = alerting
# Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
;nodata_or_nullvalues = no_data
# Alert notifications can include images, but rendering many images at the same time can overload the server
# This limit will protect the server from render overloading and make sure notifications are sent out quickly
;concurrent_render_limit = 5
#################################### Explore #############################
[explore]
# Enable the Explore section
;enabled = false
#################################### Internal Grafana Metrics ##########################
# Metrics available at HTTP API Url /metrics
[metrics]
# Disable / Enable internal metrics
;enabled = true
# Publish interval
;interval_seconds = 10
# Send internal metrics to Graphite
[metrics.graphite]
# Enable by setting the address setting (ex localhost:2003)
;address =
;prefix = prod.grafana.%(instance_name)s.
#################################### Distributed tracing ############
[tracing.jaeger]
# Enable by setting the address sending traces to jaeger (ex localhost:6831)
;address = localhost:6831
# Tag that will always be included in when creating new spans. ex (tag1:value1,tag2:value2)
;always_included_tag = tag1:value1
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
;sampler_type = const
# jaeger samplerconfig param
# for "const" sampler, 0 or 1 for always false/true respectively
# for "probabilistic" sampler, a probability between 0 and 1
# for "rateLimiting" sampler, the number of spans per second
# for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
;sampler_param = 1
#################################### Grafana.com integration ##########################
# Url used to import dashboards directly from Grafana.com
[grafana_com]
;url = https://grafana.com
#################################### External image storage ##########################
[external_image_storage]
# Used for uploading images to public servers so they can be included in slack/email messages.
# you can choose between (s3, webdav, gcs, azure_blob, local)
;provider =
[external_image_storage.s3]
;bucket =
;region =
;path =
;access_key =
;secret_key =
[external_image_storage.webdav]
;url =
;public_url =
;username =
;password =
[external_image_storage.gcs]
;key_file =
;bucket =
;path =
[external_image_storage.azure_blob]
;account_name =
;account_key =
;container_name =
[external_image_storage.local]
# does not require any configuration
[rendering]
# Options to configure external image rendering server like https://github.com/grafana/grafana-image-renderer
;server_url =
;callback_url =
[enterprise]
# Path to a valid Grafana Enterprise license.jwt file
;license_path =
Create a ConfigMap from the config file:
kubectl -n monitoring create configmap cm-grafana --from-file grafana.ini
Create Grafana secrets
Generate base64 strings:
# This will be the admin-username. Copy the output.
echo -n 'admin' | base64
# This will be the admin-password. Copy the output.
echo -n 'PUT-YOUR-PASSWORD-HERE' | base64
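The `-n` flag matters here: without it, `echo` appends a newline that ends up inside the stored credential. The sketch below shows the difference using `printf`, which never adds a newline; the `b64` helper name is hypothetical.

```shell
# b64 STRING
# Base64-encodes STRING exactly as given, with no trailing newline added.
b64() {
  printf '%s' "$1" | base64
}

b64 admin            # prints YWRtaW4=
echo admin | base64  # prints YWRtaW4K (the trailing K encodes the unwanted newline)
```

As an alternative, `kubectl -n monitoring create secret generic grafana --from-literal=admin-username=admin --from-literal=admin-password=PUT-YOUR-PASSWORD-HERE` should let kubectl do the encoding for you.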
Create Secret:
kubectl create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana
  namespace: monitoring
type: Opaque
data:
  admin-username: PASTE admin-username base64 HERE
  admin-password: PASTE admin-password base64 HERE
EOF
To retrieve admin username and password, run:
kubectl -n monitoring \
get secret grafana \
-o jsonpath="{.data.admin-username}" \
| base64 --decode ; echo
kubectl -n monitoring \
get secret grafana \
-o jsonpath="{.data.admin-password}" \
| base64 --decode ; echo
Deploy Grafana
SSH to the node which will host Grafana and create a directory to persist its data:
mkdir -p /storage/storage-001/mnt-grafana
chown -R nobody:nogroup /storage/storage-001/mnt-grafana
Deploy Grafana:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 65534 #nobody
        fsGroup: 65534 #nogroup
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
        - containerPort: 3000
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_SECURITY_ADMIN_USER
          #value: "admin"
          valueFrom:
            secretKeyRef:
              name: grafana
              key: admin-username
        - name: GF_SECURITY_ADMIN_PASSWORD
          #value: "PLAIN-PWD"
          valueFrom:
            secretKeyRef:
              name: grafana
              key: admin-password
        #- name: GF_AUTH_ANONYMOUS_ENABLED
        #  value: "false"
        # If you want to allow anonymous admin access, use the following
        # config instead
        #- name: GF_AUTH_BASIC_ENABLED
        #  value: "false"
        #- name: GF_AUTH_ANONYMOUS_ENABLED
        #  value: "true"
        #- name: GF_AUTH_ANONYMOUS_ORG_ROLE
        #  value: Admin
        volumeMounts:
        - name: config-volume
          mountPath: /etc/grafana/grafana.ini
          subPath: grafana.ini
        - name: mnt-grafana
          mountPath: /var/lib/grafana
      volumes:
      - name: config-volume
        configMap:
          name: cm-grafana
      - name: mnt-grafana
        hostPath:
          path: /storage/storage-001/mnt-grafana
      nodeSelector:
        kubernetes.io/hostname: k8snode
EOF
Expose Grafana
kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: srv-grafana
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30000
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  sessionAffinity: None
  type: NodePort
EOF
Test the deployment
On your workstation access http://YOUR.CLUSTER.IP:30000
Alternatively you can port forward:
export NAMESPACE=monitoring
kubectl port-forward \
-n $NAMESPACE \
$(kubectl -n $NAMESPACE get pods |grep "grafana-" | awk '{print $1}') \
3000
Then access http://localhost:3000
If you need info about exposing a service, please refer to: https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
Dashboards
https://grafana.com/dashboards/2115
Prometheus exporters
node-exporter
Create a DaemonSet to ensure all nodes have node-exporter:
kubectl create -f - <<EOF
# apps/v1 replaces extensions/v1beta1 (removed in Kubernetes 1.16)
# and requires an explicit selector.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - ports:
        - containerPort: 9100
          protocol: TCP
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        image: prom/node-exporter:latest
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        # No inner double quotes: they would become part of the regex.
        - '^/(sys|proc|dev|host|etc)($|/)'
        name: node-exporter
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF
Add node-exporter scraper to Prometheus
Edit Prometheus config file:
nano prometheus.yml
Add the scraper:
  - job_name: 'node_exporter_test'
    static_configs:
      - targets: ['YOUR-NODE-IP:9100']
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: instance
    #    replacement: "NEW-LABEL"
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: __address__
    #    replacement: k8snode:9100
    #metric_relabel_configs:
    #  - source_labels: ["__name__"]
    #    target_label: "job"
    #    replacement: "job"
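Because the DaemonSet runs node-exporter on every node, the targets list usually needs one entry per node. A small sketch that builds that line from a list of addresses; `scrape_targets` is a hypothetical helper, and the indentation of its output must be adjusted to match your prometheus.yml.

```shell
# scrape_targets PORT HOST [HOST...]
# Prints a static_configs "targets" line with one HOST:PORT entry per host.
scrape_targets() {
  port=$1
  shift
  out=""
  for host in "$@"; do
    # Join entries with ", " after the first one.
    out="${out:+$out, }'$host:$port'"
  done
  printf -- "- targets: [%s]\n" "$out"
}

# Example (requires a cluster): use the InternalIP of every node.
# ips=$(kubectl get nodes \
#   -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
# scrape_targets 9100 $ips
```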
Grafana dashboard
ID: 1860
https://grafana.com/dashboards/1860
kube-state-metrics
Deploy dependencies:
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/kubernetes/
Expose kube-state-metrics
kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-custom-kube-state-metrics
  namespace: kube-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32767
    name: metrics
    port: 8080
    protocol: TCP
    targetPort: 8080
  - nodePort: 32766
    name: telemetry
    port: 8081
    protocol: TCP
    targetPort: 8081
  selector:
    k8s-app: kube-state-metrics
  sessionAffinity: None
  type: NodePort
EOF
Add Prometheus scraper
  - job_name: 'kube-state-metrics-metrics'
    static_configs:
      - targets: ['NODE.IP:32767']
  - job_name: 'kube-state-metrics-telemetry'
    static_configs:
      - targets: ['NODE.IP:32766']
Update the ConfigMap:
kubectl -n monitoring \
create configmap cm-prometheus \
--from-file=prometheus.yml \
-o yaml --dry-run | kubectl apply -f -
Roll out ConfigMap:
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
Grafana dashboard
Dashboard ID: 7249
https://grafana.com/dashboards/7249
Dashboard ID: 747
https://grafana.com/dashboards/747
Grafana panels
{
"columns": [],
"fontSize": "100%",
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"links": [],
"pageSize": null,
"scroll": true,
"showHeader": true,
"sort": {
"col": 2,
"desc": true
},
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "date"
},
{
"alias": "",
"colorMode": null,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"decimals": 2,
"pattern": "/.*/",
"thresholds": [],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(kube_pod_container_status_restarts_total{namespace=~\"^$namespace$\",pod=~\"^$pod$\"}) by (pod)",
"format": "table",
"intervalFactor": 1,
"refId": "A"
}
],
"title": "Pod restart history",
"transform": "table",
"type": "table"
}
{
"columns": [],
"fontSize": "100%",
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"links": [],
"pageSize": null,
"scroll": true,
"showHeader": true,
"sort": {
"col": 5,
"desc": true
},
"styles": [