Monitoring
K8s monitoring.
Clone metrics-server.
git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
Edit resource-reader.yaml.
nano deploy/1.8+/resource-reader.yaml
Edit the resources section as follows:
...
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
...
Edit metrics-server-deployment.yaml
nano deploy/1.8+/metrics-server-deployment.yaml
Edit as follows:
...
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
...
Deploy it.
kubectl apply -f deploy/1.8+/
Wait a few minutes and run:
kubectl top node
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" |jq
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/YOUR-NAMESPACE/pods" |jq
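The "wait a few minutes" step can be replaced by polling until the metrics API answers. This is a minimal sketch in POSIX sh; retry is a hypothetical helper name and is not part of kubectl.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is used up.
# Uses plain (global) variables to stay POSIX-compatible.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Usage (needs a working kubectl context):
#   retry 30 10 kubectl top node
```

With 30 attempts and a 10 second delay the loop gives up after roughly five minutes; adjust both numbers to your cluster.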
Start a Rancher container:
docker run \
-tid \
--name=rancher \
--restart=unless-stopped \
-p 80:80 -p 443:443 \
rancher/rancher:latest
Add a cluster in Rancher, then run the manifest it generates on your cluster.
SSH to your master node.
Create a policy file:
mkdir /etc/kubernetes/policies
nano /etc/kubernetes/policies/audit-policy.yaml
Paste:
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
Edit K8s API server config file:
nano /etc/kubernetes/manifests/kube-apiserver.yaml
Add:
...
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
    - --audit-log-path=/var/log/apiserver/audit.log
    - --audit-log-format=json
    ...
    volumeMounts:
    ...
    - mountPath: /etc/kubernetes/policies
      name: policies
      readOnly: true
  ...
  volumes:
  ...
  - hostPath:
      path: /etc/kubernetes/policies
      type: DirectoryOrCreate
    name: policies
Restart kubelet:
systemctl restart kubelet
If the changes did not take effect, stop the API server docker container (it will be started automatically):
docker stop $(docker ps | grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}')
Tail the log file:
docker exec -it $(docker ps |grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}') tail -f /var/log/apiserver/audit.log
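At the Metadata level the log grows quickly, so filtering by verb helps when looking for specific changes. This is a minimal sketch that assumes each audit event is written as one compact JSON object per line (no spaces around the colons); audit_by_verb is a hypothetical helper name.

```shell
# audit_by_verb VERB
# Hypothetical helper: reads audit events on stdin and prints only those
# whose "verb" field equals VERB. Assumes one compact JSON event per line.
audit_by_verb() {
  grep -F "\"verb\":\"$1\""
}

# Usage: pipe the log (for example from the docker exec command above,
# using cat instead of tail -f) into the filter:
#   ... cat /var/log/apiserver/audit.log | audit_by_verb delete
# If jq is installed, an equivalent filter is:
#   ... | jq -c 'select(.verb == "delete")'
```

The grep version depends on the exact byte layout of the log line; the jq version parses the JSON and is the safer choice when jq is available.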
kubectl create namespace monitoring
nano prometheus.yml
Paste:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Prometheus config file example: https://github.com/prometheus/prometheus/blob/master/docs/getting_started.md
Create a ConfigMap from the config file:
kubectl -n monitoring create configmap cm-prometheus --from-file prometheus.yml
To change the configuration later, edit the file:
nano prometheus.yml
Update the ConfigMap:
kubectl -n monitoring \
create configmap cm-prometheus \
--from-file=prometheus.yml \
-o yaml --dry-run | kubectl apply -f -
Now we need to roll out the new ConfigMap. At the time of writing (2019-02-15), this subject is a little tricky, because running pods are not restarted when a ConfigMap changes. Some options are listed below:
Roll out ConfigMap: option 1 - scale deployment
This is the only way that will "always" work, although there will be a few seconds of downtime:
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
Roll out ConfigMap: option 2 - patch the deployment
kubectl -n monitoring \
patch deployment prometheus \
-p '{"spec":{"template":{"metadata":{"labels":{"date":"2019-02-15"}}}}}'
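A hard-coded date only triggers a rollout the first time it is applied on a given day, because a second patch with the same value changes nothing. This is a minimal sketch that generates the label value from the current time; rollout_patch is a hypothetical helper name, and the timestamp format is an assumption chosen to stay within the characters allowed in label values.

```shell
# rollout_patch
# Hypothetical helper: prints a patch that sets the "date" label on the
# pod template to the current timestamp (YYYY-MM-DD-HHMMSS).
rollout_patch() {
  printf '{"spec":{"template":{"metadata":{"labels":{"date":"%s"}}}}}\n' \
    "$(date +%Y-%m-%d-%H%M%S)"
}

# Usage:
#   kubectl -n monitoring patch deployment prometheus -p "$(rollout_patch)"
```

If your kubectl is version 1.15 or newer, kubectl -n monitoring rollout restart deployment/prometheus should achieve the same result without a custom label.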
Roll out ConfigMap: option 3 - create a new ConfigMap
Create a new ConfigMap:
kubectl -n monitoring \
create configmap cm-prometheus-new \
--from-file=prometheus.yml \
-o yaml --dry-run | kubectl apply -f -
Edit the deployment:
export EDITOR=nano
kubectl -n monitoring edit deployments prometheus
Edit volumes.configMap.name and use cm-prometheus-new. The change forces K8s to create new pods with the new config.
If for any reason you deployed Prometheus with hostNetwork: true, options 2 and 3 will return the error below, because the new pod is scheduled while the old one still holds the host port:
0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
In this case, use option 1.
If you need more info regarding rolling out ConfigMaps, please refer to: https://stackoverflow.com/questions/37317003/restart-pods-when-configmap-updates-in-kubernetes
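Option 3 needs a new ConfigMap name for every change. One way to get one is to derive the name from the file content, so the name changes only when the config does. This is a minimal sketch using the POSIX cksum tool; cm_name_for is a hypothetical helper name and the naming scheme is an assumption, not a Kubernetes convention.

```shell
# cm_name_for PREFIX FILE
# Hypothetical helper: prints PREFIX-<CRC checksum of FILE>.
# The checksum is decimal digits only, so the result stays a valid
# ConfigMap name as long as PREFIX is one.
cm_name_for() {
  printf '%s-%s\n' "$1" "$(cksum < "$2" | awk '{print $1}')"
}

# Usage:
#   name=$(cm_name_for cm-prometheus prometheus.yml)
#   kubectl -n monitoring create configmap "$name" --from-file=prometheus.yml
#   # then set volumes.configMap.name to "$name" in the deployment
```

Old ConfigMaps are not removed automatically with this approach and have to be deleted by hand once no deployment references them.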
SSH to the node which will host Prometheus and create a directory to persist its data:
mkdir -p /storage/storage-001/mnt-prometheus
chown -R nobody:nogroup /storage/storage-001/mnt-prometheus
Deploy Prometheus:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        ports:
        - containerPort: 9090
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --web.console.libraries=/usr/share/prometheus/console_libraries
        - --web.console.templates=/usr/share/prometheus/consoles
        - --storage.tsdb.retention.time=90d
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
        - name: mnt-prometheus
          mountPath: /prometheus
      volumes:
      - name: config-volume
        configMap:
          name: cm-prometheus
      - name: mnt-prometheus
        hostPath:
          path: /storage/storage-001/mnt-prometheus
      nodeSelector:
        kubernetes.io/hostname: k8snode
EOF
Expose Prometheus through a NodePort service:
kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-prometheus
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30909
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort
EOF
Alternatively you can port forward:
export NAMESPACE=monitoring
kubectl port-forward \
-n $NAMESPACE \
$(kubectl -n $NAMESPACE get pods |grep "prometheus-" | awk '{print $1}') \
9090
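Selecting the pod with grep can return more than one name, for example while a rollout is in progress. This is a minimal sketch that selects by the app=prometheus label instead and takes the first match; first_pod is a hypothetical helper name.

```shell
# first_pod NAMESPACE SELECTOR
# Hypothetical helper: prints the name of the first pod in NAMESPACE
# that matches the label SELECTOR.
first_pod() {
  kubectl -n "$1" get pods -l "$2" -o jsonpath='{.items[0].metadata.name}'
}

# Usage:
#   kubectl -n monitoring port-forward "$(first_pod monitoring app=prometheus)" 9090
```

The selector relies on the app: prometheus label set in the deployment's pod template.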
If you need info about exposing a service, please refer to: https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
kubectl create namespace monitoring
nano grafana.ini
Paste:
# ConfigMap
##################### Grafana Configuration Example #####################
#
# Everything has defaults so you only need to uncomment things you want to
# change
# possible values : production, development
;app_mode = production
# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
;instance_name = ${HOSTNAME}
#################################### Paths ####################################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana
# Temporary files in `data` directory older than given duration will be removed
;temp_data_lifetime = 24h
# Directory where grafana can store logs
;logs = /var/log/grafana
# Directory where grafana will automatically scan and look for plugins
;plugins = /var/lib/grafana/plugins
# folder that contains provisioning config files that grafana will apply on startup and while running.
;provisioning = conf/provisioning
#################################### Server ####################################
[server]
# Protocol (http, https, socket)
;protocol = http
# The ip address to bind to, empty will bind to all interfaces
;http_addr =
# The http port to use
;http_port = 3000
# The public facing domain name used to access grafana from a browser
;domain = localhost
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
;enforce_domain = false
# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000
# Log web requests
;router_logging = false
# the path relative to the working path
;static_root_path = public
# enable gzip
;enable_gzip = false
# https certs & key file
;cert_file =
;cert_key =
# Unix socket path
;socket =
#################################### Database ####################################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as one string using the url properties.
# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
;url =
# For "postgres" only, either "disable", "require" or "verify-full"
;ssl_mode = disable
# For "sqlite3" only, path relative to data_path setting
;path = grafana.db
# Max idle conn setting default is 2
;max_idle_conn = 2
# Max conn setting default is 0 (mean not set)
;max_open_conn =
# Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
;conn_max_lifetime = 14400
# Set to true to log the sql calls and execution times.
log_queries =
#################################### Session ####################################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
;provider = file
# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
;provider_config = sessions
# Session cookie name
;cookie_name = grafana_sess
# If you use session in https only, default is false
;cookie_secure = false
# Session life time, default is 86400
;session_life_time = 86400
#################################### Data proxy ###########################
[dataproxy]
# This enables data proxy logging, default is false
;logging = false
#################################### Analytics ####################################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
;reporting_enabled = true
# Set to false to disable all checks to https://grafana.net
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to http://grafana.com to get latest versions
;check_for_updates = true
# Google Analytics universal tracking code, only enabled if you specify an id here
;google_analytics_ua_id =
#################################### Security ####################################
[security]
# default admin user, created on startup
;admin_user = admin
# default admin password, can be changed before first start of grafana, or in profile settings
;admin_password = admin
# used for signing
;secret_key = SW2YcwTIb9zpOOhoPsMm
# Auto-login remember days
;login_remember_days = 7
;cookie_username = grafana_user
;cookie_remember_name = grafana_remember
# disable gravatar profile images
;disable_gravatar = false
# data source proxy whitelist (ip_or_domain:port separated by spaces)
;data_source_proxy_whitelist =
# disable protection against brute force login attempts
;disable_brute_force_login_protection = false
#################################### Snapshots ###########################
[snapshots]
# snapshot sharing options
;external_enabled = true
;external_snapshot_url = https://snapshots-origin.raintank.io
;external_snapshot_name = Publish to snapshot.raintank.io
# remove expired snapshot
;snapshot_remove_expired = true
#################################### Dashboards History ##################
[dashboards]
# Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
;versions_to_keep = 20
#################################### Users ###############################
[users]
# disable user signup / registration
;allow_sign_up = true
# Allow non admin users to create organizations
;allow_org_create = true
# Set to true to automatically assign new users to the default organization (id 1)
;auto_assign_org = true
# Default role new users will be automatically assigned (if disabled above is set to true)
;auto_assign_org_role = Viewer
# Background text for the user field on the login page
;login_hint = email or username
# Default UI theme ("dark" or "light")
;default_theme = dark
# External user management, these options affect the organization users view
;external_manage_link_url =
;external_manage_link_name =
;external_manage_info =