Monitoring
K8s monitoring.

metrics-server

Clone metrics-server.
git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
Edit resource-reader.yaml.
nano deploy/1.8+/resource-reader.yaml
Edit the resources section as follows:
...
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
...
Edit metrics-server-deployment.yaml.
nano deploy/1.8+/metrics-server-deployment.yaml
Edit as follows:
...
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
...
Deploy it.
kubectl apply -f deploy/1.8+/
Wait a few minutes and run:
kubectl top node
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/YOUR-NAMESPACE/pods" | jq .

Rancher

docker run \
  -tid \
  --name=rancher \
  --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  rancher/rancher:latest
Add a cluster, then run the manifest Rancher generates on your cluster.

Audit

SSH to your master node.
Create a policy file:
mkdir /etc/kubernetes/policies
nano /etc/kubernetes/policies/audit-policy.yaml
Paste:
# Log all requests at the Metadata level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
Edit the K8s API server config file:
nano /etc/kubernetes/manifests/kube-apiserver.yaml
Add:
...
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --audit-policy-file=/etc/kubernetes/policies/audit-policy.yaml
    - --audit-log-path=/var/log/apiserver/audit.log
    - --audit-log-format=json
    ...
    volumeMounts:
    ...
    - mountPath: /etc/kubernetes/policies
      name: policies
      readOnly: true
  ...
  volumes:
  ...
  - hostPath:
      path: /etc/kubernetes/policies
      type: DirectoryOrCreate
    name: policies
Restart kubelet:
systemctl restart kubelet
If the changes did not take effect, stop the API server Docker container (it will be started again automatically):
docker stop $(docker ps | grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}')
Tail the log file:
docker exec -it $(docker ps | grep "k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system" | awk '{print $1}') tail -f /var/log/apiserver/audit.log
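With --audit-log-format=json the API server writes one JSON event per line, so the log can be summarised with standard text tools. The following is a minimal sketch that counts events per verb. The helper name count_audit_verbs and the sample events are illustrative assumptions (real events carry many more fields), and the simple pattern match assumes "verb" only appears as the top-level field, which should hold at the Metadata level.

```shell
#!/bin/sh
# Count audit events per verb from JSON-lines audit output read on stdin.
# count_audit_verbs is a hypothetical helper name, not a Kubernetes tool.
count_audit_verbs() {
  grep -o '"verb":[[:space:]]*"[a-z]*"' \
    | sed 's/.*"\([a-z]*\)"$/\1/' \
    | sort \
    | uniq -c \
    | awk '{print $2 "=" $1}'
}

# Illustrative, heavily abbreviated sample events (assumption, not real log output).
sample_events='{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","verb":"get","requestURI":"/api/v1/nodes"}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","verb":"list","requestURI":"/api/v1/pods"}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","verb":"get","requestURI":"/api/v1/namespaces"}'

printf '%s\n' "$sample_events" | count_audit_verbs
```

On the master node, the same function could be fed from the container, for example by replacing tail -f with cat in the docker exec command above and piping the result into count_audit_verbs.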

Prometheus

Create namespace

kubectl create namespace monitoring

Create Prometheus config

nano prometheus.yml
Paste:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Create a ConfigMap from the config file:
kubectl -n monitoring create configmap cm-prometheus --from-file prometheus.yml

If you need to update the ConfigMap...

Edit the file:
nano prometheus.yml
Update the ConfigMap:
kubectl -n monitoring \
  create configmap cm-prometheus \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
Now we need to roll out the new ConfigMap. At the time of writing (2019-02-15), this subject seems to be a little tricky. Some options are listed below:
Roll out ConfigMap: option 1 - scale deployment
This is the only way that will "always" work, although there will be a few seconds of downtime:
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1
Roll out ConfigMap: option 2 - patch the deployment
kubectl -n monitoring \
  patch deployment prometheus \
  -p '{"spec":{"template":{"metadata":{"labels":{"date":"2019-02-15"}}}}}'
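The patch only triggers a rollout when the label value actually changes, so a hard-coded date does nothing the second time it is applied on the same day. A minimal sketch that generates the patch with a fresh value on every run follows. The helper name rollout_patch is a hypothetical choice, and using a Unix timestamp as the label value is an assumption rather than part of the original procedure.

```shell
#!/bin/sh
# Build a patch that sets the pod-template label "date" to the given value.
# rollout_patch is a hypothetical helper name; the label key matches the command above.
rollout_patch() {
  printf '{"spec":{"template":{"metadata":{"labels":{"date":"%s"}}}}}' "$1"
}

# A Unix timestamp contains only digits, so it is a valid label value,
# and it differs on every run.
stamp=$(date +%s)
patch=$(rollout_patch "$stamp")
echo "$patch"

# To apply it (requires cluster access):
# kubectl -n monitoring patch deployment prometheus -p "$patch"
```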
Roll out ConfigMap: option 3 - create a new ConfigMap
Create a new ConfigMap:
kubectl -n monitoring \
  create configmap cm-prometheus-new \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
Edit the deployment:
export EDITOR=nano
kubectl -n monitoring edit deployments prometheus
Edit volumes.configMap.name and use cm-prometheus-new. The change will force K8s to create new pods with the new config.
If for any reason you deployed Prometheus with hostNetwork: true, options 2 and 3 will fail with this error:
0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
In this case, use option 1.
For more information on rolling out ConfigMaps, see: https://stackoverflow.com/questions/37317003/restart-pods-when-configmap-updates-in-kubernetes

Deploy Prometheus

SSH to the node which will host Prometheus and create a directory to persist its data:
mkdir -p /storage/storage-001/mnt-prometheus
chown -R nobody:nogroup /storage/storage-001/mnt-prometheus
Deploy Prometheus:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        ports:
        - containerPort: 9090
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --web.console.libraries=/usr/share/prometheus/console_libraries
        - --web.console.templates=/usr/share/prometheus/consoles
        - --storage.tsdb.retention.time=90d
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
        - name: mnt-prometheus
          mountPath: /prometheus
      volumes:
      - name: config-volume
        configMap:
          name: cm-prometheus
      - name: mnt-prometheus
        hostPath:
          path: /storage/storage-001/mnt-prometheus
      nodeSelector:
        kubernetes.io/hostname: k8snode
EOF

Expose Prometheus

kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-prometheus
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30909
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort
EOF

Test the deployment

On your workstation access http://YOUR.CLUSTER.IP:30909
Alternatively, you can port forward:
export NAMESPACE=monitoring
kubectl port-forward \
  -n $NAMESPACE \
  $(kubectl -n $NAMESPACE get pods | grep "prometheus-" | awk '{print $1}') \
  9090
Then access http://localhost:9090
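Prometheus can take a little while to start, so a scripted check may be handier than reloading the browser. The sketch below polls the /-/healthy endpoint that Prometheus serves. The helper names retry and check_prometheus are hypothetical, and the attempt count and delay in the example are arbitrary assumptions.

```shell
#!/bin/sh
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# retry is a hypothetical helper, not a kubectl or Prometheus feature.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeeds once Prometheus answers on its health endpoint.
# $1 is host:port, for example localhost:9090 or YOUR.CLUSTER.IP:30909.
check_prometheus() {
  curl -fsS "http://${1}/-/healthy" >/dev/null
}

# Example (requires the NodePort or the port-forward above to be reachable):
# retry 10 3 check_prometheus localhost:9090 && echo "Prometheus is healthy"
```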

Grafana

Create namespace

kubectl create namespace monitoring

Create Grafana config

nano grafana.ini
Paste:
# ConfigMap
##################### Grafana Configuration Example #####################
#
# Everything has defaults so you only need to uncomment things you want to
# change

# possible values : production, development
;app_mode = production

# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
;instance_name = ${HOSTNAME}

#################################### Paths ####################################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana

# Temporary files in `data` directory older than given duration will be removed
;temp_data_lifetime = 24h

# Directory where grafana can store logs
;logs = /var/log/grafana

# Directory where grafana will automatically scan and look for plugins
;plugins = /var/lib/grafana/plugins

# folder that contains provisioning config files that grafana will apply on startup and while running.
;provisioning = conf/provisioning

#################################### Server ####################################
[server]
# Protocol (http, https, socket)
;protocol = http

# The ip address to bind to, empty will bind to all interfaces
;http_addr =

# The http port to use
;http_port = 3000

# The public facing domain name used to access grafana from a browser
;domain = localhost

# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
;enforce_domain = false

# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000

# Log web requests
;router_logging = false

# the path relative working path
;static_root_path = public

# enable gzip
;enable_gzip = false

# https certs & key file
;cert_file =
;cert_key =

# Unix socket path
;socket =

#################################### Database ####################################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as one string using the url properties.

# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =

# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
;url =

# For "postgres" only, either "disable", "require" or "verify-full"
;ssl_mode = disable

# For "sqlite3" only, path relative to data_path setting
;path = grafana.db

# Max idle conn setting default is 2
;max_idle_conn = 2

# Max conn setting default is 0 (mean not set)
;max_open_conn =

# Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
;conn_max_lifetime = 14400

# Set to true to log the sql calls and execution times.
log_queries =

#################################### Session ####################################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
;provider = file

# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
;provider_config = sessions

# Session cookie name
;cookie_name = grafana_sess

# If you use session in https only, default is false
;cookie_secure = false

# Session life time, default is 86400
;session_life_time = 86400

#################################### Data proxy ###########################
[dataproxy]

# This enables data proxy logging, default is false
;logging = false

#################################### Analytics ####################################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
;reporting_enabled = true

# Set to false to disable all checks to https://grafana.net
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to http://grafana.com to get latest versions
;check_for_updates = true

# Google Analytics universal tracking code, only enabled if you specify an id here
;google_analytics_ua_id =

#################################### Security ####################################
[security]
# default admin user, created on startup
;admin_user = admin

# default admin password, can be changed before first start of grafana, or in profile settings
;admin_password = admin

# used for signing
;secret_key = SW2YcwTIb9zpOOhoPsMm

# Auto-login remember days
;login_remember_days = 7
;cookie_username = grafana_user
;cookie_remember_name = grafana_remember

# disable gravatar profile images
;disable_gravatar = false

# data source proxy whitelist (ip_or_domain:port separated by spaces)
;data_source_proxy_whitelist =

# disable protection against brute force login attempts
;disable_brute_force_login_protection = false

#################################### Snapshots ###########################
[snapshots]
# snapshot sharing options
;external_enabled = true
;external_snapshot_url = https://snapshots-origin.raintank.io
;external_snapshot_name = Publish to snapshot.raintank.io

# remove expired snapshot
;snapshot_remove_expired = true

#################################### Dashboards History ##################
[dashboards]
# Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
;versions_to_keep = 20

#################################### Users ###############################
[users]
# disable user signup / registration
;allow_sign_up = true

# Allow non admin users to create organizations
;allow_org_create = true

# Set to true to automatically assign new users to the default organization (id 1)
;auto_assign_org = true

# Default role new users will be automatically assigned (if disabled above is set to true)
;auto_assign_org_role = Viewer

# Background text for the user field on the login page
;login_hint = email or username

# Default UI theme ("dark" or "light")
;default_theme = dark

# External user management, these options affect the organization users view
;external_manage_link_url =
;external_manage_link_name =
;external_manage_info =

# Viewers can edit/inspect dashboard settings in the browser. But not save the dashboard.
;viewers_can_edit = false

[auth]
# Set to true to disable (hide) the login form, useful if you use OAuth, defaults to false
;disable_login_form = false

# Set to true to disable the signout link in the side menu. useful if you use auth.proxy, defaults to false
;disable_signout_menu = false

# URL to redirect the user to after sign out
;signout_redirect_url =

# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
;oauth_auto_login = false

#################################### Anonymous Auth ##########################
[auth.anonymous]
# enable anonymous access
;enabled = false

# specify organization name that should be used for unauthenticated users
;org_name = Main Org.

# specify role for unauthenticated users
;org_role = Viewer

#################################### Github Auth ##########################
[auth.github]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://github.com/login/oauth/authorize
;token_url = https://github.com/login/oauth/access_token
;api_url = https://api.github.com/user
;team_ids =
;allowed_organizations =

#################################### Google Auth ##########################
[auth.google]
;enabled = false
;allow_sign_up = true
;client_id = some_client_id
;client_secret = some_client_secret
;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
;auth_url = https://accounts.google.com/o/oauth2/auth
;token_url = https://accounts.google.com/o/oauth2/token
;api_url = https://www.googleapis.com/oauth2/v1/userinfo
;allowed_domains =

#################################### Generic OAuth ##########################
[auth.generic_oauth]
;enabled = false
;name = OAuth
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email,read:org
;auth_url = https://foo.bar/login/oauth/authorize
;token_url = https://foo.bar/login/oauth/access_token
;api_url = https://foo.bar/user
;team_ids =
;allowed_organizations =
;tls_skip_verify_insecure = false
;tls_client_cert =
;tls_client_key =
;tls_client_ca =

#################################### Grafana.com Auth ####################
[auth.grafana_com]
;enabled = false
;allow_sign_up = true
;client_id = some_id
;client_secret = some_secret
;scopes = user:email
;allowed_organizations =

#################################### Auth Proxy ##########################
[auth.proxy]
;enabled = false
;header_name = X-WEBAUTH-USER
;header_property = username
;auto_sign_up = true
;ldap_sync_ttl = 60
;whitelist = 192.168.1.1, 192.168.2.1
;headers = Email:X-User-Email, Name:X-User-Name

#################################### Basic Auth ##########################
[auth.basic]
;enabled = true

#################################### Auth LDAP ##########################
[auth.ldap]
;enabled = false
;config_file = /etc/grafana/ldap.toml
;allow_sign_up = true

#################################### SMTP / Emailing ##########################
[smtp]
;enabled = false
;host = localhost:25
;user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
;from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com

[emails]
;welcome_email_on_sign_up = false

#################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and file
# Use space to separate multiple modes, e.g. "console file"
;mode = console file

# Either "debug", "info", "warn", "error", "critical", default is "info"
;level = info

# optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
;filters =

# For "console" mode only
[log.console]
;level =

# log line format, valid options are text, console and json
;format = console

# For "file" mode only
[log.file]
;level =

# log line format, valid options are text, console and json
;format = text

# This enables automated log rotate(switch of following options), default is true
;log_rotate = true

# Max line number of single file, default is 1000000
;max_lines = 1000000

# Max size shift of single file, default is 28 means 1 << 28, 256MB
;max_size_shift = 28

# Segment log daily, default is true
;daily_rotate = true

# Expired days of log file(delete after max days), default is 7
;max_days = 7

[log.syslog]
;level =

# log line format, valid options are text, console and json
;format = text

# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
;network =
;address =

# Syslog facility. user, daemon and local0 through local7 are valid.
;facility =

# Syslog tag. By default, the process' argv[0] is used.
;tag =

#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
;enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
;execute_alerts = true

# Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
;error_or_timeout = alerting

# Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
;nodata_or_nullvalues = no_data

# Alert notifications can include images, but rendering many images at the same time can overload the server
# This limit will protect the server from render overloading and make sure notifications are sent out quickly
;concurrent_render_limit = 5

#################################### Explore #############################
[explore]
# Enable the Explore section
;enabled = false

#################################### Internal Grafana Metrics ##########################
# Metrics available at HTTP API Url /metrics
[metrics]
# Disable / Enable internal metrics
;enabled = true

# Publish interval
;interval_seconds = 10

# Send internal metrics to Graphite
[metrics.graphite]
# Enable by setting the address setting (ex localhost:2003)
;address =
;prefix = prod.grafana.%(instance_name)s.

#################################### Distributed tracing ############
[tracing.jaeger]
# Enable by setting the address sending traces to jaeger (ex localhost:6831)
;address = localhost:6831
# Tag that will always be included in when creating new spans. ex (tag1:value1,tag2:value2)
;always_included_tag = tag1:value1
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
;sampler_type = const
# jaeger samplerconfig param
# for "const" sampler, 0 or 1 for always false/true respectively
# for "probabilistic" sampler, a probability between 0 and 1
# for "rateLimiting" sampler, the number of spans per second
# for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
;sampler_param = 1

#################################### Grafana.com integration ##########################
# Url used to import dashboards directly from Grafana.com
[grafana_com]
;url = https://grafana.com

#################################### External image storage ##########################
[external_image_storage]
# Used for uploading images to public servers so they can be included in slack/email messages.
# you can choose between (s3, webdav, gcs, azure_blob, local)
;provider =

[external_image_storage.s3]
;bucket =
;region =
;path =
;access_key =
;secret_key =

[external_image_storage.webdav]
;url =
;public_url =
;username =
;password =

[external_image_storage.gcs]
;key_file =
;bucket =
;path =

[external_image_storage.azure_blob]
;account_name =
;account_key =
;container_name =

[external_image_storage.local]
# does not require any configuration

[rendering]
# Options to configure external image rendering server like https://github.com/grafana/grafana-image-renderer
;server_url =
;callback_url =

[enterprise]
# Path to a valid Grafana Enterprise license.jwt file
;license_path =
Create a ConfigMap from the config file:
kubectl -n monitoring create configmap cm-grafana --from-file grafana.ini

Create Grafana secrets

Generate base64 strings:
# This will be the admin-username. Copy the output.
echo -n 'admin' | base64

# This will be the admin-password. Copy the output.
echo -n 'PUT-YOUR-PASSWORD-HERE' | base64
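Two details matter when encoding by hand: the input must not end with a newline (hence echo -n), and GNU base64 wraps long output at 76 columns, which would break a long password across lines. A minimal sketch of a helper that avoids both follows. The name b64 is a hypothetical choice, and printf is used instead of echo -n because its behaviour is consistent across shells.

```shell
#!/bin/sh
# Base64-encode a string with no trailing newline in the input
# and no line wrapping in the output.
# b64 is a hypothetical helper name.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Prints the value to paste as admin-username.
b64 'admin'; echo
```

Alternatively, kubectl can do the encoding itself: kubectl -n monitoring create secret generic grafana --from-literal=admin-username=admin --from-literal=admin-password=PUT-YOUR-PASSWORD-HERE creates the same kind of Secret without any manual base64 step.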
Create Secret:
kubectl create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana
  namespace: monitoring
type: Opaque
data:
  admin-username: PASTE admin-username base64 HERE
  admin-password: PASTE admin-password base64 HERE
EOF
To retrieve the admin username and password, run:
kubectl -n monitoring \
  get secret grafana \
  -o jsonpath="{.data.admin-username}" \
  | base64 --decode ; echo

kubectl -n monitoring \
  get secret grafana \
  -o jsonpath="{.data.admin-password}" \
  | base64 --decode ; echo

Deploy Grafana

SSH to the node which will host Grafana and create a directory to persist its data:
mkdir -p /storage/storage-001/mnt-grafana
chown -R nobody:nogroup /storage/storage-001/mnt-grafana
Deploy Grafana:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 65534 #nobody
        fsGroup: 65534 #nogroup
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
        - containerPort: 3000
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_SECURITY_ADMIN_USER
          #value: "admin"
          valueFrom:
            secretKeyRef:
              name: grafana
              key: admin-username
        - name: GF_SECURITY_ADMIN_PASSWORD
          #value: "PLAIN-PWD"
          valueFrom:
            secretKeyRef:
              name: grafana
              key: admin-password
        #- name: GF_AUTH_ANONYMOUS_ENABLED
        #  value: "false"
        # If you want to allow anonymous admin access, use the following
        # config instead
        #- name: GF_AUTH_BASIC_ENABLED
        #  value: "false"
        #- name: GF_AUTH_ANONYMOUS_ENABLED
        #  value: "true"
        #- name: GF_AUTH_ANONYMOUS_ORG_ROLE
        #  value: Admin
        volumeMounts:
        - name: config-volume
          mountPath: /etc/grafana/grafana.ini
          subPath: grafana.ini
        - name: mnt-grafana
          mountPath: /var/lib/grafana
      volumes:
      - name: config-volume
        configMap:
          name: cm-grafana
      - name: mnt-grafana
        hostPath:
          path: /storage/storage-001/mnt-grafana
      nodeSelector:
        kubernetes.io/hostname: k8snode
EOF

Expose Grafana

kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: srv-grafana
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30000
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  sessionAffinity: None
  type: NodePort
EOF

Test the deployment

On your workstation access http://YOUR.CLUSTER.IP:30000
Alternatively, you can port forward:
export NAMESPACE=monitoring
kubectl port-forward \
  -n $NAMESPACE \
  $(kubectl -n $NAMESPACE get pods | grep "grafana-" | awk '{print $1}') \
  3000
Then access http://localhost:3000

Dashboards

Prometheus exporters

node-exporter

Create a DaemonSet to ensure all nodes have node-exporter:
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - ports:
        - containerPort: 9100
          protocol: TCP
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        image: prom/node-exporter:latest
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        name: node-exporter
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF

Add node-exporter scraper to Prometheus

Edit Prometheus config file:
nano prometheus.yml
Add the scraper:
  - job_name: 'node_exporter_test'
    static_configs:
      - targets: ['YOUR-NODE-IP:9100']
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: instance
    #    replacement: "NEW-LABEL"
    #relabel_configs:
    #  - source_labels: [__address__]
    #    target_label: __address__
    #    replacement: k8snode:9100
    #metric_relabel_configs:
    #  - source_labels: ["__name__"]
    #    target_label: "job"
    #    replacement: "job"
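Because the DaemonSet runs node-exporter on every node, the targets list usually needs one entry per node rather than a single YOUR-NODE-IP. The sketch below prints a targets line for any number of addresses. The helper name node_exporter_targets and the example addresses are hypothetical placeholders; port 9100 is the one declared in the DaemonSet.

```shell
#!/bin/sh
# Print a static_configs targets line for a list of node IPs.
# node_exporter_targets is a hypothetical helper; 9100 is the node-exporter port.
node_exporter_targets() {
  out=""
  for ip in "$@"; do
    # Append a comma separator only after the first entry.
    out="${out}${out:+, }'${ip}:9100'"
  done
  printf "      - targets: [%s]\n" "$out"
}

# Example with placeholder addresses:
node_exporter_targets 10.0.0.11 10.0.0.12

# With cluster access, the addresses could come from kubectl instead:
# node_exporter_targets $(kubectl get nodes \
#   -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
```

The printed line can be pasted in place of the single-target line in prometheus.yml before updating the ConfigMap.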

Grafana dashboard

ID: 1860

kube-state-metrics

Deploy dependencies:
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/kubernetes/

Expose kube-state-metrics

kubectl create -f - <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: srv-custom-kube-state-metrics
  namespace: kube-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32767
    name: metrics
    port: 8080
    protocol: TCP
    targetPort: 8080
  - nodePort: 32766
    name: telemetry
    port: 8081
    protocol: TCP
    targetPort: 8081
  selector:
    k8s-app: kube-state-metrics
  sessionAffinity: None
  type: NodePort
EOF

Add Prometheus scraper

  - job_name: 'kube-state-metrics-metrics'
    static_configs:
      - targets: ['NODE.IP:32767']

  - job_name: 'kube-state-metrics-telemetry'
    static_configs:
      - targets: ['NODE.IP:32766']
Update the ConfigMap:
kubectl -n monitoring \
  create configmap cm-prometheus \
  --from-file=prometheus.yml \
  -o yaml --dry-run | kubectl apply -f -
Roll out ConfigMap:
kubectl -n monitoring scale deployment/prometheus --replicas=0
kubectl -n monitoring scale deployment/prometheus --replicas=1

Grafana dashboard

Dashboard ID: 7249
Dashboard ID: 747

Grafana panels

{
  "columns": [],
  "fontSize": "100%",
  "gridPos": {
    "h": 9,
    "w": 12,
    "x": 0,
    "y": 0
  },
  "id": 2,
  "links": [],
  "pageSize": null,
  "scroll": true,