Improve new control plane dashboards #206

metalmatze · 2019-05-17T12:59:48Z

With #205 merged we have a few new dashboards for the control plane (apiserver, scheduler, proxy, kubelet).

Here are a few TODOs outlined for the future:

We should unclutter the names and separate component and workload dashboards more.
We should make sure that component alerts are represented in dashboards. Example: KubeAPIErrorsHigh needs to be visible in the apiserver dashboard. Reuse the recording rule.
Reuse more recording rules for control plane dashboards (lots of similar queries across dashboards).
Go metrics about components should probably be separated. Either own dashboard or no need at all? Let's discuss.
Add more from the discussion below
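
To make the "reuse the recording rule" idea concrete, here is a minimal sketch of an alert and a dashboard panel reading the same recorded series. The record name, the threshold, the `for` duration, the source metric `apiserver_request_count`, and the helper functions are all assumptions for illustration; they are not the rules currently shipped in this repository.

```python
# Hypothetical names throughout: the record name, threshold, and panel layout
# are illustrative assumptions, not the rules shipped in this repository.
RECORD = "cluster:apiserver_request_errors:ratio_rate5m"  # assumed record name


def rule_group(threshold=0.03):
    """Build a rule group: one recording rule and one alert on top of it."""
    return {
        "name": "apiserver-errors-example",
        "rules": [
            {
                # The expensive ratio is computed once and stored as a series.
                "record": RECORD,
                "expr": (
                    'sum(rate(apiserver_request_count{code=~"5.."}[5m]))'
                    " / sum(rate(apiserver_request_count[5m]))"
                ),
            },
            {
                # The alert only compares the recorded series to a threshold.
                "alert": "KubeAPIErrorsHigh",
                "expr": f"{RECORD} > {threshold}",
                "for": "10m",
                "labels": {"severity": "critical"},
            },
        ],
    }


def dashboard_target():
    """Panel query for the apiserver dashboard, reading the same series."""
    return {"expr": RECORD, "legendFormat": "error ratio"}
```

With this shape, the apiserver dashboard plots exactly the value the alert evaluates, so a firing KubeAPIErrorsHigh is visible on the graph without duplicating the query.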

brancz · 2019-05-17T13:35:22Z

The distinction between incoming and outgoing HTTP requests is not obvious right now; I think labeling them explicitly would be good.

povilasv · 2019-05-17T16:46:44Z

> Go metrics about components should probably be separated. Either own dashboard or no need at all? Let's discuss.

I find those really useful when you are debugging a control plane failure, OOMs, crashloops, etc.

The way I envisioned this is that SREs get a single view where they look for symptoms, a "CPU is going through the roof and we are getting tons of requests" type of deal.

Agreed on all other points.

metalmatze added the enhancement and help wanted labels · May 17, 2019
