More explanation of metrics #529

olicla · 2024-06-28T18:34:12Z

Describe the problem to be solved

I'm looking for a deeper explanation of the metrics, specifically:

What is the difference between spegel_advertised_keys and spegel_advertised_images?
What exactly is spegel_mirror_requests?

Proposed solution to the problem

No response

phillebaba · 2024-07-01T20:55:12Z

This should be documented but I can give a quick answer first.

The difference between spegel_advertised_images and spegel_advertised_keys is that the former is a counter of the number of images that have been advertised, while the latter is a counter of the number of images plus the number of layers that have been advertised. An image consists of a varying number of layers, which is why this is interesting to know.
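To make the relationship concrete, here is a minimal Python sketch that follows the simplified description above: each image contributes one to the image count, and one key for itself plus one key per layer. The image references, layer digests and the `count_advertised` helper are hypothetical. In practice Spegel may also advertise other content digests, so treat this as an illustration of why keys outnumber images, not as an exact formula.

```python
from typing import Dict, List, Tuple

# Hypothetical local images: image reference -> list of layer digests.
# Names and digests are made up for illustration only.
LOCAL_IMAGES: Dict[str, List[str]] = {
    "docker.io/library/nginx:1.27": ["sha256:aaa", "sha256:bbb", "sha256:ccc"],
    "ghcr.io/example/app:v1": ["sha256:ddd", "sha256:eee"],
}


def count_advertised(images: Dict[str, List[str]]) -> Tuple[int, int]:
    """Return (advertised_images, advertised_keys) under the simplified model
    where every image yields one key for itself plus one key per layer."""
    advertised_images = len(images)
    advertised_keys = sum(1 + len(layers) for layers in images.values())
    return advertised_images, advertised_keys


if __name__ == "__main__":
    images, keys = count_advertised(LOCAL_IMAGES)
    print(f"images={images} keys={keys}")  # images=2 keys=7
```

With two images of three and two layers, the image count is 2 while the key count is (1 + 3) + (1 + 2) = 7.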

The metric spegel_mirror_requests counts the in-flight registry requests and whether or not they are mirrored. This is useful, for example, to determine how many mirrored requests are failing, or whether requests that are being served from disk are failing.

olicla · 2024-07-01T21:03:26Z

Thank you very much.

If this is indeed documented somewhere, please share, as I have been unable to find it.

Maybe I'm misunderstanding, but how can spegel_mirror_requests on its own be used to determine whether requests being served from disk are failing?

olicla · 2024-07-01T22:33:27Z

Further, I think I understand what it means for an image and a key to be "advertised", but I don't really see any surrounding docs. In my opinion, it would be helpful to define this more precisely.

phillebaba · 2024-07-02T19:29:24Z

You have the label cache=hit|miss on the metric spegel_mirror_requests, which indicates whether or not the request was resolved locally. This way you can determine the hit rate of requests to your cache.
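As a rough illustration, the hit rate is the hit count divided by the total of hits and misses. The sketch below assumes you already have totals of spegel_mirror_requests per value of the cache label (for example from a Prometheus query); the numbers and the `hit_rate` helper are hypothetical. With a real Prometheus counter you would normally compute this over the increase within a time window rather than over raw totals.

```python
from typing import Dict

# Hypothetical totals of spegel_mirror_requests, already summed over any other
# labels and keyed by the value of the cache label. Numbers are invented.
CACHE_TOTALS: Dict[str, float] = {"hit": 160.0, "miss": 40.0}


def hit_rate(totals: Dict[str, float]) -> float:
    """Return hits / (hits + misses), or 0.0 if nothing has been recorded."""
    hits = totals.get("hit", 0.0)
    total = hits + totals.get("miss", 0.0)
    # Avoid dividing by zero before any requests have been counted.
    return hits / total if total else 0.0


if __name__ == "__main__":
    print(f"hit rate: {hit_rate(CACHE_TOTALS):.1%}")  # hit rate: 80.0%
```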

I am probably not going to spend time writing docs on the details of how Spegel works anytime soon, mostly because it requires a lot of background in both how OCI artifacts are composed and how Kademlia works and is used in Spegel. I just don't have time for that right now.

olicla · 2024-07-02T19:45:53Z

Fair enough.
Are you able to explain exactly the difference between the source=external and source=internal values of the label on spegel_mirror_requests?
Thanks for your time.

AhmedTremo · 2024-10-04T17:37:30Z

@phillebaba I had the same question: what does source=external|internal refer to?

phillebaba · 2024-10-06T21:49:18Z

The source refers to whether the request is coming from the same node or from a different node. Right now Spegel configures two mirrors: the first one is for the local instance of Spegel and the second one is for any other instance within the cluster. The second one covers situations where Spegel is not running on the node for some reason.
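A minimal Python model of this behaviour, under stated assumptions: containerd tries the configured mirrors in order (the local Spegel instance first, then any other instance in the cluster), moves on to the next one when a mirror cannot serve the content, and finally falls back to the upstream registry. Each Spegel instance labels a request internal if it came from its own node and external otherwise. The `Attempt` type, `simulate_pull` function and node names are hypothetical and do not correspond to Spegel's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attempt:
    """One mirror tried by containerd (hypothetical model)."""
    serving_node: str  # node whose Spegel instance handled the request
    found: bool        # whether that instance could resolve the content


def source_label(requesting_node: str, serving_node: str) -> str:
    # Same node -> "internal", any other node -> "external".
    return "internal" if requesting_node == serving_node else "external"


def simulate_pull(
    requesting_node: str, attempts: List[Attempt]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Try mirrors in order; return where the pull was served from and the
    (source, cache) label pairs each Spegel instance would record."""
    recorded: List[Tuple[str, str]] = []
    for attempt in attempts:
        cache = "hit" if attempt.found else "miss"
        recorded.append((source_label(requesting_node, attempt.serving_node), cache))
        if attempt.found:
            return "spegel", recorded
    # No mirror could serve the content, so the pull goes to the upstream registry.
    return "upstream", recorded


if __name__ == "__main__":
    # Local instance misses; another node's instance (second mirror) hits.
    print(simulate_pull("node-a", [Attempt("node-a", False), Attempt("node-b", True)]))
    # ('spegel', [('internal', 'miss'), ('external', 'hit')])
```

Note that a hit labelled internal only says the local instance handled the request; in this model it says nothing about which node actually holds the content.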

Does this answer your question?

AhmedTremo · 2024-10-06T21:52:46Z

yes, thanks @phillebaba

AhmedTremo · 2024-10-09T00:08:07Z

I just want to note down here, for future reference, that there is something wrong with the internal/external definition, because from experience I saw zero external hits even though the pulls were serviced from other nodes and not from the external registry.

Update: I think I misunderstood internal/external. I thought internal meant the image was stored locally on the node, but I figured out that it doesn't (if the image were stored locally, the kubelet wouldn't have asked containerd to pull it :D). Internal here just means that the local Spegel registry was used to route the request; it could have routed it to any other node. (This behavior makes sense now; sorry for the misunderstanding.)

AhmedTremo · 2024-10-15T21:47:14Z

I think we still need a way to quantify how many requests were served from Spegel mirrors vs. the external registry, for monitoring, measuring speedup, and error tracking.

Update: I can simply treat the miss count as requests served from the external registry :D (I need to make sure there is no double counting across the HostPort/NodePort mirrors.)
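A minimal sketch of that bookkeeping, assuming you have totals of spegel_mirror_requests keyed by (source, cache). It reports hits as requests served by Spegel and, following the suggestion above, misses as requests that probably fell through to the external registry. It keeps misses split by source so possible double counting stays visible: one pull may record a miss on the local instance and again on another instance reached through the second mirror. The sample values and the `summarize` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical totals of spegel_mirror_requests keyed by (source, cache).
SAMPLES: Dict[Tuple[str, str], float] = {
    ("internal", "hit"): 120.0,
    ("internal", "miss"): 30.0,
    ("external", "hit"): 5.0,
    ("external", "miss"): 25.0,
}


def summarize(samples: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Split requests into hits (served by Spegel) and misses (likely served
    by the external registry), keeping misses broken down by source."""
    misses_by_source: Dict[str, float] = defaultdict(float)
    served_by_spegel = 0.0
    for (source, cache), value in samples.items():
        if cache == "hit":
            served_by_spegel += value
        else:
            misses_by_source[source] += value
    return {
        "served_by_spegel": served_by_spegel,
        # May overcount: a pull that misses locally and then misses again on
        # another instance shows up once per source here.
        "misses_total": sum(misses_by_source.values()),
        "misses_internal": misses_by_source["internal"],
        "misses_external": misses_by_source["external"],
    }


if __name__ == "__main__":
    print(summarize(SAMPLES))
```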

phillebaba · 2024-10-17T08:07:50Z

You could also disable the external fallback if you like. I added it because I thought it would be useful in situations where Spegel was not available for some reason. Initially my plan was to solve this by preferring local services, but that did not work because of differences between iptables and IPVS. So the current fallback is not optimal.

Solutions like k3s do not need this because they run as a process on the host instead of in a container.

If you end up creating a good Grafana dashboard to visualize this information it would be appreciated if you considered contributing the changes.

olicla added the enhancement label (New feature or request) on Jun 28, 2024

phillebaba added the documentation label (Improvements or additions to documentation) on Jul 1, 2024
