More explanation of metrics #529

Open
olicla opened this issue Jun 28, 2024 · 11 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@olicla

olicla commented Jun 28, 2024

Describe the problem to be solved

Looking for a deeper explanation of metrics,

specifically:

  • what is the difference between spegel_advertised_keys and spegel_advertised_images?

  • what exactly is spegel_mirror_requests?

Proposed solution to the problem

No response

@olicla olicla added the enhancement New feature or request label Jun 28, 2024
@phillebaba phillebaba added the documentation Improvements or additions to documentation label Jul 1, 2024
@phillebaba
Member

This should be documented but I can give a quick answer first.

The difference between spegel_advertised_images and spegel_advertised_keys is that the former counts the number of images that have been advertised, while the latter counts both the images and the layers that have been advertised. An image consists of a varying number of layers, which is why this is interesting to know.
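
Taking that description at face value (keys = images + layers), the two metrics can be combined into a rough average of layers per image. This is only a sketch: the function name and the sample readings are hypothetical, and the arithmetic assumes that nothing other than images and layers is counted as a key.

```python
def average_layers_per_image(advertised_keys, advertised_images):
    """Estimate layers per image, assuming keys = images + layers."""
    if advertised_images <= 0:
        # No images advertised, so the ratio is undefined.
        return None
    layers = advertised_keys - advertised_images
    return layers / advertised_images


# Hypothetical readings of the two metrics, not real measurements.
print(average_layers_per_image(advertised_keys=660, advertised_images=60))
```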

The metric spegel_mirror_requests counts the inflight registry requests and whether they are mirrored or not. This is useful for determining, for example, how many mirrored requests are failing, or whether requests that are being served from disk are failing.

@olicla
Author

olicla commented Jul 1, 2024

Thank you very much.

If this is indeed documented somewhere, please share, as I have been unable to find it.

Maybe I am misunderstanding, but on its own, how can spegel_mirror_requests be used to determine whether requests being served from disk are failing?

@olicla
Author

olicla commented Jul 1, 2024

Further, I think I understand what exactly it means for an image and key to be "advertised" but I don't really see any surrounding docs. This would be helpful to define more precisely, in my opinion.

@phillebaba
Member

You have the label cache=hit|miss on the metric spegel_mirror_requests, which indicates whether the request was resolved locally or not. This way you can determine the hit rate of requests to your cache.
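
A minimal sketch of that hit-rate calculation, assuming the metric is scraped in the Prometheus text format. The `_total` suffix on the metric name, the helper names, and all sample values are assumptions made for illustration; only the `cache` and `source` labels come from this thread.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. The values are made up, and the "_total"
# suffix is an assumption about how the metric is exposed.
SAMPLE = """\
# TYPE spegel_mirror_requests_total counter
spegel_mirror_requests_total{cache="hit",source="internal"} 90
spegel_mirror_requests_total{cache="miss",source="internal"} 10
spegel_mirror_requests_total{cache="hit",source="external"} 0
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text, metric):
    """Return (labels, value) pairs for one labelled metric."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


def cache_hit_rate(samples):
    """Fraction of requests with cache="hit", or None if there are none."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get("cache", "unknown")] += value
    total = totals["hit"] + totals["miss"]
    return totals["hit"] / total if total else None


rate = cache_hit_rate(parse_samples(SAMPLE, "spegel_mirror_requests_total"))
print(rate)
```

Because counters only ever increase, a dashboard would more likely compute this over a time window than over the raw totals; the sketch uses raw totals to keep the arithmetic visible.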

I am probably not going to spend time writing docs on the details of how Spegel works anytime soon, mostly because it requires a lot of background both in how OCI artifacts are composed and in how Kademlia works and is used in Spegel. I just don't have time for that right now.

@olicla
Author

olicla commented Jul 2, 2024

Fair enough.
Are you able to explain exactly the difference between the values of the source=external|internal label on spegel_mirror_requests?
Thanks for your time.

@AhmedTremo
Contributor

@phillebaba I had the same question: what does source=external|internal refer to?

@phillebaba
Member

The source refers to whether the request is coming from the same node or a different node. Right now Spegel configures two mirrors. The first one is for the local instance of Spegel and the second one is for any other instance within the cluster. This is for situations where Spegel is not running on the node for some reason.

Does this answer your question?

@AhmedTremo
Contributor

yes, thanks @phillebaba

@AhmedTremo
Contributor

AhmedTremo commented Oct 9, 2024

I just want to note here, for future reference, that there seems to be something wrong with the internal/external definition: from experience, I saw zero external hits even though the pulls were served from other nodes and not from the external registry.

Update: I think I misunderstood internal/external. I thought internal meant the image was stored locally on the node, but I figured out that it does not (if it were stored locally, the kubelet wouldn't have asked containerd to pull :D). Internal here just means that the local Spegel registry was used to route the request; it could have routed it to any other node. This behavior makes sense now, sorry for the misunderstanding.

@AhmedTremo
Contributor

AhmedTremo commented Oct 15, 2024

I think now we still need a way to quantify how many requests were served from Spegel mirrors vs External Registry for monitoring, measuring speedup, and error tracking.

Update: I can simply treat the miss count as requests served from the external registry :D (I need to ensure there is no double counting across the HostPort/NodePort mirrors.)
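
A rough sketch of that bookkeeping, assuming the counter values have already been collected into a dictionary keyed by (source, cache). The function name, the dictionary layout, and the numbers are all hypothetical. The per-source totals are kept in the result so that any overlap between the two mirror paths can be inspected; whether a single pull can show up under both sources is not settled in this thread.

```python
from collections import Counter

# Hypothetical counter values keyed by (source, cache); not real measurements.
counts = {
    ("internal", "hit"): 90.0,
    ("internal", "miss"): 10.0,
    ("external", "hit"): 5.0,
    ("external", "miss"): 1.0,
}


def summarize(counts):
    """Split requests into mirror-served (hits) and estimated fallbacks (misses)."""
    by_cache = Counter()
    by_source = Counter()
    for (source, cache), value in counts.items():
        by_cache[cache] += value
        by_source[source] += value
    total = sum(by_cache.values())
    return {
        "mirror_served": by_cache["hit"],
        # Assumption from the comment above: every miss fell back to the
        # external registry.
        "registry_fallback_estimate": by_cache["miss"],
        "mirror_share": by_cache["hit"] / total if total else None,
        "by_source": dict(by_source),
    }


print(summarize(counts))
```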

@phillebaba
Member

You could also disable the external fallback if you like. I added it because I thought it would be useful in situations where Spegel was not available for some reason. Initially my plan was to solve this by preferring local services, but that did not work because of differences between iptables and ipvs. So the current fallback is not optimal.

Solutions like k3s do not need this because they run as a process on the host instead of in a container.

If you end up creating a good Grafana dashboard to visualize this information it would be appreciated if you considered contributing the changes.

3 participants