fix(healthcheck): Update Hubble Readiness & Liveliness probes #1048

Open
mereta wants to merge 4 commits into main

Conversation

@mereta (Contributor) commented Nov 20, 2024

Description

Update the readiness probe to use the hubble status command.
Add a liveness probe that also uses the hubble status command.

Related Issue

#1047

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

The checks function as expected. Tested by configuring an invalid TCP address so that the gRPC connection made by hubble status fails; the containers are restarted as expected.
image

Additional Notes


@mereta mereta requested a review from a team as a code owner November 20, 2024 18:05
- hubble
- status
initialDelaySeconds: 30
periodSeconds: 30
Member

How long does the hubble status command take to respond at scale? If it is generally quick and fits within the 30s window, perfect. But if a healthy response frequently takes longer than 30s, restarting the pod may do more harm than good.

Contributor Author

Going to run some scale tests. TBC.

Contributor

Ref: https://github.com/cilium/cilium/blob/c7347199af4fe426c87fc0f5572f493559259ea6/pkg/hubble/observer/local_observer.go#L210

The cost of the status call should be fairly constant with respect to scale. Its upper limit is the ring buffer size, which is fixed.

Contributor Author

Also, some adjustments are currently needed before scale tests can be run for Hubble.
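
For reference on the timing question in this thread: in Kubernetes, how long a probe command may run is bounded by timeoutSeconds (default 1 second), while periodSeconds only sets how often the probe is started. A liveness probe restarts the container after failureThreshold consecutive failures (default 3). The fragment below is a minimal sketch, assuming the probe is an exec probe that runs hubble status. Only the command and the two 30s values come from the diff; the exec wrapper, timeoutSeconds, and failureThreshold are illustrative assumptions, not the PR's actual settings.

```yaml
# Hypothetical probe block; not copied from the PR.
livenessProbe:
  exec:                      # assumption: the command runs as an exec probe
    command:
      - hubble
      - status
  initialDelaySeconds: 30    # from the diff under review
  periodSeconds: 30          # from the diff under review
  timeoutSeconds: 5          # assumption; Kubernetes defaults this to 1
  failureThreshold: 3        # Kubernetes default, shown explicitly
```

With settings like these, a healthy hubble status call would need to finish within timeoutSeconds rather than within the 30s period, so any scale-test latency figures are best compared against that value.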

@MikeZappa87

Do any conditions exist with Hubble status where the exit code is 0 however the retina process itself is in a bad state?

@anubhabMajumdar (Contributor) commented Nov 25, 2024

> Do any conditions exist with Hubble status where the exit code is 0 however the retina process itself is in a bad state?

Following the code of Hubble status here I think there's no such scenario if hubble status -ocompact is not used

@MikeZappa87

> > Do any conditions exist with Hubble status where the exit code is 0 however the retina process itself is in a bad state?
>
> Following the code of Hubble status here I think there's no such scenario if hubble status -ocompact is not used

If Retina is deadlocked, how does the current approach detect this?


4 participants