Add retries for publishing metrics & health checks #4105

Open
strategicpause opened this issue Mar 5, 2024 · 0 comments

Summary

This is a request to add retries for when the agent fails to publish metrics or health check messages to TACS.

Description

In my logs I see cases where the ECS agent emits the message "Error publishing metrics". From reading the code, it looks like tcsClientServer.publishMessages reads metrics and health metrics from a channel and logs an error if they could not be published. As a result, metrics or health checks are not reported to TACS whenever sending a message fails. For example, this could occur when the server closes the WS connection, which results in the client initiating a new connection.

Expected Behavior

I would expect some kind of retry mechanism that attempts to resend the metrics or health checks over the connection. I don't see any retry logic further down the stack either, e.g. in ClientServerImpl.MakeRequest.

Observed Behavior

The following log line:

05:20:14.273 | {"level":"warn","time":"2024-03-02T05:20:14.032","msg":"Error publishing metrics","error":"websocket: close sent"}

Environment Details

Running on AL2 with kernel 5.10

Supporting Log Snippets

See above.
