Summary
This is a request to add retries in the case of the agent failing to publish metrics or health check messages to TACS.
Description
I see cases in my logs where the ECS agent emits the message "Error publishing metrics". From looking at the code, it looks like tcsClientServer.publishMessages reads metrics and health metrics from a channel and logs an error if a message could not be published. As a result, metrics or health checks are not reported to TACS whenever sending a message to TACS fails. For example, this can occur when the server closes the WS connection, which causes the client to initiate a new connection.
Expected Behavior
I would expect some kind of retry mechanism that attempts to resend the metrics or health checks over the connection. I don't see any retry logic further down the stack either, e.g. in ClientServerImpl.MakeRequest.
Observed Behavior
The following log line:
05:20:14.273 | {"level":"warn","time":"2024-03-02T05:20:14.032","msg":"Error publishing metrics","error":"websocket: close sent"}
Environment Details
Running on AL2 with kernel 5.10
Supporting Log Snippets
See above.