You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Azure Event Hub binding will continue working after receiving a 503 error from the storage act used for checkpointing.
Actual Behavior
The event hub binding dies after receiving a 503 error from table storage. The component does not appear to retry the calls to the storage act, and just gives up. Logs become filled with Error from event processor: the Processor has been stopped. Create a new instance to start processing again
We are using a standard storage act and not a premium act. I could switch to a premium act for better throughput, but I would still have concerns that the component could still fail to process messages if it experiences any sort of transient error.
Can we somehow configure a retry policy to restart this component instead of filling the logs with an error?
time="2024-10-17T19:48:21.031356514Z" level=warning msg="The default value for 'spec.metric.http.increasedCardinality' will change to 'false' in Dapr 1.15 or later" app_id=component-telemetry instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.runtime.diagnostics type=log ver=1.14.4
time="2024-10-17T19:48:24.243756578Z" level=warning msg="The default value for 'spec.metric.http.increasedCardinality' will change to 'false' in Dapr 1.15 or later" app_id=component-telemetry instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.runtime.http type=log ver=1.14.4
time="2024-10-18T01:25:52.104204839Z" level=error msg="Error processing events from partition client: error receiving events: client has been closed by user" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:25:56.922083584Z" level=error msg="Error processing events from partition client: error receiving events: client has been closed by user" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:02.932347006Z" level=error msg="Error from event processor: GET https://my-storage-act.blob.core.windows.net/dapr-component-telemetry\n--------------------------------------------------------------------------------\nRESPONSE 503: 503 The server is busy.\nERROR CODE: ServerBusy\n--------------------------------------------------------------------------------\n\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ServerBusy</Code><Message>The server is busy.\nRequestId:9e9a9f71-c01e-0015-15fc-202deb000000\nTime:2024-10-18T01:25:52.0107247Z</Message></Error>\n--------------------------------------------------------------------------------\n" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:07.643955727Z" level=error msg="Error from event processor: GET https://my-storage-act.blob.core.windows.net/dapr-component-telemetry\n--------------------------------------------------------------------------------\nRESPONSE 503: 503 The server is busy.\nERROR CODE: ServerBusy\n--------------------------------------------------------------------------------\n\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ServerBusy</Code><Message>The server is busy.\nRequestId:9e9aad9c-c01e-0015-75fc-202deb000000\nTime:2024-10-18T01:25:56.8329114Z</Message></Error>\n--------------------------------------------------------------------------------\n" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:07.933024175Z" level=error msg="Error from event processor: the Processor has been stopped. Create a new instance to start processing again" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:12.644981572Z" level=error msg="Error from event processor: the Processor has been stopped. Create a new instance to start processing again" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
Steps to Reproduce the Problem
Unable to reproduce on demand. Leave the component running long enough and this can happen.
The text was updated successfully, but these errors were encountered:
// Run handles the load balancing loop, blocking until the passed in context is cancelled
// or it encounters an unrecoverable error. On cancellation, it will return a nil error.
//
// This function should run for the lifetime of your application, or for as long as you want
// to continue to claim and process partitions.
//
// Once a Processor has been stopped it cannot be restarted and a new instance must
// be created.
The main loop has a 5 second sleep time before retrying, which I believe lined up with the timestamps in my logs.
This component has to be handle the fact that the errors returned by processor.Run are not recoverable, and this component needs to create a new event hub processor.
It might be as simple as moving the call to aeh.getProcessorForTopic into the subscriberLoop func
Expected Behavior
Azure Event Hub binding will continue working after receiving a 503 error from the storage act used for checkpointing.
Actual Behavior
The event hub binding dies after receiving a 503 error from table storage. The component does not appear to retry the calls to the storage act, and just gives up. Logs become filled with
Error from event processor: the Processor has been stopped. Create a new instance to start processing again
We are using a standard storage act and not a premium act. I could switch to a premium act for better throughput, but I would still have concerns that the component could still fail to process messages if it experiences any sort of transient error.
Can we somehow configure a retry policy to restart this component instead of filling the logs with an error?
Steps to Reproduce the Problem
Unable to reproduce on demand. Leave the component running long enough and this can happen.
The text was updated successfully, but these errors were encountered: