Support data limit when reading a batch with TopicReaderSync #431
base: main
Conversation
… a sync topic reader
max_messages: typing.Union[int, None] = None,
max_bytes: typing.Union[int, None] = None,
) -> Union[PublicBatch, None]:
all_amount = float("inf")
Why do you need all_amount as a float constant?
The rationale is that, by default, there are no limits on the data flow. I'm not sure the UInt64 max value would be sufficient, so I chose infinity, which happens to be a float.
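For illustration, a minimal sketch of the defaulting pattern being discussed here; the helper name and body are hypothetical, not the SDK's actual code:

import typing

_NO_LIMIT = float("inf")  # sentinel meaning "no limit on the data flow"

def _normalize_limits(
    max_messages: typing.Union[int, None] = None,
    max_bytes: typing.Union[int, None] = None,
) -> typing.Tuple[float, float]:
    # When a limit is not provided, fall back to infinity so that
    # comparisons such as `count < max_messages` always pass.
    if max_messages is None:
        max_messages = _NO_LIMIT
    if max_bytes is None:
        max_bytes = _NO_LIMIT
    return max_messages, max_bytes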
max_bytes = all_amount

is_batch_set = batch is not None
is_msg_limit_set = max_messages < all_amount
Why do you need all_amount instead of checking max_messages is not None?
Because max_messages is set to all_amount (up above) in case it hasn't been provided (i.e. it's None).
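For comparison, a sketch of the alternative the reviewer hints at, where None itself marks "no limit"; illustrative only, not the PR's code:

import typing

def _limits_from_optionals(
    max_messages: typing.Optional[int],
    max_bytes: typing.Optional[int],
) -> typing.Tuple[bool, bool]:
    # Keep None as the "no limit" marker and test it explicitly,
    # instead of substituting float("inf") and comparing magnitudes.
    is_msg_limit_set = max_messages is not None
    is_bytes_limit_set = max_bytes is not None
    return is_msg_limit_set, is_bytes_limit_set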
@@ -86,6 +86,57 @@ def async_wait_message(self) -> concurrent.futures.Future:

return self._caller.unsafe_call_with_future(self._async_reader.wait_message())

def _make_batch_slice(
IMPORTANT
After applying this function, the caller loses the messages that were trimmed from the batch and will never see them again in the read session. The server does not allow skipping messages during a commit. This can cause problems:
- If the caller commits messages with ack, the software will hang forever, because the server will wait for the skipped messages before acknowledging the commit.
- If the caller commits messages without ack, then after reconnecting, all messages after the last successful commit (the first batch with trimmed messages) will be re-read. A lot of extra work is needed to re-read these messages, and real progress will be very slow.
- If progress is saved on the user's side and messages are not committed through the SDK, the trimmed messages will be lost and cannot be recovered.
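To make the first failure mode concrete, here is a toy, SDK-free illustration of the offset gap a trimmed batch leaves behind; the contiguous-commit rule is the one stated above, the numbers are made up:

# Toy model: a batch delivered offsets 0..9, but the client commits only
# a trimmed prefix. A server that expects contiguous commits keeps
# waiting for the missing offsets before acknowledging.
delivered = list(range(10))   # offsets handed out in one batch
trimmed = delivered[:5]       # client slices the batch to 5 messages

committed_up_to = trimmed[-1] + 1
missing = [o for o in delivered if o >= committed_up_to]

print("committed offsets:", trimmed)
print("delivered but never committed:", missing)  # 5..9 -> ack never arrives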
OK, I see, though I'm not quite sure I fully understand the path to a solution. What would be the expected approach here?
Allow a client to control the amount of data it receives when reading a batch through TopicReaderSync.

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

TopicReaderSync.receive_batch ignores the max_messages and max_bytes parameters, which means a client has no control over the amount of received data.

Issue Number: 365
What is the new behavior?

TopicReaderSync.receive_batch now takes max_messages and max_bytes into account.
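As a hedged usage sketch of the new behavior: the reader construction below assumes a driver.topic_client.reader factory and may differ between SDK versions; only the receive_batch limits come from this PR.

import ydb

# Hypothetical setup: endpoint, database, topic and consumer are placeholders.
driver = ydb.Driver(endpoint="grpc://localhost:2136", database="/local")
driver.wait(timeout=5)

reader = driver.topic_client.reader("/local/my-topic", consumer="my-consumer")

# Ask for at most 10 messages and roughly 1 MiB of data per call.
batch = reader.receive_batch(max_messages=10, max_bytes=1024 * 1024)
for message in batch.messages:
    print(message.data)

reader.close()
driver.stop()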
Other information

Decisions made:
- … _commit_get_partition_session, and rather copy _partition_session from the batch to a new (sliced) batch.
- … when neither max_messages nor max_bytes were provided.
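For context, a rough sketch of what a slicing helper along these lines might do; the attribute names (messages, data, _partition_session) and the shallow-copy strategy are assumptions drawn from this description, not the actual implementation:

import copy

def make_batch_slice_sketch(batch, max_messages=None, max_bytes=None):
    # Shallow-copy the batch so the new (sliced) batch keeps the original
    # _partition_session, as described in the decisions above.
    sliced = copy.copy(batch)

    messages = list(batch.messages)
    if max_messages is not None:
        messages = messages[:max_messages]
    if max_bytes is not None:
        kept, total = [], 0
        for msg in messages:
            kept.append(msg)
            total += len(msg.data)
            if total >= max_bytes:
                break
        messages = kept

    sliced.messages = messages
    # Note: any trimmed messages are dropped here, which is exactly the
    # commit hazard flagged in the review conversation above.
    return sliced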