Skip to content

Add retry to in-memory storage system (close #156)#305

Merged
Miranda Wilson (mscwilson) merged 21 commits intorelease/0.12.0from
issue/156-retry_on_failure
Mar 3, 2022
Merged

Add retry to in-memory storage system (close #156)#305
Miranda Wilson (mscwilson) merged 21 commits intorelease/0.12.0from
issue/156-retry_on_failure

Conversation

@mscwilson
Copy link
Copy Markdown
Contributor

@mscwilson Miranda Wilson (mscwilson) commented Feb 23, 2022

For issue #156.

This (breaking) change adds a backoff and retry mechanism to the Emitter. Previously, events that failed to send were returned, so that the user could choose to use a callback to track them again. Now, when a request fails, the events are returned to the event buffer for retry in a subsequent request.

When a batch of events (TrackerPayloads) is removed from the event buffer for sending, the events are now copied into a hashmap, and turned into a BatchPayload, a new wrapper class. BatchPayload stores the TrackerPayloads for the request along with the hashmap key. If the request is sent successfully, the batched events are deleted from the hashmap. If not, they are inserted back at the front of the event buffer. The aim is to send events approximately first in, first out.

The backoff mechanism uses the schedule() of ScheduledThreadPoolExecutor to delay all new requests after a failure has occurred. The initial delay is 50 ms, which increases exponentially with every subsequent failure. When one request succeeds, the retry delay returns to 0.

For testing purposes, it made sense here to allow users to set the InMemoryEventStore event buffer capacity on creation. The LinkedBlockingDeque by default has a capacity of Integer.MAX_VALUE.

Having a bufferCapacity field made the name bufferSize to mean batchSize extra confusing, so I changed it. It's not really part of retry though, so I might move these commits to their own issues. This name change is now issue #306.

While testing this, I discovered that processing Events into TrackerPayloads in the Tracker is very fast. This means that the Tracker threadpool I introduced in PR #293 isn't necessary. I've removed it (making PR #298 irrelevant).

Note to community contributor Paul Laturaze (@AcidFlow): we merged in your PR #259 in version 0.11.0 which allowed users to select their own ExecutorService. My current changes now require the ScheduledExecutorService interface. Any thoughts on this change?

@snowplowcla Snowplow CLA bot (snowplowcla) added the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Feb 23, 2022
// Events that didn't send are inserted at the head of the eventBuffer
// for immediate resending.
if (!successfullySent) {
while (events.size() > 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit needs work. If the queue is full, and we want to drop the newer events, then this method must be able to hold the eventBuffer long enough to removeLast and offerFirst all the failed events. Otherwise more events will arrive and it will get full again.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can add a GH issue. This is not a regression as the same problem was present before. In theory we could check the size before to add considering not only the queue size but also the pending events. It seems a bit too much for this PR so I would just move this problem in a separate issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #308

@mscwilson Miranda Wilson (mscwilson) removed the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Feb 23, 2022
Copy link
Copy Markdown
Collaborator

@AlexBenny AlexBenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I've just browsed the code and added some comments.
I'd like to run the code and look into it a bit deeper before to approve, hope you can wait just another day.

Comment thread examples/simple-console/src/main/java/com/snowplowanalytics/Main.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/EventStore.java Outdated
// Events that didn't send are inserted at the head of the eventBuffer
// for immediate resending.
if (!successfullySent) {
while (events.size() > 0) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can add a GH issue. This is not a regression as the same problem was present before. In theory we could check the size before to add considering not only the queue size but also the pending events. It seems a bit too much for this PR so I would just move this problem in a separate issue.

@AcidFlow
Copy link
Copy Markdown
Contributor

Hey Miranda Wilson (@mscwilson) !

I haven't look at the whole changelog yet as I don't have much time, but regarding your change now requiring a ScheduledExecutorService I think this should be fine. IIRC my previous contribution was to give more control on the Executor service so you could control its thread factory, pool and rejected execution handler so users would control the amount of thread they use, and could add metrics on rejection for instance. Using a ScheduledExecutorService still gives the user the same control with more reliability thanks to your changes :)

@mscwilson
Copy link
Copy Markdown
Contributor Author

Great, thanks Paul Laturaze (@AcidFlow)!

Copy link
Copy Markdown
Collaborator

@AlexBenny AlexBenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! 👍 👍 👍

I think it's a great result. Making the track method synchronous has reduced a bit the performance for the tracking side but shouldn't be a big problem. The benchmarking shows longer time for the track method. The track method synchronous helps to avoid too much complexity with two different thread-pools. The unique emitter thread-pool seems simple and fast (we don't have a benchmark here but you did some manual testing). I think the current design is a good compromise between performance and extensibility of the tracker.

I think we should deprecate the SimpleEmitter if we don't remove it.

Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/AbstractEmitter.java Outdated
* Set a custom ExecutorService to send http request.
* Set a custom ScheduledExecutorService to send http request.
*
* /!\ Be aware that calling `close()` on a BatchEmitter instance has a side-effect and will shutdown that ExecutorService.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a weird notation for warnings. I would update it to @implNote (https://nipafx.dev/javadoc-tags-apiNote-implSpec-implNote/)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately implNote is a non-standard javadoc tag! It breaks the build. I couldn't see an obvious way to specify it as a tag in the build.gradle, so for now I've added html into the javadoc comments instead.

Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
@mscwilson Miranda Wilson (mscwilson) merged commit d5d3f0f into release/0.12.0 Mar 3, 2022
@mscwilson Miranda Wilson (mscwilson) deleted the issue/156-retry_on_failure branch March 3, 2022 11:35
@mscwilson Miranda Wilson (mscwilson) restored the issue/156-retry_on_failure branch March 3, 2022 11:36
@mscwilson Miranda Wilson (mscwilson) deleted the issue/156-retry_on_failure branch March 3, 2022 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants