Add retry to in-memory storage system (close #156) by mscwilson · Pull Request #305 · snowplow/snowplow-java-tracker

Miranda Wilson (mscwilson) · 2022-02-23T15:52:53Z

For issue #156.

This (breaking) change adds a backoff and retry mechanism to the Emitter. Previously, events that failed to send were returned, so that the user could choose to use a callback to track them again. Now, when a request fails, the events are returned to the event buffer for retry in a subsequent request.

When a batch of events (TrackerPayloads) is removed from the event buffer for sending, the events are now copied into a hashmap, and turned into a BatchPayload, a new wrapper class. BatchPayload stores the TrackerPayloads for the request along with the hashmap key. If the request is sent successfully, the batched events are deleted from the hashmap. If not, they are inserted back at the front of the event buffer. The aim is to send events approximately first in, first out.

The backoff mechanism uses the schedule() of ScheduledThreadPoolExecutor to delay all new requests after a failure has occurred. The initial delay is 50 ms, which increases exponentially with every subsequent failure. When one request succeeds, the retry delay returns to 0.

For testing purposes, it made sense here to allow users to set the InMemoryEventStore event buffer capacity on creation. The LinkedBlockingDeque by default has a capacity of Integer.MAX_VALUE.

Having a bufferCapacity field made the name bufferSize to mean batchSize extra confusing, so I changed it. It's not really part of retry though, so I might move these commits to their own issues. This name change is now issue #306.

While testing this, I discovered that processing Events into TrackerPayloads in the Tracker is very fast. This means that the Tracker threadpool I introduced in PR #293 isn't necessary. I've removed it (making PR #298 irrelevant).

Note to community contributor Paul Laturaze (@AcidFlow): we merged in your PR #259 in version 0.11.0 which allowed users to select their own ExecutorService. My current changes now require the ScheduledExecutorService interface. Any thoughts on this change?

Miranda Wilson (mscwilson) · 2022-02-23T15:59:19Z

+        // Events that didn't send are inserted at the head of the eventBuffer
+        // for immediate resending.
+        if (!successfullySent) {
+            while (events.size() > 0) {


This bit needs work. If the queue is full, and we want to drop the newer events, then this method must be able to hold the eventBuffer long enough to removeLast and offerFirst all the failed events. Otherwise more events will arrive and it will get full again.

yeah, we can add a GH issue. This is not a regression as the same problem was present before. In theory we could check the size before to add considering not only the queue size but also the pending events. It seems a bit too much for this PR so I would just move this problem in a separate issue.

AlexBenny

Awesome! I've just browsed the code and added some comments.
I'd like to run the code and look into it a bit deeper before to approve, hope you can wait just another day.

AlexBenny · 2022-02-24T10:39:20Z

+        // Events that didn't send are inserted at the head of the eventBuffer
+        // for immediate resending.
+        if (!successfullySent) {
+            while (events.size() > 0) {


yeah, we can add a GH issue. This is not a regression as the same problem was present before. In theory we could check the size before to add considering not only the queue size but also the pending events. It seems a bit too much for this PR so I would just move this problem in a separate issue.

Paul Laturaze (AcidFlow) · 2022-02-24T14:35:52Z

Hey Miranda Wilson (@mscwilson) !

I haven't look at the whole changelog yet as I don't have much time, but regarding your change now requiring a ScheduledExecutorService I think this should be fine. IIRC my previous contribution was to give more control on the Executor service so you could control its thread factory, pool and rejected execution handler so users would control the amount of thread they use, and could add metrics on rejection for instance. Using a ScheduledExecutorService still gives the user the same control with more reliability thanks to your changes :)

Miranda Wilson (mscwilson) · 2022-02-24T16:15:46Z

Great, thanks Paul Laturaze (@AcidFlow)!

AlexBenny

Great job! 👍 👍 👍

I think it's a great result. Making the track method synchronous has reduced a bit the performance for the tracking side but shouldn't be a big problem. The benchmarking shows longer time for the track method. The track method synchronous helps to avoid too much complexity with two different thread-pools. The unique emitter thread-pool seems simple and fast (we don't have a benchmark here but you did some manual testing). I think the current design is a good compromise between performance and extensibility of the tracker.

I think we should deprecate the SimpleEmitter if we don't remove it.

AlexBenny · 2022-03-02T09:31:24Z

-         * Set a custom ExecutorService to send http request.
+         * Set a custom ScheduledExecutorService to send http request.
         *
         *  /!\ Be aware that calling `close()` on a BatchEmitter instance has a side-effect and will shutdown that ExecutorService.


This seems a weird notation for warnings. I would update it to @implNote (https://nipafx.dev/javadoc-tags-apiNote-implSpec-implNote/)

Unfortunately implNote is a non-standard javadoc tag! It breaks the build. I couldn't see an obvious way to specify it as a tag in the build.gradle, so for now I've added html into the javadoc comments instead.

Miranda Wilson added 10 commits February 23, 2022 11:58

Refactor InMemoryEventStore

aaaf014

Add a hashmap to eventStore for events currently being sent

e509bff

Add basic retry

912e019

Add benchmarking test for InMemoryEventStore design

ec731dc

Use LinkedBlockingDeque in InMemoryEventStore

20cfaa9

Alter simple-console for throughput test

3ff266f

Remove Tracker threadpool

4839798

Use scheduled request Runnable to add retry backoff time

9094799

Enclose Emitter request method in try catch block

26cfa0f

Allow event buffer max capacity to be configured

2242726

Miranda Wilson (mscwilson) requested a review from AlexBenny February 23, 2022 15:52

Snowplow CLA bot (snowplowcla) added the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Feb 23, 2022

Miranda Wilson (mscwilson) commented Feb 23, 2022

View reviewed changes

Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated

Miranda Wilson (mscwilson) commented Feb 23, 2022

View reviewed changes

Snowplow (snowplow) deleted a comment from Snowplow CLA bot (snowplowcla) Feb 23, 2022

Miranda Wilson (mscwilson) removed the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Feb 23, 2022

Miranda Wilson added 2 commits February 23, 2022 17:38

Fix simple-console demo

9390345

Remove unused benchmark class

4ae3d40

Miranda Wilson (mscwilson) force-pushed the issue/156-retry_on_failure branch from af8db73 to 4ae3d40 Compare February 23, 2022 17:40

Restore SimpleEmitter functionality

3b02062

AlexBenny reviewed Feb 24, 2022

View reviewed changes

Miranda Wilson (mscwilson) mentioned this pull request Feb 24, 2022

Preserve oldest events when event buffer is full #308

Open

Use atomicLong more effectively

d68018d

Remove unnecessary 'this'es

3adc24e

AlexBenny approved these changes Mar 2, 2022

View reviewed changes

Miranda Wilson added 3 commits March 2, 2022 17:16

Tidy up Emitter code

af8c625

Update BatchEmitter javadoc comments

dadf07f

Remove implNote javadoc tags

73ceabb

Miranda Wilson added 3 commits March 2, 2022 19:17

Tidy up tests

e02468f

Rename EventStore getEventBatch to getEventsBatch

436a4ff

Merge branch 'release/0.12.0' into issue/156-retry_on_failure

6bf2077

Miranda Wilson (mscwilson) merged commit d5d3f0f into release/0.12.0 Mar 3, 2022

Miranda Wilson (mscwilson) deleted the issue/156-retry_on_failure branch March 3, 2022 11:35

Miranda Wilson (mscwilson) restored the issue/156-retry_on_failure branch March 3, 2022 11:36

Miranda Wilson (mscwilson) deleted the issue/156-retry_on_failure branch March 3, 2022 11:45

Conversation

Miranda Wilson (mscwilson) commented Feb 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Miranda Wilson (mscwilson) Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

AlexBenny Feb 24, 2022

Choose a reason for hiding this comment

Uh oh!

Miranda Wilson (mscwilson) Feb 24, 2022

Choose a reason for hiding this comment

Uh oh!

AlexBenny left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexBenny Feb 24, 2022

Choose a reason for hiding this comment

Uh oh!

Paul Laturaze (AcidFlow) commented Feb 24, 2022

Uh oh!

Miranda Wilson (mscwilson) commented Feb 24, 2022

Uh oh!

AlexBenny left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

AlexBenny Mar 2, 2022

Choose a reason for hiding this comment

Uh oh!

Miranda Wilson (mscwilson) Mar 2, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

Miranda Wilson (mscwilson) commented Feb 23, 2022 •

edited

Loading