Refactor TrackerEvents for event payload creation (close #291) by mscwilson · Pull Request #293 · snowplow/snowplow-java-tracker

Miranda Wilson (mscwilson) · 2022-01-17T14:17:14Z

For issue #291, as part of a larger refactoring of Tracker and Emitter to allow event sending retry and batching by byte size.

Tracker.track(event) now directly creates a TrackerPayload, which is passed to the Emitter for storage and sending. The Tracker event processing is currently within the main thread.
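To illustrate the flow described above, here is a minimal sketch in which a tracker builds a payload itself and hands it straight to an emitter. Everything in it is hypothetical: the `Emitter` interface, the `Tracker` class and the `trackOnce` helper are stand-ins rather than the library's real classes, and a plain `Map` takes the place of `TrackerPayload`. The keys `e`, `eid` and `dtm` are assumed here to stand for event type, event ID and device-created timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class TrackFlowSketch {

    // Hypothetical emitter contract: accepts a finished payload for storage and sending.
    interface Emitter {
        void add(Map<String, String> payload);
    }

    // Hypothetical tracker: builds the payload directly, with no intermediate event wrapper.
    static class Tracker {
        private final Emitter emitter;

        Tracker(Emitter emitter) {
            this.emitter = emitter;
        }

        void track(String eventType) {
            Map<String, String> payload = new LinkedHashMap<>();
            payload.put("e", eventType);                                   // event type (assumed key)
            payload.put("eid", UUID.randomUUID().toString());              // event ID (assumed key)
            payload.put("dtm", Long.toString(System.currentTimeMillis())); // created timestamp (assumed key)
            emitter.add(payload);                                          // runs on the caller's thread
        }
    }

    // Helper for trying the sketch out: tracks one event and returns the payload the emitter saw.
    static Map<String, String> trackOnce(String eventType) {
        Map<String, String> captured = new LinkedHashMap<>();
        Tracker tracker = new Tracker(captured::putAll);
        tracker.track(eventType);
        return captured;
    }
}
```

Because `track` runs entirely on the calling thread in this sketch, the caller pays the cost of payload creation, which is the situation the description refers to as processing "within the main thread".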

TrackerEvent has been removed. Event sending callbacks have been partially removed - the functionality is gone but the builder options remain for now.

AlexBenny

Great job! 👍
I think it's all good, I just left some minor comments!

About the SimpleEmitter: we need to check if the behaviour is exactly the one we can achieve with BatchEmitter with bufferOption=1. In that case we can remove it (maybe in a different GH issue).

About the callbacks: we can remove them from the builder too. I'm sure someone uses them, but we can't provide any particularly useful info beyond, at most, the number of events sent and failed. It's something I wouldn't add now, btw.

About the event-processing thread: right now, the Emitter has a pool of threads to send the events. Later, we will have the Emitter with a single thread waiting for the response to the last request. So, in theory, the Emitter should have a single running thread polling the EventStore and using the back-off-retry feature, while the Tracker should have the thread pool, with a new thread launched each time an event is tracked.
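The split of responsibilities described above could be sketched roughly as follows. This is an illustration under assumptions, not the tracker's actual code: `ThreadingSketch`, its fields and its methods are invented names, a `LinkedBlockingQueue` stands in for the EventStore, a counter stands in for the HTTP request, and a fixed-size pool reuses worker threads rather than literally starting a new thread per event.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadingSketch {

    // Hypothetical stand-in for the EventStore: a thread-safe queue of payloads.
    private final BlockingQueue<Map<String, String>> eventStore = new LinkedBlockingQueue<>();
    // Tracker side: a pool that builds payloads off the caller's thread.
    private final ExecutorService trackerPool = Executors.newFixedThreadPool(4);
    // Emitter side: exactly one thread that polls the store and "sends".
    private final ExecutorService emitterThread = Executors.newSingleThreadExecutor();
    private final AtomicInteger sent = new AtomicInteger();

    // Returns immediately; payload creation and storage happen on a pool thread.
    public void track(String eventType) {
        trackerPool.execute(() -> eventStore.add(Collections.singletonMap("e", eventType)));
    }

    // Starts the single emitter loop. A real emitter would send an HTTP request
    // here and back off before retrying on failure; this sketch only counts.
    public void startEmitter() {
        emitterThread.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Map<String, String> payload = eventStore.poll(100, TimeUnit.MILLISECONDS);
                    if (payload != null) {
                        sent.incrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop polling on shutdown
            }
        });
    }

    public int sentCount() {
        return sent.get();
    }

    // Stops accepting new events, then interrupts the emitter loop.
    public void shutdown() throws InterruptedException {
        trackerPool.shutdown();
        trackerPool.awaitTermination(1, TimeUnit.SECONDS);
        emitterThread.shutdownNow();
        emitterThread.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

With a single emitter thread, only one request is in flight at a time, which is what would make a back-off delay between retries straightforward to apply.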

AlexBenny · 2022-01-17T16:31:39Z

+    /**
+     * Builds the final event context.
+     *
+     * @param entities the base event context
+     * @return the final event context json with many entities inside
+     */


Opinion sharing:
I don't see much value in overzealous javadoc on private methods (I know you just copied/pasted this code).
I feel that for private methods we can remove these when the method signature is already self-explanatory, or just leave a simple comment like "Builds the final event context." without the other rows.
Not something to change, btw.

Miranda Wilson (mscwilson) · 2022-01-17T17:10:31Z

The main difference between SimpleEmitter and BatchEmitter is GET vs POST. So it's an opportunity to be opinionated :)

Miranda Wilson (mscwilson) · 2022-01-17T18:38:35Z

I've copied and pasted the threadpool (and threadfactory) from Emitter into Tracker. I left the Emitter threads as is for now.

Paul Boocock (paulboocock) · 2022-01-17T20:59:29Z

                .build());

        // Then
+        Thread.sleep(500);


I'm surprised we have to slow the tests down this much, half a second is quite the delay, especially when we're dealing with a mock emitter. This makes me think of the F in FIRST.

As an aside, this also makes me curious about the performance of this change: does it have similar performance properties to previous releases? A new threading paradigm could have some unexpected consequences that we'd be wise to test more deeply with some performance testing.

I didn't try shorter waits - 500ms was already being used for the Emitter tests. I'll see if it can be improved.

I tried shorter wait times, but they all failed tests occasionally, even 450ms.

Does this suggest that even in a "real" environment, the difference between created timestamp and sent timestamp is at least 500ms? That seems reasonably high to me. Might be out of scope here but worth investigating the cause if true.

Have we got any "real" Java tracker data lying around anywhere that we could use to compare timestamps?

This is out of scope here yeah. We're planning to finish this PR soon without looking into performance. The simple-console is broken currently, so I'll fix that next and then extend it for performance testing.

I don't think the test is a good measure for comparing performance. The Thread.sleep is mostly needed to allow the thread to start and do its job before the test finishes, and how long that takes mostly depends on the machine the tests are running on. I think, if we have concerns about this, we should compare the two implementations in a real test with simple-console, as suggested. However, we need to decide what we want to test. I guess we want to test how fast the track method can return. You mentioned the difference between created_tstamp and sent_tstamp, but even if the gap is longer, the derived_tstamp would mitigate the discrepancy. We base our data modelling on the derived_tstamp, so even a longer gap between the two shouldn't be disruptive for the customer. IMO, if we are really concerned about the performance of the track method, we could add a buffer on the input consumed by the tracker thread pool, which would make the performance exactly the same as before.
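Since the sleep only exists to let the worker threads finish, one common alternative (not something proposed in this thread) is to have the test wait on a signal with a timeout instead of a fixed delay. The sketch below uses a `CountDownLatch` for that. `LatchWaitSketch`, `RecordingEmitter` and `trackAndWait` are hypothetical names, and the recording emitter is a stand-in for the project's mock emitter rather than its actual test code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {

    // Hypothetical mock emitter: records each payload and counts down a latch.
    static class RecordingEmitter {
        final List<String> received = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch latch;

        RecordingEmitter(int expectedPayloads) {
            latch = new CountDownLatch(expectedPayloads);
        }

        void add(String payload) {
            received.add(payload);
            latch.countDown();
        }

        // Returns true as soon as all expected payloads arrived, false on timeout.
        boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
            return latch.await(timeout, unit);
        }
    }

    // Submits n payloads on a pool (standing in for the tracker's threads) and
    // waits only as long as they actually take, up to a 2-second upper bound.
    static List<String> trackAndWait(int n) throws InterruptedException {
        RecordingEmitter emitter = new RecordingEmitter(n);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < n; i++) {
                final String payload = "event-" + i;
                pool.execute(() -> emitter.add(payload));
            }
            if (!emitter.awaitAll(2, TimeUnit.SECONDS)) {
                throw new IllegalStateException("payloads did not arrive in time");
            }
            return new ArrayList<>(emitter.received);
        } finally {
            pool.shutdown();
        }
    }
}
```

On a fast machine the wait returns almost immediately, and on a slow one the timeout acts as an upper bound rather than a fixed cost, so the test's duration no longer has to be tuned per machine.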

Paul Boocock (paulboocock)

Looks good. Some areas like SimpleEmitter look much cleaner, removing those callbacks makes things a lot nicer.

AlexBenny

👍 LGTM

I think we could follow this plan:

1. proceed with merging this PR
2. fix the simple-console, adding a build step to the CI workflow
3. test performance between this and the previous version
4. (optional) improve performance if needed
5. add back-off-retry

Miranda Wilson added 4 commits January 17, 2022 12:12

Start changing Tracker to create TrackerPayloads not TrackerEvents

4e66eec

Copy payload creation methods into Tracker

9fec27c

Refactor Emitter to use TrackerPayload

355fb91

Remove TrackerEvent

8d332b5

Miranda Wilson (mscwilson) requested a review from AlexBenny January 17, 2022 14:17

Snowplow CLA bot (snowplowcla) added the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Jan 17, 2022

Snowplow (snowplow) deleted a comment from Snowplow CLA bot (snowplowcla) Jan 17, 2022

Miranda Wilson (mscwilson) removed the cla:no [Auto generated] Snowplow Contributor License Agreement has not been signed. label Jan 17, 2022

AlexBenny reviewed Jan 17, 2022

Miranda Wilson added 2 commits January 17, 2022 17:45

Remove request callbacks

5db4405

Use threadpool inside Tracker

60ae054

Miranda Wilson (mscwilson) marked this pull request as ready for review January 17, 2022 18:38

Paul Boocock (paulboocock) reviewed Jan 17, 2022

Add comment about event ID and dtm timestamp

9fc40f3

AlexBenny approved these changes Jan 18, 2022

Miranda Wilson (mscwilson) merged commit 813203b into release/0.12.0 Jan 18, 2022

Miranda Wilson (mscwilson) deleted the issue/291-refactor_trackerevent branch January 18, 2022 13:07

Miranda Wilson (mscwilson) mentioned this pull request Jan 24, 2022

Enable shutdown of Tracker threads (close #297) #298

Merged

Miranda Wilson (mscwilson) mentioned this pull request Feb 23, 2022

Add retry to in-memory storage system (close #156) #305

Merged

Miranda Wilson (mscwilson) mentioned this pull request Mar 7, 2022

Return eventId from Tracker.track() (close #304) #310

Merged
