Skip to content

Support for creating TrackerPayload asynchronously (close #222)#226

Merged
Paul Boocock (paulboocock) merged 18 commits into
release/0.10.0from
issue/222-async_payload
Jun 8, 2020
Merged

Support for creating TrackerPayload asynchronously (close #222)#226
Paul Boocock (paulboocock) merged 18 commits into
release/0.10.0from
issue/222-async_payload

Conversation

@paulboocock

@paulboocock Paul Boocock (paulboocock) commented May 5, 2020

Copy link
Copy Markdown
Contributor

This PR was initially based on and takes inspiration from the fork within issue #222.

I've removed the need for a new API method and I've made the async payload creation the default behaviour when using Tracker.track().

I've also gone a little beyond simply moving the payload, I've also improved the asynchronous behaviour of the Tracker so that we can remove the syhconrized keyword from the BatchEmiiter.emit() method. This was tricky to solve due to the relationship between emit(), flushBuffer and close(), in the end I've ended up with two LinkedBlockingQueues, in a producer (emit()) and consumer (getBufferConsumerRunnable()) model, that I believe will give better throughput and if nothing else allows the emit() method to return much quicker, so the host application can continue with it's work and not have to wait for the java-tracker. I've done quite a bit of testing of this but if you see (or even fear) any potential pitfalls then please let me know.

There is also one breaking API change with the release. Rather than exposing the internal TrackerPayload on the API, I now have the ability to return the original Event so that is what I've done. This means the standard track(Event) method can be used to retry failed sends.

I've left my commit stream intact, I'll rebase this before merging into the release branch.

@paulboocock Paul Boocock (paulboocock) marked this pull request as ready for review May 9, 2020 21:48
@paulboocock

Copy link
Copy Markdown
Contributor Author

There are two possible options for emit() in BatchEmitter. It's unlikely either of these worst case scenarios will ever happen, the queue is unbounded so the capacity is Integer.MAX_VALUE but if events are being added faster than they can be consumed then the buffer may become full.

Option 1: Will block on put() if eventBuffer becomes full. This means we won't lose events but it would block the thread that is adding the event, potentially the main thread of the hosting application. This could lead to a negative impact on the hosting application.

    public void emit(final TrackerEvent event) {
        try {
            eventBuffer.put(event); //Add to buffer and quickly return back to application
        } catch (Exception e) {
            LOGGER.error("Unable to add event to emitter", e);
        }
    }

Option 2: Will not block on offer() if eventBuffer becomes full but will throw away the event. This means we will lose events but it would never block the thread that is adding the event, removing the risk of blocking the main thread of the hosting application.

    public void emit(final TrackerEvent event) {
        boolean result = eventBuffer.offer(event); // Add to buffer and quickly return back to application
        
        if (!result) {
            LOGGER.error("Unable to add event to emitter, emitter buffer is full");
        }
    }

I've switched to Option 2, whilst either scenario is unlikely, I believe this is most similar to previous behaviour and the risk of tracking impacting the main purpose of the application doesn't feel like the right choice.

Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/BatchEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/emitter/SimpleEmitter.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/payload/TrackerEvent.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/payload/TrackerEvent.java Outdated
Comment thread src/main/java/com/snowplowanalytics/snowplow/tracker/Subject.java Outdated
@paulboocock

Paul Boocock (paulboocock) commented Jun 6, 2020

Copy link
Copy Markdown
Contributor Author

Made the changes based on your suggestions. Nice ideas!

Removing the payload cache also had an unintended consequence in that that cache accidently made the BatchEmiiterTests pass, as the STM parameter was added to the cached payload reference when in reality it shouldn't have been added to these caches payloads.
This made the tests fail as the Maps didn't equal each other, so I've brought in Hamcrest to do some better assertions on the Map entries, ignoring extra entries that might be present in the captured payload and checking all the values sent from the original event are present. This leaves other tests to test additinal params, like STM.

I've also removed the ability to mutate the tracker properties once the tracker has been constructed (small breaking API change but this is 0.x). This seems particularly important now that the payloads are created asynchronously and changing tracker parameters could lead to events already passed to the tracker ended up being created based on parameters that have been adjusted after they have been passed to the tradcker. If users want to change tracker parameters, we will suggest constructing a new instance of Tracker using the TrackerBuilder.

@istreeter Ian Streeter (istreeter) left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very neat now. No more comments to add 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Switch Emitter to use threadsafe collection for buffer

3 participants