Refactor TrackerEvents for event payload creation (close #291)#293
Conversation
AlexBenny
left a comment
There was a problem hiding this comment.
Great job! 👍
I think it's all good, I just left some minor comments!
About the SimpleEmitter: we need to check if the behaviour is exactly the one we can achieve with BatchEmitter with bufferOption=1. In that case we can remove it (maybe in a different GH issue).
About the callbacks: we can remove them in the builder too. I'm sure someone use them but we can't provide any particular useful info except the number of events sent and failed, in case. It's something I wouldn't add now btw.
About the event-processing thread: Now, the Emitter has a pool of thread to send the events. Then, we will have the Emitter with a single thread waiting the response of the last request. So, in theory, the Emitter should have a single running thread polling on EventStore and using the back-off-retry feature, meanwhile the Tracker should have the thread-pool and a new thread would be launched each time an event is tracked.
| /** | ||
| * Builds the final event context. | ||
| * | ||
| * @param entities the base event context | ||
| * @return the final event context json with many entities inside | ||
| */ |
There was a problem hiding this comment.
Opinion sharing:
I don't see much value in overzealous javadoc on private methods (I know you just copied/pasted this code).
I feel that in private methods we can remove these when the method signature is already self-explanatory, or just leaving a simple comment like "Builds the final event context." without the other rows.
Not something to change, btw.
|
The main difference between SimpleEmitter and BatchEmitter is GET vs POST. So it's an opportunity to be opinionated :) |
|
I've copied and pasted the threadpool (and threadfactory) from Emitter into Tracker. I left the Emitter threads as is for now. |
| .build()); | ||
|
|
||
| // Then | ||
| Thread.sleep(500); |
There was a problem hiding this comment.
I'm surprised we have to slow the tests down this much, half a second is quite the delay, especially when we're dealing with a mock emitter. This makes me think of the F in FIRST.
As an aside, this also makes me curious about the performance of this change, does is have similar perfromance properties to previous releases? A new threading paradigm could have some unexpected consequences that we'd be wise to test more deeply with some performance testing.
There was a problem hiding this comment.
I didn't try shorter waits - 500ms was already being used for the Emitter tests. I'll see if it can be improved.
There was a problem hiding this comment.
I tried shorter wait times, but they all failed tests occasionally, even 450ms.
There was a problem hiding this comment.
Does this suggest that even in a "real" environment, the difference between created timestamp and sent timestamp is at least 500ms? That seems reasonably high to me. Might be out of scope here but worth investigating the cause if true.
There was a problem hiding this comment.
Have we got any "real" Java tracker data lying around anywhere which we could compare timestamps?
There was a problem hiding this comment.
This is out of scope here yeah. We're planning to finish this PR soon without looking into performance. The simple-console is broken currently, so I'll fix that next and then extend it for performance testing.
There was a problem hiding this comment.
I don't think the test is a good parameter to compare the performance. The Thread.sleep is mostly needed to allow the Thread to start and do the job before the test is finished and it mostly depends by the machine where they are running on. I think, if we have concerns on this we should compare the two implementations in a real test with simple-console as suggested. However, we need to decide what we want to test. I guess we want to test how fast the track method can return back. You mentioned the difference between created_tstamp and sent_tstamp but even if the gap is longer the derived_tstamp would mitigate the discrepancy. We base our data modelling on the derived_tstamp, so even a longer gap between the two shouldn't be disruptive for the customer. IMO if we are really concerned about the performance of the track method we could add a buffer in the input consumed by the tracker thread-pool which would make the performance exactly the same as before.
Paul Boocock (paulboocock)
left a comment
There was a problem hiding this comment.
Looks good. Some areas like SimpleEmitter look much cleaner, removing those callbacks makes things a lot nicer.
AlexBenny
left a comment
There was a problem hiding this comment.
👍 LGTM
I think we could follow this plan:
- proceed merging this PR,
- fix the simple-console adding a building step in the CI workflow
- test performance between this and previous version
- (optional) improve performance if needed
- add back-off-retry
For issue #291, as part of a larger refactoring of Tracker and Emitter to allow event sending retry and batching by byte size.
Tracker.track(event)now directly creates a TrackerPayload, which is passed to the Emitter for storage and sending. The Tracker event processing is currently within the main thread.TrackerEvent has been removed. Event sending callbacks have been partially removed - the functionality is gone but the builder options remain for now.