feat: compute txt and xml rfc-indexes in parallel #10542
jennifer-richards wants to merge 2 commits into
kesara left a comment:
You forgot to update the tests in ietf/api/tests_views_rpc.py
No objection, but consider whether having them fate-shared is a feature.
That's a good point. The existing implementation doesn't have that feature, since whichever index is computed first is posted immediately. Hmm.
A new commit ties the fate of the two tasks together. This is more of an exploration of how to make this work than a serious proposal. Instead of launching two independent tasks that each save their results to the R2 bucket, it uses Celery's Canvas tools to create the txt and xml indexes in parallel, then gathers the results of those tasks and, if both succeeded, stores them to the bucket. If either fails, both are abandoned. Doing it this way requires storing the created files somewhere persistent, so I've added a non-replicated blobdb Storage to serve as shared temporary space for passing the files between Celery tasks. This would need some additional error handling before it could be accepted, so I'm converting it to a draft. (Also, I haven't looked at the tests yet. @kesara will be disappointed.)
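The fate-shared flow in this comment has the fan-out/gather shape of a Celery chord: a group of header tasks runs in parallel, and a callback receives the list of their results. The sketch below mimics that shape with the standard library's `ThreadPoolExecutor` so it runs without a broker; it illustrates the idea under stated assumptions and is not the code in this PR. `make_txt_index`, `make_xml_index`, `publish_if_all_succeeded`, `temp_store` (standing in for the non-replicated blobdb Storage) and `published` (standing in for the R2 bucket) are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-ins: scratch space shared between "tasks" (the role the
# non-replicated blobdb Storage plays) and the final destination (the R2 bucket).
temp_store: dict[str, bytes] = {}
published: dict[str, bytes] = {}


def make_txt_index() -> str:
    """Hypothetical header task: write the txt index to scratch, return its name."""
    temp_store["rfc-index.txt"] = b"placeholder txt index"
    return "rfc-index.txt"


def make_xml_index() -> str:
    """Hypothetical header task: write the xml index to scratch, return its name."""
    temp_store["rfc-index.xml"] = b"<rfc-index/>"
    return "rfc-index.xml"


def publish_if_all_succeeded(tasks: list[Callable[[], str]]) -> bool:
    """Run tasks in parallel, then publish all results only if every task succeeded."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
    # Gather step (the role of the chord callback): every task has finished here.
    names = [f.result() for f in futures if f.exception() is None]
    if len(names) != len(futures):
        # At least one task failed: abandon the scratch files that were written.
        for name in names:
            temp_store.pop(name, None)
        return False
    # Every task succeeded: move each file from scratch space to the bucket.
    for name in names:
        published[name] = temp_store.pop(name)
    return True
```

With this shape, a failure in either task means nothing reaches `published`, and the scratch copies of any results that did complete are discarded. In a real Celery deployment the gather step would be the chord's callback task, and the scratch files have to live in shared storage because the header tasks and the callback may run on different workers.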
Refreshes the text and xml RFC indexes in separate tasks, which should normally result in their running in parallel. This speeds up the computation but ties up two Celery workers.
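For comparison, the approach in this PR launches two independent tasks (in Celery, for example, by calling `.delay()` on each task), and each one posts its own file as soon as it finishes. Below is a minimal sketch of that behavior, again using `ThreadPoolExecutor` as a stand-in for Celery workers. `refresh_index`, `refresh_all` and `published` are hypothetical names, and the unknown-format error exists only to simulate a failing task.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the R2 bucket; each task writes to it directly.
published: dict[str, str] = {}


def refresh_index(fmt: str) -> None:
    """Hypothetical task body: build one index format and post it immediately."""
    if fmt not in ("txt", "xml"):
        # Simulated failure, only here to show what happens when one task fails.
        raise ValueError(f"unknown format: {fmt}")
    published[f"rfc-index.{fmt}"] = f"placeholder {fmt} index"


def refresh_all(formats: list[str]) -> dict[str, BaseException]:
    """Run one independent task per format and return any errors keyed by format."""
    with ThreadPoolExecutor(max_workers=len(formats)) as pool:
        futures = {fmt: pool.submit(refresh_index, fmt) for fmt in formats}
    # Leaving the with-block waits for all tasks; each has already posted its
    # own result (or failed) without regard to the others.
    return {
        fmt: future.exception()
        for fmt, future in futures.items()
        if future.exception() is not None
    }
```

Because nothing ties the tasks together, `refresh_all(["txt", "pdf"])` still publishes the txt index while reporting the pdf failure. That is the absence of fate-sharing discussed in the review.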
In the first runs in staging, the task took about 7 minutes to compute both index files sequentially: 3 minutes for txt and 4 for xml. This should roughly halve the time to update the indexes. This is a quick fix and may well be temporary; I imagine we can do better with some optimization. In particular, the two formats are generated completely separately, which certainly involves a lot of redundant queries.
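Assuming both tasks start at the same time on otherwise idle workers, the staging timings give the expected wall-clock time:

```math
\begin{aligned}
T_{\text{seq}} &= T_{\text{txt}} + T_{\text{xml}} \approx 3 + 4 = 7\ \text{min} \\
T_{\text{par}} &\approx \max(T_{\text{txt}}, T_{\text{xml}}) = 4\ \text{min} \\
T_{\text{par}} / T_{\text{seq}} &\approx 4/7 \approx 0.57
\end{aligned}
```

The slower xml index bounds the parallel run, so the saving is about 3 of the 7 minutes (roughly 43%).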