
feat: compute txt and xml rfc-indexes in parallel #10542

Draft
jennifer-richards wants to merge 2 commits into ietf-tools:main from jennifer-richards:parallel-indexes

Conversation

@jennifer-richards

Member

Refreshes the text and XML RFC indexes in separate tasks, which should normally run in parallel. This speeds up the computation but ties up two celery workers.

First runs of the task in staging took about 7 minutes to compute both index files sequentially: 3 minutes for txt and 4 for xml. This should roughly halve the time to update the indexes. This is a quick fix and may well be temporary; I imagine we can do better with some optimization. In particular, the two formats are generated completely separately, which certainly involves a lot of redundant queries.
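As a rough check on the expected gain, here is the arithmetic behind that estimate. It assumes the two tasks overlap fully and uses only the staging timings quoted above; the variable names are illustrative.

```python
# Staging timings quoted above, in minutes.
txt_minutes = 3
xml_minutes = 4

# Sequential: one index is built after the other.
sequential = txt_minutes + xml_minutes

# Parallel: wall-clock time is bounded by the slower task,
# assuming a free celery worker is available for each.
parallel = max(txt_minutes, xml_minutes)

saving = 1 - parallel / sequential
print(f"sequential={sequential} min, parallel={parallel} min, saving={saving:.0%}")
```

Under those assumptions the refresh is bounded by the xml task, so the saving is somewhat under half.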

@kesara kesara left a comment

Member

You forgot to update the tests in ietf/api/tests_views_rpc.py

@rjsparks

Member

No objection, but consider whether having them fate-shared is a feature.

@jennifer-richards

Member Author

> No objection, but consider whether having them fate-shared is a feature.

That's a good point. The existing implementation doesn't have that feature, since whichever index is computed first is posted immediately. Hmm.

@jennifer-richards

jennifer-richards commented Mar 13, 2026

Member Author

The new commit ties the fate of the two tasks together. This is more an exploration of how to make this work than a serious proposal. Instead of launching two independent tasks that each save their results to the R2 bucket, it uses Celery's Canvas tools to create the txt and xml indexes in parallel, then gathers the results of those tasks and, if both succeeded, stores them to the bucket. If either fails, both results are abandoned.
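The all-or-nothing behaviour described here can be sketched without Celery. The example below is not the PR's code: it uses `concurrent.futures` from the standard library as a stand-in for Canvas (where a primitive such as a chord could play this role), a plain dict as a stand-in for the R2 bucket, and hypothetical `build_txt_index` / `build_xml_index` functions in place of the real generators.

```python
from concurrent.futures import ThreadPoolExecutor


def build_txt_index() -> bytes:
    # Hypothetical stand-in for the real txt index generator.
    return b"rfc-index.txt contents"


def build_xml_index() -> bytes:
    # Hypothetical stand-in for the real xml index generator.
    return b"<rfc-index/>"


def refresh_indexes(builders: dict, bucket: dict) -> bool:
    """Build every index in parallel; publish all of them or none."""
    with ThreadPoolExecutor(max_workers=len(builders)) as pool:
        futures = {name: pool.submit(fn) for name, fn in builders.items()}
    # Leaving the with-block waits for every builder to finish.
    results = {}
    for name, future in futures.items():
        if future.exception() is not None:
            return False  # one builder failed: abandon everything
        results[name] = future.result()
    # Only reached when every builder succeeded.
    bucket.update(results)
    return True


bucket = {}  # stand-in for the R2 bucket
ok = refresh_indexes(
    {"rfc-index.txt": build_txt_index, "rfc-index.xml": build_xml_index},
    bucket,
)
print(ok, sorted(bucket))
```

The key design point is that nothing is written to the bucket until every result has been inspected, which is what makes the two indexes fate-shared.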

Doing it this way requires storing the created files somewhere persistent. I've added a non-replicated blobdb Storage to serve as a shared temporary space for passing the files between celery tasks.
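A minimal sketch of that hand-off pattern, assuming each builder task writes its file to a shared staging area and passes only the file name onward. Local temporary directories stand in for both the blobdb Storage and the R2 bucket, and the `stage` and `publish` helpers are hypothetical names, not functions from the PR.

```python
import shutil
import tempfile
from pathlib import Path


def stage(staging: Path, name: str, content: bytes) -> str:
    """Builder task: write the file to shared staging, return only its name."""
    (staging / name).write_bytes(content)
    return name


def publish(staging: Path, final: Path, names: list[str]) -> None:
    """Gather task: copy every staged file to its final home, then clean up."""
    for name in names:
        shutil.copyfile(staging / name, final / name)
    # Remove staged copies only after every file has been published.
    for name in names:
        (staging / name).unlink()


staging = Path(tempfile.mkdtemp(prefix="staging-"))  # stand-in for the temporary Storage
final = Path(tempfile.mkdtemp(prefix="bucket-"))  # stand-in for the R2 bucket

names = [
    stage(staging, "rfc-index.txt", b"txt index"),
    stage(staging, "rfc-index.xml", b"<rfc-index/>"),
]
publish(staging, final, names)
print(sorted(p.name for p in final.iterdir()))
```

Passing names rather than file contents keeps the messages between tasks small. A real version would also need to clean up staged files when one of the builders fails.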

This would need some additional error handling before it could be accepted, so I'm converting it to a draft.

(Also, I haven't looked at the tests yet. @kesara will be disappointed.)

@jennifer-richards jennifer-richards marked this pull request as draft March 13, 2026 23:35

3 participants