feat: compute txt and xml rfc-indexes in parallel by jennifer-richards · Pull Request #10542 · ietf-tools/datatracker

jennifer-richards · 2026-03-13T05:42:03Z

Refreshes the text and xml RFC indexes in separate tasks, which should normally result in their running in parallel. Speeds up the computation but ties up two celery workers.

First runs of the task in staging took about 7 minutes to compute both index files sequentially: 3 minutes for txt and 4 for xml. This should roughly halve the time to update the indexes. This is a quick fix, and may well be temporary: I imagine we can do better with some optimization. In particular, the two formats are generated completely separately, which certainly involves a lot of redundant queries.
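As a rough check on the expected gain, the staging figures above can be plugged into a back-of-the-envelope calculation. This sketch assumes both tasks start at the same time on two free workers and ignores scheduling overhead and database contention, so the real saving may differ.

```python
# Staging figures quoted in the description above, in minutes.
txt_minutes = 3
xml_minutes = 4

# Run back to back, the refresh takes the sum of the two.
sequential_minutes = txt_minutes + xml_minutes

# Run side by side, it takes as long as the slower of the two.
parallel_minutes = max(txt_minutes, xml_minutes)

saving = 1 - parallel_minutes / sequential_minutes
print(
    f"sequential: {sequential_minutes} min, "
    f"parallel: {parallel_minutes} min, saving: {saving:.0%}"
)
# prints "sequential: 7 min, parallel: 4 min, saving: 43%"
```

Under those assumptions the wall-clock time is bounded by the slower xml task, so the improvement is somewhat less than a full halving.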

kesara

You forgot to update the tests in ietf/api/tests_views_rpc.py

rjsparks · 2026-03-13T15:26:43Z

No objection, but consider whether having them fate-shared is a feature.

jennifer-richards · 2026-03-13T17:11:12Z

> No objection, but consider whether having them fate-shared is a feature.

That's a good point. The existing implementation doesn't have that feature, since whichever index is computed first is posted immediately. Hmm.

jennifer-richards · 2026-03-13T23:34:55Z

New commit ties the fate of the two tasks together. This is more an exploration of how to make this work than a serious proposal. Instead of launching two independent tasks that each save their results to the R2 bucket, this uses Celery's Canvas tools to create the txt and xml in parallel, then gathers the results of those tasks together and, if both succeeded, stores the results to the bucket. If either fails, it abandons both.
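The all-or-nothing behaviour described above can be sketched without Celery. The example below swaps in standard-library threads for Celery workers, so it illustrates the pattern only and is not the code in this PR. In Canvas terms the shape presumably resembles a chord (a group of tasks plus a callback that runs once they have all finished). The names `refresh_indexes`, `publish`, and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def refresh_indexes(builders, publish):
    """Run every builder in parallel; publish only if all of them succeed.

    builders: dict mapping an output name to a zero-argument callable
              that returns the file contents as bytes.
    publish:  callable taking (name, data); a stand-in for the bucket write.
    Returns True if the results were published, False if they were abandoned.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(builders))) as pool:
        futures = {name: pool.submit(fn) for name, fn in builders.items()}
    # Leaving the "with" block waits for every builder to finish.

    results = {}
    for name, future in futures.items():
        if future.exception() is not None:
            return False  # one failure abandons every output
        results[name] = future.result()

    # Only reached when every builder succeeded.
    for name, data in results.items():
        publish(name, data)
    return True


def failing_build():
    raise RuntimeError("simulated failure")


# Both builds succeed, so both outputs are published.
published = {}
ok = refresh_indexes(
    {"rfc-index.txt": lambda: b"txt", "rfc-index.xml": lambda: b"<xml/>"},
    published.__setitem__,
)

# One build fails, so nothing is published.
abandoned = {}
failed = refresh_indexes(
    {"rfc-index.txt": lambda: b"txt", "rfc-index.xml": failing_build},
    abandoned.__setitem__,
)
```

The key design choice is that nothing is written to the final destination until every result is in hand, which is what makes the two outputs fate-shared.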

Doing it this way requires storing the created files somewhere persistent. I've added a non-replicated blobdb Storage to serve as a shared temporary space for passing the files between celery tasks.
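A minimal sketch of that hand-off, assuming the tasks may run in different worker processes and therefore pass a small key between them instead of the file contents. A temporary directory stands in for the blobdb Storage here; `ScratchStore` and `promote` are hypothetical names, not part of the datatracker code.

```python
import tempfile
from pathlib import Path


class ScratchStore:
    """Hypothetical shared temporary space, backed by a directory."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, name, data):
        (self.root / name).write_bytes(data)
        return name  # tasks pass this key along, not the file contents

    def load(self, name):
        return (self.root / name).read_bytes()

    def delete(self, name):
        (self.root / name).unlink(missing_ok=True)


def promote(scratch, keys, publish):
    """Publish every stored file, or none of them, then clear the scratch space.

    keys holds the key returned by save() for each build, or None where a
    build failed. publish takes (name, data) and stands in for the bucket write.
    """
    try:
        if any(key is None for key in keys):
            return False  # a build failed, so abandon everything
        # Read everything first so a missing file cannot cause a partial publish.
        payloads = {key: scratch.load(key) for key in keys}
        for key, data in payloads.items():
            publish(key, data)
        return True
    finally:
        # Temporary files are removed whether or not they were published.
        for key in keys:
            if key is not None:
                scratch.delete(key)


with tempfile.TemporaryDirectory() as tmp:
    store = ScratchStore(tmp)
    final = {}
    txt_key = store.save("rfc-index.txt", b"txt")
    xml_key = store.save("rfc-index.xml", b"<xml/>")
    promoted = promote(store, [txt_key, xml_key], final.__setitem__)
    leftovers = list(Path(tmp).iterdir())
```

Cleaning up in a `finally` block is one way to keep abandoned files from accumulating in the temporary space; the additional error handling mentioned below would presumably need to cover cases such as a worker dying before cleanup runs.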

This would need some additional error handling before it could actually be accepted, so I'm converting to draft.

(Also, I haven't looked at the tests yet. @kesara will be disappointed.)

feat: compute txt and xml rfc-indexes in parallel

44b9387

jennifer-richards requested review from kesara and rjsparks March 13, 2026 05:42

kesara approved these changes Mar 13, 2026

kesara requested changes Mar 13, 2026

refactor: more sophisticated parallelism

ccc2656

jennifer-richards marked this pull request as draft March 13, 2026 23:35

jennifer-richards wants to merge 2 commits into ietf-tools:main from jennifer-richards:parallel-indexes
