Commit 3705bed

jennifer-richards, NGPixel, and rjsparks authored
feat: Celery support and asynchronous draft submission API (ietf-tools#4037)
* ci: add Dockerfile and action to build celery worker image
* ci: build celery worker on push to jennifer/celery branch
* ci: also build celery worker for main branch
* ci: Add comment to celery Dockerfile
* chore: first stab at a celery/rabbitmq docker-compose
* feat: add celery configuration and test task / endpoint
* chore: run mq/celery containers for dev work
* chore: point to ghcr.io image for celery worker
* refactor: move XML parsing duties into XMLDraft
  Move some PlaintextDraft methods into the Draft base class and implement for the XMLDraft class. Use xml2rfc code from ietf.submit as a model for the parsing. This leaves some mismatch between the PlaintextDraft and the Draft class spec for the get_author_list() method to be resolved.
* feat: add api_upload endpoint and beginnings of async processing
  This adds an api_upload() that behaves analogously to the api_submit() endpoint. Celery tasks to handle asynchronous processing are added but are not yet functional enough to be useful.
* perf: index Submission table on submission_date
  This substantially speeds up submission rate threshold checks.
* feat: remove existing files when accepting a new submission
  After checking that a submission is not in progress, remove any files in staging that have the same name/rev with any extension. This should guard against stale files confusing the submission process if the usual cleanup fails or is skipped for some reason.
* refactor: make clear that deduce_group() uses only the draft name
* refactor: extract only draft name/revision in clean() method
  Minimizing the amount of validation done when accepting a file. The data extraction will be moved to asynchronous processing.
* refactor: minimize checks and data extraction in api_upload() view
* ci: fix dockerfiles to match sandbox testing
* ci: tweak celery container docker-compose settings
* refactor: clean up Draft parsing API and usage
  * remove get_draftname() from Draft api; set filename during init
  * further XMLDraft work
    - remember xml_version after parsing
    - extract filename/revision during init
    - comment out long broken get_abstract() method
  * adjust form clean() method to use changed API
* feat: flesh out async submission processing
  First basically working pass!
* feat: add state name for submission being validated asynchronously
* feat: cancel submissions that async processing can't handle
* refactor: simplify/consolidate async tasks and improve error handling
* feat: add api_submission_status endpoint
* refactor: return JSON from submission api endpoints
* refactor: reuse cancel_submission method
* refactor: clean up error reporting a bit
* feat: guard against cancellation of a submission while validating
  Not bulletproof but should prevent
* feat: indicate that a submission is still being validated
* fix: do not delete submission files after creating them
* chore: remove debug statement
* test: add tests of the api_upload and api_submission_status endpoints
* test: add tests and stubs for async side of submission handling
* fix: gracefully handle (ignore) invalid IDs in async submit task
* test: test process_uploaded_submission method
* fix: fix failures of new tests
* refactor: fix type checker complaints
* test: test submission_status view of submission in "validating" state
* fix: fix up migrations
* fix: use the streamlined SubmissionBaseUploadForm for api_upload
* feat: show submission history event timestamp as mouse-over text
* fix: remove 'manual' as next state for 'validating' submission state
* refactor: share SubmissionBaseUploadForm code with Deprecated version
* fix: validate text submission title, update a couple comments
* chore: disable requirements updating when celery dev container starts
* feat: log traceback on unexpected error during submission processing
* feat: allow secretariat to cancel "validating" submission
* feat: indicate time since submission on the status page
* perf: check submission rate thresholds earlier when possible
  No sense parsing details of a draft that is going to be dropped regardless of those details!
* fix: create Submission before saving to reduce race condition window
* fix: call deduce_group() with filename
* refactor: remove code lint
* refactor: change the api_upload URL to api/submission
* docs: update submission API documentation
* test: add tests of api_submission's text draft consistency checks
* refactor: rename api_upload to api_submission to agree with new URL
* test: test API documentation and submission thresholds
* fix: fix a couple api_submission view renames missed in templates
* chore: use base image + add arm64 support
* ci: try to fix workflow_dispatch for celery worker
* ci: another attempt to fix workflow_dispatch
* ci: build celery image for submit-async branch
* ci: fix typo
* ci: publish celery worker to ghcr.io/painless-security
* ci: install python requirements in celery image
* ci: fix up requirements install on celery image
* chore: remove XML_LIBRARY references that crept back in
* feat: accept 'replaces' field in api_submission
* docs: update api_submission documentation
* fix: remove unused import
* test: test "replaces" validation for submission API
* test: test that "replaces" is set by api_submission
* feat: trap TERM to gracefully stop celery container
* chore: tweak celery/mq settings
* docs: update installation instructions
* ci: adjust paths that trigger celery worker image build
* ci: fix branches/repo names left over from dev
* ci: run manage.py check when initializing celery container
  Driver here is applying the patches. Starting the celery workers also invokes the check task, but this should cause a clearer failure if something fails.
* docs: revise INSTALL instructions
* ci: pass filename to pip update in celery container
* docs: update INSTALL to include freezing pip versions
  Will be used to coordinate package versions with the celery container in production.
* docs: add explanation of frozen-requirements.txt
* ci: build image for sandbox deployment
* ci: add additional build trigger path
* docs: tweak INSTALL
* fix: change INSTALL process to stop datatracker before running migrations
* chore: use ietf.settings for manage.py check in celery container
* chore: set uid/gid for celery worker
* chore: create user/group in celery container if needed
* chore: tweak docker compose/init so celery container works in dev
* ci: build mq docker image
* fix: move rabbitmq.pid to writeable location
* fix: clear password when CELERY_PASSWORD is empty
  Setting to an empty password is really not a good plan!
* chore: add shutdown debugging option to celery image
* chore: add django-celery-beat package
* chore: run "celery beat" in datatracker-celery image
* chore: fix docker image name
* feat: add task to cancel stale submissions
* test: test the cancel_stale_submissions task
* chore: make f-string with no interpolation a plain string

Co-authored-by: Nicolas Giard <github@ngpixel.com>
Co-authored-by: Robert Sparks <rjsparks@nostrum.com>
1 parent 1dab3b1 commit 3705bed

36 files changed

Lines changed: 2401 additions & 361 deletions
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+name: Build Celery Worker Docker Image
+
+on:
+  push:
+    branches:
+      - 'main'
+      - 'jennifer/submit-async'
+    paths:
+      - 'requirements.txt'
+      - 'dev/celery/**'
+      - '.github/workflows/build-celery-worker.yml'
+
+  workflow_dispatch:
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v2
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+
+      - name: Login to GitHub Container Registry
+        uses: docker/login-action@v2
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Docker Build & Push
+        uses: docker/build-push-action@v3
+        with:
+          context: .
+          file: dev/celery/Dockerfile
+          platforms: linux/amd64,linux/arm64
+          push: true
+#          tags: ghcr.io/ietf-tools/datatracker-celery:latest
+          tags: ghcr.io/painless-security/datatracker-celery:latest
+
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+name: Build MQ Broker Docker Image
+
+on:
+  push:
+    branches:
+      - 'main'
+      - 'jennifer/submit-async'
+    paths:
+      - 'dev/mq/**'
+      - '.github/workflows/build-mq-worker.yml'
+
+  workflow_dispatch:
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v2
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+
+      - name: Login to GitHub Container Registry
+        uses: docker/login-action@v2
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Docker Build & Push
+        uses: docker/build-push-action@v3
+        with:
+          context: .
+          file: dev/mq/Dockerfile
+          platforms: linux/amd64,linux/arm64
+          push: true
+#          tags: ghcr.io/ietf-tools/datatracker-mq:latest
+          tags: ghcr.io/painless-security/datatracker-mq:latest
+

dev/INSTALL

Lines changed: 59 additions & 10 deletions
@@ -29,38 +29,78 @@ General Instructions for Deployment of a New Release
       python3.9 -mvenv env
       source env/bin/activate
       pip install -r requirements.txt
+      pip freeze > frozen-requirements.txt
+
+     (The pip freeze command records the exact versions of the Python libraries that pip installed.
+     This is used by the celery docker container to ensure it uses the same library versions as
+     the datatracker service.)
 
   5. Move static files into place for CDN (/a/www/www6s/lib/dt):
 
       ietf/manage.py collectstatic
 
   6. Run system checks (which patches the just installed modules)::
 
-      ietf/manage.py check
+      ietf/manage.py check
+
+  7. Switch to the docker directory and update images:
+
+      cd /a/docker/datatracker-cel
+      docker image tag ghcr.io/ietf-tools/datatracker-celery:latest datatracker-celery-fallback
+      docker image tag ghcr.io/ietf-tools/datatracker-mq:latest datatracker-mq-fallback
+      docker-compose pull
+
+  8. Stop and remove the async task containers:
+     Wait for this to finish cleanly. It may take up to about 10 minutes for the 'stop' command to
+     complete if a long-running task is in progress.
+
+      docker-compose down
+
+  9. Stop the datatracker
+     (consider doing this with a second shell at ietfa to avoid the exit and shift back to wwwrun)
+
+      exit
+      sudo systemctl stop datatracker.socket datatracker.service
+      sudo su - -s /bin/bash wwwrun
 
-  7. Run migrations:
+  10. Return to the release directory and run migrations:
 
+      cd /a/www/ietf-datatracker/${releasenumber}
       ietf/manage.py migrate
 
      Take note if any migrations were executed.
 
-  8. Back out one directory level, then re-point the 'web' symlink::
+  11. Back out one directory level, then re-point the 'web' symlink::
 
       cd ..
       rm ./web; ln -s ${releasenumber} web
 
-  9. Reload the datatracker service (it is no longer necessary to restart apache) ::
+  12. Start the datatracker service (it is no longer necessary to restart apache) ::
 
       exit # or CTRL-D, back to root level shell
-      systemctl restart datatracker
+      sudo systemctl start datatracker.service datatracker.socket
+
+  13. Start async task worker and message broker:
 
-  10. Verify operation:
+      cd /a/docker/datatracker-cel
+      bash startcommand
+
+  14. Verify operation:
 
       http://datatracker.ietf.org/
 
-  11. If install failed and there were no migrations at step 7, revert web symlink and repeat the restart in step 9.
-      If there were migrations at step 7, they will need to be reversed before the restart at step 9. If it's not obvious
-      what to do to reverse the migrations, contact the dev team.
+  15. If install failed and there were no migrations at step 10, revert web symlink and docker update and repeat the
+      restart in steps 11 and 12. To revert the docker update:
+
+      cd /a/docker/datatracker-cel
+      docker-compose down
+      docker image rm ghcr.io/ietf-tools/datatracker-celery:latest ghcr.io/ietf-tools/datatracker-mq:latest
+      docker image tag datatracker-celery-fallback ghcr.io/ietf-tools/datatracker-celery:latest
+      docker image tag datatracker-mq-fallback ghcr.io/ietf-tools/datatracker-mq:latest
+      cd -
+
+      If there were migrations at step 10, they will need to be reversed before the restart at step 12.
+      If it's not obvious what to do to reverse the migrations, contact the dev team.
 
 
 Patching a Production Release
@@ -95,8 +135,17 @@ The following process should be used:
   6. Edit ``.../ietf/__init__.py`` in the new patched release to indicate the patch
      version in the ``__patch__`` string.
 
-  7. Change the 'web' symlink, reload etc. as described in
+  7. Stop the async task container (this may take a few minutes if tasks are in progress):
+
+      cd /a/docker/datatracker-cel
+      docker-compose stop celery
+
+  8. Change the 'web' symlink, reload etc. as described in
      `General Instructions for Deployment of a New Release`_.
 
+  9. Start async task worker:
+
+      cd /a/docker/datatracker-cel
+      bash startcommand
 
 
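
The fallback tagging in step 7 and the revert in step 15 mirror each other, which is easier to see side by side. The following is a dry-run sketch only: the `run` helper and the `save_fallback` / `restore_fallback` function names are hypothetical and not part of the commit, and the script prints the docker commands rather than executing them.

```shell
#!/bin/bash
# Dry-run sketch of the image fallback/revert commands from the INSTALL steps.
# REGISTRY and IMAGES are taken from the INSTALL text; everything else is illustrative.
REGISTRY="ghcr.io/ietf-tools"
IMAGES="datatracker-celery datatracker-mq"

# Hypothetical helper: print the command instead of running it.
# Replace the body with "$@" to execute for real.
run () { echo "$@"; }

# Step 7: remember the currently deployed images before pulling new ones.
save_fallback () {
    for img in $IMAGES; do
        run docker image tag "${REGISTRY}/${img}:latest" "${img}-fallback"
    done
}

# Step 15: drop the freshly pulled images and re-point :latest at the fallbacks.
restore_fallback () {
    run docker-compose down
    for img in $IMAGES; do
        run docker image rm "${REGISTRY}/${img}:latest"
    done
    for img in $IMAGES; do
        run docker image tag "${img}-fallback" "${REGISTRY}/${img}:latest"
    done
}

save_out=$(save_fallback)
restore_out=$(restore_fallback)
echo "$save_out"
echo "$restore_out"
```

The INSTALL text removes both images with a single `docker image rm`; the sketch issues one `rm` per image, which should be equivalent.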

dev/celery/Dockerfile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Dockerfile for celery worker
+#
+FROM ghcr.io/ietf-tools/datatracker-app-base:latest
+LABEL maintainer="IETF Tools Team <tools-discuss@ietf.org>"
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get purge -y imagemagick imagemagick-6-common
+
+# Copy the startup file
+COPY dev/celery/docker-init.sh /docker-init.sh
+RUN sed -i 's/\r$//' /docker-init.sh && \
+    chmod +x /docker-init.sh
+
+# Install current datatracker python dependencies
+COPY requirements.txt /tmp/pip-tmp/
+RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt
+RUN rm -rf /tmp/pip-tmp
+
+ENTRYPOINT [ "/docker-init.sh" ]

dev/celery/docker-init.sh

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+#!/bin/bash
+#
+# Environment parameters:
+#
+#   CELERY_APP - name of application to pass to celery (defaults to ietf)
+#
+#   CELERY_ROLE - 'worker' or 'beat' (defaults to 'worker')
+#
+#   CELERY_UID - numeric uid for the celery worker process
+#
+#   CELERY_GID - numeric gid for the celery worker process
+#
+#   UPDATE_REQUIREMENTS_FROM - path, relative to /workspace mount, to a pip requirements
+#       file that should be installed at container startup. Default is no package install/update.
+#
+#   DEBUG_TERM_TIMING - if non-empty, writes debug messages during shutdown after a TERM signal
+#
+WORKSPACEDIR="/workspace"
+CELERY_ROLE="${CELERY_ROLE:-worker}"
+
+cd "$WORKSPACEDIR" || exit 255
+
+if [[ -n "${UPDATE_REQUIREMENTS_FROM}" ]]; then
+    reqs_file="${WORKSPACEDIR}/${UPDATE_REQUIREMENTS_FROM}"
+    echo "Updating requirements from ${reqs_file}..."
+    pip install --upgrade -r "${reqs_file}"
+fi
+
+if [[ "${CELERY_ROLE}" == "worker" ]]; then
+    echo "Running initial checks..."
+    /usr/local/bin/python $WORKSPACEDIR/ietf/manage.py check
+fi
+
+CELERY_OPTS=( "${CELERY_ROLE}" )
+if [[ -n "${CELERY_UID}" ]]; then
+    # ensure that some user with the necessary UID exists in container
+    if ! id "${CELERY_UID}" ; then
+        adduser --system --uid "${CELERY_UID}" --no-create-home --disabled-login "celery-user-${CELERY_UID}"
+    fi
+    CELERY_OPTS+=("--uid=${CELERY_UID}")
+fi
+
+if [[ -n "${CELERY_GID}" ]]; then
+    # ensure that some group with the necessary GID exists in container
+    if ! getent group "${CELERY_GID}" ; then
+        addgroup --gid "${CELERY_GID}" "celery-group-${CELERY_GID}"
+    fi
+    CELERY_OPTS+=("--gid=${CELERY_GID}")
+fi
+
+log_term_timing_msgs () {
+    # output periodic debug message
+    while true; do
+        echo "Waiting for celery worker shutdown ($(date --utc --iso-8601=ns))"
+        sleep 0.5s
+    done
+}
+
+cleanup () {
+    # Cleanly terminate the celery app by sending it a TERM, then waiting for it to exit.
+    if [[ -n "${celery_pid}" ]]; then
+        echo "Gracefully terminating celery worker. This may take a few minutes if tasks are in progress..."
+        kill -TERM "${celery_pid}"
+        if [[ -n "${DEBUG_TERM_TIMING}" ]]; then
+            log_term_timing_msgs &
+        fi
+        wait "${celery_pid}"
+    fi
+}
+
+trap 'trap "" TERM; cleanup' TERM
+# start celery in the background so we can trap the TERM signal
+celery --app="${CELERY_APP:-ietf}" "${CELERY_OPTS[@]}" "$@" &
+celery_pid=$!
+wait "${celery_pid}"
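
The shutdown handling at the end of docker-init.sh (run the worker in the background, trap TERM in the init script, forward the signal, then wait for the child) can be tried in isolation. The sketch below is an illustration under stated assumptions, not the commit's code: the `worker` function is a hypothetical stand-in for the celery process, the temporary log file exists only to make the ordering visible, and the script sends itself a TERM to imitate what `docker stop` would deliver.

```shell
#!/bin/bash
# Minimal stand-alone demo of the trap/forward/wait pattern.
log=$(mktemp)

# Hypothetical stand-in for the celery worker: on TERM it "finishes its
# current task" (a short sleep) before exiting.
worker () {
    trap 'echo "worker: finishing current task" >> "$log"; sleep 0.3; echo "worker: exited" >> "$log"; exit 0' TERM
    while true; do sleep 0.1; done
}

cleanup () {
    # Forward TERM to the child, then block until it has really exited.
    if [ -n "${worker_pid}" ]; then
        echo "init: forwarding TERM" >> "$log"
        kill -TERM "${worker_pid}"
        wait "${worker_pid}"
        echo "init: worker reaped" >> "$log"
    fi
}

# Ignore further TERMs while cleanup runs, as the original script does.
trap 'trap "" TERM; cleanup' TERM

# Start the child in the background so the shell stays free to run the trap.
worker &
worker_pid=$!

# Imitate `docker stop`: send TERM to this script after half a second.
( sleep 0.5; kill -TERM $$ ) &

# The first wait is interrupted by the TERM; the trap then runs cleanup.
wait "${worker_pid}"
cat "$log"
```

Running the child in the foreground instead would leave the shell unable to act on TERM until the child returned, which is presumably why the script backgrounds celery and then waits on its pid.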

dev/mq/Dockerfile

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Dockerfile for RabbitMQ worker
+#
+FROM rabbitmq:3-alpine
+LABEL maintainer="IETF Tools Team <tools-discuss@ietf.org>"
+
+# Copy the startup file
+COPY dev/mq/ietf-rabbitmq-server.bash /ietf-rabbitmq-server.bash
+RUN sed -i 's/\r$//' /ietf-rabbitmq-server.bash && \
+    chmod +x /ietf-rabbitmq-server.bash
+
+# Put the rabbitmq.conf in the conf.d so it runs after 10-defaults.conf.
+# Can override this for an individual container by mounting additional
+# config files in /etc/rabbitmq/conf.d.
+COPY dev/mq/rabbitmq.conf /etc/rabbitmq/conf.d/20-ietf-config.conf
+COPY dev/mq/definitions.json /definitions.json
+
+CMD ["/ietf-rabbitmq-server.bash"]

dev/mq/definitions.json

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{
+  "permissions": [
+    {
+      "configure": ".*",
+      "read": ".*",
+      "user": "datatracker",
+      "vhost": "dt",
+      "write": ".*"
+    }
+  ],
+  "users": [
+    {
+      "hashing_algorithm": "rabbit_password_hashing_sha256",
+      "limits": {},
+      "name": "datatracker",
+      "password_hash": "",
+      "tags": []
+    }
+  ],
+  "vhosts": [
+    {
+      "limits": [],
+      "metadata": {
+        "description": "",
+        "tags": []
+      },
+      "name": "dt"
+    }
+  ]
+}

dev/mq/ietf-rabbitmq-server.bash

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#!/bin/bash -x
+#
+# Environment parameters:
+#
+#   CELERY_PASSWORD - password for the datatracker celery user
+#
+export RABBITMQ_PID_FILE=/tmp/rabbitmq.pid
+
+update_celery_password () {
+    rabbitmqctl wait "${RABBITMQ_PID_FILE}" --timeout 300
+    rabbitmqctl await_startup --timeout 300
+    if [[ -n "${CELERY_PASSWORD}" ]]; then
+        rabbitmqctl change_password datatracker <<END
+${CELERY_PASSWORD}
+END
+    else
+        rabbitmqctl clear_password datatracker
+    fi
+}
+
+update_celery_password &
+exec rabbitmq-server "$@"

dev/mq/rabbitmq.conf

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# prevent guest from logging in over tcp
+loopback_users.guest = true
+
+# load saved definitions
+load_definitions = /definitions.json
+
+# Ensure that enough disk is available to flush to disk. To do this, need to limit the
+# memory available to the container to something reasonable. See
+# https://www.rabbitmq.com/production-checklist.html#monitoring-and-resource-usage
+# for recommendations.
+
+# 1-1.5 times the memory available to the container is adequate for disk limit
+disk_free_limit.absolute = 6000MB
+
+# This should be ~40% of the memory available to the container. Use an
+# absolute number because relative will be proportional to the full machine
+# memory.
+vm_memory_high_watermark.absolute = 1600MB
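
The two limits in rabbitmq.conf follow from the container's memory limit by the rules stated in the comments (about 40% for the memory watermark, 1 to 1.5 times for the disk limit). The arithmetic can be sketched as below; the 4000 MB container memory figure is an assumption chosen for illustration, since the commit does not state the container's limit, though it happens to reproduce the committed values.

```shell
#!/bin/bash
# Derive the RabbitMQ limits from a container memory limit, in MB.
# ASSUMPTION: 4000 MB is illustrative; the commit does not state the real limit.
container_mem_mb=4000

# ~40% of the memory available to the container
watermark_mb=$(( container_mem_mb * 40 / 100 ))

# 1x to 1.5x of the memory available to the container
disk_min_mb=$(( container_mem_mb ))
disk_max_mb=$(( container_mem_mb * 3 / 2 ))

echo "vm_memory_high_watermark.absolute = ${watermark_mb}MB"
echo "disk_free_limit.absolute = ${disk_max_mb}MB   # anything from ${disk_min_mb}MB up fits the 1-1.5x rule"
```

If the container's memory limit changes, both values would need to be recomputed, because absolute settings do not track the limit automatically.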
