
feat(helm): add separate MongoDB backup CronJob for linker_output and webpages_text #3141

Closed
yodem wants to merge 3 commits into master from
feature/sc-42543/create-separate-backup-job-for-linker-output

yodem commented Mar 10, 2026

Description

Creates a new mongobackup-linker CronJob that independently backs up the linker_output and webpages_text collections — excluded from the main backup in sc-42542 — with a 5-day rolling retention policy.

Code Changes

helm-chart/sefaria/templates/cronjob/mongo-backup-linker.yaml (new)

  • New CronJob {deployEnv}-mongobackup-linker, scheduled at 02:00 daily
  • Same affinity/tolerations as the main backup to co-locate with MongoDB
  • 12Gi emptyDir shared volume between dumper init container and uploader container
  • Controlled by backup.mongo.linkerEnabled

helm-chart/sefaria/templates/configmap/create-linker-dumps.yaml (new)

  • Dumps linker_output and webpages_text collections separately into the shared volume
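The dump script itself is not shown in this PR page. As a rough illustration of "dumping separately", one `mongodump` invocation per collection, each writing to its own directory on the shared volume, might be assembled as below. The database name `sefaria`, the host `mongo:27017`, and the `/dumps` mount path are assumptions made for the example, not values taken from the chart.

```python
from pathlib import PurePosixPath

# Collections named in the PR description.
COLLECTIONS = ["linker_output", "webpages_text"]


def build_dump_commands(db: str, host: str, out_root: str) -> list[list[str]]:
    """Return one mongodump argv per collection, each with its own output dir."""
    commands = []
    for name in COLLECTIONS:
        # A separate directory per collection keeps the later tarballs independent.
        out_dir = PurePosixPath(out_root) / name
        commands.append([
            "mongodump",
            f"--host={host}",
            f"--db={db}",
            f"--collection={name}",
            f"--out={out_dir}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values; the real chart may use different names and paths.
    for argv in build_dump_commands("sefaria", "mongo:27017", "/dumps"):
        print(" ".join(argv))
```

Keeping the output directories separate means a failure in one collection's dump does not have to invalidate the other's.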

helm-chart/sefaria/templates/configmap/upload-linker-dumps.yaml (new)

  • Streams each collection as its own tarball to GCS (linker_output_DD.MM.YY.tar.gz, webpages_text_DD.MM.YY.tar.gz)
  • Deletes the tarball dated 5 days earlier to enforce the rolling retention window
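The naming and retention rule above can be worked through in a few lines. This is a sketch of the date arithmetic only, assuming the `DD.MM.YY` suffix is the job's run date and that "5 days ago" means exactly five calendar days before it. The function names are hypothetical and no GCS calls are made.

```python
from datetime import date, timedelta

COLLECTIONS = ["linker_output", "webpages_text"]
RETENTION_DAYS = 5  # from the PR description


def tarball_name(collection: str, day: date) -> str:
    """Format the object name as <collection>_DD.MM.YY.tar.gz."""
    return f"{collection}_{day.strftime('%d.%m.%y')}.tar.gz"


def plan_for(today: date) -> dict[str, dict[str, str]]:
    """For each collection, name the tarball to upload and the one to delete."""
    expired = today - timedelta(days=RETENTION_DAYS)
    return {
        name: {
            "upload": tarball_name(name, today),
            "delete": tarball_name(name, expired),
        }
        for name in COLLECTIONS
    }


if __name__ == "__main__":
    for name, actions in plan_for(date(2026, 3, 10)).items():
        print(name, actions)
```

Under these assumptions a run on 10 March 2026 uploads `linker_output_10.03.26.tar.gz` and deletes `linker_output_05.03.26.tar.gz`. Deleting only the single object dated exactly five days back relies on the job having run every day. If a run is skipped, an older tarball could be left behind that this rule never revisits.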

helm-chart/sefaria/values.yaml

  • Added backup.mongo.linkerEnabled: false and backup.mongo.historyEnabled: false

Notes

  • Each collection is uploaded as a separate tarball per Noah's request ("dump them separately to their own files")
  • 5-day retention keeps a few snapshots as agreed by Akiva ("a few snapshots are enough")
  • Closes sc-42543

… webpages_text

Implements sc-42543. Creates a new mongobackup-linker CronJob that dumps
linker_output and webpages_text collections into separate tarballs, with a
5-day rolling retention policy. Controlled by backup.mongo.linkerEnabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mergify bot commented Mar 10, 2026

🧪 CI Insights

Here's what we observed from your CI run for 2d0c167.

🟢 All jobs passed!

But CI Insights is watching 👀

yodem requested a review from BrendanGalloway on March 10, 2026, 13:15
yodem added 2 commits March 10, 2026 15:33
…d configurations

Updated production-values.yaml to enable extras for MongoDB backup. Modified values.yaml to reflect the change from linkerEnabled to extrasEnabled. Removed deprecated linker dump scripts and associated CronJob configurations to streamline the backup process.
…e related scripts

Updated production-values.yaml and values.yaml to rename the configuration from extrasEnabled to linkerOutputEnabled for MongoDB backup. Deleted obsolete scripts and CronJob configurations related to MongoDB extras to simplify the backup process.

yodem commented Mar 11, 2026

Closing in favor of a single consolidated PR that combines history, linker_output, and webpages_text into one weekly CronJob. See sc-42396 for the new consolidated subtask.


2 participants