Description
Created new Lambdas with updated code to handle export processing, OCR, and reconciliation more efficiently. They all use SQS instead of SNS notifications. All of them are listed here with an explanation. Also added a Step Function to handle large zip files.
BiospexImageProcess:
A. Handles image processing during exports.
B. Saves images to s3.
BiospexZipCreator:
A. Handles creating zip files for export downloads.
B. Uses Lambda /tmp space for Expeditions with up to 8000 images.
C. Over 8000 images, it uses EFS.
D. Can run 4 at a time when needed by the ZipBatchOrchestrator Step Function.
ZipBatchOrchestrator:
A. Handles orchestration of zip file creation if Expeditions have more than 8000 images.
B. ZooniverseExportBuildCsvJob sends data to either BiospexZipCreator or ZipBatchOrchestrator depending on the number of images.
C. Breaks up zip files into 5000-image batches, then sends them to BiospexZipMerger.
D. This was done because zipping large image collections timed out in Lambda.
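The routing and batching rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in this PR (8000 images for routing, 5000 images per batch); the function names `choose_zip_target` and `split_into_batches` are hypothetical and do not come from the actual code.

```python
from typing import List, Sequence

SINGLE_ZIP_LIMIT = 8000  # routing threshold stated in this PR
BATCH_SIZE = 5000        # batch size stated in this PR


def choose_zip_target(image_count: int) -> str:
    """Pick the component that should build the zip (hypothetical helper)."""
    if image_count > SINGLE_ZIP_LIMIT:
        return "ZipBatchOrchestrator"
    return "BiospexZipCreator"


def split_into_batches(
    image_keys: Sequence[str], batch_size: int = BATCH_SIZE
) -> List[List[str]]:
    """Split image keys into consecutive batches of at most batch_size."""
    return [
        list(image_keys[i:i + batch_size])
        for i in range(0, len(image_keys), batch_size)
    ]
```

For example, an Expedition with 12000 images would be routed to ZipBatchOrchestrator and split into batches of 5000, 5000, and 2000 images.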
BiospexZipMerger:
A. Handles merging zip files into one.
B. Uses SQS to send results back to the listener on the server.
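As a rough idea of what merging batch zips involves, here is a minimal sketch using Python's standard `zipfile` module. It is not the BiospexZipMerger implementation: the name `merge_zips` is hypothetical, it works on in-memory bytes for brevity (real archives of this size would more likely be streamed from S3 or EFS), and skipping duplicate entry names is an assumption.

```python
import io
import zipfile
from typing import Iterable


def merge_zips(parts: Iterable[bytes]) -> bytes:
    """Copy every entry from each part zip into one output zip."""
    out = io.BytesIO()
    seen = set()  # entry names already written to the merged archive
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as merged:
        for part in parts:
            with zipfile.ZipFile(io.BytesIO(part)) as src:
                for info in src.infolist():
                    if info.filename in seen:
                        continue  # assumption: first copy of a name wins
                    seen.add(info.filename)
                    merged.writestr(info, src.read(info.filename))
    return out.getvalue()
```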
BiospexLabelReconcile: handles label reconciliation and explanations for expert reviews.
A. Built using label_reconciliations v4.3.
B. Updated to Python 3.10.
C. Some options might not be available as I only updated it enough for what we needed.
D. When a CSV file is downloaded to the label-reconciliations directory, it will be picked up by BiospexLabelReconcile.
E. It will then send the results back to the listener on the server.
BiospexTesseractOcr:
A. Handles OCR of images.
B. Uses SQS to send results back to the listener on the server.
C. Updated to latest version of tesseract-ocr.
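Since all of these Lambdas use SQS, a handler for an SQS-delivered batch might look roughly like the sketch below. It assumes the functions are invoked through an SQS event source mapping; `handle_sqs_event` and the `process` callback are hypothetical names, and the message body fields are whatever the sender puts in them. The returned `batchItemFailures` list only has an effect if `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import json
from typing import Any, Callable, Dict, List


def handle_sqs_event(
    event: Dict[str, Any],
    process: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Run process() on each SQS record body and report failed message IDs."""
    failures = []
    for record in event.get("Records", []):
        try:
            # Each SQS record carries its payload as a string in "body".
            process(json.loads(record["body"]))
        except Exception:
            # Only the failed messages are returned to the queue for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message keeps one bad image or CSV from forcing the whole batch to be retried.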