Skip to content

Conversation

@rudransh-shrivastava
Copy link
Collaborator

@rudransh-shrivastava rudransh-shrivastava commented Nov 13, 2025

Resolves #2595

Proposed change

Fix and ensure that all ECS tasks are functional.

Note: PR will be rebased once #2551 is merged.

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

@rudransh-shrivastava rudransh-shrivastava changed the title Feature/nest zappa migration ecs tasks Fix long-running ECS/Fargate Tasks Nov 13, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 13, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for custom data fixture paths via command-line option during deployment.
    • Introduced optional direct execution mode for backend commands alongside containerized execution.
  • Infrastructure

    • Integrated AWS Systems Manager Parameter Store for centralized secrets management.
    • Made RDS proxy creation configurable for flexible database access patterns.
    • Optimized task memory allocations for better resource efficiency.
  • Documentation

    • Updated deployment guides to reflect new secrets population workflow and enhanced task configuration steps.

Walkthrough

Adds AWS SSM Parameter Store integration and related Terraform parameter module, makes RDS Proxy optional with API renames, switches ECS tasks to use SSM parameter ARNs and direct/Make-based commands, adds CLI fixture-path for load_data, updates Makefile/Docker for EXEC_MODE, and populates SSM at Django/WGSI startup.

Changes

Cohort / File(s) Summary
Backend Makefile & Docker
backend/Makefile, backend/docker/Dockerfile
Make targets now branch on EXEC_MODE to run commands locally (direct) or via docker exec. Dockerfile copies the Makefile, installs make in the builder image, and creates a backend symlink.
Backend management command & tests
backend/apps/common/management/commands/load_data.py, backend/tests/apps/common/management/commands/load_data_test.py
Added --fixture-path CLI argument (default data/nest.json.gz) and updated tests to use call_command("load_data"), including asserting fixture-path/verbosity usage.
Backend SSM & staging settings
backend/wsgi.py, backend/settings/staging.py, backend/zappa_settings.example.json
wsgi.py now loads env vars from SSM when AWS_SYSTEMS_MANAGER_PARAM_STORE_PATH is set. Removed AWS keys from staging. zappa_settings.example.json now references aws_environment_variables and adds SSM permissions.
Terraform: new parameters module
infrastructure/modules/parameters/... (.../main.tf, .../outputs.tf, .../variables.tf)
New module provisioning many aws_ssm_parameter resources for DJANGO_* settings, a random_string secret, and an output ssm_parameter_arns mapping env names → ARNs.
Terraform: ECS and task changes
infrastructure/modules/ecs/main.tf, infrastructure/modules/ecs/modules/task/main.tf, infrastructure/modules/ecs/modules/task/variables.tf, infrastructure/modules/ecs/variables.tf
Replaced container_environment with container_parameters_arns (map env name → SSM ARN). Task definitions use secrets mapping. Added SSM IAM policy/attachment for task execution role. Task commands switched to shell sequences invoking EXEC_MODE=direct and Make targets. Reduced several task memory defaults.
Terraform: database module & RDS proxy opt-in
infrastructure/modules/database/main.tf, infrastructure/modules/database/outputs.tf, infrastructure/modules/database/variables.tf
Added create_rds_proxy variable, renamed db_usernamedb_user, made RDS proxy resources conditional with count and updated indexed references; output db_proxy_endpoint now conditional on proxy creation.
Terraform: security & SG rules
infrastructure/modules/security/main.tf, infrastructure/modules/security/outputs.tf, infrastructure/modules/security/variables.tf
Added create_rds_proxy variable; introduced aws_security_group.rds, conditional rds_proxy SG, and explicit conditional aws_security_group_rule resources replacing inline ingress. rds_proxy_sg_id output made conditional/indexed.
Terraform: cache module & variables
infrastructure/modules/cache/main.tf, infrastructure/modules/cache/variables.tf
Simplified Redis auth token handling to always generate via random_password; removed redis_auth_token input variable.
Terraform: top-level wiring & removed outputs/vars
infrastructure/main.tf, infrastructure/outputs.tf, infrastructure/variables.tf, infrastructure/terraform.tfvars.example
Added module "parameters" and wired container_parameters_arns in main.tf. Removed multiple Django-related variables and sensitive outputs (database_endpoint, redis_endpoint, db_password, redis_auth_token). Renamed db_usernamedb_user and added create_rds_proxy; updated environment validation and example tfvars.
Docs
infrastructure/README.md
Reworked deployment steps to instruct populating SSM Parameter Store with DJANGO_* secrets, removed Terraform output copy steps, clarified migration/load task instructions, and added helpful commands.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas to focus on:

  • infrastructure/modules/parameters/* — verify parameter types (String vs SecureString), lifecycle ignore_changes usage, and naming consistency with outputs.
  • infrastructure/modules/ecs/* — confirm IAM policies/attachments, secrets mapping, and correctness of task command substitutions to shell/Make/EXEC_MODE.
  • create_rds_proxy conditionals — ensure all indexed references ([0]) and outputs/SG wiring are consistent across database, security, and outputs.
  • Variable renames (db_usernamedb_user) — ensure no leftover references.
  • backend/wsgi.py SSM fetch — validate boto3 usage, error handling, and environment variable naming.

Possibly related PRs

Suggested reviewers

  • arkid15r
  • kasya

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Fix long-running ECS/Fargate Tasks' directly matches the linked issue #2595 and accurately summarizes the main objective of resolving task execution problems.
Description check ✅ Passed The PR description adequately explains the proposed change (fixing and ensuring all ECS tasks are functional) and references the linked issue #2595, making it relevant to the changeset.
Linked Issues check ✅ Passed The changeset implements comprehensive infrastructure and backend modifications to fix long-running ECS/Fargate tasks: introduces SSM Parameter Store integration for secure configuration management [#2595], adds conditional RDS proxy support, updates ECS task definitions to use secrets instead of environment variables, and includes a new parameters module for managing Django configuration.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing ECS/Fargate task execution: infrastructure refactoring (parameters module, security groups, database proxy), backend configuration (SSM integration, load_data fixture path), Makefile for direct execution, and Docker setup—all directly supporting the objective to fix and ensure task functionality.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added docs Improvements or additions to documentation backend makefile labels Nov 13, 2025
@rudransh-shrivastava rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:49
@github-actions
Copy link

Test: This PR must be linked to an issue.

@rudransh-shrivastava rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:50
@rudransh-shrivastava rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:50
@rudransh-shrivastava rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
infrastructure/modules/ecs/main.tf (1)

119-256: OWASP Make targets missing; load_data_task pattern inconsistent; image URL tagging requires alignment.

Critical issues confirmed:

  1. Missing OWASP Make targets: The tasks reference owasp-update-project-health-requirements, owasp-update-project-health-metrics, and owasp-update-project-health-scores, but these targets do not exist in backend/Makefile. These tasks will fail at runtime.

  2. load_data_task pattern inconsistency (lines 216–242): This task directly executes Python commands rather than wrapping in EXEC_MODE=direct make load-data. While the Make target exists, the task diverges from the pattern used by other tasks. Align with the Make infrastructure or document why this task requires direct execution.

  3. Image URL tagging inconsistency confirmed (lines 131, 160, 181 vs. 201, 230, 250): Three tasks append :latest to the ECR repository URL while others do not. Standardize the tagging strategy across all tasks.

  4. Task module compatibility verified: container_parameters_arns is properly supported in the task module variables and configuration.

🧹 Nitpick comments (4)
infrastructure/README.md (3)

56-60: Clarify which parameters need updating and where to find them.

The instruction references parameters with to-be-set-in-aws-console values but doesn't explain how users identify which DJANGO_* parameters require updating or what values to populate. Consider adding a reference to the parameters module documentation or more explicit guidance (e.g., "Look for parameters with the value to-be-set-in-aws-console and replace with your actual secret values").


106-108: Clarify what causes deployment failures and what "invalid" secrets mean.

The warning states that deployment might fail if DJANGO_* secrets like DJANGO_SLACK_BOT_TOKEN are "invalid," but it doesn't clarify what makes a secret invalid (e.g., missing value, wrong format, wrong key name) or describe the actual failure mode (beyond "no logs"). Consider expanding this note with concrete examples or a troubleshooting reference.


166-166: Verify AWS subnet auto-selection behavior and consider adding explicit subnet guidance.

The instruction assumes that subnets will auto-select when a VPC is chosen in the AWS console. This behavior may not be guaranteed across all AWS console versions or configurations. Consider either:

  1. Verifying this is consistent AWS behavior, or
  2. Adding explicit subnet selection guidance if users may need to manually select them.

This would prevent user confusion if the auto-selection doesn't occur as expected.

backend/apps/common/management/commands/load_data.py (1)

13-26: Consider improving type hint and removing redundant parameter.

The implementation is correct and maintains backward compatibility with the default fixture path. Two minor improvements:

  1. The parser parameter type hint could be more specific: parser: argparse.ArgumentParser
  2. The required=False parameter is redundant since it's the default for add_argument

Apply this diff:

+    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
-    def add_arguments(self, parser) -> None:
         """Add command-line arguments to the parser.
 
         Args:
-            parser (argparse.ArgumentParser): The argument parser instance.
+            parser: The argument parser instance.
 
         """
         parser.add_argument(
             "--fixture-path",
             default="data/nest.json.gz",
-            required=False,
             type=str,
             help="Path to the fixture file",
         )

You'll also need to import argparse at the top of the file:

import argparse
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2175602 and c6e42aa.

📒 Files selected for processing (26)
  • backend/Makefile (1 hunks)
  • backend/apps/common/management/commands/load_data.py (1 hunks)
  • backend/docker/Dockerfile (1 hunks)
  • backend/settings/staging.py (0 hunks)
  • backend/wsgi.py (1 hunks)
  • backend/zappa_settings.example.json (1 hunks)
  • infrastructure/README.md (4 hunks)
  • infrastructure/main.tf (4 hunks)
  • infrastructure/modules/cache/main.tf (2 hunks)
  • infrastructure/modules/cache/variables.tf (0 hunks)
  • infrastructure/modules/database/main.tf (4 hunks)
  • infrastructure/modules/database/outputs.tf (1 hunks)
  • infrastructure/modules/database/variables.tf (2 hunks)
  • infrastructure/modules/ecs/main.tf (9 hunks)
  • infrastructure/modules/ecs/modules/task/main.tf (1 hunks)
  • infrastructure/modules/ecs/modules/task/variables.tf (1 hunks)
  • infrastructure/modules/ecs/variables.tf (5 hunks)
  • infrastructure/modules/parameters/main.tf (1 hunks)
  • infrastructure/modules/parameters/outputs.tf (1 hunks)
  • infrastructure/modules/parameters/variables.tf (1 hunks)
  • infrastructure/modules/security/main.tf (3 hunks)
  • infrastructure/modules/security/outputs.tf (1 hunks)
  • infrastructure/modules/security/variables.tf (1 hunks)
  • infrastructure/outputs.tf (0 hunks)
  • infrastructure/terraform.tfvars.example (1 hunks)
  • infrastructure/variables.tf (2 hunks)
💤 Files with no reviewable changes (3)
  • backend/settings/staging.py
  • infrastructure/outputs.tf
  • infrastructure/modules/cache/variables.tf
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
📚 Learning: 2025-11-08T11:16:25.725Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Applied to files:

  • backend/zappa_settings.example.json
  • infrastructure/main.tf
  • infrastructure/modules/parameters/variables.tf
  • infrastructure/modules/parameters/main.tf
  • infrastructure/terraform.tfvars.example
  • infrastructure/modules/ecs/modules/task/variables.tf
📚 Learning: 2025-10-23T19:22:23.811Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

Applied to files:

  • backend/zappa_settings.example.json
  • infrastructure/README.md
📚 Learning: 2025-10-17T15:25:53.713Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/modules/database/main.tf:22-60
Timestamp: 2025-10-17T15:25:53.713Z
Learning: The infrastructure code in the `infrastructure/` directory is intended for quick testing purposes only, not production-grade deployment. Production-grade security hardening controls (such as IAM database authentication, deletion protection, Performance Insights KMS encryption) are not required for this testing infrastructure.

Applied to files:

  • infrastructure/README.md
  • infrastructure/terraform.tfvars.example
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-11-08T11:43:19.276Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.

Applied to files:

  • infrastructure/modules/parameters/main.tf
🧬 Code graph analysis (1)
backend/apps/common/management/commands/load_data.py (2)
backend/apps/common/management/commands/restore_backup.py (1)
  • handle (13-17)
backend/apps/core/utils/index.py (1)
  • disable_indexing (74-92)
🪛 Checkov (3.2.334)
infrastructure/modules/parameters/main.tf

[high] 16-26: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 28-38: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 80-86: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 104-114: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 124-130: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 132-138: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 152-162: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 164-174: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 176-186: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
  • GitHub Check: CodeQL (python)
🔇 Additional comments (38)
infrastructure/README.md (1)

189-193: Good addition for observability.

The new "Helpful Commands" section with the zappa tail command is a useful addition for users to troubleshoot deployments and monitor logs.


Summary: The documentation updates align well with the infrastructure migration to AWS SSM Parameter Store and ECS task management. The main areas for improvement are:

  1. Clarity on secret population (lines 56–60): Users need explicit guidance on identifying which parameters to update and with what values.
  2. Deployment failure messaging (lines 106–108): The phrasing about "invalid" secrets and "no logs" could be more specific to help users troubleshoot.
  3. AWS behavior assumption (line 166): Verify or document the subnet auto-selection behavior to avoid user confusion.

The dynamic references (task revisions, Zappa-provided URLs) and new helpful commands are good improvements. Based on learnings, this documentation should ideally reference the parameters module setup to help users understand where the to-be-set-in-aws-console placeholder values come from.

backend/Makefile (2)

24-36: EXEC_MODE conditional pattern is clean and consistent.

Both targets properly branch execution: when EXEC_MODE=direct, commands run locally; otherwise they use Docker as before. The pattern is idiomatic Makefile and maintains backward compatibility (Docker mode is the implicit default when EXEC_MODE is unset).


22-22: Integration pattern is well-established throughout the Makefile.

Calling targets consistently set CMD before invoking exec-backend-command or exec-backend-command-it. This pattern holds across all invocations (create-superuser, migrations, data operations, etc.), ensuring predictable behavior.

Also applies to: 42-48, 72-95

backend/docker/Dockerfile (3)

41-41: Dockerfile changes properly enable Make target execution in both build and runtime contexts.

The Makefile copy + symlink approach cleanly solves path resolution: ln -s . backend converts backend/apps/*/Makefile includes to apps/*/Makefile without duplicating directory structure. Installing make in the final stage (line 54) ensures Make is available when ECS tasks execute inside the container.

Also applies to: 46-47, 54-54


41-41: Build stage structure correctly propagates Make infrastructure from builder to runtime.

The Makefile and symlink are created in the builder stage and inherited by the final stage via the wholesale directory copy (line 64), eliminating duplication and keeping the runtime image lean.

Also applies to: 46-47, 64-64


54-54: Package installation approach is efficient.

The make package is installed in the final stage with appropriate system update sequencing, consolidated into a single RUN layer for image efficiency.

infrastructure/modules/database/variables.tf (2)

7-11: LGTM!

The create_rds_proxy variable is well-defined with an appropriate default value of false, ensuring opt-in behavior for RDS proxy creation.


87-90: Variable rename is complete and consistent across all references.

Verification confirms all references to db_username have been removed and successfully replaced with db_user throughout the codebase. The variable is properly declared and used in all necessary locations.

infrastructure/modules/security/variables.tf (1)

7-11: LGTM!

The create_rds_proxy variable is well-defined and consistent with the same variable in the database module.

backend/apps/common/management/commands/load_data.py (1)

28-34: LGTM!

The parameterization of the fixture path enables flexible configuration for ECS tasks while maintaining backward compatibility through the default value.

backend/wsgi.py (1)

29-29: LGTM!

The noqa: E402 comment is justified since the environment must be populated from SSM before importing Django's WSGI application.

infrastructure/modules/security/outputs.tf (1)

6-9: LGTM!

The conditional output correctly returns the security group ID when the RDS proxy is created, or null otherwise. The indexed access [0] properly references the count-based resource.

infrastructure/modules/ecs/modules/task/main.tf (1)

46-49: LGTM - security improvement!

The migration from environment to secrets is a significant security improvement. Sensitive configuration values are now fetched from SSM Parameter Store at task runtime rather than being stored directly in the task definition.

infrastructure/terraform.tfvars.example (1)

1-10: LGTM - appropriate for testing infrastructure.

The configuration values are appropriate for a testing/staging environment. The db_backup_retention_period = 0 and force_destroy_bucket = true settings are acceptable since this infrastructure is intended for quick testing purposes only, not production-grade deployment.

Based on learnings.

infrastructure/modules/ecs/modules/task/variables.tf (1)

17-21: LGTM!

The variable rename from container_environment to container_parameters_arns clearly reflects the shift to SSM-based secrets. The updated description accurately documents that it maps environment variable names to SSM parameter ARNs.

infrastructure/modules/cache/main.tf (1)

16-20: Verify unconditional Redis auth token generation doesn't break existing deployments.

Line 18 now unconditionally references random_password.redis_auth_token[0].result, meaning Redis auth is always enabled. This changes from prior conditional behavior. While this aligns with security best practices, if deployments existed without auth tokens, this could cause issues.

Please verify:

  1. Whether the parameters module still receives redis_auth_token from the cache module output (since infrastructure/main.tf line 96 shows redis_password = module.cache.redis_auth_token)
  2. Whether existing deployments can handle the mandatory auth token, or if any manual Terraform state interventions are needed
backend/zappa_settings.example.json (1)

4-20: Clarify production environment configuration.

The example shows staging environment with SSM parameter store configuration. Based on the learnings note that the parameters module is "currently configured for staging environment only," verify whether this example needs production-equivalent configuration, or if it's intentionally staging-only for this PR iteration.

infrastructure/modules/database/outputs.tf (1)

7-10: Conditional endpoint selection is correct.

The output properly switches between RDS Proxy and direct database endpoints based on the feature flag, with correct indexing for the conditional resource.

infrastructure/variables.tf (3)

13-17: Good addition of RDS proxy feature flag.

The create_rds_proxy variable with safe default (false) enables optional RDS proxy provisioning without breaking existing deployments.


68-72: Variable rename is clear and consistent.

The db_usernamedb_user rename improves naming consistency and aligns with usage throughout the codebase.


74-82: Environment validation improves deployment safety.

The validation block restricting environment to staging or production prevents misconfiguration and aligns with the parameters module's environment-driven design.

infrastructure/modules/ecs/variables.tf (2)

12-16: SSM parameter ARN mapping variable rename is appropriate.

The django_environment_variablescontainer_parameters_arns rename accurately reflects the shift from direct environment variables to SSM Parameter Store ARN references. Removal of sensitive=true is correct since ARNs are not sensitive.


68-118: Verify that 50% memory reduction won't cause task failures.

Multiple ECS tasks have memory defaults reduced by 50% (2048 MB → 1024 MB):

  • migrate_task_memory (line 71)
  • sync_data_task_memory (line 93)
  • update_project_health_metrics_task_memory (line 105)
  • update_project_health_scores_task_memory (line 117)

Given the PR objective to "fix long-running ECS tasks," verify whether these reductions are intentional optimizations or if they relate to addressing task failures. If tasks were previously hitting memory limits or timeouts, reducing memory could exacerbate issues.

Please confirm:

  1. Whether memory reductions were tested with actual workloads
  2. Whether the original 2048 MB allocation was based on measured peak usage or arbitrary defaults
  3. Any correlation between memory and the long-running task issues mentioned in #2595
infrastructure/main.tf (2)

84-97: Parameters module wiring is well-positioned and correctly ordered.

The new parameters module is properly placed after database and cache modules to consume their outputs, establishing correct dependency ordering. The use of module.database.db_proxy_endpoint correctly adapts to the conditional proxy endpoint logic.


24-36: Verify cache module still exposes redis_auth_token as output.

All module inputs are consistently updated. However, line 96 references module.cache.redis_auth_token, which needs to be confirmed as an available cache module output (not visible in provided cache/main.tf review).

Please confirm that infrastructure/modules/cache/outputs.tf exposes the redis_auth_token value needed by the parameters module.

Also applies to: 38-70, 99-109

infrastructure/modules/security/main.tf (2)

33-48: Clear separation of RDS and RDS Proxy security groups.

The introduction of a dedicated, non-conditional aws_security_group "rds" resource, with the proxy security group now properly conditional, provides clean infrastructure for optional RDS proxy deployment.


93-124: Security group rule count logic correctly implements proxy feature flag.

The three security group rules implement correct conditional ingress:

  • rds_from_lambda (count = 0 when proxy enabled): Direct Lambda→RDS access without proxy
  • rds_from_proxy (count = 1 when proxy enabled): Proxy→RDS access when proxy is active
  • rds_proxy_from_lambda (count = 1 when proxy enabled): Lambda→Proxy access when proxy is active

This cleanly handles both proxy and non-proxy configurations with mutually exclusive rule sets.

infrastructure/modules/database/main.tf (3)

58-58: Variable rename is consistently applied.

Both the aws_db_instance resource and secrets manager secret version correctly use var.db_user instead of the previous var.db_username.

Also applies to: 74-74


79-158: RDS proxy conditional resource creation uses correct count pattern throughout.

All RDS proxy-related resources (IAM role, policy, proxy, target group, proxy target) correctly use count = var.create_rds_proxy ? 1 : 0, with all dependent references properly indexed using [0]. The least-privilege IAM policy grants only secretsmanager:GetSecretValue for secure secrets access.


127-131: RDS proxy configuration appropriately secures connections.

The proxy requires TLS (require_tls = true), uses Secrets Manager for credentials, and sets a 30-minute idle client timeout. Verify whether this timeout is appropriate for the long-running background tasks mentioned in issue #2595, as overly aggressive timeout settings could contribute to task interruptions.

infrastructure/modules/parameters/outputs.tf (1)

1-22: Output structure correctly exposes all SSM parameter ARNs.

The map cleanly exposes the ARNs of all 17 SSM parameters needed by ECS tasks for secret/config injection.

infrastructure/modules/parameters/main.tf (4)

1-14: Terraform provider versions are appropriate.

Requirements look good: Terraform ≥ 1.0, AWS ≥ 6.0, and random ≥ 3.5 are all recent stable versions.


40-50: Parameter naming and lifecycle strategy is sound.

The /${var.project_name}/${var.environment}/* naming convention provides good scoping and environment isolation. The lifecycle.ignore_changes blocks on non-credential parameters (allowed_hosts, configuration, settings_module) and Algolia/OpenAI/Sentry/Slack parameters correctly allow manual AWS console updates without Terraform drift. Credentials sourced from Terraform state (db_password, redis_password) appropriately lack lifecycle blocks.


188-191: Django secret key generation is solid.

Using random_string with 50 characters and special characters enabled ensures a strong, cryptographically suitable secret key. No concerns here.


16-186: KMS CMK encryption for SecureString parameters—already planned post-S3 state management.

All nine SecureString parameters (django_algolia_application_id, django_algolia_write_api_key, django_db_password, django_open_ai_secret_key, django_redis_password, django_secret_key, django_sentry_dsn, django_slack_bot_token, django_slack_signing_secret) currently lack kms_key_id and use AWS-managed keys. Per existing roadmap, KMS CMK encryption is slated for implementation after S3 state management is completed, making AWS-managed keys acceptable for the current testing infrastructure. Add kms_key_id to all SecureString parameters when KMS CMK implementation begins.

infrastructure/modules/parameters/variables.tf (1)

1-70: Variable schema is well-designed and environment-aware.

All 13 variables align with main.tf resource definitions. Sensitive flags on db_password and redis_password are correct. Defaults for configuration ("Staging") and settings_module ("settings.staging") reflect the current staging-focused scope noted in learnings; consider parameterizing these per environment in future iterations. No immediate concerns.

infrastructure/modules/ecs/main.tf (2)

12-12: Caller identity data source enables dynamic account reference.

Using data.aws_caller_identity.current to fetch account ID for the SSM policy resource pattern is a best practice that avoids hardcoding and ensures correctness across environments.


46-71: IAM policy correctly scopes SSM parameter read access.

The new policy permits ssm:GetParameters on the scoped resource pattern arn:aws:ssm:${region}:${account}:parameter/${project}/${environment}/*, correctly limiting ECS tasks to read parameters within their environment path. The policy attachment to the execution role (not task role) is appropriate for secret injection during task startup. Approved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/tests/apps/common/management/commands/load_data_test.py (1)

12-36: Add test coverage for the new --fixture-path argument.

The test correctly verifies the default fixture path behavior. However, since the load_data command now accepts a --fixture-path CLI argument, consider adding a test case that verifies custom fixture paths work correctly.

Example test to add:

@patch("apps.core.utils.index.DisableIndexing.unregister_indexes")
@patch("apps.core.utils.index.DisableIndexing.register_indexes")
@patch("apps.common.management.commands.load_data.call_command")
@patch("apps.common.management.commands.load_data.transaction.atomic")
def test_handle_with_custom_fixture_path(
    self,
    mock_atomic,
    mock_call_command,
    mock_register,
    mock_unregister,
):
    mock_atomic.return_value.__enter__ = MagicMock()
    mock_atomic.return_value.__exit__ = MagicMock()

    call_command("load_data", fixture_path="custom/path.json.gz")

    mock_call_command.assert_called_once_with("loaddata", "custom/path.json.gz", "-v", "3")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6e42aa and 2162352.

📒 Files selected for processing (2)
  • backend/docker/Dockerfile (1 hunks)
  • backend/tests/apps/common/management/commands/load_data_test.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/docker/Dockerfile
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM: Correct Django testing pattern.

Switching to call_command aligns with Django's recommended approach for testing management commands.


42-61: LGTM: Exception handling test is correct.

The test properly verifies that indexing is re-enabled even when call_command raises an exception, which is critical for maintaining data integrity.

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

19-21: Remove unused mock objects.

The mock_model and mock_app_config objects are created but never used in the tests. They're not patched into any location and no assertions reference them.

Apply this diff to remove the unused mocks from the first test:

-        mock_model = MagicMock()
-        mock_app_config = MagicMock()
-        mock_app_config.get_models.return_value = [mock_model]
-
         mock_atomic.return_value.__enter__ = MagicMock()

Apply similar cleanup to the second test (lines 49-51) and verify other test methods as well.

Also applies to: 49-51


84-86: Remove unused contextlib.suppress mock.

The patch of contextlib.suppress creates a mock that is never used or asserted. The actual contextlib.suppress on line 88 is the real one from the standard library, used only to prevent the test itself from failing due to the mocked exception.

Apply this diff:

         """Test that indexing is re-enabled even if call_command fails."""
         mock_call_command.side_effect = Exception("Call command failed")
 
-        with patch("contextlib.suppress") as mock_suppress:
-            mock_suppress.return_value.__enter__ = MagicMock()
-            mock_suppress.return_value.__exit__ = MagicMock()
-
-            with contextlib.suppress(Exception):
-                call_command("load_data")
+        with contextlib.suppress(Exception):
+            call_command("load_data")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2162352 and 7fe6726.

📒 Files selected for processing (1)
  • backend/tests/apps/common/management/commands/load_data_test.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM! Better Django testing practice.

The switch from directly instantiating Command() to using call_command("load_data") is the idiomatic way to test Django management commands and better simulates how the command is invoked in production.

Also applies to: 29-29, 89-89


38-68: Excellent test coverage for the new fixture_path parameter.

The new test case properly validates that custom fixture paths are correctly passed through to the underlying loaddata command. The test structure is consistent with existing tests and includes all necessary assertions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend backend-tests docs Improvements or additions to documentation makefile

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix long-running ECS/Fargate Tasks

1 participant