Fix long-running ECS/Fargate Tasks #2620

rudransh-shrivastava · 2025-11-13T14:49:24Z

Resolves #2595

Proposed change

Fix and ensure that all ECS tasks are functional.

Note: PR will be rebased once #2551 is merged.

Checklist

I've read and followed the contributing guidelines.
I've run make check-test locally; all checks and tests passed.

coderabbitai · 2025-11-13T14:49:38Z

Summary by CodeRabbit

Release Notes

New Features
- Added support for custom data fixture paths via command-line option during deployment.
- Introduced optional direct execution mode for backend commands alongside containerized execution.
Infrastructure
- Integrated AWS Systems Manager Parameter Store for centralized secrets management.
- Made RDS proxy creation configurable for flexible database access patterns.
- Optimized task memory allocations for better resource efficiency.
Documentation
- Updated deployment guides to reflect new secrets population workflow and enhanced task configuration steps.

Walkthrough

Adds AWS SSM Parameter Store integration and related Terraform parameter module, makes RDS Proxy optional with API renames, switches ECS tasks to use SSM parameter ARNs and direct/Make-based commands, adds CLI fixture-path for load_data, updates Makefile/Docker for EXEC_MODE, and populates SSM at Django/WGSI startup.

Changes

Cohort / File(s)	Summary
Backend Makefile & Docker `backend/Makefile`, `backend/docker/Dockerfile`	Make targets now branch on `EXEC_MODE` to run commands locally (`direct`) or via `docker exec`. Dockerfile copies the Makefile, installs `make` in the builder image, and creates a `backend` symlink.
Backend management command & tests `backend/apps/common/management/commands/load_data.py`, `backend/tests/apps/common/management/commands/load_data_test.py`	Added `--fixture-path` CLI argument (default `data/nest.json.gz`) and updated tests to use `call_command("load_data")`, including asserting fixture-path/verbosity usage.
Backend SSM & staging settings `backend/wsgi.py`, `backend/settings/staging.py`, `backend/zappa_settings.example.json`	`wsgi.py` now loads env vars from SSM when `AWS_SYSTEMS_MANAGER_PARAM_STORE_PATH` is set. Removed AWS keys from staging. `zappa_settings.example.json` now references `aws_environment_variables` and adds SSM permissions.
Terraform: new parameters module `infrastructure/modules/parameters/...` (`.../main.tf`, `.../outputs.tf`, `.../variables.tf`)	New module provisioning many `aws_ssm_parameter` resources for DJANGO_* settings, a `random_string` secret, and an output `ssm_parameter_arns` mapping env names → ARNs.
Terraform: ECS and task changes `infrastructure/modules/ecs/main.tf`, `infrastructure/modules/ecs/modules/task/main.tf`, `infrastructure/modules/ecs/modules/task/variables.tf`, `infrastructure/modules/ecs/variables.tf`	Replaced `container_environment` with `container_parameters_arns` (map env name → SSM ARN). Task definitions use `secrets` mapping. Added SSM IAM policy/attachment for task execution role. Task commands switched to shell sequences invoking `EXEC_MODE=direct` and Make targets. Reduced several task memory defaults.
Terraform: database module & RDS proxy opt-in `infrastructure/modules/database/main.tf`, `infrastructure/modules/database/outputs.tf`, `infrastructure/modules/database/variables.tf`	Added `create_rds_proxy` variable, renamed `db_username` → `db_user`, made RDS proxy resources conditional with `count` and updated indexed references; output `db_proxy_endpoint` now conditional on proxy creation.
Terraform: security & SG rules `infrastructure/modules/security/main.tf`, `infrastructure/modules/security/outputs.tf`, `infrastructure/modules/security/variables.tf`	Added `create_rds_proxy` variable; introduced `aws_security_group.rds`, conditional `rds_proxy` SG, and explicit conditional `aws_security_group_rule` resources replacing inline ingress. `rds_proxy_sg_id` output made conditional/indexed.
Terraform: cache module & variables `infrastructure/modules/cache/main.tf`, `infrastructure/modules/cache/variables.tf`	Simplified Redis auth token handling to always generate via `random_password`; removed `redis_auth_token` input variable.
Terraform: top-level wiring & removed outputs/vars `infrastructure/main.tf`, `infrastructure/outputs.tf`, `infrastructure/variables.tf`, `infrastructure/terraform.tfvars.example`	Added `module "parameters"` and wired `container_parameters_arns` in `main.tf`. Removed multiple Django-related variables and sensitive outputs (`database_endpoint`, `redis_endpoint`, `db_password`, `redis_auth_token`). Renamed `db_username` → `db_user` and added `create_rds_proxy`; updated `environment` validation and example tfvars.
Docs `infrastructure/README.md`	Reworked deployment steps to instruct populating SSM Parameter Store with DJANGO_* secrets, removed Terraform output copy steps, clarified migration/load task instructions, and added helpful commands.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas to focus on:

infrastructure/modules/parameters/* — verify parameter types (String vs SecureString), lifecycle ignore_changes usage, and naming consistency with outputs.
infrastructure/modules/ecs/* — confirm IAM policies/attachments, secrets mapping, and correctness of task command substitutions to shell/Make/EXEC_MODE.
create_rds_proxy conditionals — ensure all indexed references ([0]) and outputs/SG wiring are consistent across database, security, and outputs.
Variable renames (db_username → db_user) — ensure no leftover references.
backend/wsgi.py SSM fetch — validate boto3 usage, error handling, and environment variable naming.

Possibly related PRs

Migrate OWASP Nest to Zappa for serverless deployment #2431 — similar migration to SSM parameters and related ECS/Zappa wiring changes.

Suggested reviewers

arkid15r
kasya

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Fix long-running ECS/Fargate Tasks' directly matches the linked issue #2595 and accurately summarizes the main objective of resolving task execution problems.
Description check	✅ Passed	The PR description adequately explains the proposed change (fixing and ensuring all ECS tasks are functional) and references the linked issue #2595, making it relevant to the changeset.
Linked Issues check	✅ Passed	The changeset implements comprehensive infrastructure and backend modifications to fix long-running ECS/Fargate tasks: introduces SSM Parameter Store integration for secure configuration management [#2595], adds conditional RDS proxy support, updates ECS task definitions to use secrets instead of environment variables, and includes a new parameters module for managing Django configuration.
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing ECS/Fargate task execution: infrastructure refactoring (parameters module, security groups, database proxy), backend configuration (SSM integration, load_data fixture path), Makefile for direct execution, and Docker setup—all directly supporting the objective to fix and ensure task functionality.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-13T14:50:03Z

Test: This PR must be linked to an issue.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

infrastructure/modules/ecs/main.tf (1)

119-256: OWASP Make targets missing; load_data_task pattern inconsistent; image URL tagging requires alignment.

Critical issues confirmed:

Missing OWASP Make targets: The tasks reference owasp-update-project-health-requirements, owasp-update-project-health-metrics, and owasp-update-project-health-scores, but these targets do not exist in backend/Makefile. These tasks will fail at runtime.

load_data_task pattern inconsistency (lines 216–242): This task directly executes Python commands rather than wrapping in EXEC_MODE=direct make load-data. While the Make target exists, the task diverges from the pattern used by other tasks. Align with the Make infrastructure or document why this task requires direct execution.

Image URL tagging inconsistency confirmed (lines 131, 160, 181 vs. 201, 230, 250): Three tasks append :latest to the ECR repository URL while others do not. Standardize the tagging strategy across all tasks.

Task module compatibility verified: container_parameters_arns is properly supported in the task module variables and configuration.

🧹 Nitpick comments (4)

infrastructure/README.md (3)

56-60: Clarify which parameters need updating and where to find them.

The instruction references parameters with to-be-set-in-aws-console values but doesn't explain how users identify which DJANGO_* parameters require updating or what values to populate. Consider adding a reference to the parameters module documentation or more explicit guidance (e.g., "Look for parameters with the value to-be-set-in-aws-console and replace with your actual secret values").

106-108: Clarify what causes deployment failures and what "invalid" secrets mean.

The warning states that deployment might fail if DJANGO_* secrets like DJANGO_SLACK_BOT_TOKEN are "invalid," but it doesn't clarify what makes a secret invalid (e.g., missing value, wrong format, wrong key name) or describe the actual failure mode (beyond "no logs"). Consider expanding this note with concrete examples or a troubleshooting reference.

166-166: Verify AWS subnet auto-selection behavior and consider adding explicit subnet guidance.

The instruction assumes that subnets will auto-select when a VPC is chosen in the AWS console. This behavior may not be guaranteed across all AWS console versions or configurations. Consider either:

Verifying this is consistent AWS behavior, or

Adding explicit subnet selection guidance if users may need to manually select them.

This would prevent user confusion if the auto-selection doesn't occur as expected.
backend/apps/common/management/commands/load_data.py (1)
13-26: Consider improving type hint and removing redundant parameter.

The implementation is correct and maintains backward compatibility with the default fixture path. Two minor improvements:

The parser parameter type hint could be more specific: parser: argparse.ArgumentParser

The required=False parameter is redundant since it's the default for add_argument

Apply this diff:
+    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
-    def add_arguments(self, parser) -> None:
         """Add command-line arguments to the parser.
 
         Args:
-            parser (argparse.ArgumentParser): The argument parser instance.
+            parser: The argument parser instance.
 
         """
         parser.add_argument(
             "--fixture-path",
             default="data/nest.json.gz",
-            required=False,
             type=str,
             help="Path to the fixture file",
         )
You'll also need to import argparse at the top of the file:
import argparse

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2175602 and c6e42aa.

📒 Files selected for processing (26)

backend/Makefile (1 hunks)
backend/apps/common/management/commands/load_data.py (1 hunks)
backend/docker/Dockerfile (1 hunks)
backend/settings/staging.py (0 hunks)
backend/wsgi.py (1 hunks)
backend/zappa_settings.example.json (1 hunks)
infrastructure/README.md (4 hunks)
infrastructure/main.tf (4 hunks)
infrastructure/modules/cache/main.tf (2 hunks)
infrastructure/modules/cache/variables.tf (0 hunks)
infrastructure/modules/database/main.tf (4 hunks)
infrastructure/modules/database/outputs.tf (1 hunks)
infrastructure/modules/database/variables.tf (2 hunks)
infrastructure/modules/ecs/main.tf (9 hunks)
infrastructure/modules/ecs/modules/task/main.tf (1 hunks)
infrastructure/modules/ecs/modules/task/variables.tf (1 hunks)
infrastructure/modules/ecs/variables.tf (5 hunks)
infrastructure/modules/parameters/main.tf (1 hunks)
infrastructure/modules/parameters/outputs.tf (1 hunks)
infrastructure/modules/parameters/variables.tf (1 hunks)
infrastructure/modules/security/main.tf (3 hunks)
infrastructure/modules/security/outputs.tf (1 hunks)
infrastructure/modules/security/variables.tf (1 hunks)
infrastructure/outputs.tf (0 hunks)
infrastructure/terraform.tfvars.example (1 hunks)
infrastructure/variables.tf (2 hunks)

💤 Files with no reviewable changes (3)

backend/settings/staging.py
infrastructure/outputs.tf
infrastructure/modules/cache/variables.tf

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

📚 Learning: 2025-11-08T11:16:25.725Z

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Applied to files:

backend/zappa_settings.example.json
infrastructure/main.tf
infrastructure/modules/parameters/variables.tf
infrastructure/modules/parameters/main.tf
infrastructure/terraform.tfvars.example
infrastructure/modules/ecs/modules/task/variables.tf

📚 Learning: 2025-10-23T19:22:23.811Z

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

Applied to files:

backend/zappa_settings.example.json
infrastructure/README.md

📚 Learning: 2025-10-17T15:25:53.713Z

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/modules/database/main.tf:22-60
Timestamp: 2025-10-17T15:25:53.713Z
Learning: The infrastructure code in the `infrastructure/` directory is intended for quick testing purposes only, not production-grade deployment. Production-grade security hardening controls (such as IAM database authentication, deletion protection, Performance Insights KMS encryption) are not required for this testing infrastructure.

Applied to files:

infrastructure/README.md
infrastructure/terraform.tfvars.example

📚 Learning: 2025-10-26T12:50:50.512Z

Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

backend/Makefile

📚 Learning: 2025-11-08T11:43:19.276Z

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.

Applied to files:

infrastructure/modules/parameters/main.tf

🧬 Code graph analysis (1)

backend/apps/common/management/commands/load_data.py (2)

backend/apps/common/management/commands/restore_backup.py (1)

handle (13-17)

backend/apps/core/utils/index.py (1)

disable_indexing (74-92)

🪛 Checkov (3.2.334)

infrastructure/modules/parameters/main.tf

[high] 16-26: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 28-38: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 80-86: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 104-114: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 124-130: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 132-138: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 152-162: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 164-174: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

[high] 176-186: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Run frontend e2e tests
GitHub Check: Run backend tests
GitHub Check: Run frontend unit tests
GitHub Check: CodeQL (javascript-typescript)
GitHub Check: CodeQL (python)

🔇 Additional comments (38)

infrastructure/README.md (1)

189-193: Good addition for observability.

The new "Helpful Commands" section with the zappa tail command is a useful addition for users to troubleshoot deployments and monitor logs.

Summary: The documentation updates align well with the infrastructure migration to AWS SSM Parameter Store and ECS task management. The main areas for improvement are:

Clarity on secret population (lines 56–60): Users need explicit guidance on identifying which parameters to update and with what values.

Deployment failure messaging (lines 106–108): The phrasing about "invalid" secrets and "no logs" could be more specific to help users troubleshoot.

AWS behavior assumption (line 166): Verify or document the subnet auto-selection behavior to avoid user confusion.

The dynamic references (task revisions, Zappa-provided URLs) and new helpful commands are good improvements. Based on learnings, this documentation should ideally reference the parameters module setup to help users understand where the to-be-set-in-aws-console placeholder values come from.

backend/Makefile (2)

24-36: EXEC_MODE conditional pattern is clean and consistent.

Both targets properly branch execution: when EXEC_MODE=direct, commands run locally; otherwise they use Docker as before. The pattern is idiomatic Makefile and maintains backward compatibility (Docker mode is the implicit default when EXEC_MODE is unset).

22-22: Integration pattern is well-established throughout the Makefile.

Calling targets consistently set CMD before invoking exec-backend-command or exec-backend-command-it. This pattern holds across all invocations (create-superuser, migrations, data operations, etc.), ensuring predictable behavior.

Also applies to: 42-48, 72-95

backend/docker/Dockerfile (3)

41-41: Dockerfile changes properly enable Make target execution in both build and runtime contexts.

The Makefile copy + symlink approach cleanly solves path resolution: ln -s . backend converts backend/apps/*/Makefile includes to apps/*/Makefile without duplicating directory structure. Installing make in the final stage (line 54) ensures Make is available when ECS tasks execute inside the container.

Also applies to: 46-47, 54-54

41-41: Build stage structure correctly propagates Make infrastructure from builder to runtime.

The Makefile and symlink are created in the builder stage and inherited by the final stage via the wholesale directory copy (line 64), eliminating duplication and keeping the runtime image lean.

Also applies to: 46-47, 64-64

54-54: Package installation approach is efficient.

The make package is installed in the final stage with appropriate system update sequencing, consolidated into a single RUN layer for image efficiency.

infrastructure/modules/database/variables.tf (2)

7-11: LGTM!

The create_rds_proxy variable is well-defined with an appropriate default value of false, ensuring opt-in behavior for RDS proxy creation.

87-90: Variable rename is complete and consistent across all references.

Verification confirms all references to db_username have been removed and successfully replaced with db_user throughout the codebase. The variable is properly declared and used in all necessary locations.

infrastructure/modules/security/variables.tf (1)

7-11: LGTM!

The create_rds_proxy variable is well-defined and consistent with the same variable in the database module.

backend/apps/common/management/commands/load_data.py (1)

28-34: LGTM!

The parameterization of the fixture path enables flexible configuration for ECS tasks while maintaining backward compatibility through the default value.

backend/wsgi.py (1)

29-29: LGTM!

The noqa: E402 comment is justified since the environment must be populated from SSM before importing Django's WSGI application.

infrastructure/modules/security/outputs.tf (1)

6-9: LGTM!

The conditional output correctly returns the security group ID when the RDS proxy is created, or null otherwise. The indexed access [0] properly references the count-based resource.

infrastructure/modules/ecs/modules/task/main.tf (1)

46-49: LGTM - security improvement!

The migration from environment to secrets is a significant security improvement. Sensitive configuration values are now fetched from SSM Parameter Store at task runtime rather than being stored directly in the task definition.

infrastructure/terraform.tfvars.example (1)

1-10: LGTM - appropriate for testing infrastructure.

The configuration values are appropriate for a testing/staging environment. The db_backup_retention_period = 0 and force_destroy_bucket = true settings are acceptable since this infrastructure is intended for quick testing purposes only, not production-grade deployment.

Based on learnings.

infrastructure/modules/ecs/modules/task/variables.tf (1)

17-21: LGTM!

The variable rename from container_environment to container_parameters_arns clearly reflects the shift to SSM-based secrets. The updated description accurately documents that it maps environment variable names to SSM parameter ARNs.

infrastructure/modules/cache/main.tf (1)

16-20: Verify unconditional Redis auth token generation doesn't break existing deployments.

Line 18 now unconditionally references random_password.redis_auth_token[0].result, meaning Redis auth is always enabled. This changes from prior conditional behavior. While this aligns with security best practices, if deployments existed without auth tokens, this could cause issues.

Please verify:

Whether the parameters module still receives redis_auth_token from the cache module output (since infrastructure/main.tf line 96 shows redis_password = module.cache.redis_auth_token)

Whether existing deployments can handle the mandatory auth token, or if any manual Terraform state interventions are needed

backend/zappa_settings.example.json (1)

4-20: Clarify production environment configuration.

The example shows staging environment with SSM parameter store configuration. Based on the learnings note that the parameters module is "currently configured for staging environment only," verify whether this example needs production-equivalent configuration, or if it's intentionally staging-only for this PR iteration.

infrastructure/modules/database/outputs.tf (1)

7-10: Conditional endpoint selection is correct.

The output properly switches between RDS Proxy and direct database endpoints based on the feature flag, with correct indexing for the conditional resource.

infrastructure/variables.tf (3)

13-17: Good addition of RDS proxy feature flag.

The create_rds_proxy variable with safe default (false) enables optional RDS proxy provisioning without breaking existing deployments.

68-72: Variable rename is clear and consistent.

The db_username → db_user rename improves naming consistency and aligns with usage throughout the codebase.

74-82: Environment validation improves deployment safety.

The validation block restricting environment to staging or production prevents misconfiguration and aligns with the parameters module's environment-driven design.

infrastructure/modules/ecs/variables.tf (2)

12-16: SSM parameter ARN mapping variable rename is appropriate.

The django_environment_variables → container_parameters_arns rename accurately reflects the shift from direct environment variables to SSM Parameter Store ARN references. Removal of sensitive=true is correct since ARNs are not sensitive.

68-118: Verify that 50% memory reduction won't cause task failures.

Multiple ECS tasks have memory defaults reduced by 50% (2048 MB → 1024 MB):

migrate_task_memory (line 71)

sync_data_task_memory (line 93)

update_project_health_metrics_task_memory (line 105)

update_project_health_scores_task_memory (line 117)

Given the PR objective to "fix long-running ECS tasks," verify whether these reductions are intentional optimizations or if they relate to addressing task failures. If tasks were previously hitting memory limits or timeouts, reducing memory could exacerbate issues.

Please confirm:

Whether memory reductions were tested with actual workloads

Whether the original 2048 MB allocation was based on measured peak usage or arbitrary defaults

Any correlation between memory and the long-running task issues mentioned in #2595

infrastructure/main.tf (2)

84-97: Parameters module wiring is well-positioned and correctly ordered.

The new parameters module is properly placed after database and cache modules to consume their outputs, establishing correct dependency ordering. The use of module.database.db_proxy_endpoint correctly adapts to the conditional proxy endpoint logic.

24-36: Verify cache module still exposes redis_auth_token as output.

All module inputs are consistently updated. However, line 96 references module.cache.redis_auth_token, which needs to be confirmed as an available cache module output (not visible in provided cache/main.tf review).

Please confirm that infrastructure/modules/cache/outputs.tf exposes the redis_auth_token value needed by the parameters module.

Also applies to: 38-70, 99-109

infrastructure/modules/security/main.tf (2)

33-48: Clear separation of RDS and RDS Proxy security groups.

The introduction of a dedicated, non-conditional aws_security_group "rds" resource, with the proxy security group now properly conditional, provides clean infrastructure for optional RDS proxy deployment.

93-124: Security group rule count logic correctly implements proxy feature flag.

The three security group rules implement correct conditional ingress:

rds_from_lambda (count = 0 when proxy enabled): Direct Lambda→RDS access without proxy

rds_from_proxy (count = 1 when proxy enabled): Proxy→RDS access when proxy is active

rds_proxy_from_lambda (count = 1 when proxy enabled): Lambda→Proxy access when proxy is active

This cleanly handles both proxy and non-proxy configurations with mutually exclusive rule sets.

infrastructure/modules/database/main.tf (3)

58-58: Variable rename is consistently applied.

Both the aws_db_instance resource and secrets manager secret version correctly use var.db_user instead of the previous var.db_username.

Also applies to: 74-74

79-158: RDS proxy conditional resource creation uses correct count pattern throughout.

All RDS proxy-related resources (IAM role, policy, proxy, target group, proxy target) correctly use count = var.create_rds_proxy ? 1 : 0, with all dependent references properly indexed using [0]. The least-privilege IAM policy grants only secretsmanager:GetSecretValue for secure secrets access.

127-131: RDS proxy configuration appropriately secures connections.

The proxy requires TLS (require_tls = true), uses Secrets Manager for credentials, and sets a 30-minute idle client timeout. Verify whether this timeout is appropriate for the long-running background tasks mentioned in issue #2595, as overly aggressive timeout settings could contribute to task interruptions.

infrastructure/modules/parameters/outputs.tf (1)

1-22: Output structure correctly exposes all SSM parameter ARNs.

The map cleanly exposes the ARNs of all 17 SSM parameters needed by ECS tasks for secret/config injection.

infrastructure/modules/parameters/main.tf (4)

1-14: Terraform provider versions are appropriate.

Requirements look good: Terraform ≥ 1.0, AWS ≥ 6.0, and random ≥ 3.5 are all recent stable versions.

40-50: Parameter naming and lifecycle strategy is sound.

The /${var.project_name}/${var.environment}/* naming convention provides good scoping and environment isolation. The lifecycle.ignore_changes blocks on non-credential parameters (allowed_hosts, configuration, settings_module) and Algolia/OpenAI/Sentry/Slack parameters correctly allow manual AWS console updates without Terraform drift. Credentials sourced from Terraform state (db_password, redis_password) appropriately lack lifecycle blocks.

188-191: Django secret key generation is solid.

Using random_string with 50 characters and special characters enabled ensures a strong, cryptographically suitable secret key. No concerns here.

16-186: KMS CMK encryption for SecureString parameters—already planned post-S3 state management.

All nine SecureString parameters (django_algolia_application_id, django_algolia_write_api_key, django_db_password, django_open_ai_secret_key, django_redis_password, django_secret_key, django_sentry_dsn, django_slack_bot_token, django_slack_signing_secret) currently lack kms_key_id and use AWS-managed keys. Per existing roadmap, KMS CMK encryption is slated for implementation after S3 state management is completed, making AWS-managed keys acceptable for the current testing infrastructure. Add kms_key_id to all SecureString parameters when KMS CMK implementation begins.

infrastructure/modules/parameters/variables.tf (1)

1-70: Variable schema is well-designed and environment-aware.

All 13 variables align with main.tf resource definitions. Sensitive flags on db_password and redis_password are correct. Defaults for configuration ("Staging") and settings_module ("settings.staging") reflect the current staging-focused scope noted in learnings; consider parameterizing these per environment in future iterations. No immediate concerns.

infrastructure/modules/ecs/main.tf (2)

12-12: Caller identity data source enables dynamic account reference.

Using data.aws_caller_identity.current to fetch account ID for the SSM policy resource pattern is a best practice that avoids hardcoding and ensures correctness across environments.

46-71: IAM policy correctly scopes SSM parameter read access.

The new policy permits ssm:GetParameters on the scoped resource pattern arn:aws:ssm:${region}:${account}:parameter/${project}/${environment}/*, correctly limiting ECS tasks to read parameters within their environment path. The policy attachment to the execution role (not task role) is appropriate for secret injection during task startup. Approved.

backend/wsgi.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

backend/tests/apps/common/management/commands/load_data_test.py (1)
12-36: Add test coverage for the new --fixture-path argument.

The test correctly verifies the default fixture path behavior. However, since the load_data command now accepts a --fixture-path CLI argument, consider adding a test case that verifies custom fixture paths work correctly.

Example test to add:
@patch("apps.core.utils.index.DisableIndexing.unregister_indexes")
@patch("apps.core.utils.index.DisableIndexing.register_indexes")
@patch("apps.common.management.commands.load_data.call_command")
@patch("apps.common.management.commands.load_data.transaction.atomic")
def test_handle_with_custom_fixture_path(
    self,
    mock_atomic,
    mock_call_command,
    mock_register,
    mock_unregister,
):
    mock_atomic.return_value.__enter__ = MagicMock()
    mock_atomic.return_value.__exit__ = MagicMock()

    call_command("load_data", fixture_path="custom/path.json.gz")

    mock_call_command.assert_called_once_with("loaddata", "custom/path.json.gz", "-v", "3")

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6e42aa and 2162352.

📒 Files selected for processing (2)

backend/docker/Dockerfile (1 hunks)
backend/tests/apps/common/management/commands/load_data_test.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/docker/Dockerfile

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run frontend e2e tests
GitHub Check: Run backend tests
GitHub Check: Run frontend unit tests
GitHub Check: CodeQL (javascript-typescript)

🔇 Additional comments (2)

backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM: Correct Django testing pattern.

Switching to call_command aligns with Django's recommended approach for testing management commands.

42-61: LGTM: Exception handling test is correct.

The test properly verifies that indexing is re-enabled even when call_command raises an exception, which is critical for maintaining data integrity.

sonarqubecloud · 2025-11-13T19:31:55Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

backend/tests/apps/common/management/commands/load_data_test.py (2)
19-21: Remove unused mock objects.

The mock_model and mock_app_config objects are created but never used in the tests. They're not patched into any location and no assertions reference them.

Apply this diff to remove the unused mocks from the first test:
-        mock_model = MagicMock()
-        mock_app_config = MagicMock()
-        mock_app_config.get_models.return_value = [mock_model]
-
         mock_atomic.return_value.__enter__ = MagicMock()
Apply similar cleanup to the second test (lines 49-51) and verify other test methods as well.

Also applies to: 49-51

84-86: Remove unused contextlib.suppress mock.

The patch of contextlib.suppress creates a mock that is never used or asserted. The actual contextlib.suppress on line 88 is the real one from the standard library, used only to prevent the test itself from failing due to the mocked exception.

Apply this diff:
         """Test that indexing is re-enabled even if call_command fails."""
         mock_call_command.side_effect = Exception("Call command failed")
 
-        with patch("contextlib.suppress") as mock_suppress:
-            mock_suppress.return_value.__enter__ = MagicMock()
-            mock_suppress.return_value.__exit__ = MagicMock()
-
-            with contextlib.suppress(Exception):
-                call_command("load_data")
+        with contextlib.suppress(Exception):
+            call_command("load_data")

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2162352 and 7fe6726.

📒 Files selected for processing (1)

backend/tests/apps/common/management/commands/load_data_test.py (4 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.

Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run frontend unit tests
GitHub Check: Run frontend e2e tests
GitHub Check: Run backend tests
GitHub Check: CodeQL (javascript-typescript)

🔇 Additional comments (2)

backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM! Better Django testing practice.

The switch from directly instantiating Command() to using call_command("load_data") is the idiomatic way to test Django management commands and better simulates how the command is invoked in production.

Also applies to: 29-29, 89-89

38-68: Excellent test coverage for the new fixture_path parameter.

The new test case properly validates that custom fixture paths are correctly passed through to the underlying loaddata command. The test structure is consistent with existing tests and includes all necessary assertions.

rudransh-shrivastava added 14 commits November 4, 2025 18:35

Use AWS SSM Parameter Store to handle environment variables

a397c91

Use focused policy for read access

3d9af1b

Update documentation

520eaa5

Add flag for create_rds_proxy

9fa30b1

set default value of create_rds_proxy to false

39a6f90

Populate Zappa/Lambda environment variables from ssm/parameter store

a406fdf

Update documentation

fcabcdf

Update example

a013453

add default configurations

8e4dc59

add security group db from lambda

e7b86a7

fix load-data task by adding a --fixture-path flag

e09e7ad

fix ecs tasks by introducing ecs-* make targets

53724df

change ecs run steps

a1ce391

remove ecs-* and clean code

c6e42aa

rudransh-shrivastava changed the title ~~Feature/nest zappa migration ecs tasks~~ Fix long-running ECS/Fargate Tasks Nov 13, 2025

github-actions bot added docs Improvements or additions to documentation backend makefile labels Nov 13, 2025

rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:49

rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:50

rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:50

rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:51

coderabbitai bot reviewed Nov 13, 2025

View reviewed changes

backend/wsgi.py Show resolved Hide resolved

rudransh-shrivastava added 2 commits November 14, 2025 00:39

add --no-cache

9ea1bbd

use call_command

2162352

github-actions bot added the backend-tests label Nov 13, 2025

coderabbitai bot reviewed Nov 13, 2025

View reviewed changes

add test for --fixture-path

7fe6726

coderabbitai bot reviewed Nov 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix long-running ECS/Fargate Tasks #2620

Fix long-running ECS/Fargate Tasks #2620

rudransh-shrivastava commented Nov 13, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

sonarqubecloud bot commented Nov 13, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Fix long-running ECS/Fargate Tasks #2620

Are you sure you want to change the base?

Fix long-running ECS/Fargate Tasks #2620

Conversation

rudransh-shrivastava commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed change

Checklist

Uh oh!

coderabbitai bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Nov 13, 2025

Quality Gate passed

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rudransh-shrivastava commented Nov 13, 2025 •

edited

Loading

coderabbitai bot commented Nov 13, 2025 •

edited

Loading