fix: container resource limits + JVM/native memory accounting to prevent OOM #378

@NotYuSheng

Description

No service defines CPU or memory requests or limits, and the current memory model leaves the host exposed to out-of-memory (OOM) failures. Identified in the production readiness assessment (#366, docs/production-readiness.md, finding P1-3).

The entrypoint sets the JVM heap to 75% of APP_MEMORY_MB via -Xmx/-Xms (backend/docker-entrypoint.sh:7,31-36). However, tshark and ndpi run as native subprocesses (PcapParserService, NdpiService, TsharkEnrichmentService, SessionReconstructionService, …) and consume memory outside the JVM heap. The remaining 25% of APP_MEMORY_MB therefore has to cover both the JVM's own non-heap memory and every native subprocess, and because no container memory limit is set, nothing enforces even that budget. Analysing a large capture can exhaust host memory.
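To make the split concrete, here is a minimal sketch of the arithmetic behind the 75% rule. It is illustrative only and does not reproduce backend/docker-entrypoint.sh; the function names and the 4096 MB default are assumptions chosen for the example.

```shell
#!/bin/sh
# Illustrative only: mirrors the 75% heap rule described in this issue,
# not the actual contents of backend/docker-entrypoint.sh.

# heap_for_budget BUDGET_MB: prints the heap size in MB under the 75% rule.
heap_for_budget() {
  echo $(( $1 * 75 / 100 ))
}

# non_heap_for_budget BUDGET_MB: prints what is left for JVM non-heap memory
# (metaspace, thread stacks, code cache) plus the tshark/ndpi subprocesses.
non_heap_for_budget() {
  echo $(( $1 - $(heap_for_budget "$1") ))
}

# 4096 is an assumed example value, not a recommended size.
budget_mb="${APP_MEMORY_MB:-4096}"

echo "heap: $(heap_for_budget "$budget_mb") MB"
echo "left for JVM non-heap and native subprocesses: $(non_heap_for_budget "$budget_mb") MB"
```

With a 4096 MB budget the heap takes 3072 MB and 1024 MB remains for everything else. Without a container limit, that 1024 MB is only a nominal figure: the native subprocesses can grow past it into host memory.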

There are also no restart policies on any compose service, so a crashed container stays down.

Acceptance Criteria

  • Define CPU/memory requests and limits per service (compose deploy.resources and/or Kubernetes resources).
  • Re-balance the JVM heap fraction so the container limit accommodates JVM heap plus native tshark/ndpi headroom; validate against a large representative PCAP.
  • Add restart policies (restart: unless-stopped / k8s default) where appropriate.
  • Document the validated minimum and recommended sizing (see the sizing table in docs/production-readiness.md).
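One possible way to approach the re-balancing criterion is to derive the heap from the container limit by subtracting explicit reserves, rather than taking a fixed fraction. The sketch below assumes hypothetical variables CONTAINER_LIMIT_MB, NATIVE_RESERVE_MB and JVM_OVERHEAD_MB and a hypothetical 256 MB heap floor; none of these names or values come from the repository, and the real numbers would have to come from validation against a large representative PCAP.

```shell
#!/bin/sh
# Sketch under stated assumptions: variable names, defaults and the
# 256 MB floor are hypothetical and not taken from the project.

# compute_heap_mb LIMIT_MB NATIVE_RESERVE_MB JVM_OVERHEAD_MB
# Prints the heap size in MB, or fails if the limit cannot fit the reserves.
compute_heap_mb() {
  limit_mb=$1
  native_mb=$2      # headroom for tshark/ndpi subprocesses
  overhead_mb=$3    # JVM non-heap memory (metaspace, threads, code cache)
  heap_mb=$(( limit_mb - native_mb - overhead_mb ))
  if [ "$heap_mb" -lt 256 ]; then
    echo "container limit ${limit_mb} MB is too small for the configured reserves" >&2
    return 1
  fi
  echo "$heap_mb"
}

# Placeholder values; replace with figures validated against real captures.
CONTAINER_LIMIT_MB="${CONTAINER_LIMIT_MB:-4096}"
NATIVE_RESERVE_MB="${NATIVE_RESERVE_MB:-1024}"
JVM_OVERHEAD_MB="${JVM_OVERHEAD_MB:-512}"

if heap=$(compute_heap_mb "$CONTAINER_LIMIT_MB" "$NATIVE_RESERVE_MB" "$JVM_OVERHEAD_MB"); then
  echo "JAVA_OPTS=-Xms${heap}m -Xmx${heap}m"
fi
```

Sizing the heap this way keeps the native headroom constant as the container limit changes, whereas a fixed percentage shrinks or grows the headroom along with the heap. The container limit itself would still need to be enforced by the orchestrator, for example through compose deploy.resources or Kubernetes resources as listed above.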

Affected Files

Metadata

Assignees

No one assigned

Labels

devops: Deployment and operations
production: Production readiness
