|
| 1 | +--- |
| 2 | +title: Self-hosted on AWS |
| 3 | +sidebarTitle: AWS |
| 4 | +icon: "aws" |
| 5 | +--- |
| 6 | + |
| 7 | +When running LangSmith on [Amazon Web Services (AWS)](https://aws.amazon.com/), you can deploy in either [full self-hosted](/langsmith/self-hosted) or [hybrid](/langsmith/hybrid) mode. Full self-hosted mode deploys a complete LangSmith platform with observability functionality and the option to create agent deployments. Hybrid mode deploys only the infrastructure needed to run agents in a data plane within your cloud, while our SaaS provides the control plane and observability functionality. |
| 8 | + |
| 9 | +This page provides AWS-specific architecture patterns, service recommendations, and best practices for deploying and operating LangSmith on AWS. |
| 10 | + |
| 11 | +<Note> |
| 12 | +LangChain provides Terraform modules specifically for AWS to help provision infrastructure for LangSmith. These modules can quickly set up EKS clusters, RDS, ElastiCache, S3, and networking resources. |
| 13 | + |
| 14 | +View the [AWS Terraform modules](https://github.com/langchain-ai/terraform/tree/main/modules/aws) for documentation and examples. |
| 15 | +</Note> |
| 16 | + |
| 17 | +## Reference architecture |
| 18 | + |
| 19 | +We recommend using AWS managed services to provide a scalable, secure, and resilient platform. The following architecture applies to both full self-hosted and hybrid deployments and aligns with the [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/): |
| 20 | + |
| 21 | + |
| 22 | + |
| 23 | +- <Icon icon="globe" /> **Ingress & networking:** Requests enter via [Amazon Application Load Balancer (ALB)](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/) within your [VPC](https://aws.amazon.com/vpc/), secured using [AWS WAF](https://aws.amazon.com/waf/) and [IAM](https://aws.amazon.com/iam/)-based authentication. |
| 24 | +- <Icon icon="cube" /> **Frontend & backend services:** Containers run on [Amazon EKS](https://aws.amazon.com/eks/) behind the ALB, with requests routed to other services within the cluster as necessary. |
| 25 | +- <Icon icon="database" /> **Storage & databases:** |
| 26 | + - [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) or [Aurora](https://aws.amazon.com/rds/aurora/): metadata, projects, users, and short-term and long-term memory for deployed agents. LangSmith supports PostgreSQL version 14 or higher. |
| 27 | + - [Amazon ElastiCache (Redis)](https://aws.amazon.com/elasticache/redis/): caching and job queues. ElastiCache must run in single-instance mode with Redis OSS version 5 or higher. |
| 28 | + - ClickHouse + [Amazon EBS](https://aws.amazon.com/ebs/): analytics and trace storage. |
| 29 | + - We recommend using an [externally managed ClickHouse solution](/langsmith/self-host-external-clickhouse) unless security or compliance reasons |
| 30 | + prevent you from doing so. |
| 31 | + - ClickHouse is not required for hybrid deployments. |
| 32 | + - [Amazon S3](https://aws.amazon.com/s3/): object storage for trace artifacts and telemetry. |
| 33 | + |
| 34 | +- <Icon icon="sparkles" /> **LLM integration:** Optionally proxy requests to [Amazon Bedrock](https://aws.amazon.com/bedrock/) or [Amazon SageMaker](https://aws.amazon.com/sagemaker/) for LLM inference. |
| 35 | +- <Icon icon="chart-line" /> **Monitoring & observability:** Integrate with [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/). |
| 36 | + |
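The version and mode constraints listed for the storage layer can be checked before provisioning. The following is a minimal sketch, not part of LangSmith or the Terraform modules: `DatastorePlan` and `check_plan` are hypothetical names, and the sketch assumes that "single instance mode" corresponds to ElastiCache cluster mode being disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatastorePlan:
    """Hypothetical summary of the datastores planned for a deployment."""

    postgres_major: int       # PostgreSQL major version, e.g. 16
    redis_major: int          # Redis OSS major version, e.g. 7
    redis_cluster_mode: bool  # assumption: True means ElastiCache cluster mode is enabled


def check_plan(plan: DatastorePlan) -> list[str]:
    """Return a list of problems; an empty list means the plan meets the stated minimums."""
    problems = []
    if plan.postgres_major < 14:
        problems.append("PostgreSQL must be version 14 or higher")
    if plan.redis_major < 5:
        problems.append("Redis OSS must be version 5 or higher")
    if plan.redis_cluster_mode:
        problems.append("ElastiCache must run in single-instance mode")
    return problems


# Example: a plan that satisfies all three constraints.
ok_plan = DatastorePlan(postgres_major=16, redis_major=7, redis_cluster_mode=False)
print(check_plan(ok_plan))
```

A check like this could run in CI next to your Terraform plan so that an unsupported engine version is caught before any resources are created.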
| 37 | + |
| 38 | +## Compute options |
| 39 | + |
| 40 | +LangSmith supports multiple compute options depending on your requirements: |
| 41 | + |
| 42 | +| Compute option | Description | Suitable for | |
| 43 | +|-----------------|-------------|--------------| |
| 44 | +| **Elastic Kubernetes Service (preferred)** | Advanced scaling and multi-tenant support | Large enterprises | |
| 45 | +| **EC2-based** | Full control, BYO-infra | Regulated or air-gapped environments | |
| 46 | + |
| 47 | +## AWS Well-Architected best practices |
| 48 | + |
| 49 | +This reference architecture is designed to align with the six pillars of the AWS Well-Architected Framework: |
| 50 | + |
| 51 | +### Operational excellence |
| 52 | + |
| 53 | +- Automate deployments with IaC ([CloudFormation](https://aws.amazon.com/cloudformation/) / [Terraform](https://www.terraform.io/)). |
| 54 | +- Use [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html) for configuration. |
| 55 | +- Configure your LangSmith instance to [export telemetry data](/langsmith/export-backend) and continuously monitor via [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html). |
| 56 | +- The preferred method to manage [LangSmith deployments](/langsmith/deployments) is to create a CI process that builds [Agent Server](/langsmith/agent-server) images and pushes them to [ECR](https://aws.amazon.com/ecr/). Create a test deployment for pull requests before deploying a new revision to staging or production upon PR merge. |
| 57 | + |
| 58 | +### Security |
| 59 | + |
| 60 | +- Use [IAM](https://aws.amazon.com/iam/) roles with least-privilege policies. |
| 61 | +- Enable encryption at rest ([RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html), [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html), ClickHouse volumes) and in transit (TLS 1.2+). |
| 62 | +- Integrate with [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) for credentials. |
| 63 | +- Use [Amazon Cognito](https://aws.amazon.com/cognito/) as an identity provider (IdP) in conjunction with LangSmith's built-in authentication and authorization features to secure access to agents and their tools. |
| 64 | + |
| 65 | +### Reliability |
| 66 | + |
| 67 | +- Replicate the LangSmith [data plane](/langsmith/data-plane) across regions: deploy identical data planes to Kubernetes clusters in different regions for LangSmith deployments. Deploy [RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html) and [EKS](https://aws.amazon.com/eks/) services across multiple [Availability Zones (Multi-AZ)](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/). |
| 68 | +- Implement [auto-scaling](https://aws.amazon.com/autoscaling/) for backend workers. |
| 69 | +- Use [Amazon Route 53](https://aws.amazon.com/route53/) health checks and failover policies. |
| 70 | + |
| 71 | +### Performance efficiency |
| 72 | + |
| 73 | +- Choose [EC2](https://aws.amazon.com/ec2/) instance types that match your workload profile (for example, compute-optimized or memory-optimized instances). |
| 74 | +- Use [S3 Intelligent-Tiering](https://aws.amazon.com/s3/storage-classes/intelligent-tiering/) for infrequently accessed trace data. |
| 75 | + |
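As an illustration of the Intelligent-Tiering recommendation, the sketch below builds an S3 lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper name `intelligent_tiering_rule`, the `traces/` prefix, and the bucket name in the comment are assumptions for illustration only; LangSmith's actual object layout may differ.

```python
def intelligent_tiering_rule(prefix: str, after_days: int = 0) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the INTELLIGENT_TIERING storage class after `after_days` days."""
    if after_days < 0:
        raise ValueError("after_days must be zero or positive")
    return {
        "Rules": [
            {
                "ID": f"intelligent-tiering-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


# "traces/" is a hypothetical prefix, not a documented LangSmith path.
lifecycle = intelligent_tiering_rule("traces/")

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-langsmith-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle,
#   )
```

Keeping the rule as plain data makes it easy to review in a pull request or to translate into an equivalent Terraform resource.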
| 76 | +### Cost optimization |
| 77 | + |
| 78 | +- Right-size [EKS](https://aws.amazon.com/eks/) clusters and reduce compute costs with [Compute Savings Plans](https://aws.amazon.com/savingsplans/compute-pricing/). |
| 79 | +- Monitor cost KPIs using [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) dashboards. |
| 80 | + |
| 81 | +### Sustainability |
| 82 | + |
| 83 | +- Minimize idle workloads with on-demand compute. |
| 84 | +- Store telemetry in low-latency, low-cost tiers. |
| 85 | +- Enable auto-shutdown for non-prod environments. |
| 86 | + |
| 87 | +## Security and compliance |
| 88 | + |
| 89 | +LangSmith can be configured for: |
| 90 | + |
| 91 | +- [PrivateLink](https://aws.amazon.com/privatelink/)-only access (no public internet exposure, besides egress necessary for billing). |
| 92 | +- [KMS](https://aws.amazon.com/kms/)-based encryption keys for S3, RDS, and EBS. |
| 93 | +- Audit logging to [CloudWatch](https://aws.amazon.com/cloudwatch/) and [AWS CloudTrail](https://aws.amazon.com/cloudtrail/). |
| 94 | + |
| 95 | +Customers can deploy in [GovCloud](https://aws.amazon.com/govcloud-us/) or in regions that meet their ISO or HIPAA compliance requirements, as needed. |
| 96 | + |
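For the KMS-based encryption option listed above, default bucket encryption for S3 can be expressed as a small configuration document. This is a sketch under stated assumptions: `kms_default_encryption` is a hypothetical helper, the key ARN is a placeholder, and the resulting dictionary follows the shape accepted by boto3's `put_bucket_encryption`. RDS and EBS encryption are configured separately when those resources are created.

```python
def kms_default_encryption(kms_key_arn: str) -> dict:
    """Build an S3 default-encryption configuration that uses a customer-managed KMS key."""
    if not kms_key_arn.startswith("arn:"):
        raise ValueError("expected a KMS key ARN")
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number of requests made to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder ARN for illustration only.
encryption = kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"
)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="my-langsmith-bucket",  # hypothetical bucket name
#       ServerSideEncryptionConfiguration=encryption,
#   )
```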
| 97 | +## Monitoring and evals |
| 98 | + |
| 99 | +Use LangSmith to: |
| 100 | + |
| 101 | +- Capture traces from LLM apps running on [Bedrock](https://aws.amazon.com/bedrock/) or [SageMaker](https://aws.amazon.com/sagemaker/). |
| 102 | +- Evaluate model outputs via [LangSmith datasets](/langsmith/manage-datasets). |
| 103 | +- Track latency, token usage, and success rates. |
| 104 | + |
| 105 | +Integrate with: |
| 106 | + |
| 107 | +- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) dashboards. |
| 108 | +- [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/) exporters. |