AWS Deployment Guide
This guide walks through deploying SCP on AWS using CloudFormation.
Prerequisites
Before starting, ensure you have:
- AWS Account with permissions to create:
  - VPC, Subnets, Internet Gateway, NAT Gateway
  - EC2 instances, Security Groups
  - Application Load Balancer
  - SSM Parameter Store entries
  - S3 buckets
  - IAM roles and policies
  - CloudWatch resources
- Domain and SSL Certificate:
  - Domain name for SCP (e.g., scp.example.com)
  - ACM certificate for the domain (in the deployment region)
- EC2 Key Pair:
  - Create a key pair in your target region for SSH access
- AWS CLI configured with appropriate credentials
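A quick preflight check can catch a missing tool before the deployment starts. The sketch below is only an illustration: `preflight` is a hypothetical helper name, and the tool list in the example is inferred from the commands used later in this guide.

```shell
# Preflight sketch: report any required command-line tool that is missing.
# `preflight` is a hypothetical helper, not part of the SCP repository.
preflight() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (run manually):
#   preflight aws git python3 curl && aws sts get-caller-identity
```

`aws sts get-caller-identity` prints the account and identity the CLI is currently using, which is a simple way to confirm that credentials are configured.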
Quick Start
1. Clone the Repository
git clone https://github.com/tim-mccrimmon/supervisory-control-plane.git
cd supervisory-control-plane/deployment/aws
2. Prepare Parameters
Create a parameters file:
cat > params.json <<'EOF'
[
{"ParameterKey": "Environment", "ParameterValue": "production"},
{"ParameterKey": "InstanceType", "ParameterValue": "t3.large"},
{"ParameterKey": "KeyName", "ParameterValue": "your-key-pair-name"},
{"ParameterKey": "SCPVersion", "ParameterValue": "0.3.0"},
{"ParameterKey": "DomainName", "ParameterValue": "scp.example.com"},
{"ParameterKey": "CertificateArn", "ParameterValue": "arn:aws:acm:us-east-1:123456789012:certificate/xxx"},
{"ParameterKey": "TelemetryEndpoint", "ParameterValue": "https://telemetry.scp.example.com/v1/report"},
{"ParameterKey": "DBPassword", "ParameterValue": "your-secure-db-password-here"},
{"ParameterKey": "JWTSecret", "ParameterValue": "your-secure-jwt-secret-here-min-32-chars"}
]
EOF
Important: Use strong, unique values for both secrets. Generate them with:
# DB Password
python3 -c "import secrets; print(secrets.token_urlsafe(24))"
# JWT Secret
python3 -c "import secrets; print(secrets.token_hex(32))"
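Before deploying, it may be worth confirming that the placeholder secrets were actually replaced. The sketch below assumes the `params.json` layout shown above; `check_params` is a hypothetical helper, and the 32-character minimum is taken from the hint in the `JWTSecret` placeholder rather than from a documented requirement.

```shell
# Sketch: sanity-check params.json before deploying.
# `check_params` is a hypothetical helper, not part of the SCP repository.
check_params() {
  python3 - "$1" <<'PY'
import json
import sys

# Flatten the CloudFormation parameter list into a key -> value mapping.
with open(sys.argv[1]) as fh:
    params = {p["ParameterKey"]: p["ParameterValue"] for p in json.load(fh)}

problems = []
for key in ("DBPassword", "JWTSecret"):
    value = params.get(key, "")
    # Assumption: example placeholders all start with "your-".
    if not value or value.startswith("your-"):
        problems.append(f"{key} is missing or still a placeholder")
if len(params.get("JWTSecret", "")) < 32:
    problems.append("JWTSecret is shorter than 32 characters")

for problem in problems:
    print(problem, file=sys.stderr)
sys.exit(1 if problems else 0)
PY
}

# Example (run manually):
#   check_params params.json && chmod 600 params.json
```

Restricting the file to mode 600 is a small precaution, since it holds both secrets in plain text.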
3. Deploy the Stack
aws cloudformation create-stack \
--stack-name scp-production \
--template-body file://cloudformation/scp-stack.yaml \
--parameters file://params.json \
--capabilities CAPABILITY_NAMED_IAM \
--region us-east-1
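A failed stack creation has to roll back before it can be retried, so validating the template first can save time. The wrapper below is a sketch: `deploy_stack` is a hypothetical name, while the `aws cloudformation` subcommands and file paths are the ones used above.

```shell
# Sketch: validate the template, then create the stack only if validation passes.
# `deploy_stack` is a hypothetical wrapper, not part of the SCP repository.
deploy_stack() {
  stack_name=$1
  region=$2
  # validate-template checks template syntax only; it does not create anything.
  aws cloudformation validate-template \
    --template-body file://cloudformation/scp-stack.yaml \
    --region "$region" >/dev/null || return 1
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body file://cloudformation/scp-stack.yaml \
    --parameters file://params.json \
    --capabilities CAPABILITY_NAMED_IAM \
    --region "$region"
}

# Example (run manually from deployment/aws):
#   deploy_stack scp-production us-east-1
```

Validation only catches syntax and structure problems; permission or quota errors would still surface during stack creation.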
4. Wait for Completion
aws cloudformation wait stack-create-complete \
--stack-name scp-production \
--region us-east-1
# Get outputs
aws cloudformation describe-stacks \
--stack-name scp-production \
--query 'Stacks[0].Outputs' \
--region us-east-1
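To pull a single value out of the outputs instead of reading the full list, a small helper along these lines may help. `stack_output`, `SCP_STACK`, and `SCP_REGION` are hypothetical names; `InstanceID` is the output key used in the troubleshooting section of this guide, and other keys depend on the template.

```shell
# Sketch: print one stack output by key.
# `stack_output`, SCP_STACK and SCP_REGION are hypothetical names.
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "${SCP_STACK:-scp-production}" \
    --region "${SCP_REGION:-us-east-1}" \
    --query "Stacks[0].Outputs[?OutputKey=='$1'].OutputValue" \
    --output text
}

# Example (run manually):
#   stack_output InstanceID
```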
5. Configure DNS
Point your domain to the ALB:
- Get the ALB DNS name from the stack outputs
- Create a CNAME or Alias record in your DNS provider:
  - CNAME: scp.example.com → scp-production-alb-xxx.us-east-1.elb.amazonaws.com
  - Route53 Alias: Use the ALB's hosted zone ID from outputs
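For the Route53 option, the alias record can be created from the CLI with a change batch. The sketch below only generates the JSON; `alias_change_batch` is a hypothetical helper, and the ALB DNS name and hosted zone ID passed to it are whatever your stack outputs report.

```shell
# Sketch: build a Route53 change batch for an alias A record pointing at the ALB.
# `alias_change_batch` is a hypothetical helper, not part of the SCP repository.
# Arguments: <record name> <ALB DNS name> <ALB hosted zone ID>
alias_change_batch() {
  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "AliasTarget": {
        "DNSName": "$2",
        "HostedZoneId": "$3",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF
}

# Example (run manually; Z_YOUR_ZONE is the hosted zone of your own domain):
#   alias_change_batch scp.example.com "$ALB_DNS" "$ALB_ZONE_ID" > alias.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id Z_YOUR_ZONE --change-batch file://alias.json
```

Note that two different zone IDs are involved: `HostedZoneId` inside `AliasTarget` is the ALB's zone, while `--hosted-zone-id` is the zone that holds your domain.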
6. Verify Deployment
# Health check
curl https://scp.example.com/health
# Expected response:
# {"status": "healthy", "service": "api", "version": "0.3.0"}
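The first health check may fail while DNS propagates or the containers are still starting. A retry loop such as the following sketch can make the verification step less manual; `wait_healthy` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch: poll a health endpoint until it reports "healthy" or attempts run out.
# `wait_healthy` is a hypothetical helper. Arguments: <url> [attempts] [delay seconds]
wait_healthy() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; grep looks for the status field.
    if curl -fsS --max-time 5 "$url" 2>/dev/null |
      grep -Eq '"status":[[:space:]]*"healthy"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Example (run manually):
#   wait_healthy https://scp.example.com/health
```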
Architecture Overview
The CloudFormation template creates:
┌─────────────────────────────────────────────────────────────┐
│                             VPC                             │
│  ┌─────────────────┐     ┌─────────────────┐                │
│  │  Public Subnet  │     │  Public Subnet  │                │
│  │  (us-east-1a)   │     │  (us-east-1b)   │                │
│  │                 │     │                 │                │
│  │  ┌───────────┐  │     │                 │                │
│  │  │    ALB    │  │     │                 │                │
│  │  └─────┬─────┘  │     │                 │                │
│  │        │        │     │                 │                │
│  │  ┌─────┴─────┐  │     │                 │                │
│  │  │    NAT    │  │     │                 │                │
│  │  └───────────┘  │     │                 │                │
│  └─────────────────┘     └─────────────────┘                │
│                                                             │
│  ┌─────────────────┐     ┌─────────────────┐                │
│  │ Private Subnet  │     │ Private Subnet  │                │
│  │  (us-east-1a)   │     │  (us-east-1b)   │                │
│  │                 │     │                 │                │
│  │  ┌───────────┐  │     │                 │                │
│  │  │    EC2    │  │     │                 │                │
│  │  │   (SCP)   │  │     │                 │                │
│  │  └───────────┘  │     │                 │                │
│  └─────────────────┘     └─────────────────┘                │
└─────────────────────────────────────────────────────────────┘
Service Ports
| Service | Port | Protocol | Description |
|---|---|---|---|
| API Server | 8000 | HTTP | Bundle Registry, Agent Registry |
| Rules Engine | 8001 | HTTP | Business rules management |
| Context Service | 8002 | gRPC | Agent context streaming |
| Context Service | 8003 | HTTP | Webhook endpoint |
| Telemetry | 8004 | HTTP | Usage reporting |
Security Configuration
Security Groups
The template creates two security groups:
- ALB Security Group: Allows inbound traffic on ports 80 and 443 from anywhere
- EC2 Security Group: Allows inbound traffic from the ALB only, plus SSH from within the VPC
Secrets Management
Secrets are stored in SSM Parameter Store:
- /scp/production/db-password (SecureString)
- /scp/production/jwt-secret (SecureString)
- /scp/production/deployment-id (String)
The EC2 instance has an IAM role that can read these parameters.
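To confirm that a parameter exists and is readable, it can be fetched with the CLI from any identity that has access. This is a sketch: `read_scp_param` and `SCP_REGION` are hypothetical names, and the `/scp/production/` prefix is taken from the list above.

```shell
# Sketch: read one SCP parameter from SSM Parameter Store.
# `read_scp_param` and SCP_REGION are hypothetical names.
read_scp_param() {
  # --with-decryption is needed for SecureString values and is ignored for String.
  aws ssm get-parameter \
    --name "/scp/production/$1" \
    --with-decryption \
    --query 'Parameter.Value' \
    --output text \
    --region "${SCP_REGION:-us-east-1}"
}

# Example (run manually):
#   read_scp_param deployment-id
```

Decrypted secrets are printed to the terminal, so this is better suited to `deployment-id` than to the password or JWT secret.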
Encryption
- EBS volumes are encrypted at rest
- S3 buckets use AES-256 encryption
- The ALB terminates TLS using the ACM certificate
Monitoring
CloudWatch Alarms
The template creates two alarms:
- High CPU: Triggers when CPU > 80% for 5 minutes
- Health Check: Triggers when ALB health checks fail
Logs
Logs are sent to CloudWatch Logs:
- /scp/api - API Server logs
- /scp/rules - Rules Engine logs
- /scp/context - Context Service logs
- /scp/telemetry - Telemetry Service logs
- /scp/system - System logs
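These log groups can be followed from a workstation without connecting to the instance. The sketch below assumes AWS CLI v2, which provides `aws logs tail`; `tail_scp_log` and `SCP_REGION` are hypothetical names.

```shell
# Sketch: follow one of the /scp/* log groups (requires AWS CLI v2).
# `tail_scp_log` and SCP_REGION are hypothetical names.
# Arguments: <api|rules|context|telemetry|system> [how far back, e.g. 30m]
tail_scp_log() {
  aws logs tail "/scp/$1" \
    --follow \
    --since "${2:-1h}" \
    --region "${SCP_REGION:-us-east-1}"
}

# Example (run manually):
#   tail_scp_log api 30m
```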
Troubleshooting
SSH Access
# Get instance ID
INSTANCE_ID=$(aws cloudformation describe-stacks \
--stack-name scp-production \
--region us-east-1 \
--query 'Stacks[0].Outputs[?OutputKey==`InstanceID`].OutputValue' \
--output text)
# Connect via SSM (recommended - no SSH key needed)
aws ssm start-session --target "$INSTANCE_ID" --region us-east-1
# Or via bastion (requires SSH key)
ssh -i your-key.pem ubuntu@<bastion-ip>
ssh ubuntu@<private-ip>
Service Logs
# On the instance (each command follows one service's log until you press Ctrl-C)
cd /opt/scp
docker compose logs -f api
docker compose logs -f rules
docker compose logs -f context
docker compose logs -f telemetry
Health Checks
# Check each service
curl http://localhost:8000/health # API
curl http://localhost:8001/health # Rules
curl http://localhost:8003/health # Context
curl http://localhost:8004/health # Telemetry
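The four checks above can be combined into one pass that reports which service is failing. This is a sketch to run on the instance; `check_services` is a hypothetical helper, and the name-to-port pairs come from the Service Ports table.

```shell
# Sketch: check every HTTP health endpoint and summarise the result.
# `check_services` is a hypothetical helper, not part of the SCP repository.
check_services() {
  failed=0
  for entry in api:8000 rules:8001 context:8003 telemetry:8004; do
    name=${entry%%:*}   # text before the colon
    port=${entry##*:}   # text after the colon
    if curl -fsS --max-time 5 "http://localhost:$port/health" >/dev/null 2>&1; then
      echo "$name ($port): ok"
    else
      echo "$name ($port): FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Example (run manually on the instance):
#   check_services
```

The Context Service is checked on 8003 because 8002 is its gRPC port and does not serve the HTTP health endpoint.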
Common Issues
Services not starting:
# Check container status
docker compose ps
# Check logs for errors
docker compose logs --tail=100
Database connection issues:
# Verify PostgreSQL is healthy
docker compose exec postgres pg_isready -U postgres
ALB health checks failing:
1. Verify security groups allow ALB → EC2
2. Check that services are running on correct ports
3. Review service logs for errors
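When the ALB marks the instance unhealthy, the target group usually reports a reason. The sketch below queries it; `target_health` and `SCP_REGION` are hypothetical names, and the target group ARN has to be looked up first (for example with `aws elbv2 describe-target-groups`), since this guide does not say whether the stack exports it.

```shell
# Sketch: show target ID, health state, and failure reason for a target group.
# `target_health` and SCP_REGION are hypothetical names.
# Argument: <target group ARN>
target_health() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text \
    --region "${SCP_REGION:-us-east-1}"
}

# Example (run manually):
#   target_health "$TARGET_GROUP_ARN"
```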
Costs
Estimated monthly costs (us-east-1, production):
| Resource | Specification | Est. Cost |
|---|---|---|
| EC2 | t3.large (on-demand) | ~$60 |
| ALB | + data processing | ~$20 |
| NAT Gateway | + data processing | ~$35 |
| EBS | 100GB gp3 | ~$8 |
| S3 | Backups (~10GB) | ~$0.25 |
| CloudWatch | Logs + metrics | ~$5 |
| Total | | ~$130/month |
Tips to reduce costs:
- Use Reserved Instances for EC2 (up to 72% savings)
- Use Spot Instances for non-production
- Remove NAT Gateway if instances don't need outbound internet
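The total in the cost table is a rounded sum of the individual estimates. The short sketch below reproduces the arithmetic with the table's own figures, so it can be adjusted if you change a line item; `monthly_total` is a hypothetical helper name.

```shell
# Sketch: sum the estimated monthly costs from the table above (USD).
# `monthly_total` is a hypothetical helper name.
monthly_total() {
  awk '{ total += $2 } END { printf "%.2f\n", total }' <<'EOF'
EC2 60
ALB 20
NAT 35
EBS 8
S3 0.25
CloudWatch 5
EOF
}

monthly_total   # prints 128.25, which the table rounds to ~$130
```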
Next Steps
After deployment:
- Register your first agent: See control-plane/docs/pilot/ONBOARDING.md in the repository, or use the Getting Started guide
- Load bundles: Use the load_bundle.py script
- Set up backups: Verify daily backup cron is running
- Configure alerts: Add SNS topics for CloudWatch alarms