GitHub Template Repository for deploying LabLink infrastructure to AWS
Deploy your own LabLink infrastructure for cloud-based VM allocation and management. This template uses Terraform and GitHub Actions to automate deployment of the LabLink allocator service to AWS.
📖 Main Documentation: https://talmolab.github.io/lablink/
LabLink automates deployment and management of cloud-based VMs for running research software. It provides:
- Web interface for requesting and managing VMs
- Automatic VM provisioning with your software pre-installed
- GPU support for ML/AI workloads
- Chrome Remote Desktop access to VM GUI
- Flexible configuration for different research needs
Click the "Use this template" button at the top of this repository to create your own deployment repository.
The setup script creates AWS infrastructure and GitHub secrets:
```bash
./scripts/setup.sh
```
What the script does:
- Checks prerequisites (AWS CLI, GitHub CLI, credentials)
- Creates OIDC provider and IAM role for GitHub Actions
- Creates S3 bucket (with versioning) and DynamoDB table
- Creates Route53 hosted zone (if using custom domain)
- Sets GitHub secrets (`AWS_ROLE_ARN`, `AWS_REGION`, `ADMIN_PASSWORD`, `DB_PASSWORD`)
- Calls `configure.sh` to generate `lablink-infrastructure/config/config.yaml`
- Verifies all resources were created successfully
The script is idempotent — safe to re-run if interrupted.
Updating configuration later: To change settings like instance type, image tags, or DNS options without re-creating infrastructure, run the configuration wizard directly:
```bash
./scripts/configure.sh
```
This can be run as many times as needed. It reads your existing `config.yaml` values as defaults.
Important: The config file path (lablink-infrastructure/config/config.yaml) is hardcoded in the infrastructure. Do not move or rename this file.
See Configuration Reference for all options, or Manual Setup if you prefer to create resources individually.
Via GitHub Actions (Recommended):
- Go to Actions → "Deploy LabLink Infrastructure"
- Click "Run workflow"
- Select environment (`test`, `prod`, or `ci-test`)
- Click "Run workflow"
Via Local Terraform:
```bash
cd lablink-infrastructure
../scripts/init-terraform.sh test
terraform apply -var="resource_suffix=test"
```
After deployment completes:
- Allocator URL: Check workflow output or Terraform output for the URL/IP
- SSH Access: Download the PEM key from workflow artifacts
- Web Interface: Navigate to allocator URL in your browser
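To connect with the downloaded PEM key, a minimal sketch; the key filename and the `ubuntu` login user are assumptions, so substitute your own values:

```bash
# SSH refuses keys with world-readable permissions
chmod 400 lablink-key.pem

# ALLOCATOR_PUBLIC_IP is the address from the workflow/Terraform output
ssh -i lablink-key.pem ubuntu@ALLOCATOR_PUBLIC_IP
```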
- AWS Account with permissions to create:
  - EC2 instances
  - Security Groups
  - Elastic IPs
  - (Optional) Route 53 records for DNS
- GitHub Account with ability to:
  - Create repositories from templates
  - Configure GitHub Actions secrets
  - Run GitHub Actions workflows
- Basic Knowledge of:
  - Terraform (helpful but not required)
  - AWS services
Before deploying, you must set up:
- S3 Bucket for Terraform state storage
- IAM Role for GitHub Actions OIDC authentication
- (Optional) Elastic IP for persistent allocator address
- (Optional) Route 53 Hosted Zone for custom domain
See AWS Setup Guide below for detailed instructions.
Create an IAM role with OIDC provider for GitHub Actions:
1. Create the OIDC provider in IAM (if it does not already exist); see the CLI sketch after this list:
   - Provider URL: `https://token.actions.githubusercontent.com`
   - Audience: `sts.amazonaws.com`
2. Create an IAM role with this trust policy:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringLike": {
             "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/YOUR_REPO:*"
           }
         }
       }
     ]
   }
   ```
3. Attach permissions: `PowerUserAccess` (or a custom policy with EC2, VPC, S3, Route 53, and IAM permissions)
4. Copy the Role ARN and add it to GitHub secrets
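Steps 1 and 4 can be scripted; a sketch, assuming the AWS CLI and GitHub CLI are both authenticated. The thumbprint below is the commonly published GitHub OIDC value, and the ARN and repository slug are placeholders; verify against current AWS guidance:

```bash
# Step 1: create the OIDC provider (skip if it already exists)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# Step 4: store the role ARN as a GitHub Actions secret
gh secret set AWS_ROLE_ARN \
  --repo YOUR_ORG/YOUR_REPO \
  --body "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_ROLE_NAME"
```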
The AWS region where your infrastructure will be deployed. Must match the region in your config.yaml.
Common regions:
- `us-west-2` (Oregon)
- `us-east-1` (N. Virginia)
- `eu-west-1` (Ireland)
Important: AMI IDs are region-specific. If you change regions, update the `ami_id` in `config.yaml`.
Password for accessing the allocator web interface. Choose a strong password (12+ characters, mixed case, numbers, symbols).
This password is used to log in to the admin dashboard where you can:
- Create and destroy client VMs
- View VM status
- Assign VMs to users
Password for the PostgreSQL database used by the allocator service. Choose a strong password that is different from `ADMIN_PASSWORD`.
This is stored securely and injected into the configuration at deployment time.
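One way to generate and store strong values without ever echoing them to the terminal; a sketch using OpenSSL and the GitHub CLI, with a placeholder repository slug:

```bash
# 24 random bytes base64-encode to a 32-character mixed-character string
openssl rand -base64 24 | gh secret set DB_PASSWORD --repo YOUR_ORG/YOUR_REPO
openssl rand -base64 24 | gh secret set ADMIN_PASSWORD --repo YOUR_ORG/YOUR_REPO
```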
The setup script creates all infrastructure and secrets in one go:
```bash
./scripts/setup.sh
```
This creates all required AWS resources (OIDC provider, IAM role, S3 bucket, DynamoDB table, Route53 hosted zone), sets GitHub secrets, and calls `configure.sh` to generate `config.yaml`. It is idempotent and safe to re-run.
To update configuration later (instance types, image tags, DNS/SSL options, etc.), run the config wizard directly:
```bash
./scripts/configure.sh
```
What the script does NOT do:
- Does NOT register domain names (register your domain via the Route 53 registrar, CloudFlare, or another registrar)
- Does NOT create DNS records (Terraform handles these, or you create manually)
After setup, your DNS/SSL approach is configured based on your wizard choices:
- Route53 + Let's Encrypt: Register domain, update nameservers to Route53
- CloudFlare DNS + SSL: Manage domain/DNS in CloudFlare, create A record pointing to allocator IP
- IP-only (no DNS/SSL): Access via IP address directly
If you prefer to create resources manually:
```bash
# Create bucket (must be globally unique across ALL of AWS)
aws s3 mb s3://tf-state-YOUR-ORG-lablink --region us-west-2

# Enable versioning (recommended)
aws s3api put-bucket-versioning \
  --bucket tf-state-YOUR-ORG-lablink \
  --versioning-configuration Status=Enabled
```
Update `bucket_name` in `lablink-infrastructure/config/config.yaml` to match.
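To confirm versioning actually took effect, a quick check:

```bash
# Expected output includes: "Status": "Enabled"
aws s3api get-bucket-versioning --bucket tf-state-YOUR-ORG-lablink
```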
```bash
aws dynamodb create-table \
  --table-name lock-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-west-2
```
For a persistent allocator IP address across deployments:
```bash
# Allocate EIP
aws ec2 allocate-address --domain vpc --region us-west-2

# Tag it for reuse
aws ec2 create-tags \
  --resources eipalloc-XXXXXXXX \
  --tags Key=Name,Value=lablink-eip
```
Update `eip.tag_name` in `config.yaml` if using a different tag name.
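To verify the tagged EIP is discoverable by a tag lookup (filtering on the `Name` tag here is an assumption about how the `persistent` strategy finds it):

```bash
# Should return the allocation with its AllocationId and PublicIp
aws ec2 describe-addresses \
  --filters "Name=tag:Name,Values=lablink-eip" \
  --region us-west-2
```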
If using a custom domain:
1. Create or use an existing hosted zone:

   ```bash
   aws route53 create-hosted-zone --name your-domain.com --caller-reference $(date +%s)
   ```

2. Update your domain's nameservers to point to the Route 53 NS records

3. Update the `dns` section in `config.yaml`:

   ```yaml
   dns:
     enabled: true
     domain: "your-domain.com"
     zone_id: "Z..."  # Optional - will auto-lookup if empty
   ```
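To confirm delegation before deploying, compare the zone's assigned nameservers with what public DNS returns; a sketch:

```bash
# Find the hosted zone ID for the domain
aws route53 list-hosted-zones-by-name --dns-name your-domain.com \
  --query 'HostedZones[0].Id' --output text

# List the zone's assigned nameservers (substitute the ID from above)
aws route53 get-hosted-zone --id ZONE_ID \
  --query 'DelegationSet.NameServers'

# Compare with what public DNS actually returns
nslookup -type=NS your-domain.com
```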
See GitHub Secrets Setup above for detailed IAM role configuration.
All configuration is in lablink-infrastructure/config/config.yaml.
```yaml
db:
  dbname: "lablink_db"
  user: "lablink"
  password: "PLACEHOLDER_DB_PASSWORD"  # Injected from GitHub secret
  host: "localhost"
  port: 5432
```

```yaml
machine:
  machine_type: "g4dn.xlarge"  # AWS instance type
  image: "ghcr.io/talmolab/lablink-client-base-image:latest"  # Docker image
  ami_id: "ami-0601752c11b394251"  # Region-specific AMI
  repository: "https://github.com/YOUR_ORG/YOUR_REPO.git"  # Your code/data repo
  software: "your-software"  # Software identifier
  extension: "ext"  # Data file extension
```
Instance Types:
- `g4dn.xlarge` - GPU instance (NVIDIA T4, good for ML)
- `t3.large` - CPU-only, cheaper
- `p3.2xlarge` - More powerful GPU (NVIDIA V100)
AMI IDs (Ubuntu 24.04 with Docker + Nvidia):
- `us-west-2`: `ami-0601752c11b394251`
- Other regions: Use the AWS Console to find a similar AMI or create a custom one
```yaml
app:
  admin_user: "admin"
  admin_password: "PLACEHOLDER_ADMIN_PASSWORD"  # Injected from secret
  region: "us-west-2"  # Must match AWS_REGION secret
```

```yaml
dns:
  enabled: false  # true to use DNS, false for IP-only
  terraform_managed: false  # true = Terraform creates records
  domain: "lablink.example.com"  # Full domain name (e.g., test.lablink.example.com)
  zone_id: ""  # Leave empty for auto-lookup
```
Domain Naming:
- Specify the full domain directly (e.g., `lablink.example.com` or `test.lablink.example.com`)
- No automatic subdomain construction - use exactly what you specify
```yaml
ssl:
  provider: "none"  # "letsencrypt", "cloudflare", "acm", or "none"
  email: "admin@example.com"  # For Let's Encrypt notifications
  certificate_arn: ""  # Required when provider="acm"
```
SSL Providers:
- `none`: HTTP only (for testing)
- `letsencrypt`: Automatic SSL with Caddy (production certs)
- `cloudflare`: Use CloudFlare proxy for SSL
- `acm`: AWS Certificate Manager via Application Load Balancer
ssl.provider: "letsencrypt"), be aware of rate limits:
| Limit Type | Limit | Lockout Period |
|---|---|---|
| Certificates per exact domain | 5 per week | 7 days |
| Certificates per registered domain | 50 per week | 7 days |
What this means:
- You can only deploy the same domain (e.g., `test.lablink.example.com`) 5 times in 7 days
- If you hit the limit, you must wait 7 days before deploying that domain again
- No override available for the per-domain limit
Testing Strategies to Avoid Rate Limits:
| Strategy | DNS | SSL | Use Case | Rate Limit Risk |
|---|---|---|---|---|
| IP-only | Disabled | None | Development/debugging | ✅ None |
| CloudFlare | Enabled | CloudFlare | Frequent testing | ✅ None |
| Subdomain rotation | Enabled | Let's Encrypt | SSL testing | ⚠️ Low (spreads issuance across subdomains) |
| Production | Enabled | Let's Encrypt | Stable deployment | ⚠️ Counts toward the 5-per-week limit |
📖 See Testing Best Practices for detailed testing strategies and monitoring certificate usage.
```yaml
eip:
  strategy: "persistent"  # "persistent" or "dynamic"
  tag_name: "lablink-eip"  # Tag to find reusable EIP
```
Deploys or updates your LabLink infrastructure.
Triggers:
- Manual: Actions → "Deploy LabLink Infrastructure" → Run workflow
- Automatic: Push to `test` branch
Inputs:
- `environment`: `test` or `prod`
What it does:
- Configures AWS credentials via OIDC
- Injects passwords from GitHub secrets into config
- Runs Terraform to create/update infrastructure
- Verifies deployment and DNS
- Uploads SSH key as artifact
Triggers:
- Manual only: Actions → "Destroy LabLink Infrastructure" → Run workflow
Inputs:
- `confirm_destroy`: Must type "yes" to confirm
- `environment`: `test` or `prod`
What it does:
- Creates a minimal terraform backend configuration
- Initializes Terraform with S3 backend to access client VM state
- Destroys client VMs directly from the S3 state (for test/prod/ci-test)
- Destroys the allocator infrastructure (EC2, security groups, EIP, etc.)
Note: Client VM state is stored in S3 (same bucket as the infrastructure state). Terraform can destroy resources from the state file alone - only a minimal backend configuration is needed, no resource definitions.
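As a rough sketch of what the workflow does internally (the backend file name is an assumption based on this repository's layout, and the exact backend keys may differ): an empty Terraform configuration plus the S3 backend is enough for `terraform destroy` to plan deletion of everything tracked in the state.

```bash
mkdir destroy-tmp && cd destroy-tmp

# Minimal configuration: backend stanza only, no resources
cat > main.tf <<'EOF'
terraform {
  backend "s3" {}
}
EOF

terraform init -backend-config=../lablink-infrastructure/backend-test.hcl
terraform destroy   # plans deletion of every resource in the state
```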
If the destroy workflow fails or leaves orphaned resources, see the Manual Cleanup Guide for step-by-step procedures to:
- Remove orphaned IAM roles, policies, and instance profiles
- Clean up leftover EC2 instances, security groups, and key pairs
- Fix Terraform state file issues (checksum mismatches, corrupted state)
- Verify complete resource removal
Common scenarios covered:
- Destroy workflow failures
- "Resource in use" errors
- Orphaned client VMs
- State lock issues
1. Update `config.yaml`:

   ```yaml
   machine:
     repository: "https://github.com/your-org/your-software-data.git"
     software: "your-software-name"
     extension: "your-file-ext"  # e.g., "h5", "npy", "csv"
   ```
2. (Optional) Use a custom Docker image:

   ```yaml
   machine:
     image: "ghcr.io/your-org/your-custom-image:latest"
   ```
1. Update `config.yaml`:

   ```yaml
   app:
     region: "eu-west-1"  # Your region

   machine:
     ami_id: "ami-XXXXXXX"  # Region-specific AMI
   ```
2. Update GitHub secret `AWS_REGION`

3. Find an appropriate AMI for the region (Ubuntu 24.04 with Docker)
```yaml
machine:
  machine_type: "t3.xlarge"  # No GPU, cheaper
  # or
  machine_type: "p3.2xlarge"  # More powerful GPU
```
See AWS EC2 Instance Types for options.
The client VMs can be configured with a custom startup script. See the LabLink Infrastructure README for more details.
Cause: Destroy workflow failed or Terraform state is out of sync with AWS resources
Solution: Use the automated cleanup script:
```bash
# Dry-run to see what would be deleted
./scripts/cleanup-orphaned-resources.sh <environment> --dry-run

# Actual cleanup
./scripts/cleanup-orphaned-resources.sh <environment>
```
The script automatically reads configuration from `config.yaml`, backs up Terraform state files, and deletes resources in the correct dependency order. For detailed manual cleanup procedures, see MANUAL_CLEANUP_GUIDE.md.
Cause: AMI ID doesn't exist in your region
Solution: Update `ami_id` in `config.yaml` with a region-appropriate AMI
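To locate a base Ubuntu 24.04 image in another region, a sketch using Canonical's published owner ID (099720109477); the name pattern is an assumption, and the template's AMIs also bundle Docker and NVIDIA drivers, so a stock Ubuntu image may need additional setup:

```bash
# Print the newest matching Ubuntu 24.04 AMI ID and name in eu-west-1
aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*" \
  --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
  --output text \
  --region eu-west-1
```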
Cause: Security group or DNS not configured
Solution:
- Check security group allows inbound traffic on port 5000
- If using DNS, verify DNS records propagated
- Try accessing via public IP first
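A quick way to narrow down where the failure is (the public IP and security group ID are placeholders):

```bash
# Does the allocator answer on port 5000?
curl -v http://ALLOCATOR_PUBLIC_IP:5000/

# Does the security group allow inbound traffic on 5000?
aws ec2 describe-security-groups --group-ids sg-XXXXXXXX \
  --query 'SecurityGroups[0].IpPermissions'
```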
Cause: Previous deployment didn't complete or cleanup
Solution:
```bash
# In lablink-infrastructure/
terraform force-unlock LOCK_ID
```
Cause: DNS propagation delay or Route 53 not configured
Solution:
- Wait 5-10 minutes for propagation
- Verify Route 53 hosted zone exists
- Check nameservers match at domain registrar
- Use `nslookup your-domain.com` to test
- Main Documentation: https://talmolab.github.io/lablink/
- Infrastructure Docs: lablink-infrastructure/README.md
- GitHub Issues: https://github.com/talmolab/lablink/issues
- Deployment Checklist: DEPLOYMENT_CHECKLIST.md
```
lablink-template/
├── .github/workflows/           # GitHub Actions workflows
│   ├── terraform-deploy.yml     # Deploy infrastructure
│   └── terraform-destroy.yml    # Destroy infrastructure (includes client VMs)
├── lablink-infrastructure/      # Terraform infrastructure
│   ├── config/
│   │   ├── config.yaml          # Main configuration
│   │   └── *.example.yaml       # Configuration examples
│   ├── main.tf                  # Core Terraform config
│   ├── backend-*.hcl            # Environment-specific backends
│   ├── user_data.sh             # EC2 initialization script
│   └── README.md                # Infrastructure documentation
├── scripts/                     # Helper scripts
│   ├── init-terraform.sh        # Terraform init helper
│   └── verify-deployment.sh     # Deployment verification
├── MANUAL_CLEANUP_GUIDE.md      # Manual cleanup procedures
├── README.md                    # This file
├── DEPLOYMENT_CHECKLIST.md      # Pre-deployment checklist
└── LICENSE
```
Found an issue with the template or want to suggest improvements?
- Open an issue: https://github.com/talmolab/lablink-template/issues
- For LabLink core issues: https://github.com/talmolab/lablink/issues
BSD 2-Clause License - see LICENSE file for details.
- Main LabLink Repository: https://github.com/talmolab/lablink
- Documentation: https://talmolab.github.io/lablink/
- Template Repository: https://github.com/talmolab/lablink-template
- Example Deployment: https://github.com/talmolab/sleap-lablink (SLEAP-specific configuration)
Need Help? Check the Deployment Checklist or Troubleshooting section above.