Required to run this demo:

- Terraform > 0.12.0
- Nomad >= 1.1.0
- aws-iam-authenticator >= 0.5.0
- Docker
- Java
Diagram: <./diagram>

- Nomad: external Nomad server
- EC2: Nomad client
The demo makes some assumptions because of its quick, local nature:

- AWS access credentials will be available via environment variables or the default profile. These are needed by Terraform and Nomad. Please see the AWS documentation for more details on setting this up.
- The ECS cluster is running within `us-east-1`. If you need to change the AWS region, please update the `./nomad/{client-1,client-2,server}.hcl` files (see the sketch after this list).
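For orientation, the AWS region the driver talks to is configured where the ECS driver plugin is declared in the Nomad agent configuration files. The block below is a hedged sketch only; the attribute names (`enabled`, `cluster`, `region`) are assumptions about the demo's driver configuration, and the actual files in `./nomad/` may differ:

```hcl
# Hypothetical excerpt from a client config such as ./nomad/client-1.hcl:
# registers the remote ECS task driver plugin and points it at the demo
# cluster in us-east-1.
plugin "nomad-driver-ecs" {
  config {
    enabled = true
    cluster = "nomad-remote-driver-demo"
    region  = "us-east-1"
  }
}
```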
When running this demo, a small number of AWS resources will be created. The majority of these do not incur direct costs; however, the ECS task does. For more information regarding this, please visit the Fargate pricing document. Also be aware of AWS data transfer costs.
- Change directory into the Terraform directory:

  ```shell
  $ cd ./terraform
  ```

- Modify the Terraform variables file with any custom configuration. The file is located at `./terraform/variables.tf`.

- Perform the Terraform initialisation:

  ```shell
  $ terraform init
  ```

- Apply the Terraform plan to build out the AWS resources:

  ```shell
  $ terraform apply -auto-approve
  ```
- The Terraform output will contain `demo_subnet_id` and `demo_security_group_id` values; note these for later use (a hedged sketch of how these outputs might be declared follows this list).
- Start the Nomad server and clients. Ideally, run each command in a separate terminal so the logs are easy to follow:

  ```shell
  $ cd ../nomad
  $ nomad agent -config=server.hcl
  $ nomad agent -config=client-1.hcl -plugin-dir=$(pwd)/plugins
  $ nomad agent -config=client-2.hcl -plugin-dir=$(pwd)/plugins
  ```
- Check the ECS driver status on a client node:

  ```shell
  $ nomad node status                                    # To see Node IDs
  $ nomad node status <node-id> | grep "Driver Status"
  ```
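The `demo_subnet_id` and `demo_security_group_id` values referenced above come from output blocks in the demo's Terraform code, roughly along the lines of the sketch below. The resource addresses (`aws_subnet.demo`, `aws_security_group.demo`) are hypothetical and will differ from the actual configuration:

```hcl
# Hypothetical sketch of the two outputs consumed in the next section.
# If you lose the apply output, they can be re-read later with
# `terraform output demo_subnet_id` and `terraform output demo_security_group_id`.
output "demo_subnet_id" {
  value = aws_subnet.demo.id
}

output "demo_security_group_id" {
  value = aws_security_group.demo.id
}
```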
The following steps demonstrate how Nomad and the remote driver handle a number of situations that operators will likely come across during day-to-day cluster management; notably, how Nomad attempts to minimise the impact on task availability even when its own availability is degraded.
- Using the Terraform output from before, update the `nomad/demo-ecs.nomad` file to reflect these details (a hedged sketch of the full job file follows this list). In particular, these two parameters need updating:

  ```hcl
  security_groups = ["sg-0d647d4c7ce15034f"]
  subnets         = ["subnet-010b03f1a021887ff"]
  ```

- Submit the remote task driver job to the cluster:

  ```shell
  $ nomad run demo-ecs.nomad
  ```

- Check the allocation status and the logs to show that the client is remotely monitoring the task:

  ```shell
  $ nomad status nomad-ecs-demo
  $ nomad logs -f <alloc-id>
  ```

- Navigate to the AWS ECS console and check the running tasks on the cluster. The URL will look like https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters/nomad-remote-driver-demo/tasks, but be sure to change the region if needed.

- Drain the node from which the remote task is currently being monitored. This will cause Nomad to create a new allocation, but will not impact the remote task:

  ```shell
  $ nomad node drain -enable <node-id>
  ```

- Here you can again check the logs of the new allocation and the AWS console for the status of the ECS task. You should notice the remote task remains running, and the new allocation's logs attach to and monitor the same task as the previous allocation.
- Remove the drain status from the previously drained node so that it is available for scheduling again:

  ```shell
  $ nomad node drain -disable <node-id>
  ```

- Kill the Nomad client that is currently running the allocation to simulate a lost node situation. This can be done either by control-c of the process or by using kill -9.

- Check the logs of the new allocation and the AWS console for the status of the ECS task. You should notice the remote task remains running, and the new allocation's logs attach to and monitor the same task as the previous allocation.
- Now update the ECS task definition. This process has been wrapped via Terraform using variables:

  ```shell
  $ cd ../terraform
  $ terraform apply -var 'ecs_task_definition_file=./files/updated-demo.json' -auto-approve
  ```

- Update the job specification in order to deploy the new, updated task definition:

  ```shell
  $ cd ../nomad
  $ sed -ie "s/nomad-remote-driver-demo:1/nomad-remote-driver-demo:2/g" demo-ecs.nomad
  ```

- Register the updated job on the Nomad cluster:

  ```shell
  $ nomad run demo-ecs.nomad
  ```

- Observing the AWS console, there will be a new task provisioning. Filtering by the `stopped` status shows the previous task in a `stopping` state; Nomad has successfully deployed the new version of the task.

- Stop the Nomad job and observe the task stopping within AWS:

  ```shell
  $ nomad stop nomad-ecs-demo
  ```
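For reference, the `nomad/demo-ecs.nomad` job used throughout these steps runs the remote ECS task through the `ecs` task driver. The sketch below is an assumption about its shape rather than a copy of the demo file: the job, group, and task names and the exact nesting of the `config` block may differ, but the `task_definition`, `security_groups`, and `subnets` values are the ones the steps above ask you to update, and the `sed` step bumps the task definition revision from `:1` to `:2`:

```hcl
# Hedged sketch of a remote-task job using the ECS driver; names and nesting
# are assumptions, while the highlighted values correspond to the steps above.
job "nomad-ecs-demo" {
  datacenters = ["dc1"]

  group "ecs-remote-task-demo" {
    task "http-server" {
      driver = "ecs"

      config {
        task {
          launch_type     = "FARGATE"
          task_definition = "nomad-remote-driver-demo:1"      # the sed step updates this to :2

          network_configuration {
            aws_vpc_configuration {
              assign_public_ip = true
              security_groups  = ["sg-0d647d4c7ce15034f"]     # from demo_security_group_id
              subnets          = ["subnet-010b03f1a021887ff"] # from demo_subnet_id
            }
          }
        }
      }
    }
  }
}
```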
- Stop the Nomad client and server processes, either by control-c or by killing the process IDs.

- Destroy the created AWS resources, performing a plan and checking that the destroy is targeting the expected resources:

  ```shell
  $ cd ../terraform
  $ terraform destroy -auto-approve
  ```
