Ops 3: Infrastructure Automation
Obsidian Dynamics acquired a three-person startup last quarter. Their “CTO” (who is also their intern) was given read access to the AWS account as part of onboarding. He deleted your Minecraft EC2 instance on his second day, thinking it was a test environment. It was not a test environment.
Rebuilding from your notes took most of a weekend. Leadership’s takeaway was not “let’s improve the onboarding process” but rather “we need an audit trail.” Every layer of the stack must now be automated: Terraform or OpenTofu declares infrastructure, Ansible configures the host, and a CI/CD pipeline tests and publishes container images. Nothing is hand-configured. Everything is rebuildable.
Learning Objectives
- Model cloud infrastructure with Terraform or OpenTofu.
- Automate server configuration with Ansible so that it is idempotent and repeatable.
- Automate image publishing with a CI/CD pipeline.
- Design for rebuildability and auditability.
Constraints (AWS Academy)
- Use Terraform or OpenTofu for provisioning.
- Ansible is required for post-provision server configuration.
- CI/CD pipeline must use GitHub Actions.
- EC2 remains the compute target.
- Use the pre-existing `LabInstanceProfile` to grant the EC2 instance AWS permissions; in AWS Academy this profile contains `LabRole`. Do not place AWS access keys on the host.
- ECR remains the container registry for images.
- You must document how your state is handled.
Requirements
A. Provisioned Infrastructure (Terraform or OpenTofu)
Your Terraform or OpenTofu code must create (or explicitly and cleanly reference) the following:
- Networking placement for the instance and public entry point (VPC/subnet strategy must be stated; a direct public IP is acceptable, and a private subnet with a documented TCP-capable load balancer is also acceptable).
- Security Group rules: SSH for admin access and TCP 25565 for Minecraft clients at the appropriate public or internal boundary.
- EC2 instance configuration (AMI choice, instance type, storage choice).
- The pre-existing `LabInstanceProfile` attached to the EC2 instance, enabling it to pull from ECR and write world backups to S3 without credentials on disk.
B. Configuration Management (Ansible)
An Ansible playbook configures the EC2 instance after Terraform provisions it.
- The playbook must:
  - Install runtime prerequisites (Docker, AWS CLI).
  - Authenticate to ECR using the instance profile credentials exposed on the host; in AWS Academy this comes from `LabInstanceProfile`/`LabRole`.
  - Pull the pinned Minecraft image version from ECR.
  - Mount the `/data` volume for persistent world data.
  - Restore world data from S3 into `/data` before starting the container when performing a rebuild or recovery.
  - Set required environment variables including `EULA=TRUE`.
  - Start the Minecraft server container.
- Cloud-init/user-data may handle initial bootstrap (e.g., installing Ansible, cloning the repo), but all server configuration must live in the Ansible playbook.
- The playbook must be idempotent: re-running it against the same host produces the same result without duplication or errors.
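One way to check the idempotency requirement is to run the playbook twice and inspect the `PLAY RECAP` of the second run. The sketch below is a hypothetical helper, not part of the assignment: it reads playbook output on stdin and succeeds only if every host line reports `changed=0`, `unreachable=0`, and `failed=0`. It assumes plain, uncolored output, which is what you normally get when the output is piped to a file.

```shell
#!/bin/sh
# Return 0 only when every PLAY RECAP host line read from stdin reports
# changed=0, unreachable=0, and failed=0. Returns 1 if no recap line is found.
recap_is_clean() {
  awk '
    / ok=[0-9]+/ && / changed=[0-9]+/ {
      seen = 1
      if ($0 !~ / changed=0( |$)/ ||
          $0 !~ / unreachable=0( |$)/ ||
          $0 !~ / failed=0( |$)/) {
        bad = 1
      }
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Example (not run here); inventory.ini and site.yml are hypothetical names.
# ansible-playbook -i inventory.ini site.yml > second-run.log
# recap_is_clean < second-run.log && echo "second run made no changes"
```

A task that reports `changed` on every run (for example, a raw shell command without a guard) will make this check fail, which is usually the first place to look.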
C. Image Publishing Pipeline (CI/CD)
A GitHub Actions workflow automates image publishing to ECR.
- The workflow triggers when a git tag is pushed.
- The workflow must acquire the deployable image and publish it to ECR. Re-tagging a pinned upstream image is acceptable; building from a Dockerfile is also acceptable if you completed that extra credit in Assignment 2.
- The workflow must include a smoke test: run the image briefly and verify the Minecraft server initializes without errors.
- At least one successful pipeline run must be evidenced (link to the Actions tab or screenshot).
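The smoke test can be sketched as a polling helper that a workflow step calls after starting the container. This is a minimal sketch under assumptions: the function name `wait_for_pattern` and the container name `mc-smoke` are hypothetical, and the match is a plain substring check on the log output. The command to poll is passed as arguments so the helper can be exercised without Docker.

```shell
#!/bin/sh
# Poll a command's output until it contains PATTERN or attempts run out.
# Usage: wait_for_pattern PATTERN ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the pattern is seen, 1 if it never appears.
wait_for_pattern() {
  wfp_pattern=$1
  wfp_attempts=$2
  wfp_sleep=$3
  shift 3
  wfp_i=1
  while [ "$wfp_i" -le "$wfp_attempts" ]; do
    # Capture stdout and stderr; ignore the command's own exit status.
    wfp_out=$("$@" 2>&1) || true
    case $wfp_out in
      *"$wfp_pattern"*) return 0 ;;
    esac
    sleep "$wfp_sleep"
    wfp_i=$((wfp_i + 1))
  done
  return 1
}

# Example (not run here): up to 12 checks, 10 seconds apart.
# wait_for_pattern 'Done' 12 10 docker logs mc-smoke
```

Because a non-zero exit status fails a GitHub Actions `run` step, returning 1 on timeout is enough to stop the workflow before the push step.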
D. Rebuild Proof
- Demonstrate that `terraform destroy` followed by `terraform apply` plus running the Ansible playbook produces a joinable Minecraft server.
- Document what happens to world data in your rebuild strategy: the playbook should restore the world from S3 before starting the server so that players return to the same world after a rebuild.
- Your rebuild evidence must make it clear whether the same world was restored or a fresh world was created.
- A full end-to-end restore does not need to be shown in video if time-constrained, but the strategy must be documented.
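The restore decision can be sketched as a guard: restore from S3 only when the data directory is missing or empty, so that re-running the automation against a live host does not overwrite the current world. This is an illustration of the logic, not a required implementation. The function names, the `RESTORE_CMD` override, and the bucket path are hypothetical, and the same condition could be expressed directly as an Ansible task conditional.

```shell
#!/bin/sh
# Return 0 when DIR is missing or has no entries (a restore is needed).
data_dir_is_empty() {
  [ ! -d "$1" ] && return 0
  [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Restore world data into DIR from SOURCE only when DIR is empty.
# RESTORE_CMD (hypothetical) lets a dry run swap in a stub for the AWS CLI;
# it defaults to `aws s3 sync`, which copies SOURCE into DIR.
# Usage: restore_world_if_needed DIR SOURCE
restore_world_if_needed() {
  rw_dir=$1
  rw_source=$2
  if data_dir_is_empty "$rw_dir"; then
    mkdir -p "$rw_dir"
    ${RESTORE_CMD:-aws s3 sync} "$rw_source" "$rw_dir"
  else
    echo "skip: $rw_dir already has data"
  fi
}

# Example (not run here); the bucket name is a placeholder.
# restore_world_if_needed /data s3://example-bucket/world
```

Logging which branch was taken also gives you the evidence the rebuild proof asks for: whether the same world was restored or the server started fresh.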
E. Documentation
Your documentation must include:
- An architecture diagram showing AWS resources, Ansible configuration flow, and the CI/CD pipeline.
- Terraform inputs/variables and what they control.
- A change process: how a teammate would propose and review infrastructure changes.
- Link to your repository and a concise file map identifying the exact submission files for Terraform/OpenTofu, Ansible, GitHub Actions, and any supporting automation you used (for example: cloud-init, helper scripts, inventory files).
- A teardown checklist to prevent runaway cost.
These pointers cover Minecraft-specific integration points and AWS Academy constraints.
- Chaining Terraform and Ansible: Terraform can trigger your Ansible playbook automatically using a `null_resource` with a `local-exec` provisioner. This lets `terraform apply` provision and configure the host in one step. The provisioner runs the `ansible-playbook` command from your local machine, so SSH access and your inventory must be correct before wiring this up.
- Minecraft EULA and startup environment: The Minecraft Docker image will exit immediately without starting if `EULA=TRUE` is not set. Pass this in the container's environment from your Ansible Docker task (for example, the `env` option of `community.docker.docker_container`) alongside any memory settings (`JVM_OPTS`, `MEMORY`).
- Smoke test startup latency: Minecraft takes 30 to 60 seconds to finish loading. A smoke test that checks for `"Done"` in `docker logs` immediately after `docker run` will always fail. Use a polling loop with a short sleep (e.g., `for i in $(seq 1 12); do sleep 10 && docker logs ...; done`) or the container’s `--health-*` flags to wait for the server to be ready.
- World data volume: The `itzg/minecraft-server` image stores all world data under `/data` inside the container. Mount a named volume or host directory to `/data` in your Ansible task. Your S3 restore step must populate that path before the server starts, or the server will generate a fresh world on every rebuild.
- AWS Academy credentials in GitHub Actions: Academy credentials include `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`. All three must be stored as GitHub Actions secrets and all three must be passed to `aws-actions/configure-aws-credentials`. These credentials are temporary and should be refreshed each time your Learner Lab session restarts or ends.
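For the Terraform and Ansible chaining pointer, the inventory is the piece that most often goes stale after a rebuild, because the instance address changes. The sketch below writes a one-host INI inventory from values you pass in. Everything named here is an assumption for illustration: the `write_inventory` helper, the `minecraft` group, the `public_ip` Terraform output, the `ec2-user` login, and the key path all depend on your own configuration and AMI.

```shell
#!/bin/sh
# Write a one-host INI inventory for Ansible.
# Usage: write_inventory HOST_IP SSH_USER KEY_PATH OUTPUT_FILE
write_inventory() {
  printf '[minecraft]\n%s ansible_user=%s ansible_ssh_private_key_file=%s\n' \
    "$1" "$2" "$3" > "$4"
}

# Example (not run here). `public_ip` is a hypothetical Terraform output name;
# `terraform output -raw NAME` prints a single output value without quotes.
# write_inventory "$(terraform output -raw public_ip)" ec2-user ~/.ssh/labsuser.pem inventory.ini
# ansible-playbook -i inventory.ini site.yml
```

The same two commands are what a `local-exec` provisioner would run, so getting them working by hand first makes the chained version easier to debug.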
You can use any Minecraft server software you like. You can choose the Linux distribution you prefer.
What You’ll Submit
- PDF containing: brief design note, architecture diagram, Terraform/OpenTofu variables documentation, change process, teardown checklist, world-data recovery strategy, and a link to your repository with a brief file map for the submitted Terraform/OpenTofu, Ansible, GitHub Actions, and supporting automation files.
- Narrated screen recording (max 3 minutes). Your server MOTD must include your name or student ID. Submit timestamps alongside the video (e.g., “Checkpoint 1: 0:00, Checkpoint 2: 0:38, …”):
  - Show `terraform apply` output with new resources created, then the Ansible playbook running against the provisioned instance, and confirm the Minecraft server is running after the playbook completes.
  - Run `nmap -sV -Pn -p T:25565 <public-endpoint>` against the Terraform/Ansible-provisioned service to show 25565/tcp open and display your custom MOTD.
  - Show a successful GitHub Actions pipeline run in the Actions tab (or screenshot) with the tag trigger, image acquisition/build, smoke test, and push steps visible.
  - Show `terraform destroy` removing resources, then `terraform apply` followed by the Ansible playbook, and confirm the Minecraft server is joinable after rebuild. Make it clear whether the same world was restored or a fresh world was created.
Rubric
Always refer to Canvas for the most up-to-date rubric information. Canvas's rubric will be used for grading.
| Criteria | Ratings |
|---|---|
| Video: Automated deploy with terraform apply and Ansible (10) Video checkpoint 1: `terraform apply` output shown with new resources created, Ansible playbook run shown against the provisioned instance, and service confirmed running after the playbook completes. | 10 pts Complete All three elements clearly shown: terraform apply output, Ansible playbook run, and service confirmed running. 5 pts Partial Two of three elements shown, or one is ambiguous (e.g., Ansible run not shown against the Terraform-provisioned instance, or service health not confirmed after playbook). 0 pts Missing No credible automated deploy sequence demonstrated. |
| Video: Service reachability after automated deploy (10) Video checkpoint 2: `nmap -sV -Pn -p T:25565 <public-endpoint>` output shows port open with Minecraft service responding and MOTD containing name or student ID, run against the Terraform/Ansible-provisioned service. | 10 pts Complete nmap output clearly shows 25565/tcp open with Minecraft service and MOTD containing name or student ID, run against the automated-deploy service. 5 pts Partial Reachability shown but MOTD is missing name/ID, nmap output is incomplete, or it is unclear whether the service was provisioned by Terraform/Ansible or manually. 0 pts Missing No credible post-automated-deploy reachability demonstrated. |
| Video: CI/CD pipeline success evidenced (10) Video checkpoint 3: GitHub Actions run tab or screenshot clearly shows a successful pipeline run triggered by a tag push, with image acquisition/build, smoke test, and image push steps visible. | 10 pts Complete Successful Actions run clearly shown with tag trigger, image acquisition/build, smoke test, and push steps all visible. 5 pts Partial Pipeline success evidenced but one step is not visible or unclear (e.g., smoke test not distinct, tag trigger not confirmed, image acquisition/build is not visible, or push not shown). 0 pts Missing No credible CI/CD pipeline success evidence provided. |
| Video: Destroy and rebuild cycle (10) Video checkpoint 4: `terraform destroy` shown removing resources, followed by `terraform apply` and Ansible playbook run, with the server confirmed joinable (e.g., nmap) after rebuild and the restoration outcome made clear (same world restored or fresh world created). | 10 pts Complete Full destroy, apply, and Ansible run shown in sequence; server confirmed joinable after rebuild; restoration outcome is clearly explained or evidenced. 5 pts Partial Destroy and rebuild attempted but one phase is unclear, server is not confirmed joinable after rebuild, or the restoration outcome is not made clear. 0 pts Missing No credible destroy and rebuild cycle demonstrated. |
| Terraform code quality (20) Evaluated on four elements: (1) all required resources provisioned or cleanly referenced (networking/subnet and public entry point strategy, Security Group with TCP 25565, EC2 instance, pre-existing `LabInstanceProfile` attached), (2) `LabInstanceProfile` enables ECR pull and S3 write without hardcoded credentials on the host, (3) Security Group rules are minimal and justified at the relevant admin and public entry points, (4) input variables are named, typed, and documented. | 20 pts All four elements All four elements present and correctly implemented. 15 pts Three elements Three of four elements present and correctly implemented; one minor gap. 10 pts Two elements Two of four elements present; significant missing configuration in networking, IAM, or SG. 0 pts One or zero elements Fewer than two elements present or Terraform code does not provision the required infrastructure. |
| Ansible playbook quality (15) Evaluated on three elements: (1) playbook completes all required configuration steps (install prerequisites, authenticate to ECR, pull pinned image, configure persistent storage, restore world data from S3 when rebuilding/recovering, set required environment variables including `EULA=TRUE`, start service), (2) playbook is idempotent; re-running against the same host produces no errors and no unintended changes, (3) tasks are clearly named and logically ordered. | 15 pts All three elements All three elements present; playbook is complete, idempotent, and clearly structured. 10 pts Two elements Two of three elements present; one area has meaningful gaps (e.g., a step is missing, idempotency not confirmed, or task ordering is unclear). 5 pts One element Only one element clearly present; significant manual intervention still required to reach a running service. 0 pts Missing No Ansible playbook provided or playbook does not produce a working service. |
| CI/CD pipeline implementation (10) Evaluated on three elements: (1) GitHub Actions workflow triggers on a git tag push, (2) workflow acquires or builds the deployable image, smoke-tests it, and publishes it to ECR, (3) at least one successful pipeline run is clearly evidenced (link to Actions tab or screenshot showing green run with all steps). | 10 pts All three elements All three elements present and correctly implemented. 7 pts Two elements Two of three elements present; one is missing or weak (e.g., smoke test is trivial, trigger is on push not tag, publish step missing, or evidence of successful run is ambiguous). 4 pts One element Only one element clearly present; pipeline does not reliably publish a tested image to ECR. 0 pts Missing No CI/CD pipeline provided or pipeline does not publish images to ECR. |
| Documentation and rebuild strategy (15) All four required documentation sections present and usable (architecture diagram showing AWS resources + Ansible flow + CI/CD pipeline, Terraform variable documentation, change process for infrastructure changes, teardown checklist to prevent runaway cost), and world data recovery strategy documented and defensible. | 15 pts Exemplary All five items (four doc sections + world data strategy) present, accurate, and usable by another operator without asking clarifying questions. 12 pts Proficient Four of five items complete or one section has a minor gap; overall usable by a new operator. 8 pts Developing Three of five items complete or multiple sections are too vague to act on; teardown checklist or world data strategy missing. 0 pts Insufficient Fewer than three items present or documentation is not usable by another operator. |
Extra Credit (up to +10)
- Remote Terraform State (+3): Configure an S3 backend with state locking; document the setup and tradeoffs.
- Ansible Role Reuse (+4): Structure the playbook as reusable role(s) with variables; demonstrate running against a second instance or show how the role could be reused.
- Pipeline Hardening (+3): Add linting, security scanning, or build caching to the CI/CD pipeline; document what each step catches.