Ops 3: Infrastructure Automation

Obsidian Dynamics acquired a three-person startup last quarter. Their “CTO” (who is also their intern) was given read access to the AWS account as part of onboarding. He deleted your Minecraft EC2 instance on his second day, thinking it was a test environment. It was not a test environment.

Rebuilding from your notes took most of a weekend. Leadership’s takeaway was not “let’s improve the onboarding process” but rather “we need an audit trail.” Every layer of the stack must now be automated: Terraform or OpenTofu declares infrastructure, Ansible configures the host, and a CI/CD pipeline tests and publishes container images. Nothing is hand-configured. Everything is rebuildable.

  • Model cloud infrastructure with Terraform or OpenTofu.
  • Automate server configuration with Ansible so that it is idempotent and repeatable.
  • Automate image publishing with a CI/CD pipeline.
  • Design for rebuildability and auditability.
  • Use Terraform or OpenTofu for provisioning.
  • Ansible is required for post-provision server configuration.
  • The CI/CD pipeline must use GitHub Actions.
  • EC2 remains the compute target.
  • Use the pre-existing LabInstanceProfile to grant the EC2 instance AWS permissions; in AWS Academy this profile contains LabRole. Do not place AWS access keys on the host.
  • ECR remains the container registry for images.
  • You must document how your Terraform/OpenTofu state is handled.

A. Provisioned Infrastructure (Terraform or OpenTofu)

Your Terraform or OpenTofu code must create (or explicitly and cleanly reference) the following:

  • Networking placement for the instance and public entry point (VPC/subnet strategy must be stated; a direct public IP is acceptable, and a private subnet with a documented TCP-capable load balancer is also acceptable).
  • Security Group rules: SSH for admin access and TCP 25565 for Minecraft clients at the appropriate public or internal boundary.
  • EC2 instance configuration (AMI choice, instance type, storage choice).
  • The pre-existing LabInstanceProfile attached to the EC2 instance, enabling it to pull from ECR and write world backups to S3 without credentials on disk.

An Ansible playbook configures the EC2 instance after Terraform provisions it.

  • The playbook must:
    • Install runtime prerequisites (Docker, AWS CLI).
    • Authenticate to ECR using the instance profile credentials exposed on the host; in AWS Academy this comes from LabInstanceProfile / LabRole.
    • Pull the pinned Minecraft image version from ECR.
    • Mount the /data volume for persistent world data.
    • Restore world data from S3 into /data before starting the container when performing a rebuild or recovery.
    • Set required environment variables including EULA=TRUE.
    • Start the Minecraft server container.
  • Cloud-init/user-data may handle initial bootstrap (e.g., installing Ansible, cloning the repo), but all server configuration must live in the Ansible playbook.
  • The playbook must be idempotent: re-running it against the same host produces the same result without duplication or errors.
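
As a rough illustration of the ECR authentication step, the commands the playbook would wrap could look like the sketch below. It assumes the AWS CLI and Docker are already installed and that the CLI resolves the instance profile credentials on its own. The helper names ecr_registry and ecr_login are hypothetical, and the account ID and region are placeholders you supply.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and arguments are illustrative only.

# Print the ECR registry hostname for an account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Log Docker in to ECR. No access keys are passed here: the AWS CLI is
# assumed to pick up the instance profile credentials by itself.
ecr_login() {
  local account_id="$1" region="$2" registry
  registry="$(ecr_registry "$account_id" "$region")"
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry"
}

# Example (placeholder account ID):
#   ecr_login 123456789012 us-east-1
```

In a playbook these commands would normally sit inside a task rather than a standalone script. Re-running the login is harmless, which helps with idempotency, though a shell task reports "changed" on every run unless you tell Ansible otherwise.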

A GitHub Actions workflow automates image publishing to ECR.

  • The workflow triggers when a git tag is pushed.
  • The workflow must acquire the deployable image and publish it to ECR. Re-tagging a pinned upstream image is acceptable; building from a Dockerfile is also acceptable if you completed that extra credit in Assignment 2.
  • The workflow must include a smoke test: run the image briefly and verify the Minecraft server initializes without errors.
  • At least one successful pipeline run must be evidenced (link to the Actions tab or screenshot).
  • Demonstrate that terraform destroy followed by terraform apply plus running the Ansible playbook produces a joinable Minecraft server.
  • Document what happens to world data in your rebuild strategy: the playbook should restore the world from S3 before starting the server so that players return to the same world after a rebuild.
  • Your rebuild evidence must make it clear whether the same world was restored or a fresh world was created.
  • A full end-to-end restore does not need to be shown in video if time-constrained, but the strategy must be documented.
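
One way to keep the restore step safe to re-run is to restore only when the data directory is empty. The sketch below is a minimal illustration under that assumption (any non-empty directory is treated as an existing world). The function names needs_restore and restore_world are hypothetical, and the bucket URI is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and arguments are illustrative only.

# Return 0 when the data directory is missing or empty, i.e. a restore is needed.
needs_restore() {
  local data_dir="$1"
  if [ ! -d "$data_dir" ]; then
    return 0
  fi
  [ -z "$(ls -A "$data_dir")" ]
}

# Restore the world from S3 only when no data is present yet.
# Prints "restored" or "skipped" so the caller can report what happened.
restore_world() {
  local bucket_uri="$1" data_dir="$2"
  if needs_restore "$data_dir"; then
    mkdir -p "$data_dir" &&
      aws s3 sync "$bucket_uri" "$data_dir" &&
      echo "restored"
  else
    echo "skipped"
  fi
}

# Example (placeholder bucket and prefix):
#   restore_world s3://example-bucket/world /data
```

Printing the outcome also gives you something concrete to show when the rebuild evidence has to state whether the same world was restored or a fresh one was created.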

Your documentation must include:

  • An architecture diagram showing AWS resources, Ansible configuration flow, and the CI/CD pipeline.
  • Terraform inputs/variables and what they control.
  • A change process: how a teammate would propose and review infrastructure changes.
  • Link to your repository and a concise file map identifying the exact submission files for Terraform/OpenTofu, Ansible, GitHub Actions, and any supporting automation you used (for example: cloud-init, helper scripts, inventory files).
  • A teardown checklist to prevent runaway cost.

These pointers cover Minecraft-specific integration points and AWS Academy constraints.

  • Chaining Terraform and Ansible: Terraform can trigger your Ansible playbook automatically using a null_resource with a local-exec provisioner. This lets terraform apply provision and configure the host in one step. The provisioner runs the ansible-playbook command from your local machine, so SSH access and your inventory must be correct before wiring this up.
  • Minecraft EULA and startup environment: The Minecraft container exits immediately, without starting the server, if EULA=TRUE is not set. Pass it as a container environment variable in your Ansible Docker task (for example, the env option of the community.docker.docker_container module), alongside any memory settings (JVM_OPTS, MEMORY).
  • Smoke test startup latency: Minecraft takes 30 to 60 seconds to finish loading, so a smoke test that checks for "Done" in docker logs immediately after docker run will fail because the line has not been written yet. Use a polling loop with a short sleep (e.g., for i in $(seq 1 12); do sleep 10 && docker logs ...; done) that stops as soon as the line appears and fails the job if it never does, or use the container’s --health-* flags to wait for the server to be ready.
  • World data volume: The itzg/minecraft-server image stores all world data under /data inside the container. Mount a named volume or host directory to /data in your Ansible task. Your S3 restore step must populate that path before the server starts, or the server will generate a fresh world on every rebuild.
  • AWS Academy credentials in GitHub Actions: Academy credentials include AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN. All three must be stored as GitHub Actions secrets and all three must be passed to aws-actions/configure-aws-credentials. These credentials are temporary: update the secrets each time your Learner Lab session ends or restarts, or the workflow will fail to authenticate.
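
The polling approach for the smoke test can be sketched as a small helper that re-runs a command until its output contains a pattern. This is an illustration, not a required implementation: wait_for_pattern is a hypothetical name, and the container name in the example comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a command's output until a pattern appears.
# Usage: wait_for_pattern PATTERN ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for_pattern() {
  local pattern="$1" attempts="$2" interval="$3" i
  shift 3
  for i in $(seq 1 "$attempts"); do
    # Search stdout and stderr together; docker logs can write to both.
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    sleep "$interval"
  done
  # The pattern never appeared within the allowed attempts.
  return 1
}

# Example (placeholder container name "mc-smoke"), up to 12 checks 10 s apart:
#   wait_for_pattern "Done" 12 10 docker logs mc-smoke || exit 1
```

Because the helper returns a non-zero status on timeout, a workflow step that calls it fails the job when the server never finishes loading.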

You can use any Minecraft server software you like. You can choose the Linux distribution you prefer.

  1. PDF containing: brief design note, architecture diagram, Terraform/OpenTofu variables documentation, change process, teardown checklist, world-data recovery strategy, and a link to your repository with a brief file map for the submitted Terraform/OpenTofu, Ansible, GitHub Actions, and supporting automation files.
  2. Narrated screen recording (max 3 minutes). Your server MOTD must include your name or student ID. Submit timestamps alongside the video (e.g., “Checkpoint 1: 0:00, Checkpoint 2: 0:38, …”):
    1. Show terraform apply output with new resources created, then the Ansible playbook running against the provisioned instance, and confirm the Minecraft server is running after the playbook completes.
    2. Run nmap -sV -Pn -p T:25565 <public-endpoint> against the Terraform/Ansible-provisioned service to show 25565/tcp open and display your custom MOTD.
    3. Show a successful GitHub Actions pipeline run in the Actions tab (or screenshot) with the tag trigger, image acquisition/build, smoke test, and push steps visible.
    4. Show terraform destroy removing resources, then terraform apply followed by the Ansible playbook, and confirm the Minecraft server is joinable after rebuild. Make it clear whether the same world was restored or a fresh world was created.

Always refer to Canvas for the most up-to-date rubric information. Canvas's rubric will be used for grading.

Infrastructure Automation on EC2 (Total: 100 pts)
Criteria Ratings
Video: Automated deploy with terraform apply and Ansible (10)
Video checkpoint 1: `terraform apply` output shown with new resources created, Ansible playbook run shown against the provisioned instance, and service confirmed running after the playbook completes.
  • 10 pts, Complete: All three elements clearly shown: terraform apply output, Ansible playbook run, and service confirmed running.
  • 5 pts, Partial: Two of three elements shown, or one is ambiguous (e.g., Ansible run not shown against the Terraform-provisioned instance, or service health not confirmed after playbook).
  • 0 pts, Missing: No credible automated deploy sequence demonstrated.
 
Video: Service reachability after automated deploy (10)
Video checkpoint 2: `nmap -sV -Pn -p T:25565 <public-endpoint>` output shows port open with Minecraft service responding and MOTD containing name or student ID, run against the Terraform/Ansible-provisioned service.
  • 10 pts, Complete: nmap output clearly shows 25565/tcp open with Minecraft service and MOTD containing name or student ID, run against the automated-deploy service.
  • 5 pts, Partial: Reachability shown but MOTD is missing name/ID, nmap output is incomplete, or it is unclear whether the service was provisioned by Terraform/Ansible or manually.
  • 0 pts, Missing: No credible post-automated-deploy reachability demonstrated.
 
Video: CI/CD pipeline success evidenced (10)
Video checkpoint 3: GitHub Actions run tab or screenshot clearly shows a successful pipeline run triggered by a tag push, with image acquisition/build, smoke test, and image push steps visible.
  • 10 pts, Complete: Successful Actions run clearly shown with tag trigger, image acquisition/build, smoke test, and push steps all visible.
  • 5 pts, Partial: Pipeline success evidenced but one step is not visible or unclear (e.g., smoke test not distinct, tag trigger not confirmed, image acquisition/build is not visible, or push not shown).
  • 0 pts, Missing: No credible CI/CD pipeline success evidence provided.
 
Video: Destroy and rebuild cycle (10)
Video checkpoint 4: `terraform destroy` shown removing resources, followed by `terraform apply` and Ansible playbook run, with the server confirmed joinable (e.g., nmap) after rebuild and the restoration outcome made clear (same world restored or fresh world created).
  • 10 pts, Complete: Full destroy, apply, and Ansible run shown in sequence; server confirmed joinable after rebuild; restoration outcome is clearly explained or evidenced.
  • 5 pts, Partial: Destroy and rebuild attempted but one phase is unclear, server is not confirmed joinable after rebuild, or the restoration outcome is not made clear.
  • 0 pts, Missing: No credible destroy and rebuild cycle demonstrated.
 
Terraform code quality (20)
Evaluated on four elements: (1) all required resources provisioned or cleanly referenced (networking/subnet and public entry point strategy, Security Group with TCP 25565, EC2 instance, pre-existing `LabInstanceProfile` attached), (2) `LabInstanceProfile` enables ECR pull and S3 write without hardcoded credentials on the host, (3) Security Group rules are minimal and justified at the relevant admin and public entry points, (4) input variables are named, typed, and documented.
  • 20 pts, All four elements: All four elements present and correctly implemented.
  • 15 pts, Three elements: Three of four elements present and correctly implemented; one minor gap.
  • 10 pts, Two elements: Two of four elements present; significant missing configuration in networking, IAM, or SG.
  • 0 pts, One or zero elements: Fewer than two elements present or Terraform code does not provision the required infrastructure.
Ansible playbook quality (15)
Evaluated on three elements: (1) playbook completes all required configuration steps (install prerequisites, authenticate to ECR, pull pinned image, configure persistent storage, restore world data from S3 when rebuilding/recovering, set required environment variables including `EULA=TRUE`, start service), (2) playbook is idempotent; re-running against the same host produces no errors and no unintended changes, (3) tasks are clearly named and logically ordered.
  • 15 pts, All three elements: All three elements present; playbook is complete, idempotent, and clearly structured.
  • 10 pts, Two elements: Two of three elements present; one area has meaningful gaps (e.g., a step is missing, idempotency not confirmed, or task ordering is unclear).
  • 5 pts, One element: Only one element clearly present; significant manual intervention still required to reach a running service.
  • 0 pts, Missing: No Ansible playbook provided or playbook does not produce a working service.
CI/CD pipeline implementation (10)
Evaluated on three elements: (1) GitHub Actions workflow triggers on a git tag push, (2) workflow acquires or builds the deployable image, smoke-tests it, and publishes it to ECR, (3) at least one successful pipeline run is clearly evidenced (link to Actions tab or screenshot showing green run with all steps).
  • 10 pts, All three elements: All three elements present and correctly implemented.
  • 7 pts, Two elements: Two of three elements present; one is missing or weak (e.g., smoke test is trivial, trigger is on push not tag, publish step missing, or evidence of successful run is ambiguous).
  • 4 pts, One element: Only one element clearly present; pipeline does not reliably publish a tested image to ECR.
  • 0 pts, Missing: No CI/CD pipeline provided or pipeline does not publish images to ECR.
Documentation and rebuild strategy (15)
All four required documentation sections present and usable (architecture diagram showing AWS resources + Ansible flow + CI/CD pipeline, Terraform variable documentation, change process for infrastructure changes, teardown checklist to prevent runaway cost), and world data recovery strategy documented and defensible.
  • 15 pts, Exemplary: All five items (four doc sections + world data strategy) present, accurate, and usable by another operator without asking clarifying questions.
  • 12 pts, Proficient: Four of five items complete or one section has a minor gap; overall usable by a new operator.
  • 8 pts, Developing: Three of five items complete or multiple sections are too vague to act on; teardown checklist or world data strategy missing.
  • 0 pts, Insufficient: Fewer than three items present or documentation is not usable by another operator.
Extra credit:
  • Remote Terraform State (+3): Configure an S3 backend with state locking; document the setup and tradeoffs.
  • Ansible Role Reuse (+4): Structure the playbook as reusable role(s) with variables; demonstrate running against a second instance or show how the role could be reused.
  • Pipeline Hardening (+3): Add linting, security scanning, or build caching to the CI/CD pipeline; document what each step catches.