
This course teaches practical system administration through a single evolving project that requires you to operate a real service. You’ll build and run a Minecraft server on AWS Academy and progressively raise its maturity: from manual operations, to containerization, to infrastructure as code, to Kubernetes, to observability.

These assignments are not tutorials. They are specifications. Your job is to make defensible engineering decisions, document them, and prove that your system works under real operational conditions.

You will complete the following five milestones in order:

  1. Minecraft Ops 1: Manual EC2 Server (AWS Academy)
  2. Minecraft Ops 2: Docker + ECR on EC2
  3. Minecraft Ops 3: Terraform-Provisioned EC2 + Automated Deploy
  4. Minecraft Ops 4: Kubernetes Migration (k3s on EC2)
  5. Minecraft Ops 5: IaC + Multi-Node k3s + Observability

Each assignment builds on the last. You are expected to carry forward working artifacts and improve them rather than starting over.

Across all five assignments, you will be graded on the same operator-grade themes:

  • Correctness: the service runs and clients can connect.
  • Security posture: least privilege, minimal exposure, reasonable defaults.
  • Recoverability: backups exist, restores work, failures are survivable.
  • Operational maturity: clear procedures, safe change practices, evidence-based troubleshooting.
  • Documentation: someone else can operate your system using what you wrote.
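As one illustration of the recoverability theme, the sketch below archives a world directory and then proves the archive restores to identical contents by comparing SHA-256 digests. It is a minimal sketch, not a required approach: the function names, the `world` archive prefix, and the directory layout are assumptions made for this example, and it presumes the server is stopped or not writing to the world while the backup runs.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(world_dir, archive_path):
    """Write world_dir into a gzip-compressed tar archive under 'world/'."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # "world" is a hypothetical prefix chosen for this sketch.
        tar.add(world_dir, arcname="world")


def restore_matches(world_dir, archive_path):
    """Extract the archive to a scratch directory and compare digests."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(scratch)
        return tree_digest(Path(scratch) / "world") == tree_digest(world_dir)
```

Saving the result of `restore_matches` alongside the archive name and timestamp is one way to turn "restores work" into evidence a reviewer can check.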

You must assume a real cloud billing model and treat cost as a design constraint, and you must keep the system's exposure as small as the service allows.

  • Prefer small instances and justify instance sizing.
  • Restrict administrative access (SSH from known IPs or SSM if available).
  • Minimize public ports (only what is required for Minecraft and administration).
  • Use IAM instance profiles instead of long-lived access keys on servers.
  • Include a teardown or “stop resources” plan in every submission.
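The access and port guidelines above can be checked mechanically before you submit. The following is a minimal sketch that audits a list of ingress rules held as plain dictionaries. The rule shape (`port`, `cidr`) is hypothetical and is not the AWS API format. The sketch also assumes the server uses the default Minecraft Java Edition port (TCP 25565) and that SSH on TCP 22 is the only administrative port; adjust both if your design differs.

```python
import ipaddress

# Assumptions for illustration: default Minecraft Java Edition port and SSH.
MINECRAFT_PORT = 25565
SSH_PORT = 22


def audit_ingress(rules, admin_cidrs):
    """Return human-readable findings for a set of ingress rules.

    Each rule is a dict with hypothetical keys "port" (int) and "cidr" (str).
    admin_cidrs lists the networks that are allowed to reach SSH.
    """
    allowed_admin = [ipaddress.ip_network(c) for c in admin_cidrs]
    findings = []
    for rule in rules:
        port = rule["port"]
        source = ipaddress.ip_network(rule["cidr"])
        if port == MINECRAFT_PORT:
            continue  # the game port is expected to be reachable by players
        if port == SSH_PORT:
            # SSH is acceptable only when the source sits inside an admin network.
            inside = any(
                source.version == net.version and source.subnet_of(net)
                for net in allowed_admin
            )
            if not inside:
                findings.append(f"SSH open to {source}; restrict to known admin IPs")
            continue
        findings.append(f"unexpected port {port} open to {source}")
    return findings
```

For example, `audit_ingress([{"port": 22, "cidr": "0.0.0.0/0"}], ["203.0.113.7/32"])` returns one finding, because SSH is reachable from every address rather than only from the listed admin address.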

Unless an assignment specifies otherwise, each submission should include:

  • Architecture & decisions: a short design note explaining the key choices and tradeoffs.
  • Runbooks: how to deploy, validate health, upgrade, rollback, and recover.
  • Evidence: screenshots, logs, dashboard exports, and/or command output that proves requirements are met.
  • Cost controls: what you will shut down, when, and how you’ll prevent surprise charges.

Your artifacts should be reviewable by a third party (TA/operator) without private context.

  • You may discuss approaches and troubleshoot with classmates.
  • Your infrastructure code, automation, and documentation must be your own work unless explicitly approved as shared starter code.
  • If you use external references, cite them in your design note.
  • Treat every change like it could break production: write down a rollback plan before you deploy.
  • Record what “healthy” looks like (logs/metrics) so you can detect when it isn’t.
  • Make small changes, validate, then proceed.
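One simple way to record what "healthy" looks like is a repeatable reachability probe that you run before and after every change. The sketch below attempts a single TCP connection and reports whether it succeeded and how long it took. The function name, the default timeout, and the example host and port are assumptions for illustration. A successful TCP connection shows only that something is accepting connections on the port, not that the game itself is working, so treat it as one signal alongside logs and metrics.

```python
import socket
import time


def check_tcp(host, port, timeout=3.0):
    """Return (reachable, seconds) for one TCP connection attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical target: a server on this machine using the default
    # Minecraft Java Edition port.
    ok, seconds = check_tcp("127.0.0.1", 25565)
    print(f"reachable={ok} elapsed={seconds:.3f}s")
```

Running the same probe before a deploy and again afterward gives you a before-and-after record that fits directly into the evidence section of a submission.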

If you keep your system reproducible and your docs honest, these assignments become easier over time.