Ops 2: Containerized Minecraft Server

The server crashed during the CEO’s first session. He lost a diamond pickaxe. An all-hands email with the subject line “ACCOUNTABILITY” was sent at 11:47 PM. The phrase “enterprise-grade reliability” now appears in your quarterly objectives.

Your manual setup works, but it drifts every time someone touches it, upgrades break in unpredictable ways, and recovery is a prayer-based workflow. Obsidian Dynamics now requires containerized, versioned, and recoverable operations that another operator can execute without improvisation.

Learning Objectives

Package a stateful service in Docker while preserving data correctly.
Publish and consume immutable image versions from ECR.
Execute safe upgrade and rollback procedures with explicit checks.
Implement S3 backup/restore with a defined retention policy.

Constraints (AWS Academy)

Compute remains on EC2.
Container images are stored in ECR.
Backups are stored in a private S3 bucket.
Security Group exposure must stay minimal and justified.

Requirements

A. Runtime Definition (Container + Service)

Provide a container runtime definition that runs the Minecraft server on EC2.
- docker compose is allowed but not required for a single container.
- Plain docker run, Podman, or equivalent is acceptable if reproducible.
Service must be reachable by clients on 25565/tcp.
- Verify reachability using:
```
nmap -sV -Pn -p T:25565 <public-endpoint>
```
Runtime configuration is externalized (environment variables and/or mounted config).
Service must come back automatically after host reboot.
- Acceptable mechanisms include container restart policies and/or generated systemd units (docker/podman workflows).
- You do not need to hand-write a systemd unit if your runtime tooling generates one.
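One possible shape for the runtime definition in section A is sketched below as a plain `docker run` wrapper. It assumes the itzg/minecraft-server image mentioned in the Hints (which reads `EULA`, `VERSION`, and `MOTD` from the environment and keeps its state under `/data`). The ECR URI, tag, host path, and function name are hypothetical placeholders, not required values.

```shell
# Hypothetical defaults: replace with your own ECR URI, pinned tag, and host path.
IMAGE="${IMAGE:-123456789012.dkr.ecr.us-east-1.amazonaws.com/minecraft:mc-1.21.1-build7}"
DATA_DIR="${DATA_DIR:-/opt/minecraft/data}"   # bind mount for world data (assumed path)
MC_VERSION="${MC_VERSION:-1.21.1}"            # server version, pinned alongside the image tag
MOTD="${MOTD:-Your Name - CS312}"

# Start the server with externalized configuration and a restart policy.
start_minecraft() {
  docker run -d \
    --name minecraft \
    --restart unless-stopped \
    -p 25565:25565/tcp \
    -e EULA=TRUE \
    -e VERSION="$MC_VERSION" \
    -e MOTD="$MOTD" \
    -v "$DATA_DIR:/data" \
    "$IMAGE"
}

# Usage (on the EC2 host):
#   start_minecraft
```

The `--restart unless-stopped` policy only brings the container back after a reboot if the Docker daemon itself starts at boot, so confirm that with `systemctl is-enabled docker` on your host before recording the reboot checkpoint.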

B. State Boundary and Persistence

World data is stored outside the container image (host volume or bind mount).
Demonstrate persistence by removing and recreating the container while preserving world state.
- Verification approach:
  1. Identify the persisted world location used by your deployment (bind mount path or named volume mounted into the container).
  2. Create a marker file in the persisted world data, for example by running `touch /data/world/PERSISTENCE_TEST` inside the container, or the equivalent host-path command for a bind mount.
  3. Stop, remove, and recreate the container.
  4. Verify the marker file still exists and that the same world data is still present after recreate.
Clearly document what is stateful vs immutable.
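The verification approach above could be scripted roughly as follows. This is a sketch that assumes a container named `minecraft` whose world lives at `/data/world` inside the container. The function names are illustrative, and the recreate step is left to whatever run definition you use.

```shell
CONTAINER="${CONTAINER:-minecraft}"                  # assumed container name
MARKER="${MARKER:-/data/world/PERSISTENCE_TEST}"     # path inside the container

# Step 2: create the marker inside the container (works for bind mounts and named volumes).
# The world directory must already exist, so let the server finish generating it first.
create_marker() {
  docker exec "$CONTAINER" touch "$MARKER"
}

# Step 3 (first half): stop and remove the container; the volume or bind mount is left alone.
remove_container() {
  docker stop "$CONTAINER" && docker rm "$CONTAINER"
}

# Step 4: after recreating the container, confirm the marker is still there.
verify_marker() {
  if docker exec "$CONTAINER" test -f "$MARKER"; then
    echo "PASS: $MARKER survived recreate"
  else
    echo "FAIL: $MARKER is missing" >&2
    return 1
  fi
}

# Typical sequence:
#   create_marker
#   remove_container
#   <re-run your own run definition here>
#   verify_marker
```

Note that a plain `docker restart` would not prove anything here, because the container's writable layer survives a restart. Only the remove-and-recreate sequence shows the data lives outside the container.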

C. ECR Publishing and Version Discipline

Publish your image to an ECR repository.
Baseline approach: Re-tag a trusted upstream Minecraft server image and publish it to your ECR repository.
- Document upstream image provenance and why it is trusted.
Define and use an immutable tagging scheme (for example: mc-1.21.1-build7).
If your chosen image supports server-version selection via runtime configuration, pin and document that configuration alongside the image tag.
A `latest` tag may exist, but deployments must pin a specific version tag.
You must have at least two distinct pinned deployable versions available to demonstrate upgrade and rollback.
- A second ECR tag alone does not count if it resolves to the same image digest and uses the same runtime configuration.
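The re-tag-and-publish baseline, plus a check that two tags do not resolve to the same digest, might look like the sketch below. The account ID, region, repository name, and function names are hypothetical. The digest comparison covers only the image half of "distinct deployable versions"; a version difference that lives in runtime configuration such as `VERSION=...` still has to be documented separately.

```shell
# Hypothetical defaults: substitute your own account, region, and repository.
AWS_REGION="${AWS_REGION:-us-east-1}"
ECR_REGISTRY="${ECR_REGISTRY:-123456789012.dkr.ecr.us-east-1.amazonaws.com}"
ECR_REPO="${ECR_REPO:-minecraft}"

# Build the full image reference for a pinned tag.
ecr_image_ref() {
  printf '%s/%s:%s\n' "$ECR_REGISTRY" "$ECR_REPO" "$1"
}

# Pull a pinned upstream image, re-tag it with your scheme, and push it to ECR.
#   $1 = upstream reference (for example itzg/minecraft-server:<pinned-release-tag>)
#   $2 = your immutable tag (for example mc-1.21.1-build7)
publish_version() {
  aws ecr get-login-password --region "$AWS_REGION" |
    docker login --username AWS --password-stdin "$ECR_REGISTRY" || return 1
  docker pull "$1" || return 1
  docker tag "$1" "$(ecr_image_ref "$2")" || return 1
  docker push "$(ecr_image_ref "$2")"
}

# Print the digest that a tag currently resolves to in ECR.
tag_digest() {
  aws ecr describe-images \
    --repository-name "$ECR_REPO" \
    --image-ids imageTag="$1" \
    --query 'imageDetails[0].imageDigest' \
    --output text
}

# Succeed only if the two tags point at different image digests.
versions_are_distinct() {
  [ "$(tag_digest "$1")" != "$(tag_digest "$2")" ]
}
```

If you want ECR itself to reject tag overwrites, the repository's tag mutability setting can be switched to `IMMUTABLE`, at the cost of no longer being able to move a `latest` tag.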

D. Safe Upgrade and Rollback Workflow

Provide a pre-change checklist that includes:
- Backup world data to S3 before any upgrade (mandatory).
- Verify the backup artifact for that change window exists in S3.
- Verify both deployable versions exist in ECR.
Perform an upgrade from one pinned deployable version to another and validate success.
Perform rollback to the prior known-good image version without rebuilding from scratch.
Include post-change validation steps and expected healthy outcomes.
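A pre-change checklist can be made executable so the operator does not have to eyeball it. The sketch below assumes the backup has already been uploaded under a known key and that both versions are tags in one ECR repository. The bucket, repository, container name, paths, and function names are all hypothetical, and the run flags assume the itzg/minecraft-server image.

```shell
# Hypothetical defaults: substitute your own values.
BUCKET="${BUCKET:-my-mc-backups}"
ECR_REGISTRY="${ECR_REGISTRY:-123456789012.dkr.ecr.us-east-1.amazonaws.com}"
ECR_REPO="${ECR_REPO:-minecraft}"
CONTAINER="${CONTAINER:-minecraft}"
DATA_DIR="${DATA_DIR:-/opt/minecraft/data}"

# Run one check quietly and print OK or FAIL with a description.
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "OK   $desc"; else echo "FAIL $desc"; return 1; fi
}

# Pre-change checklist: preflight <current-tag> <target-tag> <backup-key>
preflight() {
  rc=0
  check "backup s3://$BUCKET/$3 exists" \
    aws s3api head-object --bucket "$BUCKET" --key "$3" || rc=1
  check "ECR tag $1 exists" \
    aws ecr describe-images --repository-name "$ECR_REPO" --image-ids imageTag="$1" || rc=1
  check "ECR tag $2 exists" \
    aws ecr describe-images --repository-name "$ECR_REPO" --image-ids imageTag="$2" || rc=1
  return "$rc"
}

# Upgrade and rollback are the same operation with a different pinned tag:
#   deploy_version <tag> <minecraft-version>
deploy_version() {
  image="$ECR_REGISTRY/$ECR_REPO:$1"
  docker pull "$image" || return 1
  docker stop "$CONTAINER" >/dev/null 2>&1 || true   # graceful stop; ignore "no such container"
  docker rm "$CONTAINER" >/dev/null 2>&1 || true
  docker run -d --name "$CONTAINER" --restart unless-stopped \
    -p 25565:25565/tcp -e EULA=TRUE -e VERSION="$2" \
    -v "$DATA_DIR:/data" "$image"
}

# Usage:
#   preflight mc-old mc-new backups/<artifact>.tar.gz && deploy_version mc-new <new-version>
#   deploy_version mc-old <old-version>    # rollback, no rebuild
```

One caveat worth writing into the runbook: a world that has been opened by a newer server version may not load cleanly on the older one, so a rollback may also require restoring the pre-change backup.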

E. Backup/Restore to S3

Create a backup artifact for world data and upload it to S3.
S3 bucket must be private.
Configure at least one S3 lifecycle rule (e.g., expire backups older than 7 days) and document why you chose that retention period.
Demonstrate restore from S3 backup onto the service.
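The backup, retention, and restore requirements can be sketched as a handful of shell functions. This version assumes a bind mount at a host path containing a `world` directory. With a named volume you would archive from inside the container instead. The bucket name, key prefix, paths, and function names are hypothetical, and the 7-day value simply mirrors the example above.

```shell
# Hypothetical defaults: substitute your own values.
BUCKET="${BUCKET:-my-mc-backups}"
DATA_DIR="${DATA_DIR:-/opt/minecraft/data}"
CONTAINER="${CONTAINER:-minecraft}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Object key for a backup taken at the given timestamp.
backup_key() {
  printf 'backups/world-%s.tar.gz\n' "$1"
}

# Stop the server so the world is not changing, archive it, restart, then upload.
backup_world() {
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="/tmp/world-$stamp.tar.gz"
  docker stop "$CONTAINER" && tar -czf "$archive" -C "$DATA_DIR" world
  rc=$?
  docker start "$CONTAINER"
  [ "$rc" -eq 0 ] && aws s3 cp "$archive" "s3://$BUCKET/$(backup_key "$stamp")"
}

# Download a backup, move the current world aside, and extract the archive in its place.
restore_world() {
  archive="/tmp/restore-$$.tar.gz"
  aws s3 cp "s3://$BUCKET/$1" "$archive" || return 1
  docker stop "$CONTAINER" || return 1
  if [ -d "$DATA_DIR/world" ]; then
    mv "$DATA_DIR/world" "$DATA_DIR/world.pre-restore.$$"
  fi
  tar -xzf "$archive" -C "$DATA_DIR" && docker start "$CONTAINER"
}

# Lifecycle rule that expires objects under backups/ after RETENTION_DAYS days.
lifecycle_json() {
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Expiration": { "Days": $RETENTION_DAYS }
    }
  ]
}
EOF
}

# Apply the lifecycle rule and block all public access on the bucket.
apply_bucket_policy() {
  lifecycle_json > /tmp/lifecycle.json
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$BUCKET" --lifecycle-configuration file:///tmp/lifecycle.json
  aws s3api put-public-access-block --bucket "$BUCKET" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
}
```

Stopping the container during the archive step trades a short outage for a consistent snapshot. Keeping the previous world directory under a `world.pre-restore.*` name gives the operator a way back if the restored archive turns out to be the wrong one.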

F. Operator Documentation

Your documentation must include:

Build + publish workflow (conceptual steps, not only a command dump).
ECR usage expectations and image tag policy.
Runtime architecture diagram (simple is acceptable).
Backup and restore runbook for S3.
Upgrade and rollback runbook with explicit checks.

Hints

These pointers address the most common friction points so you can focus on the operational skills this assignment is really testing.

The itzg/minecraft-server image is the most widely used Minecraft Docker image and handles EULA acceptance, version selection, and MOTD via environment variables. Set your MOTD with -e MOTD="Your Name - CS312" (or the equivalent in your Compose file).
If you use itzg/minecraft-server, remember that the image tag and the Minecraft server version are not the same thing. Prefer pinned upstream release tags over floating tags, and document any pinned VERSION=... change as part of the deployable version.
If you use a named volume rather than a bind mount, create and verify your persistence marker from inside the container instead of assuming a host world/ path exists.
S3 backup and restore on AWS Academy: use aws s3 cp to upload and download. A tar archive of the world directory is a sufficient backup artifact.

What You’ll Submit

Operator Runbook (PDF) containing all required documentation sections, the container deployment definition (docker-compose.yml or equivalent as a code block), and ECR repository URI with versioning policy and image provenance notes.
Narrated screen recording (max 3 minutes). Your server MOTD must include your name or student ID. Submit timestamps alongside the video (e.g., “Checkpoint 1: 0:00, Checkpoint 2: 0:38, …”):
1. Show the running container (docker ps or equivalent), run nmap -sV -Pn -p T:25565 <public-endpoint> to confirm reachability and display your custom MOTD, then reboot the host and confirm the container returns automatically.
2. Identify the persistent world location used by your deployment (bind mount or named volume), create a marker file there, stop and remove the container, recreate it from the same image, and show the marker file and prior world data still exist after recreate.
3. Before changing versions, show the S3 backup artifact for that change window, then upgrade to a newer pinned deployable version and confirm the service is healthy (e.g., nmap or container status). Roll back to the prior known-good version and confirm it runs correctly. If the version difference depends on runtime configuration such as VERSION=..., show that change explicitly.
4. Upload a world data backup artifact to S3, then restore from that backup to the live service and confirm the service is healthy after restore.

Rubric

Always refer to Canvas for the most up-to-date rubric information. Canvas's rubric will be used for grading.

Containerized Minecraft Server Rubric (Total: 100 pts)
Criteria and Ratings

Video: Container reachability and auto-recovery (15)
Video checkpoint 1: running container shown on EC2, `nmap -sV -Pn -p T:25565 <public-endpoint>` output shows port open with Minecraft service responding and MOTD containing name or student ID, and host reboot followed by confirmed auto-recovery.
- 15 pts, Complete: All three elements clearly shown: running container, nmap output with correct MOTD, and confirmed reboot auto-recovery.
- 8 pts, Partial: Two of three elements clearly shown, or one element is ambiguous (e.g., MOTD missing name, reboot timing unclear, or auto-recovery not confirmed).
- 0 pts, Missing: Fewer than two elements shown or no credible evidence of reachability or auto-recovery.

Video: Persistence proof (10)
Video checkpoint 2: persisted world location identified (bind mount path or named volume mounted into the container), marker file created in persisted world data, then container stopped, removed, and recreated; marker file and prior world data confirmed after recreate.
- 10 pts, Complete: Persisted storage location is clear, full container stop-remove-recreate lifecycle is shown, and both marker survival and prior world data are clearly visible after recreate.
- 5 pts, Partial: Persistence is demonstrated but one step is missing or ambiguous (e.g., storage location unclear, marker check unclear, or container was only restarted rather than removed and recreated).
- 0 pts, Missing: No credible persistence proof shown or world data appears lost.

Video: Upgrade and rollback execution (10)
Video checkpoint 3: S3 backup artifact for the change window shown before upgrade, upgrade from one pinned deployable version to a newer one shown with health validation, then rollback to the prior known-good version confirmed without rebuilding from scratch.
- 10 pts, Complete: Backup artifact is shown before the change, both upgrade and rollback are performed with clearly distinct deployable versions, health validation is shown after upgrade, and rollback is confirmed with the prior version running.
- 5 pts, Partial: One operation (upgrade or rollback) is clearly shown but the other is weak or omitted, the pre-change backup evidence is unclear, or the version difference is ambiguous.
- 0 pts, Missing: No credible upgrade and rollback sequence demonstrated.

Video: S3 backup and restore (10)
Video checkpoint 4: world data backup artifact uploaded to S3 bucket, then a restore from that backup to the live service demonstrated with service confirmed healthy after restore.
- 10 pts, Complete: Backup upload to S3 and restore from S3 both clearly shown; service confirmed healthy after restore.
- 5 pts, Partial: Backup upload shown but restore not demonstrated, or restore shown but S3 source is ambiguous or service health not confirmed.
- 0 pts, Missing: No credible S3 backup upload and restore demonstrated.

Container definition and service configuration (10)
Evaluated on four elements: (1) run definition is reproducible (`docker-compose.yml`, `docker run` command, or equivalent), (2) runtime config is externalized via environment variables or mounted config file, (3) restart-on-reboot mechanism is configured (restart policy or systemd unit), (4) `25565/tcp` is exposed and accessible to clients.
- 10 pts, All four elements: All four elements present and correctly configured.
- 7 pts, Three elements: Three of four elements present and correctly configured; one minor gap.
- 4 pts, Two elements: Two of four elements present; significant missing configuration remains.
- 0 pts, One or zero elements: Fewer than two elements present or container definition is not reproducibly runnable.

ECR publishing and version discipline (15)
Evaluated on four elements: (1) at least two distinct deployable versions are available; same-digest retags with unchanged runtime configuration do not qualify, (2) tagging scheme is immutable and consistent (e.g., `mc-1.21.1-build7`), (3) deployments pin a specific image tag and, when applicable, version-selecting runtime configuration (`latest`-only deployments do not qualify), (4) upstream image provenance is documented and justified as trusted.
- 15 pts, All four elements: All four elements present and correct.
- 11 pts, Three elements: Three of four elements present; one area is incomplete or inconsistent.
- 7 pts, Two elements: Two of four elements present; version discipline or provenance is substantially incomplete.
- 0 pts, One or zero elements: Fewer than two elements present or no meaningful version discipline demonstrated.

S3 backup, restore, and retention (10)
Evaluated on three elements: (1) backup artifact uploaded to a private S3 bucket (no public ACL or bucket policy), (2) at least one S3 lifecycle rule configured with a documented justification for the chosen retention period, (3) restore procedure is documented step-by-step and reproducible.
- 10 pts, All three elements: All three elements present and correctly configured.
- 7 pts, Two elements: Two of three elements present; one is missing or underdeveloped.
- 4 pts, One element: Only one element clearly present; backup or restore workflow is substantially incomplete.
- 0 pts, Missing: No credible private S3 backup, restore, or retention setup demonstrated.

Upgrade and rollback runbook quality (10)
Evaluated on three elements: (1) pre-change checklist explicitly includes S3 backup of world data and verifying the backup artifact and target versions before any upgrade, (2) post-change validation steps are listed with expected healthy outcomes, (3) rollback procedure is operationally distinct from a fresh install (pulls and runs an existing ECR image without rebuilding or reinstalling).
- 10 pts, All three elements: All three elements present and clearly executable by another operator.
- 7 pts, Two elements: Two of three elements present; one is vague, incomplete, or missing.
- 4 pts, One element: Only one element clearly present; runbook is not operationally usable as written.
- 0 pts, Missing: Upgrade and rollback runbook is absent or meets none of the three elements.

Operator documentation quality (10)
All five required sections present (build+publish workflow, ECR usage and image tag policy, runtime architecture diagram, S3 backup/restore runbook, upgrade and rollback runbook with pre/post validation), and procedures are conceptual and actionable rather than a raw command dump that another operator could not follow without guesswork.
- 10 pts, Exemplary: All five sections present, correctly ordered, and operator-ready; another TA could execute the procedures without asking clarifying questions.
- 8 pts, Proficient: Four of five sections complete, or one section has a minor gap; overall usable by a new operator.
- 5 pts, Developing: Three of five sections complete or multiple sections require significant inference; not reliably usable by another operator.
- 0 pts, Insufficient: Fewer than three sections present or documentation is not executable by another operator.

Extra Credit (up to +10)

Custom Dockerfile (+5): Build your own container image using a Dockerfile instead of re-tagging an upstream image. Include a justified base image choice and at least one hardening step (e.g., non-root user, minimal layers). Document your build choices.
In-Service Administration (+5): Configure secure remote administration (e.g., RCON with authentication) and execute at least 3 admin commands without stopping the container. Include safety notes about when admin intervention is appropriate vs. when a controlled restart is safer.

Extra credit must stay within this assignment’s scope (no orchestration, IaC frameworks, or CI/CD pipelines).