Docker + ECR on EC2

The server crashed during the CEO’s first session. He lost a diamond pickaxe. An all-hands email with the subject line “ACCOUNTABILITY” was sent at 11:47 PM. The phrase “enterprise-grade reliability” now appears in your quarterly objectives.

Your manual setup works, but it drifts every time someone touches it, upgrades break in unpredictable ways, and recovery is a prayer-based workflow. Obsidian Dynamics now requires containerized, versioned, and recoverable operations that another operator can execute without improvisation.

  • Package a stateful service in Docker while preserving data correctly.
  • Publish and consume immutable image versions from ECR.
  • Execute safe upgrade and rollback procedures with explicit checks.
  • Implement S3 backup/restore with a defined retention policy.
  • Compute remains on EC2.
  • Container images are stored in ECR.
  • Backups are stored in a private S3 bucket.
  • Security Group exposure must stay minimal and justified.
  • Provide a container runtime definition that runs the Minecraft server on EC2.
    • docker compose is allowed but not required for a single container.
    • Plain docker run, Podman, or equivalent is acceptable if reproducible.
  • Service must be reachable by clients on 25565/tcp.
    • Verify reachability using:
      nmap -sV -Pn -p T:25565 <instance_public_ip>
  • Runtime configuration is externalized (environment variables and/or mounted config).
  • Service must come back automatically after host reboot.
    • Acceptable mechanisms include container restart policies and/or generated systemd units (docker/podman workflows).
    • You do not need to hand-write a systemd unit if your runtime tooling generates one.
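
The runtime requirements above (published port, externalized configuration, restart on reboot) can be expressed as a single docker run command. The sketch below only prints that command so it can be reviewed before anyone runs it. Everything in it is an assumption for illustration: the ECR account ID, region, repository name, host data path, and container name are hypothetical placeholders, and the container-side data path and the EULA variable depend on the upstream image you choose, so check them against that image's documentation.

```shell
#!/bin/sh
# Sketch: print (do not execute) a reproducible docker run command.
# All default values are hypothetical placeholders, not assignment values.
set -eu

IMAGE="${IMAGE:-123456789012.dkr.ecr.us-east-1.amazonaws.com/minecraft:mc-1.21.1-build7}"
HOST_DATA_DIR="${HOST_DATA_DIR:-/srv/minecraft/data}"   # world data lives on the host
CONTAINER_DATA_DIR="${CONTAINER_DATA_DIR:-/data}"       # assumption: image keeps state here
CONTAINER_NAME="${CONTAINER_NAME:-minecraft}"

build_run_cmd() {
  # --restart unless-stopped brings the container back after a host reboot,
  # provided the Docker daemon itself is enabled at boot.
  printf 'docker run -d --name %s --restart unless-stopped -p 25565:25565/tcp -v %s:%s -e EULA=TRUE %s\n' \
    "$CONTAINER_NAME" "$HOST_DATA_DIR" "$CONTAINER_DATA_DIR" "$IMAGE"
}

build_run_cmd
```

A restart policy only helps if the daemon starts on boot, so on a systemd host it is worth confirming that the docker service is enabled (for example with systemctl is-enabled docker) before relying on it.
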
  • World data is stored outside the container image (host volume or bind mount).
  • Demonstrate persistence by restarting/recreating the container while preserving world state.
    • Verification approach:
      1. Create a marker file in the world directory: touch world/PERSISTENCE_TEST
      2. Note the modification time of level.dat: stat -c '%Y' world/level.dat (or stat -f '%m' on macOS)
      3. Stop, remove, and recreate the container
      4. Verify marker file still exists and level.dat timestamp is unchanged
  • Clearly document what is stateful vs immutable.
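
The verification approach above can be wrapped in two small helpers, one run before the container is removed and one run after it is recreated. This is a minimal sketch: the function names are hypothetical, and the world directory is passed in as an argument because its location depends on your volume or bind mount. It also assumes level.dat is not rewritten between the two checks. A running server saves periodically and on shutdown, so it may be simplest to take the first reading after the container has stopped and the second before the recreated server has saved.

```shell
#!/bin/sh
# Sketch: persistence check helpers (function names are hypothetical).

# mtime FILE: print modification time in epoch seconds (GNU stat, then BSD/macOS stat).
mtime() {
  stat -c '%Y' "$1" 2>/dev/null || stat -f '%m' "$1"
}

# persistence_before WORLD_DIR: create the marker and print level.dat's mtime.
persistence_before() {
  touch "$1/PERSISTENCE_TEST"
  mtime "$1/level.dat"
}

# persistence_after WORLD_DIR EXPECTED_MTIME: confirm marker and timestamp survived.
persistence_after() {
  if [ ! -f "$1/PERSISTENCE_TEST" ]; then
    echo "FAIL: marker file missing"
    return 1
  fi
  actual=$(mtime "$1/level.dat")
  if [ "$actual" != "$2" ]; then
    echo "FAIL: level.dat mtime changed ($2 -> $actual)"
    return 1
  fi
  echo "PASS: world state preserved"
}

# Usage (paths are placeholders):
#   before=$(persistence_before /srv/minecraft/data/world)
#   ... stop, remove, and recreate the container ...
#   persistence_after /srv/minecraft/data/world "$before"
```
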
  • Publish your image to an ECR repository.
  • Baseline approach: Re-tag a trusted upstream Minecraft server image and publish it to your ECR repository.
    • Document upstream image provenance and why it is trusted.
  • Define and use an immutable tagging scheme (for example: mc-1.21.1-build7).
  • latest may exist, but deployments must pin a specific version tag.
  • You must have at least two distinct versioned images published to ECR to demonstrate upgrade and rollback.
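
One way to enforce the pinning rule above is to refuse any tag that does not match the versioning scheme before it reaches a deploy command. The sketch below assumes a scheme shaped like the example tag mc-1.21.1-build7; the pattern, the function names, and the repository URI in the usage comments are illustrative assumptions, so adapt them to the scheme you actually define.

```shell
#!/bin/sh
# Sketch: reject unpinned tags such as "latest" (names and pattern are hypothetical).

# is_pinned_tag TAG: succeed only for tags shaped like mc-<version>-build<N>.
is_pinned_tag() {
  printf '%s\n' "$1" | grep -Eq '^mc-[0-9]+(\.[0-9]+){1,2}-build[0-9]+$'
}

# image_ref REPO_URI TAG: print a pinned image reference, or fail loudly.
image_ref() {
  if ! is_pinned_tag "$2"; then
    echo "refusing unpinned tag: $2" >&2
    return 1
  fi
  printf '%s:%s\n' "$1" "$2"
}

# Typical re-tag and publish flow (all angle-bracket values are placeholders):
#   aws ecr get-login-password --region <region> \
#     | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
#   docker pull <upstream-image>:<upstream-tag>
#   docker tag <upstream-image>:<upstream-tag> "$(image_ref <repo-uri> mc-1.21.1-build7)"
#   docker push "$(image_ref <repo-uri> mc-1.21.1-build7)"
```

ECR can also enforce immutability on the repository side with aws ecr put-image-tag-mutability --image-tag-mutability IMMUTABLE. With that setting, an existing tag cannot be overwritten, which includes latest, so decide whether you want a moving latest tag before enabling it.
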
  • Provide a pre-change checklist that includes:
    • Backup world data to S3 before any upgrade (mandatory).
    • Verify both image versions exist in ECR.
  • Perform an upgrade from one pinned image version to another and validate success.
  • Perform rollback to the prior known-good image version without rebuilding from scratch.
  • Include post-change validation steps and expected healthy outcomes.
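
The pre-change checklist above can be made into an explicit gate that blocks the upgrade unless a backup has been recorded and both image versions are present. The following is a sketch under stated assumptions: the function names are hypothetical, the tag list is supplied by the operator (for instance from aws ecr describe-images with a query on imageTags and text output), and the "backup done" flag is a manual confirmation rather than an automated check.

```shell
#!/bin/sh
# Sketch: pre-change gate for upgrade and rollback (function names are hypothetical).

# has_tag "TAG LIST" TAG: succeed if TAG appears in the whitespace-separated list.
has_tag() {
  for t in $1; do
    if [ "$t" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# prechange_check "TAG LIST" CURRENT_TAG TARGET_TAG BACKUP_DONE(yes|no)
prechange_check() {
  tags=$1
  current=$2
  target=$3
  backup_done=$4
  if [ "$backup_done" != "yes" ]; then
    echo "BLOCKED: no S3 backup recorded for this change"
    return 1
  fi
  if [ "$current" = "$target" ]; then
    echo "BLOCKED: target tag equals current tag"
    return 1
  fi
  if ! has_tag "$tags" "$current"; then
    echo "BLOCKED: rollback image $current not found in ECR"
    return 1
  fi
  if ! has_tag "$tags" "$target"; then
    echo "BLOCKED: upgrade image $target not found in ECR"
    return 1
  fi
  echo "OK: upgrade $current -> $target (rollback target: $current)"
}

# Usage (repository name is a placeholder):
#   tags=$(aws ecr describe-images --repository-name <repo> \
#            --query 'imageDetails[].imageTags[]' --output text)
#   prechange_check "$tags" mc-1.21.1-build7 mc-1.21.1-build8 yes
```

Because the rollback target is checked before the upgrade starts, rollback becomes a matter of recreating the container with the previous pinned tag rather than rebuilding anything.
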
  • Create a backup artifact for world data and upload it to S3.
  • S3 bucket must be private.
  • Configure at least one S3 lifecycle rule (e.g., expire backups older than 7 days) and document why you chose that retention period.
  • Demonstrate restore from S3 backup onto the service.
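
The backup, restore, and retention requirements above could be sketched as three helpers: one that creates a timestamped archive of the world directory, one that unpacks it, and one that emits a lifecycle rule. The function names, the backups/ prefix, the rule ID, and the bucket placeholder are assumptions for illustration, and the 7-day value simply mirrors the example given above. The sketch also assumes the world directory is named world and that the server is stopped (or has flushed its saves) while the archive is taken.

```shell
#!/bin/sh
# Sketch: backup artifact, restore, and lifecycle rule (names are hypothetical).

# backup_world DATA_DIR OUT_DIR: archive DATA_DIR/world and print the artifact path.
backup_world() {
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  artifact="$2/world-$stamp.tar.gz"
  tar -czf "$artifact" -C "$1" world
  printf '%s\n' "$artifact"
}

# restore_world ARTIFACT DATA_DIR: unpack the archive so DATA_DIR/world exists again.
restore_world() {
  tar -xzf "$1" -C "$2"
}

# lifecycle_json DAYS PREFIX: print an S3 lifecycle configuration that expires old backups.
lifecycle_json() {
  days=$1
  prefix=$2
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "$prefix" },
      "Expiration": { "Days": $days }
    }
  ]
}
EOF
}

# Usage (bucket name is a placeholder):
#   artifact=$(backup_world /srv/minecraft/data /tmp)
#   aws s3 cp "$artifact" "s3://<bucket>/backups/"
#   lifecycle_json 7 backups/ > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration --bucket <bucket> \
#     --lifecycle-configuration file://lifecycle.json
```

For the restore demonstration, it is safer to stop the container and move the existing world directory aside before unpacking, so that a failed restore does not leave a mix of old and new files.
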

Your documentation must include:

  • Build + publish workflow (conceptual steps, not only a command dump).
  • ECR usage expectations and image tag policy.
  • Runtime architecture diagram (simple is acceptable).
  • Backup and restore runbook for S3.
  • Upgrade and rollback runbook with explicit checks.
  1. Operator Runbook (PDF) containing all required documentation sections and answers to the reflection questions above.
  2. Container deployment definition (docker-compose.yml or equivalent).
  3. ECR reference + versioning policy (repository URI and tag strategy), plus image provenance notes.
  4. Narrated screen recording (max 3 minutes) with a timestamp list for each checkpoint:
    1. Show running container, nmap reachability on 25565/tcp, and reboot auto-recovery.
    2. Persistence proof: place a marker, restart/recreate the container, show marker survives.
    3. Upgrade to a new ECR image version, then rollback to the previous version.
    4. S3 backup upload and restore demonstration.

Your server MOTD must include your name or student ID. Submit timestamps alongside the video.

Always refer to Canvas for the most up-to-date rubric information. Canvas's rubric will be used for grading.

Docker + ECR on EC2 (Total: 100 pts)
Criteria Ratings
Container runtime correctness (20 pts)
Scored on runtime definition and service operation: reproducible container run method (Compose optional), service starts on EC2, auto-recovers after EC2 reboot, and clients can reach 25565/tcp. Video evidence required for reboot auto-recovery.
  • Exemplary (20 pts): Deployment definition is complete and reproducible; service starts cleanly, returns automatically after host reboot (shown in video), and reachability evidence is clear and correct.
  • Proficient (16 pts): Runtime and reachability mostly work; reboot auto-start is functional but video evidence has minor gaps.
  • Developing (10 pts): Partial runtime setup works, but major gaps remain in reboot behavior, deployment reproducibility, or reachability proof.
  • Insufficient (0 pts): Service is not reproducibly runnable in container form, does not auto-return after reboot, or reachability is not demonstrated.
Persistence boundary design (20 pts)
Scored on state handling: world data is externalized from the image, boundaries are explicit, and persistence is proven across container restart/recreate. Video evidence required showing world survives container lifecycle.
  • Exemplary (20 pts): State vs image boundary is explicit and correct; video evidence clearly shows world continuity across container restart/recreate.
  • Proficient (16 pts): Persistence approach is mostly correct; mechanism works but documentation or video proof has minor ambiguity.
  • Developing (10 pts): Some persistence mechanism exists, but boundary confusion or weak validation leaves reliability uncertain.
  • Insufficient (0 pts): World state is effectively tied to container lifecycle or persistence proof is missing.
ECR publishing and version discipline (20 pts)
Scored on artifact workflow: image is published to ECR, at least two distinct versioned images exist, tags are immutable and meaningful, and deployments pin explicit versions (not latest-only). Image provenance documented.
  • Exemplary (20 pts): ECR workflow is reproducible; at least two versioned images exist; version scheme is consistent and immutable; pinned deployments support deterministic rollback; provenance documented.
  • Proficient (16 pts): ECR use and versioning are mostly correct; two images exist but minor gaps in consistency, pinning practice, or provenance notes.
  • Developing (10 pts): ECR publication occurs but fewer than two images, version policy is weak, or pinning/provenance is incomplete.
  • Insufficient (0 pts): No credible ECR publish workflow, no version discipline for controlled deployment, or fewer than two images for rollback.
Upgrade and rollback execution (15 pts)
Scored on change process execution: pre-change checklist includes mandatory S3 backup, successful upgrade validation shown in video, and rollback to known-good version demonstrated without rebuild.
  • Exemplary (15 pts): Pre-change checklist is explicit and includes S3 backup; video shows upgrade execution and validation; rollback is executed and restores known-good behavior.
  • Proficient (12 pts): Change workflow is mostly complete; upgrade and rollback work but one area has weaker evidence (pre-check detail, validation, or rollback proof).
  • Developing (8 pts): Partial change workflow exists, but upgrade/rollback is incomplete, pre-change checklist missing S3 backup, or validation insufficient.
  • Insufficient (0 pts): No operationally credible upgrade+rollback procedure is demonstrated, or mandatory S3 backup step is missing from checklist.
S3 backup, restore, and retention (15 pts)
Scored on data protection: backup upload to private S3, defined retention/lifecycle policy, and verified restore path shown in video.
  • Exemplary (15 pts): Backups are uploaded to private S3, lifecycle policy is defined, and restore is successfully demonstrated in video.
  • Proficient (12 pts): Backup and restore workflow is mostly correct; one requirement is weak or partially evidenced (privacy, retention, or restore validation).
  • Developing (8 pts): Backup-related steps exist but privacy, retention, or restore validation is incomplete or not demonstrated.
  • Insufficient (0 pts): No credible private S3 backup/restore process is demonstrated.
Operator documentation quality (10 pts)
Scored on runbook usability: contains all required sections (build/publish workflow, ECR usage, architecture diagram, S3 runbook, upgrade/rollback runbook), and executable procedures a new TA/operator can follow without guesswork.
  • Exemplary (10 pts): Runbook is clear, ordered, and operator-ready; all required sections are complete, actionable, and conceptual rather than command-dump.
  • Proficient (8 pts): Runbook is usable with minor ambiguity or one small missing detail; most required sections are complete.
  • Developing (5 pts): Runbook is partially usable but requires significant inference in multiple sections or missing required content.
  • Insufficient (0 pts): Documentation is incomplete, unclear, or not executable by another operator; multiple required sections missing.

Extra credit

  • Custom Dockerfile (+5): Build your own container image using a Dockerfile instead of re-tagging an upstream image. Include a justified base image choice and at least one hardening step (e.g., non-root user, minimal layers). Document your build choices.
  • In-service administration (+5): Configure secure remote administration (e.g., RCON with authentication) and execute at least 3 admin commands without stopping the container. Include safety notes about when admin intervention is appropriate vs. when a controlled restart is safer.

Extra credit must stay within this assignment’s scope (no orchestration, IaC frameworks, or CI/CD pipelines).