Docker + ECR on EC2
The server crashed during the CEO’s first session. He lost a diamond pickaxe. An all-hands email with the subject line “ACCOUNTABILITY” was sent at 11:47 PM. The phrase “enterprise-grade reliability” now appears in your quarterly objectives.
Your manual setup works, but it drifts every time someone touches it, upgrades break in unpredictable ways, and recovery is a prayer-based workflow. Obsidian Dynamics now requires containerized, versioned, and recoverable operations that another operator can execute without improvisation.
Learning Objectives
- Package a stateful service in Docker while preserving data correctly.
- Publish and consume immutable image versions from ECR.
- Execute safe upgrade and rollback procedures with explicit checks.
- Implement S3 backup/restore with a defined retention policy.
Constraints (AWS Academy)
- Compute remains on EC2.
- Container images are stored in ECR.
- Backups are stored in a private S3 bucket.
- Security Group exposure must stay minimal and justified.
Requirements
Runtime Definition (Container + Service)
- Provide a container runtime definition that runs the Minecraft server on EC2.
  - `docker compose` is allowed but not required for a single container.
  - Plain `docker run`, Podman, or equivalent is acceptable if reproducible.
- Service must be reachable by clients on `25565/tcp`.
  - Verify reachability using: `nmap -sV -Pn -p T:25565 <instance_public_ip>`
- Runtime configuration is externalized (environment variables and/or mounted config).
- Service must come back automatically after host reboot.
  - Acceptable mechanisms include container restart policies and/or generated `systemd` units (`docker`/`podman` workflows).
  - You do not need to hand-write a `systemd` unit if your runtime tooling generates one.
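As one possible shape for the runtime definition above, the sketch below assembles a plain `docker run` invocation with a restart policy, an externalized environment file, and a host-mounted data directory. The image URI, the host paths, the container name `mc`, and the `/data` mount point are all assumptions for illustration (the mount point in particular depends on the upstream image you choose). The function only prints the command so it can be reviewed before anyone runs it.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not a required name.
set -eu

IMAGE="${IMAGE:-<account_id>.dkr.ecr.<region>.amazonaws.com/minecraft:mc-1.21.1-build7}"
WORLD_DIR="${WORLD_DIR:-/srv/minecraft/data}"    # host directory holding world state
ENV_FILE="${ENV_FILE:-/srv/minecraft/server.env}" # externalized runtime configuration

# Print the docker run command rather than executing it.
mc_run_cmd() {
  printf '%s ' docker run -d --name mc \
    --restart unless-stopped \
    -p 25565:25565/tcp \
    --env-file "$ENV_FILE" \
    -v "$WORLD_DIR:/data" \
    "$IMAGE"
  printf '\n'
}

mc_run_cmd
```

A restart policy only helps after a reboot if the Docker daemon itself starts at boot (for example, `systemctl enable docker` on a systemd host), so that is worth checking as part of the reboot test.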
State Boundary and Persistence
- World data is stored outside the container image (host volume or bind mount).
- Demonstrate persistence by restarting/recreating the container while preserving world state.
  - Verification approach:
    - Create a marker file in the world directory: `touch world/PERSISTENCE_TEST`
    - Note the modification time of `level.dat`: `stat -c '%Y' world/level.dat` (or `stat -f '%m'` on macOS)
    - Stop, remove, and recreate the container.
    - Verify the marker file still exists and the `level.dat` timestamp is unchanged.
- Clearly document what is stateful vs immutable.
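The verification steps above could be wrapped in two small helpers, sketched here under the assumption of a Linux host with GNU `stat` and a container named `mc` (both hypothetical choices). `record_state` places the marker and captures the `level.dat` timestamp; `verify_state` checks both after the container has been recreated.

```shell
#!/bin/sh
# Sketch of the persistence check; WORLD_DIR and MARKER are placeholder names.
set -eu

WORLD_DIR="${WORLD_DIR:-./world}"
MARKER="PERSISTENCE_TEST"

# Create the marker and print the current level.dat mtime (seconds since epoch).
# GNU stat syntax; on macOS use: stat -f '%m'
record_state() {
  touch "$1/$MARKER"
  stat -c '%Y' "$1/level.dat"
}

# Succeed only if the marker still exists and the mtime equals the recorded value.
verify_state() {
  [ -f "$1/$MARKER" ] || { echo "FAIL: marker missing"; return 1; }
  [ "$(stat -c '%Y' "$1/level.dat")" = "$2" ] || { echo "FAIL: level.dat mtime changed"; return 1; }
  echo "OK: world state preserved"
}

# Usage, run by hand around the recreate step:
#   before="$(record_state "$WORLD_DIR")"
#   docker stop mc && docker rm mc    # then start a new container from the same pinned image
#   verify_state "$WORLD_DIR" "$before"
```

If your server rewrites `level.dat` during shutdown or autosave, the timestamp comparison is most meaningful when the first reading is taken after the container has stopped.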
ECR Publishing and Version Discipline
- Publish your image to an ECR repository.
- Baseline approach: Re-tag a trusted upstream Minecraft server image and publish it to your ECR repository.
- Document upstream image provenance and why it is trusted.
- Define and use an immutable tagging scheme (for example: `mc-1.21.1-build7`).
- `latest` may exist, but deployments must pin a specific version tag.
- You must have at least two distinct versioned images published to ECR to demonstrate upgrade and rollback.
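A minimal sketch of the re-tag-and-publish flow is shown below. The account ID, region, repository name, and the tag pattern enforced by `valid_tag` are assumptions chosen to match the example tag above; adapt them to your own scheme. Nothing is pushed until `publish` is called explicitly.

```shell
#!/bin/sh
# Sketch only: ACCOUNT_ID, REGION, and REPO are placeholder values.
set -eu

ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="${REGION:-us-east-1}"
REPO="${REPO:-minecraft}"
REGISTRY="$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"

# Accept only tags shaped like mc-<major>.<minor>[.<patch>]-build<N>.
# This rejects floating tags such as "latest".
valid_tag() {
  printf '%s\n' "$1" | grep -Eq '^mc-[0-9]+\.[0-9]+(\.[0-9]+)?-build[0-9]+$'
}

# Full image reference for a given version tag.
image_uri() {
  echo "$REGISTRY/$REPO:$1"
}

# Log in to ECR, then pull, re-tag, and push the upstream image under the pinned tag.
publish() {
  valid_tag "$2" || { echo "refusing non-versioned tag: $2" >&2; return 1; }
  aws ecr get-login-password --region "$REGION" |
    docker login --username AWS --password-stdin "$REGISTRY"
  docker pull "$1"
  docker tag "$1" "$(image_uri "$2")"
  docker push "$(image_uri "$2")"
}

# Usage: publish <upstream_image>:<upstream_tag> mc-1.21.1-build7
```

ECR can also reject overwrites of existing tags at the repository level (`aws ecr put-image-tag-mutability --repository-name <repo> --image-tag-mutability IMMUTABLE`). Whether that setting suits you depends on whether you also want to keep re-pushing a floating `latest` tag.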
Safe Upgrade and Rollback Workflow
- Provide a pre-change checklist that includes:
  - Back up world data to S3 before any upgrade (mandatory).
  - Verify both image versions exist in ECR.
- Perform an upgrade from one pinned image version to another and validate success.
- Perform rollback to the prior known-good image version without rebuilding from scratch.
- Include post-change validation steps and expected healthy outcomes.
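One way to make upgrade and rollback the same repeatable operation is sketched below: a single `switch_to` function that replaces the container with one built from a given pinned tag, plus a `require_tag` pre-check against ECR. The registry placeholder, the container name `mc`, the host path, and the `build8` tag in the usage notes are hypothetical. Commands are only echoed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of a pinned-version switch; dry-run is the default.
set -eu

REPO="${REPO:-minecraft}"
REGISTRY="${REGISTRY:-<account_id>.dkr.ecr.<region>.amazonaws.com}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode; execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Pre-change check: the call fails if the tag is not present in the repository.
require_tag() {
  run aws ecr describe-images --repository-name "$REPO" --image-ids "imageTag=$1"
}

# Replace the running container with one from the given pinned tag.
# World data stays on the host, so upgrade and rollback use the same steps.
switch_to() {
  run docker pull "$REGISTRY/$REPO:$1"
  run docker stop mc
  run docker rm mc
  run docker run -d --name mc --restart unless-stopped -p 25565:25565/tcp \
    -v /srv/minecraft/data:/data "$REGISTRY/$REPO:$1"
}

# Usage:
#   require_tag mc-1.21.1-build7 && require_tag mc-1.21.1-build8
#   (take and verify the S3 backup here)
#   DRY_RUN=0 switch_to mc-1.21.1-build8    # upgrade
#   DRY_RUN=0 switch_to mc-1.21.1-build7    # rollback
```

If the newer server version has already modified the world data, switching the image back may not be enough on its own; in that case the pre-change backup is the recovery path, which is one reason the checklist makes it mandatory.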
Backup/Restore to S3
- Create a backup artifact for world data and upload it to S3.
- S3 bucket must be private.
- Configure at least one S3 lifecycle rule (e.g., expire backups older than 7 days) and document why you chose that retention period.
- Demonstrate restore from S3 backup onto the service.
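The backup, retention, and restore requirements above might be covered by helpers along these lines. The bucket name, the `backups/` prefix, the host data path, and the rule ID are placeholders, and the 7-day expiry simply mirrors the example retention period; the archive and restore functions use only `tar`, while the S3 calls are left as commented usage so nothing is uploaded by accident.

```shell
#!/bin/sh
# Sketch only: BUCKET and DATA_DIR are placeholder values.
set -eu

BUCKET="${BUCKET:-<your-private-backup-bucket>}"
DATA_DIR="${DATA_DIR:-/srv/minecraft/data}"

# Archive <src_parent>/world into <out_dir> with a UTC timestamp; print the archive path.
make_backup() {
  archive="$2/world-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  tar -czf "$archive" -C "$1" world
  echo "$archive"
}

# Unpack an archive so that <dest_parent>/world is recreated.
restore_backup() {
  tar -xzf "$1" -C "$2"
}

# Write a lifecycle rule that expires objects under backups/ after 7 days.
write_lifecycle() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Expiration": { "Days": 7 }
    }
  ]
}
EOF
}

# Usage (stop the container first so files do not change mid-archive):
#   docker stop mc
#   a="$(make_backup "$DATA_DIR" /tmp)"
#   aws s3 cp "$a" "s3://$BUCKET/backups/"
#   write_lifecycle /tmp/lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration --bucket "$BUCKET" \
#     --lifecycle-configuration file:///tmp/lifecycle.json
#   aws s3 cp "s3://$BUCKET/backups/$(basename "$a")" /tmp/restore.tar.gz
#   restore_backup /tmp/restore.tar.gz "$DATA_DIR" && docker start mc
```

Restoring into a scratch directory first and inspecting the result is a cheap way to confirm the archive is usable before overwriting live world data.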
Operator Documentation
Your documentation must include:
- Build + publish workflow (conceptual steps, not command dump only).
- ECR usage expectations and image tag policy.
- Runtime architecture diagram (simple is acceptable).
- Backup and restore runbook for S3.
- Upgrade and rollback runbook with explicit checks.
What You’ll Submit
- Operator Runbook (PDF) containing all required documentation sections and answers to the reflection questions above.
- Container deployment definition (`docker-compose.yml` or equivalent).
- ECR reference + versioning policy (repository URI and tag strategy), plus image provenance notes.
- Narrated screen recording (max 3 minutes) with a timestamp list for each checkpoint:
  - Show running container, `nmap` reachability on 25565/tcp, and reboot auto-recovery.
  - Persistence proof: place a marker, restart/recreate the container, show marker survives.
  - Upgrade to a new ECR image version, then roll back to the previous version.
  - S3 backup upload and restore demonstration.
Your server MOTD must include your name or student ID. Submit timestamps alongside the video.
Rubric
Always refer to Canvas for the most up-to-date rubric information. Canvas's rubric will be used for grading.
| Criteria | Exemplary | Proficient | Developing | Insufficient |
|---|---|---|---|---|
| Container runtime correctness (20). Scored on runtime definition and service operation: reproducible container run method (Compose optional), service starts on EC2, auto-recovers after EC2 reboot, and clients can reach `25565/tcp`. Video evidence required for reboot auto-recovery. | 20 pts. Deployment definition is complete and reproducible; service starts cleanly, returns automatically after host reboot (shown in video), and reachability evidence is clear and correct. | 16 pts. Runtime and reachability mostly work; reboot auto-start is functional but video evidence has minor gaps. | 10 pts. Partial runtime setup works, but major gaps remain in reboot behavior, deployment reproducibility, or reachability proof. | 0 pts. Service is not reproducibly runnable in container form, does not auto-return after reboot, or reachability is not demonstrated. |
| Persistence boundary design (20). Scored on state handling: world data is externalized from the image, boundaries are explicit, and persistence is proven across container restart/recreate. Video evidence required showing world survives container lifecycle. | 20 pts. State vs image boundary is explicit and correct; video evidence clearly shows world continuity across container restart/recreate. | 16 pts. Persistence approach is mostly correct; mechanism works but documentation or video proof has minor ambiguity. | 10 pts. Some persistence mechanism exists, but boundary confusion or weak validation leaves reliability uncertain. | 0 pts. World state is effectively tied to container lifecycle or persistence proof is missing. |
| ECR publishing and version discipline (20). Scored on artifact workflow: image is published to ECR, at least two distinct versioned images exist, tags are immutable and meaningful, and deployments pin explicit versions (not latest-only). Image provenance documented. | 20 pts. ECR workflow is reproducible; at least two versioned images exist; version scheme is consistent and immutable; pinned deployments support deterministic rollback; provenance documented. | 16 pts. ECR use and versioning are mostly correct; two images exist but minor gaps in consistency, pinning practice, or provenance notes. | 10 pts. ECR publication occurs but fewer than two images, version policy is weak, or pinning/provenance is incomplete. | 0 pts. No credible ECR publish workflow, no version discipline for controlled deployment, or fewer than two images for rollback. |
| Upgrade and rollback execution (15). Scored on change process execution: pre-change checklist includes mandatory S3 backup, successful upgrade validation shown in video, and rollback to known-good version demonstrated without rebuild. | 15 pts. Pre-change checklist is explicit and includes S3 backup; video shows upgrade execution and validation; rollback is executed and restores known-good behavior. | 12 pts. Change workflow is mostly complete; upgrade and rollback work but one area has weaker evidence (pre-check detail, validation, or rollback proof). | 8 pts. Partial change workflow exists, but upgrade/rollback is incomplete, pre-change checklist missing S3 backup, or validation insufficient. | 0 pts. No operationally credible upgrade+rollback procedure is demonstrated, or mandatory S3 backup step is missing from checklist. |
| S3 backup, restore, and retention (15). Scored on data protection: backup upload to private S3, defined retention/lifecycle policy, and verified restore path shown in video. | 15 pts. Backups are uploaded to private S3, lifecycle policy is defined, and restore is successfully demonstrated in video. | 12 pts. Backup and restore workflow is mostly correct; one requirement is weak or partially evidenced (privacy, retention, or restore validation). | 8 pts. Backup-related steps exist but privacy, retention, or restore validation is incomplete or not demonstrated. | 0 pts. No credible private S3 backup/restore process is demonstrated. |
| Operator documentation quality (10). Scored on runbook usability: contains all required sections (build/publish workflow, ECR usage, architecture diagram, S3 runbook, upgrade/rollback runbook), and executable procedures a new TA/operator can follow without guesswork. | 10 pts. Runbook is clear, ordered, and operator-ready; all required sections are complete, actionable, and conceptual rather than command-dump. | 8 pts. Runbook is usable with minor ambiguity or one small missing detail; most required sections are complete. | 5 pts. Runbook is partially usable but requires significant inference in multiple sections or missing required content. | 0 pts. Documentation is incomplete, unclear, or not executable by another operator; multiple required sections missing. |
Extra Credit (up to +10)
- Custom Dockerfile (+5): Build your own container image using a Dockerfile instead of re-tagging an upstream image. Include a justified base image choice and at least one hardening step (e.g., non-root user, minimal layers). Document your build choices.
- In-service administration (+5): Configure secure remote administration (e.g., RCON with authentication) and execute at least 3 admin commands without stopping the container. Include safety notes about when admin intervention is appropriate vs. when a controlled restart is safer.
Extra credit must stay within this assignment’s scope (no orchestration, IaC frameworks, or CI/CD pipelines).