How to explain to students
Ask the room: "Imagine you copy-paste a deploy on Friday at 5pm. Something breaks. It's now 7pm and you don't remember which file you edited. What happens next?" That's the world without CI/CD. CI = Continuous Integration: every push runs the tests automatically. CD = Continuous Delivery / Deployment: every green build can be pushed to production with one click — or zero, if you trust your tests.
The numbers come from DORA's 2021 Accelerate State of DevOps report (the research program behind the book Accelerate): elite teams deploy 973× more frequently than low performers, with a 3× lower change failure rate. CI/CD is the automation that makes that deploy frequency safe.
🎯 Practice Questions
CD (Continuous Delivery): every green build is automatically packaged into a deployable artifact (Docker image, JAR, etc.) and pushed to a staging environment. A human still clicks "deploy to production."
CD (Continuous Deployment): same as above, but production deploys are automatic too — no human in the loop. Requires very strong test coverage and observability. Most teams stop at Continuous Delivery.
How to explain to students
Every CI/CD pipeline is a directed sequence of stages. Earlier stages are cheap and fast (linting); later stages are slow and expensive (deploying to production). Fail-fast: don't waste 20 minutes building a Docker image when npm run lint would have caught the bug in 5 seconds.
The canonical sequence: checkout → install deps → lint → test → build → scan → publish artifact → deploy. Some pipelines add quality gates between stages — for example, "block production deploy unless code coverage ≥ 80%."
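A minimal GitHub Actions sketch of the first few stages in fail-fast order follows. It assumes a Node.js project with lint and test scripts in package.json and a Dockerfile at the repo root; the image name my-app is a placeholder. Each job waits on the previous one through needs:, so a lint failure stops the run before the slower jobs start.

```yaml
# .github/workflows/pipeline.yml (hypothetical file name)
name: pipeline
on:
  push:
    branches: [main]

jobs:
  lint:                        # cheapest check runs first
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint

  test:
    needs: lint                # skipped if lint fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test

  build:
    needs: test                # the expensive image build only runs on green tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t my-app:${{ github.sha }} .   # my-app is a placeholder name
```

The trade-off: chaining every job serially makes a fully green run slower, because each job starts on a fresh runner and repeats checkout and npm ci. Many teams run lint and test in parallel and gate only the expensive build with needs: [lint, test].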
🎯 Practice Questions
Why does npm ci appear in CI pipelines instead of npm install? What's the difference?
npm install resolves dependencies from package.json and may rewrite package-lock.json along the way. npm ci ("clean install") never writes to the lock file: it does an exact, reproducible install based on package-lock.json alone. If the lock file is out of sync with package.json, npm ci fails the build.
Why CI prefers npm ci: reproducibility. Every build gets the exact same dependency tree, whereas npm install can rewrite the lock file and ship a different bundle than yesterday's.
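In a GitHub Actions job, the usual pairing is actions/setup-node with its built-in npm cache, followed by npm ci. A minimal steps fragment, assuming package-lock.json is committed at the repo root and Node 20 is the target (both are assumptions):

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20
      cache: npm          # caches npm's download cache, keyed on the lock file hash
  - run: npm ci           # fails if package.json and package-lock.json disagree
```

The cache holds npm's download cache (~/.npm) rather than node_modules, because npm ci deletes any existing node_modules before installing.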
Put these steps in pipeline order: npm test, npm run lint, docker build, terraform plan, aws ecs update-service. Why does this order matter?
How to explain to students
There are three big strategies that show up in almost every interview. Use the traffic-light analogy: picture a busy intersection switching from an old traffic pattern to a new one.
Rolling: replace one server at a time. Default for ECS, Kubernetes, Beanstalk. Cheap. Slow rollback.
Blue-Green: spin up a full second environment ("green"), test it, then switch the load balancer over. Instant rollback. Doubles infra cost during deploy.
Canary: route 1% → 10% → 50% → 100% of traffic to the new version, watching error rates. Best for risky changes. Most complex.
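On ECS, a rolling deploy is tuned through the service's deployment configuration. The CloudFormation fragment below is a sketch: the resource names and cluster name are hypothetical, and other properties a real service needs (network configuration, load balancers, and so on) are omitted.

```yaml
MyService:                              # hypothetical logical ID
  Type: AWS::ECS::Service
  Properties:
    Cluster: my-cluster                 # hypothetical cluster name
    TaskDefinition: !Ref MyTaskDefinition
    DesiredCount: 4
    DeploymentConfiguration:
      MinimumHealthyPercent: 100        # never drop below 4 healthy tasks
      MaximumPercent: 200               # allow up to 8 tasks while old and new overlap
      DeploymentCircuitBreaker:
        Enable: true                    # stop a rollout whose new tasks keep failing
        Rollback: true                  # and return to the last working deployment
```

ECS replaces tasks in batches bounded by these two percentages, so "one server at a time" is the conservative end of the dial rather than a fixed rule.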
🎯 Practice Questions
Percentage-based canary routing needs traffic-splitting infrastructure in front of the app. Common options:
1. ALB weighted target groups (CodeDeploy can drive this).
2. App Mesh / Istio / Linkerd for fine-grained percentage routing.
3. API Gateway canary releases for serverless/REST APIs.
Without one of these, you can only do "all-or-nothing" deploys. The percentage-based shifting is the load balancer's job, not the application's.
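Option 1 can be sketched in CloudFormation: an ALB listener whose forward action splits traffic by weight across two target groups, here 90/10. The resource names are hypothetical; a real rollout would change the weights in steps (by hand or via CodeDeploy) while watching error rates.

```yaml
HttpListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref AppLoadBalancer          # hypothetical ALB resource
    Port: 80
    Protocol: HTTP
    DefaultActions:
      - Type: forward
        ForwardConfig:
          TargetGroups:
            - TargetGroupArn: !Ref StableTargetGroup   # current version
              Weight: 90
            - TargetGroupArn: !Ref CanaryTargetGroup   # new version
              Weight: 10
```

Weights are relative, so setting the canary to 100 and the stable group to 0 completes the shift, and swapping them back is the rollback.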
How to explain to students
A workflow is one YAML file in .github/workflows/. It contains one or more jobs. Each job runs on a fresh runner (a clean VM). Jobs run in parallel by default; use needs: to make one wait for another. Inside a job, steps run sequentially.
The trigger is the on: key. Common options: push, pull_request, schedule (cron), workflow_dispatch (manual button), release. You can also filter by branch (branches: [main]) or path (paths: ['src/**']).
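A sketch of an on: block that combines these triggers. The branch, path, and schedule values are illustrative, not requirements.

```yaml
on:
  push:
    branches: [main]
    paths: ['src/**']        # skip runs when only docs or other folders change
  pull_request:
    branches: [main]         # PRs whose base branch is main
  schedule:
    - cron: '30 6 * * 1'     # Mondays 06:30; scheduled workflows always use UTC
  workflow_dispatch:         # adds a manual "Run workflow" button in the Actions tab
```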
🎯 Practice Questions
Two jobs in the same workflow have no needs: between them. Do they run in parallel or sequentially? On the same runner or different runners?
In parallel, each on its own fresh runner (whatever runs-on says). They share no filesystem state by default: a file written in job A is invisible to job B unless you upload it as an artifact.
If you need a strict order, add needs: [job-a] to the dependent job. If you need to share files, use actions/upload-artifact + actions/download-artifact.
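A minimal two-job sketch of that hand-off. The build step is a placeholder that just writes a file; the job names and the dist/ path are illustrative.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: mkdir -p dist && echo "built at $(date -u)" > dist/build-info.txt   # placeholder build
      - uses: actions/upload-artifact@v4
        with:
          name: dist              # the name the other job asks for
          path: dist/

  report:
    needs: build                  # strict order: wait for build to finish
    runs-on: ubuntu-latest        # a different, fresh runner
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - run: cat dist/build-info.txt
```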
Write the on: block for a workflow that runs on every push to main, every pull request targeting main, and at 3am UTC daily.
Add a paths filter so the workflow only runs when src/ or package*.json change. Why does this matter for monorepos?
A teammate's workflow has uses: actions/checkout@main. You convince them to switch to @v4. What's the security risk of pinning to a moving branch?
How to explain to students
Matrix builds are how you say "run this job for every combination of these inputs". Most common: testing a Node.js library against Node 18, 20, and 22 simultaneously. Or a Python tool on Ubuntu, macOS, and Windows. GitHub spins up a runner per combination — all in parallel.
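A sketch of the Node.js example crossed with three operating systems, assuming an npm project with a test script. The excluded pair is arbitrary and only there to show the syntax.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false                 # let every combination finish
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node: [18, 20, 22]
        exclude:
          - os: windows-latest         # drop one combination (illustrative)
            node: 18
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm test
```

3 × 3 = 9 combinations, minus the one excluded, gives 8 jobs running in parallel. Inside the steps, matrix.os and matrix.node hold the current combination's values.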
🎯 Practice Questions
matrix.python reference in actions/setup-python@v5.
Your matrix has many combinations you don't need. Is it better to keep using exclude heavily, or to rewrite it with include?
Usually include: it lets you specify the exact combinations.
Why is fail-fast: false usually the right choice for a matrix that tests cross-OS compatibility?
With fail-fast: true (the default), the first failing combination cancels all the others. You then know that one failed, but not whether the others would have passed or failed too.
For cross-OS / cross-version compatibility tests, you want the full picture: "Windows + Node 22 fails, but everything else passes" tells you it's a Windows-specific issue. Setting fail-fast: false lets every combination run to completion so you see all the data at once.
For deploy pipelines (where you only need one green run), the default fail-fast: true is correct: there's no value in burning compute on combinations you'll cancel anyway.
How to explain to students
Never paste secrets into YAML. GitHub provides three layers: repo-level secrets, environment-level secrets (with optional approval gates), and org-level secrets (shared across all repos). Always use ${{ secrets.NAME }} in workflows.
Environments are how you wire approval gates: a job targeting the production environment can require a human reviewer to click "approve" before it runs. Reusable workflows let you DRY up shared logic: one "deploy.yml" reused by 10 microservices.
Practical: A reusable workflow that 10 repos can call
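A minimal sketch, assuming a shared repo named my-org/shared-workflows tagged v1, a DEPLOY_TOKEN secret, and a deploy script at scripts/deploy.sh in each calling repo (all hypothetical). The first YAML document is the reusable workflow; the second is the short file each service repo would contain.

```yaml
# File 1: .github/workflows/deploy.yml in my-org/shared-workflows (hypothetical repo)
on:
  workflow_call:
    inputs:
      service-name:
        type: string
        required: true
      environment:
        type: string
        default: staging
    secrets:
      DEPLOY_TOKEN:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}     # approval gates on this environment apply here
    steps:
      - uses: actions/checkout@v4              # checks out the calling repo
      - name: Deploy
        env:
          SERVICE_NAME: ${{ inputs.service-name }}
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
        run: ./scripts/deploy.sh "$SERVICE_NAME"   # hypothetical script
---
# File 2: .github/workflows/release.yml in each of the 10 service repos
on:
  push:
    branches: [main]

jobs:
  deploy:
    uses: my-org/shared-workflows/.github/workflows/deploy.yml@v1
    with:
      service-name: payments-api               # hypothetical service name
      environment: production
    secrets:
      DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```

Inputs are passed through env: rather than interpolated straight into run:, which avoids shell injection from untrusted values. Callers in the same organization can also write secrets: inherit instead of mapping each secret by name.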
🎯 Practice Questions
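A developer pastes AWS_ACCESS_KEY=AKIA... directly into a workflow YAML and pushes. They realise the mistake 30 seconds later and force-push to remove it. Are they safe? Why or why not?
No. The key has already escaped:
1. On GitHub itself: the force-pushed commit is no longer on the branch, but it remains reachable by its SHA (push events reference it) until GitHub garbage-collects it.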
2. In any clone any teammate or CI runner pulled before the rewrite.
3. In cached search-engine results if the repo is public.
4. Bots scrape commits in real time — automated AWS-key scanners detect leaks within minutes and abuse them.
Real fix: immediately rotate the credential in AWS (delete the IAM access key), generate a new one, store it in GitHub Secrets, and review CloudTrail for any unauthorized usage. Then enable GitHub secret scanning with push protection so pushes containing recognizable credentials are blocked before they land.
A secret printed with echo $MY_SECRET in a step appears as *** in the logs. Does that mean it's safe? What if you base64-encoded it first and printed that?
AWS_SECRET_ACCESS_KEY in sight
How to explain to students
The old way: create an IAM user, generate an access key, paste it into secrets.AWS_SECRET_ACCESS_KEY, and hope nobody leaks it. The key never expires, and if it does leak, it stays usable until someone notices and rotates it, while automated scanners can find and abuse a leaked key within minutes.
The new way: OpenID Connect (OIDC) federation. GitHub mints a short-lived JWT for each workflow run; AWS verifies that the JWT was signed by GitHub and that it matches a trusted repo + branch; AWS then returns a short-lived STS credential (one hour by default with aws-actions/configure-aws-credentials, configurable down to 15 minutes). No long-lived secrets exist anywhere. This is the pattern AWS currently recommends.
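A minimal job using aws-actions/configure-aws-credentials, assuming an IAM role (the ARN below is a placeholder) whose trust policy already accepts GitHub's OIDC provider for this repo. No AWS key appears anywhere in the workflow.

```yaml
jobs:
  whoami:
    runs-on: ubuntu-latest
    permissions:
      id-token: write        # lets the runner request GitHub's OIDC JWT
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy   # placeholder ARN
          aws-region: us-east-1
          role-duration-seconds: 900    # request a 15-minute session
      - run: aws sts get-caller-identity   # shows the assumed-role identity
```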
🎯 Practice Questions
If the OIDC step fails, the most likely cause is a missing permissions: id-token: write block at the workflow or job level. Without it, GitHub will not mint the OIDC JWT for the runner, and aws-actions/configure-aws-credentials has no token to exchange. Add at the top of the workflow:
permissions:
  id-token: write
  contents: read
Other common causes: the trust policy's sub condition doesn't match the repo / branch / environment that's actually running.
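Write a trust-policy sub condition that only allows workflows running on main in my-org/my-app.
Why do pull request runs get a different sub claim than branch pushes?
How to explain to students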
YAML indentation is one of the most common sources of CI failures. AI is great at generating workflow scaffolds and debugging the cryptic errors, but it can also confidently invent action names that don't exist. Verify by opening the action's GitHub page: if it 404s, the AI hallucinated.
🎯 Practice Questions
An AI assistant suggests a step with uses: super-actions/awesome-deployer@latest. List two red flags before you merge.
1. The action may not exist. Open github.com/super-actions/awesome-deployer; if it 404s, it's a hallucination.
2. @latest has no special meaning in GitHub Actions. Actions are pinned by branch, tag, or commit SHA, so @latest only resolves if the repo happens to have a branch or tag literally named latest, which its owner can move at any time. Even if it behaved like "default branch", that's a supply-chain risk: anyone who compromises the action's repo gets RCE in your CI.
Fix: verify the action exists, read its README, then pin to a specific version tag (@v3) or, for high-security CI, a full commit SHA (@a1b2c3d...) so even tag retargeting can't compromise you.
Why should you never paste the value of secrets.AWS_SECRET_ACCESS_KEY into an AI tool, even when asking for help?
How to explain to students
Walk through this on screen, then have students recreate it on their own Node app. The workflow combines: matrix testing, Docker build, OIDC AWS auth, ECR push, ECS deploy, environment-gated production approval. This is what a real team's main.yml looks like.
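A condensed sketch of such a main.yml. Everything AWS-specific is an assumption to replace: the role ARN, the region, the ECR repository my-app, the cluster my-cluster, the service my-app-service, and a task-definition.json committed in the repo. The approval gate only applies if required reviewers are configured on the production environment in the repository settings.

```yaml
# .github/workflows/main.yml
name: main
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        node: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm test

  deploy:
    needs: test                        # every matrix combination must pass
    if: github.event_name == 'push'    # PRs only test; pushes to main also deploy
    runs-on: ubuntu-latest
    environment: production            # pauses for the environment's reviewers
    permissions:
      id-token: write                  # required for the OIDC exchange
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy   # placeholder
          aws-region: us-east-1
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - id: image
        name: Build and push image
        env:
          IMAGE: ${{ steps.ecr.outputs.registry }}/my-app:${{ github.sha }}   # hypothetical ECR repo
        run: |
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          echo "uri=$IMAGE" >> "$GITHUB_OUTPUT"
      - id: taskdef
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json   # hypothetical file in the repo
          container-name: my-app
          image: ${{ steps.image.outputs.uri }}
      - uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.taskdef.outputs.task-definition }}
          cluster: my-cluster
          service: my-app-service
          wait-for-service-stability: true
```

Pull requests run only the test matrix. A push to main runs the tests, waits at the production environment until a reviewer approves, then builds and pushes the image and rolls out the new task definition.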
Sample quiz questions
If two jobs have no needs: between them, how do they execute?
Why prefer npm ci over npm install in CI?
Fill-in-the-command
Write an on: trigger that fires on every push to main and on the manual "Run workflow" button.
How to explain to students
Frame it as a hiring task: "Pick any of your existing GitHub repos. Add a CI workflow that runs lint + test + build on every PR. Add a green check to the README. You have a weekend." This is one of the most common DevOps interview prompts for junior roles.
📋 Assignment Requirements
- Pick any existing repo (yours or a fork). Must have at least 3 source files and a test command.
- Create .github/workflows/ci.yml that triggers on push and pull_request to main
- 3 parallel jobs: lint, test, build (use needs: only where required)
- Use a matrix to test against at least 2 versions of your runtime (Node 18+20, Python 3.11+3.12, etc.)
- Cache dependencies via actions/setup-node (or equivalent) to keep CI under 2 minutes
- Pin every action to a specific @vN tag (no @main or @latest)
- Add a status badge to your README that links to the workflow runs
- Bonus: A 4th job that builds (but does not push) a Docker image, only on push to main
- Bonus: Add Trivy scanning for the Docker image and fail on HIGH/CRITICAL CVEs
- Bonus: Configure branch protection so PRs cannot merge without all 3 jobs passing
Common mistakes: forgot cache: npm, used @main, jobs ran sequentially because of unnecessary needs:, badge URL wrong.