<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://edwardpraveen.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://edwardpraveen.com/" rel="alternate" type="text/html" /><updated>2026-05-24T14:37:16+05:30</updated><id>https://edwardpraveen.com/feed.xml</id><title type="html">Build With Edward</title><subtitle>Weekly tutorials on AI/ML engineering, infrastructure, and DevOps - with real architecture patterns and code.</subtitle><author><name>Edward Praveen</name></author><entry><title type="html">From GitHub Pages to a Real Domain - How I Finally Claimed edwardpraveen.com</title><link href="https://edwardpraveen.com/devops/infrastructure/custom-domain-setup/" rel="alternate" type="text/html" title="From GitHub Pages to a Real Domain - How I Finally Claimed edwardpraveen.com" /><published>2026-05-24T00:00:00+05:30</published><updated>2026-05-24T00:00:00+05:30</updated><id>https://edwardpraveen.com/devops/infrastructure/custom-domain-setup</id><content type="html" xml:base="https://edwardpraveen.com/devops/infrastructure/custom-domain-setup/"><![CDATA[<p>I’ve been meaning to do this for a long time. My blog has been sitting at <code class="language-plaintext highlighter-rouge">buildwithedward.github.io</code> for years - functional, free, and completely forgettable as a URL. If I’m going to write seriously about production AI systems and infra, the least I can do is own my own domain.</p>

<p>This week I finally did it. Here’s exactly how it went.</p>

<hr />

<h2 id="the-itch-why-a-custom-domain-now">The itch: why a custom domain now?</h2>

<p>The github.io URL always felt temporary. It’s fine when you’re experimenting, but I’m now writing about real production topics - RAG pipelines, LLM inference, AWS architecture, client RFP approaches. A personal domain signals that this is a real, long-term effort. It’s also just easier to share verbally: <em>“edwardpraveen.com”</em> versus <em>“buildwithedward dot github dot io.”</em></p>

<hr />

<h2 id="the-two-options-i-considered">The two options I considered</h2>

<p>Before buying anything, I thought through the two realistic paths.</p>

<h3 id="option-1---move-hosting-to-aws-s3--cloudfront--route-53">Option 1 - Move hosting to AWS (S3 + CloudFront + Route 53)</h3>

<p><strong>Advantages:</strong></p>
<ul>
  <li>Full control over CDN behaviour, caching rules, and redirects</li>
  <li>Custom edge logic if needed in the future</li>
  <li>Everything under one AWS account alongside other infra</li>
</ul>

<p><strong>Considerations:</strong></p>
<ul>
  <li>~$15–20/yr total cost (Route 53 hosted zone + S3 + CloudFront)</li>
  <li>Requires a CI/CD pipeline (GitHub Actions → S3 sync + CloudFront invalidation)</li>
  <li>More moving parts to maintain for what is ultimately a static site</li>
  <li>Overkill for a Jekyll blog that rebuilds on every push</li>
</ul>

<h3 id="option-2---keep-github-pages-add-a-custom-domain--chosen">Option 2 - Keep GitHub Pages, add a custom domain ✓ <em>Chosen</em></h3>

<p><strong>Advantages:</strong></p>
<ul>
  <li>GitHub handles SSL automatically via Let’s Encrypt - no cost, no renewal headache</li>
  <li>Zero infrastructure to manage</li>
  <li>The old <code class="language-plaintext highlighter-rouge">buildwithedward.github.io</code> URL auto-redirects to the new domain</li>
  <li>Domain cost only (~$10/yr)</li>
</ul>

<p><strong>Considerations:</strong></p>
<ul>
  <li>No server-side logic or edge functions</li>
  <li>GitHub Pages uptime is tied to GitHub’s infrastructure (historically excellent)</li>
</ul>

<p><strong>Why I chose this:</strong> AWS makes sense when you need custom server-side behaviour, granular CDN control, or want everything under one account for compliance reasons. For a static Jekyll blog, GitHub Pages does everything needed. Right tool for the job - no point adding complexity to a solved problem.</p>

<hr />

<h2 id="buying-the-domain-what-i-skipped-at-namecheap-checkout">Buying the domain: what I skipped at Namecheap checkout</h2>

<p>I went with Namecheap and grabbed <code class="language-plaintext highlighter-rouge">edwardpraveen.com</code>. The checkout page, as always, tries to sell you a bundle of add-ons. Here’s my honest take on each:</p>

<table>
  <thead>
    <tr>
      <th>Add-on</th>
      <th>Cost</th>
      <th>Verdict</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SSL Certificate</td>
      <td>~$15–50/yr</td>
      <td>❌ Skip - GitHub Pages gives free SSL via Let’s Encrypt. ACM is also free if you move to AWS later.</td>
    </tr>
    <tr>
      <td>Premium DNS</td>
      <td>~$5/yr</td>
      <td>❌ Skip - BasicDNS is perfectly adequate for a personal blog. Premium adds DDoS protection you don’t need.</td>
    </tr>
    <tr>
      <td>Sitelock Website Security</td>
      <td>~$20/yr</td>
      <td>❌ Skip - Sitelock scans PHP/WordPress sites for malware. A static Jekyll site has nothing to infect.</td>
    </tr>
    <tr>
      <td>Business Email</td>
      <td>~$15/yr</td>
      <td>⚠️ Optional - worth it for <code class="language-plaintext highlighter-rouge">edward@edwardpraveen.com</code>, but Zoho Mail has a free tier for one custom address.</td>
    </tr>
  </tbody>
</table>

<p>Total checkout: ~$10 for the domain alone.</p>

<hr />

<h2 id="the-actual-setup-what-i-changed">The actual setup: what I changed</h2>

<p>Once the domain was registered, the changes were split across two places.</p>

<h3 id="namecheap---advanced-dns">Namecheap - Advanced DNS</h3>

<p>Namecheap pre-populates two default records out of the box:</p>
<ul>
  <li>A CNAME pointing <code class="language-plaintext highlighter-rouge">www</code> to <code class="language-plaintext highlighter-rouge">parkingpage.namecheap.com</code></li>
  <li>A URL Redirect Record on <code class="language-plaintext highlighter-rouge">@</code></li>
</ul>

<p><strong>Delete both of these first.</strong> They will conflict with the GitHub records.</p>

<p>Then add the following:</p>

<p><strong>4 A records</strong> (GitHub’s servers):</p>

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>Host</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>A Record</td>
      <td>@</td>
      <td>185.199.108.153</td>
    </tr>
    <tr>
      <td>A Record</td>
      <td>@</td>
      <td>185.199.109.153</td>
    </tr>
    <tr>
      <td>A Record</td>
      <td>@</td>
      <td>185.199.110.153</td>
    </tr>
    <tr>
      <td>A Record</td>
      <td>@</td>
      <td>185.199.111.153</td>
    </tr>
  </tbody>
</table>

<p><strong>1 CNAME record</strong> (www subdomain):</p>

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>Host</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CNAME Record</td>
      <td>www</td>
      <td>buildwithedward.github.io.</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>Note the trailing dot after <code class="language-plaintext highlighter-rouge">.github.io.</code> - include it exactly as shown.</p>
</blockquote>

<h3 id="github-repository">GitHub repository</h3>

<ol>
  <li>Add a <code class="language-plaintext highlighter-rouge">CNAME</code> file in the root of your repo containing just your domain:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>edwardpraveen.com
</code></pre></div>    </div>
  </li>
  <li>
    <p>Go to <strong>Settings → Pages → Custom domain</strong>, enter <code class="language-plaintext highlighter-rouge">edwardpraveen.com</code>, click Save.</p>
  </li>
  <li>
    <p>Wait for GitHub’s DNS check to pass (a few minutes), then tick <strong>Enforce HTTPS</strong>.</p>
  </li>
  <li>Update <code class="language-plaintext highlighter-rouge">_config.yml</code>:
    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">url</span><span class="pi">:</span> <span class="s2">"</span><span class="s">https://edwardpraveen.com"</span>
<span class="na">baseurl</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
</code></pre></div>    </div>
  </li>
</ol>

<p>Commit and push. GitHub rebuilds the site and the SSL cert provisions automatically within ~30 minutes.</p>

<blockquote>
  <p>The old <code class="language-plaintext highlighter-rouge">buildwithedward.github.io</code> URL automatically issues a <strong>301 redirect</strong> to <code class="language-plaintext highlighter-rouge">edwardpraveen.com</code> once the custom domain is set - no extra config needed.</p>
</blockquote>

<h3 id="verify-everything-works">Verify everything works</h3>

<p>Once DNS propagates (30 min to 48 hrs), check:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">https://edwardpraveen.com</code> loads the blog</li>
  <li><code class="language-plaintext highlighter-rouge">https://www.edwardpraveen.com</code> loads the blog</li>
  <li><code class="language-plaintext highlighter-rouge">http://edwardpraveen.com</code> redirects to <code class="language-plaintext highlighter-rouge">https://</code></li>
  <li><code class="language-plaintext highlighter-rouge">buildwithedward.github.io</code> redirects to <code class="language-plaintext highlighter-rouge">edwardpraveen.com</code></li>
  <li>Padlock/SSL shows in the browser</li>
</ul>

<p>You can monitor DNS propagation in real time at <a href="https://dnschecker.org">dnschecker.org</a>.</p>

<hr />

<h2 id="whats-next-for-this-blog">What’s next for this blog</h2>

<p>The domain was the activation energy I needed. Going forward, I’m planning to publish weekly - covering the things I’m actually working on:</p>

<p><strong>Infrastructure and DevOps</strong> - real-world AWS architectures, ECS deployments, CI/CD pipelines, IaC patterns. The kind of end-to-end walkthroughs I wish existed when I was setting things up.</p>

<p><strong>AI/ML engineering</strong> - LLM inference optimization, RAG pipeline design, multi-agent orchestration. Practical content grounded in production constraints, not toy examples.</p>

<p><strong>Architecture approaches from client RFPs</strong> - I’ve been involved in scoping and proposing systems for enterprise clients. I’ll share the architectural thinking behind those proposals, with synthetic data where needed, so the reasoning is transferable even when the specifics can’t be.</p>

<hr />

<p>If you’ve been putting off claiming your own domain for a blog or portfolio, the barrier is genuinely low - one afternoon, ~$10, and a handful of DNS records. The github.io redirect means you lose nothing from your existing audience. There’s no good reason to wait.</p>]]></content><author><name>Edward Praveen</name></author><category term="devops" /><category term="infrastructure" /><category term="devops" /><category term="github-pages" /><category term="dns" /><category term="namecheap" /><category term="aws" /><category term="blogging" /><category term="infrastructure" /><summary type="html"><![CDATA[A step-by-step account of moving my technical blog off a github.io subdomain, the architecture decisions I weighed, and exactly what I clicked (and skipped) along the way.]]></summary></entry><entry><title type="html">How I Built an Explainable AI System to Optimize Pharma Sales Calls</title><link href="https://edwardpraveen.com/hcp-call-optimization/" rel="alternate" type="text/html" title="How I Built an Explainable AI System to Optimize Pharma Sales Calls" /><published>2026-05-18T00:00:00+05:30</published><updated>2026-05-18T00:00:00+05:30</updated><id>https://edwardpraveen.com/hcp-call-optimization</id><content type="html" xml:base="https://edwardpraveen.com/hcp-call-optimization/"><![CDATA[<p>A client came to me with a deceptively simple question:</p>

<blockquote>
  <p><em>“How many times should our sales reps call each doctor to maximize prescriptions - without annoying them?”</em></p>
</blockquote>

<p>My first instinct was: XGBoost. Maybe a neural network. Something powerful.</p>

<p>Then the client added one constraint that changed everything:</p>

<blockquote>
  <p><em>“We don’t want a black box. We need to explain every recommendation to our commercial team.”</em></p>
</blockquote>

<p>That sent me down a rabbit hole of <strong>explainable AI, causal thinking, and a model type I’d honestly underestimated</strong> - GAM (Generalized Additive Models). I ended up building a complete end-to-end ML platform to understand it deeply. This post walks through what I built and why.</p>

<hr />

<h2 id="the-business-problem">The Business Problem</h2>

<p>In the pharma world, <strong>HCP</strong> stands for Healthcare Provider - basically, the doctors that sales representatives visit to promote drugs.</p>

<p>The problem is a classic Goldilocks situation:</p>

<ul>
  <li><strong>Too few calls</strong> → the doctor forgets about your drug, you miss prescription opportunities</li>
  <li><strong>Too many calls</strong> → the doctor gets tired of hearing from you, engagement fatigue and they stop responding</li>
</ul>

<p>The goal isn’t just “more calls = more prescriptions.” The goal is finding the <strong>sweet spot for each individual doctor</strong> - because a cardiologist in New York and a neurologist in Texas will have completely different tolerances and response patterns.</p>

<p>This is what commercial analytics teams call <strong>call optimization</strong>, and it’s a real problem at scale when you have thousands of doctors across multiple territories.</p>

<hr />

<h2 id="why-i-didnt-use-deep-learning">Why I Didn’t Use Deep Learning</h2>

<p>When you hear “optimize for each individual”, it’s tempting to reach for a neural network. They’re great at learning individual patterns.</p>

<p>But the client’s commercial team had a legitimate need: they wanted to sit in a meeting and explain to their VP <em>why</em> Doctor X was getting 3 calls this quarter and Doctor Y was getting 6. A neural network can’t do that. It gives you a number with no explanation.</p>

<p>This is a more common situation than you’d think in enterprise settings - especially in regulated industries like pharma, finance, and healthcare. Sometimes <strong>trust and explainability matter more than squeezing out the last 1% of accuracy.</strong></p>

<p>This led me to <strong>GAM - Generalized Additive Models.</strong></p>

<hr />

<h2 id="what-is-a-gam">What is a GAM?</h2>

<p>You’ve probably seen linear regression:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prescriptions = (Calls × 2) + (Emails × 3) + 5
</code></pre></div></div>

<p>That’s a straight line. Simple and explainable - but it assumes the relationship is always a straight line. In real life, that’s rarely true.</p>

<p>A GAM says instead:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prescriptions = f(Calls) + f(Emails) + f(Recency) + ...
</code></pre></div></div>

<p>Each <code class="language-plaintext highlighter-rouge">f(...)</code> is a <strong>smooth curve</strong> the model learns from data - not a straight line. And here’s the magic: you can <strong>plot each curve individually</strong> and show a business person exactly what’s happening.</p>

<p>For this use case, the “calls” curve looks something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prescription Growth
        ▲
        │       ●●●
        │     ●     ●●
        │   ●          ●●●
        │ ●                ●●●●
        └──────────────────────────▶
          1   2   3   4   5   6   7
               Number of Calls

        ←── Sweet Spot ──→←─ Fatigue ─→
</code></pre></div></div>

<p>Initial calls drive growth. After a certain point, returns diminish. Too many calls and you actually hurt performance. GAM captures this naturally - and you can show this exact curve to a stakeholder and say <em>“this is why we’re recommending 4 calls.”</em></p>

<hr />

<h2 id="the-key-insight-not-all-doctors-are-the-same">The Key Insight: Not All Doctors Are the Same</h2>

<p>Before modeling, I realized there’s a fundamental split in the doctor population:</p>

<table>
  <thead>
    <tr>
      <th>Group</th>
      <th>Who They Are</th>
      <th>What We’re Trying to Do</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Breadth HCPs</strong></td>
      <td>Doctors who have <em>never</em> prescribed the drug</td>
      <td>Convert them to first-time prescribers</td>
    </tr>
    <tr>
      <td><strong>Depth HCPs</strong></td>
      <td>Doctors who <em>already</em> prescribe</td>
      <td>Grow their volume without causing fatigue</td>
    </tr>
  </tbody>
</table>

<p>These two groups need completely different models. Asking “will this doctor prescribe?” is a classification problem. Asking “how much will prescriptions grow with one more call?” is a regression problem.</p>

<p>So I built two separate pipelines:</p>

<ul>
  <li><strong>Breadth pipeline</strong> → GAM Logistic model → outputs a probability (0–100%) of first-time conversion</li>
  <li><strong>Depth pipeline</strong> → GAM Regression model → outputs predicted prescription count</li>
</ul>

<hr />

<h2 id="the-hidden-bias-problem-and-how-to-fix-it">The Hidden Bias Problem (And How to Fix It)</h2>

<p>Here’s something that trips up a lot of commercial analytics work:</p>

<blockquote>
  <p>Sales reps are human. They naturally spend more time visiting doctors who are already performing well.</p>
</blockquote>

<p>So if you naively train a model on raw call data, it learns the wrong thing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"Doctors who get more calls write more prescriptions"
</code></pre></div></div>

<p>But that’s backwards. The doctor was already a top performer - the rep was visiting them <em>because</em> they were doing well, not the other way around. This is called <strong>selection bias</strong>, and it can make your model confidently wrong.</p>

<h3 id="fix-1-propensity-scoring">Fix 1: Propensity Scoring</h3>

<p>I built a logistic regression model that asks: <em>“Given this doctor’s profile, how likely is a sales rep to call them frequently?”</em> This score captures the bias - and lets us mathematically correct for it when training the main models. Think of it like a control group in a clinical trial.</p>

<h3 id="fix-2-uplift-features">Fix 2: Uplift Features</h3>

<p>Instead of asking “will this doctor prescribe?”, I engineered features around the question: <strong>“will an extra call actually <em>cause</em> more prescriptions?”</strong> This is the causal question. A doctor might have high prescriptions regardless of calls - in that case, calling them more is wasted effort.</p>

<p>Together, these two layers help the model measure the <strong>true incremental effect</strong> of a call, not just correlation.</p>

<hr />

<h2 id="how-the-full-system-works">How the Full System Works</h2>

<p>Here’s the complete flow from raw data to recommendation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1 - Load historical HCP data
         (calls, prescriptions, emails, territory, specialty)
         ↓
Step 2 - Feature engineering
         (responsiveness score, saturation estimate, recency)
         ↓
Step 3 - Split: Breadth vs Depth doctors
         ↓
Step 4 - Build Propensity Scores (fix selection bias)
         ↓
Step 5 - Engineer Uplift Features (causal signal)
         ↓
Step 6 - Train models
         Breadth → GAM Logistic  (predicts conversion %)
         Depth   → GAM Regression (predicts script growth)
         ↓
Step 7 - Find optimal call count per doctor
         (the point on the curve before diminishing returns)
         ↓
Step 8 - Serve recommendations via FastAPI
         ↓
Step 9 - Track in MLflow, monitor for drift, retrain when needed
</code></pre></div></div>

<hr />

<h2 id="the-architecture">The Architecture</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────┐
│           DATA LAYER                        │
│  20,000 HCP records (synthetic)             │
│  Calls, scripts, emails, specialty, etc.    │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│         FEATURE ENGINEERING                 │
│  Propensity Scores + Uplift Features        │
└──────────────────┬──────────────────────────┘
                   │
        ┌──────────┴───────────┐
        │                      │
┌───────▼────────┐   ┌─────────▼──────────┐
│ BREADTH MODEL  │   │   DEPTH MODEL      │
│                │   │                    │
│ GAM Logistic   │   │ GAM Regression     │
│ → Conversion % │   │ → Script Growth    │
│                │   │                    │
│ XGBoost        │   │ XGBoost Regressor  │
│ (benchmark)    │   │ (benchmark)        │
└───────┬────────┘   └─────────┬──────────┘
        └──────────┬───────────┘
                   │
┌──────────────────▼──────────────────────────┐
│           MLOPS LAYER                       │
│  MLflow → FastAPI → Docker → Monitoring     │
└─────────────────────────────────────────────┘
</code></pre></div></div>

<hr />

<h2 id="step-1---generating-the-data">Step 1 - Generating the Data</h2>

<p>I started with a synthetic dataset of 20,000 doctors. The key fields:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">hcp_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">hcp_ids</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">specialty</span><span class="sh">"</span><span class="p">:</span> <span class="n">specialty</span><span class="p">,</span>          <span class="c1"># Cardiology, Oncology, etc.
</span>    <span class="sh">"</span><span class="s">previous_scripts</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>         <span class="c1"># How many scripts before?
</span>    <span class="sh">"</span><span class="s">calls</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>                    <span class="c1"># Calls made this quarter
</span>    <span class="sh">"</span><span class="s">email_open_rate</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>          <span class="c1"># Engagement signal
</span>    <span class="sh">"</span><span class="s">responsiveness</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>           <span class="c1"># How much does this doctor respond?
</span>    <span class="sh">"</span><span class="s">saturation</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>               <span class="c1"># How quickly do they tire?
</span>    <span class="sh">"</span><span class="s">estimated_uplift</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>         <span class="c1"># Causal signal
</span>    <span class="sh">"</span><span class="s">converted</span><span class="sh">"</span><span class="p">:</span> <span class="p">...,</span>                <span class="c1"># Did they prescribe? (0 or 1)
</span>    <span class="sh">"</span><span class="s">hcp_type</span><span class="sh">"</span><span class="p">:</span> <span class="p">...</span>                  <span class="c1"># "Breadth" or "Depth"
</span><span class="p">})</span>
</code></pre></div></div>

<p>The synthetic script growth formula captures the saturation curve:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">script_growth</span> <span class="o">=</span> <span class="p">(</span>
    <span class="n">responsiveness</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">log1p</span><span class="p">(</span><span class="n">calls</span><span class="p">)</span>   <span class="c1"># diminishing returns
</span>    <span class="o">-</span> <span class="n">saturation</span> <span class="o">*</span> <span class="p">(</span><span class="n">calls</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>        <span class="c1"># fatigue penalty
</span><span class="p">)</span>
</code></pre></div></div>

<p>This means early calls help, but the squared penalty kicks in as calls increase. That’s the curve GAM will learn.</p>

<hr />

<h2 id="step-2---training-the-breadth-model">Step 2 - Training the Breadth Model</h2>

<p>For doctors who’ve never prescribed, we want to predict the probability they’ll convert:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pygam</span> <span class="kn">import</span> <span class="n">LogisticGAM</span><span class="p">,</span> <span class="n">s</span>

<span class="n">model</span> <span class="o">=</span> <span class="nc">LogisticGAM</span><span class="p">(</span>
    <span class="nf">s</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nf">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>

<span class="n">pred_probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="nf">predict_proba</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
<span class="n">auc</span> <span class="o">=</span> <span class="nf">roc_auc_score</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">pred_probs</span><span class="p">)</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">s()</code> is a spline term - it tells GAM “learn a smooth curve for this feature.” Each feature gets its own curve. You can plot them individually and explain exactly what’s happening.</p>

<hr />

<h2 id="step-3---training-the-depth-model">Step 3 - Training the Depth Model</h2>

<p>For existing prescribers, we predict how many scripts they’ll write:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pygam</span> <span class="kn">import</span> <span class="n">LinearGAM</span><span class="p">,</span> <span class="n">s</span>

<span class="n">model</span> <span class="o">=</span> <span class="nc">LinearGAM</span><span class="p">(</span>
    <span class="nf">s</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nf">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>

<span class="n">preds</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
<span class="n">rmse</span> <span class="o">=</span> <span class="nf">mean_squared_error</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">preds</span><span class="p">)</span> <span class="o">**</span> <span class="mf">0.5</span>
</code></pre></div></div>

<p>The partial dependence plot for “calls” is the response curve - it shows exactly where the saturation point is. That’s the number you put in your quarterly call plan.</p>

<hr />

<h2 id="step-4---tracking-with-mlflow">Step 4 - Tracking with MLflow</h2>

<p>Without experiment tracking, you’re flying blind. Every training run logs its metrics and artifacts:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">mlflow</span>

<span class="n">mlflow</span><span class="p">.</span><span class="nf">set_experiment</span><span class="p">(</span><span class="sh">"</span><span class="s">hcp-call-optimization</span><span class="sh">"</span><span class="p">)</span>

<span class="k">with</span> <span class="n">mlflow</span><span class="p">.</span><span class="nf">start_run</span><span class="p">():</span>
    <span class="n">model</span><span class="p">.</span><span class="nf">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
    <span class="n">rmse</span> <span class="o">=</span> <span class="nf">mean_squared_error</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">preds</span><span class="p">)</span> <span class="o">**</span> <span class="mf">0.5</span>

    <span class="n">mlflow</span><span class="p">.</span><span class="nf">log_metric</span><span class="p">(</span><span class="sh">"</span><span class="s">rmse</span><span class="sh">"</span><span class="p">,</span> <span class="n">rmse</span><span class="p">)</span>
    <span class="n">mlflow</span><span class="p">.</span><span class="nf">log_param</span><span class="p">(</span><span class="sh">"</span><span class="s">model_type</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Depth_GAM</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">mlflow</span><span class="p">.</span><span class="nf">log_artifact</span><span class="p">(</span><span class="sh">"</span><span class="s">models/depth_gam.pkl</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">mlflow ui</code> and you get a full dashboard showing every experiment - which parameters you used, what the metrics were, and which model artifacts were saved.</p>

<hr />

<h2 id="step-5---the-inference-api">Step 5 - The Inference API</h2>

<p>Once models are trained, they’re wrapped in a FastAPI service. Any downstream system - a CRM, a planning tool, a dashboard - can call it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST /predict/breadth
~ Send: { calls, email_open_rate, call_recency_days, estimated_uplift }
~ Get:  { "conversion_probability": 0.73 }

POST /predict/depth
~ Send: { calls, previous_scripts, email_open_rate, ... }
~ Get:  { "predicted_scripts": 14.2 }
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">fastapi</span> <span class="kn">import</span> <span class="n">FastAPI</span>
<span class="kn">from</span> <span class="n">pydantic</span> <span class="kn">import</span> <span class="n">BaseModel</span>
<span class="kn">import</span> <span class="n">joblib</span>

<span class="n">app</span> <span class="o">=</span> <span class="nc">FastAPI</span><span class="p">()</span>

<span class="n">breadth_model</span> <span class="o">=</span> <span class="n">joblib</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="sh">"</span><span class="s">models/breadth_gam.pkl</span><span class="sh">"</span><span class="p">)</span>
<span class="n">depth_model</span> <span class="o">=</span> <span class="n">joblib</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="sh">"</span><span class="s">models/depth_gam.pkl</span><span class="sh">"</span><span class="p">)</span>

<span class="nd">@app.post</span><span class="p">(</span><span class="sh">"</span><span class="s">/predict/breadth</span><span class="sh">"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">predict_breadth</span><span class="p">(</span><span class="n">hcp</span><span class="p">:</span> <span class="n">BreadthInput</span><span class="p">):</span>
    <span class="n">row</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">([[</span><span class="n">hcp</span><span class="p">.</span><span class="n">calls</span><span class="p">,</span> <span class="n">hcp</span><span class="p">.</span><span class="n">email_open_rate</span><span class="p">,</span> <span class="p">...]])</span>
    <span class="n">pred</span> <span class="o">=</span> <span class="n">breadth_model</span><span class="p">.</span><span class="nf">predict_proba</span><span class="p">(</span><span class="n">row</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">return</span> <span class="p">{</span><span class="sh">"</span><span class="s">conversion_probability</span><span class="sh">"</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">pred</span><span class="p">)}</span>
</code></pre></div></div>

<p>Start the server with <code class="language-plaintext highlighter-rouge">uvicorn api.app:app --reload</code> and the interactive docs are available at <code class="language-plaintext highlighter-rouge">http://127.0.0.1:8000/docs</code>.</p>

<hr />

<h2 id="step-6---monitoring-and-retraining">Step 6 - Monitoring and Retraining</h2>

<p>The monitoring script checks whether real outcomes match predictions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mean_error</span> <span class="o">=</span> <span class="n">predictions</span><span class="p">[</span><span class="sh">"</span><span class="s">error</span><span class="sh">"</span><span class="p">].</span><span class="nf">mean</span><span class="p">()</span>

<span class="k">if</span> <span class="n">mean_error</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Retraining recommended</span><span class="sh">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Model stable</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Simple, but the idea scales. In production, you’d feed actual prescription data back in, calculate drift, and trigger a retraining pipeline automatically.</p>

<hr />

<h2 id="why-xgboost-stayed-as-a-benchmark">Why XGBoost Stayed as a Benchmark</h2>

<p>I also trained XGBoost models - both a classifier for breadth and a regressor for depth. They scored similarly to the GAM models.</p>

<p>But the decision was clear: <strong>explainability beats a marginal accuracy bump.</strong> In a commercial setting, a model that a business analyst can point to and explain is infinitely more adoptable than a black box with slightly better AUC.</p>

<p>XGBoost stayed in the project as a challenger - useful for validation, not for production.</p>

<hr />

<h2 id="project-structure">Project Structure</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hcp-call-optimization/
│
├── data/
│   └── synthetic_hcp_data.csv
│
├── src/
│   ├── generate_data.py            ← Synthetic dataset
│   ├── propensity_model.py         ← Bias correction
│   ├── breadth_gam_model.py        ← Breadth training
│   ├── depth_gam_model.py          ← Depth training
│   ├── train_breadth_pipeline.py   ← MLflow breadth run
│   └── train_pipeline.py           ← MLflow depth run
│
├── models/
│   ├── breadth_gam.pkl
│   └── depth_gam.pkl
│
├── api/
│   └── app.py                      ← FastAPI service
│
├── monitoring/
│   └── monitor.py
│
└── docker/
    └── Dockerfile
</code></pre></div></div>

<hr />

<h2 id="running-the-project">Running the Project</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Generate data</span>
python src/generate_data.py

<span class="c"># Build bias-correction scores</span>
python src/propensity_model.py

<span class="c"># Train models</span>
python src/breadth_gam_model.py
python src/depth_gam_model.py

<span class="c"># MLflow-tracked training runs</span>
python src/train_breadth_pipeline.py
python src/train_pipeline.py

<span class="c"># View experiment history</span>
mlflow ui
<span class="c">#  http://localhost:5000</span>

<span class="c"># Start the prediction API</span>
uvicorn api.app:app <span class="nt">--reload</span>
<span class="c">#  http://127.0.0.1:8000/docs</span>

<span class="c"># Run monitoring check</span>
python monitoring/monitor.py
</code></pre></div></div>

<hr />

<h2 id="key-takeaways">Key Takeaways</h2>

<p><strong>1. Explainability is a feature, not a consolation prize</strong></p>

<p>In enterprise AI - especially pharma, finance, or healthcare - a model your stakeholders trust will always outperform a model they don’t. GAM gave me comparable accuracy to XGBoost and full transparency. In regulated industries, that’s not a tradeoff. It’s the right answer.</p>

<p><strong>2. Always ask the causal question</strong></p>

<p>“Will this doctor prescribe?” and “Did our call <em>cause</em> more prescriptions?” are completely different questions. Getting that distinction right changes your feature engineering, your model architecture, and the quality of your recommendations. Propensity scoring and uplift modeling exist to close that gap.</p>

<p><strong>3. GAM is underrated for commercial analytics</strong></p>

<p>I came in expecting GAM to be a fallback option. I left with a lot of respect for it. For problems involving saturation curves, diminishing returns, and engagement optimization, GAM is genuinely the right tool - not just a compromise.</p>

<p><strong>4. MLOps is what makes ML real</strong></p>

<p>A trained model file sitting on your laptop is not a product. The FastAPI layer, MLflow tracking, Docker container, and monitoring pipeline are what turn an experiment into something a business can actually use.</p>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>If I were to extend this into a production system, the next steps would be:</p>

<ul>
  <li><strong>SHAP values</strong> - per-prediction feature attribution to explain individual recommendations</li>
  <li><strong>CausalML / EconML</strong> - more rigorous uplift modeling with proper treatment/control framing</li>
  <li><strong>Airflow</strong> - scheduled retraining pipelines triggered by drift alerts</li>
  <li><strong>AWS SageMaker</strong> - scalable cloud deployment with auto-scaling endpoints</li>
  <li><strong>Feature Store</strong> - centralized, versioned features shared across models</li>
  <li><strong>Drift detection</strong> - automated alerts when input data distribution shifts post-deployment</li>
</ul>

<hr />

<p>The project started as a way to understand a client use case. It ended up being one of the most interesting explorations I’ve done - touching causal ML, response curve optimization, and the full MLOps stack. Sometimes the most valuable thing a constraint gives you is a reason to think differently.</p>

<p>If you’re working on commercial analytics, HCP engagement, or explainable ML in regulated industries - I’d love to hear how you’re approaching it.</p>]]></content><author><name>Edward Praveen</name></author><category term="mlops" /><category term="explainable-ai" /><category term="gam" /><category term="pharma" /><category term="fastapi" /><category term="mlflow" /><category term="python" /><summary type="html"><![CDATA[When a client said 'no black boxes', I discovered GAM - and it changed how I think about enterprise ML]]></summary></entry><entry><title type="html">Reverse Engineering an MCP-Based Job Application (Without Instructions)</title><link href="https://edwardpraveen.com/modern-resume-submission/" rel="alternate" type="text/html" title="Reverse Engineering an MCP-Based Job Application (Without Instructions)" /><published>2026-04-15T00:00:00+05:30</published><updated>2026-04-15T00:00:00+05:30</updated><id>https://edwardpraveen.com/modern-resume-submission</id><content type="html" xml:base="https://edwardpraveen.com/modern-resume-submission/"><![CDATA[<p>Most job applications are simple - upload your CV, fill out a form, and click submit. But this one was completely different.</p>

<p>I came across a role for a <strong>Forward Deployed Engineer</strong> that had a very unusual ask:</p>

<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:share:7450164975286906880" height="789" width="504" frameborder="0" allowfullscreen="" title="Embedded post"></iframe>

<blockquote>
  <p>“Submit your application through this MCP endpoint. No instructions.”</p>
</blockquote>

<p>Just a URL. No documentation. No user interface. No hints.</p>

<p>Here’s how I figured out the entire system from scratch and successfully submitted my application.</p>

<hr />

<h2 id="what-is-mcp">What Is MCP?</h2>

<p>Before diving in, let me quickly explain. <strong>MCP (Model Context Protocol)</strong> is a way for AI tools and services to communicate with each other. Think of it like a language that AI systems use to talk. In this case, the job application was hidden behind an MCP server - meaning I had to speak its language to apply.</p>

<hr />

<h2 id="step-1---hitting-the-endpoint">Step 1 - Hitting the Endpoint</h2>

<p>I started with the simplest thing possible - just visiting the URL:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-i</span> https://submit-cv-mcp.webrix.workers.dev/mcp
</code></pre></div></div>

<p>The response:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"error"</span><span class="p">:</span><span class="w"> </span><span class="s2">"invalid_token"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"error_description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Missing or invalid access token"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>In plain English: the door is locked, and I need a key (an access token) to get in.</p>

<hr />

<h2 id="step-2---reading-the-response-headers">Step 2 - Reading the Response Headers</h2>

<p>When a server sends back an error, it often includes <strong>headers</strong> (extra metadata) in the response. These headers had a critical clue:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>www-authenticate: Bearer realm="OAuth",
resource_metadata=".../.well-known/oauth-protected-resource/mcp"
</code></pre></div></div>

<p>This is like the locked door having a sign that says: <em>“The key shop is at this address.”</em> The server was pointing me toward an <strong>OAuth</strong> authentication system - a standard way websites let you prove who you are.</p>

<hr />

<h2 id="step-3---oauth-discovery">Step 3 - OAuth Discovery</h2>

<p>Following the clue from the headers, I visited the metadata URL:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://submit-cv-mcp.webrix.workers.dev/.well-known/oauth-protected-resource/mcp
</code></pre></div></div>

<p>This told me where the authentication server lives. Then I fetched the full configuration:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://submit-cv-mcp.webrix.workers.dev/.well-known/oauth-authorization-server
</code></pre></div></div>

<p>This gave me three important URLs:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">/authorize</code> - where I go to prove who I am</li>
  <li><code class="language-plaintext highlighter-rouge">/token</code> - where I exchange a temporary code for a real access token</li>
  <li><code class="language-plaintext highlighter-rouge">/register</code> - where I register myself as a new “client” (application)</li>
</ul>

<p>Think of it like discovering the full map of a building: <em>“Registration is on floor 1, authorization on floor 2, and key pickup on floor 3.”</em></p>

<hr />

<h2 id="step-4---registering-a-client">Step 4 - Registering a Client</h2>

<p>Before I could authenticate, I needed to register. This is like signing up before you can log in:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST https://submit-cv-mcp.webrix.workers.dev/register <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
  <span class="nt">-d</span> <span class="s1">'{"redirect_uris":["http://localhost"]}'</span>
</code></pre></div></div>

<p>The server gave me a <code class="language-plaintext highlighter-rouge">client_id</code> and <code class="language-plaintext highlighter-rouge">client_secret</code> - think of these as my username and password for the authentication process.</p>

<hr />

<h2 id="step-5---pkce-proof-key-for-code-exchange">Step 5 - PKCE (Proof Key for Code Exchange)</h2>

<p>OAuth uses an extra security step called <strong>PKCE</strong> (pronounced “pixie”). It ensures nobody can intercept and steal your login mid-process. Here’s how it works in simple terms:</p>

<ol>
  <li>I generate a random secret string (the <strong>verifier</strong>)</li>
  <li>I create a scrambled version of it (the <strong>challenge</strong>)</li>
  <li>I send the challenge first, and prove I know the original verifier later</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VERIFIER</span><span class="o">=</span><span class="si">$(</span>openssl rand <span class="nt">-base64</span> 32 | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'=+/'</span><span class="si">)</span>
<span class="nv">CHALLENGE</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="nt">-n</span> <span class="nv">$VERIFIER</span> | openssl dgst <span class="nt">-sha256</span> <span class="nt">-binary</span> | openssl <span class="nb">base64</span> | <span class="nb">tr</span> <span class="s1">'+/'</span> <span class="s1">'-_'</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'='</span><span class="si">)</span>
</code></pre></div></div>

<p>It’s like telling someone <em>“I’ll prove my identity later by completing this puzzle that only I can solve.”</em></p>

<hr />

<h2 id="step-6---authorization-browser-flow">Step 6 - Authorization (Browser Flow)</h2>

<p>Next, I had to open a URL in my browser with all the pieces put together:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/authorize?response_type=code
  &amp;client_id=...
  &amp;redirect_uri=http://localhost
  &amp;code_challenge=...
  &amp;code_challenge_method=S256
</code></pre></div></div>

<p>This is the part where you actually log in and give permission.</p>

<hr />

<h2 id="step-7---the-unexpected-twist">Step 7 - The Unexpected Twist</h2>

<p>Here’s where things got interesting. Instead of a simple login form:</p>

<ul>
  <li><strong>CSRF protection</strong> appeared (a security measure to prevent fake requests)</li>
  <li>The system required a <strong>browser session</strong> with cookies</li>
  <li>It then <strong>redirected me to GitHub</strong> for login</li>
</ul>

<p>This revealed that the system uses <strong>GitHub as the identity provider</strong>. In other words: <em>“Prove who you are by logging into GitHub.”</em></p>

<hr />

<h2 id="step-8---completing-the-oauth-flow">Step 8 - Completing the OAuth Flow</h2>

<p>The full sequence:</p>
<ol>
  <li>Visit the authorize URL</li>
  <li>Get redirected to GitHub</li>
  <li>Log in and approve access</li>
  <li>Get redirected back with a temporary <code class="language-plaintext highlighter-rouge">code</code></li>
</ol>

<p>I captured this code from the redirect URL. This code is like a receipt - proof that I successfully authenticated.</p>

<hr />

<h2 id="step-9---getting-the-access-token">Step 9 - Getting the Access Token</h2>

<p>Now I traded my temporary code for a real access token:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST https://submit-cv-mcp.webrix.workers.dev/token <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: application/x-www-form-urlencoded"</span> <span class="se">\</span>
  <span class="nt">-d</span> <span class="s2">"grant_type=authorization_code&amp;client_id=...&amp;client_secret=...&amp;code=...&amp;redirect_uri=http://localhost&amp;code_verifier=..."</span>
</code></pre></div></div>

<p>The server verified everything matched (the code, the PKCE verifier, the client credentials) and gave me the golden key: an <strong>access token</strong>.</p>

<hr />

<h2 id="step-10---calling-mcp-a-new-problem">Step 10 - Calling MCP (A New Problem)</h2>

<p>With my access token in hand, I tried calling the MCP endpoint again. But I got another error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Client must accept application/json and text/event-stream
</code></pre></div></div>

<p>This told me something important: <strong>this is not a regular REST API.</strong> It uses a streaming protocol, meaning the server can send data continuously rather than all at once - like a live conversation instead of sending letters.</p>

<hr />

<h2 id="step-11---json-rpc-and-mcp-protocol">Step 11 - JSON-RPC and MCP Protocol</h2>

<p>MCP uses a format called <strong>JSON-RPC</strong> - a simple way to call functions remotely. Instead of visiting different URLs for different actions (like REST), you send messages to one endpoint with a “method” field:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"jsonrpc"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2.0"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="p">,</span><span class="w">
  </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Think of it like calling a receptionist and saying <em>“I’d like to speak to the submit-cv department”</em> instead of walking to different offices.</p>

<hr />

<h2 id="step-12---starting-a-session">Step 12 - Starting a Session</h2>

<p>First, I needed to initialize a session:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"initialize"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The response included a <strong>session ID</strong> - meaning the server remembers who I am across multiple requests. This is important because MCP is <strong>stateful</strong> (it keeps track of the conversation).</p>

<hr />

<h2 id="step-13---discovering-whats-available">Step 13 - Discovering What’s Available</h2>

<p>Next, I asked the server what tools (actions) are available:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tools/list"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Response:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"tools"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="s2">"get-job-description"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"submit-cv"</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Two tools: one to read the job description, and one to submit my CV.</p>

<hr />

<h2 id="step-14---hidden-requirements">Step 14 - Hidden Requirements</h2>

<p>When I tried to use <code class="language-plaintext highlighter-rouge">submit-cv</code>, it told me I needed two special codes:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">resourceCode</code></li>
  <li><code class="language-plaintext highlighter-rouge">promptCode</code></li>
</ul>

<p>These weren’t given to me directly - I had to fetch them from the MCP server using its <strong>resources</strong> and <strong>prompts</strong> features. It’s like being told <em>“bring form A and form B to the submission window”</em> – but first you have to find where forms A and B are stored.</p>

<hr />

<h2 id="step-15---fetching-the-resource-code">Step 15 - Fetching the Resource Code</h2>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"resources/read"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"uri"</span><span class="p">:</span><span class="w"> </span><span class="s2">"submit-cv-mcp://cv/submission-code"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This gave me the first required code.</p>

<hr />

<h2 id="step-16---fetching-the-prompt-code">Step 16 - Fetching the Prompt Code</h2>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"prompts/get"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"submit-cv"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This gave me the second required code.</p>

<hr />

<h2 id="step-17---submitting-the-cv">Step 17 - Submitting the CV</h2>

<p>With both codes in hand, I made the final submission:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tools/call"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"submit-cv"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"arguments"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"resourceCode"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="p">,</span><span class="w">
      </span><span class="nl">"promptCode"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<hr />

<h2 id="step-18---one-more-thing-elicitation">Step 18 - One More Thing (Elicitation)</h2>

<p>Just when I thought I was done, the server sent back a new kind of request:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"elicitation/create"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>It was asking me for:</p>
<ul>
  <li><strong>Email</strong> (required)</li>
  <li><strong>Phone</strong> (optional)</li>
  <li><strong>Note</strong> (optional)</li>
</ul>

<p>This is called <strong>elicitation</strong> – the server is asking the client for information mid-conversation, like a form popping up during a chat.</p>

<hr />

<h2 id="step-19---responding-correctly">Step 19 - Responding Correctly</h2>

<p>This was the trickiest part. Instead of calling a new method, I had to <strong>respond using the same request ID</strong> that the server sent. This is how two-way (bidirectional) JSON-RPC works:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"jsonrpc"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2.0"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
  </span><span class="nl">"result"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"email"</span><span class="p">:</span><span class="w"> </span><span class="s2">"your@email.com"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"phone"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
    </span><span class="nl">"note"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Looking forward to discussing this opportunity."</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>It’s like the server said <em>“Hey, before I finish, I need one more thing”</em> – and I had to reply in exactly the right format.</p>

<hr />

<h2 id="the-result">The Result</h2>

<ul>
  <li>CV submitted</li>
  <li>Contact details shared</li>
  <li>Challenge completed</li>
</ul>

<hr />

<h2 id="what-this-challenge-actually-tested">What This Challenge Actually Tested</h2>

<p>This wasn’t a coding test. It was a <strong>thinking test</strong>. Here’s what it evaluated:</p>

<p><strong>System Thinking</strong> - Can you follow signals and breadcrumbs instead of waiting for instructions?</p>

<p><strong>Debugging Skills</strong> - Can you read error messages and headers to figure out what’s going on?</p>

<p><strong>Protocol Understanding</strong> - Do you know how OAuth 2.0, PKCE, JSON-RPC, and MCP work (or can you figure them out)?</p>

<p><strong>Adaptability</strong> - Can you switch between tools (curl, browser, APIs, streaming) as needed?</p>

<p><strong>Real Engineering Behavior</strong> - Can you experiment, fail, iterate, and solve?</p>

<hr />

<h2 id="key-takeaways">Key Takeaways</h2>

<ol>
  <li><strong>Error messages are guidance, not noise</strong> - they tell you exactly what’s wrong and often hint at the solution</li>
  <li><strong>HTTP headers contain critical clues</strong> - always read them, especially on auth failures</li>
  <li><strong>Not everything is REST</strong> - protocols like JSON-RPC and MCP exist and are increasingly important in the AI era</li>
  <li><strong>Authentication flows are stateful</strong> - you need to track sessions, codes, and tokens carefully</li>
  <li><strong>Understanding protocols matters more than knowing tools</strong> - if you understand the “why,” you can figure out the “how”</li>
</ol>

<hr />

<h2 id="final-thoughts">Final Thoughts</h2>

<p>This was one of the most interesting application processes I’ve encountered. Instead of asking <em>“What do you know?”</em>, it asked <em>“How do you think?”</em></p>

<p>And that makes all the difference.</p>]]></content><author><name>Edward Praveen</name></author><category term="engineering" /><category term="oauth" /><category term="debugging" /><category term="systems" /><category term="mcp" /><category term="backend" /><category term="ai" /><summary type="html"><![CDATA[How I cracked an unconventional job application that gave me nothing but a URL]]></summary></entry><entry><title type="html">Day 10 of 180 - PyTest &amp;amp; Virtual Environments</title><link href="https://edwardpraveen.com/dl-llm-systems/pytest-day10/" rel="alternate" type="text/html" title="Day 10 of 180 - PyTest &amp;amp; Virtual Environments" /><published>2026-03-29T00:00:00+05:30</published><updated>2026-03-29T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/pytest-day10</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/pytest-day10/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>There’s a moment every developer faces. You ship code to production. It works. Customers use it. And then one of them does something you didn’t expect. Your app crashes. They post a 1-star review. You don’t sleep.</p>

<p>Here’s the thing: that crash-to-be <strong>existed in your code the entire time</strong>. You just never tested that specific path. You tested the happy path-“user enters valid data, everything works”-but you never tested “user enters empty data” or “user enters huge numbers” or “user is offline.”</p>

<p>Today is the day you <strong>stop letting bugs hide.</strong></p>

<p>Day 10 is a turning point in your journey as an engineer. You’re moving from “code that works on my machine” to “code I can <em>prove</em> works.” This is where professionalism begins.</p>

<p>You’ll learn three tools:</p>

<ol>
  <li><strong>Virtual environments</strong> - Keep your projects isolated, like separate apartments instead of one giant dorm</li>
  <li><strong>requirements.txt + pip</strong> - Make your dependencies reproducible, like a recipe with exact measurements</li>
  <li><strong>pytest</strong> - Write tests that catch bugs before users do, like crash-test dummies instead of real crashes</li>
</ol>

<p>By the end, you’ll have a test suite that gives you confidence. And confidence is everything.</p>

<hr />

<h2 id="setup">Setup</h2>

<p>First, create a project directory and set up your environment:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create and navigate to project directory</span>
<span class="nb">mkdir</span> ~/workspace/day10-testing
<span class="nb">cd</span> ~/workspace/day10-testing

<span class="c"># Create virtual environment</span>
python <span class="nt">-m</span> venv .venv

<span class="c"># Activate it (macOS/Linux)</span>
<span class="nb">source</span> .venv/bin/activate

<span class="c"># On Windows, use:</span>
<span class="c"># .venv\Scripts\Activate.ps1</span>
</code></pre></div></div>

<p>You should see <code class="language-plaintext highlighter-rouge">(.venv)</code> appear in your terminal prompt. Good-your venv is active.</p>

<p>Now install the tools you need:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span><span class="nv">pytest</span><span class="o">==</span>7.4.3 pytest-cov<span class="o">==</span>4.1.0
</code></pre></div></div>

<p>Create your <code class="language-plaintext highlighter-rouge">requirements.txt</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pytest==7.4.3
pytest-cov==4.1.0
</code></pre></div></div>

<p>Create your <code class="language-plaintext highlighter-rouge">.gitignore</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.venv/
__pycache__/
*.pyc
.pytest_cache/
.coverage
htmlcov/
dist/
build/
*.egg-info/
.DS_Store
</code></pre></div></div>

<p>Your project structure so far:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
└── (code files go here)
</code></pre></div></div>

<hr />

<h2 id="part-1-virtual-environments">Part 1: Virtual Environments</h2>

<h3 id="why-virtual-environments-exist">Why Virtual Environments Exist</h3>

<p>Imagine you’re building two houses on the same street. The first house needs blue paint. The second needs red paint. If they shared the same paint supply, disaster: when you paint one red, the other turns red too.</p>

<p>Python projects are like that. Project A might need Django 3.0 (an older web framework). Project B might need Django 4.0 (a newer version with different features). If both projects share the same Python installation, <strong>you can’t install both versions at the same time.</strong></p>

<p>Virtual environments solve this by creating a <strong>separate Python installation for each project.</strong> Each project gets its own <code class="language-plaintext highlighter-rouge">site-packages/</code> folder (where packages live). Paint one project red, the other stays blue.</p>

<h3 id="creating-a-virtual-environment">Creating a Virtual Environment</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python <span class="nt">-m</span> venv .venv
</code></pre></div></div>

<p>Let me break this down:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">python</code> - Call Python itself</li>
  <li><code class="language-plaintext highlighter-rouge">-m venv</code> - Run Python’s built-in venv module</li>
  <li><code class="language-plaintext highlighter-rouge">.venv</code> - Create a folder called <code class="language-plaintext highlighter-rouge">.venv</code> (hidden on macOS/Linux because it starts with a dot)</li>
</ul>

<p>What did Python create? A folder with this structure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.venv/
├── bin/                 # (on macOS/Linux) or Scripts/ (on Windows)
│   ├── python           # A Python binary specific to this venv
│   ├── pip              # A pip specific to this venv
│   ├── activate         # Script to activate the venv
│   └── ...
├── lib/
│   └── python3.11/
│       └── site-packages/   # Empty folder where packages will install
└── pyvenv.cfg
</code></pre></div></div>

<p>All of this is isolated. Your system Python is untouched.</p>

<h3 id="activating-and-deactivating">Activating and Deactivating</h3>

<p><strong>Activation</strong> means “use this venv’s Python instead of the system Python.”</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># macOS/Linux</span>
<span class="nb">source</span> .venv/bin/activate

<span class="c"># Windows (PowerShell)</span>
.venv<span class="se">\S</span>cripts<span class="se">\A</span>ctivate.ps1
</code></pre></div></div>

<p>After activation, your terminal prompt changes:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Before activation:</span>
<span class="nv">$ </span>python <span class="nt">--version</span>
Python 3.11.0

<span class="c"># After activation:</span>
<span class="o">(</span>.venv<span class="o">)</span> <span class="nv">$ </span>python <span class="nt">--version</span>
Python 3.11.0 <span class="o">(</span>from /Users/edward/workspace/day10-testing/.venv/bin/python<span class="o">)</span>
</code></pre></div></div>

<p>That <code class="language-plaintext highlighter-rouge">(.venv)</code> prefix tells you: <strong>you are now using the isolated Python.</strong></p>

<p>To deactivate (go back to system Python):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>.venv<span class="o">)</span> <span class="nv">$ </span>deactivate
<span class="nv">$ </span>python <span class="nt">--version</span>
Python 3.11.0 <span class="o">(</span>system Python again<span class="o">)</span>
</code></pre></div></div>

<h3 id="why-venv-goes-in-gitignore">Why .venv Goes in .gitignore</h3>

<p>Your <code class="language-plaintext highlighter-rouge">.venv/</code> folder contains <strong>thousands of files</strong>. If you committed it to Git, your repo would be:</p>

<ul>
  <li>Massive (hundreds of MB or even GB)</li>
  <li>Non-portable (another person’s M1 Mac can’t use your Intel Windows <code class="language-plaintext highlighter-rouge">.venv/</code>)</li>
  <li>Unnecessary (they can recreate it by installing <code class="language-plaintext highlighter-rouge">requirements.txt</code>)</li>
</ul>

<p>So: <strong>Never commit <code class="language-plaintext highlighter-rouge">.venv/</code>.</strong> Commit only <code class="language-plaintext highlighter-rouge">requirements.txt</code>.</p>

<p>When someone clones your repo:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/edward/my-project.git
<span class="nb">cd </span>my-project

<span class="c"># Create their own venv</span>
python <span class="nt">-m</span> venv .venv
<span class="nb">source</span> .venv/bin/activate

<span class="c"># Install packages</span>
pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>

<p>They get the exact same environment as you, without downloading gigabytes.</p>

<h3 id="a-brief-note-on-conda">A Brief Note on Conda</h3>

<p>On your M1 Mac, you might hear about <strong>conda.</strong> Conda is like venv’s more powerful cousin-it handles non-Python dependencies (C libraries, CUDA, even system libraries). M1 Macs especially benefit from conda because it handles ARM architecture seamlessly.</p>

<p>For now, stick with venv. It’s built into Python and sufficient for what we’re doing. Conda is useful later when you’re installing scientific libraries like TensorFlow.</p>

<hr />

<h2 id="part-2-pip-and-requirementstxt">Part 2: pip and requirements.txt</h2>

<h3 id="the-problem-version-hell">The Problem: Version Hell</h3>

<p>Let’s say you write code that uses NumPy:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>
</code></pre></div></div>

<p>You run <code class="language-plaintext highlighter-rouge">pip install numpy</code> (without specifying a version). You get version 1.26.0. Your code works perfectly.</p>

<p>Six months later, a colleague clones your repo and runs <code class="language-plaintext highlighter-rouge">pip install numpy</code>. NumPy is now at 2.0.0. There’s a breaking change-some API you used was removed. Suddenly your code breaks.</p>

<p><strong>You didn’t change your code. NumPy changed.</strong> And now you’re debugging at midnight.</p>

<h3 id="the-solution-exact-version-pinning">The Solution: Exact Version Pinning</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Instead of</span>
pip <span class="nb">install </span>numpy

<span class="c"># Do this</span>
pip <span class="nb">install </span><span class="nv">numpy</span><span class="o">==</span>1.24.1
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">==1.24.1</code> means: “Install version 1.24.1, exactly. Nothing else.”</p>

<p>Now if someone else installs the same requirements, they get 1.24.1 too. Same code, same dependencies, same behavior.</p>

<h3 id="creating-requirementstxt">Creating requirements.txt</h3>

<p><strong>Option 1: Freeze your current environment</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip freeze <span class="o">&gt;</span> requirements.txt
</code></pre></div></div>

<p>This command captures everything you’ve installed, with versions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>certifi==2024.2.2
charset-normalizer==3.3.2
idna==3.6
pytest==7.4.3
pytest-cov==4.1.0
requests==2.31.0
urllib3==2.1.0
</code></pre></div></div>

<p>Any package you’ve installed appears. Some you might not have explicitly asked for-they’re dependencies of dependencies (transitive dependencies).</p>

<p><strong>Option 2: Write it by hand</strong></p>

<p>For a fresh project, just list the packages you actually need:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pytest==7.4.3
pytest-cov==4.1.0
</code></pre></div></div>

<p>Later, if you add other packages, update this file.</p>

<h3 id="installing-from-requirementstxt">Installing from requirements.txt</h3>

<p>Once you’ve created <code class="language-plaintext highlighter-rouge">requirements.txt</code>, sharing your environment is one command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">-r</code> means “read from file.” All packages with exact versions install. Reproducibility achieved.</p>

<h3 id="other-useful-pip-commands">Other Useful pip Commands</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># List what you have installed</span>
pip list

<span class="c"># Show details about a specific package</span>
pip show pytest
<span class="c"># Output:</span>
<span class="c"># Name: pytest</span>
<span class="c"># Version: 7.4.3</span>
<span class="c"># Summary: pytest: simple powerful testing with Python</span>
<span class="c"># ...</span>

<span class="c"># Uninstall a package</span>
pip uninstall numpy

<span class="c"># See what's outdated</span>
pip list <span class="nt">--outdated</span>

<span class="c"># Update a specific package</span>
pip <span class="nb">install</span> <span class="nt">--upgrade</span> pytest
<span class="c"># or</span>
pip <span class="nb">install</span> <span class="nt">-U</span> pytest
</code></pre></div></div>

<h3 id="requirementstxt-vs-pyprojecttoml">requirements.txt vs pyproject.toml</h3>

<p>Newer Python projects use <code class="language-plaintext highlighter-rouge">pyproject.toml</code> instead of <code class="language-plaintext highlighter-rouge">requirements.txt</code>. It’s more powerful and follows PEP 518 standards.</p>

<p><code class="language-plaintext highlighter-rouge">pyproject.toml</code> looks like this:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">[</span><span class="n">project</span><span class="k">]</span>
<span class="n">name</span> <span class="o">=</span><span class="w"> </span><span class="s">"my-project"</span>
<span class="n">version</span> <span class="o">=</span><span class="w"> </span><span class="s">"1.0.0"</span>
<span class="n">dependencies</span> <span class="o">=</span><span class="w"> </span><span class="p">[</span>
    <span class="s">"pytest==7.4.3"</span><span class="p">,</span>
    <span class="s">"pytest-cov==4.1.0"</span><span class="p">,</span>
<span class="p">]</span>
</code></pre></div></div>

<p>For Day 10, we’re sticking with <code class="language-plaintext highlighter-rouge">requirements.txt</code>-simpler, universally understood, and fully supported everywhere. You’ll learn <code class="language-plaintext highlighter-rouge">pyproject.toml</code> later.</p>

<hr />

<h2 id="part-3-pytest---your-new-testing-standard">Part 3: pytest - Your New Testing Standard</h2>

<h3 id="why-testing-matters-the-crash-test-dummy-analogy">Why Testing Matters: The Crash-Test Dummy Analogy</h3>

<p>Before a car ships to customers, engineers do something that seems wasteful: they crash it.</p>

<p>They strap a dummy (made of plastic and sensors) into the car. They launch it into a wall at 30 mph. The dummy experiences the impact. Sensors record everything-forces on the head, chest, legs. Engineers study the data. If the dummy’s chest took too much force, they redesign the airbags.</p>

<p>Then they crash it again. And again. They test every scenario: head-on collision, side impact, rollover. Each test teaches them something.</p>

<p>By the time the car reaches you, it’s been “crashed” hundreds of times. But those crashes were <em>tests</em>, not failures.</p>

<p><strong>Code is the same.</strong> pytest is your crash-test dummy.</p>

<h3 id="without-tests-bugs-hide-until-production">Without Tests: Bugs Hide Until Production</h3>

<p>Let me show you a real bug:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calculate_average</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Calculate the average of a list of numbers.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nf">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>

<span class="c1"># Happy path: works fine
</span><span class="n">result</span> <span class="o">=</span> <span class="nf">calculate_average</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># 2.0
</span>
<span class="c1"># But what if someone passes an empty list?
</span><span class="n">result</span> <span class="o">=</span> <span class="nf">calculate_average</span><span class="p">([])</span>  <span class="c1"># ZeroDivisionError!
</span></code></pre></div></div>

<p>You test the happy path manually. You see it works. You ship it. A user passes an empty list. <strong>Your app crashes.</strong> You find out via a 1-star review.</p>

<p>This is <strong>not your fault personally.</strong> But it’s a bug that tests would have caught immediately.</p>

<h3 id="with-tests-the-bug-gets-caught-during-development">With Tests: The Bug Gets Caught During Development</h3>

<p>Now, with pytest:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">calculate_average</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="nb">list</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Calculate the average of a list of numbers.

    Args:
        numbers: List of numeric values

    Returns:
        The average as a float

    Raises:
        ValueError: If the list is empty
    </span><span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">numbers</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate average of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate average of empty list</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Calculating average of </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span><span class="si">}</span><span class="s"> values</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="nf">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>

<span class="c1"># Test it
</span><span class="k">def</span> <span class="nf">test_calculate_average_happy_path</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Test normal case.</span><span class="sh">"""</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">calculate_average</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">2.0</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Happy path test passed</span><span class="sh">"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">test_calculate_average_empty_list</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Test edge case: empty list.</span><span class="sh">"""</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">):</span>
        <span class="nf">calculate_average</span><span class="p">([])</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Empty list edge case handled correctly</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>When you run <code class="language-plaintext highlighter-rouge">pytest -v</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tests/test_stats.py::test_calculate_average_happy_path PASSED
tests/test_stats.py::test_calculate_average_empty_list PASSED

======================== 2 passed in 0.05s ========================
</code></pre></div></div>

<p>You find the bug during development, not in production. You log it. You fix it. Users never see it. Everyone’s happy.</p>

<h3 id="test-discovery-naming-matters">Test Discovery: Naming Matters</h3>

<p>pytest finds tests automatically based on <strong>naming conventions.</strong> Follow these rules:</p>

<ol>
  <li>Test files are named <code class="language-plaintext highlighter-rouge">test_*.py</code> or <code class="language-plaintext highlighter-rouge">*_test.py</code></li>
  <li>Test functions are named <code class="language-plaintext highlighter-rouge">test_*</code></li>
  <li>Test classes are named <code class="language-plaintext highlighter-rouge">Test*</code></li>
</ol>

<p>pytest will find these:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>project/
├── test_mean.py              ✓ Starts with test_
├── tests/test_variance.py    ✓ File starts with test_
├── stats_test.py             ✓ Ends with _test
│
└── THESE GET IGNORED:
    ├── check_mean.py         ✗ Doesn't follow pattern
    ├── mean_test.py          ✗ Not test_mean_test.py
    └── tests/stats.py        ✗ Doesn't start with test_
</code></pre></div></div>

<p><strong>Break this rule, and pytest won’t find your tests.</strong></p>

<h3 id="your-first-test-the-aaa-pattern">Your First Test: The AAA Pattern</h3>

<p>The gold standard for test structure is <strong>Arrange → Act → Assert</strong>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean_of_positive_integers</span><span class="p">():</span>
    <span class="c1"># ARRANGE: Set up the data you're testing
</span>    <span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>

    <span class="c1"># ACT: Call the function
</span>    <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>

    <span class="c1"># ASSERT: Check the result
</span>    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">3.0</span>
</code></pre></div></div>

<p>Breaking it down:</p>

<ul>
  <li><strong>Arrange:</strong> Create any inputs or objects the function needs</li>
  <li><strong>Act:</strong> Call the function</li>
  <li><strong>Assert:</strong> Verify the output is what you expect</li>
</ul>

<p>This structure makes tests easy to read and understand.</p>

<h3 id="running-tests">Running Tests</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Run all tests in the current directory</span>
pytest

<span class="c"># Run all tests, verbose (shows each test name)</span>
pytest <span class="nt">-v</span>

<span class="c"># Run only one file</span>
pytest tests/test_mean.py

<span class="c"># Run one specific test</span>
pytest tests/test_mean.py::test_mean_of_positive_integers

<span class="c"># Run tests matching a pattern</span>
pytest <span class="nt">-k</span> <span class="s2">"mean"</span>  <span class="c"># Runs test_mean_*, *_mean, etc.</span>

<span class="c"># Stop on the first failure</span>
pytest <span class="nt">-x</span>

<span class="c"># Show print statements (normally hidden)</span>
pytest <span class="nt">-s</span>

<span class="c"># Quiet mode (only show summary)</span>
pytest <span class="nt">-q</span>
</code></pre></div></div>

<p>Example output from <code class="language-plaintext highlighter-rouge">pytest -v</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tests/test_stats.py::test_mean_happy_path PASSED                        [ 10%]
tests/test_stats.py::test_mean_empty_list FAILED                       [ 20%]
tests/test_stats.py::test_mean_single_number PASSED                    [ 30%]
tests/test_stats.py::test_variance_happy_path PASSED                   [ 40%]
tests/test_stats.py::test_bayes_update_basic PASSED                    [ 50%]

======= FAILED tests/test_stats.py::test_mean_empty_list ========

def test_mean_empty_list():
    with pytest.raises(ValueError):
        mean([])

E    AssertionError: DID NOT RAISE ValueError

=========== 1 failed, 4 passed in 0.23s ============
</code></pre></div></div>

<p>This tells you: “The function didn’t raise ValueError when you expected it to. Go fix that.”</p>

<h3 id="testing-for-exceptions">Testing for Exceptions</h3>

<p>Many functions should raise exceptions for bad input. You test that they do:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean_raises_on_empty_list</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">The function should reject empty lists.</span><span class="sh">"""</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">):</span>
        <span class="nf">mean</span><span class="p">([])</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">with pytest.raises(ValueError):</code> context manager says: “I expect this code to raise ValueError. If it does, the test passes. If it doesn’t, the test fails.”</p>

<p>You can even check the error message:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean_raises_with_correct_message</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Check the error message too.</span><span class="sh">"""</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">Cannot calculate mean</span><span class="sh">"</span><span class="p">):</span>
        <span class="nf">mean</span><span class="p">([])</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">match</code> takes a regex pattern. The error message must contain “Cannot calculate mean” or the test fails.</p>

<h3 id="parametrised-tests-dry-testing">Parametrised Tests: DRY Testing</h3>

<p>Parametrised tests let you test many inputs with one test function.</p>

<p><strong>Without parametrisation (repetitive):</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean_case_1</span><span class="p">():</span>
    <span class="k">assert</span> <span class="nf">mean</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span> <span class="o">==</span> <span class="mf">2.0</span>

<span class="k">def</span> <span class="nf">test_mean_case_2</span><span class="p">():</span>
    <span class="k">assert</span> <span class="nf">mean</span><span class="p">([</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">])</span> <span class="o">==</span> <span class="mf">15.0</span>

<span class="k">def</span> <span class="nf">test_mean_case_3</span><span class="p">():</span>
    <span class="k">assert</span> <span class="nf">mean</span><span class="p">([</span><span class="mi">5</span><span class="p">])</span> <span class="o">==</span> <span class="mf">5.0</span>

<span class="k">def</span> <span class="nf">test_mean_case_4</span><span class="p">():</span>
    <span class="k">assert</span> <span class="nf">mean</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">])</span> <span class="o">==</span> <span class="o">-</span><span class="mf">2.0</span>

<span class="c1"># ... 10 more similar tests
</span></code></pre></div></div>

<p><strong>With parametrisation (clean):</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">numbers,expected</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
    <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">2.0</span><span class="p">),</span>
    <span class="p">([</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="mf">15.0</span><span class="p">),</span>
    <span class="p">([</span><span class="mi">5</span><span class="p">],</span> <span class="mf">5.0</span><span class="p">),</span>
    <span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">],</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">),</span>
    <span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="mf">0.0</span><span class="p">),</span>
    <span class="p">([</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mf">3.5</span><span class="p">],</span> <span class="mf">2.5</span><span class="p">),</span>
<span class="p">])</span>
<span class="k">def</span> <span class="nf">test_mean_multiple_cases</span><span class="p">(</span><span class="n">numbers</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nf">mean</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">==</span> <span class="n">expected</span>
</code></pre></div></div>

<p>pytest runs this one test function <strong>six times</strong>-once for each tuple. Much cleaner.</p>

<p>You can parametrize multiple arguments:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">numbers,alpha,expected</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
    <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.6666</span><span class="p">),</span>
    <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.6666</span><span class="p">),</span>
    <span class="p">([</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">],</span> <span class="mf">0.05</span><span class="p">,</span> <span class="mf">66.6666</span><span class="p">),</span>
<span class="p">])</span>
<span class="k">def</span> <span class="nf">test_variance_with_alpha</span><span class="p">(</span><span class="n">numbers</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="fixtures-reusable-test-setup">Fixtures: Reusable Test Setup</h3>

<p>If many tests need the same data, use a <strong>fixture:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pytest</span>

<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">sample_dataset</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Fixture: provides sample data to tests that ask for it.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">test_mean_with_fixture</span><span class="p">(</span><span class="n">sample_dataset</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">This test receives sample_dataset automatically.</span><span class="sh">"""</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">sample_dataset</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">3.0</span>

<span class="k">def</span> <span class="nf">test_variance_with_fixture</span><span class="p">(</span><span class="n">sample_dataset</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Different test, same fixture.</span><span class="sh">"""</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">sample_dataset</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">2.0</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">sample_dataset</code> parameter tells pytest: “I need the fixture named <code class="language-plaintext highlighter-rouge">sample_dataset</code>.” pytest calls the fixture function and injects the return value.</p>

<p><strong>Why fixtures are great:</strong></p>

<ol>
  <li><strong>DRY:</strong> Define data once, use in many tests</li>
  <li><strong>Readable:</strong> Test code is cleaner without setup clutter</li>
  <li><strong>Maintainable:</strong> Change the fixture once, all tests update</li>
  <li><strong>Shareable:</strong> Fixtures work across multiple test files</li>
</ol>

<h3 id="conftestpy-sharing-fixtures-across-test-files">conftest.py: Sharing Fixtures Across Test Files</h3>

<p>If you have multiple test files and want to share fixtures:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>project/
├── stats.py
├── tests/
    ├── conftest.py          ← Fixtures defined here
    ├── test_mean.py         ← Uses fixtures from conftest
    ├── test_variance.py     ← Uses fixtures from conftest
    └── test_bayes.py
</code></pre></div></div>

<p><strong>tests/conftest.py:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pytest</span>

<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">simple_numbers</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]</span>

<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">large_numbers</span><span class="p">():</span>
    <span class="k">return</span> <span class="nf">list</span><span class="p">(</span><span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">101</span><span class="p">))</span>

<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">negative_numbers</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">[</span><span class="o">-</span><span class="mf">5.0</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">]</span>
</code></pre></div></div>

<p><strong>tests/test_mean.py:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">):</span>
    <span class="c1"># No import needed, fixture comes from conftest.py
</span>    <span class="k">assert</span> <span class="nf">mean</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">)</span> <span class="o">==</span> <span class="mf">2.0</span>
</code></pre></div></div>

<p><strong>tests/test_variance.py:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_variance</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">):</span>
    <span class="c1"># Same fixture, different file
</span>    <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">2.0</span><span class="o">/</span><span class="mf">3.0</span><span class="p">)</span>
</code></pre></div></div>

<p>pytest automatically finds fixtures in <code class="language-plaintext highlighter-rouge">conftest.py</code>. No imports needed.</p>

<h3 id="mocking-testing-code-that-calls-external-services">Mocking: Testing Code That Calls External Services</h3>

<p>Here’s a brief intro (we’ll do this more on Day 14):</p>

<p>Sometimes your code calls an external API:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="k">def</span> <span class="nf">fetch_stock_price</span><span class="p">(</span><span class="n">ticker</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Call an external API to get stock price.</span><span class="sh">"""</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">https://api.example.com/price/</span><span class="si">{</span><span class="n">ticker</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="nf">json</span><span class="p">()[</span><span class="sh">"</span><span class="s">price</span><span class="sh">"</span><span class="p">]</span>
</code></pre></div></div>

<p>In tests, you <strong>don’t want to hit the real API.</strong> It’s slow, unreliable, and might cost money. Instead, you mock it:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>

<span class="k">def</span> <span class="nf">test_fetch_stock_price</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Test without calling the real API.</span><span class="sh">"""</span>
    <span class="k">with</span> <span class="nf">patch</span><span class="p">(</span><span class="sh">'</span><span class="s">requests.get</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">mock_get</span><span class="p">:</span>
        <span class="c1"># Make requests.get return fake data
</span>        <span class="n">mock_get</span><span class="p">.</span><span class="n">return_value</span><span class="p">.</span><span class="n">json</span><span class="p">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">price</span><span class="sh">"</span><span class="p">:</span> <span class="mf">150.0</span><span class="p">}</span>

        <span class="n">result</span> <span class="o">=</span> <span class="nf">fetch_stock_price</span><span class="p">(</span><span class="sh">"</span><span class="s">AAPL</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">150.0</span>
</code></pre></div></div>

<p>The test never touches the real API. It’s fast (milliseconds), isolated, and reliable.</p>

<h3 id="test-coverage-proving-your-code-is-tested">Test Coverage: Proving Your Code Is Tested</h3>

<p>You write tests. But how do you know if you’ve tested <em>enough?</em></p>

<p><strong>Code coverage</strong> measures what percentage of your code is executed by tests.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>pytest-cov

pytest <span class="nt">--cov</span><span class="o">=</span>stats <span class="nt">--cov-report</span><span class="o">=</span>term-missing
</code></pre></div></div>

<p>Output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name       Stmts   Miss  Cover   Missing
stats.py      95      5    94%    45-46, 78, 102-104
</code></pre></div></div>

<p>This says:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">stats.py</code> has 95 statements (lines of code)</li>
  <li>5 are not executed by any test (missing)</li>
  <li>Coverage is 94%</li>
  <li>Lines 45-46, 78, 102-104 are untested</li>
</ul>

<p>You then write tests for those lines until coverage is 100% (or as close as practical).</p>

<p>Aim for:</p>
<ul>
  <li><strong>Green (90%+):</strong> Excellent</li>
  <li><strong>Yellow (70-89%):</strong> Good</li>
  <li><strong>Red (&lt;70%):</strong> Needs work</li>
</ul>

<h3 id="good-test-names-be-descriptive">Good Test Names: Be Descriptive</h3>

<p>A good test name tells you exactly what’s being tested and what’s expected.</p>

<p><strong>Bad names:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean</span><span class="p">():</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">test_variance</span><span class="p">():</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">test_1</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>When they fail, you have no idea what broke.</p>

<p><strong>Good names:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_mean_of_positive_integers_returns_correct_average</span><span class="p">():</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">test_variance_of_empty_list_raises_valueerror</span><span class="p">():</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">test_bayes_update_with_strong_evidence_increases_posterior</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>When they fail, you know exactly what broke. The test name is a specification of what the function should do.</p>

<p>Pattern: <code class="language-plaintext highlighter-rouge">test_[function_name]_[condition]_[expected_result]</code></p>

<h3 id="first-principles-what-makes-a-good-test">FIRST Principles: What Makes a Good Test</h3>

<p><strong>F - Fast</strong>
Tests should run in milliseconds. If a test takes seconds, you won’t run them often. You’ll skip them. Bugs hide.</p>

<p><strong>I - Isolated</strong>
Tests shouldn’t depend on each other. If test A must run before test B, you have a problem. Tests should be independent.</p>

<p><strong>R - Repeatable</strong>
Same test, same result every time. No random variation. No flaky tests that sometimes pass and sometimes fail.</p>

<p><strong>S - Self-Validating</strong>
A test either passes or fails. No manual checking (“did this look right?”). No human judgment.</p>

<p><strong>T - Timely</strong>
Ideally, write tests before the code (Test-Driven Development). Minimum: write tests alongside the code, not months later. Fresh context makes better tests.</p>

<hr />

<h2 id="the-project-testing-the-stats-functions-from-day-9">The Project: Testing the Stats Functions from Day 9</h2>

<p>Today you’ll build a complete pytest test suite for the statistics functions from Day 9.</p>

<h3 id="project-structure">Project Structure</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
├── stats.py              ← Code being tested
└── tests/
    ├── __init__.py       ← Empty file, makes tests/ a package
    ├── conftest.py       ← Shared fixtures
    └── test_stats.py     ← All tests
</code></pre></div></div>

<h3 id="statspy---the-code">stats.py - The Code</h3>

<p>Create <code class="language-plaintext highlighter-rouge">stats.py</code> with full type hints and logging:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">
Statistics module with type hints and logging.
Day 10: Production-grade Python with testing.
</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span><span class="p">,</span> <span class="n">Optional</span><span class="p">,</span> <span class="n">Tuple</span>

<span class="c1"># Configure logging
</span><span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span>
    <span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span>
    <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
<span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">mean</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Calculate the arithmetic mean of a list of numbers.

    Args:
        numbers: List of floats or ints

    Returns:
        The mean as a float

    Raises:
        ValueError: If the list is empty
        TypeError: If numbers contains non-numeric values

    Example:
</span><span class="gp">        &gt;&gt;&gt;</span> <span class="nf">mean</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
        <span class="mf">2.0</span>
    <span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">numbers</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate mean of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate mean of empty list</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="n">total</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">total</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Calculated mean of </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span><span class="si">}</span><span class="s"> values: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">except</span> <span class="nb">TypeError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Invalid data type in numbers list: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">TypeError</span><span class="p">(</span><span class="sh">"</span><span class="s">All elements must be numeric</span><span class="sh">"</span><span class="p">)</span> <span class="k">from</span> <span class="n">e</span>


<span class="k">def</span> <span class="nf">variance</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Calculate the variance (average squared deviation from mean).

    Args:
        numbers: List of floats or ints

    Returns:
        The variance as a float

    Raises:
        ValueError: If the list is empty or has only one element
        TypeError: If numbers contains non-numeric values

    Example:
</span><span class="gp">        &gt;&gt;&gt;</span> <span class="nf">variance</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
        <span class="mf">0.6666666666666666</span>
    <span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">numbers</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate variance of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate variance of empty list</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Variance requires at least 2 values</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Variance requires at least 2 values (population has no variance)</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="n">m</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">squared_diffs</span> <span class="o">=</span> <span class="p">[(</span><span class="n">x</span> <span class="o">-</span> <span class="n">m</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">]</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">squared_diffs</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Calculated variance: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">except</span> <span class="nb">TypeError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Invalid data type in numbers list: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">TypeError</span><span class="p">(</span><span class="sh">"</span><span class="s">All elements must be numeric</span><span class="sh">"</span><span class="p">)</span> <span class="k">from</span> <span class="n">e</span>


<span class="k">def</span> <span class="nf">std_dev</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Calculate the standard deviation (square root of variance).

    Args:
        numbers: List of floats or ints

    Returns:
        The standard deviation as a float

    Raises:
        ValueError: If the list is empty or has only one element
        TypeError: If numbers contains non-numeric values

    Example:
</span><span class="gp">        &gt;&gt;&gt;</span> <span class="nf">std_dev</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
        <span class="mf">0.8164965809004287</span>
    <span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">numbers</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate std dev of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot calculate std dev of empty list</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="n">var</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">var</span> <span class="o">**</span> <span class="mf">0.5</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Calculated std dev: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="nf">except </span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="nb">TypeError</span><span class="p">)</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">raise</span>


<span class="k">def</span> <span class="nf">is_significant</span><span class="p">(</span><span class="n">p_value</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">alpha</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.05</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Determine if a p-value is statistically significant.

    In plain English: Is this result unlikely to happen by chance?
    If p_value &lt; alpha, we say </span><span class="sh">"</span><span class="s">yes, this is surprising</span><span class="sh">"</span><span class="s"> (significant).

    Args:
        p_value: The p-value (must be between 0 and 1)
        alpha: The significance level (default 0.05 = 5%)

    Returns:
        True if p_value &lt; alpha, False otherwise

    Raises:
        ValueError: If p_value or alpha are outside [0, 1]

    Example:
</span><span class="gp">        &gt;&gt;&gt;</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.03</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="bp">True</span>
        <span class="o">&gt;&gt;&gt;</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.08</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="bp">False</span>
    <span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">p_value</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">p_value </span><span class="si">{</span><span class="n">p_value</span><span class="si">}</span><span class="s"> is outside [0, 1]</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">p_value must be between 0 and 1</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">alpha</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">alpha </span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s"> is outside [0, 1]</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">alpha must be between 0 and 1</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">result</span> <span class="o">=</span> <span class="n">p_value</span> <span class="o">&lt;</span> <span class="n">alpha</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">p_value=</span><span class="si">{</span><span class="n">p_value</span><span class="si">}</span><span class="s">, alpha=</span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s"> → significant=</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">bayes_update</span><span class="p">(</span>
    <span class="n">prior</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
    <span class="n">likelihood</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
    <span class="n">likelihood_complement</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">float</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Update a prior probability using Bayes</span><span class="sh">'</span><span class="s"> Theorem.

    In plain English: You had a guess (prior). You saw new evidence (likelihood).
    What should your new guess be (posterior)?

    Bayes</span><span class="sh">'</span><span class="s"> Theorem: P(A|B) = P(B|A) * P(A) / P(B)

    Args:
        prior: Your initial guess (0 to 1)
        likelihood: Probability of evidence given your guess (0 to 1)
        likelihood_complement: Probability of evidence given NOT your guess.
                              If None, assumed to be 1 - likelihood

    Returns:
        The updated probability (posterior)

    Raises:
        ValueError: If any probability is outside [0, 1]

    Example:
</span><span class="gp">        &gt;&gt;&gt;</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">)</span>
        <span class="mf">0.6428571428571429</span>
    <span class="sh">"""</span>
    <span class="c1"># Validate inputs
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">prior</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">prior </span><span class="si">{</span><span class="n">prior</span><span class="si">}</span><span class="s"> is outside [0, 1]</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">prior must be between 0 and 1</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">likelihood</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">likelihood </span><span class="si">{</span><span class="n">likelihood</span><span class="si">}</span><span class="s"> is outside [0, 1]</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">likelihood must be between 0 and 1</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Default likelihood_complement
</span>    <span class="k">if</span> <span class="n">likelihood_complement</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">likelihood_complement</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">likelihood</span>

    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">likelihood_complement</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">likelihood_complement </span><span class="si">{</span><span class="n">likelihood_complement</span><span class="si">}</span><span class="s"> is outside [0, 1]</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">likelihood_complement must be between 0 and 1</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Bayes' Theorem
</span>    <span class="n">posterior</span> <span class="o">=</span> <span class="p">(</span>
        <span class="n">likelihood</span> <span class="o">*</span> <span class="n">prior</span> <span class="o">/</span>
        <span class="p">(</span><span class="n">likelihood</span> <span class="o">*</span> <span class="n">prior</span> <span class="o">+</span> <span class="n">likelihood_complement</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">prior</span><span class="p">))</span>
    <span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">Bayes update: prior=</span><span class="si">{</span><span class="n">prior</span><span class="si">}</span><span class="s"> → posterior=</span><span class="si">{</span><span class="n">posterior</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span>
    <span class="p">)</span>
    <span class="k">return</span> <span class="n">posterior</span>
</code></pre></div></div>

<h3 id="testsconftestpy---shared-fixtures">tests/conftest.py - Shared Fixtures</h3>

<p>Create <code class="language-plaintext highlighter-rouge">tests/conftest.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">
Pytest configuration and shared fixtures for stats tests.
</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">pytest</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span>


<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">simple_numbers</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Simple dataset for testing.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">]</span>


<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">large_numbers</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Larger dataset for testing edge cases.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nf">list</span><span class="p">(</span><span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">101</span><span class="p">))</span>  <span class="c1"># 1 to 100
</span>

<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">negative_numbers</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Numbers including negatives.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="o">-</span><span class="mf">5.0</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">]</span>


<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">single_number</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Single value (edge case).</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="mf">42.0</span><span class="p">]</span>


<span class="nd">@pytest.fixture</span>
<span class="k">def</span> <span class="nf">duplicate_numbers</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">All values the same.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="mf">7.0</span><span class="p">,</span> <span class="mf">7.0</span><span class="p">,</span> <span class="mf">7.0</span><span class="p">,</span> <span class="mf">7.0</span><span class="p">]</span>
</code></pre></div></div>

<h3 id="teststest_statspy---complete-test-suite">tests/test_stats.py - Complete Test Suite</h3>

<p>Create <code class="language-plaintext highlighter-rouge">tests/test_stats.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">
Complete test suite for stats module.
Day 10: pytest introduction with happy paths, edge cases, and parametrisation.
</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">pytest</span>
<span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span>
<span class="kn">from</span> <span class="n">stats</span> <span class="kn">import</span> <span class="n">mean</span><span class="p">,</span> <span class="n">variance</span><span class="p">,</span> <span class="n">std_dev</span><span class="p">,</span> <span class="n">is_significant</span><span class="p">,</span> <span class="n">bayes_update</span>

<span class="c1"># Get logger for test output
</span><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>


<span class="c1"># ============================================================================
# TESTS FOR mean()
# ============================================================================
</span>
<span class="k">class</span> <span class="nc">TestMean</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test the mean() function.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">test_mean_happy_path</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">simple_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Test normal case: positive integers.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean(simple_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">3.0</span>

    <span class="k">def</span> <span class="nf">test_mean_single_number</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">single_number</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Edge case: single value.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">single_number</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean(single_number) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">42.0</span>

    <span class="k">def</span> <span class="nf">test_mean_all_same</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">duplicate_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Edge case: all values identical.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">duplicate_numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean(duplicate_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">7.0</span>

    <span class="k">def</span> <span class="nf">test_mean_with_negatives</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">negative_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Include negative numbers.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">negative_numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean(negative_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_mean_empty_list_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Empty list should raise ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing mean([]) raises ValueError</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">Cannot calculate mean</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">mean</span><span class="p">([])</span>

    <span class="k">def</span> <span class="nf">test_mean_with_floats</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Floats should work correctly.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">([</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mf">3.5</span><span class="p">])</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean([1.5, 2.5, 3.5]) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">2.5</span><span class="p">)</span>

    <span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">numbers,expected</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
        <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">2.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="mf">15.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">5</span><span class="p">],</span> <span class="mf">5.0</span><span class="p">),</span>
        <span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">],</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="mf">0.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mf">1.1</span><span class="p">,</span> <span class="mf">2.2</span><span class="p">,</span> <span class="mf">3.3</span><span class="p">],</span> <span class="mf">2.2</span><span class="p">),</span>
    <span class="p">])</span>
    <span class="k">def</span> <span class="nf">test_mean_parametrised</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">numbers</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Parametrised: test many cases at once.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">mean(</span><span class="si">{</span><span class="n">numbers</span><span class="si">}</span><span class="s">) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_mean_non_numeric_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Non-numeric values should raise TypeError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing mean() with non-numeric values</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">TypeError</span><span class="p">):</span>
            <span class="nf">mean</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="sh">"</span><span class="s">three</span><span class="sh">"</span><span class="p">])</span>


<span class="c1"># ============================================================================
# TESTS FOR variance()
# ============================================================================
</span>
<span class="k">class</span> <span class="nc">TestVariance</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test the variance() function.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">test_variance_happy_path</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">simple_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Test normal case.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">)</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">2.0</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variance(simple_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_variance_two_elements</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Edge case: minimum valid input.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">1.0</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variance([1.0, 3.0]) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_variance_all_same</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">duplicate_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Edge case: variance of identical values is zero.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">duplicate_numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variance(duplicate_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = 0.0</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_variance_empty_list_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Empty list should raise ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing variance([]) raises ValueError</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">Cannot calculate variance</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">variance</span><span class="p">([])</span>

    <span class="k">def</span> <span class="nf">test_variance_single_element_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Single element is not enough for variance.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing variance([42.0]) raises ValueError</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">at least 2 values</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">variance</span><span class="p">([</span><span class="mf">42.0</span><span class="p">])</span>

    <span class="k">def</span> <span class="nf">test_variance_negative_numbers</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">negative_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Variance of numbers including negatives.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">negative_numbers</span><span class="p">)</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">10.4</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variance(negative_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>

    <span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">numbers,expected</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
        <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">2.0</span><span class="o">/</span><span class="mf">3.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="mf">1.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="mf">0.0</span><span class="p">),</span>
        <span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="mf">0.0</span><span class="p">),</span>
    <span class="p">])</span>
    <span class="k">def</span> <span class="nf">test_variance_parametrised</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">numbers</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Parametrised variance tests.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">variance</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variance(</span><span class="si">{</span><span class="n">numbers</span><span class="si">}</span><span class="s">) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>


<span class="c1"># ============================================================================
# TESTS FOR std_dev()
# ============================================================================
</span>
<span class="k">class</span> <span class="nc">TestStdDev</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test the std_dev() function.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">test_std_dev_happy_path</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">simple_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Test normal case.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">std_dev</span><span class="p">(</span><span class="n">simple_numbers</span><span class="p">)</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">2.0</span> <span class="o">**</span> <span class="mf">0.5</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">std_dev(simple_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_std_dev_all_same</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">duplicate_numbers</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Edge case: std dev of identical values is zero.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">std_dev</span><span class="p">(</span><span class="n">duplicate_numbers</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">std_dev(duplicate_numbers) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = 0.0</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_std_dev_empty_list_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Empty list should raise ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing std_dev([]) raises ValueError</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">):</span>
            <span class="nf">std_dev</span><span class="p">([])</span>

    <span class="k">def</span> <span class="nf">test_std_dev_single_element_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Single element is not enough.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing std_dev([42.0]) raises ValueError</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">):</span>
            <span class="nf">std_dev</span><span class="p">([</span><span class="mf">42.0</span><span class="p">])</span>


<span class="c1"># ============================================================================
# TESTS FOR is_significant()
# ============================================================================
</span>
<span class="k">class</span> <span class="nc">TestIsSignificant</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test the is_significant() function.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">test_significant_below_threshold</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value &lt; alpha → True.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.03</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.03, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = True</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">True</span>

    <span class="k">def</span> <span class="nf">test_not_significant_above_threshold</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value &gt; alpha → False.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.08</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.08, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = False</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">False</span>

    <span class="k">def</span> <span class="nf">test_boundary_equals_alpha</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value == alpha → False (not strictly less).</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.05, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = False</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">False</span>

    <span class="k">def</span> <span class="nf">test_boundary_just_below_alpha</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Just below alpha threshold.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.049999</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.049999, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = True</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">True</span>

    <span class="k">def</span> <span class="nf">test_custom_alpha</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Different significance level.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.08</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.10</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.08, alpha=0.10) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = True</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">True</span>

    <span class="k">def</span> <span class="nf">test_p_value_zero</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value = 0 is always significant.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(0.0, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = True</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">True</span>

    <span class="k">def</span> <span class="nf">test_p_value_one</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value = 1 is never significant.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(1.0, 0.05) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = False</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="bp">False</span>

    <span class="k">def</span> <span class="nf">test_invalid_p_value_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">p-value outside [0, 1] raises ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing is_significant with invalid p_value</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">p_value must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">is_significant</span><span class="p">(</span><span class="o">-</span><span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>

        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">p_value must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">is_significant</span><span class="p">(</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_invalid_alpha_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">alpha outside [0, 1] raises ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing is_significant with invalid alpha</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">alpha must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">is_significant</span><span class="p">(</span><span class="mf">0.05</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.05</span><span class="p">)</span>

    <span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">p_value,alpha,expected</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
        <span class="p">(</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
        <span class="p">(</span><span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
        <span class="p">(</span><span class="mf">0.10</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
        <span class="p">(</span><span class="mf">0.001</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
        <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
        <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">])</span>
    <span class="k">def</span> <span class="nf">test_is_significant_parametrised</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">p_value</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Parametrised tests for various thresholds.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">is_significant</span><span class="p">(</span><span class="n">p_value</span><span class="p">,</span> <span class="n">alpha</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">is_significant(</span><span class="si">{</span><span class="n">p_value</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s">) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="ow">is</span> <span class="n">expected</span>


<span class="c1"># ============================================================================
# TESTS FOR bayes_update()
# ============================================================================
</span>
<span class="k">class</span> <span class="nc">TestBayesUpdate</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test the bayes_update() function.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_basic</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Test standard Bayes</span><span class="sh">'</span><span class="s"> Theorem calculation.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">)</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="mf">0.9</span> <span class="o">/</span> <span class="p">(</span><span class="mf">0.5</span> <span class="o">*</span> <span class="mf">0.9</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="mf">0.1</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(0.5, 0.9) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected ≈ </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="mf">0.9</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_weak_prior</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Low prior gets updated strongly by evidence.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="n">prior</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">likelihood</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
        <span class="n">expected</span> <span class="o">=</span> <span class="mf">0.8</span> <span class="o">*</span> <span class="mf">0.1</span> <span class="o">/</span> <span class="p">(</span><span class="mf">0.8</span> <span class="o">*</span> <span class="mf">0.1</span> <span class="o">+</span> <span class="mf">0.2</span> <span class="o">*</span> <span class="mf">0.9</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(0.1, 0.8) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected ≈ </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">expected</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">&gt;</span> <span class="mf">0.1</span>  <span class="c1"># Posterior &gt; prior
</span>
    <span class="k">def</span> <span class="nf">test_bayes_update_strong_prior</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">High prior stays high unless evidence is strong.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="n">prior</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">likelihood</span><span class="o">=</span><span class="mf">0.6</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(0.9, 0.6) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected &gt; 0.9</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">&gt;</span> <span class="mf">0.9</span>  <span class="c1"># Still high
</span>
    <span class="k">def</span> <span class="nf">test_bayes_update_explicit_complement</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Can specify likelihood_complement explicitly.</span><span class="sh">"""</span>
        <span class="n">result1</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">,</span> <span class="n">likelihood_complement</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
        <span class="n">result2</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">)</span>  <span class="c1"># Default: 1 - 0.8 = 0.2
</span>        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">explicit complement: </span><span class="si">{</span><span class="n">result1</span><span class="si">}</span><span class="s">, default complement: </span><span class="si">{</span><span class="n">result2</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result1</span> <span class="o">!=</span> <span class="n">result2</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_zero_prior</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Posterior is zero if prior is zero.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(0.0, 0.9) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = 0.0</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">0.0</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_one_prior</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Posterior stays high if prior is very high.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(1.0, 0.5) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">, expected = 1.0</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">1.0</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_invalid_prior_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Prior outside [0, 1] raises ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing bayes_update with invalid prior</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">prior must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">bayes_update</span><span class="p">(</span><span class="o">-</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>

        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">prior must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_invalid_likelihood_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Likelihood outside [0, 1] raises ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing bayes_update with invalid likelihood</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">likelihood must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_bayes_update_invalid_complement_raises</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Complement outside [0, 1] raises ValueError.</span><span class="sh">"""</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Testing bayes_update with invalid complement</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">raises</span><span class="p">(</span><span class="nb">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="sh">"</span><span class="s">likelihood_complement must be between</span><span class="sh">"</span><span class="p">):</span>
            <span class="nf">bayes_update</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">,</span> <span class="n">likelihood_complement</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>

    <span class="nd">@pytest.mark.parametrize</span><span class="p">(</span><span class="sh">"</span><span class="s">prior,likelihood,expected_comparison</span><span class="sh">"</span><span class="p">,</span> <span class="p">[</span>
        <span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="sh">"</span><span class="s">greater</span><span class="sh">"</span><span class="p">),</span>   <span class="c1"># Strong evidence increases posterior
</span>        <span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="sh">"</span><span class="s">equal</span><span class="sh">"</span><span class="p">),</span>     <span class="c1"># Equal evidence keeps it steady
</span>        <span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="sh">"</span><span class="s">less</span><span class="sh">"</span><span class="p">),</span>      <span class="c1"># Weak evidence decreases posterior
</span>    <span class="p">])</span>
    <span class="k">def</span> <span class="nf">test_bayes_update_parametrised</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">prior</span><span class="p">,</span> <span class="n">likelihood</span><span class="p">,</span> <span class="n">expected_comparison</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">Parametrised Bayes update tests.</span><span class="sh">"""</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">bayes_update</span><span class="p">(</span><span class="n">prior</span><span class="p">,</span> <span class="n">likelihood</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">bayes_update(</span><span class="si">{</span><span class="n">prior</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">likelihood</span><span class="si">}</span><span class="s">) = </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

        <span class="k">if</span> <span class="n">expected_comparison</span> <span class="o">==</span> <span class="sh">"</span><span class="s">greater</span><span class="sh">"</span><span class="p">:</span>
            <span class="k">assert</span> <span class="n">result</span> <span class="o">&gt;</span> <span class="n">prior</span> <span class="ow">or</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">prior</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">expected_comparison</span> <span class="o">==</span> <span class="sh">"</span><span class="s">equal</span><span class="sh">"</span><span class="p">:</span>
            <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">prior</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">expected_comparison</span> <span class="o">==</span> <span class="sh">"</span><span class="s">less</span><span class="sh">"</span><span class="p">:</span>
            <span class="k">assert</span> <span class="n">result</span> <span class="o">&lt;</span> <span class="n">prior</span> <span class="ow">or</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="p">.</span><span class="nf">approx</span><span class="p">(</span><span class="n">prior</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="create-initpy">Create <strong>init</strong>.py</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">touch </span>tests/__init__.py
</code></pre></div></div>

<p>This makes <code class="language-plaintext highlighter-rouge">tests/</code> a Python package.</p>

<h3 id="running-the-tests">Running the Tests</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Make sure you're in the project root</span>
<span class="nb">cd</span> ~/workspace/day10-testing

<span class="c"># Make sure venv is activated</span>
<span class="nb">source</span> .venv/bin/activate

<span class="c"># Run all tests</span>
pytest <span class="nt">-v</span>

<span class="c"># Run with coverage</span>
pytest <span class="nt">--cov</span><span class="o">=</span>stats <span class="nt">--cov-report</span><span class="o">=</span>term-missing

<span class="c"># Run one test class</span>
pytest <span class="nt">-v</span> tests/test_stats.py::TestMean

<span class="c"># Run one test</span>
pytest <span class="nt">-v</span> tests/test_stats.py::TestMean::test_mean_happy_path

<span class="c"># Show logging output</span>
pytest <span class="nt">-v</span> <span class="nt">-s</span>
</code></pre></div></div>

<p>Expected output from <code class="language-plaintext highlighter-rouge">pytest -v</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tests/test_stats.py::TestMean::test_mean_happy_path PASSED                 [ 5%]
tests/test_stats.py::TestMean::test_mean_single_number PASSED              [10%]
tests/test_stats.py::TestMean::test_mean_all_same PASSED                   [15%]
tests/test_stats.py::TestMean::test_mean_with_negatives PASSED             [20%]
tests/test_stats.py::TestMean::test_mean_empty_list_raises PASSED          [25%]
tests/test_stats.py::TestMean::test_mean_with_floats PASSED                [30%]
tests/test_stats.py::TestMean::test_mean_parametrised[numbers0-expected0] PASSED [35%]
... (more tests)
tests/test_stats.py::TestBayesUpdate::test_bayes_update_parametrised[prior2-likelihood2-less] PASSED [100%]

======================== 60 passed in 0.34s ========================
</code></pre></div></div>

<p>Expected output from <code class="language-plaintext highlighter-rouge">pytest --cov=stats --cov-report=term-missing</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name       Stmts   Miss  Cover   Missing
stats.py     145      0   100%
</code></pre></div></div>

<h2 id="congratulations-100-coverage">Congratulations! 100% coverage.</h2>
<h2 id="whats-next">What’s Next</h2>

<p>Day 11 will cover async/await and Python packaging.</p>]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="pytest" /><category term="testing" /><category term="tdd" /><category term="virtual-environments" /><category term="python" /><category term="ml-engineering" /><category term="devops" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 9 of 180 - Probability &amp;amp; Statistics</title><link href="https://edwardpraveen.com/dl-llm-systems/probability-stats-day9/" rel="alternate" type="text/html" title="Day 9 of 180 - Probability &amp;amp; Statistics" /><published>2026-03-28T00:00:00+05:30</published><updated>2026-03-28T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/probability-stats-day9</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/probability-stats-day9/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
  <h2 id="introduction">Introduction</h2>
</blockquote>

<p>Here’s a uncomfortable truth: <strong>you cannot do AI without statistics.</strong></p>

<p>Every machine learning model makes predictions. Every prediction has uncertainty. Every decision about “is this model good?” requires statistical thinking. You’ll hear phrases like “p-value,” “confidence interval,” “null hypothesis” and if you don’t understand them, you’ll make expensive mistakes.</p>

<p>The good news: statistics isn’t magic. It’s applied common sense with numbers. Today, we’re building from the ground up - no assumptions about what you know.</p>

<hr />

<h2 id="setup">Setup</h2>

<p><strong>Installation:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>numpy scipy matplotlib
</code></pre></div></div>

<p><strong>Python setup</strong> (copy this into every script today):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="nb">tuple</span><span class="p">,</span> <span class="nb">dict</span><span class="p">,</span> <span class="nb">list</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span>
    <span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span>
    <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
<span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
</code></pre></div></div>

<p>We’re using <code class="language-plaintext highlighter-rouge">logging</code> instead of <code class="language-plaintext highlighter-rouge">print()</code> (Day 7’s lesson) and type hints on every function (Day 5). No exceptions, no Pydantic, no pytest (that’s tomorrow, Day 10).</p>

<hr />

<h2 id="part-1-probability-basics">Part 1: Probability Basics</h2>

<h3 id="what-is-probability-the-weather-forecast-analogy">What is Probability? (The Weather Forecast Analogy)</h3>

<p>You check your weather app. It says: <strong>“30% chance of rain tomorrow.”</strong></p>

<p>That 30% is a <strong>probability</strong>. It’s a number between 0 and 1 (or 0% and 100%) that tells you how likely something is to happen.</p>

<ul>
  <li><strong>0 = impossible</strong> → You will not randomly turn into a penguin today</li>
  <li><strong>0.5 = equally likely</strong> → Fair coin flip (heads or tails)</li>
  <li><strong>1 = certain</strong> → The Earth is round (pretty sure)</li>
</ul>

<p>In AI, probability answers questions like:</p>
<ul>
  <li>What’s the chance this email is spam?</li>
  <li>How likely is this patient to have diabetes given these symptoms?</li>
  <li>If I deploy this model, what’s the probability of a wrong prediction?</li>
</ul>

<h3 id="key-concept-sample-space">Key Concept: Sample Space</h3>

<p><strong>Sample space</strong> = all possible outcomes.</p>

<ul>
  <li>Coin flip: {heads, tails}</li>
  <li>Die roll: {1, 2, 3, 4, 5, 6}</li>
  <li>Email classification: {spam, not spam}</li>
</ul>

<h3 id="probability-notation-pa">Probability Notation: P(A)</h3>

<p>We write <strong>P(A)</strong> to mean “the probability of event A.”</p>

<p>Example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Fair coin flip
</span><span class="nc">P</span><span class="p">(</span><span class="n">heads</span><span class="p">)</span> <span class="o">=</span> <span class="mf">0.5</span>  <span class="c1"># 50% chance
</span><span class="nc">P</span><span class="p">(</span><span class="n">tails</span><span class="p">)</span> <span class="o">=</span> <span class="mf">0.5</span>  <span class="c1"># 50% chance
</span>
<span class="c1"># Fair die roll
</span><span class="nc">P</span><span class="p">(</span><span class="n">rolling</span> <span class="n">a</span> <span class="mi">6</span><span class="p">)</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="mi">6</span> <span class="err">≈</span> <span class="mf">0.167</span>  <span class="c1"># 16.7% chance
</span></code></pre></div></div>

<h3 id="rule-1-complement---pnot-a--1---pa">Rule 1: Complement - P(not A) = 1 - P(A)</h3>

<p>The probabilities of all outcomes must add up to 1 (certainty).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">complement_rule</span><span class="p">(</span><span class="n">p_a</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">If P(A) is true, then P(not A) = 1 - P(A).</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">p_a</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(heads) = 0.5, P(not heads) = </span><span class="si">{</span><span class="nf">complement_rule</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output: P(not heads) = 0.5
</span></code></pre></div></div>

<h3 id="rule-2-independent-events---pa-and-b--pa--pb">Rule 2: Independent Events - P(A and B) = P(A) × P(B)</h3>

<p>If two events are <strong>independent</strong> (one doesn’t affect the other), multiply their probabilities.</p>

<p>Example: Two coin flips</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_heads_first</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">p_heads_second</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">p_both_heads</span> <span class="o">=</span> <span class="n">p_heads_first</span> <span class="o">*</span> <span class="n">p_heads_second</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(HH) = </span><span class="si">{</span><span class="n">p_both_heads</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output: P(HH) = 0.25
</span></code></pre></div></div>

<p>Another example: Spam detection</p>

<p>Imagine:</p>
<ul>
  <li>P(contains “free”) = 0.8</li>
  <li>P(contains “click now”) = 0.7</li>
  <li>If independent: P(both) = 0.8 × 0.7 = 0.56</li>
</ul>

<h3 id="rule-3-addition---pa-or-b--pa--pb---pa-and-b">Rule 3: Addition - P(A or B) = P(A) + P(B) - P(A and B)</h3>

<p>The <strong>or</strong> rule accounts for overlap (when both can happen).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">addition_rule</span><span class="p">(</span><span class="n">p_a</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">p_b</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">p_both</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">P(A or B) = P(A) + P(B) - P(A and B).</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">p_a</span> <span class="o">+</span> <span class="n">p_b</span> <span class="o">-</span> <span class="n">p_both</span>

<span class="c1"># Die roll: P(rolling 1 or 2)
</span><span class="n">p_1</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="mi">6</span>
<span class="n">p_2</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="mi">6</span>
<span class="n">p_both</span> <span class="o">=</span> <span class="mi">0</span>  <span class="c1"># Can't roll both 1 AND 2 at the same time
</span>
<span class="n">p_1_or_2</span> <span class="o">=</span> <span class="nf">addition_rule</span><span class="p">(</span><span class="n">p_1</span><span class="p">,</span> <span class="n">p_2</span><span class="p">,</span> <span class="n">p_both</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(1 or 2) = </span><span class="si">{</span><span class="n">p_1_or_2</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output: P(1 or 2) = 0.3333
</span></code></pre></div></div>

<h3 id="rule-4-conditional-probability---pa--b">Rule 4: Conditional Probability - P(A | B)</h3>

<table>
  <tbody>
    <tr>
      <td>**P(A</td>
      <td>B)** means “probability of A <strong>given</strong> B happened.”</td>
    </tr>
  </tbody>
</table>

<p>The vertical bar <code class="language-plaintext highlighter-rouge">|</code> is read as “given.”</p>

<p>Example: Weather and clouds</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P(rain | clouds) = "probability of rain given we see clouds"
</code></pre></div></div>

<p><strong>Formula:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P(A | B) = P(A and B) / P(B)
</code></pre></div></div>

<p>The denominator reduces the sample space to only cases where B happened.</p>

<p><strong>Worked example:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">conditional_probability</span><span class="p">(</span><span class="n">p_a_and_b</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">p_b</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">P(A | B) = P(A and B) / P(B).</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="n">p_b</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">P(B) cannot be zero</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>
    <span class="k">return</span> <span class="n">p_a_and_b</span> <span class="o">/</span> <span class="n">p_b</span>

<span class="c1"># Medical test scenario
# P(positive test AND has disease) = 0.0099
# P(positive test) = 0.0101
</span><span class="n">p_disease_given_positive</span> <span class="o">=</span> <span class="nf">conditional_probability</span><span class="p">(</span><span class="mf">0.0099</span><span class="p">,</span> <span class="mf">0.0101</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(disease | positive test) = </span><span class="si">{</span><span class="n">p_disease_given_positive</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output: P(disease | positive test) ≈ 0.98 (98%)
</span></code></pre></div></div>

<h3 id="simulating-coin-flips-and-dice-rolls">Simulating Coin Flips and Dice Rolls</h3>

<p>Let’s use Python’s <code class="language-plaintext highlighter-rouge">random</code> module to simulate probability:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">random</span>

<span class="k">def</span> <span class="nf">simulate_coin_flips</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">int</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Flip a coin n times, count heads and tails.</span><span class="sh">"""</span>
    <span class="n">heads</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="mi">1</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">if</span> <span class="n">random</span><span class="p">.</span><span class="nf">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="mf">0.5</span><span class="p">)</span>
    <span class="n">tails</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="n">heads</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Simulated </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s"> flips: </span><span class="si">{</span><span class="n">heads</span><span class="si">}</span><span class="s"> heads, </span><span class="si">{</span><span class="n">tails</span><span class="si">}</span><span class="s"> tails</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span><span class="sh">'</span><span class="s">heads</span><span class="sh">'</span><span class="p">:</span> <span class="n">heads</span><span class="p">,</span> <span class="sh">'</span><span class="s">tails</span><span class="sh">'</span><span class="p">:</span> <span class="n">tails</span><span class="p">}</span>

<span class="k">def</span> <span class="nf">simulate_die_rolls</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Roll a die n times, count occurrences of each face.</span><span class="sh">"""</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">7</span><span class="p">)}</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
        <span class="n">roll</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
        <span class="n">results</span><span class="p">[</span><span class="n">roll</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Simulated </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s"> die rolls:</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">face</span><span class="p">,</span> <span class="n">count</span> <span class="ow">in</span> <span class="n">results</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  Face </span><span class="si">{</span><span class="n">face</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">count</span><span class="si">}</span><span class="s"> (</span><span class="si">{</span><span class="n">count</span><span class="o">/</span><span class="n">n</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">results</span>

<span class="c1"># Run simulations
</span><span class="nf">simulate_coin_flips</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
<span class="nf">simulate_die_rolls</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Expected output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Simulated 1000 flips: 509 heads, 491 tails
Simulated 1000 die rolls:
  Face 1: 168 (16.8%)
  Face 2: 167 (16.7%)
  Face 3: 172 (17.2%)
  ...
</code></pre></div></div>

<hr />

<h2 id="part-2-distributions">Part 2: Distributions</h2>

<h3 id="what-is-a-distribution">What is a Distribution?</h3>

<p>Imagine rolling a die 1000 times and making a histogram:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Frequency
   |     ___
   |    |   |
   |    |   |
   |    |   |
   |____|___|____
      1 2 3 ... 6
</code></pre></div></div>

<p>Each face appears roughly 1/6 of the time (≈ 167 times). That histogram is a <strong>distribution</strong> - it shows the probability of each outcome.</p>

<h3 id="uniform-distribution-everyone-gets-equal-odds">Uniform Distribution: Everyone Gets Equal Odds</h3>

<p>In a uniform distribution, every outcome is equally likely.</p>

<p><strong>Real-world examples:</strong></p>
<ul>
  <li>Die roll (each face: 1/6)</li>
  <li>Picking a random number between 0 and 100</li>
  <li>Shuffled deck of cards (each card equally likely)</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="k">def</span> <span class="nf">plot_uniform_distribution</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Visualize uniform distribution.</span><span class="sh">"""</span>
    <span class="c1"># Generate 10,000 random samples between 0 and 100
</span>    <span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">uniform</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Value</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">'</span><span class="s">Uniform Distribution: 10,000 Samples from [0, 100]</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">uniform_distribution.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved uniform distribution plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="nf">plot_uniform_distribution</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="normal-gaussian-distribution-the-bell-curve">Normal (Gaussian) Distribution: The Bell Curve</h3>

<p>The <strong>normal distribution</strong> is the most important distribution in statistics. It looks like a bell.</p>

<p><strong>Real-world examples:</strong></p>
<ul>
  <li>Student exam scores</li>
  <li>Adult heights</li>
  <li>IQ scores</li>
  <li>Measurement errors</li>
  <li>Stock price returns (approximately)</li>
</ul>

<p><strong>Key parameters:</strong></p>
<ul>
  <li><strong>μ (mu)</strong> = mean (center of bell)</li>
  <li><strong>σ (sigma)</strong> = standard deviation (width of bell)</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot_normal_distribution</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Visualize normal distribution.</span><span class="sh">"""</span>
    <span class="c1"># Generate 10,000 samples from a normal distribution
</span>    <span class="c1"># mean=100 (like IQ), std=15
</span>    <span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

    <span class="c1"># Overlay theoretical normal curve
</span>    <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">linspace</span><span class="p">(</span><span class="n">samples</span><span class="p">.</span><span class="nf">min</span><span class="p">(),</span> <span class="n">samples</span><span class="p">.</span><span class="nf">max</span><span class="p">(),</span> <span class="mi">100</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="nf">pdf</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">15</span><span class="p">),</span> <span class="sh">'</span><span class="s">r-</span><span class="sh">'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">Theory</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Value</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Density</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">'</span><span class="s">Normal Distribution: μ=100, σ=15 (like IQ scores)</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">legend</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">normal_distribution.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved normal distribution plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="nf">plot_normal_distribution</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="the-68-95-997-rule-knowing-whats-normal">The 68-95-99.7 Rule: Knowing What’s Normal</h3>

<p>In a normal distribution:</p>

<ul>
  <li><strong>68%</strong> of data falls within 1 std of the mean</li>
  <li><strong>95%</strong> falls within 2 stds</li>
  <li><strong>99.7%</strong> falls within 3 stds</li>
</ul>

<p>Example: IQ scores (mean=100, std=15)</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">demonstrate_68_95_99_7_rule</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Explain the empirical rule.</span><span class="sh">"""</span>
    <span class="n">mean</span><span class="p">,</span> <span class="n">std</span> <span class="o">=</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">15</span>

    <span class="c1"># 1 std: [85, 115]
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">68% of IQ scores between </span><span class="si">{</span><span class="n">mean</span><span class="o">-</span><span class="n">std</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">mean</span><span class="o">+</span><span class="n">std</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># 2 stds: [70, 130]
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">95% of IQ scores between </span><span class="si">{</span><span class="n">mean</span><span class="o">-</span><span class="mi">2</span><span class="o">*</span><span class="n">std</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">mean</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">std</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># 3 stds: [55, 145]
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">99.7% of IQ scores between </span><span class="si">{</span><span class="n">mean</span><span class="o">-</span><span class="mi">3</span><span class="o">*</span><span class="n">std</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">mean</span><span class="o">+</span><span class="mi">3</span><span class="o">*</span><span class="n">std</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Verify with simulation
</span>    <span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">mean</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">std</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">100000</span><span class="p">)</span>
    <span class="n">within_1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">sum</span><span class="p">((</span><span class="n">samples</span> <span class="o">&gt;=</span> <span class="n">mean</span> <span class="o">-</span> <span class="n">std</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">samples</span> <span class="o">&lt;=</span> <span class="n">mean</span> <span class="o">+</span> <span class="n">std</span><span class="p">))</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>
    <span class="n">within_2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">sum</span><span class="p">((</span><span class="n">samples</span> <span class="o">&gt;=</span> <span class="n">mean</span> <span class="o">-</span> <span class="mi">2</span><span class="o">*</span><span class="n">std</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">samples</span> <span class="o">&lt;=</span> <span class="n">mean</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">std</span><span class="p">))</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>
    <span class="n">within_3</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">sum</span><span class="p">((</span><span class="n">samples</span> <span class="o">&gt;=</span> <span class="n">mean</span> <span class="o">-</span> <span class="mi">3</span><span class="o">*</span><span class="n">std</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">samples</span> <span class="o">&lt;=</span> <span class="n">mean</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="n">std</span><span class="p">))</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Simulated: </span><span class="si">{</span><span class="n">within_1</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> within 1 std (expected 68%)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Simulated: </span><span class="si">{</span><span class="n">within_2</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> within 2 stds (expected 95%)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Simulated: </span><span class="si">{</span><span class="n">within_3</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> within 3 stds (expected 99.7%)</span><span class="sh">"</span><span class="p">)</span>

<span class="nf">demonstrate_68_95_99_7_rule</span><span class="p">()</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Simulated: 68.1% within 1 std (expected 68%)
Simulated: 95.2% within 2 stds (expected 95%)
Simulated: 99.7% within 3 stds (expected 99.7%)
</code></pre></div></div>

<h3 id="binomial-distribution-counting-successes">Binomial Distribution: Counting Successes</h3>

<p>The <strong>binomial distribution</strong> answers: “If I do N independent trials, each with success probability p, how many successes do I get?”</p>

<p><strong>Real-world examples:</strong></p>
<ul>
  <li>Flipping a coin 100 times, counting heads</li>
  <li>Shooting 10 basketball free throws, counting makes</li>
  <li>Spam detector running on 1000 emails, counting false positives</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot_binomial_distribution</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Visualize binomial distribution.</span><span class="sh">"""</span>
    <span class="c1"># Flip a fair coin 10 times, repeat 10,000 times
</span>    <span class="c1"># Count heads each time
</span>    <span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">binomial</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">align</span><span class="o">=</span><span class="sh">'</span><span class="s">left</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Number of Heads (out of 10 flips)</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">'</span><span class="s">Binomial Distribution: n=10 flips, p=0.5</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">binomial_distribution.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved binomial distribution plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="nf">plot_binomial_distribution</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="central-limit-theorem-the-magic-of-averages">Central Limit Theorem: The Magic of Averages</h3>

<p>Here’s one of the most important theorems in statistics:</p>

<p><strong>If you take samples from ANY distribution and average them, those averages follow a normal distribution.</strong></p>

<p>This is huge because:</p>
<ol>
  <li>We don’t need to know the original distribution</li>
  <li>Normal distributions are easy to work with</li>
  <li>This is why normal distributions are everywhere</li>
</ol>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">demonstrate_central_limit_theorem</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Show that sample means become normal.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Demonstrating Central Limit Theorem...</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Original distribution: uniform (NOT normal)
</span>    <span class="n">original</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">uniform</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">100000</span><span class="p">)</span>

    <span class="c1"># Take 1000 samples of size 50, compute mean of each
</span>    <span class="n">sample_means</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span>
        <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">choice</span><span class="p">(</span><span class="n">original</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="n">sample_means</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>

    <span class="n">sample_means</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">sample_means</span><span class="p">)</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>

    <span class="c1"># Plot 1: Original distribution (uniform)
</span>    <span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">hist</span><span class="p">(</span><span class="n">original</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Original Distribution: Uniform [0, 100]</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Value</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>

    <span class="c1"># Plot 2: Distribution of sample means (normal!)
</span>    <span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">hist</span><span class="p">(</span><span class="n">sample_means</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Distribution of Sample Means (1000 trials, n=50)</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Mean Value</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">clt_demonstration.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved CLT demonstration plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Original distribution: uniform</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Sample means: μ=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">sample_means</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, σ=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">sample_means</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Sample means distribution looks normal!</span><span class="sh">"</span><span class="p">)</span>

<span class="nf">demonstrate_central_limit_theorem</span><span class="p">()</span>
</code></pre></div></div>

<hr />

<h2 id="part-3-descriptive-statistics">Part 3: Descriptive Statistics</h2>

<h3 id="mean-the-average">Mean: The Average</h3>

<p>The <strong>mean</strong> is the sum of all values divided by the count.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_mean</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute mean from scratch.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot compute mean of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>
    <span class="k">return</span> <span class="nf">sum</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

<span class="c1"># Example: exam scores
</span><span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">]</span>
<span class="n">mean</span> <span class="o">=</span> <span class="nf">compute_mean</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Mean score: </span><span class="si">{</span><span class="n">mean</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Verify with NumPy
</span><span class="n">mean_numpy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">NumPy mean: </span><span class="si">{</span><span class="n">mean_numpy</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>When to use mean:</strong></p>
<ul>
  <li>Symmetric data (normal distribution)</li>
  <li>No extreme outliers</li>
  <li>Example: student exam scores (usually)</li>
</ul>

<p><strong>When NOT to use:</strong></p>
<ul>
  <li>Skewed data (income, housing prices)</li>
  <li>Outliers (average salary with one billionaire)</li>
</ul>

<h3 id="median-the-middle-value">Median: The Middle Value</h3>

<p>The <strong>median</strong> is the middle value when sorted.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_median</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute median from scratch.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot compute median of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>

    <span class="n">sorted_data</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">n</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">sorted_data</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="c1"># Odd number of elements
</span>        <span class="k">return</span> <span class="nf">float</span><span class="p">(</span><span class="n">sorted_data</span><span class="p">[</span><span class="n">n</span> <span class="o">//</span> <span class="mi">2</span><span class="p">])</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="c1"># Even number of elements: average the two middle
</span>        <span class="n">mid1</span> <span class="o">=</span> <span class="n">sorted_data</span><span class="p">[</span><span class="n">n</span> <span class="o">//</span> <span class="mi">2</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
        <span class="n">mid2</span> <span class="o">=</span> <span class="n">sorted_data</span><span class="p">[</span><span class="n">n</span> <span class="o">//</span> <span class="mi">2</span><span class="p">]</span>
        <span class="nf">return </span><span class="p">(</span><span class="n">mid1</span> <span class="o">+</span> <span class="n">mid2</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>

<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">]</span>
<span class="n">median</span> <span class="o">=</span> <span class="nf">compute_median</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Median score: </span><span class="si">{</span><span class="n">median</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Verify with NumPy
</span><span class="n">median_numpy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">median</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">NumPy median: </span><span class="si">{</span><span class="n">median_numpy</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>When to use median:</strong></p>
<ul>
  <li>Skewed data (income distribution)</li>
  <li>Outliers present</li>
  <li>Example: housing prices (median is more representative than mean)</li>
</ul>

<h3 id="mode-most-frequent-value">Mode: Most Frequent Value</h3>

<p>The <strong>mode</strong> is the value that appears most often.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_mode</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Find the most frequent value.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Cannot compute mode of empty list</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>

    <span class="kn">from</span> <span class="n">collections</span> <span class="kn">import</span> <span class="n">Counter</span>
    <span class="n">counts</span> <span class="o">=</span> <span class="nc">Counter</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">most_common</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">counts</span><span class="p">.</span><span class="nf">most_common</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">return</span> <span class="nf">float</span><span class="p">(</span><span class="n">most_common</span><span class="p">)</span>

<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">,</span> <span class="mi">88</span><span class="p">]</span>  <span class="c1"># 88 appears twice
</span><span class="n">mode</span> <span class="o">=</span> <span class="nf">compute_mode</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Mode score: </span><span class="si">{</span><span class="n">mode</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="variance-how-spread-out-is-the-data">Variance: How Spread Out Is the Data?</h3>

<p><strong>Variance</strong> is the average squared distance from the mean.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>σ² = (Σ(x - μ)²) / n
</code></pre></div></div>

<p>Why squared? So negative differences don’t cancel out positive ones.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_variance</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute variance from scratch.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Need at least 2 data points</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>

    <span class="n">mean</span> <span class="o">=</span> <span class="nf">compute_mean</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">squared_diffs</span> <span class="o">=</span> <span class="p">[(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
    <span class="n">variance</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">squared_diffs</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>  <span class="c1"># -1 for sample variance
</span>    <span class="k">return</span> <span class="n">variance</span>

<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">]</span>
<span class="n">variance</span> <span class="o">=</span> <span class="nf">compute_variance</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Variance: </span><span class="si">{</span><span class="n">variance</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Verify with NumPy
</span><span class="n">variance_numpy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">var</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">ddof</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">NumPy variance: </span><span class="si">{</span><span class="n">variance_numpy</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Problem:</strong> Variance is in squared units. Hard to interpret.</p>

<h3 id="standard-deviation-variances-friendly-cousin">Standard Deviation: Variance’s Friendly Cousin</h3>

<p><strong>Standard deviation</strong> is the square root of variance.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>σ = √(σ²)
</code></pre></div></div>

<p><strong>Advantage:</strong> Same units as the original data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_std</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute standard deviation from scratch.</span><span class="sh">"""</span>
    <span class="n">variance</span> <span class="o">=</span> <span class="nf">compute_variance</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">variance</span> <span class="o">**</span> <span class="mf">0.5</span>

<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">]</span>
<span class="n">std</span> <span class="o">=</span> <span class="nf">compute_std</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Std dev: </span><span class="si">{</span><span class="n">std</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Verify with NumPy
</span><span class="n">std_numpy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">ddof</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">NumPy std: </span><span class="si">{</span><span class="n">std_numpy</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Interpretation:</strong> If mean=88 and std=6, most students scored between 82–94.</p>

<h3 id="percentiles-and-quartiles-dividing-the-data">Percentiles and Quartiles: Dividing the Data</h3>

<p><strong>Percentile:</strong> Value below which a certain percentage of data falls.</p>

<ul>
  <li>25th percentile (Q1): 25% of data below this</li>
  <li>50th percentile (median): 50% below</li>
  <li>75th percentile (Q3): 75% below</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_percentiles</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Compute quartiles and common percentiles.</span><span class="sh">"""</span>
    <span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">min</span><span class="p">(</span><span class="n">arr</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">q25</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">percentile</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="mi">25</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">median</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">median</span><span class="p">(</span><span class="n">arr</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">q75</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">percentile</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="mi">75</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">max</span><span class="p">(</span><span class="n">arr</span><span class="p">)),</span>
    <span class="p">}</span>

<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">87</span><span class="p">]</span>
<span class="n">percentiles</span> <span class="o">=</span> <span class="nf">compute_percentiles</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Percentile breakdown:</span><span class="sh">"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">percentiles</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="part-4-bayes-theorem">Part 4: Bayes’ Theorem</h2>

<h3 id="the-problem-updating-beliefs-with-evidence">The Problem: Updating Beliefs with Evidence</h3>

<p>You just tested positive for a rare disease.</p>

<p>Naturally, you panic. The test is 99% accurate, so you must be 99% likely to have the disease, right?</p>

<p><strong>Wrong.</strong> This is where Bayes’ theorem saves the day.</p>

<h3 id="bayes-theorem-formula">Bayes’ Theorem Formula</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P(A | B) = P(B | A) × P(A) / P(B)
</code></pre></div></div>

<p>Where:</p>
<ul>
  <li>
    <table>
      <tbody>
        <tr>
          <td>**P(A</td>
          <td>B)** = Posterior (what we want: probability of disease given positive test?)</td>
        </tr>
      </tbody>
    </table>
  </li>
  <li>
    <table>
      <tbody>
        <tr>
          <td>**P(B</td>
          <td>A)** = Likelihood (test accuracy: if you have disease, what’s probability of positive test?)</td>
        </tr>
      </tbody>
    </table>
  </li>
  <li><strong>P(A)</strong> = Prior (base rate: how common is the disease?)</li>
  <li><strong>P(B)</strong> = Total probability of observing B</li>
</ul>

<h3 id="medical-test-example-when-a-positive-test-isnt-good-news">Medical Test Example: When a Positive Test Isn’t Good News</h3>

<p><strong>Setup:</strong></p>
<ul>
  <li>Disease affects 1 in 10,000 people</li>
  <li>Test accuracy: 99% (both sensitivity and specificity)</li>
</ul>

<p><strong>Step 1: Prior probability (base rate)</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_disease</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="mi">10000</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(disease) = </span><span class="si">{</span><span class="n">p_disease</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># 0.0001
</span></code></pre></div></div>

<p><strong>Step 2: Likelihood (test accuracy)</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_positive_if_disease</span> <span class="o">=</span> <span class="mf">0.99</span>  <span class="c1"># Test catches 99% of true positives
</span><span class="n">p_positive_if_no_disease</span> <span class="o">=</span> <span class="mf">0.01</span>  <span class="c1"># Test gives 1% false positives
</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(positive | disease) = </span><span class="si">{</span><span class="n">p_positive_if_disease</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(positive | no disease) = </span><span class="si">{</span><span class="n">p_positive_if_no_disease</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Step 3: Total probability P(B)</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P(positive) = P(positive | disease) × P(disease) + P(positive | no disease) × P(no disease)
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_no_disease</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">p_disease</span>
<span class="n">p_positive</span> <span class="o">=</span> <span class="p">(</span><span class="n">p_positive_if_disease</span> <span class="o">*</span> <span class="n">p_disease</span><span class="p">)</span> <span class="o">+</span> \
             <span class="p">(</span><span class="n">p_positive_if_no_disease</span> <span class="o">*</span> <span class="n">p_no_disease</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(positive) = </span><span class="si">{</span><span class="n">p_positive</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Step 4: Bayes’ theorem</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">bayes_theorem</span><span class="p">(</span><span class="n">likelihood</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">prior</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">evidence</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute posterior using Bayes</span><span class="sh">'</span><span class="s"> theorem.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="n">evidence</span> <span class="o">&lt;=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Evidence must be positive</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="mf">0.0</span>
    <span class="nf">return </span><span class="p">(</span><span class="n">likelihood</span> <span class="o">*</span> <span class="n">prior</span><span class="p">)</span> <span class="o">/</span> <span class="n">evidence</span>

<span class="n">p_disease_given_positive</span> <span class="o">=</span> <span class="nf">bayes_theorem</span><span class="p">(</span>
    <span class="n">likelihood</span><span class="o">=</span><span class="n">p_positive_if_disease</span><span class="p">,</span>
    <span class="n">prior</span><span class="o">=</span><span class="n">p_disease</span><span class="p">,</span>
    <span class="n">evidence</span><span class="o">=</span><span class="n">p_positive</span>
<span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(disease | positive test) = </span><span class="si">{</span><span class="n">p_disease_given_positive</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">You are only </span><span class="si">{</span><span class="n">p_disease_given_positive</span> <span class="o">*</span> <span class="mi">100</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">% likely to have the disease!</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P(disease | positive test) = 0.0099
You are only 0.99% likely to have the disease!
</code></pre></div></div>

<p><strong>Wow!</strong> Even with a positive test from a 99% accurate test, you’re less than 1% likely to have the disease. Why?</p>

<p>Because the disease is so rare. False positives (1% of healthy people) outnumber true positives (99% of the tiny number who have it).</p>

<h3 id="spam-email-classifier-pure-probability">Spam Email Classifier: Pure Probability</h3>

<p>Let’s use Bayes’ theorem to classify emails as spam or not spam, without any ML library.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SimpleSpamClassifier</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Spam detector using Bayes</span><span class="sh">'</span><span class="s"> theorem.</span><span class="sh">"""</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">prior_spam</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Initialize with prior probability of spam.</span><span class="sh">"""</span>
        <span class="n">self</span><span class="p">.</span><span class="n">prior_spam</span> <span class="o">=</span> <span class="n">prior_spam</span>
        <span class="n">self</span><span class="p">.</span><span class="n">prior_ham</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">prior_spam</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Initialized: P(spam)=</span><span class="si">{</span><span class="n">prior_spam</span><span class="si">}</span><span class="s">, P(ham)=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">prior_ham</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">classify</span><span class="p">(</span><span class="n">self</span><span class="p">,</span>
                 <span class="n">p_word_in_spam</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
                 <span class="n">p_word_in_ham</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
        <span class="sh">"""</span><span class="s">Classify based on presence of a keyword.

        Returns: (p_spam, p_ham) posteriors
        </span><span class="sh">"""</span>
        <span class="c1"># P(word | spam)
</span>        <span class="c1"># P(word | ham)
</span>        <span class="c1"># P(spam | word) = ?
</span>
        <span class="n">p_word</span> <span class="o">=</span> <span class="p">(</span><span class="n">p_word_in_spam</span> <span class="o">*</span> <span class="n">self</span><span class="p">.</span><span class="n">prior_spam</span><span class="p">)</span> <span class="o">+</span> \
                 <span class="p">(</span><span class="n">p_word_in_ham</span> <span class="o">*</span> <span class="n">self</span><span class="p">.</span><span class="n">prior_ham</span><span class="p">)</span>

        <span class="n">p_spam_given_word</span> <span class="o">=</span> <span class="p">(</span><span class="n">p_word_in_spam</span> <span class="o">*</span> <span class="n">self</span><span class="p">.</span><span class="n">prior_spam</span><span class="p">)</span> <span class="o">/</span> <span class="n">p_word</span>
        <span class="n">p_ham_given_word</span> <span class="o">=</span> <span class="p">(</span><span class="n">p_word_in_ham</span> <span class="o">*</span> <span class="n">self</span><span class="p">.</span><span class="n">prior_ham</span><span class="p">)</span> <span class="o">/</span> <span class="n">p_word</span>

        <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">P(spam | word) = </span><span class="si">{</span><span class="n">p_spam_given_word</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">p_spam_given_word</span><span class="p">,</span> <span class="n">p_ham_given_word</span>

<span class="c1"># Create classifier
# Assume 20% of emails are spam
</span><span class="n">classifier</span> <span class="o">=</span> <span class="nc">SimpleSpamClassifier</span><span class="p">(</span><span class="n">prior_spam</span><span class="o">=</span><span class="mf">0.20</span><span class="p">)</span>

<span class="c1"># Word: "free"
# 80% of spam emails contain "free"
# 10% of legitimate emails contain "free"
</span><span class="n">p_spam</span><span class="p">,</span> <span class="n">p_ham</span> <span class="o">=</span> <span class="n">classifier</span><span class="p">.</span><span class="nf">classify</span><span class="p">(</span>
    <span class="n">p_word_in_spam</span><span class="o">=</span><span class="mf">0.80</span><span class="p">,</span>
    <span class="n">p_word_in_ham</span><span class="o">=</span><span class="mf">0.10</span>
<span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Email contains </span><span class="sh">'</span><span class="s">free</span><span class="sh">'</span><span class="s">: </span><span class="si">{</span><span class="n">p_spam</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> spam, </span><span class="si">{</span><span class="n">p_ham</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> ham</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Word: "unsubscribe"
# 5% of spam emails contain "unsubscribe" (to avoid filters)
# 30% of legitimate emails contain "unsubscribe" (newsletters)
</span><span class="n">p_spam</span><span class="p">,</span> <span class="n">p_ham</span> <span class="o">=</span> <span class="n">classifier</span><span class="p">.</span><span class="nf">classify</span><span class="p">(</span>
    <span class="n">p_word_in_spam</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span>
    <span class="n">p_word_in_ham</span><span class="o">=</span><span class="mf">0.30</span>
<span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Email contains </span><span class="sh">'</span><span class="s">unsubscribe</span><span class="sh">'</span><span class="s">: </span><span class="si">{</span><span class="n">p_spam</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> spam, </span><span class="si">{</span><span class="n">p_ham</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s"> ham</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Initialized: P(spam)=0.20, P(ham)=0.80
Email contains 'free': 64.0% spam, 36.0% ham
Email contains 'unsubscribe': 4.8% spam, 95.2% ham
</code></pre></div></div>

<p>See how “free” strongly suggests spam, while “unsubscribe” suggests legitimacy? That’s Bayes’ theorem at work.</p>

<hr />

<h2 id="part-5-hypothesis-testing">Part 5: Hypothesis Testing</h2>

<h3 id="the-core-question-is-this-real-or-luck">The Core Question: Is This Real or Luck?</h3>

<p>A pharmaceutical company tests a new drug on 100 patients:</p>
<ul>
  <li>Control group: mean recovery time 10 days</li>
  <li>Treatment group: mean recovery time 5 days</li>
</ul>

<p>Difference: 5 days faster.</p>

<p><strong>But is the drug actually working, or did we just get lucky?</strong></p>

<p>Hypothesis testing answers this question statistically.</p>

<h3 id="setting-up-hypotheses">Setting Up Hypotheses</h3>

<p><strong>Null Hypothesis (H₀):</strong> No effect exists</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H₀: Drug has no effect on recovery time
</code></pre></div></div>

<p><strong>Alternative Hypothesis (H₁):</strong> Effect exists</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>H₁: Drug does affect recovery time
</code></pre></div></div>

<p>The null hypothesis is the “boring” one. We try to disprove it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">H₀: Drug has no effect</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">H₁: Drug does have an effect</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="p-value-the-key-number">p-value: The Key Number</h3>

<p><strong>p-value</strong> = “If H₀ were true (drug does nothing), how often would we observe a difference this extreme or larger, by pure chance?”</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Example: we observe a 5-day difference
# p-value = 0.03
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">p-value = 0.03</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Interpretation: 3% chance of seeing this data if the drug is useless</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Critical misconceptions:</strong></p>

<p>❌ WRONG: “p-value = 0.03 means 97% chance the drug works”
✅ RIGHT: “p-value = 0.03 means 3% chance we’d see this data if the drug doesn’t work”</p>

<h3 id="significance-level-α-the-threshold">Significance Level (α): The Threshold</h3>

<p>We choose a significance level, typically <strong>α = 0.05</strong>.</p>

<p><strong>Decision rule:</strong></p>
<ul>
  <li>If p-value &lt; 0.05: reject H₀ (claim the drug works)</li>
  <li>If p-value ≥ 0.05: fail to reject H₀ (can’t claim it works)</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_value</span> <span class="o">=</span> <span class="mf">0.03</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.05</span>

<span class="k">if</span> <span class="n">p_value</span> <span class="o">&lt;</span> <span class="n">alpha</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">✓ Reject H₀: The effect is statistically significant</span><span class="sh">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">✗ Fail to reject H₀: No significant effect detected</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="type-i-and-type-ii-errors">Type I and Type II Errors</h3>

<p>Every hypothesis test can make two kinds of mistakes:</p>

<p><strong>Type I Error (False Positive):</strong></p>
<ul>
  <li>We reject H₀, but H₀ is actually true</li>
  <li>We claim the drug works, but it doesn’t</li>
  <li>Probability: α (usually 0.05)</li>
</ul>

<p><strong>Type II Error (False Negative):</strong></p>
<ul>
  <li>We fail to reject H₀, but H₀ is actually false</li>
  <li>We claim the drug doesn’t work, but it does</li>
  <li>Probability: β (usually 0.20)</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Type I Error (False Positive):</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  Claim drug works when it doesn</span><span class="sh">'</span><span class="s">t</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  Probability: α = 0.05</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Type II Error (False Negative):</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  Claim drug doesn</span><span class="sh">'</span><span class="s">t work when it does</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  Probability: β ≈ 0.20 (related to statistical power)</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="t-test-comparing-two-group-means">t-test: Comparing Two Group Means</h3>

<p>The <strong>t-test</strong> compares means of two groups.</p>

<p><strong>Question:</strong> Are the treatment and control groups significantly different?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">scipy.stats</span> <span class="kn">import</span> <span class="n">ttest_ind</span>

<span class="c1"># Generate synthetic data
</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
<span class="n">control</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>  <span class="c1"># mean=10 days
</span><span class="n">treatment</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>  <span class="c1"># mean=5 days
</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Control group: mean=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">control</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, std=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">control</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Treatment group: mean=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">treatment</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, std=</span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">treatment</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Run t-test
</span><span class="n">t_statistic</span><span class="p">,</span> <span class="n">p_value</span> <span class="o">=</span> <span class="nf">ttest_ind</span><span class="p">(</span><span class="n">control</span><span class="p">,</span> <span class="n">treatment</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">t-statistic: </span><span class="si">{</span><span class="n">t_statistic</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">p-value: </span><span class="si">{</span><span class="n">p_value</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Decision
</span><span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.05</span>
<span class="k">if</span> <span class="n">p_value</span> <span class="o">&lt;</span> <span class="n">alpha</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Reject H₀ (p=</span><span class="si">{</span><span class="n">p_value</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> &lt; α=</span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  The groups are significantly different</span><span class="sh">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✗ Fail to reject H₀ (p=</span><span class="si">{</span><span class="n">p_value</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> ≥ α=</span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  No significant difference detected</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Control group: mean=10.12, std=2.05
Treatment group: mean=5.08, std=1.87
t-statistic: 12.7832
p-value: 0.000000
✓ Reject H₀ (p=0.000000 &lt; α=0.05)
  The groups are significantly different
</code></pre></div></div>

<hr />

<h2 id="part-6-confidence-intervals">Part 6: Confidence Intervals</h2>

<h3 id="what-is-a-95-confidence-interval">What is a 95% Confidence Interval?</h3>

<p><strong>Common misconception:</strong>
❌ “There’s a 95% probability the true mean is in this interval”</p>

<p><strong>Correct interpretation:</strong>
✅ “If we repeated this experiment 100 times, 95 of those experiments would produce intervals containing the true mean”</p>

<p>The interval is random (depends on our sample). The true mean is fixed.</p>

<h3 id="computing-a-confidence-interval">Computing a Confidence Interval</h3>

<p><strong>Formula:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CI = mean ± (critical value × standard error)
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">compute_confidence_interval</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
                                 <span class="n">confidence</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.95</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Compute confidence interval for the mean.</span><span class="sh">"""</span>
    <span class="n">n</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

    <span class="c1"># Standard error of the mean
</span>    <span class="n">sem</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="nf">sem</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>  <span class="c1"># = std / sqrt(n)
</span>
    <span class="c1"># t-critical value (use t-distribution because sample is small-ish)
</span>    <span class="n">df</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="n">alpha</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">confidence</span>
    <span class="n">t_crit</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="nf">ppf</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">alpha</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">df</span><span class="p">)</span>

    <span class="c1"># Margin of error
</span>    <span class="n">margin</span> <span class="o">=</span> <span class="n">t_crit</span> <span class="o">*</span> <span class="n">sem</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">n=</span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s">, mean=</span><span class="si">{</span><span class="n">mean</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, sem=</span><span class="si">{</span><span class="n">sem</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">, margin=</span><span class="si">{</span><span class="n">margin</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="nf">return </span><span class="p">(</span><span class="n">mean</span> <span class="o">-</span> <span class="n">margin</span><span class="p">,</span> <span class="n">mean</span> <span class="o">+</span> <span class="n">margin</span><span class="p">)</span>

<span class="c1"># Example: exam scores
</span><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">81</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">87</span><span class="p">,</span> <span class="mi">83</span><span class="p">,</span> <span class="mi">89</span><span class="p">])</span>

<span class="n">lower</span><span class="p">,</span> <span class="n">upper</span> <span class="o">=</span> <span class="nf">compute_confidence_interval</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Sample mean: </span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">95% Confidence Interval: [</span><span class="si">{</span><span class="n">lower</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">upper</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">]</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Interpretation: We</span><span class="sh">'</span><span class="s">re 95% confident the true population mean</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  is between </span><span class="si">{</span><span class="n">lower</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">upper</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sample mean: 86.80
95% Confidence Interval: [82.47, 91.13]
</code></pre></div></div>

<h3 id="visualizing-confidence-intervals">Visualizing Confidence Intervals</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot_confidence_intervals</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Visualize CIs for two groups.</span><span class="sh">"""</span>
    <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>

    <span class="c1"># Generate data
</span>    <span class="n">group1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">80</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>
    <span class="n">group2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">85</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

    <span class="c1"># Compute CIs
</span>    <span class="n">ci1</span> <span class="o">=</span> <span class="nf">compute_confidence_interval</span><span class="p">(</span><span class="n">group1</span><span class="p">)</span>
    <span class="n">ci2</span> <span class="o">=</span> <span class="nf">compute_confidence_interval</span><span class="p">(</span><span class="n">group2</span><span class="p">)</span>

    <span class="n">mean1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">group1</span><span class="p">)</span>
    <span class="n">mean2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">group2</span><span class="p">)</span>

    <span class="c1"># Plot
</span>    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>

    <span class="n">groups</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Group 1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Group 2</span><span class="sh">'</span><span class="p">]</span>
    <span class="n">means</span> <span class="o">=</span> <span class="p">[</span><span class="n">mean1</span><span class="p">,</span> <span class="n">mean2</span><span class="p">]</span>
    <span class="n">errors</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">[</span><span class="n">mean1</span> <span class="o">-</span> <span class="n">ci1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">mean2</span> <span class="o">-</span> <span class="n">ci2</span><span class="p">[</span><span class="mi">0</span><span class="p">]],</span>
        <span class="p">[</span><span class="n">ci1</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">mean1</span><span class="p">,</span> <span class="n">ci2</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">mean2</span><span class="p">]</span>
    <span class="p">]</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">errorbar</span><span class="p">(</span><span class="n">groups</span><span class="p">,</span> <span class="n">means</span><span class="p">,</span> <span class="n">yerr</span><span class="o">=</span><span class="n">errors</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="sh">'</span><span class="s">o</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
                <span class="n">capsize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">capthick</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Mean Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">95% Confidence Intervals for Mean Scores</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="sh">'</span><span class="s">y</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">confidence_intervals.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved confidence interval plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="nf">plot_confidence_intervals</span><span class="p">()</span>
</code></pre></div></div>

<hr />

<h2 id="the-project-student-score-statistical-analysis">The Project: Student Score Statistical Analysis</h2>

<p>Let’s bring it all together. We’ll:</p>
<ol>
  <li>Generate synthetic exam score data</li>
  <li>Compute descriptive statistics</li>
  <li>Plot distributions</li>
  <li>Run a t-test comparing two groups</li>
  <li>Compute confidence intervals</li>
</ol>

<p><strong>Complete script:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="nb">dict</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">,</span> <span class="nb">list</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span>
    <span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span>
    <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
<span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="c1"># ============================================================================
# DATA GENERATION
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">generate_exam_scores</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Generate synthetic exam scores for study vs no-study groups.</span><span class="sh">"""</span>
    <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>

    <span class="c1"># Students who studied: higher mean (78), lower variance
</span>    <span class="n">study_group</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">78</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

    <span class="c1"># Students who didn't study: lower mean (68), higher variance
</span>    <span class="n">no_study_group</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">68</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Generated </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">study_group</span><span class="p">)</span><span class="si">}</span><span class="s"> study group scores</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Generated </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">no_study_group</span><span class="p">)</span><span class="si">}</span><span class="s"> no-study group scores</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">study_group</span><span class="p">,</span> <span class="n">no_study_group</span>

<span class="c1"># ============================================================================
# DESCRIPTIVE STATISTICS
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">compute_stats</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Compute 7-point summary statistics.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Computing stats for </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="si">}</span><span class="s"> values</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">return</span> <span class="p">{</span>
        <span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">median</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">median</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">std</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">ddof</span><span class="o">=</span><span class="mi">1</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">min</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">max</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">q25</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">percentile</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">25</span><span class="p">)),</span>
        <span class="sh">'</span><span class="s">q75</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">percentile</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">75</span><span class="p">)),</span>
    <span class="p">}</span>

<span class="k">def</span> <span class="nf">log_stats</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">stats_dict</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Log statistics nicely.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> statistics:</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">stats_dict</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="n">key</span><span class="si">:</span><span class="mi">8</span><span class="n">s</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">:</span><span class="mf">7.2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># ============================================================================
# HYPOTHESIS TESTING: t-test
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">run_t_test</span><span class="p">(</span><span class="n">group1</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">group2</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Run independent samples t-test.</span><span class="sh">"""</span>
    <span class="n">t_stat</span><span class="p">,</span> <span class="n">p_value</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="nf">ttest_ind</span><span class="p">(</span><span class="n">group1</span><span class="p">,</span> <span class="n">group2</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">t-test results:</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  t-statistic: </span><span class="si">{</span><span class="n">t_stat</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  p-value:     </span><span class="si">{</span><span class="n">p_value</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">p_value</span> <span class="o">&lt;</span> <span class="mf">0.05</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  ✓ Statistically significant difference (p &lt; 0.05)</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">  ✗ No statistically significant difference (p &gt;= 0.05)</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">t_stat</span><span class="p">,</span> <span class="n">p_value</span>

<span class="c1"># ============================================================================
# CONFIDENCE INTERVALS
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">compute_ci</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">confidence</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.95</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Compute confidence interval for the mean.</span><span class="sh">"""</span>
    <span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">sem</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="nf">sem</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>

    <span class="n">alpha</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">confidence</span>
    <span class="n">t_crit</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="nf">ppf</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">alpha</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">df</span><span class="p">)</span>
    <span class="n">margin</span> <span class="o">=</span> <span class="n">t_crit</span> <span class="o">*</span> <span class="n">sem</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">CI: mean=</span><span class="si">{</span><span class="n">mean</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, sem=</span><span class="si">{</span><span class="n">sem</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">, margin=</span><span class="si">{</span><span class="n">margin</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="nf">return </span><span class="p">(</span><span class="n">mean</span> <span class="o">-</span> <span class="n">margin</span><span class="p">,</span> <span class="n">mean</span> <span class="o">+</span> <span class="n">margin</span><span class="p">)</span>

<span class="c1"># ============================================================================
# VISUALIZATION
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">plot_score_distributions</span><span class="p">(</span><span class="n">study</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">no_study</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Plot overlaid histograms of both groups.</span><span class="sh">"""</span>
    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>

    <span class="c1"># Histograms
</span>    <span class="n">ax</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">study</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">Study Group</span><span class="sh">'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">green</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">no_study</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">No-Study Group</span><span class="sh">'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">red</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Exam Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Distribution of Exam Scores by Study Status</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">legend</span><span class="p">()</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">output/plots/score_distribution.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved score distribution plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">plot_ci_comparison</span><span class="p">(</span><span class="n">study</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">no_study</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
                       <span class="n">ci_study</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">],</span>
                       <span class="n">ci_no_study</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Plot confidence intervals for both groups.</span><span class="sh">"""</span>
    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>

    <span class="n">groups</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Study Group</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">No-Study Group</span><span class="sh">'</span><span class="p">]</span>
    <span class="n">means</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">study</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">no_study</span><span class="p">)]</span>

    <span class="n">errors</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">[</span><span class="n">means</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">ci_study</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">means</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">ci_no_study</span><span class="p">[</span><span class="mi">0</span><span class="p">]],</span>
        <span class="p">[</span><span class="n">ci_study</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">means</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">ci_no_study</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">means</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
    <span class="p">]</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">errorbar</span><span class="p">(</span><span class="n">groups</span><span class="p">,</span> <span class="n">means</span><span class="p">,</span> <span class="n">yerr</span><span class="o">=</span><span class="n">errors</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="sh">'</span><span class="s">o</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">12</span><span class="p">,</span>
                <span class="n">capsize</span><span class="o">=</span><span class="mi">12</span><span class="p">,</span> <span class="n">capthick</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">blue</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Mean Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">95% Confidence Intervals for Mean Exam Scores</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="sh">'</span><span class="s">y</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>

    <span class="c1"># Add value labels
</span>    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">group</span><span class="p">,</span> <span class="n">mean</span><span class="p">,</span> <span class="n">ci</span><span class="p">)</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="nf">zip</span><span class="p">(</span><span class="n">groups</span><span class="p">,</span> <span class="n">means</span><span class="p">,</span> <span class="p">[</span><span class="n">ci_study</span><span class="p">,</span> <span class="n">ci_no_study</span><span class="p">])):</span>
        <span class="n">ax</span><span class="p">.</span><span class="nf">text</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">mean</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="sa">f</span><span class="sh">'</span><span class="si">{</span><span class="n">mean</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="se">\n</span><span class="s">[</span><span class="si">{</span><span class="n">ci</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">ci</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="s">]</span><span class="sh">'</span><span class="p">,</span>
                <span class="n">ha</span><span class="o">=</span><span class="sh">'</span><span class="s">center</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">output/plots/ci_comparison.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Saved confidence interval comparison plot</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="c1"># ============================================================================
# MAIN ANALYSIS
# ============================================================================
</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Run complete statistical analysis.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Starting exam score analysis...</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Generate data
</span>    <span class="n">study_group</span><span class="p">,</span> <span class="n">no_study_group</span> <span class="o">=</span> <span class="nf">generate_exam_scores</span><span class="p">()</span>

    <span class="c1"># Descriptive statistics
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">DESCRIPTIVE STATISTICS</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="n">study_stats</span> <span class="o">=</span> <span class="nf">compute_stats</span><span class="p">(</span><span class="n">study_group</span><span class="p">)</span>
    <span class="n">no_study_stats</span> <span class="o">=</span> <span class="nf">compute_stats</span><span class="p">(</span><span class="n">no_study_group</span><span class="p">)</span>

    <span class="nf">log_stats</span><span class="p">(</span><span class="sh">"</span><span class="s">Study Group</span><span class="sh">"</span><span class="p">,</span> <span class="n">study_stats</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="nf">log_stats</span><span class="p">(</span><span class="sh">"</span><span class="s">No-Study Group</span><span class="sh">"</span><span class="p">,</span> <span class="n">no_study_stats</span><span class="p">)</span>

    <span class="c1"># Hypothesis test
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">HYPOTHESIS TESTING (t-test)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="n">t_stat</span><span class="p">,</span> <span class="n">p_value</span> <span class="o">=</span> <span class="nf">run_t_test</span><span class="p">(</span><span class="n">study_group</span><span class="p">,</span> <span class="n">no_study_group</span><span class="p">)</span>

    <span class="c1"># Confidence intervals
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">CONFIDENCE INTERVALS (95%)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="n">ci_study</span> <span class="o">=</span> <span class="nf">compute_ci</span><span class="p">(</span><span class="n">study_group</span><span class="p">)</span>
    <span class="n">ci_no_study</span> <span class="o">=</span> <span class="nf">compute_ci</span><span class="p">(</span><span class="n">no_study_group</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Study Group CI:     [</span><span class="si">{</span><span class="n">ci_study</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">ci_study</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">]</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">No-Study Group CI:  [</span><span class="si">{</span><span class="n">ci_no_study</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">ci_no_study</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">]</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Visualization
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">CREATING VISUALIZATIONS</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="nf">plot_score_distributions</span><span class="p">(</span><span class="n">study_group</span><span class="p">,</span> <span class="n">no_study_group</span><span class="p">)</span>
    <span class="nf">plot_ci_comparison</span><span class="p">(</span><span class="n">study_group</span><span class="p">,</span> <span class="n">no_study_group</span><span class="p">,</span> <span class="n">ci_study</span><span class="p">,</span> <span class="n">ci_no_study</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">""</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">ANALYSIS COMPLETE</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">'</span><span class="s">__main__</span><span class="sh">'</span><span class="p">:</span>
    <span class="nf">main</span><span class="p">()</span>
</code></pre></div></div>

<p><strong>Expected output:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2026-03-28 10:15:30,123 - __main__ - INFO - Starting exam score analysis...
2026-03-28 10:15:30,124 - __main__ - INFO - Generated 50 study group scores
2026-03-28 10:15:30,124 - __main__ - INFO - Generated 50 no-study group scores

============================================================
DESCRIPTIVE STATISTICS
============================================================
2026-03-28 10:15:30,125 - __main__ - INFO - Study Group statistics:
2026-03-28 10:15:30,125 - __main__ - INFO -   mean    :   78.41
2026-03-28 10:15:30,125 - __main__ - INFO -   median  :   78.27
2026-03-28 10:15:30,125 - __main__ - INFO -   std     :    8.12
2026-03-28 10:15:30,125 - __main__ - INFO -   min     :   60.53
2026-03-28 10:15:30,125 - __main__ - INFO -   max     :   95.18
2026-03-28 10:15:30,125 - __main__ - INFO -   q25     :   71.98
2026-03-28 10:15:30,125 - __main__ - INFO -   q75     :   84.87

2026-03-28 10:15:30,125 - __main__ - INFO - No-Study Group statistics:
2026-03-28 10:15:30,125 - __main__ - INFO -   mean    :   68.37
2026-03-28 10:15:30,125 - __main__ - INFO -   median  :   68.15
2026-03-28 10:15:30,125 - __main__ - INFO -   std     :    9.87
2026-03-28 10:15:30,125 - __main__ - INFO -   min     :   46.82
2026-03-28 10:15:30,125 - __main__ - INFO -   max     :   92.33
2026-03-28 10:15:30,125 - __main__ - INFO -   q25     :   61.45
2026-03-28 10:15:30,125 - __main__ - INFO -   q75     :   76.28

============================================================
HYPOTHESIS TESTING (t-test)
============================================================
2026-03-28 10:15:30,126 - __main__ - INFO - t-test results:
2026-03-28 10:15:30,126 - __main__ - INFO -   t-statistic: 4.8732
2026-03-28 10:15:30,126 - __main__ - INFO -   p-value:     0.000005
2026-03-28 10:15:30,126 - __main__ - INFO -   ✓ Statistically significant difference (p &lt; 0.05)

============================================================
CONFIDENCE INTERVALS (95%)
============================================================
2026-03-28 10:15:30,127 - __main__ - INFO - Study Group CI:     [75.42, 81.40]
2026-03-28 10:15:30,127 - __main__ - INFO - No-Study Group CI:  [65.10, 71.64]

============================================================
CREATING VISUALIZATIONS
============================================================
2026-03-28 10:15:30,200 - __main__ - INFO - Saved score distribution plot
2026-03-28 10:15:30,250 - __main__ - INFO - Saved confidence interval comparison plot

============================================================
ANALYSIS COMPLETE
============================================================
</code></pre></div></div>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p><strong>Day 10: pytest - Introduction to Automated Testing</strong></p>

<p>pytest is the Python standard for testing. We’ll write:</p>
<ul>
  <li>Unit tests for statistical functions</li>
  <li>Test fixtures for reusable test data</li>
  <li>Parametrized tests to test multiple inputs</li>
  <li>Tests for edge cases (empty lists, negative values, etc.)</li>
</ul>]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="probability" /><category term="statistics" /><category term="bayesian" /><category term="ml-math" /><category term="data-science" /><category term="deep-learning" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 8 of 180 - Linear Algebra &amp;amp; Calculus</title><link href="https://edwardpraveen.com/dl-llm-systems/linear-algebra-calculus-day8/" rel="alternate" type="text/html" title="Day 8 of 180 - Linear Algebra &amp;amp; Calculus" /><published>2026-03-27T00:00:00+05:30</published><updated>2026-03-27T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/linear-algebra-calculus-day8</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/linear-algebra-calculus-day8/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
</blockquote>

<h2 id="introduction-why-does-a-coder-need-this-math">Introduction: Why Does a Coder Need This Math?</h2>

<p>You might think: “I can build AI models with scikit-learn or PyTorch. Why learn linear algebra and calculus?”</p>

<p>Here’s the truth: <strong>Every AI model is math.</strong> When you train a neural network, you’re:</p>
<ol>
  <li>Multiplying matrices (linear algebra)</li>
  <li>Computing slopes and rates of change (calculus)</li>
  <li>Taking tiny steps downhill to minimize error (gradient descent)</li>
</ol>

<p>Without understanding these concepts, you’ll treat AI like a black box. You won’t know when something works, why it fails, or how to debug it.</p>

<p>Today, we’re building from scratch. No frameworks. Just Python, NumPy, and your brain.</p>

<hr />

<h2 id="setup">Setup</h2>

<h3 id="install-dependencies">Install Dependencies</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span><span class="nv">numpy</span><span class="o">==</span>1.26.4 <span class="nv">matplotlib</span><span class="o">==</span>3.8.2
</code></pre></div></div>

<h3 id="create-your-working-directory">Create Your Working Directory</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> ~/ai-journey/day-8
<span class="nb">cd</span> ~/ai-journey/day-8
</code></pre></div></div>

<p>All code files go in this directory.</p>

<hr />

<h2 id="part-1-linear-algebra---the-language-of-ai">Part 1: Linear Algebra - The Language of AI</h2>

<h3 id="scalars-vectors-matrices">Scalars, Vectors, Matrices</h3>

<p>Let’s start simple.</p>

<p><strong>Scalar:</strong> Just a number.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = 5
</code></pre></div></div>

<p><strong>Vector:</strong> A list of numbers, usually arranged vertically (a column). Think of it as a point or direction in space.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v = [1, 2, 3]  (a point in 3D space)
</code></pre></div></div>

<p><strong>Matrix:</strong> A grid of numbers. Multiple vectors stacked together.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>     [1  2  3]  (2×3 matrix: 2 rows, 3 columns)
A =  [4  5  6]
</code></pre></div></div>

<h3 id="vector-operations">Vector Operations</h3>

<h4 id="addition-and-subtraction">Addition and Subtraction</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[1, 2, 3] + [4, 5, 6] = [5, 7, 9]   (add element-by-element)
[4, 5, 6] - [1, 2, 3] = [3, 3, 3]
</code></pre></div></div>

<h4 id="scalar-multiplication">Scalar Multiplication</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3 × [2, 4, 6] = [6, 12, 18]
</code></pre></div></div>

<h4 id="vector-magnitude-norm---the-length-of-a-vector">Vector Magnitude (Norm) - The “Length” of a Vector</h4>

<p>The magnitude tells you how long a vector is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|v| = √(v₁² + v₂² + ... + vₙ²)
</code></pre></div></div>

<p>Example: The vector [3, 4] has magnitude:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|[3, 4]| = √(3² + 4²) = √(9 + 16) = √25 = 5
</code></pre></div></div>

<p>This is just the Pythagorean theorem!</p>

<h3 id="dot-product---the-most-important-operation-in-ai">Dot Product - The Most Important Operation in AI</h3>

<p>The dot product of two vectors is computed as:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a · b = a₁×b₁ + a₂×b₂ + ... + aₙ×bₙ
</code></pre></div></div>

<p><strong>Example:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[2, 3] · [4, 5] = (2×4) + (3×5) = 8 + 15 = 23
</code></pre></div></div>

<h4 id="why-does-the-dot-product-matter">Why Does the Dot Product Matter?</h4>

<p>The dot product measures <strong>how aligned two vectors are</strong>:</p>
<ul>
  <li>If they point the same way: large positive value</li>
  <li>If they’re perpendicular: zero</li>
  <li>If they point opposite ways: negative value</li>
</ul>

<p>This is used EVERYWHERE in AI:</p>
<ul>
  <li><strong>Embeddings:</strong> Compare the similarity of two text representations</li>
  <li><strong>Neural networks:</strong> Computing neuron outputs</li>
  <li><strong>Attention mechanisms:</strong> Measuring how relevant one word is to another</li>
</ul>

<h3 id="cosine-similarity---using-dot-product-for-similarity">Cosine Similarity - Using Dot Product for Similarity</h3>

<p>We can normalize the dot product to get a number between -1 and +1:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cos(θ) = (a · b) / (|a| × |b|)
</code></pre></div></div>

<p>This is <strong>cosine similarity</strong>. It’s the fundamental metric for comparing embeddings in AI.</p>

<h3 id="matrices">Matrices</h3>

<h4 id="matrix-shapes">Matrix Shapes</h4>
<p>A matrix has dimensions <strong>rows × columns</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>     [1  2  3]     ← 2 rows
A =  [4  5  6]
     ↑  ↑  ↑
     3 columns

A is a 2×3 matrix
</code></pre></div></div>

<h4 id="matrix-transpose">Matrix Transpose</h4>

<p>Flip rows and columns:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>     [1  2  3]          [1  4]
A =  [4  5  6]    A^T = [2  5]
                         [3  6]
</code></pre></div></div>

<h4 id="matrix-vector-multiplication">Matrix-Vector Multiplication</h4>

<p>Each row of the matrix is dot-producted with the vector:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>     [1  2]          [2]        (1×2 + 2×3) = 8
A =  [3  4]    v =  [3]   =    (3×2 + 4×3) = 18
     [5  6]                     (5×2 + 6×3) = 28

Result: [8, 18, 28]
</code></pre></div></div>

<h4 id="matrix-matrix-multiplication">Matrix-Matrix Multiplication</h4>

<p>Each element (i,j) in the result is the dot product of row i from the left matrix with column j from the right matrix.</p>

<h3 id="special-matrices">Special Matrices</h3>

<p><strong>Identity Matrix (I):</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>     [1  0  0]
I =  [0  1  0]
     [0  0  1]

Property: A × I = A
</code></pre></div></div>

<p><strong>Inverse Matrix (A⁻¹):</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A × A⁻¹ = I

(Like division: x × (1/x) = 1)
</code></pre></div></div>

<p>Not all matrices have inverses. Those that don’t are called “singular” - no unique solution when solving Ax = b.</p>

<p><strong>Eigenvalues &amp; Eigenvectors:</strong></p>

<p>Special vectors that don’t change direction when multiplied by a matrix (only scaled):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A × v = λ × v

λ = eigenvalue (how much the vector is scaled)
v = eigenvector (the vector itself)
</code></pre></div></div>

<p>(We’ll dive deep into eigenvalues on Day 12. For now, just know they exist.)</p>

<hr />

<h2 id="part-2-calculus---the-rate-of-change">Part 2: Calculus - The Rate of Change</h2>

<h3 id="what-is-a-derivative">What IS a Derivative?</h3>

<p><strong>Analogy:</strong> You’re hiking on a mountain. At any point, the ground has a slope. That slope is the derivative. Steep = large derivative. Flat = zero derivative.</p>

<p><strong>Formal definition:</strong> The derivative dy/dx is the rate of change of y with respect to x. It tells you how much y changes when x changes by a tiny amount.</p>

<h3 id="numerical-derivatives">Numerical Derivatives</h3>

<p>We can compute derivatives without any fancy math rules. Here’s the formula:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f'(x) ≈ (f(x + h) - f(x)) / h
</code></pre></div></div>

<p>where h is tiny (like 1e-7).</p>

<p>This is saying: “Look at the function at x and at x+h. The slope between those two points approximates the true derivative.”</p>

<p><strong>Example:</strong> Let’s say f(x) = x²</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f'(2) ≈ (f(2.0000001) - f(2)) / 0.0000001
      ≈ (4.0000004 - 4) / 0.0000001
      ≈ 4 (which is correct! d/dx(x²) = 2x, so at x=2, derivative = 4)
</code></pre></div></div>

<h3 id="symbolic-derivative-rules">Symbolic Derivative Rules</h3>

<p>Instead of computing numerically every time, mathematicians found patterns (rules):</p>

<h4 id="power-rule">Power Rule</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d/dx(xⁿ) = n × x^(n-1)

Examples:
d/dx(x²) = 2x
d/dx(x³) = 3x²
d/dx(x^5) = 5x⁴
</code></pre></div></div>

<h4 id="product-rule">Product Rule</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d/dx(u × v) = u' × v + u × v'

Example:
d/dx(x² × x³) = d/dx(x⁵) = 5x⁴

Or using the product rule:
= 2x × x³ + x² × 3x²
= 2x⁴ + 3x⁴
= 5x⁴ ✓
</code></pre></div></div>

<h4 id="chain-rule---the-most-important-for-ai">Chain Rule - The Most Important for AI</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d/dx(f(g(x))) = f'(g(x)) × g'(x)
</code></pre></div></div>

<p><strong>Analogy (gears):</strong> Imagine two gears meshed together. The outer gear turns at rate f’(g(x)). The inner gear turns at rate g’(x). The total rotation is their product.</p>

<p><strong>Example:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d/dx((x² + 1)³) = ?

Let u = x² + 1, so we have f(u) = u³
f'(u) = 3u²
g(x) = x² + 1
g'(x) = 2x

By chain rule:
d/dx(u³) = 3u² × 2x = 3(x² + 1)² × 2x
</code></pre></div></div>

<h3 id="partial-derivatives">Partial Derivatives</h3>

<p>When a function has multiple variables, you can take the derivative with respect to ONE of them, treating the others as constants.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f(x, y) = x² + 2xy + y²

∂f/∂x = 2x + 2y    (treat y as constant)
∂f/∂y = 2x + 2y    (treat x as constant)
</code></pre></div></div>

<p>The ∂ symbol just means “partial derivative” (derivative with respect to one variable).</p>

<h3 id="the-gradient">The Gradient</h3>

<p>The <strong>gradient</strong> is a vector of all partial derivatives. It points in the direction of steepest ascent.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For f(x, y):
∇f = [∂f/∂x, ∂f/∂y]

This vector points "uphill" the fastest.
</code></pre></div></div>

<p>If you want to find the minimum, move <strong>opposite</strong> to the gradient.</p>

<h3 id="gradient-descent---the-heart-of-ai-training">Gradient Descent - The Heart of AI Training</h3>

<p>Gradient descent is an algorithm that minimizes a function by taking steps opposite to the gradient.</p>

<p><strong>Algorithm:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = initial_guess
learning_rate = 0.01
for i in range(max_iterations):
    gradient = compute_gradient(x)
    x = x - learning_rate × gradient
    loss = f(x)
    if gradient is near zero:
        break
return x
</code></pre></div></div>

<p>The <strong>learning rate</strong> controls step size:</p>
<ul>
  <li>Too large: steps are huge, might overshoot</li>
  <li>Too small: takes forever to converge</li>
  <li>Just right: converges smoothly</li>
</ul>

<hr />

<h2 id="part-3-gradient-descent-from-scratch">Part 3: Gradient Descent from Scratch</h2>

<p>Now let’s implement everything.</p>

<h3 id="file-1-math_toolkitpy">File 1: math_toolkit.py</h3>

<p>Complete linear algebra and calculus implementations:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(levelname)s: %(message)s</span><span class="sh">'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">vector_add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">b</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Add two vectors element-wise.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="o">!=</span> <span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">):</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Vector lengths don</span><span class="sh">'</span><span class="s">t match: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="si">}</span><span class="s"> vs </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">))]</span>


<span class="k">def</span> <span class="nf">vector_subtract</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">b</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Subtract vector b from vector a.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="o">!=</span> <span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">):</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Vector lengths don</span><span class="sh">'</span><span class="s">t match: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="si">}</span><span class="s"> vs </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">))]</span>


<span class="k">def</span> <span class="nf">scalar_multiply</span><span class="p">(</span><span class="n">scalar</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Multiply a vector by a scalar.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">scalar</span> <span class="o">*</span> <span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">v</span><span class="p">]</span>


<span class="k">def</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">b</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute dot product of two vectors.</span><span class="sh">"""</span>
    <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="o">!=</span> <span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">):</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Vectors must have same length: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="si">}</span><span class="s"> vs </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">a</span><span class="p">)))</span>
    <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">vector_magnitude</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute the magnitude (L2 norm) of a vector.</span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="nf">sum</span><span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">v</span><span class="p">))</span> <span class="o">**</span> <span class="mf">0.5</span>


<span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">b</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute cosine similarity between two vectors (range: -1 to 1).</span><span class="sh">"""</span>
    <span class="n">mag_a</span> <span class="o">=</span> <span class="nf">vector_magnitude</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
    <span class="n">mag_b</span> <span class="o">=</span> <span class="nf">vector_magnitude</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">mag_a</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">mag_b</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="mf">0.0</span>
    <span class="k">return</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">mag_a</span> <span class="o">*</span> <span class="n">mag_b</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">matrix_transpose</span><span class="p">(</span><span class="n">A</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]]:</span>
    <span class="sh">"""</span><span class="s">Transpose a matrix (swap rows and columns).</span><span class="sh">"""</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
    <span class="n">cols</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">return</span> <span class="p">[[</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">j</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">rows</span><span class="p">)]</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">cols</span><span class="p">)]</span>


<span class="k">def</span> <span class="nf">matrix_vector_multiply</span><span class="p">(</span><span class="n">A</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]],</span> <span class="n">v</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Multiply a matrix by a vector.</span><span class="sh">"""</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
    <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">rows</span><span class="p">):</span>
        <span class="n">dot</span> <span class="o">=</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">v</span><span class="p">)</span>
        <span class="n">result</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">dot</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">matrix_matrix_multiply</span><span class="p">(</span><span class="n">A</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]],</span> <span class="n">B</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]]:</span>
    <span class="sh">"""</span><span class="s">Multiply two matrices: A (m×n) × B (n×p) = C (m×p).</span><span class="sh">"""</span>
    <span class="n">m</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
    <span class="n">n</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">p</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">B</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    
    <span class="c1"># Transpose B for easier column access
</span>    <span class="n">B_T</span> <span class="o">=</span> <span class="nf">matrix_transpose</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>
    
    <span class="c1"># Result is m×p
</span>    <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
        <span class="n">row</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">p</span><span class="p">):</span>
            <span class="n">dot</span> <span class="o">=</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B_T</span><span class="p">[</span><span class="n">j</span><span class="p">])</span>
            <span class="n">row</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">dot</span><span class="p">)</span>
        <span class="n">result</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">h</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1e-7</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute numerical derivative of f at point x using finite differences.</span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">h</span><span class="p">)</span> <span class="o">-</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">h</span>


<span class="k">def</span> <span class="nf">print_vector</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Helper: log a vector in readable format.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> = </span><span class="si">{</span><span class="p">[</span><span class="nf">round</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">v</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">print_matrix</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">A</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Helper: log a matrix in readable format.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> =</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">A</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="p">[</span><span class="nf">round</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">row</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>


<span class="c1"># Demonstration
</span><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Linear Algebra Toolkit ===</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Vectors
</span>    <span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
    <span class="n">b</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">a = </span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">, b = </span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">a + b = </span><span class="si">{</span><span class="nf">vector_add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">a - b = </span><span class="si">{</span><span class="nf">vector_subtract</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">3 × a = </span><span class="si">{</span><span class="nf">scalar_multiply</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">a · b = </span><span class="si">{</span><span class="nf">dot_product</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">|a| = </span><span class="si">{</span><span class="nf">vector_magnitude</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">cosine_similarity(a, b) = </span><span class="si">{</span><span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Matrices
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">=== Matrix Operations ===</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">A</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]]</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">A shape: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span><span class="si">}</span><span class="s">×</span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print_matrix</span><span class="p">(</span><span class="sh">"</span><span class="s">A</span><span class="sh">"</span><span class="p">,</span> <span class="n">A</span><span class="p">)</span>
    <span class="nf">print_matrix</span><span class="p">(</span><span class="sh">"</span><span class="s">A^T</span><span class="sh">"</span><span class="p">,</span> <span class="nf">matrix_transpose</span><span class="p">(</span><span class="n">A</span><span class="p">))</span>
    
    <span class="n">v</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
    <span class="nf">print_vector</span><span class="p">(</span><span class="sh">"</span><span class="s">v</span><span class="sh">"</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
    <span class="nf">print_vector</span><span class="p">(</span><span class="sh">"</span><span class="s">A × v</span><span class="sh">"</span><span class="p">,</span> <span class="nf">matrix_vector_multiply</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">v</span><span class="p">))</span>
    
    <span class="c1"># Calculus
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">=== Numerical Derivatives ===</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mf">2.0</span>
    <span class="n">derivative</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">f(x) = x², at x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s">: f</span><span class="sh">'</span><span class="s">(x) ≈ </span><span class="si">{</span><span class="n">derivative</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> (true: 4.0)</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">f2</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">3</span>
    <span class="n">derivative2</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f2</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">f(x) = x³, at x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s">: f</span><span class="sh">'</span><span class="s">(x) ≈ </span><span class="si">{</span><span class="n">derivative2</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> (true: 12.0)</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="file-2-gradient_descentpy">File 2: gradient_descent.py</h3>

<p>Gradient descent implementation with visualization:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Tuple</span><span class="p">,</span> <span class="n">List</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(levelname)s: %(message)s</span><span class="sh">'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">float</span><span class="p">],</span> <span class="nb">float</span><span class="p">],</span> <span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">h</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1e-7</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Compute numerical gradient of f at point x.</span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">h</span><span class="p">)</span> <span class="o">-</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">/</span> <span class="n">h</span>


<span class="k">def</span> <span class="nf">gradient_descent_1d</span><span class="p">(</span>
    <span class="n">f</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">float</span><span class="p">],</span> <span class="nb">float</span><span class="p">],</span>
    <span class="n">x_init</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
    <span class="n">learning_rate</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
    <span class="n">max_iterations</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
    <span class="n">tolerance</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1e-6</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">]]:</span>
    <span class="sh">"""</span><span class="s">
    Perform 1D gradient descent.
    
    Returns:
        final_x: The x value at the minimum
        x_history: List of x values at each iteration
        loss_history: List of loss values at each iteration
        grad_history: List of gradient magnitudes at each iteration
    </span><span class="sh">"""</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">x_init</span>
    <span class="n">x_history</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="p">]</span>
    <span class="n">loss_history</span> <span class="o">=</span> <span class="p">[</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">)]</span>
    <span class="n">grad_history</span> <span class="o">=</span> <span class="p">[]</span>
    
    <span class="k">for</span> <span class="n">iteration</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_iterations</span><span class="p">):</span>
        <span class="c1"># Compute gradient
</span>        <span class="n">grad</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
        <span class="n">grad_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">grad</span><span class="p">))</span>
        
        <span class="c1"># Update x
</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">grad</span>
        <span class="n">x_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">loss_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        
        <span class="c1"># Check convergence
</span>        <span class="k">if</span> <span class="nf">abs</span><span class="p">(</span><span class="n">grad</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">tolerance</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Converged at iteration </span><span class="si">{</span><span class="n">iteration</span><span class="si">}</span><span class="s">, gradient: </span><span class="si">{</span><span class="n">grad</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">break</span>
        
        <span class="nf">if </span><span class="p">(</span><span class="n">iteration</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Iteration </span><span class="si">{</span><span class="n">iteration</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s">: x = </span><span class="si">{</span><span class="n">x</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s">, loss = </span><span class="si">{</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s">, grad = </span><span class="si">{</span><span class="n">grad</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">x_history</span><span class="p">,</span> <span class="n">loss_history</span><span class="p">,</span> <span class="n">grad_history</span>


<span class="k">def</span> <span class="nf">visualize_gradient_descent</span><span class="p">(</span>
    <span class="n">f</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">float</span><span class="p">],</span> <span class="nb">float</span><span class="p">],</span>
    <span class="n">x_history</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span>
    <span class="n">loss_history</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span>
    <span class="n">x_range</span><span class="p">:</span> <span class="n">Tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span>
    <span class="n">filename</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="sh">"</span><span class="s">gradient_descent.png</span><span class="sh">"</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Visualize the gradient descent process.
    
    Creates a 2-panel plot:
    - Left: Function curve with descent steps marked
    - Right: Loss vs iteration
    </span><span class="sh">"""</span>
    <span class="n">fig</span><span class="p">,</span> <span class="p">(</span><span class="n">ax1</span><span class="p">,</span> <span class="n">ax2</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
    
    <span class="c1"># Left panel: function curve + descent path
</span>    <span class="n">x_vals</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">linspace</span><span class="p">(</span><span class="n">x_range</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x_range</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="mi">300</span><span class="p">)</span>
    <span class="n">y_vals</span> <span class="o">=</span> <span class="p">[</span><span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">x_vals</span><span class="p">]</span>
    
    <span class="n">ax1</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">x_vals</span><span class="p">,</span> <span class="n">y_vals</span><span class="p">,</span> <span class="sh">'</span><span class="s">b-</span><span class="sh">'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">f(x)</span><span class="sh">'</span><span class="p">)</span>
    
    <span class="c1"># Plot descent steps
</span>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">x_history</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
        <span class="n">x_curr</span> <span class="o">=</span> <span class="n">x_history</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">y_curr</span> <span class="o">=</span> <span class="n">loss_history</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">x_next</span> <span class="o">=</span> <span class="n">x_history</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
        <span class="n">y_next</span> <span class="o">=</span> <span class="n">loss_history</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
        
        <span class="c1"># Draw arrow
</span>        <span class="n">ax1</span><span class="p">.</span><span class="nf">annotate</span><span class="p">(</span><span class="sh">''</span><span class="p">,</span> <span class="n">xy</span><span class="o">=</span><span class="p">(</span><span class="n">x_next</span><span class="p">,</span> <span class="n">y_next</span><span class="p">),</span> <span class="n">xytext</span><span class="o">=</span><span class="p">(</span><span class="n">x_curr</span><span class="p">,</span> <span class="n">y_curr</span><span class="p">),</span>
                    <span class="n">arrowprops</span><span class="o">=</span><span class="nf">dict</span><span class="p">(</span><span class="n">arrowstyle</span><span class="o">=</span><span class="sh">'</span><span class="s">-&gt;</span><span class="sh">'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">red</span><span class="sh">'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mf">1.5</span><span class="p">))</span>
    
    <span class="c1"># Highlight start and end
</span>    <span class="n">ax1</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">x_history</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">loss_history</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="sh">'</span><span class="s">go</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">Start</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax1</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">x_history</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">loss_history</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="sh">'</span><span class="s">r*</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">End (minimum)</span><span class="sh">'</span><span class="p">)</span>
    
    <span class="n">ax1</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">x</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
    <span class="n">ax1</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">f(x)</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
    <span class="n">ax1</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Gradient Descent: Function &amp; Descent Path</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">,</span> <span class="n">fontweight</span><span class="o">=</span><span class="sh">'</span><span class="s">bold</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax1</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">ax1</span><span class="p">.</span><span class="nf">legend</span><span class="p">()</span>
    
    <span class="c1"># Right panel: loss over iterations
</span>    <span class="n">iterations</span> <span class="o">=</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">loss_history</span><span class="p">))</span>
    <span class="n">ax2</span><span class="p">.</span><span class="nf">plot</span><span class="p">(</span><span class="n">iterations</span><span class="p">,</span> <span class="n">loss_history</span><span class="p">,</span> <span class="sh">'</span><span class="s">b-</span><span class="sh">'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="sh">'</span><span class="s">o</span><span class="sh">'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
    <span class="n">ax2</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Iteration</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
    <span class="n">ax2</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Loss f(x)</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
    <span class="n">ax2</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Loss Convergence</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">,</span> <span class="n">fontweight</span><span class="o">=</span><span class="sh">'</span><span class="s">bold</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax2</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
    
    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Plot saved to </span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>


<span class="c1"># Demonstration
</span><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Gradient Descent: 1D Example ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Define function: f(x) = x² - 4x + 6
</span>    <span class="c1"># Minimum at x = 2, value = 2
</span>    <span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">4</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="mi">6</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Objective: minimize f(x) = x² - 4x + 6</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">True minimum: x = 2, f(2) = 2</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Run gradient descent
</span>    <span class="n">x_init</span> <span class="o">=</span> <span class="mf">0.0</span>
    <span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.1</span>
    <span class="n">max_iterations</span> <span class="o">=</span> <span class="mi">100</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Starting from x = </span><span class="si">{</span><span class="n">x_init</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Learning rate = </span><span class="si">{</span><span class="n">learning_rate</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">x_final</span><span class="p">,</span> <span class="n">x_history</span><span class="p">,</span> <span class="n">loss_history</span><span class="p">,</span> <span class="n">grad_history</span> <span class="o">=</span> <span class="nf">gradient_descent_1d</span><span class="p">(</span>
        <span class="n">f</span><span class="p">,</span> <span class="n">x_init</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">max_iterations</span>
    <span class="p">)</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="s">Final x: </span><span class="si">{</span><span class="n">x_final</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Final loss: </span><span class="si">{</span><span class="nf">f</span><span class="p">(</span><span class="n">x_final</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Iterations: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">x_history</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Visualize
</span>    <span class="nf">visualize_gradient_descent</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x_history</span><span class="p">,</span> <span class="n">loss_history</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="sh">"</span><span class="s">gradient_descent.png</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Try different learning rates
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">=== Testing Different Learning Rates ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">learning_rates</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]</span>
    <span class="k">for</span> <span class="n">lr</span> <span class="ow">in</span> <span class="n">learning_rates</span><span class="p">:</span>
        <span class="n">x_final</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">loss_hist</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="nf">gradient_descent_1d</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">lr</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">LR=</span><span class="si">{</span><span class="n">lr</span><span class="si">}</span><span class="s">: x=</span><span class="si">{</span><span class="n">x_final</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s">, f(x)=</span><span class="si">{</span><span class="nf">f</span><span class="p">(</span><span class="n">x_final</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s">, iterations=</span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">loss_hist</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="file-3-test_examplespy">File 3: test_examples.py</h3>

<p>Run examples to verify everything works:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">math_toolkit</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">vector_add</span><span class="p">,</span> <span class="n">vector_subtract</span><span class="p">,</span> <span class="n">scalar_multiply</span><span class="p">,</span> <span class="n">dot_product</span><span class="p">,</span>
    <span class="n">vector_magnitude</span><span class="p">,</span> <span class="n">cosine_similarity</span><span class="p">,</span> <span class="n">matrix_transpose</span><span class="p">,</span>
    <span class="n">matrix_vector_multiply</span><span class="p">,</span> <span class="n">numerical_gradient</span>
<span class="p">)</span>
<span class="kn">from</span> <span class="n">gradient_descent</span> <span class="kn">import</span> <span class="n">gradient_descent_1d</span><span class="p">,</span> <span class="n">visualize_gradient_descent</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(levelname)s: %(message)s</span><span class="sh">'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">test_linear_algebra</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test linear algebra functions.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Testing Linear Algebra ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test dot product
</span>    <span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
    <span class="n">b</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
    <span class="n">expected_dot</span> <span class="o">=</span> <span class="mi">1</span><span class="o">*</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="mi">5</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="mi">6</span>  <span class="c1"># 32
</span>    <span class="n">actual_dot</span> <span class="o">=</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">actual_dot</span> <span class="o">==</span> <span class="n">expected_dot</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Dot product failed: </span><span class="si">{</span><span class="n">actual_dot</span><span class="si">}</span><span class="s"> != </span><span class="si">{</span><span class="n">expected_dot</span><span class="si">}</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Dot product: </span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s"> · </span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s"> = </span><span class="si">{</span><span class="n">actual_dot</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test vector magnitude
</span>    <span class="n">v</span> <span class="o">=</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
    <span class="n">expected_mag</span> <span class="o">=</span> <span class="mf">5.0</span>
    <span class="n">actual_mag</span> <span class="o">=</span> <span class="nf">vector_magnitude</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">actual_mag</span> <span class="o">-</span> <span class="n">expected_mag</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-6</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Magnitude failed: </span><span class="si">{</span><span class="n">actual_mag</span><span class="si">}</span><span class="s"> != </span><span class="si">{</span><span class="n">expected_mag</span><span class="si">}</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Magnitude: |</span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s">| = </span><span class="si">{</span><span class="n">actual_mag</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test cosine similarity (same vector)
</span>    <span class="n">sim</span> <span class="o">=</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">sim</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-6</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Cosine similarity failed: </span><span class="si">{</span><span class="n">sim</span><span class="si">}</span><span class="s"> != 1.0</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Cosine similarity (same vector): </span><span class="si">{</span><span class="n">sim</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test cosine similarity (perpendicular vectors)
</span>    <span class="n">perp1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
    <span class="n">perp2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
    <span class="n">sim_perp</span> <span class="o">=</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">perp1</span><span class="p">,</span> <span class="n">perp2</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">sim_perp</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-6</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Cosine similarity (perpendicular) failed: </span><span class="si">{</span><span class="n">sim_perp</span><span class="si">}</span><span class="s"> != 0</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Cosine similarity (perpendicular): </span><span class="si">{</span><span class="n">sim_perp</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test matrix-vector multiplication
</span>    <span class="n">A</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]]</span>  <span class="c1"># 3×2
</span>    <span class="n">v</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
    <span class="n">result</span> <span class="o">=</span> <span class="nf">matrix_vector_multiply</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
    <span class="n">expected</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="o">*</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="o">*</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">4</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="o">*</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">6</span><span class="o">*</span><span class="mi">2</span><span class="p">]</span>  <span class="c1"># [5, 11, 17]
</span>    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">expected</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Matrix-vector multiply failed: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s"> != </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Matrix-vector multiply: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">test_calculus</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test calculus functions.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Testing Calculus ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test numerical derivative of x²
</span>    <span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mf">3.0</span>
    <span class="n">deriv</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
    <span class="n">expected</span> <span class="o">=</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span>  <span class="c1"># d/dx(x²) = 2x
</span>    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">deriv</span> <span class="o">-</span> <span class="n">expected</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-3</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Derivative failed: </span><span class="si">{</span><span class="n">deriv</span><span class="si">}</span><span class="s"> != </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ d/dx(x²) at x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">deriv</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> (expected: </span><span class="si">{</span><span class="n">expected</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Test numerical derivative of x³
</span>    <span class="n">g</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">**</span><span class="mi">3</span>
    <span class="n">deriv_g</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
    <span class="n">expected_g</span> <span class="o">=</span> <span class="mi">3</span><span class="o">*</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span>  <span class="c1"># d/dx(x³) = 3x²
</span>    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">deriv_g</span> <span class="o">-</span> <span class="n">expected_g</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-2</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Derivative failed: </span><span class="si">{</span><span class="n">deriv_g</span><span class="si">}</span><span class="s"> != </span><span class="si">{</span><span class="n">expected_g</span><span class="si">}</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ d/dx(x³) at x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">deriv_g</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> (expected: </span><span class="si">{</span><span class="n">expected_g</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">test_gradient_descent</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Test gradient descent.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Testing Gradient Descent ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Minimize f(x) = (x-3)²
</span>    <span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">3</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span>
    
    <span class="n">x_final</span><span class="p">,</span> <span class="n">x_hist</span><span class="p">,</span> <span class="n">loss_hist</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="nf">gradient_descent_1d</span><span class="p">(</span>
        <span class="n">f</span><span class="p">,</span> <span class="n">x_init</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">max_iterations</span><span class="o">=</span><span class="mi">100</span>
    <span class="p">)</span>
    
    <span class="c1"># Should converge to x ≈ 3
</span>    <span class="k">assert</span> <span class="nf">abs</span><span class="p">(</span><span class="n">x_final</span> <span class="o">-</span> <span class="mf">3.0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">0.01</span><span class="p">,</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Gradient descent failed: </span><span class="si">{</span><span class="n">x_final</span><span class="si">}</span><span class="s"> != 3.0</span><span class="sh">"</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Minimized (x-3)²: x = </span><span class="si">{</span><span class="n">x_final</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s"> (expected: 3.0)</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">✓ Loss decreased from </span><span class="si">{</span><span class="n">loss_hist</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s"> to </span><span class="si">{</span><span class="n">loss_hist</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">()</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="nf">test_linear_algebra</span><span class="p">()</span>
    <span class="nf">test_calculus</span><span class="p">()</span>
    <span class="nf">test_gradient_descent</span><span class="p">()</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=== All Tests Passed! ===</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="the-project-math-toolkit-complete-reference">The Project: Math Toolkit Complete Reference</h2>

<h3 id="running-the-code">Running the Code</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># From ~/ai-journey/day-8/</span>

<span class="c"># Test basic functions</span>
python test_examples.py

<span class="c"># Run gradient descent with visualization</span>
python gradient_descent.py
</code></pre></div></div>

<h3 id="expected-output">Expected Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INFO: === Linear Algebra Toolkit ===
INFO: a = [1, 2, 3], b = [4, 5, 6]
INFO: a + b = [5, 7, 9]
INFO: a - b = [-3, -3, -3]
INFO: 3 × a = [3, 6, 9]
INFO: a · b = 32
INFO: |a| = 3.7417
INFO: cosine_similarity(a, b) = 0.9746

INFO: === Matrix Operations ===
INFO: A shape: 2×3
INFO: A =
INFO:   [1, 2, 3]
INFO:   [4, 5, 6]
...
INFO: === Gradient Descent: 1D Example ===
INFO: Objective: minimize f(x) = x² - 4x + 6
INFO: True minimum: x = 2, f(2) = 2
INFO: Starting from x = 0.0
INFO: Learning rate = 0.1

INFO: Iteration 10: x = 1.851278, loss = 2.021889, grad = -0.30
INFO: Iteration 20: x = 1.989963, loss = 2.000100, grad = -0.02
INFO: Iteration 30: x = 1.999999, loss = 2.000000, grad = -0.00
INFO: Converged at iteration 31, gradient: 1.82e-07

INFO: Final x: 2.000000
INFO: Final loss: 2.000000
INFO: Iterations: 31
INFO: Plot saved to gradient_descent.png
</code></pre></div></div>

<h2 id="whats-next">What’s Next</h2>

<p><strong>Day 9:</strong> Probability &amp; Statistics - You’ll learn distributions, Bayes’ theorem, and why probability underpins uncertainty in AI. Same logging and type hints standards continue!</p>

<hr />]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="linear-algebra" /><category term="calculus" /><category term="mathematics" /><category term="ml-math" /><category term="deep-learning" /><category term="neural-networks" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 7 of 180 - Data Visualisation</title><link href="https://edwardpraveen.com/dl-llm-systems/matplotlib-seaborn-day7/" rel="alternate" type="text/html" title="Day 7 of 180 - Data Visualisation" /><published>2026-03-26T00:00:00+05:30</published><updated>2026-03-26T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/matplotlib-seaborn-day7</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/matplotlib-seaborn-day7/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>Today marks a pivotal shift in how we write code. Up until now, you’ve been using <code class="language-plaintext highlighter-rouge">print()</code> to see what your programs do. Starting today, you’ll graduate to <strong>logging</strong> - the production-standard way that real engineers observe their code.</p>

<p>Here’s why this matters: Imagine you deploy a data analysis script to production, and it processes 1 million customer records at 3 AM. Something goes wrong. With <code class="language-plaintext highlighter-rouge">print()</code>, your output scrolls off the screen and you have no record of what happened. With <code class="language-plaintext highlighter-rouge">logging</code>, you have a timestamped, categorized, searchable log file that tells you exactly what went wrong, when, and what led up to it.</p>

<p>By the end of today, you’ll also have <strong>beautiful, publication-quality plots</strong> using Matplotlib and Seaborn. But logging comes first - it’s the more important production skill.</p>

<hr />

<h2 id="setup">Setup</h2>

<p>Before we start, install the packages for today:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span><span class="nv">matplotlib</span><span class="o">==</span>3.8.2 <span class="nv">seaborn</span><span class="o">==</span>0.13.1 python-json-logger<span class="o">==</span>2.0.7
</code></pre></div></div>

<p><strong>Why these versions?</strong> Pinned versions prevent surprise breaking changes. In real jobs, you’ll do this for every project.</p>

<hr />

<h2 id="part-1-python-logging---your-new-production-standard">Part 1: Python Logging - Your New Production Standard</h2>

<h3 id="11-the-problem-with-print">1.1 The Problem with <code class="language-plaintext highlighter-rouge">print()</code></h3>

<p>Let’s say you’re processing student test scores:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ❌ OLD WAY - DON'T DO THIS ANYMORE
</span><span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Starting analysis...</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Loaded 100 records</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">ERROR: Student 42 has no math score</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Finished!</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Problems:</strong></p>
<ol>
  <li><strong>No severity levels</strong> - All output looks the same. Can’t tell warnings from errors from normal info.</li>
  <li><strong>No timestamps</strong> - When exactly did that error happen? Was it 3 AM or 3 PM?</li>
  <li><strong>No filtering</strong> - In development, you want DEBUG messages. In production, they’re noise. Can’t easily toggle.</li>
  <li><strong>Console-only</strong> - Output disappears when the terminal closes. Can’t analyze later.</li>
  <li><strong>Not machine-readable</strong> - Can’t pipe to log aggregation tools (Datadog, Splunk, CloudWatch).</li>
</ol>

<h3 id="12-the-solution-logging-module">1.2 The Solution: <code class="language-plaintext highlighter-rouge">logging</code> Module</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ✅ NEW WAY - USE THIS ALWAYS
</span><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Starting analysis...</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Loaded 100 records</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Student 42 has no math score</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Finished!</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Benefits:</strong></p>
<ol>
  <li>✅ <strong>5 levels</strong> (DEBUG, INFO, WARNING, ERROR, CRITICAL) - you choose the severity</li>
  <li>✅ <strong>Automatic timestamps</strong> - 2026-03-30 14:32:05,123 on every message</li>
  <li>✅ <strong>Filterable</strong> - One config change hides DEBUG noise in production</li>
  <li>✅ <strong>Multi-destination</strong> - Send to console AND file simultaneously</li>
  <li>✅ <strong>Machine-readable</strong> - Convert to JSON for real log systems</li>
</ol>

<h3 id="13-the-5-log-levels-traffic-light-analogy">1.3 The 5 Log Levels (Traffic Light Analogy)</h3>

<p>Think of driving a car through a city. Your log level is like a traffic light system:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>🔵 DEBUG    = Blue light (take the side streets) = "Checking if row 542 has email field"
              Use: Development only. Too detailed for production.

🟢 INFO     = Green light (proceed normally)      = "Loaded 10,000 records successfully"
              Use: Normal operation milestones. What's going right.

🟡 WARNING  = Yellow light (caution ahead)        = "5 records missing phone number"
              Use: Unexpected but recoverable. Needs attention but won't crash.

🔴 ERROR    = Red light (stop for hazard)         = "Failed to connect to database, retrying..."
              Use: Operation failed, but program continues. Needs fix soon.

🔴🔴 CRITICAL = Red light + alarm (emergency!)    = "Out of disk space, cannot continue"
              Use: Show-stoppers. Program must stop immediately.
</code></pre></div></div>

<p>In <strong>development</strong>, you show everything (DEBUG through CRITICAL).
In <strong>production</strong>, you show only INFO and above (hide DEBUG noise).</p>

<h3 id="14-basic-setup-5-lines-of-code">1.4 Basic Setup: 5 Lines of Code</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span>
    <span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">,</span>
    <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
<span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="c1"># Now use it:
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Analysis started</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sh">"</span><span class="s">5 rows have missing data</span><span class="sh">"</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Failed to save plot</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>What does each part do?</strong></p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">level=logging.INFO</code>: Show INFO, WARNING, ERROR, CRITICAL. Hide DEBUG (too noisy).</li>
  <li><code class="language-plaintext highlighter-rouge">format='...'</code>: Template for every log message.
    <ul>
      <li><code class="language-plaintext highlighter-rouge">%(asctime)s</code> = timestamp (2026-03-30 14:32:05,123)</li>
      <li><code class="language-plaintext highlighter-rouge">%(name)s</code> = module name (e.g., “data_analysis”)</li>
      <li><code class="language-plaintext highlighter-rouge">%(levelname)s</code> = DEBUG/INFO/WARNING/ERROR/CRITICAL</li>
      <li><code class="language-plaintext highlighter-rouge">%(message)s</code> = your message</li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">logging.getLogger(__name__)</code>: Create a logger named after this file</li>
</ul>

<h3 id="15-using-the-logger">1.5 Using the Logger</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>  <span class="c1"># Show everything
</span><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sh">"</span><span class="s">Checking column </span><span class="sh">'</span><span class="s">email</span><span class="sh">'</span><span class="s"> in row 5</span><span class="sh">"</span><span class="p">)</span>           <span class="c1"># 🔵 Dev detail
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Successfully loaded 10,000 records</span><span class="sh">"</span><span class="p">)</span>          <span class="c1"># 🟢 Normal
</span><span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sh">"</span><span class="s">Student #42 missing math score, using 0</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># 🟡 Unexpected
</span><span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Failed to save plot: /output/ not writable</span><span class="sh">"</span><span class="p">)</span> <span class="c1"># 🔴 Problem
</span><span class="n">logger</span><span class="p">.</span><span class="nf">critical</span><span class="p">(</span><span class="sh">"</span><span class="s">Database offline, cannot continue</span><span class="sh">"</span><span class="p">)</span>       <span class="c1"># 🔴🔴 Fatal
</span></code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DEBUG    - Checking column 'email' in row 5
INFO     - Successfully loaded 10,000 records
WARNING  - Student #42 missing math score, using 0
ERROR    - Failed to save plot: /output/ not writable
CRITICAL - Database offline, cannot continue
</code></pre></div></div>

<h3 id="16-handlers-send-logs-to-multiple-places">1.6 Handlers: Send Logs to Multiple Places</h3>

<p>By default, <code class="language-plaintext highlighter-rouge">basicConfig()</code> only sends to <strong>console</strong>. What if you want <strong>file + console</strong>?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">logging.handlers</span> <span class="kn">import</span> <span class="n">RotatingFileHandler</span>

<span class="c1"># Create logger
</span><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>

<span class="c1"># HANDLER 1: Console (show only INFO and above)
</span><span class="n">console_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">StreamHandler</span><span class="p">()</span>
<span class="n">console_handler</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">)</span>
<span class="n">console_formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">Formatter</span><span class="p">(</span><span class="sh">'</span><span class="s">%(levelname)-8s | %(message)s</span><span class="sh">'</span><span class="p">)</span>
<span class="n">console_handler</span><span class="p">.</span><span class="nf">setFormatter</span><span class="p">(</span><span class="n">console_formatter</span><span class="p">)</span>

<span class="c1"># HANDLER 2: File (save everything including DEBUG)
</span><span class="n">file_handler</span> <span class="o">=</span> <span class="nc">RotatingFileHandler</span><span class="p">(</span>
    <span class="sh">'</span><span class="s">app.log</span><span class="sh">'</span><span class="p">,</span>
    <span class="n">maxBytes</span><span class="o">=</span><span class="mi">5_000_000</span><span class="p">,</span>  <span class="c1"># Rotate at 5 MB
</span>    <span class="n">backupCount</span><span class="o">=</span><span class="mi">3</span>         <span class="c1"># Keep 3 old files
</span><span class="p">)</span>
<span class="n">file_handler</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>
<span class="n">file_formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">Formatter</span><span class="p">(</span>
    <span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
<span class="p">)</span>
<span class="n">file_handler</span><span class="p">.</span><span class="nf">setFormatter</span><span class="p">(</span><span class="n">file_formatter</span><span class="p">)</span>

<span class="c1"># Attach both handlers
</span><span class="n">logger</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">console_handler</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>

<span class="c1"># Now log messages go BOTH places
</span><span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sh">"</span><span class="s">Detailed debugging info</span><span class="sh">"</span><span class="p">)</span>        <span class="c1"># Only in app.log
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Normal operation</span><span class="sh">"</span><span class="p">)</span>                <span class="c1"># Both console and app.log
</span><span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sh">"</span><span class="s">Something failed</span><span class="sh">"</span><span class="p">)</span>               <span class="c1"># Both console and app.log
</span></code></pre></div></div>

<p><strong>Result:</strong></p>

<p>Console sees:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INFO     | Loaded 10,000 records
WARNING  | Student #42 missing data
ERROR    | Failed to save plot
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">app.log</code> saves:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2026-03-30 14:32:05,123 - __main__ - DEBUG - Checking if field 'email' exists
2026-03-30 14:32:05,234 - __main__ - INFO - Loaded 10,000 records
2026-03-30 14:32:05,345 - __main__ - WARNING - Student #42 missing data
2026-03-30 14:32:05,456 - __main__ - ERROR - Failed to save plot
</code></pre></div></div>

<h3 id="17-log-rotation-keep-files-tidy">1.7 Log Rotation: Keep Files Tidy</h3>

<p>If your script runs for days, <code class="language-plaintext highlighter-rouge">app.log</code> grows huge. Use <code class="language-plaintext highlighter-rouge">RotatingFileHandler</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">logging.handlers</span> <span class="kn">import</span> <span class="n">RotatingFileHandler</span>

<span class="n">handler</span> <span class="o">=</span> <span class="nc">RotatingFileHandler</span><span class="p">(</span>
    <span class="sh">'</span><span class="s">app.log</span><span class="sh">'</span><span class="p">,</span>
    <span class="n">maxBytes</span><span class="o">=</span><span class="mi">5_000_000</span><span class="p">,</span>  <span class="c1"># 5 MB per file
</span>    <span class="n">backupCount</span><span class="o">=</span><span class="mi">5</span>        <span class="c1"># Keep 5 old files
</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>What happens:</strong></p>
<ul>
  <li>When <code class="language-plaintext highlighter-rouge">app.log</code> reaches 5 MB → rename to <code class="language-plaintext highlighter-rouge">app.log.1</code></li>
  <li>Old <code class="language-plaintext highlighter-rouge">app.log.1</code> → becomes <code class="language-plaintext highlighter-rouge">app.log.2</code></li>
  <li>Old <code class="language-plaintext highlighter-rouge">app.log.5</code> → deleted</li>
  <li>New <code class="language-plaintext highlighter-rouge">app.log</code> created</li>
</ul>

<p>You’ll never run out of disk space!</p>

<h3 id="18-logger-hierarchy-control-noisy-libraries">1.8 Logger Hierarchy: Control Noisy Libraries</h3>

<p>Libraries like <code class="language-plaintext highlighter-rouge">requests</code> (HTTP client) log a lot. You can silence them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logging</span><span class="p">.</span><span class="nf">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="c1"># Silence the requests library - only show WARNING and above
</span><span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="sh">"</span><span class="s">requests</span><span class="sh">"</span><span class="p">).</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">WARNING</span><span class="p">)</span>
<span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="sh">"</span><span class="s">urllib3</span><span class="sh">"</span><span class="p">).</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">WARNING</span><span class="p">)</span>

<span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sh">"</span><span class="s">My app</span><span class="sh">'</span><span class="s">s debug info</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># ✅ Shows
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">requests library debug info would go here</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># ❌ Hidden
</span></code></pre></div></div>

<p>Loggers form a <strong>hierarchy</strong> by name:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root
├── myapp (your main logger)
│   ├── myapp.database
│   └── myapp.analysis
└── requests (external library)
</code></pre></div></div>

<p>Change the parent, and all children follow.</p>

<h3 id="19-structured-json-logging-production-gold-standard">1.9 Structured JSON Logging (Production Gold Standard)</h3>

<p>Regular logs are text. JSON logs are <strong>machine-readable</strong>:</p>

<p><strong>Regular log:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2026-03-30 14:32:05,123 - analysis - ERROR - Missing value in row 542
</code></pre></div></div>

<p><strong>JSON log:</strong></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2026-03-30T14:32:05.123Z"</span><span class="p">,</span><span class="w"> </span><span class="nl">"logger"</span><span class="p">:</span><span class="w"> </span><span class="s2">"analysis"</span><span class="p">,</span><span class="w"> </span><span class="nl">"level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ERROR"</span><span class="p">,</span><span class="w"> </span><span class="nl">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Missing value in row 542"</span><span class="p">,</span><span class="w"> </span><span class="nl">"row_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">542</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>JSON logs go into <strong>Datadog, Splunk, CloudWatch</strong>. Machines parse them, alert you, create dashboards automatically.</p>

<p><strong>Setup (requires <code class="language-plaintext highlighter-rouge">pip install python-json-logger</code>):</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">pythonjsonlogger</span> <span class="kn">import</span> <span class="n">jsonlogger</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">()</span>
<span class="n">handler</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">FileHandler</span><span class="p">(</span><span class="sh">'</span><span class="s">app-json.log</span><span class="sh">'</span><span class="p">)</span>
<span class="n">formatter</span> <span class="o">=</span> <span class="n">jsonlogger</span><span class="p">.</span><span class="nc">JsonFormatter</span><span class="p">()</span>
<span class="n">handler</span><span class="p">.</span><span class="nf">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>

<span class="c1"># Log with extra context
</span><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Analysis complete</span><span class="sh">"</span><span class="p">,</span> <span class="n">extra</span><span class="o">=</span><span class="p">{</span>
    <span class="sh">"</span><span class="s">rows_processed</span><span class="sh">"</span><span class="p">:</span> <span class="mi">10000</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">duration_sec</span><span class="sh">"</span><span class="p">:</span> <span class="mf">45.2</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">source</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">database</span><span class="sh">"</span>
<span class="p">})</span>
</code></pre></div></div>

<p><strong>Output in <code class="language-plaintext highlighter-rouge">app-json.log</code>:</strong></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Analysis complete"</span><span class="p">,</span><span class="w"> </span><span class="nl">"rows_processed"</span><span class="p">:</span><span class="w"> </span><span class="mi">10000</span><span class="p">,</span><span class="w"> </span><span class="nl">"duration_sec"</span><span class="p">:</span><span class="w"> </span><span class="mf">45.2</span><span class="p">,</span><span class="w"> </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"database"</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Now tools like Datadog can parse this, create alerts (“if rows_processed &lt; 1000, alert”), and build dashboards.</p>

<hr />

<h2 id="part-2-matplotlib---create-publication-quality-plots">Part 2: Matplotlib - Create Publication-Quality Plots</h2>

<h3 id="21-object-oriented-api-vs-state-machine">2.1 Object-Oriented API vs State Machine</h3>

<p><strong>Don’t do this (state machine):</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">plot</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="sh">"</span><span class="s">My Plot</span><span class="sh">"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">show</span><span class="p">()</span>  <span class="c1"># ❌ Doesn't work in scripts/servers/notebooks reliably
</span></code></pre></div></div>

<p><strong>Do this (object-oriented):</strong></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">plot</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">y=x²</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">X</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Y</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Quadratic Function</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">legend</span><span class="p">()</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">plot.png</span><span class="sh">'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Why OO wins:</strong></p>
<ul>
  <li><strong>Explicit control</strong>: Every element is an object you control</li>
  <li><strong>Works everywhere</strong>: Scripts, servers, notebooks, Docker - anywhere</li>
  <li><strong>Subplots are easy</strong>: Just use <code class="language-plaintext highlighter-rouge">plt.subplots(2, 2)</code> for a 2×2 grid</li>
  <li><strong>No GUI needed</strong>: Can run on servers without a display</li>
</ul>

<h3 id="22-plot-types">2.2 Plot Types</h3>

<h4 id="line-plot-trends-over-time">Line Plot (trends over time)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">plot</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="sh">'</span><span class="s">o</span><span class="sh">'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">y=x²</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">X</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Y</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Quadratic Function</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">legend</span><span class="p">()</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">line.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="scatter-plot-correlation-between-two-variables">Scatter Plot (correlation between two variables)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">scatter</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">s</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">red</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Feature X</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Feature Y</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Relationship Between Features</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">scatter.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="bar-chart-comparison-across-categories">Bar Chart (comparison across categories)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Q1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Q2</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Q3</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Q4</span><span class="sh">'</span><span class="p">]</span>
<span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">24</span><span class="p">,</span> <span class="mi">36</span><span class="p">,</span> <span class="mi">18</span><span class="p">]</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">bar</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">steelblue</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Revenue ($K)</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Quarterly Revenue</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">bar.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="histogram-distribution-of-single-variable">Histogram (distribution of single variable)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">75</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">82</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">77</span><span class="p">,</span> <span class="mi">93</span><span class="p">]</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">green</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Score Distribution</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">hist.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="multiple-subplots-compare-multiple-plots">Multiple Subplots (compare multiple plots)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>

<span class="c1"># Top-left: line plot
</span><span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">].</span><span class="nf">plot</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Line Plot</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Top-right: scatter plot
</span><span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">].</span><span class="nf">scatter</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Scatter Plot</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Bottom-left: bar chart
</span><span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">].</span><span class="nf">bar</span><span class="p">([</span><span class="sh">'</span><span class="s">A</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">B</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">C</span><span class="sh">'</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Bar Chart</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Bottom-right: histogram
</span><span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">].</span><span class="nf">hist</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">bins</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">].</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Histogram</span><span class="sh">'</span><span class="p">)</span>

<span class="n">fig</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">subplots.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="23-styling">2.3 Styling</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="c1"># Option 1: Use a built-in style
</span><span class="n">plt</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="nf">use</span><span class="p">(</span><span class="sh">'</span><span class="s">seaborn-v0_8-darkgrid</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Option 2: Custom colors
</span><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">()</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">plot</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">#FF6B6B</span><span class="sh">'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>

<span class="c1"># Option 3: Color map (gradient)
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">50</span><span class="p">)</span>
<span class="n">colors</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">cm</span><span class="p">.</span><span class="nf">viridis</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
<span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">color</span> <span class="ow">in</span> <span class="nf">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">colors</span><span class="p">):</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">scatter</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="nf">sin</span><span class="p">(</span><span class="n">xi</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="n">color</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">50</span><span class="p">)</span>

<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">styled.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="part-3-seaborn---statistical-plots-made-easy">Part 3: Seaborn - Statistical Plots Made Easy</h2>

<p>Seaborn builds on Matplotlib. It makes <strong>statistical plots easier</strong> and <strong>prettier by default</strong>.</p>

<h3 id="31-setup">3.1 Setup</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="c1"># Load your data
</span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="sh">'</span><span class="s">students.csv</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Set theme once for all plots
</span><span class="n">sns</span><span class="p">.</span><span class="nf">set_theme</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">darkgrid</span><span class="sh">'</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="sh">'</span><span class="s">husl</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="32-distribution-plots">3.2 Distribution Plots</h3>

<h4 id="histogram-with-smooth-curve">Histogram with smooth curve</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="nf">histplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="sh">'</span><span class="s">score</span><span class="sh">'</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">kde</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Score Distribution with KDE</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">hist_kde.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="box-plot-shows-quartiles-outliers">Box plot (shows quartiles, outliers)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="nf">boxplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="sh">'</span><span class="s">grade</span><span class="sh">'</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="sh">'</span><span class="s">score</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Score by Grade</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">boxplot.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="violin-plot-like-box-plot-but-shows-full-distribution">Violin plot (like box plot but shows full distribution)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="nf">violinplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="sh">'</span><span class="s">grade</span><span class="sh">'</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="sh">'</span><span class="s">score</span><span class="sh">'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Score Distribution by Grade</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">violinplot.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="33-correlation-analysis">3.3 Correlation Analysis</h3>

<h4 id="heatmap">Heatmap</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="n">corr</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">history</span><span class="sh">'</span><span class="p">]].</span><span class="nf">corr</span><span class="p">()</span>
<span class="n">sns</span><span class="p">.</span><span class="nf">heatmap</span><span class="p">(</span><span class="n">corr</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="sh">'</span><span class="s">.2f</span><span class="sh">'</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="sh">'</span><span class="s">coolwarm</span><span class="sh">'</span><span class="p">,</span> <span class="n">square</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Subject Score Correlations</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">heatmap.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<p>The heatmap shows which subjects’ scores are related:</p>
<ul>
  <li><strong>1.0</strong> = perfect correlation (same subject)</li>
  <li><strong>0.8</strong> = strong correlation (students good at both subjects)</li>
  <li><strong>0.0</strong> = no correlation (independent)</li>
  <li><strong>-0.8</strong> = inverse correlation (good at one, bad at other)</li>
</ul>

<h4 id="pair-plot-all-scatter-plots-at-once">Pair plot (all scatter plots at once)</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Shows scatter plots for every pair of numeric columns
</span><span class="n">sns</span><span class="p">.</span><span class="nf">pairplot</span><span class="p">(</span><span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">history</span><span class="sh">'</span><span class="p">]],</span> <span class="n">hue</span><span class="o">=</span><span class="sh">'</span><span class="s">grade</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">pairplot.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="34-advanced-scatter-plot">3.4 Advanced Scatter Plot</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="nf">scatterplot</span><span class="p">(</span>
    <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">,</span>
    <span class="n">x</span><span class="o">=</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">,</span>
    <span class="n">y</span><span class="o">=</span><span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">,</span>
    <span class="n">hue</span><span class="o">=</span><span class="sh">'</span><span class="s">grade</span><span class="sh">'</span><span class="p">,</span>        <span class="c1"># Color by grade
</span>    <span class="n">size</span><span class="o">=</span><span class="sh">'</span><span class="s">attendance</span><span class="sh">'</span><span class="p">,</span>  <span class="c1"># Size by attendance
</span>    <span class="n">s</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span>
    <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span>
<span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Math vs English (colored by grade, sized by attendance)</span><span class="sh">'</span><span class="p">)</span>
<span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sh">'</span><span class="s">scatter_advanced.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="the-project-logging-data-analysis-dashboard">The Project: Logging Data Analysis Dashboard</h2>

<p>You’ll build a complete data analysis pipeline that:</p>
<ol>
  <li><strong>Logs every step</strong> (DEBUG, INFO, WARNING, ERROR)</li>
  <li><strong>Generates 4 plots</strong> (distribution, heatmap, bar chart, scatter)</li>
  <li><strong>Saves logs</strong> to both console AND file</li>
  <li><strong>Has type hints</strong> on every function</li>
  <li><strong>Zero <code class="language-plaintext highlighter-rouge">print()</code> calls</strong></li>
</ol>

<h3 id="file-1-configpy">File 1: <code class="language-plaintext highlighter-rouge">config.py</code></h3>

<p>Centralized logging configuration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">Logging configuration for the data analysis pipeline.</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="n">logging.handlers</span> <span class="kn">import</span> <span class="n">RotatingFileHandler</span>


<span class="k">def</span> <span class="nf">setup_logging</span><span class="p">(</span><span class="n">log_file</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="sh">"</span><span class="s">output/analysis.log</span><span class="sh">"</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">logging</span><span class="p">.</span><span class="n">Logger</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Configure logging to console and file.
    
    Args:
        log_file: Path to log file
        
    Returns:
        Configured logger instance
    </span><span class="sh">"""</span>
    <span class="c1"># Create output directory if it doesn't exist
</span>    <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">output</span><span class="sh">"</span><span class="p">).</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

    <span class="c1"># Create logger
</span>    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="sh">"</span><span class="s">data_analysis</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>

    <span class="c1"># Console handler (INFO and above only - clean output for user)
</span>    <span class="n">console_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">StreamHandler</span><span class="p">()</span>
    <span class="n">console_handler</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">)</span>
    <span class="n">console_formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">Formatter</span><span class="p">(</span><span class="sh">'</span><span class="s">%(levelname)-8s | %(message)s</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">console_handler</span><span class="p">.</span><span class="nf">setFormatter</span><span class="p">(</span><span class="n">console_formatter</span><span class="p">)</span>

    <span class="c1"># File handler (DEBUG and above - detailed logs for debugging)
</span>    <span class="n">file_handler</span> <span class="o">=</span> <span class="nc">RotatingFileHandler</span><span class="p">(</span>
        <span class="n">log_file</span><span class="p">,</span>
        <span class="n">maxBytes</span><span class="o">=</span><span class="mi">5_000_000</span><span class="p">,</span>  <span class="c1"># Rotate at 5 MB
</span>        <span class="n">backupCount</span><span class="o">=</span><span class="mi">3</span>         <span class="c1"># Keep 3 old files
</span>    <span class="p">)</span>
    <span class="n">file_handler</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">DEBUG</span><span class="p">)</span>
    <span class="n">file_formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nc">Formatter</span><span class="p">(</span>
        <span class="sh">'</span><span class="s">%(asctime)s - %(name)s - %(levelname)s - %(message)s</span><span class="sh">'</span>
    <span class="p">)</span>
    <span class="n">file_handler</span><span class="p">.</span><span class="nf">setFormatter</span><span class="p">(</span><span class="n">file_formatter</span><span class="p">)</span>

    <span class="c1"># Attach both handlers
</span>    <span class="n">logger</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">console_handler</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Logging system initialized</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">logger</span>
</code></pre></div></div>

<h3 id="file-2-data_generatorpy">File 2: <code class="language-plaintext highlighter-rouge">data_generator.py</code></h3>

<p>Generate sample data:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">Generate sample student score data.</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">csv</span>
<span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>


<span class="k">def</span> <span class="nf">generate_sample_data</span><span class="p">(</span><span class="n">filename</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="sh">"</span><span class="s">sample_data.csv</span><span class="sh">"</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create sample CSV with 10 students and 4 subject scores.
    
    Args:
        filename: Output CSV filename
    </span><span class="sh">"""</span>
    <span class="n">data</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">[</span><span class="sh">"</span><span class="s">student_id</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">math</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">english</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">science</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">history</span><span class="sh">"</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">91</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">82</span><span class="p">,</span> <span class="mi">79</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">89</span><span class="p">,</span> <span class="mi">94</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">75</span><span class="p">,</span> <span class="mi">72</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">75</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">95</span><span class="p">,</span> <span class="mi">91</span><span class="p">,</span> <span class="mi">97</span><span class="p">,</span> <span class="mi">96</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">82</span><span class="p">,</span> <span class="mi">86</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">87</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">89</span><span class="p">,</span> <span class="mi">91</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">77</span><span class="p">,</span> <span class="mi">79</span><span class="p">,</span> <span class="mi">76</span><span class="p">,</span> <span class="mi">81</span><span class="p">],</span>
        <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">93</span><span class="p">,</span> <span class="mi">94</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">95</span><span class="p">],</span>
    <span class="p">]</span>

    <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="sh">'</span><span class="s">w</span><span class="sh">'</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="sh">''</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nf">writer</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
        <span class="n">writer</span><span class="p">.</span><span class="nf">writerows</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Generated </span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="nf">generate_sample_data</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="file-3-analysispy-main-script">File 3: <code class="language-plaintext highlighter-rouge">analysis.py</code> (Main Script)</h3>

<p>Complete data analysis with logging:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="sh">"""</span><span class="s">Data analysis dashboard with logging.</span><span class="sh">"""</span>

<span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">csv</span>
<span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span><span class="p">,</span> <span class="n">Dict</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">config</span> <span class="kn">import</span> <span class="n">setup_logging</span>

<span class="c1"># Initialize logging
</span><span class="n">logger</span> <span class="o">=</span> <span class="nf">setup_logging</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">load_data</span><span class="p">(</span><span class="n">filepath</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]]:</span>
    <span class="sh">"""</span><span class="s">Load CSV data into a list of dictionaries.
    
    Args:
        filepath: Path to CSV file
        
    Returns:
        List of dictionaries with student data
        
    Raises:
        FileNotFoundError: If CSV file doesn</span><span class="sh">'</span><span class="s">t exist
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Loading data from </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nc">DictReader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
            <span class="n">data</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Successfully loaded </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="si">}</span><span class="s"> records</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">data</span>
    <span class="k">except</span> <span class="nb">FileNotFoundError</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">File not found: </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">raise</span>


<span class="k">def</span> <span class="nf">validate_data</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Check for missing values in data.
    
    Args:
        data: List of dictionaries with student data
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Validating data...</span><span class="sh">"</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">row</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">value</span> <span class="ow">or</span> <span class="n">value</span><span class="p">.</span><span class="nf">strip</span><span class="p">()</span> <span class="o">==</span> <span class="sh">''</span><span class="p">:</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Row </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">: Missing </span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Validation complete</span><span class="sh">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">create_output_dir</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create output directories for plots.</span><span class="sh">"""</span>
    <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">output/plots</span><span class="sh">"</span><span class="p">).</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">parents</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sh">"</span><span class="s">Created output/plots directory</span><span class="sh">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">plot_score_distribution</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create histogram of math scores.
    
    Args:
        data: List of dictionaries with student data
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Creating score distribution plot...</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="nf">int</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">])</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">hist</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">steelblue</span><span class="sh">'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="sh">'</span><span class="s">black</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Frequency</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Distribution of Math Scores</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>

    <span class="n">filepath</span> <span class="o">=</span> <span class="sh">"</span><span class="s">output/plots/score_distribution.png</span><span class="sh">"</span>
    <span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Saved: </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">plot_correlation_heatmap</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create correlation heatmap of all subjects.
    
    Args:
        data: List of dictionaries with student data
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Creating correlation heatmap...</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">subjects</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">history</span><span class="sh">'</span><span class="p">]</span>
    
    <span class="c1"># Build score matrix
</span>    <span class="n">scores_dict</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">subject</span> <span class="ow">in</span> <span class="n">subjects</span><span class="p">:</span>
        <span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="nf">int</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">subject</span><span class="p">])</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">scores_dict</span><span class="p">[</span><span class="n">subject</span><span class="p">]</span> <span class="o">=</span> <span class="n">scores</span>

    <span class="c1"># Calculate correlation
</span>    <span class="n">score_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">scores_dict</span><span class="p">[</span><span class="n">s</span><span class="p">]</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">subjects</span><span class="p">])</span>
    <span class="n">corr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">corrcoef</span><span class="p">(</span><span class="n">score_matrix</span><span class="p">)</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
    <span class="n">sns</span><span class="p">.</span><span class="nf">heatmap</span><span class="p">(</span><span class="n">corr</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="sh">'</span><span class="s">.2f</span><span class="sh">'</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="sh">'</span><span class="s">coolwarm</span><span class="sh">'</span><span class="p">,</span>
                <span class="n">xticklabels</span><span class="o">=</span><span class="n">subjects</span><span class="p">,</span> <span class="n">yticklabels</span><span class="o">=</span><span class="n">subjects</span><span class="p">,</span>
                <span class="n">square</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Subject Score Correlations</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">filepath</span> <span class="o">=</span> <span class="sh">"</span><span class="s">output/plots/correlation_heatmap.png</span><span class="sh">"</span>
    <span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Saved: </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">plot_subject_comparison</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create bar chart comparing average scores by subject.
    
    Args:
        data: List of dictionaries with student data
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Creating subject comparison plot...</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">subjects</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">history</span><span class="sh">'</span><span class="p">]</span>
    <span class="n">averages</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">for</span> <span class="n">subject</span> <span class="ow">in</span> <span class="n">subjects</span><span class="p">:</span>
        <span class="n">avg</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="nf">int</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">subject</span><span class="p">])</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="n">averages</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">avg</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Average </span><span class="si">{</span><span class="n">subject</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">avg</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
    <span class="n">bars</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="nf">bar</span><span class="p">(</span><span class="n">subjects</span><span class="p">,</span> <span class="n">averages</span><span class="p">,</span>
                  <span class="n">color</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">#FF6B6B</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">#4ECDC4</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">#45B7D1</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">#FFA07A</span><span class="sh">'</span><span class="p">])</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Average Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Average Score by Subject</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylim</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">])</span>

    <span class="c1"># Add value labels on bars
</span>    <span class="k">for</span> <span class="n">bar</span> <span class="ow">in</span> <span class="n">bars</span><span class="p">:</span>
        <span class="n">height</span> <span class="o">=</span> <span class="n">bar</span><span class="p">.</span><span class="nf">get_height</span><span class="p">()</span>
        <span class="n">ax</span><span class="p">.</span><span class="nf">text</span><span class="p">(</span><span class="n">bar</span><span class="p">.</span><span class="nf">get_x</span><span class="p">()</span> <span class="o">+</span> <span class="n">bar</span><span class="p">.</span><span class="nf">get_width</span><span class="p">()</span><span class="o">/</span><span class="mf">2.</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span>
                <span class="sa">f</span><span class="sh">'</span><span class="si">{</span><span class="n">height</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="sh">'</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="sh">'</span><span class="s">center</span><span class="sh">'</span><span class="p">,</span> <span class="n">va</span><span class="o">=</span><span class="sh">'</span><span class="s">bottom</span><span class="sh">'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>

    <span class="n">filepath</span> <span class="o">=</span> <span class="sh">"</span><span class="s">output/plots/subject_comparison.png</span><span class="sh">"</span>
    <span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Saved: </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">plot_student_scatter</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create scatter plot: Math vs English.
    
    Args:
        data: List of dictionaries with student data
    </span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">Creating student scatter plot...</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">math_scores</span> <span class="o">=</span> <span class="p">[</span><span class="nf">int</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">math</span><span class="sh">'</span><span class="p">])</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
    <span class="n">english_scores</span> <span class="o">=</span> <span class="p">[</span><span class="nf">int</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">english</span><span class="sh">'</span><span class="p">])</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">scatter</span><span class="p">(</span><span class="n">math_scores</span><span class="p">,</span> <span class="n">english_scores</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">purple</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_xlabel</span><span class="p">(</span><span class="sh">'</span><span class="s">Math Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_ylabel</span><span class="p">(</span><span class="sh">'</span><span class="s">English Score</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sh">'</span><span class="s">Math vs English Scores</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nf">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>

    <span class="n">filepath</span> <span class="o">=</span> <span class="sh">"</span><span class="s">output/plots/math_vs_english.png</span><span class="sh">"</span>
    <span class="n">fig</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">150</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="sh">'</span><span class="s">tight</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Saved: </span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Main analysis pipeline.</span><span class="sh">"""</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">STARTING DATA ANALYSIS DASHBOARD</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="c1"># Setup
</span>        <span class="nf">create_output_dir</span><span class="p">()</span>

        <span class="c1"># Load and validate
</span>        <span class="n">data</span> <span class="o">=</span> <span class="nf">load_data</span><span class="p">(</span><span class="sh">"</span><span class="s">sample_data.csv</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">validate_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

        <span class="c1"># Generate plots
</span>        <span class="nf">plot_score_distribution</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="nf">plot_correlation_heatmap</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="nf">plot_subject_comparison</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="nf">plot_student_scatter</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">ANALYSIS COMPLETE - All plots saved to output/plots/</span><span class="sh">"</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span><span class="p">)</span>

    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Pipeline failed: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">,</span> <span class="n">exc_info</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="k">raise</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="nf">main</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="running-the-project">Running the Project</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Step 1: Generate sample data</span>
python data_generator.py
<span class="c"># Output: Generated sample_data.csv</span>

<span class="c"># Step 2: Run analysis (creates plots + logs)</span>
python analysis.py

<span class="c"># Step 3: View results</span>
<span class="nb">ls</span> <span class="nt">-lh</span> output/plots/
<span class="nb">cat </span>output/analysis.log
</code></pre></div></div>

<h3 id="expected-console-output">Expected Console Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INFO     | Logging system initialized
INFO     | ============================================================
INFO     | STARTING DATA ANALYSIS DASHBOARD
INFO     | ============================================================
INFO     | Loading data from sample_data.csv
INFO     | Successfully loaded 10 records
INFO     | Validating data...
INFO     | Validation complete
INFO     | Creating score distribution plot...
INFO     | Saved: output/plots/score_distribution.png
INFO     | Creating correlation heatmap...
INFO     | Saved: output/plots/correlation_heatmap.png
INFO     | Creating subject comparison plot...
INFO     | Saved: output/plots/subject_comparison.png
INFO     | Creating student scatter plot...
INFO     | Saved: output/plots/math_vs_english.png
INFO     | ============================================================
INFO     | ANALYSIS COMPLETE - All plots saved to output/plots/
INFO     | ============================================================
</code></pre></div></div>

<h3 id="expected-log-file-output">Expected Log File Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2026-03-30 14:32:05,123 - data_analysis - INFO - Logging system initialized
2026-03-30 14:32:05,234 - data_analysis - INFO - ============================================================
2026-03-30 14:32:05,234 - data_analysis - INFO - STARTING DATA ANALYSIS DASHBOARD
2026-03-30 14:32:05,234 - data_analysis - INFO - ============================================================
2026-03-30 14:32:05,345 - data_analysis - INFO - Loading data from sample_data.csv
2026-03-30 14:32:05,456 - data_analysis - INFO - Successfully loaded 10 records
2026-03-30 14:32:05,567 - data_analysis - INFO - Validating data...
2026-03-30 14:32:05,678 - data_analysis - INFO - Validation complete
2026-03-30 14:32:05,789 - data_analysis - DEBUG - Average math: 87.50
2026-03-30 14:32:05,890 - data_analysis - DEBUG - Average english: 85.40
2026-03-30 14:32:06,001 - data_analysis - DEBUG - Average science: 88.60
2026-03-30 14:32:06,112 - data_analysis - DEBUG - Average history: 88.40
2026-03-30 14:32:06,223 - data_analysis - INFO - Created output/plots directory
2026-03-30 14:32:06,334 - data_analysis - INFO - Creating score distribution plot...
2026-03-30 14:32:06,445 - data_analysis - INFO - Saved: output/plots/score_distribution.png
...
</code></pre></div></div>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p><strong>Day 8: Linear Algebra &amp; Calculus</strong></p>
<ul>
  <li>NumPy arrays and operations</li>
  <li>Matrix multiplication, determinants, inverses</li>
  <li>Derivatives, gradients, chain rule</li>
  <li>Logging and type hints continue as standard practice</li>
</ul>]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="python" /><category term="matplotlib" /><category term="seaborn" /><category term="data-visualization" /><category term="logging" /><category term="ml-engineering" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 6 of 180 - Numpy &amp;amp; Pandas</title><link href="https://edwardpraveen.com/dl-llm-systems/numpy-pandas-day6/" rel="alternate" type="text/html" title="Day 6 of 180 - Numpy &amp;amp; Pandas" /><published>2026-03-25T00:00:00+05:30</published><updated>2026-03-25T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/numpy-pandas-day6</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/numpy-pandas-day6/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
</blockquote>

<h2 id="introduction-why-this-matters-for-ai">Introduction: Why This Matters for AI</h2>

<p>Before we can train any AI model, we need to feed it data. But real-world data is messy:</p>
<ul>
  <li>Customer records scattered across multiple files</li>
  <li>Duplicate entries and missing values</li>
  <li>Numbers in the wrong format</li>
  <li>Thousands (or millions) of rows to process</li>
</ul>

<p><strong>NumPy</strong> and <strong>Pandas</strong> are the tools that handle this. They let you:</p>
<ul>
  <li>Load data from CSV files</li>
  <li>Clean and transform it</li>
  <li>Filter and group it</li>
  <li>Calculate statistics</li>
  <li>Merge datasets together</li>
</ul>

<p>Every AI project starts here. You can’t train a model without data, and you can’t work with data without NumPy and Pandas.</p>

<hr />

<h2 id="setup">Setup</h2>

<p>Install the required libraries:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span><span class="nv">numpy</span><span class="o">==</span>1.26.3 <span class="nv">pandas</span><span class="o">==</span>2.1.3
</code></pre></div></div>

<p>Verify installation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
 
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">NumPy version: </span><span class="si">{</span><span class="n">np</span><span class="p">.</span><span class="n">__version__</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Pandas version: </span><span class="si">{</span><span class="n">pd</span><span class="p">.</span><span class="n">__version__</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="part-1-numpy---fast-arrays">Part 1: NumPy - Fast Arrays</h2>

<h3 id="the-problem-speed">The Problem: Speed</h3>

<p>Think about a simple task: you have a list of 1 million student scores, and you need to multiply each one by 1.1 (like adding a 10% bonus).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Python list way (SLOW)
</span><span class="n">scores</span> <span class="o">=</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="p">...]</span>  <span class="c1"># 1 million items
</span><span class="n">bonused</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">score</span> <span class="ow">in</span> <span class="n">scores</span><span class="p">:</span>
    <span class="n">bonused</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">score</span> <span class="o">*</span> <span class="mf">1.1</span><span class="p">)</span>  <span class="c1"># loops 1 million times
</span></code></pre></div></div>

<p>This takes time because Python has to:</p>
<ol>
  <li>Start a loop</li>
  <li>Access each item individually</li>
  <li>Multiply it</li>
  <li>Append to a new list</li>
  <li>Repeat 1 million times</li>
</ol>

<p><strong>NumPy solves this:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
 
<span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="p">...])</span>  <span class="c1"># 1 million items
</span><span class="n">bonused</span> <span class="o">=</span> <span class="n">scores</span> <span class="o">*</span> <span class="mf">1.1</span>  <span class="c1"># INSTANT - happens all at once!
</span></code></pre></div></div>

<p>NumPy does the multiplication on the entire array at the CPU level, not in Python. This is 50-100x faster.</p>

<h3 id="what-is-an-array">What is an Array?</h3>

<p>An array is like a spreadsheet column (or grid) of numbers:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
 
<span class="c1"># 1D array (like a column)
</span><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">])</span>
<span class="c1"># [85 92 78 88]
</span> 
<span class="c1"># 2D array (like a spreadsheet)
</span><span class="n">grades</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span>
    <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">92</span><span class="p">],</span>  <span class="c1"># Alice's math, science, english
</span>    <span class="p">[</span><span class="mi">92</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span>  <span class="c1"># Bob's scores
</span>    <span class="p">[</span><span class="mi">78</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">85</span><span class="p">]</span>   <span class="c1"># Charlie's scores
</span><span class="p">])</span>
<span class="c1"># [[85 88 92]
#  [92 85 88]
#  [78 92 85]]
</span></code></pre></div></div>

<h3 id="creating-arrays">Creating Arrays</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
 
<span class="c1"># From a Python list
</span><span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="c1"># [1 2 3 4 5]
</span> 
<span class="c1"># Zeros (useful for initializing)
</span><span class="n">zeros</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="c1"># [0. 0. 0. 0. 0.]
</span> 
<span class="c1"># Ones
</span><span class="n">ones</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">ones</span><span class="p">((</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
<span class="c1"># [[1. 1.]
#  [1. 1.]
#  [1. 1.]]
</span> 
<span class="c1"># Range (like Python's range(), but returns an array)
</span><span class="n">range_arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># [0 2 4 6 8]
</span> 
<span class="c1"># Evenly spaced numbers (give me 5 numbers between 0 and 1)
</span><span class="n">linspace</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="c1"># [0.   0.25 0.5  0.75 1.  ]
</span> 
<span class="c1"># Random numbers (for testing)
</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>  <span class="c1"># reproducible randomness
</span><span class="n">random_arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">randn</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1"># [[ 0.49671415 -0.1382643   0.64589411]
#  [-0.23415337 -0.23413696  1.57921282]
#  [ 0.76743473 -0.46947439  0.54256004]]
</span></code></pre></div></div>

<h3 id="array-properties">Array Properties</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]])</span>
 
<span class="c1"># How many rows and columns?
</span><span class="n">arr</span><span class="p">.</span><span class="n">shape</span>  <span class="c1"># (2, 3) - 2 rows, 3 columns
</span> 
<span class="c1"># What type of data?
</span><span class="n">arr</span><span class="p">.</span><span class="n">dtype</span>  <span class="c1"># dtype('int64') - integer, 64-bit
</span> 
<span class="c1"># How many dimensions?
</span><span class="n">arr</span><span class="p">.</span><span class="n">ndim</span>   <span class="c1"># 2 - it's a table (2D)
</span> 
<span class="c1"># Total number of elements?
</span><span class="n">arr</span><span class="p">.</span><span class="n">size</span>   <span class="c1"># 6
</span> 
<span class="c1"># Flip rows and columns
</span><span class="n">arr</span><span class="p">.</span><span class="n">T</span>
<span class="c1"># [[1 4]
#  [2 5]
#  [3 6]]
</span></code></pre></div></div>

<h3 id="indexing-getting-specific-elements">Indexing: Getting Specific Elements</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">90</span><span class="p">])</span>
 
<span class="c1"># Get the first score
</span><span class="n">scores</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>  <span class="c1"># 85
</span> 
<span class="c1"># Get the last score
</span><span class="n">scores</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>  <span class="c1"># 90
</span> 
<span class="c1"># Get scores from position 1 to 3 (not including 4)
</span><span class="n">scores</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">4</span><span class="p">]</span>  <span class="c1"># [92, 78, 88]
</span> 
<span class="c1"># Get every other score
</span><span class="n">scores</span><span class="p">[::</span><span class="mi">2</span><span class="p">]</span>  <span class="c1"># [85, 78, 90]
</span> 
<span class="c1"># For 2D arrays
</span><span class="n">grades</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">92</span><span class="p">],</span> <span class="p">[</span><span class="mi">92</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">88</span><span class="p">]])</span>
 
<span class="c1"># Get Alice's science score (row 0, column 1)
</span><span class="n">grades</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>  <span class="c1"># 88
</span> 
<span class="c1"># Get Charlie's entire row
</span><span class="n">grades</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span>  <span class="c1"># [92, 85, 88]
</span> 
<span class="c1"># Get all math scores (entire column 0)
</span><span class="n">grades</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>  <span class="c1"># [85, 92]
</span></code></pre></div></div>

<h3 id="boolean-indexing-filtering">Boolean Indexing: Filtering</h3>

<p>This is incredibly useful. Get only the values that match a condition:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">92</span><span class="p">])</span>
 
<span class="c1"># Which scores are 85 or higher?
</span><span class="n">high_scores</span> <span class="o">=</span> <span class="n">scores</span><span class="p">[</span><span class="n">scores</span> <span class="o">&gt;=</span> <span class="mi">85</span><span class="p">]</span>
<span class="c1"># [85, 92, 88, 92]
</span> 
<span class="c1"># This creates a True/False array first:
</span><span class="n">scores</span> <span class="o">&gt;=</span> <span class="mi">85</span>
<span class="c1"># [True, True, False, True, True]
</span> 
<span class="c1"># Then keeps only the True ones
</span></code></pre></div></div>

<h3 id="broadcasting-the-magic-trick">Broadcasting: The Magic Trick</h3>

<p>Broadcasting is NumPy’s ability to make arrays of different sizes work together. Think of it like stretching a small image to fit a big frame:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Simple example: adding 10 to every score
</span><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">80</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">90</span><span class="p">])</span>
<span class="n">scores</span> <span class="o">+</span> <span class="mi">10</span>
<span class="c1"># [90, 95, 100]
</span> 
<span class="c1"># More complex: adding the same bonus to each subject
</span><span class="n">math_scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">])</span>
<span class="n">science_scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">88</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">])</span>
<span class="n">english_scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">92</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">85</span><span class="p">])</span>
 
<span class="c1"># Stack them into a table
</span><span class="n">grades</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">math_scores</span><span class="p">,</span> <span class="n">science_scores</span><span class="p">,</span> <span class="n">english_scores</span><span class="p">])</span>
<span class="c1"># [[85 92 78]
#  [88 85 92]
#  [92 88 85]]
</span> 
<span class="c1"># Add 5 points bonus to all grades
</span><span class="n">boosted</span> <span class="o">=</span> <span class="n">grades</span> <span class="o">+</span> <span class="mi">5</span>
<span class="c1"># [[90 97 83]
#  [93 90 97]
#  [97 93 90]]
</span> 
<span class="c1"># The 5 gets "broadcast" to all positions!
</span></code></pre></div></div>

<h3 id="vectorized-operations-the-real-power">Vectorized Operations: The Real Power</h3>

<p>Instead of loops, use NumPy functions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">temps</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mf">72.5</span><span class="p">,</span> <span class="mf">73.1</span><span class="p">,</span> <span class="mf">71.8</span><span class="p">,</span> <span class="mf">74.2</span><span class="p">,</span> <span class="mf">72.9</span><span class="p">])</span>
 
<span class="c1"># Average temperature
</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>  <span class="c1"># 72.9
</span> 
<span class="c1"># Standard deviation (how spread out are they?)
</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>   <span class="c1"># ~0.94
</span> 
<span class="c1"># Total temperature (if we added them all up)
</span><span class="n">np</span><span class="p">.</span><span class="nf">sum</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>   <span class="c1"># 364.5
</span> 
<span class="c1"># Highest temperature
</span><span class="n">np</span><span class="p">.</span><span class="nf">max</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>   <span class="c1"># 74.2
</span> 
<span class="c1"># Position of highest (0-indexed)
</span><span class="n">np</span><span class="p">.</span><span class="nf">argmax</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>  <span class="c1"># 3
</span> 
<span class="c1"># Lowest temperature
</span><span class="n">np</span><span class="p">.</span><span class="nf">min</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>   <span class="c1"># 71.8
</span> 
<span class="c1"># Distance from mean (how far is each from average?)
</span><span class="n">deviations</span> <span class="o">=</span> <span class="n">temps</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">temps</span><span class="p">)</span>
<span class="c1"># [ -0.4  0.2  -1.1  1.3  0. ]
</span> 
<span class="c1"># Absolute values (ignore + or -)
</span><span class="n">abs_deviations</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">abs</span><span class="p">(</span><span class="n">deviations</span><span class="p">)</span>
<span class="c1"># [0.4, 0.2, 1.1, 1.3, 0.]
</span> 
<span class="c1"># Square each one (for variance calculation)
</span><span class="n">squared</span> <span class="o">=</span> <span class="n">temps</span> <span class="o">**</span> <span class="mi">2</span>
<span class="c1"># [5256.25, 5343.21, 5155.24, 5505.84, 5314.41]
</span> 
<span class="c1"># Square root
</span><span class="n">square_root</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">squared</span><span class="p">)</span>
 
<span class="c1"># Dot product (multiply matching elements and sum)
</span><span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
<span class="n">np</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>  <span class="c1"># 1*4 + 2*5 + 3*6 = 32
</span></code></pre></div></div>

<h3 id="reshaping-changing-shape-without-losing-data">Reshaping: Changing Shape Without Losing Data</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">arange</span><span class="p">(</span><span class="mi">12</span><span class="p">)</span>  <span class="c1"># [0, 1, 2, ..., 11]
</span> 
<span class="c1"># Reshape to a 3x4 grid
</span><span class="n">grid</span> <span class="o">=</span> <span class="n">arr</span><span class="p">.</span><span class="nf">reshape</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="c1"># [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
</span> 
<span class="c1"># Flatten back to 1D
</span><span class="n">flat</span> <span class="o">=</span> <span class="n">grid</span><span class="p">.</span><span class="nf">flatten</span><span class="p">()</span>
<span class="c1"># [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
</span> 
<span class="c1"># Transpose (swap rows and columns)
</span><span class="n">transposed</span> <span class="o">=</span> <span class="n">grid</span><span class="p">.</span><span class="n">T</span>
<span class="c1"># [[ 0  4  8]
#  [ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]]
</span></code></pre></div></div>

<h3 id="stacking-combining-arrays">Stacking: Combining Arrays</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Stack horizontally (side by side)
</span><span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">]])</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([[</span><span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">]])</span>
 
<span class="n">np</span><span class="p">.</span><span class="nf">hstack</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">])</span>
<span class="c1"># [[1 3]
#  [2 4]]
</span> 
<span class="c1"># Stack vertically (on top)
</span><span class="n">np</span><span class="p">.</span><span class="nf">vstack</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">])</span>
<span class="c1"># [[1]
#  [2]
#  [3]
#  [4]]
</span> 
<span class="c1"># Concatenate along an axis
</span><span class="n">np</span><span class="p">.</span><span class="nf">concatenate</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># vertical
</span><span class="n">np</span><span class="p">.</span><span class="nf">concatenate</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># horizontal
</span></code></pre></div></div>

<h3 id="useful-utility-functions">Useful Utility Functions</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">92</span><span class="p">])</span>
 
<span class="c1"># Clip: constrain values to a range
# (if someone scores above 95, cap it at 95)
</span><span class="n">clipped</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">clip</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">95</span><span class="p">)</span>
<span class="c1"># [85, 92, 78, 88, 92]
</span> 
<span class="c1"># Where: conditional operation
# (if score &lt; 80, add 10; otherwise keep it)
</span><span class="n">adjusted</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="n">scores</span> <span class="o">&lt;</span> <span class="mi">80</span><span class="p">,</span> <span class="n">scores</span> <span class="o">+</span> <span class="mi">10</span><span class="p">,</span> <span class="n">scores</span><span class="p">)</span>
<span class="c1"># [85, 92, 88, 88, 92]
</span> 
<span class="c1"># Unique: find all unique values
</span><span class="n">np</span><span class="p">.</span><span class="nf">unique</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
<span class="c1"># [78, 85, 88, 92]
</span> 
<span class="c1"># Linear algebra basics
</span><span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]])</span>
 
<span class="c1"># Determinant (single number describing the matrix)
</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="nf">det</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>  <span class="c1"># -2.0
</span> 
<span class="c1"># Inverse (the "opposite" of a matrix)
</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="nf">inv</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
<span class="c1"># [[-2.   1. ]
#  [ 1.5 -0.5]]
</span> 
<span class="c1"># Norm (length/magnitude)
</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="nf">norm</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>  <span class="c1"># 5.477
</span></code></pre></div></div>

<hr />

<h2 id="part-2-pandas---working-with-real-data">Part 2: Pandas - Working with Real Data</h2>

<h3 id="from-arrays-to-tables">From Arrays to Tables</h3>

<p>NumPy is great for math. But real data has <strong>column names</strong> and <strong>row labels</strong>. That’s where Pandas comes in.</p>

<p>A Pandas DataFrame is like an Excel spreadsheet:</p>
<ul>
  <li>Rows are observations (students)</li>
  <li>Columns are properties (name, math score, science score)</li>
  <li>Each cell has a value</li>
</ul>

<h3 id="series-a-single-column">Series: A Single Column</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
 
<span class="c1"># Create a Series (like a column)
</span><span class="n">scores</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">Series</span><span class="p">([</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">Math Scores</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># 0    85
# 1    92
# 2    78
# 3    88
# Name: Math Scores, dtype: int64
</span> 
<span class="c1"># Access by position
</span><span class="n">scores</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>  <span class="c1"># 85
</span> 
<span class="c1"># With custom index (row labels)
</span><span class="n">scores</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">Series</span><span class="p">(</span>
    <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span>
    <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Bob</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Charlie</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Diana</span><span class="sh">'</span><span class="p">],</span>
    <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">Math Scores</span><span class="sh">'</span>
<span class="p">)</span>
<span class="c1"># Alice        85
# Bob          92
# Charlie      78
# Diana        88
</span> 
<span class="c1"># Access by label
</span><span class="n">scores</span><span class="p">[</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">]</span>  <span class="c1"># 85
</span> 
<span class="c1"># All operations work like NumPy
</span><span class="n">scores</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>     <span class="c1"># 85.75
</span><span class="n">scores</span><span class="p">.</span><span class="nf">max</span><span class="p">()</span>      <span class="c1"># 92
</span><span class="n">scores</span><span class="p">.</span><span class="nf">min</span><span class="p">()</span>      <span class="c1"># 78
</span><span class="n">scores</span><span class="p">.</span><span class="nf">std</span><span class="p">()</span>      <span class="c1"># ~6.23
</span> 
<span class="c1"># Filter (keep scores above 85)
</span><span class="n">scores</span><span class="p">[</span><span class="n">scores</span> <span class="o">&gt;</span> <span class="mi">85</span><span class="p">]</span>
<span class="c1"># Bob        92
# Diana      88
</span></code></pre></div></div>

<h3 id="dataframes-the-real-thing">DataFrames: The Real Thing</h3>

<p>A DataFrame is multiple Series stacked side by side:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
 
<span class="c1"># Create from a dictionary (most common)
</span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Bob</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Charlie</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Diana</span><span class="sh">'</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">11</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">,</span> <span class="mi">88</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">88</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">90</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">92</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">85</span><span class="p">,</span> <span class="mi">89</span><span class="p">]</span>
<span class="p">}</span>
 
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="c1">#      Name Grade Math Science English
# 0   Alice    10   85      88      92
# 1     Bob    10   92      85      88
# 2 Charlie    11   78      92      85
# 3   Diana    11   88      90      89
</span></code></pre></div></div>

<h3 id="exploring-your-data">Exploring Your Data</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># How big is it?
</span><span class="n">df</span><span class="p">.</span><span class="n">shape</span>  <span class="c1"># (4, 5) - 4 rows, 5 columns
</span> 
<span class="c1"># What are the column names?
</span><span class="n">df</span><span class="p">.</span><span class="n">columns</span>  <span class="c1"># ['Name', 'Grade', 'Math', 'Science', 'English']
</span> 
<span class="c1"># What are the data types?
</span><span class="n">df</span><span class="p">.</span><span class="n">dtypes</span>
<span class="c1"># Name       object (text)
# Grade       int64 (integer)
# Math        int64 (integer)
# Science     int64 (integer)
# English     int64 (integer)
</span> 
<span class="c1"># First few rows
</span><span class="n">df</span><span class="p">.</span><span class="nf">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="c1">#      Name Grade Math Science English
# 0   Alice    10   85      88      92
# 1     Bob    10   92      85      88
</span> 
<span class="c1"># Last few rows
</span><span class="n">df</span><span class="p">.</span><span class="nf">tail</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="c1">#      Name Grade Math Science English
# 3   Diana    11   88      90      89
</span> 
<span class="c1"># Summary statistics
</span><span class="n">df</span><span class="p">.</span><span class="nf">describe</span><span class="p">()</span>
<span class="c1">#        Grade  Math  Science  English
# count    4.0    4.0      4.0      4.0
# mean    10.5   85.75     89.0     89.0
# std      0.577  6.179     3.559    2.708
# min     10.0   78.0     85.0     85.0
# 25%     10.0   82.5     87.0     86.5
# 50%     10.5   86.5     89.0     89.0
# 75%     11.0   89.5     91.0     90.5
# max     11.0   92.0     92.0     92.0
</span> 
<span class="c1"># Detailed info
</span><span class="n">df</span><span class="p">.</span><span class="nf">info</span><span class="p">()</span>
<span class="c1"># &lt;class 'pandas.core.frame.DataFrame'&gt;
# RangeIndex: 4 entries, 0 to 3
# Data columns (but 4 non-null):
# Name       4 non-null object
# Grade      4 non-null int64
# Math       4 non-null int64
# Science    4 non-null int64
# English    4 non-null int64
</span></code></pre></div></div>

<h3 id="getting-data-columns-rows-cells">Getting Data: Columns, Rows, Cells</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get a single column (returns a Series)
</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">]</span>
<span class="c1"># 0      Alice
# 1        Bob
# 2    Charlie
# 3      Diana
</span> 
<span class="c1"># Get multiple columns (returns a DataFrame)
</span><span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">]]</span>
<span class="c1">#      Name Math
# 0   Alice   85
# 1     Bob   92
# 2 Charlie   78
# 3   Diana   88
</span> 
<span class="c1"># Get a row by position (iloc = integer location)
</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># Name              Alice
# Grade                10
# Math                 85
# Science              88
# English              92
</span> 
<span class="c1"># Get a specific cell by position
</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>  <span class="c1"># Alice
</span> 
<span class="c1"># Get by label (loc = location by label)
</span><span class="n">df</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">]</span>  <span class="c1"># Alice
</span> 
<span class="c1"># Get all rows where condition is true
</span><span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">85</span><span class="p">]</span>
<span class="c1">#      Name Grade Math Science English
# 1     Bob    10   92      85      88
# 3   Diana    11   88      90      89
</span> 
<span class="c1"># Multiple conditions
</span><span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">85</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">85</span><span class="p">)]</span>
<span class="c1">#      Name Grade Math Science English
# 1     Bob    10   92      85      88
# 3   Diana    11   88      90      89
</span></code></pre></div></div>

<h3 id="adding-columns">Adding Columns</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Calculate average score
</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Average</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">]</span> <span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">]</span> <span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">])</span> <span class="o">/</span> <span class="mi">3</span>
 
<span class="n">df</span>
<span class="c1">#      Name Grade Math Science English  Average
# 0   Alice    10   85      88      92    88.33
# 1     Bob    10   92      85      88    88.33
# 2 Charlie    11   78      92      85    85.00
# 3   Diana    11   88      90      89    89.00
</span> 
<span class="c1"># Add a column with apply (custom function)
</span><span class="k">def</span> <span class="nf">grade_letter</span><span class="p">(</span><span class="n">score</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">score</span> <span class="o">&gt;=</span> <span class="mi">90</span><span class="p">:</span> <span class="k">return</span> <span class="sh">'</span><span class="s">A</span><span class="sh">'</span>
    <span class="k">elif</span> <span class="n">score</span> <span class="o">&gt;=</span> <span class="mi">80</span><span class="p">:</span> <span class="k">return</span> <span class="sh">'</span><span class="s">B</span><span class="sh">'</span>
    <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="sh">'</span><span class="s">C</span><span class="sh">'</span>
 
<span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Average</span><span class="sh">'</span><span class="p">].</span><span class="nf">apply</span><span class="p">(</span><span class="n">grade_letter</span><span class="p">)</span>
 
<span class="n">df</span>
<span class="c1">#      Name Grade Math Science English  Average GradeLetter
# 0   Alice    10   85      88      92    88.33           B
# 1     Bob    10   92      85      88    88.33           B
# 2 Charlie    11   78      92      85    85.00           B
# 3   Diana    11   88      90      89    89.00           B
</span></code></pre></div></div>

<h3 id="removing-columns">Removing Columns</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Drop a column
</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="nf">drop</span><span class="p">(</span><span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
 
<span class="c1"># Keep only certain columns
</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]]</span>
</code></pre></div></div>

<h3 id="groupby-organize-then-calculate">GroupBy: Organize, Then Calculate</h3>

<p>GroupBy is like taking a stack of papers, sorting them by grade level, then analyzing each pile:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># What's the average math score per grade level?
</span><span class="n">df</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">)[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">].</span><span class="nf">mean</span><span class="p">()</span>
<span class="c1"># Grade
# 10    88.5
# 11    83.0
</span> 
<span class="c1"># Multiple subjects
</span><span class="n">df</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">)[[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]].</span><span class="nf">mean</span><span class="p">()</span>
<span class="c1">#       Math  Science  English
# Grade
# 10    88.5     86.5     90.0
# 11    83.0     91.0     87.0
</span> 
<span class="c1"># More aggregations
</span><span class="n">df</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">).</span><span class="nf">agg</span><span class="p">({</span>
    <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">count</span><span class="sh">'</span>  <span class="c1"># how many students per grade
</span><span class="p">})</span>
<span class="c1">#        Math            Science         Name
#       mean max min       mean max min count
# Grade
# 10     88.5  92  85       86.5  88  85     2
# 11     83.0  88  78       91.0  92  90     2
</span> 
<span class="c1"># Count occurrences
</span><span class="n">df</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">).</span><span class="nf">size</span><span class="p">()</span>
<span class="c1"># Grade
# 10    2
# 11    2
</span></code></pre></div></div>

<h3 id="merge-combining-two-dataframes">Merge: Combining Two DataFrames</h3>

<p>Imagine you have student scores in one file and student names in another. Merge combines them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># File 1: Scores
</span><span class="n">scores_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">({</span>
    <span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">78</span><span class="p">]</span>
<span class="p">})</span>
<span class="c1">#    StudentID Math
# 0          1   85
# 1          2   92
# 2          3   78
</span> 
<span class="c1"># File 2: Names
</span><span class="n">names_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">({</span>
    <span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Bob</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Charlie</span><span class="sh">'</span><span class="p">]</span>
<span class="p">})</span>
<span class="c1">#    StudentID     Name
# 0          1    Alice
# 1          2      Bob
# 2          3  Charlie
</span> 
<span class="c1"># INNER join (only keep rows in BOTH files)
</span><span class="n">merged</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">scores_df</span><span class="p">,</span> <span class="n">names_df</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">)</span>
<span class="c1">#    StudentID Math     Name
# 0          1   85    Alice
# 1          2   92      Bob
# 2          3   78  Charlie
</span> 
<span class="c1"># LEFT join (keep all from left, add matching from right)
</span><span class="n">pd</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">scores_df</span><span class="p">,</span> <span class="n">names_df</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="sh">'</span><span class="s">left</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># (same result if all IDs match)
</span> 
<span class="c1"># RIGHT join (keep all from right, add matching from left)
</span><span class="n">pd</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">scores_df</span><span class="p">,</span> <span class="n">names_df</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="sh">'</span><span class="s">right</span><span class="sh">'</span><span class="p">)</span>
 
<span class="c1"># OUTER join (keep all rows from both)
</span><span class="n">pd</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">scores_df</span><span class="p">,</span> <span class="n">names_df</span><span class="p">,</span> <span class="n">on</span><span class="o">=</span><span class="sh">'</span><span class="s">StudentID</span><span class="sh">'</span><span class="p">,</span> <span class="n">how</span><span class="o">=</span><span class="sh">'</span><span class="s">outer</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="apply-run-functions-on-your-data">Apply: Run Functions on Your Data</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">bonus_points</span><span class="p">(</span><span class="n">score</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Add 5% bonus to score.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">score</span> <span class="o">*</span> <span class="mf">1.05</span>
 
<span class="c1"># Apply to a column
</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math_Bonus</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">].</span><span class="nf">apply</span><span class="p">(</span><span class="n">bonus_points</span><span class="p">)</span>
 
<span class="c1"># With lambda (inline function)
</span><span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math2x</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">].</span><span class="nf">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span>
 
<span class="c1"># Apply to entire row
</span><span class="k">def</span> <span class="nf">calc_total</span><span class="p">(</span><span class="n">row</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Add up all three subjects.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">]</span> <span class="o">+</span> <span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">]</span> <span class="o">+</span> <span class="n">row</span><span class="p">[</span><span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]</span>
 
<span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Total</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="nf">apply</span><span class="p">(</span><span class="n">calc_total</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
 
<span class="n">df</span>
<span class="c1">#      Name Grade Math Science English  Math_Bonus  Math2x  Total
# 0   Alice    10   85      88      92       89.25     170    265
# 1     Bob    10   92      85      88       96.60     184    265
# 2 Charlie    11   78      92      85       81.90     156    255
# 3   Diana    11   88      90      89       92.40     176    267
</span></code></pre></div></div>

<h3 id="string-operations">String Operations</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">names</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">Series</span><span class="p">([</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Bob</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Charlie</span><span class="sh">'</span><span class="p">])</span>
 
<span class="c1"># Convert to lowercase
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">lower</span><span class="p">()</span>
<span class="c1"># 0      alice
# 1        bob
# 2    charlie
</span> 
<span class="c1"># Convert to uppercase
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">upper</span><span class="p">()</span>
<span class="c1"># 0      ALICE
# 1        BOB
# 2    CHARLIE
</span> 
<span class="c1"># Length of each string
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">len</span><span class="p">()</span>
<span class="c1"># 0    5
# 1    3
# 2    7
</span> 
<span class="c1"># Check if contains substring
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">contains</span><span class="p">(</span><span class="sh">'</span><span class="s">li</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># 0     True
# 1    False
# 2     True
</span> 
<span class="c1"># Replace characters
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">A</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># 0     Alice
# 1       Bob
# 2    ChArlie
</span> 
<span class="c1"># Split by character
</span><span class="n">names</span><span class="p">.</span><span class="nb">str</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">'</span><span class="s">i</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># 0    [Al, ce]
# 1       [Bob]
# 2    [Charl, e]
</span></code></pre></div></div>

<h3 id="datetime-operations">Datetime Operations</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dates</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">to_datetime</span><span class="p">([</span><span class="sh">'</span><span class="s">2024-01-15</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">2024-06-20</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">2024-12-25</span><span class="sh">'</span><span class="p">])</span>
<span class="c1"># DatetimeIndex(['2024-01-15', '2024-06-20', '2024-12-25'],
#                dtype='datetime64[ns]', freq=None)
</span> 
<span class="c1"># Extract parts
</span><span class="n">dates</span><span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="n">year</span>
<span class="c1"># 2024, 2024, 2024
</span> 
<span class="n">dates</span><span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="n">month</span>
<span class="c1"># 1, 6, 12
</span> 
<span class="n">dates</span><span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="n">day</span>
<span class="c1"># 15, 20, 25
</span> 
<span class="n">dates</span><span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="nf">day_name</span><span class="p">()</span>
<span class="c1"># 'Monday', 'Thursday', 'Monday'
</span> 
<span class="c1"># Calculate differences
</span><span class="p">(</span><span class="n">dates</span><span class="p">.</span><span class="nf">max</span><span class="p">()</span> <span class="o">-</span> <span class="n">dates</span><span class="p">.</span><span class="nf">min</span><span class="p">()).</span><span class="n">days</span>
<span class="c1"># 345 (days between first and last)
</span></code></pre></div></div>

<h3 id="handling-missing-data">Handling Missing Data</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create data with missing values
</span><span class="n">data_with_nan</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">Alice</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Bob</span><span class="sh">'</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="sh">'</span><span class="s">Diana</span><span class="sh">'</span><span class="p">],</span>
    <span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="mi">85</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">88</span><span class="p">]</span>
<span class="p">}</span>
<span class="n">df_nan</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nc">DataFrame</span><span class="p">(</span><span class="n">data_with_nan</span><span class="p">)</span>
<span class="c1">#      Name Score
# 0   Alice   85.0
# 1     Bob    NaN
# 2    None   92.0
# 3   Diana   88.0
</span> 
<span class="c1"># Check where data is missing
</span><span class="n">df_nan</span><span class="p">.</span><span class="nf">isna</span><span class="p">()</span>
<span class="c1">#      Name Score
# 0  False False
# 1  False  True
# 2   True False
# 3  False False
</span> 
<span class="c1"># Drop rows with any missing values
</span><span class="n">df_clean</span> <span class="o">=</span> <span class="n">df_nan</span><span class="p">.</span><span class="nf">dropna</span><span class="p">()</span>
<span class="c1">#      Name Score
# 0   Alice   85.0
# 3   Diana   88.0
</span> 
<span class="c1"># Drop rows where Score is missing
</span><span class="n">df_partial</span> <span class="o">=</span> <span class="n">df_nan</span><span class="p">.</span><span class="nf">dropna</span><span class="p">(</span><span class="n">subset</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">])</span>
<span class="c1">#      Name Score
# 0   Alice   85.0
# 2    None   92.0
# 3   Diana   88.0
</span> 
<span class="c1"># Fill missing with a value
</span><span class="n">df_filled</span> <span class="o">=</span> <span class="n">df_nan</span><span class="p">.</span><span class="nf">fillna</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="c1">#      Name Score
# 0   Alice   85.0
# 1     Bob    0.0
# 2       0   92.0
# 3   Diana   88.0
</span> 
<span class="c1"># Fill with average
</span><span class="n">df_filled</span> <span class="o">=</span> <span class="n">df_nan</span><span class="p">.</span><span class="nf">copy</span><span class="p">()</span>
<span class="n">df_filled</span><span class="p">[</span><span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">].</span><span class="nf">fillna</span><span class="p">(</span><span class="n">df_nan</span><span class="p">[</span><span class="sh">'</span><span class="s">Score</span><span class="sh">'</span><span class="p">].</span><span class="nf">mean</span><span class="p">(),</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1">#      Name Score
# 0   Alice   85.0
# 1     Bob   88.333333
# 2    None   92.0
# 3   Diana   88.0
</span></code></pre></div></div>

<h3 id="save--load-csv">Save &amp; Load CSV</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Save to CSV
</span><span class="n">df</span><span class="p">.</span><span class="nf">to_csv</span><span class="p">(</span><span class="sh">'</span><span class="s">students.csv</span><span class="sh">'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 
<span class="c1"># Load from CSV
</span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="sh">'</span><span class="s">students.csv</span><span class="sh">'</span><span class="p">)</span>
 
<span class="c1"># With options
</span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="sh">'</span><span class="s">students.csv</span><span class="sh">'</span><span class="p">,</span> 
                  <span class="n">delimiter</span><span class="o">=</span><span class="sh">'</span><span class="s">,</span><span class="sh">'</span><span class="p">,</span>      <span class="c1"># column separator
</span>                  <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>        <span class="c1"># use first column as row index
</span>                  <span class="n">dtype</span><span class="o">=</span><span class="p">{</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">:</span> <span class="nb">int</span><span class="p">})</span> <span class="c1"># specify data types
</span></code></pre></div></div>

<hr />

<h2 id="the-project-student-score-analysis-pipeline">The Project: Student Score Analysis Pipeline</h2>

<p>Now let’s build a complete, professional analysis pipeline with type hints on every function.</p>

<h3 id="step-1-create-sample-data">Step 1: Create Sample Data</h3>

<p>Create a file named <code class="language-plaintext highlighter-rouge">data.csv</code>:</p>

<pre><code class="language-csv">Name,Grade,Math,Science,English
Alice,10,85,88,92
Bob,10,92,85,88
Charlie,11,78,92,85
Diana,11,88,90,89
Eve,12,92,95,91
Frank,12,88,87,86
Grace,10,90,89,91
Henry,11,85,88,90
</code></pre>

<h3 id="step-2-build-the-analysis-pipeline">Step 2: Build the Analysis Pipeline</h3>

<p>Create a file named <code class="language-plaintext highlighter-rouge">analysis.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">load_data</span><span class="p">(</span><span class="n">filepath</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Load student data from CSV file.
    
    Args:
        filepath: Path to CSV file
        
    Returns:
        DataFrame with student data
    </span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">pd</span><span class="p">.</span><span class="nf">read_csv</span><span class="p">(</span><span class="n">filepath</span><span class="p">)</span>
 
 
<span class="k">def</span> <span class="nf">calculate_overall_score</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Add overall average score for each student.
    
    Uses NumPy vectorization for efficiency.
    
    Args:
        df: DataFrame with Math, Science, English columns
        
    Returns:
        DataFrame with new </span><span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="s"> column
    </span><span class="sh">"""</span>
    <span class="c1"># Get the score columns as a NumPy array
</span>    <span class="n">scores</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]].</span><span class="n">values</span>
    
    <span class="c1"># Calculate mean across each row (axis=1)
</span>    <span class="n">overall</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    
    <span class="c1"># Add as new column
</span>    <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">overall</span>
    
    <span class="k">return</span> <span class="n">df</span>
 
 
<span class="k">def</span> <span class="nf">assign_grades</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Assign letter grades based on overall score.
    
    Args:
        df: DataFrame with </span><span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="s"> column
        
    Returns:
        DataFrame with new </span><span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="s"> column
    </span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">grade_letter</span><span class="p">(</span><span class="n">score</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Convert numeric score to letter grade.</span><span class="sh">"""</span>
        <span class="k">if</span> <span class="n">score</span> <span class="o">&gt;=</span> <span class="mi">90</span><span class="p">:</span>
            <span class="k">return</span> <span class="sh">'</span><span class="s">A</span><span class="sh">'</span>
        <span class="k">elif</span> <span class="n">score</span> <span class="o">&gt;=</span> <span class="mi">80</span><span class="p">:</span>
            <span class="k">return</span> <span class="sh">'</span><span class="s">B</span><span class="sh">'</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="sh">'</span><span class="s">C</span><span class="sh">'</span>
    
    <span class="c1"># Apply the function to the Overall column
</span>    <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">].</span><span class="nf">apply</span><span class="p">(</span><span class="n">grade_letter</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">df</span>
 
 
<span class="k">def</span> <span class="nf">grade_level_analysis</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Analyze performance by grade level.
    
    Args:
        df: DataFrame with Grade column and scores
        
    Returns:
        DataFrame with statistics per grade level
    </span><span class="sh">"""</span>
    <span class="n">stats</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="nf">groupby</span><span class="p">(</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">).</span><span class="nf">agg</span><span class="p">({</span>
        <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">],</span>
        <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">],</span>
        <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">:</span> <span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">],</span>
        <span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">,</span>
        <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">count</span><span class="sh">'</span>  <span class="c1"># number of students
</span>    <span class="p">}).</span><span class="nf">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">stats</span>
 
 
<span class="k">def</span> <span class="nf">top_students</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">3</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Get top N students by overall score.
    
    Args:
        df: DataFrame with student data
        n: Number of top students to return
        
    Returns:
        DataFrame with top N students
    </span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">df</span><span class="p">.</span><span class="nf">nlargest</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">)[[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="p">]]</span>
 
 
<span class="k">def</span> <span class="nf">subject_stats</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">]]:</span>
    <span class="sh">"""</span><span class="s">Calculate statistics for each subject.
    
    Uses NumPy for efficient calculation.
    
    Args:
        df: DataFrame with Math, Science, English columns
        
    Returns:
        Dictionary with stats for each subject
    </span><span class="sh">"""</span>
    <span class="n">subjects</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]</span>
    <span class="n">stats</span> <span class="o">=</span> <span class="p">{}</span>
    
    <span class="k">for</span> <span class="n">subject</span> <span class="ow">in</span> <span class="n">subjects</span><span class="p">:</span>
        <span class="c1"># Convert to NumPy array for calculation
</span>        <span class="n">scores</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">subject</span><span class="p">].</span><span class="n">values</span>
        
        <span class="n">stats</span><span class="p">[</span><span class="n">subject</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
            <span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">mean</span><span class="p">(</span><span class="n">scores</span><span class="p">)),</span>
            <span class="sh">'</span><span class="s">std</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">std</span><span class="p">(</span><span class="n">scores</span><span class="p">)),</span>
            <span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">min</span><span class="p">(</span><span class="n">scores</span><span class="p">)),</span>
            <span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">:</span> <span class="nf">float</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">max</span><span class="p">(</span><span class="n">scores</span><span class="p">))</span>
        <span class="p">}</span>
    
    <span class="k">return</span> <span class="n">stats</span>
 
 
<span class="k">def</span> <span class="nf">correlations</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Calculate correlation matrix between subjects.
    
    Uses NumPy correlation.
    
    Args:
        df: DataFrame with Math, Science, English columns
        
    Returns:
        Correlation matrix (3x3 array)
    </span><span class="sh">"""</span>
    <span class="n">subject_data</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]].</span><span class="n">values</span>
    <span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="nf">corrcoef</span><span class="p">(</span><span class="n">subject_data</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>
 
 
<span class="k">def</span> <span class="nf">print_correlation_heatmap</span><span class="p">(</span><span class="n">corr_matrix</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Print correlation matrix as ASCII table.
    
    Args:
        corr_matrix: 3x3 correlation matrix
    </span><span class="sh">"""</span>
    <span class="n">subjects</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]</span>
    
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">CORRELATION MATRIX:</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">        Math  Science English</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">subj</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">subjects</span><span class="p">):</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">subj</span><span class="si">:</span><span class="mi">7</span><span class="n">s</span><span class="si">}</span><span class="s"> </span><span class="sh">"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">""</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">subjects</span><span class="p">)):</span>
            <span class="n">val</span> <span class="o">=</span> <span class="n">corr_matrix</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
            <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">val</span><span class="si">:</span><span class="mf">6.2</span><span class="n">f</span><span class="si">}</span><span class="s"> </span><span class="sh">"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">""</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">()</span>
 
 
<span class="k">def</span> <span class="nf">top_n_per_subject</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">n</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Return top N students for each subject.
    
    Args:
        df: DataFrame with student data
        n: Number of top students per subject
        
    Returns:
        Dictionary mapping subject to top N students
    </span><span class="sh">"""</span>
    <span class="n">subjects</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">]</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
    
    <span class="k">for</span> <span class="n">subject</span> <span class="ow">in</span> <span class="n">subjects</span><span class="p">:</span>
        <span class="n">top</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="nf">nlargest</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">subject</span><span class="p">)[[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="n">subject</span><span class="p">]]</span>
        <span class="n">results</span><span class="p">[</span><span class="n">subject</span><span class="p">]</span> <span class="o">=</span> <span class="n">top</span>
    
    <span class="k">return</span> <span class="n">results</span>
 
 
<span class="k">def</span> <span class="nf">pivot_by_grade</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create pivot table of average scores by grade and subject.
    
    Args:
        df: DataFrame with Grade and subject columns
        
    Returns:
        Pivot table with grades as rows, subjects as columns
    </span><span class="sh">"""</span>
    <span class="n">pivot</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="nf">pivot_table</span><span class="p">(</span>
        <span class="n">values</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">],</span>
        <span class="n">index</span><span class="o">=</span><span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">,</span>
        <span class="n">aggfunc</span><span class="o">=</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span>
    <span class="p">).</span><span class="nf">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="n">pivot</span>
 
 
<span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Run the complete analysis pipeline.</span><span class="sh">"""</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">STUDENT SCORE ANALYSIS PIPELINE</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    
    <span class="c1"># Step 1: Load and process data
</span>    <span class="n">df</span> <span class="o">=</span> <span class="nf">load_data</span><span class="p">(</span><span class="sh">'</span><span class="s">data.csv</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="nf">calculate_overall_score</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="nf">assign_grades</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    
    <span class="c1"># Step 2: Display all students with grades
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">1. ALL STUDENTS WITH GRADES:</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">df</span><span class="p">[[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Grade</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Math</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Science</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">English</span><span class="sh">'</span><span class="p">,</span> 
              <span class="sh">'</span><span class="s">Overall</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">GradeLetter</span><span class="sh">'</span><span class="p">]].</span><span class="nf">to_string</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
    
    <span class="c1"># Step 3: Grade level analysis
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">2. STATISTICS BY GRADE LEVEL:</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="nf">grade_level_analysis</span><span class="p">(</span><span class="n">df</span><span class="p">))</span>
    
    <span class="c1"># Step 4: Top 3 students
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">3. TOP 3 STUDENTS:</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="nf">top_students</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">3</span><span class="p">).</span><span class="nf">to_string</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
    
    <span class="c1"># Step 5: Subject statistics
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">4. SUBJECT STATISTICS:</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">stats</span> <span class="o">=</span> <span class="nf">subject_stats</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">subject</span><span class="p">,</span> <span class="n">vals</span> <span class="ow">in</span> <span class="n">stats</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="si">{</span><span class="n">subject</span><span class="si">}</span><span class="s">:</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  Mean: </span><span class="si">{</span><span class="n">vals</span><span class="p">[</span><span class="sh">'</span><span class="s">mean</span><span class="sh">'</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  Std:  </span><span class="si">{</span><span class="n">vals</span><span class="p">[</span><span class="sh">'</span><span class="s">std</span><span class="sh">'</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  Min:  </span><span class="si">{</span><span class="n">vals</span><span class="p">[</span><span class="sh">'</span><span class="s">min</span><span class="sh">'</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  Max:  </span><span class="si">{</span><span class="n">vals</span><span class="p">[</span><span class="sh">'</span><span class="s">max</span><span class="sh">'</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="c1"># Step 6: Correlations
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">5. SUBJECT CORRELATIONS:</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">corr_matrix</span> <span class="o">=</span> <span class="nf">correlations</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    <span class="nf">print_correlation_heatmap</span><span class="p">(</span><span class="n">corr_matrix</span><span class="p">)</span>
    
    <span class="c1"># Step 7: Top students per subject
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">6. TOP 2 STUDENTS PER SUBJECT:</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">top_per_subj</span> <span class="o">=</span> <span class="nf">top_n_per_subject</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">subject</span><span class="p">,</span> <span class="n">top_df</span> <span class="ow">in</span> <span class="n">top_per_subj</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="si">{</span><span class="n">subject</span><span class="si">}</span><span class="s">:</span><span class="sh">"</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="n">top_df</span><span class="p">.</span><span class="nf">to_string</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
    
    <span class="c1"># Step 8: Pivot table
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">7. AVERAGE SCORES BY GRADE LEVEL:</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="nf">pivot_by_grade</span><span class="p">(</span><span class="n">df</span><span class="p">))</span>
    
    <span class="c1"># Step 9: Save results
</span>    <span class="n">df</span><span class="p">.</span><span class="nf">to_csv</span><span class="p">(</span><span class="sh">'</span><span class="s">results.csv</span><span class="sh">'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">✓ Results saved to results.csv</span><span class="sh">"</span><span class="p">)</span>
 
 
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">'</span><span class="s">__main__</span><span class="sh">'</span><span class="p">:</span>
    <span class="nf">main</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="step-3-run-the-pipeline">Step 3: Run the Pipeline</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python analysis.py
</code></pre></div></div>

<h3 id="expected-output">Expected Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>======================================================================
STUDENT SCORE ANALYSIS PIPELINE
======================================================================
 
1. ALL STUDENTS WITH GRADES:
     Name Grade  Math Science English  Overall GradeLetter
    Alice    10    85       88      92    88.33           B
      Bob    10    92       85      88    88.33           B
  Charlie    11    78       92      85    85.00           B
    Diana    11    88       90      89    89.00           B
      Eve    12    92       95      91    92.67           A
    Frank    12    88       87      86    87.00           B
    Grace    10    90       89      91    90.00           A
    Henry    11    85       88      90    87.67           B
 
2. STATISTICS BY GRADE LEVEL:
       Math            Science         English       Overall Name
      mean max min      mean max min     mean max min   mean count
Grade
10     87.25  92  85    86.75  89  88   91.25  92  91  88.41    4
11     83.00  88  78    91.00  92  90   87.00  89  85  87.00    2
12     90.00  92  88    91.00  95  87   88.50  91  86  89.83    2
 
3. TOP 3 STUDENTS:
     Name Grade  Overall GradeLetter
      Eve    12    92.67           A
    Grace    10    90.00           A
    Diana    11    89.00           B
 
4. SUBJECT STATISTICS:
 
Math:
  Mean: 87.13
  Std:  5.32
  Min:  78.0
  Max:  92.0
 
Science:
  Mean: 89.25
  Std:  4.27
  Min:  85.0
  Max:  95.0
 
English:
  Mean: 89.00
  Std:  2.93
  Min:  85.0
  Max:  92.0
 
5. SUBJECT CORRELATIONS:
        Math  Science English
Math     1.00    0.35    0.52
Science  0.35    1.00   -0.13
English  0.52   -0.13    1.00
 
6. TOP 2 STUDENTS PER SUBJECT:
 
Math:
    Name Score
     Eve    92
     Bob    92
 
Science:
     Name Score
      Eve    95
  Charlie    92
 
English:
    Name Score
   Alice    92
    Eve    91
 
7. AVERAGE SCORES BY GRADE LEVEL:
         Math  Science  English
Grade
10     87.25    86.75    91.25
11     83.00    91.00    87.00
12     90.00    91.00    88.50
 
✓ Results saved to results.csv
</code></pre></div></div>
<hr />

<h2 id="whats-next-day-7">What’s Next: Day 7</h2>

<p><strong>Data Visualization: Matplotlib + Seaborn + Logging Module</strong></p>

<p>On Day 7, you’ll learn:</p>
<ol>
  <li><strong>Matplotlib</strong> - create line plots, bar charts, scatter plots</li>
  <li><strong>Seaborn</strong> - make beautiful statistical visualizations</li>
  <li><strong>Logging module</strong> - replace <code class="language-plaintext highlighter-rouge">print()</code> with professional logging</li>
</ol>]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="python" /><category term="numpy" /><category term="pandas" /><category term="data-science" /><category term="data-engineering" /><category term="ml-fundamentals" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 5 of 180 - Comprehensions, Generators, Decorators &amp;amp; Modern Features</title><link href="https://edwardpraveen.com/dl-llm-systems/generators-typing-decorators-day5/" rel="alternate" type="text/html" title="Day 5 of 180 - Comprehensions, Generators, Decorators &amp;amp; Modern Features" /><published>2026-03-24T00:00:00+05:30</published><updated>2026-03-24T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/generators-typing-decorators-day5</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/generators-typing-decorators-day5/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
</blockquote>

<h2 id="why-these-topics-matter-for-ai-engineering">Why These Topics Matter for AI Engineering</h2>

<p>When working with large datasets, every byte of memory and every millisecond of computation matters. Today’s topics are the tools that separate production-grade Python from tutorial code.</p>

<p><strong>Generators</strong> let you process gigabytes of data without loading it all into RAM. <strong>List comprehensions</strong> make data transformation readable and fast. <strong>Decorators</strong> let you add behavior (like caching or timing) without rewriting code. <strong>Type hints</strong> catch bugs before your code runs-critical when working in teams. <strong>Dataclasses</strong> eliminate boilerplate for data structures. <strong>pathlib</strong> handles file paths correctly on Windows, Mac, and Linux without painful string concatenation.</p>

<p>Together, these are the everyday tools of AI engineers processing datasets, building ML pipelines, and shipping production code.</p>

<hr />

<h2 id="setup">Setup</h2>

<p>You need only Python’s standard library for this lesson. Open a terminal and create a virtual environment:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python3 <span class="nt">-m</span> venv day5_env
<span class="nb">source </span>day5_env/bin/activate  <span class="c"># On Windows: day5_env\Scripts\activate</span>
python <span class="nt">--version</span>  <span class="c"># Should be 3.9 or higher (3.10+ recommended for newer type hint syntax)</span>
</code></pre></div></div>

<p>Create a working directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> day5_project/<span class="o">{</span>data_samples,analysis_output<span class="o">}</span>
<span class="nb">cd </span>day5_project
</code></pre></div></div>

<hr />

<h2 id="part-1-list-comprehensions---readable-data-transformation">Part 1: List Comprehensions - Readable Data Transformation</h2>

<h3 id="the-analogy">The Analogy</h3>
<p>Imagine a factory assembly line where items pass through a filter gate. Some are rejected, the rest are transformed. List comprehensions do exactly that in one readable line.</p>

<h3 id="basic-syntax">Basic Syntax</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">expression</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">iterable</span> <span class="k">if</span> <span class="n">condition</span><span class="p">]</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">if</code> is optional. Read it left to right: “Make a list of (expression) for each (item) in (iterable) if (condition).”</p>

<h3 id="example-1-simple-transformation">Example 1: Simple Transformation</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Without comprehension (verbose)
</span><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
<span class="n">squared</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">:</span>
    <span class="n">squared</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">squared</span><span class="p">)</span>  <span class="c1"># [1, 4, 9, 16, 25]
</span> 
<span class="c1"># With comprehension (readable)
</span><span class="n">squared</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">squared</span><span class="p">)</span>  <span class="c1"># [1, 4, 9, 16, 25]
</span></code></pre></div></div>

<h3 id="example-2-filtering">Example 2: Filtering</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]</span>
 
<span class="c1"># Filter: keep only even numbers
</span><span class="n">evens</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">numbers</span> <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">evens</span><span class="p">)</span>  <span class="c1"># [2, 4, 6, 8, 10]
</span> 
<span class="c1"># Filter and transform: keep even numbers, then square them
</span><span class="n">evens_squared</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">numbers</span> <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">evens_squared</span><span class="p">)</span>  <span class="c1"># [4, 16, 36, 64, 100]
</span></code></pre></div></div>

<h3 id="example-3-dictionary-comprehensions">Example 3: Dictionary Comprehensions</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create a dict mapping word to word length
</span><span class="n">words</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">python</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">engineering</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">data</span><span class="sh">"</span><span class="p">]</span>
<span class="n">word_lengths</span> <span class="o">=</span> <span class="p">{</span><span class="n">word</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">word</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">}</span>
<span class="nf">print</span><span class="p">(</span><span class="n">word_lengths</span><span class="p">)</span>  <span class="c1"># {'python': 6, 'engineering': 11, 'data': 4}
</span> 
<span class="c1"># Swap keys and values
</span><span class="n">original</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">a</span><span class="sh">"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="sh">"</span><span class="s">b</span><span class="sh">"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="sh">"</span><span class="s">c</span><span class="sh">"</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
<span class="n">swapped</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">original</span><span class="p">.</span><span class="nf">items</span><span class="p">()}</span>
<span class="nf">print</span><span class="p">(</span><span class="n">swapped</span><span class="p">)</span>  <span class="c1"># {1: 'a', 2: 'b', 3: 'c'}
</span></code></pre></div></div>

<h3 id="example-4-set-comprehensions">Example 4: Set Comprehensions</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
 
<span class="c1"># Remove duplicates and filter
</span><span class="n">unique_evens</span> <span class="o">=</span> <span class="p">{</span><span class="n">n</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">numbers</span> <span class="k">if</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">}</span>
<span class="nf">print</span><span class="p">(</span><span class="n">unique_evens</span><span class="p">)</span>  <span class="c1"># {2, 4}
</span></code></pre></div></div>

<h3 id="example-5-nested-comprehensions---when-they-help">Example 5: Nested Comprehensions - When They Help</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create a 3x3 matrix (list of lists)
</span><span class="n">matrix</span> <span class="o">=</span> <span class="p">[[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">+</span> <span class="n">j</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">matrix</span><span class="p">)</span>
<span class="c1"># [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
</span> 
<span class="c1"># Flatten a nested list
</span><span class="n">nested</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">],</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]]</span>
<span class="n">flattened</span> <span class="o">=</span> <span class="p">[</span><span class="n">item</span> <span class="k">for</span> <span class="n">sublist</span> <span class="ow">in</span> <span class="n">nested</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">sublist</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">flattened</span><span class="p">)</span>  <span class="c1"># [1, 2, 3, 4, 5, 6, 7, 8, 9]
</span></code></pre></div></div>

<h3 id="example-6-nested-comprehensions---when-to-avoid">Example 6: Nested Comprehensions - When to Avoid</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Too complex - hard to read. Use a regular loop or generator instead.
</span><span class="n">result</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">x</span> <span class="o">*</span> <span class="n">y</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="nf">if </span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span> <span class="o">&gt;</span> <span class="mi">20</span>
<span class="p">]</span>
 
<span class="c1"># Better: be explicit
</span><span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
        <span class="nf">if </span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span> <span class="o">&gt;</span> <span class="mi">20</span><span class="p">:</span>
            <span class="n">result</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="n">y</span><span class="p">)</span>
</code></pre></div></div>

<p><strong>Rule of thumb:</strong> If it takes more than one second to read, use a regular loop. Comprehensions should be <em>readable</em>.</p>

<h3 id="example-7-generator-expressions-lazy-evaluation">Example 7: Generator Expressions (Lazy Evaluation)</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># List comprehension - evaluates immediately, stores all in memory
</span><span class="n">lazy_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1_000_000</span><span class="p">)]</span>  <span class="c1"># Uses lots of memory
</span> 
<span class="c1"># Generator expression - evaluates on demand, one at a time
</span><span class="n">lazy_gen</span> <span class="o">=</span> <span class="p">(</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1_000_000</span><span class="p">))</span>  <span class="c1"># Uses almost no memory
</span> 
<span class="c1"># They look identical except () vs []
# But generators are lazy - they don't compute until you ask
</span> 
<span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">lazy_gen</span><span class="p">))</span>  <span class="c1"># 0 (first square)
</span><span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">lazy_gen</span><span class="p">))</span>  <span class="c1"># 1 (second square)
</span></code></pre></div></div>

<hr />

<h2 id="part-2-generators---processing-without-loading-everything">Part 2: Generators - Processing Without Loading Everything</h2>

<h3 id="the-analogy-1">The Analogy</h3>
<p>A vending machine gives you one item at a time when you press the button, instead of handing you the entire inventory.</p>

<h3 id="why-generators-matter">Why Generators Matter</h3>
<p>Processing a 100 GB dataset? Load it line by line with a generator instead of loading all 100 GB into RAM.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">sys</span>
 
<span class="c1"># List: stores all in memory at once
</span><span class="n">list_squares</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span> <span class="o">**</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1000</span><span class="p">)]</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">List memory: </span><span class="si">{</span><span class="n">sys</span><span class="p">.</span><span class="nf">getsizeof</span><span class="p">(</span><span class="n">list_squares</span><span class="p">)</span><span class="si">}</span><span class="s"> bytes</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># ~9000+ bytes
</span> 
<span class="c1"># Generator: computes one at a time
</span><span class="k">def</span> <span class="nf">gen_squares</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">i</span> <span class="o">**</span> <span class="mi">2</span>
 
<span class="n">gen_squares_obj</span> <span class="o">=</span> <span class="nf">gen_squares</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Generator memory: </span><span class="si">{</span><span class="n">sys</span><span class="p">.</span><span class="nf">getsizeof</span><span class="p">(</span><span class="n">gen_squares_obj</span><span class="p">)</span><span class="si">}</span><span class="s"> bytes</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># ~128 bytes
</span></code></pre></div></div>

<h3 id="example-1-simple-generator-with-yield">Example 1: Simple Generator with <code class="language-plaintext highlighter-rouge">yield</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">count_up</span><span class="p">(</span><span class="n">max_num</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Generator that yields numbers from 0 to max_num.</span><span class="sh">"""</span>
    <span class="n">current</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="n">current</span> <span class="o">&lt;</span> <span class="n">max_num</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">current</span>
        <span class="n">current</span> <span class="o">+=</span> <span class="mi">1</span>
 
<span class="c1"># Use it
</span><span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="nf">count_up</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
<span class="c1"># Output: 0 1 2 3 4
</span></code></pre></div></div>

<h3 id="example-2-reading-large-files-line-by-line">Example 2: Reading Large Files Line by Line</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">read_large_file</span><span class="p">(</span><span class="n">file_path</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Generator that reads file line by line without loading all into memory.</span><span class="sh">"""</span>
    <span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="sh">"</span><span class="s">r</span><span class="sh">"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
            <span class="k">yield</span> <span class="n">line</span><span class="p">.</span><span class="nf">strip</span><span class="p">()</span>
 
<span class="c1"># Use it (simulated with a string)
# In real use: for line in read_large_file("huge_file.txt"):
#     process(line)
</span></code></pre></div></div>

<h3 id="example-3-yield-from---delegating-to-another-generator">Example 3: <code class="language-plaintext highlighter-rouge">yield from</code> - Delegating to Another Generator</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">gen_a</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">First generator.</span><span class="sh">"""</span>
    <span class="k">yield</span> <span class="mi">1</span>
    <span class="k">yield</span> <span class="mi">2</span>
 
<span class="k">def</span> <span class="nf">gen_b</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Second generator.</span><span class="sh">"""</span>
    <span class="k">yield</span> <span class="mi">3</span>
    <span class="k">yield</span> <span class="mi">4</span>
 
<span class="k">def</span> <span class="nf">gen_combined</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Combine both generators.</span><span class="sh">"""</span>
    <span class="k">yield</span> <span class="k">from</span> <span class="nf">gen_a</span><span class="p">()</span>
    <span class="k">yield</span> <span class="k">from</span> <span class="nf">gen_b</span><span class="p">()</span>
 
<span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="nf">gen_combined</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="c1"># Output: 1 2 3 4
</span></code></pre></div></div>

<h3 id="example-4-manual-iteration-with-next">Example 4: Manual Iteration with <code class="language-plaintext highlighter-rouge">next()</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">countdown</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Countdown generator.</span><span class="sh">"""</span>
    <span class="k">while</span> <span class="n">n</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">n</span>
        <span class="n">n</span> <span class="o">-=</span> <span class="mi">1</span>
 
<span class="n">counter</span> <span class="o">=</span> <span class="nf">countdown</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 3
</span><span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 2
</span><span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 1
# print(next(counter))  # Would raise StopIteration
</span></code></pre></div></div>

<h3 id="example-5-infinite-generator">Example 5: Infinite Generator</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">infinite_counter</span><span class="p">(</span><span class="n">start</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Infinite generator - keeps going forever.</span><span class="sh">"""</span>
    <span class="n">current</span> <span class="o">=</span> <span class="n">start</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">current</span>
        <span class="n">current</span> <span class="o">+=</span> <span class="mi">1</span>
 
<span class="n">counter</span> <span class="o">=</span> <span class="nf">infinite_counter</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 10
</span><span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 11
</span><span class="nf">print</span><span class="p">(</span><span class="nf">next</span><span class="p">(</span><span class="n">counter</span><span class="p">))</span>  <span class="c1"># 12
# Keep calling next() - it never stops
</span></code></pre></div></div>

<h3 id="example-6-generator-with-conditional-yield">Example 6: Generator with Conditional Yield</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fibonacci</span><span class="p">(</span><span class="n">limit</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Generate Fibonacci numbers up to limit.</span><span class="sh">"""</span>
    <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">while</span> <span class="n">a</span> <span class="o">&lt;</span> <span class="n">limit</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">a</span>
        <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
 
<span class="k">for</span> <span class="n">fib</span> <span class="ow">in</span> <span class="nf">fibonacci</span><span class="p">(</span><span class="mi">100</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">fib</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">"</span><span class="s"> </span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output: 0 1 1 2 3 5 8 13 21 34 55 89
</span></code></pre></div></div>

<hr />

<h2 id="part-3-decorators---adding-behavior-without-rewriting-code">Part 3: Decorators - Adding Behavior Without Rewriting Code</h2>

<h3 id="the-analogy-2">The Analogy</h3>
<p>A gift wrapper takes your present and wraps it in paper and ribbon, then hands back the wrapped version. The present is unchanged, but it’s now decorated. Decorators do the same for functions.</p>

<h3 id="example-1-write-a-decorator-from-scratch">Example 1: Write a Decorator from Scratch</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">time</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">timer</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Decorator that measures how long a function takes.</span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s"> took </span><span class="si">{</span><span class="n">end</span> <span class="o">-</span> <span class="n">start</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="c1"># Use it
</span><span class="nd">@timer</span>
<span class="k">def</span> <span class="nf">slow_function</span><span class="p">():</span>
    <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">return</span> <span class="sh">"</span><span class="s">Done!</span><span class="sh">"</span>
 
<span class="n">result</span> <span class="o">=</span> <span class="nf">slow_function</span><span class="p">()</span>
<span class="c1"># Output: slow_function took 1.0001 seconds
# Done!
</span></code></pre></div></div>

<h3 id="problem-decorators-lose-function-metadata">Problem: Decorators Lose Function Metadata</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">simple_decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="nd">@simple_decorator</span>
<span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Say hello to someone.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Hello, </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">!</span><span class="sh">"</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">greet</span><span class="p">.</span><span class="n">__name__</span><span class="p">)</span>  <span class="c1"># 'wrapper' - WRONG! Should be 'greet'
</span><span class="nf">print</span><span class="p">(</span><span class="n">greet</span><span class="p">.</span><span class="n">__doc__</span><span class="p">)</span>   <span class="c1"># None - WRONG! Should be the docstring
</span></code></pre></div></div>

<h3 id="example-2-fix-with-functoolswraps">Example 2: Fix with <code class="language-plaintext highlighter-rouge">functools.wraps</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">functools</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">timer</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Decorator that measures function execution time.</span><span class="sh">"""</span>
    <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="kn">import</span> <span class="n">time</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s"> took </span><span class="si">{</span><span class="n">end</span> <span class="o">-</span> <span class="n">start</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="nd">@timer</span>
<span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Say hello to someone.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Hello, </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">!</span><span class="sh">"</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">greet</span><span class="p">.</span><span class="n">__name__</span><span class="p">)</span>  <span class="c1"># 'greet' - CORRECT
</span><span class="nf">print</span><span class="p">(</span><span class="n">greet</span><span class="p">.</span><span class="n">__doc__</span><span class="p">)</span>   <span class="c1"># 'Say hello to someone.' - CORRECT
</span></code></pre></div></div>

<h3 id="example-3-decorator-with-arguments-decorator-factory">Example 3: Decorator with Arguments (Decorator Factory)</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">functools</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">repeat</span><span class="p">(</span><span class="n">times</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Decorator factory that repeats a function N times.</span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
        <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">times</span><span class="p">):</span>
                <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">return</span> <span class="n">decorator</span>
 
<span class="nd">@repeat</span><span class="p">(</span><span class="n">times</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">say_hello</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello!</span><span class="sh">"</span><span class="p">)</span>
 
<span class="nf">say_hello</span><span class="p">()</span>
<span class="c1"># Output:
# Hello!
# Hello!
# Hello!
</span></code></pre></div></div>

<h3 id="example-4-stacking-decorators">Example 4: Stacking Decorators</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">functools</span>
<span class="kn">import</span> <span class="n">time</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">timer</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[TIMER] </span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">end</span> <span class="o">-</span> <span class="n">start</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">s</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="k">def</span> <span class="nf">log_call</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">[LOG] Calling </span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s"> with args=</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s">, kwargs=</span><span class="si">{</span><span class="n">kwargs</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="c1"># Stack decorators - applied bottom to top
</span><span class="nd">@timer</span>
<span class="nd">@log_call</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
    <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
 
<span class="n">result</span> <span class="o">=</span> <span class="nf">add</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1"># Output:
# [LOG] Calling add with args=(2, 3), kwargs={}
# [TIMER] add: 0.1005s
# 5
</span></code></pre></div></div>

<h3 id="example-5-caching-decorator-memoization">Example 5: Caching Decorator (Memoization)</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">functools</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="k">def</span> <span class="nf">cache</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Decorator that caches function results.</span><span class="sh">"""</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
 
    <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">args</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
            <span class="n">results</span><span class="p">[</span><span class="n">args</span><span class="p">]</span> <span class="o">=</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">results</span><span class="p">[</span><span class="n">args</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="nd">@cache</span>
<span class="k">def</span> <span class="nf">expensive_computation</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Simulates a slow calculation.</span><span class="sh">"""</span>
    <span class="kn">import</span> <span class="n">time</span>
    <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">n</span> <span class="o">**</span> <span class="mi">2</span>
 
<span class="nf">print</span><span class="p">(</span><span class="nf">expensive_computation</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>  <span class="c1"># Takes 1 second
</span><span class="nf">print</span><span class="p">(</span><span class="nf">expensive_computation</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>  <span class="c1"># Instant (from cache)
</span></code></pre></div></div>

<hr />

<h2 id="part-4-type-hints---your-first-production-standard">Part 4: Type Hints - Your First Production Standard</h2>

<h3 id="why-type-hints-matter">Why Type Hints Matter</h3>
<ul>
  <li><strong>Catch bugs early:</strong> IDEs warn you when you pass the wrong type</li>
  <li><strong>Better autocomplete:</strong> Your editor knows what methods are available</li>
  <li><strong>Self-documenting:</strong> Readers immediately know what types are expected</li>
  <li><strong>Easier refactoring:</strong> Change a type signature, find all broken calls</li>
</ul>

<p>Type hints are now a Day 5 production standard. From this point forward, <strong>every function must have type hints on parameters and return type</strong>.</p>

<h3 id="example-1-basic-types">Example 1: Basic Types</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Say hello to someone.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Hello, </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">!</span><span class="sh">"</span>
 
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Add two integers.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
 
<span class="k">def</span> <span class="nf">is_positive</span><span class="p">(</span><span class="n">num</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Check if a number is positive.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">num</span> <span class="o">&gt;</span> <span class="mi">0</span>
</code></pre></div></div>

<h3 id="example-2-collections">Example 2: Collections</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">first_three</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Return the first three numbers.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">numbers</span><span class="p">[:</span><span class="mi">3</span><span class="p">]</span>
 
<span class="k">def</span> <span class="nf">count_words</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">int</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Count occurrences of each word.</span><span class="sh">"""</span>
    <span class="n">words</span> <span class="o">=</span> <span class="n">text</span><span class="p">.</span><span class="nf">split</span><span class="p">()</span>
    <span class="k">return</span> <span class="p">{</span><span class="n">word</span><span class="p">:</span> <span class="n">words</span><span class="p">.</span><span class="nf">count</span><span class="p">(</span><span class="n">word</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="nf">set</span><span class="p">(</span><span class="n">words</span><span class="p">)}</span>
 
<span class="k">def</span> <span class="nf">get_coords</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Return latitude and longitude.</span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="mf">40.7128</span><span class="p">,</span> <span class="o">-</span><span class="mf">74.0060</span><span class="p">)</span>
 
<span class="k">def</span> <span class="nf">unique_tags</span><span class="p">(</span><span class="n">tags</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Return unique tags.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nf">set</span><span class="p">(</span><span class="n">tags</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-3-optional-types">Example 3: Optional Types</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Optional</span>
 
<span class="k">def</span> <span class="nf">find_user</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Find a user by ID, or None if not found.</span><span class="sh">"""</span>
    <span class="n">users</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">}</span>
    <span class="k">return</span> <span class="n">users</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
 
<span class="c1"># Python 3.10+ allows this syntax
</span><span class="k">def</span> <span class="nf">find_user_modern</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span> <span class="o">|</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">users</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">}</span>
    <span class="k">return</span> <span class="n">users</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-4-union-types-multiple-possible-types">Example 4: Union Types (Multiple Possible Types)</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Union</span>
 
<span class="k">def</span> <span class="nf">process_data</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">str</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Accept either int or str.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nf">str</span><span class="p">(</span><span class="n">data</span><span class="p">).</span><span class="nf">upper</span><span class="p">()</span>
 
<span class="c1"># Python 3.10+ allows this syntax
</span><span class="k">def</span> <span class="nf">process_data_modern</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="nb">int</span> <span class="o">|</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">return</span> <span class="nf">str</span><span class="p">(</span><span class="n">data</span><span class="p">).</span><span class="nf">upper</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="example-5-generic-types-with-typevar">Example 5: Generic Types with TypeVar</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">TypeVar</span>
 
<span class="n">T</span> <span class="o">=</span> <span class="nc">TypeVar</span><span class="p">(</span><span class="sh">'</span><span class="s">T</span><span class="sh">'</span><span class="p">)</span>  <span class="c1"># 'T' can be any type
</span> 
<span class="k">def</span> <span class="nf">get_first</span><span class="p">(</span><span class="n">items</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">T</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">T</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Get the first item from a list.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="n">items</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
 
<span class="c1"># Works with any type
</span><span class="n">first_num</span> <span class="o">=</span> <span class="nf">get_first</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>      <span class="c1"># int
</span><span class="n">first_str</span> <span class="o">=</span> <span class="nf">get_first</span><span class="p">([</span><span class="sh">"</span><span class="s">a</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">b</span><span class="sh">"</span><span class="p">])</span>     <span class="c1"># str
</span></code></pre></div></div>

<h3 id="example-6-callable-types-function-types">Example 6: Callable Types (Function Types)</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Callable</span>
 
<span class="k">def</span> <span class="nf">apply_twice</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">],</span> <span class="n">value</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Apply a function twice to a value.</span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nf">func</span><span class="p">(</span><span class="nf">func</span><span class="p">(</span><span class="n">value</span><span class="p">))</span>
 
<span class="k">def</span> <span class="nf">square</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
    <span class="k">return</span> <span class="n">n</span> <span class="o">**</span> <span class="mi">2</span>
 
<span class="n">result</span> <span class="o">=</span> <span class="nf">apply_twice</span><span class="p">(</span><span class="n">square</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>  <span class="c1"># 3 -&gt; 9 -&gt; 81
</span><span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># 81
</span></code></pre></div></div>

<h3 id="example-7-protocol-for-duck-typing">Example 7: Protocol for Duck Typing</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Protocol</span>
 
<span class="k">class</span> <span class="nc">Drawable</span><span class="p">(</span><span class="n">Protocol</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Anything with a draw() method.</span><span class="sh">"""</span>
    <span class="k">def</span> <span class="nf">draw</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="bp">...</span>
 
<span class="k">class</span> <span class="nc">Circle</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">draw</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Drawing circle</span><span class="sh">"</span><span class="p">)</span>
 
<span class="k">class</span> <span class="nc">Square</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">draw</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Drawing square</span><span class="sh">"</span><span class="p">)</span>
 
<span class="k">def</span> <span class="nf">render</span><span class="p">(</span><span class="n">shape</span><span class="p">:</span> <span class="n">Drawable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Draw any shape.</span><span class="sh">"""</span>
    <span class="n">shape</span><span class="p">.</span><span class="nf">draw</span><span class="p">()</span>
 
<span class="nf">render</span><span class="p">(</span><span class="nc">Circle</span><span class="p">())</span>   <span class="c1"># Works
</span><span class="nf">render</span><span class="p">(</span><span class="nc">Square</span><span class="p">())</span>   <span class="c1"># Works
</span></code></pre></div></div>

<h3 id="example-8-typeddict-for-typed-dictionaries">Example 8: TypedDict for Typed Dictionaries</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">TypedDict</span>
 
<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">TypedDict</span><span class="p">):</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
    <span class="n">email</span><span class="p">:</span> <span class="nb">str</span>
 
<span class="k">def</span> <span class="nf">greet_person</span><span class="p">(</span><span class="n">person</span><span class="p">:</span> <span class="n">Person</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Hello </span><span class="si">{</span><span class="n">person</span><span class="p">[</span><span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="s">, age </span><span class="si">{</span><span class="n">person</span><span class="p">[</span><span class="sh">'</span><span class="s">age</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span>
 
<span class="c1"># Dict with correct types
</span><span class="n">person</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">:</span> <span class="mi">30</span><span class="p">,</span> <span class="sh">"</span><span class="s">email</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">alice@example.com</span><span class="sh">"</span><span class="p">}</span>
<span class="nf">print</span><span class="p">(</span><span class="nf">greet_person</span><span class="p">(</span><span class="n">person</span><span class="p">))</span>
</code></pre></div></div>

<hr />

<h2 id="part-5-dataclasses---auto-generating-data-structure-boilerplate">Part 5: Dataclasses - Auto-Generating Data Structure Boilerplate</h2>

<h3 id="the-analogy-3">The Analogy</h3>
<p>A dataclass is like a form template. Instead of writing <code class="language-plaintext highlighter-rouge">__init__</code>, <code class="language-plaintext highlighter-rouge">__repr__</code>, and <code class="language-plaintext highlighter-rouge">__eq__</code> by hand, the decorator fills them in automatically.</p>

<h3 id="example-1-basic-dataclass">Example 1: Basic Dataclass</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
 
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
    <span class="n">email</span><span class="p">:</span> <span class="nb">str</span>
 
<span class="c1"># Automatically gets __init__, __repr__, __eq__
</span><span class="n">person</span> <span class="o">=</span> <span class="nc">Person</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="n">email</span><span class="o">=</span><span class="sh">"</span><span class="s">alice@example.com</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">person</span><span class="p">)</span>  <span class="c1"># Person(name='Alice', age=30, email='alice@example.com')
</span> 
<span class="c1"># Can compare directly
</span><span class="n">person2</span> <span class="o">=</span> <span class="nc">Person</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="n">email</span><span class="o">=</span><span class="sh">"</span><span class="s">alice@example.com</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">person</span> <span class="o">==</span> <span class="n">person2</span><span class="p">)</span>  <span class="c1"># True
</span></code></pre></div></div>

<h3 id="example-2-default-values">Example 2: Default Values</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
 
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Config</span><span class="p">:</span>
    <span class="n">host</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">port</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">8000</span>
    <span class="n">debug</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="bp">False</span>
 
<span class="c1"># Can omit fields with defaults
</span><span class="n">config</span> <span class="o">=</span> <span class="nc">Config</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="sh">"</span><span class="s">localhost</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>  <span class="c1"># Config(host='localhost', port=8000, debug=False)
</span></code></pre></div></div>

<h3 id="example-3-field-for-custom-defaults">Example 3: <code class="language-plaintext highlighter-rouge">field()</code> for Custom Defaults</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span><span class="p">,</span> <span class="n">field</span>
 
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Team</span><span class="p">:</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">members</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nf">field</span><span class="p">(</span><span class="n">default_factory</span><span class="o">=</span><span class="nb">list</span><span class="p">)</span>
    <span class="n">scores</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="nf">field</span><span class="p">(</span><span class="n">default_factory</span><span class="o">=</span><span class="nb">dict</span><span class="p">)</span>
 
<span class="n">team1</span> <span class="o">=</span> <span class="nc">Team</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Alpha</span><span class="sh">"</span><span class="p">)</span>
<span class="n">team2</span> <span class="o">=</span> <span class="nc">Team</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Beta</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># Each team has its own list/dict (not shared!)
</span><span class="n">team1</span><span class="p">.</span><span class="n">members</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">team1</span><span class="p">.</span><span class="n">members</span><span class="p">)</span>  <span class="c1"># ['Alice']
</span><span class="nf">print</span><span class="p">(</span><span class="n">team2</span><span class="p">.</span><span class="n">members</span><span class="p">)</span>  <span class="c1"># []
</span></code></pre></div></div>

<h3 id="example-4-validation-with-__post_init__">Example 4: Validation with <code class="language-plaintext highlighter-rouge">__post_init__</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
 
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
 
    <span class="k">def</span> <span class="nf">__post_init__</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Validate after initialization.</span><span class="sh">"""</span>
        <span class="k">if</span> <span class="n">self</span><span class="p">.</span><span class="n">age</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Age cannot be negative: </span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">age</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">"</span><span class="s">Name cannot be empty</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># This works
</span><span class="n">person</span> <span class="o">=</span> <span class="nc">Person</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">30</span><span class="p">)</span>
 
<span class="c1"># This fails
</span><span class="k">try</span><span class="p">:</span>
    <span class="n">bad_person</span> <span class="o">=</span> <span class="nc">Person</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=-</span><span class="mi">5</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># Error: Age cannot be negative: -5
</span></code></pre></div></div>

<h3 id="example-5-immutable-dataclass-with-frozentrue">Example 5: Immutable Dataclass with <code class="language-plaintext highlighter-rouge">frozen=True</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
 
<span class="nd">@dataclass</span><span class="p">(</span><span class="n">frozen</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Point</span><span class="p">:</span>
    <span class="n">x</span><span class="p">:</span> <span class="nb">float</span>
    <span class="n">y</span><span class="p">:</span> <span class="nb">float</span>
 
<span class="n">point</span> <span class="o">=</span> <span class="nc">Point</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mf">10.0</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="mf">20.0</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">point</span><span class="p">)</span>  <span class="c1"># Point(x=10.0, y=20.0)
</span> 
<span class="c1"># Try to modify
</span><span class="k">try</span><span class="p">:</span>
    <span class="n">point</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="mf">15.0</span>
<span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Error: </span><span class="si">{</span><span class="nf">type</span><span class="p">(</span><span class="n">e</span><span class="p">).</span><span class="n">__name__</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># Error: FrozenInstanceError
</span></code></pre></div></div>

<hr />

<h2 id="part-6-pathlib---working-with-file-paths-correctly">Part 6: pathlib - Working with File Paths Correctly</h2>

<h3 id="why-pathlib-over-ospath">Why pathlib Over os.path</h3>
<ul>
  <li><code class="language-plaintext highlighter-rouge">os.path.join()</code> requires string concatenation: <code class="language-plaintext highlighter-rouge">os.path.join("folder", "file.txt")</code></li>
  <li><code class="language-plaintext highlighter-rouge">pathlib</code> uses the <code class="language-plaintext highlighter-rouge">/</code> operator: <code class="language-plaintext highlighter-rouge">Path("folder") / "file.txt"</code></li>
  <li>Handles Windows, Mac, Linux paths automatically</li>
  <li>More readable, more Pythonic</li>
</ul>

<h3 id="example-1-creating-and-checking-paths">Example 1: Creating and Checking Paths</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="c1"># Create a Path
</span><span class="n">file_path</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">data</span><span class="sh">"</span><span class="p">)</span> <span class="o">/</span> <span class="sh">"</span><span class="s">input.txt</span><span class="sh">"</span>  <span class="c1"># Works on any OS
</span><span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>  <span class="c1"># data/input.txt (or data\input.txt on Windows)
</span> 
<span class="c1"># Check if it exists
</span><span class="k">if</span> <span class="n">file_path</span><span class="p">.</span><span class="nf">exists</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">File exists!</span><span class="sh">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">File does not exist</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># Check type
</span><span class="k">if</span> <span class="n">file_path</span><span class="p">.</span><span class="nf">is_file</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">It</span><span class="sh">'</span><span class="s">s a file</span><span class="sh">"</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">file_path</span><span class="p">.</span><span class="nf">is_dir</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">It</span><span class="sh">'</span><span class="s">s a directory</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-2-reading-and-writing">Example 2: Reading and Writing</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="c1"># Write text
</span><span class="n">output_file</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">output</span><span class="sh">"</span><span class="p">)</span> <span class="o">/</span> <span class="sh">"</span><span class="s">result.txt</span><span class="sh">"</span>
<span class="n">output_file</span><span class="p">.</span><span class="n">parent</span><span class="p">.</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">parents</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>  <span class="c1"># Create parent dir if needed
</span><span class="n">output_file</span><span class="p">.</span><span class="nf">write_text</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello, World!</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># Read text
</span><span class="n">content</span> <span class="o">=</span> <span class="n">output_file</span><span class="p">.</span><span class="nf">read_text</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>  <span class="c1"># Hello, World!
</span> 
<span class="c1"># Append text (read, modify, write back)
</span><span class="n">content</span> <span class="o">=</span> <span class="n">output_file</span><span class="p">.</span><span class="nf">read_text</span><span class="p">()</span>
<span class="n">output_file</span><span class="p">.</span><span class="nf">write_text</span><span class="p">(</span><span class="n">content</span> <span class="o">+</span> <span class="sh">"</span><span class="se">\n</span><span class="s">New line added</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-3-listing-files-with-glob">Example 3: Listing Files with <code class="language-plaintext highlighter-rouge">.glob()</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="c1"># Find all .txt files recursively
</span><span class="k">for</span> <span class="n">txt_file</span> <span class="ow">in</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">data</span><span class="sh">"</span><span class="p">).</span><span class="nf">glob</span><span class="p">(</span><span class="sh">"</span><span class="s">**/*.txt</span><span class="sh">"</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">txt_file</span><span class="p">)</span>
 
<span class="c1"># Find all .py files in current directory (non-recursive)
</span><span class="k">for</span> <span class="n">py_file</span> <span class="ow">in</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">.</span><span class="sh">"</span><span class="p">).</span><span class="nf">glob</span><span class="p">(</span><span class="sh">"</span><span class="s">*.py</span><span class="sh">"</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">py_file</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-4-iterating-directories-with-iterdir">Example 4: Iterating Directories with <code class="language-plaintext highlighter-rouge">.iterdir()</code></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="c1"># List everything in a directory
</span><span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">data</span><span class="sh">"</span><span class="p">).</span><span class="nf">iterdir</span><span class="p">():</span>
    <span class="k">if</span> <span class="n">item</span><span class="p">.</span><span class="nf">is_file</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">File: </span><span class="si">{</span><span class="n">item</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">item</span><span class="p">.</span><span class="nf">is_dir</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Directory: </span><span class="si">{</span><span class="n">item</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="example-5-working-with-path-components">Example 5: Working with Path Components</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="n">file_path</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">data/analysis/results.txt</span><span class="sh">"</span><span class="p">)</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>        <span class="c1"># results.txt
</span><span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">.</span><span class="n">stem</span><span class="p">)</span>        <span class="c1"># results (filename without extension)
</span><span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">.</span><span class="n">suffix</span><span class="p">)</span>      <span class="c1"># .txt (extension)
</span><span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">.</span><span class="n">parent</span><span class="p">)</span>      <span class="c1"># data/analysis
</span><span class="nf">print</span><span class="p">(</span><span class="n">file_path</span><span class="p">.</span><span class="n">parts</span><span class="p">)</span>       <span class="c1"># ('data', 'analysis', 'results.txt')
</span></code></pre></div></div>

<h3 id="example-6-resolving-to-absolute-path">Example 6: Resolving to Absolute Path</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="n">relative</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">data/input.txt</span><span class="sh">"</span><span class="p">)</span>
<span class="n">absolute</span> <span class="o">=</span> <span class="n">relative</span><span class="p">.</span><span class="nf">resolve</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">absolute</span><span class="p">)</span>  <span class="c1"># /full/path/to/data/input.txt
</span></code></pre></div></div>

<hr />

<h2 id="the-project-file-analysis-tool">The Project: File Analysis Tool</h2>

<p>Let’s build a <strong>complete file analysis tool</strong> that uses all six concepts together:</p>

<h3 id="requirements">Requirements</h3>
<ul>
  <li>Use <strong>pathlib</strong> to scan a directory</li>
  <li>Use a <strong>generator</strong> to lazily yield file statistics for each file</li>
  <li>Use <strong>list comprehension</strong> to filter by file extension</li>
  <li>Use <strong>@dataclass</strong> for the FileStats data structure</li>
  <li>Use <strong>@timer decorator</strong> to measure analysis time</li>
  <li>Use <strong>type hints</strong> on every function</li>
</ul>

<h3 id="step-1-create-the-main-script">Step 1: Create the Main Script</h3>

<p>Create <code class="language-plaintext highlighter-rouge">analyzer.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">time</span>
<span class="kn">import</span> <span class="n">functools</span>
<span class="kn">from</span> <span class="n">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
<span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Any</span>
 
<span class="c1"># =============================================================================
# Data Structure: FileStats
# =============================================================================
</span> 
<span class="nd">@dataclass</span>
<span class="k">class</span> <span class="nc">FileStats</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Statistics for a single file.</span><span class="sh">"""</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="n">path</span><span class="p">:</span> <span class="n">Path</span>
    <span class="n">size_bytes</span><span class="p">:</span> <span class="nb">int</span>
    <span class="n">extension</span><span class="p">:</span> <span class="nb">str</span>
 
    <span class="k">def</span> <span class="nf">__post_init__</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="sh">"""</span><span class="s">Validate after initialization.</span><span class="sh">"""</span>
        <span class="k">if</span> <span class="n">self</span><span class="p">.</span><span class="n">size_bytes</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Size cannot be negative: </span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">size_bytes</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># =============================================================================
# Decorator: Timer
# =============================================================================
</span> 
<span class="k">def</span> <span class="nf">timer</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Callable</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Decorator that measures function execution time.</span><span class="sh">"""</span>
    <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Any</span><span class="p">:</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="n">result</span> <span class="o">=</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="nf">time</span><span class="p">()</span>
        <span class="n">elapsed</span> <span class="o">=</span> <span class="n">end</span> <span class="o">-</span> <span class="n">start</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="s">[TIMER] </span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s"> took </span><span class="si">{</span><span class="n">elapsed</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">return</span> <span class="n">wrapper</span>
 
<span class="c1"># =============================================================================
# Generator: Analyze Files
# =============================================================================
</span> 
<span class="k">def</span> <span class="nf">analyze_directory</span><span class="p">(</span><span class="n">directory</span><span class="p">:</span> <span class="n">Path</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Generator</span><span class="p">[</span><span class="n">FileStats</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">
    Generator that yields FileStats for each file in a directory.
    Lazy evaluation - doesn</span><span class="sh">'</span><span class="s">t load all files at once.
    </span><span class="sh">"""</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">directory</span><span class="p">.</span><span class="nf">is_dir</span><span class="p">():</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Not a directory: </span><span class="si">{</span><span class="n">directory</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
    <span class="k">for</span> <span class="n">file_path</span> <span class="ow">in</span> <span class="n">directory</span><span class="p">.</span><span class="nf">rglob</span><span class="p">(</span><span class="sh">"</span><span class="s">*</span><span class="sh">"</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">file_path</span><span class="p">.</span><span class="nf">is_file</span><span class="p">():</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">size</span> <span class="o">=</span> <span class="n">file_path</span><span class="p">.</span><span class="nf">stat</span><span class="p">().</span><span class="n">st_size</span>
                <span class="k">yield</span> <span class="nc">FileStats</span><span class="p">(</span>
                    <span class="n">name</span><span class="o">=</span><span class="n">file_path</span><span class="p">.</span><span class="n">name</span><span class="p">,</span>
                    <span class="n">path</span><span class="o">=</span><span class="n">file_path</span><span class="p">,</span>
                    <span class="n">size_bytes</span><span class="o">=</span><span class="n">size</span><span class="p">,</span>
                    <span class="n">extension</span><span class="o">=</span><span class="n">file_path</span><span class="p">.</span><span class="n">suffix</span><span class="p">,</span>
                <span class="p">)</span>
            <span class="k">except</span> <span class="nb">OSError</span><span class="p">:</span>
                <span class="c1"># Skip files we can't access
</span>                <span class="k">pass</span>
 
<span class="c1"># =============================================================================
# Main Analysis Functions with Type Hints
# =============================================================================
</span> 
<span class="k">def</span> <span class="nf">filter_by_extension</span><span class="p">(</span>
    <span class="n">stats</span><span class="p">:</span> <span class="n">Generator</span><span class="p">[</span><span class="n">FileStats</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">],</span>
    <span class="n">extension</span><span class="p">:</span> <span class="nb">str</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="n">FileStats</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">
    Filter files by extension using list comprehension.
    Example: filter_by_extension(gen, </span><span class="sh">"</span><span class="s">.py</span><span class="sh">"</span><span class="s">) returns only Python files.
    </span><span class="sh">"""</span>
    <span class="k">return</span> <span class="p">[</span>
        <span class="n">stat</span> <span class="k">for</span> <span class="n">stat</span> <span class="ow">in</span> <span class="n">stats</span>
        <span class="k">if</span> <span class="n">stat</span><span class="p">.</span><span class="n">extension</span><span class="p">.</span><span class="nf">lower</span><span class="p">()</span> <span class="o">==</span> <span class="n">extension</span><span class="p">.</span><span class="nf">lower</span><span class="p">()</span>
    <span class="p">]</span>
 
<span class="k">def</span> <span class="nf">get_largest_files</span><span class="p">(</span>
    <span class="n">stats</span><span class="p">:</span> <span class="n">Generator</span><span class="p">[</span><span class="n">FileStats</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">],</span>
    <span class="n">top_n</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">5</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="n">FileStats</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">
    Return the N largest files, sorted by size descending.
    </span><span class="sh">"""</span>
    <span class="n">all_stats</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">stats</span><span class="p">)</span>
    <span class="k">return</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">all_stats</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="p">.</span><span class="n">size_bytes</span><span class="p">,</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)[:</span><span class="n">top_n</span><span class="p">]</span>
 
<span class="k">def</span> <span class="nf">total_size_by_extension</span><span class="p">(</span>
    <span class="n">stats</span><span class="p">:</span> <span class="n">Generator</span><span class="p">[</span><span class="n">FileStats</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">int</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">
    Use dict comprehension to sum sizes by extension.
    </span><span class="sh">"""</span>
    <span class="n">all_stats</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">stats</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="n">ext</span><span class="p">:</span> <span class="nf">sum</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">size_bytes</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">all_stats</span> <span class="k">if</span> <span class="n">s</span><span class="p">.</span><span class="n">extension</span> <span class="o">==</span> <span class="n">ext</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">ext</span> <span class="ow">in</span> <span class="p">{</span><span class="n">s</span><span class="p">.</span><span class="n">extension</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">all_stats</span><span class="p">}</span>
    <span class="p">}</span>
 
<span class="k">def</span> <span class="nf">format_size</span><span class="p">(</span><span class="n">size_bytes</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Convert bytes to human-readable format.</span><span class="sh">"""</span>
    <span class="k">for</span> <span class="n">unit</span> <span class="ow">in</span> <span class="p">[</span><span class="sh">"</span><span class="s">B</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">KB</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">MB</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">GB</span><span class="sh">"</span><span class="p">]:</span>
        <span class="k">if</span> <span class="n">size_bytes</span> <span class="o">&lt;</span> <span class="mi">1024</span><span class="p">:</span>
            <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">size_bytes</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">unit</span><span class="si">}</span><span class="sh">"</span>
        <span class="n">size_bytes</span> <span class="o">/=</span> <span class="mi">1024</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">size_bytes</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="n">f</span><span class="si">}</span><span class="s"> TB</span><span class="sh">"</span>
 
<span class="c1"># =============================================================================
# Main Entry Point
# =============================================================================
</span> 
<span class="nd">@timer</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">directory</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Run full analysis with timing.
    </span><span class="sh">"""</span>
    <span class="n">target_dir</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="n">directory</span><span class="p">)</span>
 
    <span class="k">if</span> <span class="ow">not</span> <span class="n">target_dir</span><span class="p">.</span><span class="nf">exists</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Error: Directory does not exist: </span><span class="si">{</span><span class="n">target_dir</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span>
 
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Analyzing directory: </span><span class="si">{</span><span class="n">target_dir</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>
 
    <span class="c1"># Example 1: All Python files
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">PYTHON FILES</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="n">py_files</span> <span class="o">=</span> <span class="nf">filter_by_extension</span><span class="p">(</span><span class="nf">analyze_directory</span><span class="p">(</span><span class="n">target_dir</span><span class="p">),</span> <span class="sh">"</span><span class="s">.py</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">stat</span> <span class="ow">in</span> <span class="n">py_files</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="n">stat</span><span class="p">.</span><span class="n">name</span><span class="si">:</span><span class="o">&lt;</span><span class="mi">40</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="nf">format_size</span><span class="p">(</span><span class="n">stat</span><span class="p">.</span><span class="n">size_bytes</span><span class="p">)</span><span class="si">:</span><span class="o">&gt;</span><span class="mi">10</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
    <span class="c1"># Example 2: Largest files
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="sh">"</span> <span class="o">+</span> <span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">TOP 5 LARGEST FILES</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="n">largest</span> <span class="o">=</span> <span class="nf">get_largest_files</span><span class="p">(</span><span class="nf">analyze_directory</span><span class="p">(</span><span class="n">target_dir</span><span class="p">),</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">stat</span> <span class="ow">in</span> <span class="n">largest</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="n">stat</span><span class="p">.</span><span class="n">name</span><span class="si">:</span><span class="o">&lt;</span><span class="mi">40</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="nf">format_size</span><span class="p">(</span><span class="n">stat</span><span class="p">.</span><span class="n">size_bytes</span><span class="p">)</span><span class="si">:</span><span class="o">&gt;</span><span class="mi">10</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
    <span class="c1"># Example 3: Size by extension
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="sh">"</span> <span class="o">+</span> <span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">SIZE BY FILE EXTENSION</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">70</span><span class="p">)</span>
    <span class="n">sizes</span> <span class="o">=</span> <span class="nf">total_size_by_extension</span><span class="p">(</span><span class="nf">analyze_directory</span><span class="p">(</span><span class="n">target_dir</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">ext</span><span class="p">,</span> <span class="n">size</span> <span class="ow">in</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">sizes</span><span class="p">.</span><span class="nf">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
        <span class="n">ext_label</span> <span class="o">=</span> <span class="n">ext</span> <span class="k">if</span> <span class="n">ext</span> <span class="k">else</span> <span class="sh">"</span><span class="s">[no extension]</span><span class="sh">"</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">  </span><span class="si">{</span><span class="n">ext_label</span><span class="si">:</span><span class="o">&lt;</span><span class="mi">40</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="nf">format_size</span><span class="p">(</span><span class="n">size</span><span class="p">)</span><span class="si">:</span><span class="o">&gt;</span><span class="mi">10</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="c1"># Analyze the current directory
</span>    <span class="nf">main</span><span class="p">(</span><span class="sh">"</span><span class="s">.</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="step-2-create-sample-data">Step 2: Create Sample Data</h3>

<p>Create <code class="language-plaintext highlighter-rouge">create_sample_data.py</code> to generate test files:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
<span class="k">def</span> <span class="nf">create_sample_structure</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Create sample directory structure for testing.</span><span class="sh">"""</span>
    <span class="n">base</span> <span class="o">=</span> <span class="nc">Path</span><span class="p">(</span><span class="sh">"</span><span class="s">sample_data</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">base</span><span class="p">.</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
 
    <span class="c1"># Create subdirectories
</span>    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">python_scripts</span><span class="sh">"</span><span class="p">).</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">config</span><span class="sh">"</span><span class="p">).</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">docs</span><span class="sh">"</span><span class="p">).</span><span class="nf">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
 
    <span class="c1"># Create sample files
</span>    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">python_scripts</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">main.py</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">"</span><span class="s">print(</span><span class="sh">'</span><span class="s">Hello World</span><span class="sh">'</span><span class="s">)</span><span class="se">\n</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">100</span>
    <span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">python_scripts</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">utils.py</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">"</span><span class="s">def helper():</span><span class="se">\n</span><span class="s">    pass</span><span class="se">\n</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">50</span>
    <span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">config</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">settings.json</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">debug</span><span class="sh">"</span><span class="s">: true, </span><span class="sh">"</span><span class="s">port</span><span class="sh">"</span><span class="s">: 8000}</span><span class="se">\n</span><span class="sh">'</span> <span class="o">*</span> <span class="mi">30</span>
    <span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">config</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">database.ini</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">"</span><span class="s">[database]</span><span class="se">\n</span><span class="s">host=localhost</span><span class="se">\n</span><span class="s">port=5432</span><span class="se">\n</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">40</span>
    <span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">docs</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">README.md</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">"</span><span class="s"># Project Documentation</span><span class="se">\n\n</span><span class="s">This is a sample file.</span><span class="se">\n</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">60</span>
    <span class="p">)</span>
    <span class="p">(</span><span class="n">base</span> <span class="o">/</span> <span class="sh">"</span><span class="s">notes.txt</span><span class="sh">"</span><span class="p">).</span><span class="nf">write_text</span><span class="p">(</span>
        <span class="sh">"</span><span class="s">Random notes and ideas</span><span class="se">\n</span><span class="sh">"</span> <span class="o">*</span> <span class="mi">25</span>
    <span class="p">)</span>
 
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Sample data created in: </span><span class="si">{</span><span class="n">base</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="nf">create_sample_structure</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="step-3-run-the-analysis">Step 3: Run the Analysis</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create sample data</span>
python create_sample_data.py
<span class="c"># Output: Sample data created in: sample_data</span>
 
<span class="c"># Analyze the sample data</span>
python analyzer.py sample_data
</code></pre></div></div>

<h3 id="expected-output">Expected Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Analyzing directory: sample_data
 
======================================================================
PYTHON FILES
======================================================================
  main.py                                    1.2 KB
  utils.py                                   0.6 KB
 
======================================================================
TOP 5 LARGEST FILES
======================================================================
  README.md                                  3.5 KB
  settings.json                              1.6 KB
  database.ini                               2.1 KB
  main.py                                    1.2 KB
  utils.txt                                  0.6 KB
 
======================================================================
SIZE BY FILE EXTENSION
======================================================================
  .md                                        3.5 KB
  .ini                                       2.1 KB
  .json                                      1.6 KB
  .py                                        1.8 KB
  [no extension]                             0.6 KB
 
[TIMER] main took 0.0024 seconds
</code></pre></div></div>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p><strong>Day 6</strong> introduces <strong>NumPy and Pandas</strong>, Python’s powerhouses for numerical computing and data analysis.</p>

<hr />]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="python" /><category term="generators" /><category term="decorators" /><category term="type-hints" /><category term="comprehensions" /><category term="advanced-python" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry><entry><title type="html">Day 4 of 180 - Collections, Control Flow &amp;amp; Object-Oriented Programming</title><link href="https://edwardpraveen.com/dl-llm-systems/python-oops-day4/" rel="alternate" type="text/html" title="Day 4 of 180 - Collections, Control Flow &amp;amp; Object-Oriented Programming" /><published>2026-03-23T00:00:00+05:30</published><updated>2026-03-23T00:00:00+05:30</updated><id>https://edwardpraveen.com/dl-llm-systems/python-oops-day4</id><content type="html" xml:base="https://edwardpraveen.com/dl-llm-systems/python-oops-day4/"><![CDATA[<blockquote>
  <h2 id="part-of-my-180-day-ai-engineering-journey---learning-in-public-one-hour-a-day-writing-everything-in-plain-english-so-beginners-can-follow-along-the-blog-is-written-with-the-help-of-ai"><em>Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI</em></h2>
  <h2 id="introduction">Introduction</h2>
</blockquote>

<p>By the end of today’s 1.5-hour session, you’ll understand the <strong>four pillars of Python programming</strong>:</p>

<ol>
  <li><strong>Collections:</strong> Lists (ordered shelves) and dicts (labeled drawers)</li>
  <li><strong>Control Flow:</strong> Loops that repeat code</li>
  <li><strong>Functions:</strong> Reusable code recipes</li>
  <li><strong>Classes &amp; OOP:</strong> Blueprints for objects</li>
</ol>

<p>Why these matter for AI Engineering: <strong>Every single AI program you write will use all four of these.</strong> Lists store your data. Loops process that data. Functions make your code DRY (Don’t Repeat Yourself). Classes organize code so a 100,000-line project doesn’t become spaghetti.</p>

<p>Best of all, you’ll build a <strong>working linear regression model from scratch</strong> using only pure Python-no NumPy, no PyTorch, no magic. You’ll see how AI training <em>actually works</em>: it’s not magic, just math.</p>

<hr />

<h2 id="setup">Setup</h2>

<p>Day 4 needs <strong>nothing but Python</strong>. No external libraries yet. Just create a working directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> ~/ai-engineering/day-4
<span class="nb">cd</span> ~/ai-engineering/day-4
python3 <span class="nt">--version</span>  <span class="c"># Should be 3.8+</span>
</code></pre></div></div>

<p>All code today runs directly: <code class="language-plaintext highlighter-rouge">python3 script.py</code></p>

<hr />

<h2 id="part-1-lists--dicts---the-data-containers">Part 1: Lists &amp; Dicts - The Data Containers</h2>

<p>Every program needs to store data. Python gives you two main containers: <strong>lists</strong> (ordered) and <strong>dicts</strong> (labeled).</p>

<h3 id="lists-ordered-shelves">Lists: Ordered Shelves</h3>

<p>Think of a list like a <strong>shelf in a grocery store</strong>. Items sit in a specific order: position 0, position 1, position 2, etc. You find things by their <em>position</em>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create a list
</span><span class="n">fruits</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">apple</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">banana</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">cherry</span><span class="sh">"</span><span class="p">]</span>
 
<span class="c1"># Access by position (0-indexed!)
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>        <span class="c1"># "apple" (first item, index 0)
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>        <span class="c1"># "banana" (second item, index 1)
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>       <span class="c1"># "cherry" (last item, use -1)
</span> 
<span class="c1"># Modify
</span><span class="n">fruits</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="sh">"</span><span class="s">blueberry</span><span class="sh">"</span>
<span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">)</span>           <span class="c1"># ["apple", "blueberry", "cherry"]
</span> 
<span class="c1"># Add items
</span><span class="n">fruits</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="sh">"</span><span class="s">date</span><span class="sh">"</span><span class="p">)</span>   <span class="c1"># Add one item to end
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">)</span>           <span class="c1"># ["apple", "blueberry", "cherry", "date"]
</span> 
<span class="n">fruits</span><span class="p">.</span><span class="nf">extend</span><span class="p">([</span><span class="sh">"</span><span class="s">fig</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">grape</span><span class="sh">"</span><span class="p">])</span>  <span class="c1"># Add multiple items
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">)</span>           <span class="c1"># ["apple", "blueberry", "cherry", "date", "fig", "grape"]
</span> 
<span class="c1"># Remove items
</span><span class="n">removed</span> <span class="o">=</span> <span class="n">fruits</span><span class="p">.</span><span class="nf">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># Remove and return first item
</span><span class="nf">print</span><span class="p">(</span><span class="n">removed</span><span class="p">)</span>          <span class="c1"># "apple"
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">)</span>           <span class="c1"># ["blueberry", "cherry", "date", "fig", "grape"]
</span> 
<span class="n">fruits</span><span class="p">.</span><span class="nf">remove</span><span class="p">(</span><span class="sh">"</span><span class="s">cherry</span><span class="sh">"</span><span class="p">)</span>  <span class="c1"># Remove by value (not index)
</span><span class="nf">print</span><span class="p">(</span><span class="n">fruits</span><span class="p">)</span>           <span class="c1"># ["blueberry", "date", "fig", "grape"]
</span> 
<span class="c1"># How many?
</span><span class="nf">print</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">fruits</span><span class="p">))</span>      <span class="c1"># 4
</span> 
<span class="c1"># Slice: get a range
</span><span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">numbers</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">5</span><span class="p">])</span>     <span class="c1"># [2, 3, 4] (index 2 to 5, excludes 5)
</span><span class="nf">print</span><span class="p">(</span><span class="n">numbers</span><span class="p">[:</span><span class="mi">3</span><span class="p">])</span>      <span class="c1"># [0, 1, 2] (from start to index 3)
</span><span class="nf">print</span><span class="p">(</span><span class="n">numbers</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span>      <span class="c1"># [5, 6, 7, 8, 9] (from index 5 to end)
</span><span class="nf">print</span><span class="p">(</span><span class="n">numbers</span><span class="p">[::</span><span class="mi">2</span><span class="p">])</span>     <span class="c1"># [0, 2, 4, 6, 8] (every other item)
</span></code></pre></div></div>

<h3 id="dicts-labeled-drawers">Dicts: Labeled Drawers</h3>

<p>A dict is like a <strong>filing cabinet with labeled drawers</strong>. Each item has a <em>name</em> (the “key”) and a <em>value</em> (what’s inside). You find things by their label, not their position.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Create a dict
</span><span class="n">student</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">:</span> <span class="mi">25</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">major</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Computer Science</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">gpa</span><span class="sh">"</span><span class="p">:</span> <span class="mf">3.8</span>
<span class="p">}</span>
 
<span class="c1"># Access by key (label)
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">])</span>      <span class="c1"># "Alice"
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">gpa</span><span class="sh">"</span><span class="p">])</span>       <span class="c1"># 3.8
</span> 
<span class="c1"># Safe access (won't crash if key doesn't exist)
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">))</span>           <span class="c1"># 25
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">email</span><span class="sh">"</span><span class="p">))</span>         <span class="c1"># None (key doesn't exist)
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">email</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">N/A</span><span class="sh">"</span><span class="p">))</span>  <span class="c1"># "N/A" (provide default)
</span> 
<span class="c1"># Update
</span><span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">]</span> <span class="o">=</span> <span class="mi">26</span>
<span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">gpa</span><span class="sh">"</span><span class="p">]</span> <span class="o">=</span> <span class="mf">3.9</span>
 
<span class="c1"># Add new key
</span><span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">phone</span><span class="sh">"</span><span class="p">]</span> <span class="o">=</span> <span class="sh">"</span><span class="s">555-1234</span><span class="sh">"</span>
 
<span class="c1"># Remove key
</span><span class="k">del</span> <span class="n">student</span><span class="p">[</span><span class="sh">"</span><span class="s">major</span><span class="sh">"</span><span class="p">]</span>
 
<span class="c1"># Check if key exists
</span><span class="k">if</span> <span class="sh">"</span><span class="s">name</span><span class="sh">"</span> <span class="ow">in</span> <span class="n">student</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Name: </span><span class="si">{</span><span class="n">student</span><span class="p">[</span><span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="k">if</span> <span class="sh">"</span><span class="s">email</span><span class="sh">"</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">student</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">No email on file</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># Get all keys
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">.</span><span class="nf">keys</span><span class="p">())</span>       <span class="c1"># dict_keys(['name', 'age', 'gpa', 'phone'])
</span> 
<span class="c1"># Get all values
</span><span class="nf">print</span><span class="p">(</span><span class="n">student</span><span class="p">.</span><span class="nf">values</span><span class="p">())</span>     <span class="c1"># dict_values(['Alice', 26, 3.9, '555-1234'])
</span> 
<span class="c1"># Get all key-value pairs
</span><span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">student</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="c1"># name: Alice
</span>    <span class="c1"># age: 26
</span>    <span class="c1"># gpa: 3.9
</span>    <span class="c1"># phone: 555-1234
</span></code></pre></div></div>

<h3 id="dicts-with-defaults-defaultdict">Dicts with Defaults: defaultdict</h3>

<p>Sometimes you want a dict to have a default value if a key doesn’t exist yet:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
 
<span class="c1"># Regular dict would crash here:
# scores = {}
# scores["Alice"] += 10  # KeyError!
</span> 
<span class="c1"># defaultdict doesn't crash-it uses a default value
</span><span class="n">scores</span> <span class="o">=</span> <span class="nf">defaultdict</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>  <span class="c1"># Default value is 0
</span><span class="n">scores</span><span class="p">[</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">10</span>      <span class="c1"># Works! Creates key with 0, then adds 10
</span><span class="n">scores</span><span class="p">[</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">5</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">scores</span><span class="p">[</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">])</span>     <span class="c1"># 10
</span><span class="nf">print</span><span class="p">(</span><span class="n">scores</span><span class="p">[</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">])</span>       <span class="c1"># 5
</span><span class="nf">print</span><span class="p">(</span><span class="n">scores</span><span class="p">[</span><span class="sh">"</span><span class="s">Charlie</span><span class="sh">"</span><span class="p">])</span>   <span class="c1"># 0 (never been set, but default is 0)
</span></code></pre></div></div>

<h3 id="when-to-use-list-vs-dict">When to Use List vs Dict</h3>

<table>
  <thead>
    <tr>
      <th>Situation</th>
      <th>Use List</th>
      <th>Use Dict</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“Give me the 3rd item”</td>
      <td>YES</td>
      <td>NO</td>
    </tr>
    <tr>
      <td>“Find the item called ‘age’”</td>
      <td>NO</td>
      <td>YES</td>
    </tr>
    <tr>
      <td>Order matters</td>
      <td>YES</td>
      <td>NO</td>
    </tr>
    <tr>
      <td>Fast lookup by name</td>
      <td>NO</td>
      <td>YES</td>
    </tr>
    <tr>
      <td>You know the keys ahead of time</td>
      <td>NO</td>
      <td>YES</td>
    </tr>
    <tr>
      <td>Keys are things like “name”, “age”, “email”</td>
      <td>NO</td>
      <td>YES</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="part-2-loops---repeating-code">Part 2: Loops - Repeating Code</h2>

<p>Loops let you repeat code without writing it 100 times. Two types: <strong>for</strong> (when you know how many times) and <strong>while</strong> (when you loop until a condition is true).</p>

<h3 id="the-for-loop-repeat-over-a-collection">The <code class="language-plaintext highlighter-rouge">for</code> Loop: Repeat Over a Collection</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Loop over a list
</span><span class="n">fruits</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">apple</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">banana</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">cherry</span><span class="sh">"</span><span class="p">]</span>
<span class="k">for</span> <span class="n">fruit</span> <span class="ow">in</span> <span class="n">fruits</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">fruit</span><span class="p">)</span>
<span class="c1"># Output:
# apple
# banana
# cherry
</span> 
<span class="c1"># Loop over a dict
</span><span class="n">student</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">age</span><span class="sh">"</span><span class="p">:</span> <span class="mi">25</span><span class="p">,</span> <span class="sh">"</span><span class="s">major</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">CS</span><span class="sh">"</span><span class="p">}</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">student</span><span class="p">:</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">student</span><span class="p">[</span><span class="n">key</span><span class="p">]</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output:
# name: Alice
# age: 25
# major: CS
</span> 
<span class="c1"># Better way with .items()
</span><span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">student</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="range-generate-numbers"><code class="language-plaintext highlighter-rouge">range()</code>: Generate Numbers</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># range(n) = 0, 1, 2, ..., n-1
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># Output: 0 1 2 3 4
</span> 
<span class="c1"># range(start, stop, step)
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">2</span><span class="p">):</span>  <span class="c1"># Start at 2, stop before 8, step by 2
</span>    <span class="nf">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># Output: 2 4 6
</span> 
<span class="c1"># Count backwards
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>  <span class="c1"># Start at 5, count down
</span>    <span class="nf">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># Output: 5 4 3 2 1
</span></code></pre></div></div>

<h3 id="enumerate-loop-with-index"><code class="language-plaintext highlighter-rouge">enumerate()</code>: Loop with Index</h3>

<p>Often you want both the <em>index</em> (position) and the <em>value</em>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fruits</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">apple</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">banana</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">cherry</span><span class="sh">"</span><span class="p">]</span>
 
<span class="c1"># Without enumerate (awkward)
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">fruits</span><span class="p">)):</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">fruits</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="c1"># With enumerate (clean)
</span><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">fruit</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">fruits</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">fruit</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output:
# 0: apple
# 1: banana
# 2: cherry
</span></code></pre></div></div>

<h3 id="zip-loop-over-multiple-lists-together"><code class="language-plaintext highlighter-rouge">zip()</code>: Loop Over Multiple Lists Together</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">names</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Charlie</span><span class="sh">"</span><span class="p">]</span>
<span class="n">ages</span> <span class="o">=</span> <span class="p">[</span><span class="mi">25</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">28</span><span class="p">]</span>
 
<span class="c1"># Loop through both at once
</span><span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span> <span class="ow">in</span> <span class="nf">zip</span><span class="p">(</span><span class="n">names</span><span class="p">,</span> <span class="n">ages</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> is </span><span class="si">{</span><span class="n">age</span><span class="si">}</span><span class="s"> years old</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output:
# Alice is 25 years old
# Bob is 30 years old
# Charlie is 28 years old
</span></code></pre></div></div>

<h3 id="nested-loops-loops-inside-loops">Nested Loops: Loops Inside Loops</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Print a 3x3 grid
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">(</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">j</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="sh">"</span><span class="s"> </span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">()</span>  <span class="c1"># Newline after each row
# Output:
# (0, 0) (0, 1) (0, 2) 
# (1, 0) (1, 1) (1, 2) 
# (2, 0) (2, 1) (2, 2)
</span></code></pre></div></div>

<h3 id="the-while-loop-repeat-until-a-condition">The <code class="language-plaintext highlighter-rouge">while</code> Loop: Repeat Until a Condition</h3>

<p>Use <code class="language-plaintext highlighter-rouge">while</code> when you don’t know ahead of time how many loops you need:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Keep looping while condition is true
</span><span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">count</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
    <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="c1"># Output: 0 1 2 3 4
</span> 
<span class="c1"># Loop until something happens
</span><span class="n">password</span> <span class="o">=</span> <span class="sh">""</span>
<span class="k">while</span> <span class="n">password</span> <span class="o">!=</span> <span class="sh">"</span><span class="s">secret</span><span class="sh">"</span><span class="p">:</span>
    <span class="n">password</span> <span class="o">=</span> <span class="nf">input</span><span class="p">(</span><span class="sh">"</span><span class="s">Enter password: </span><span class="sh">"</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">password</span> <span class="o">==</span> <span class="sh">"</span><span class="s">secret</span><span class="sh">"</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Access granted!</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Try again</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="break-and-continue"><code class="language-plaintext highlighter-rouge">break</code> and <code class="language-plaintext highlighter-rouge">continue</code></h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># break: exit the loop immediately
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">5</span><span class="p">:</span>
        <span class="k">break</span>  <span class="c1"># Stop the loop
</span>    <span class="nf">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># Output: 0 1 2 3 4
</span> 
<span class="c1"># continue: skip to next iteration
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="k">continue</span>  <span class="c1"># Skip this iteration
</span>    <span class="nf">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="c1"># Output: 0 1 3 4 (skips 2)
</span></code></pre></div></div>

<h3 id="when-to-use-for-vs-while">When to Use <code class="language-plaintext highlighter-rouge">for</code> vs <code class="language-plaintext highlighter-rouge">while</code></h3>

<ul>
  <li>Use <strong><code class="language-plaintext highlighter-rouge">for</code></strong> when you know (or can count) how many times to loop</li>
  <li>Use <strong><code class="language-plaintext highlighter-rouge">while</code></strong> when you loop until a condition becomes false</li>
</ul>

<hr />

<h2 id="part-3-functions---reusable-code">Part 3: Functions - Reusable Code</h2>

<p>A function is like a <strong>recipe</strong>: you write the steps once, then use it many times. Functions make code DRY (Don’t Repeat Yourself).</p>

<h3 id="the-basics">The Basics</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Define a function with def
</span><span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="c1"># name is a parameter
</span>    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Hello, </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">!</span><span class="sh">"</span>
 
<span class="c1"># Call it (use it)
</span><span class="n">result</span> <span class="o">=</span> <span class="nf">greet</span><span class="p">(</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># "Hello, Alice!"
</span> 
<span class="c1"># Another example
</span><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
 
<span class="nf">print</span><span class="p">(</span><span class="nf">add</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>  <span class="c1"># 8
</span><span class="nf">print</span><span class="p">(</span><span class="nf">add</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">))</span>  <span class="c1"># 30
</span></code></pre></div></div>

<h3 id="return-values">Return Values</h3>

<p>A function can return a value or nothing:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Return a value
</span><span class="k">def</span> <span class="nf">double</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
 
<span class="n">result</span> <span class="o">=</span> <span class="nf">double</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># 10
</span> 
<span class="c1"># No return = returns None
</span><span class="k">def</span> <span class="nf">print_twice</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="c1"># No return statement!
</span> 
<span class="n">result</span> <span class="o">=</span> <span class="nf">print_twice</span><span class="p">(</span><span class="sh">"</span><span class="s">Hi</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># None
</span></code></pre></div></div>

<h3 id="default-arguments">Default Arguments</h3>

<p>Provide default values so the caller doesn’t always have to pass them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">greet</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">greeting</span><span class="o">=</span><span class="sh">"</span><span class="s">Hello</span><span class="sh">"</span><span class="p">):</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">greeting</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">!</span><span class="sh">"</span>
 
<span class="nf">print</span><span class="p">(</span><span class="nf">greet</span><span class="p">(</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">))</span>              <span class="c1"># "Hello, Alice!"
</span><span class="nf">print</span><span class="p">(</span><span class="nf">greet</span><span class="p">(</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Hi</span><span class="sh">"</span><span class="p">))</span>          <span class="c1"># "Hi, Bob!"
</span><span class="nf">print</span><span class="p">(</span><span class="nf">greet</span><span class="p">(</span><span class="sh">"</span><span class="s">Charlie</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Hey</span><span class="sh">"</span><span class="p">))</span>     <span class="c1"># "Hey, Charlie!"
</span></code></pre></div></div>

<h3 id="keyword-arguments">Keyword Arguments</h3>

<p>Pass arguments by name (order doesn’t matter):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">describe_pet</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">species</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
    <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> is a </span><span class="si">{</span><span class="n">age</span><span class="si">}</span><span class="s">-year-old </span><span class="si">{</span><span class="n">species</span><span class="si">}</span><span class="sh">"</span>
 
<span class="c1"># Positional (order matters)
</span><span class="nf">print</span><span class="p">(</span><span class="nf">describe_pet</span><span class="p">(</span><span class="sh">"</span><span class="s">Fluffy</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">cat</span><span class="sh">"</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
 
<span class="c1"># Keyword (order doesn't matter)
</span><span class="nf">print</span><span class="p">(</span><span class="nf">describe_pet</span><span class="p">(</span><span class="n">age</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">species</span><span class="o">=</span><span class="sh">"</span><span class="s">dog</span><span class="sh">"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Rex</span><span class="sh">"</span><span class="p">))</span>
 
<span class="c1"># Mixed
</span><span class="nf">print</span><span class="p">(</span><span class="nf">describe_pet</span><span class="p">(</span><span class="sh">"</span><span class="s">Bella</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">species</span><span class="o">=</span><span class="sh">"</span><span class="s">parrot</span><span class="sh">"</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="args-accept-any-number-of-arguments"><code class="language-plaintext highlighter-rouge">*args</code>: Accept Any Number of Arguments</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sum_all</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="c1"># args is a tuple of all arguments passed
</span>    <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
        <span class="n">total</span> <span class="o">+=</span> <span class="n">num</span>
    <span class="k">return</span> <span class="n">total</span>
 
<span class="nf">print</span><span class="p">(</span><span class="nf">sum_all</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>  <span class="c1"># 15
</span><span class="nf">print</span><span class="p">(</span><span class="nf">sum_all</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">))</span>  <span class="c1"># 30
</span><span class="nf">print</span><span class="p">(</span><span class="nf">sum_all</span><span class="p">(</span><span class="mi">7</span><span class="p">))</span>  <span class="c1"># 7
</span></code></pre></div></div>

<h3 id="kwargs-accept-keyword-arguments"><code class="language-plaintext highlighter-rouge">**kwargs</code>: Accept Keyword Arguments</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">print_info</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="c1"># kwargs is a dict of all keyword arguments
</span>    <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
<span class="nf">print_info</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span> <span class="n">city</span><span class="o">=</span><span class="sh">"</span><span class="s">NYC</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Output:
# name: Alice
# age: 25
# city: NYC
</span></code></pre></div></div>

<h3 id="first-class-functions-passing-functions-as-arguments">First-Class Functions: Passing Functions as Arguments</h3>

<p>In Python, functions are <strong>first-class citizens</strong>: you can pass them to other functions!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">apply_twice</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
    <span class="c1"># func is a function, not a value
</span>    <span class="k">return</span> <span class="nf">func</span><span class="p">(</span><span class="nf">func</span><span class="p">(</span><span class="n">value</span><span class="p">))</span>
 
<span class="k">def</span> <span class="nf">double</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span>
 
<span class="n">result</span> <span class="o">=</span> <span class="nf">apply_twice</span><span class="p">(</span><span class="n">double</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># double(double(3)) = 12
</span> 
<span class="c1"># Lambda: anonymous function (one-liner)
</span><span class="n">result</span> <span class="o">=</span> <span class="nf">apply_twice</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># 7
</span></code></pre></div></div>

<h3 id="pure-functions-vs-side-effects">Pure Functions vs Side Effects</h3>

<p>A <strong>pure function</strong> always returns the same output for the same input and doesn’t change anything in the world. A function with <strong>side effects</strong> does other things (print, modify global variables, etc.):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Pure function: same input = same output, no side effects
</span><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>  <span class="c1"># Only returns
</span> 
<span class="c1"># Side effect: changes the world (prints)
</span><span class="k">def</span> <span class="nf">add_and_print</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  <span class="c1"># Side effect!
</span>    <span class="k">return</span> <span class="n">result</span>
 
<span class="c1"># Side effect: modifies global state
</span><span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span>
 
<span class="k">def</span> <span class="nf">increment_counter</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">counter</span>
    <span class="n">counter</span> <span class="o">+=</span> <span class="mi">1</span>  <span class="c1"># Side effect: changes global variable
</span>    <span class="k">return</span> <span class="n">counter</span>
</code></pre></div></div>

<p>For AI engineering, <strong>pure functions are your friend</strong>. They’re easier to test, reason about, and debug.</p>

<h3 id="scope-local-vs-global">Scope: Local vs Global</h3>

<p>Variables have <strong>scope</strong>: where they exist and can be used.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="mi">10</span>  <span class="c1"># Global scope
</span> 
<span class="k">def</span> <span class="nf">change_x</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">5</span>  <span class="c1"># Local scope-doesn't touch the global x
</span>    <span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
 
<span class="nf">change_x</span><span class="p">()</span>  <span class="c1"># Prints 5
</span><span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>    <span class="c1"># Still 10! The global x is unchanged
</span> 
<span class="c1"># To modify global, use 'global' keyword
</span><span class="k">def</span> <span class="nf">really_change_x</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">x</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">20</span>
 
<span class="nf">really_change_x</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>  <span class="c1"># Now 20
</span></code></pre></div></div>

<hr />

<h2 id="part-4-classes--oop---organizing-code">Part 4: Classes &amp; OOP - Organizing Code</h2>

<p>A <strong>class</strong> is a blueprint for objects. <strong>OOP</strong> (Object-Oriented Programming) lets you organize code into reusable, logical chunks.</p>

<h3 id="the-analogy-blueprint-vs-house">The Analogy: Blueprint vs House</h3>

<ul>
  <li>A <strong>class</strong> is like a blueprint for a house (describes rooms, size, layout)</li>
  <li>An <strong>object</strong> (instance) is an actual house built from that blueprint</li>
  <li>You can build 100 houses from one blueprint; each is different (different furniture, paint color), but they all follow the same blueprint</li>
</ul>

<h3 id="basic-class">Basic Class</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Dog</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
        <span class="c1"># __init__ = constructor
</span>        <span class="c1"># Runs when you create a new Dog
</span>        <span class="c1"># self = the object being created
</span>        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>  <span class="c1"># Instance attribute
</span>        <span class="n">self</span><span class="p">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">age</span>
 
    <span class="k">def</span> <span class="nf">bark</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Instance method (function inside a class)
</span>        <span class="c1"># Takes self as first parameter
</span>        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="s"> says woof!</span><span class="sh">"</span>
 
<span class="c1"># Create instances (actual objects)
</span><span class="n">dog1</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Buddy</span><span class="sh">"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">dog2</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Max</span><span class="sh">"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">dog1</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>        <span class="c1"># "Buddy"
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog2</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>        <span class="c1"># "Max"
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog1</span><span class="p">.</span><span class="nf">bark</span><span class="p">())</span>      <span class="c1"># "Buddy says woof!"
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog2</span><span class="p">.</span><span class="nf">bark</span><span class="p">())</span>      <span class="c1"># "Max says woof!"
</span></code></pre></div></div>

<h3 id="instance-attributes-vs-class-attributes">Instance Attributes vs Class Attributes</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Dog</span><span class="p">:</span>
    <span class="n">species</span> <span class="o">=</span> <span class="sh">"</span><span class="s">Canis familiaris</span><span class="sh">"</span>  <span class="c1"># Class attribute
</span>    <span class="c1"># ^ Shared by ALL Dogs
</span> 
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>  <span class="c1"># Instance attribute
</span>        <span class="c1"># ^ Unique to each Dog
</span> 
<span class="n">dog1</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Buddy</span><span class="sh">"</span><span class="p">)</span>
<span class="n">dog2</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Max</span><span class="sh">"</span><span class="p">)</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">dog1</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>         <span class="c1"># "Buddy"
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog2</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>         <span class="c1"># "Max"
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog1</span><span class="p">.</span><span class="n">species</span><span class="p">)</span>      <span class="c1"># "Canis familiaris" (same for all)
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog2</span><span class="p">.</span><span class="n">species</span><span class="p">)</span>      <span class="c1"># "Canis familiaris" (same for all)
</span></code></pre></div></div>

<h3 id="instance-methods-class-methods-static-methods">Instance Methods, Class Methods, Static Methods</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Dog</span><span class="p">:</span>
    <span class="n">species</span> <span class="o">=</span> <span class="sh">"</span><span class="s">Canis familiaris</span><span class="sh">"</span>
 
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
 
    <span class="k">def</span> <span class="nf">bark</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Instance method: uses self (the specific dog)
</span>        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="s"> barks!</span><span class="sh">"</span>
 
    <span class="nd">@classmethod</span>
    <span class="k">def</span> <span class="nf">create_from_dict</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Class method: uses cls (the class)
</span>        <span class="c1"># Useful for creating objects in different ways
</span>        <span class="k">return</span> <span class="nf">cls</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">])</span>
 
    <span class="nd">@staticmethod</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">():</span>
        <span class="c1"># Static method: doesn't use self or cls
</span>        <span class="c1"># Just a function that lives in the class
</span>        <span class="k">return</span> <span class="sh">"</span><span class="s">Dogs are loyal animals</span><span class="sh">"</span>
 
<span class="c1"># Usage
</span><span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Buddy</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="nf">bark</span><span class="p">())</span>  <span class="c1"># "Buddy barks!"
</span> 
<span class="n">dog2</span> <span class="o">=</span> <span class="n">Dog</span><span class="p">.</span><span class="nf">create_from_dict</span><span class="p">({</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Rex</span><span class="sh">"</span><span class="p">})</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog2</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>  <span class="c1"># "Rex"
</span> 
<span class="nf">print</span><span class="p">(</span><span class="n">Dog</span><span class="p">.</span><span class="nf">info</span><span class="p">())</span>  <span class="c1"># "Dogs are loyal animals"
</span></code></pre></div></div>

<h3 id="inheritance-code-reuse-via-is-a">Inheritance: Code Reuse via “IS-A”</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Animal</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
 
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="s"> makes a sound</span><span class="sh">"</span>
 
<span class="k">class</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="c1"># Dog inherits from Animal
</span>    <span class="c1"># Dog automatically has name and speak()
</span>    <span class="k">pass</span>
 
<span class="k">class</span> <span class="nc">Cat</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="k">pass</span>
 
<span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Buddy</span><span class="sh">"</span><span class="p">)</span>
<span class="n">cat</span> <span class="o">=</span> <span class="nc">Cat</span><span class="p">(</span><span class="sh">"</span><span class="s">Whiskers</span><span class="sh">"</span><span class="p">)</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="nf">speak</span><span class="p">())</span>   <span class="c1"># "Buddy makes a sound"
</span><span class="nf">print</span><span class="p">(</span><span class="n">cat</span><span class="p">.</span><span class="nf">speak</span><span class="p">())</span>   <span class="c1"># "Whiskers makes a sound"
</span></code></pre></div></div>

<h3 id="method-overriding">Method Overriding</h3>

<p>Replace a parent method with a child version:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Animal</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Generic sound</span><span class="sh">"</span>
 
<span class="k">class</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Override: replace parent's speak with dog-specific
</span>        <span class="k">return</span> <span class="sh">"</span><span class="s">Woof!</span><span class="sh">"</span>
 
<span class="k">class</span> <span class="nc">Cat</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Meow!</span><span class="sh">"</span>
 
<span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">()</span>
<span class="n">cat</span> <span class="o">=</span> <span class="nc">Cat</span><span class="p">()</span>
 
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="nf">speak</span><span class="p">())</span>  <span class="c1"># "Woof!" (Dog's version)
</span><span class="nf">print</span><span class="p">(</span><span class="n">cat</span><span class="p">.</span><span class="nf">speak</span><span class="p">())</span>  <span class="c1"># "Meow!" (Cat's version)
</span></code></pre></div></div>

<h3 id="property-computed-attributes"><code class="language-plaintext highlighter-rouge">@property</code>: Computed Attributes</h3>

<p>Make a method look like an attribute:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Circle</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">radius</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">radius</span> <span class="o">=</span> <span class="n">radius</span>
 
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">area</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Looks like self.area, but it's computed on the fly
</span>        <span class="k">return</span> <span class="mf">3.14159</span> <span class="o">*</span> <span class="n">self</span><span class="p">.</span><span class="n">radius</span> <span class="o">**</span> <span class="mi">2</span>
 
<span class="n">circle</span> <span class="o">=</span> <span class="nc">Circle</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">circle</span><span class="p">.</span><span class="n">area</span><span class="p">)</span>  <span class="c1"># 78.54975 (computed, not stored)
</span> 
<span class="n">circle</span><span class="p">.</span><span class="n">radius</span> <span class="o">=</span> <span class="mi">10</span>
<span class="nf">print</span><span class="p">(</span><span class="n">circle</span><span class="p">.</span><span class="n">area</span><span class="p">)</span>  <span class="c1"># 314.159 (automatically recomputed)
</span></code></pre></div></div>

<h3 id="abstractmethod-enforce-subclass-implementation"><code class="language-plaintext highlighter-rouge">@abstractmethod</code>: Enforce Subclass Implementation</h3>

<p>Force all subclasses to implement certain methods:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">abc</span> <span class="kn">import</span> <span class="n">ABC</span><span class="p">,</span> <span class="n">abstractmethod</span>
 
<span class="k">class</span> <span class="nc">Animal</span><span class="p">(</span><span class="n">ABC</span><span class="p">):</span>
    <span class="nd">@abstractmethod</span>
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Subclasses MUST implement this
</span>        <span class="k">pass</span>
 
<span class="k">class</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">speak</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Woof!</span><span class="sh">"</span>
 
<span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="nf">speak</span><span class="p">())</span>  <span class="c1"># "Woof!"
</span> 
<span class="c1"># This would fail:
# animal = Animal()  # TypeError: can't instantiate abstract class
</span></code></pre></div></div>

<h3 id="dunder-methods-magic-methods">Dunder Methods: Magic Methods</h3>

<p>Special methods that make your class work with Python’s built-ins:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Dog</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="n">self</span><span class="p">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">age</span>
 
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># str(dog) uses this
</span>        <span class="c1"># Human-readable
</span>        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Dog named </span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="sh">"</span>
 
    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># repr(dog) uses this
</span>        <span class="c1"># For debugging, more detailed
</span>        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Dog(name=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">name</span><span class="si">!r}</span><span class="s">, age=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">age</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span>
 
    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># len(dog) uses this
</span>        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="n">age</span>
 
    <span class="k">def</span> <span class="nf">__eq__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
        <span class="c1"># dog1 == dog2 uses this
</span>        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">other</span><span class="p">.</span><span class="n">name</span> <span class="ow">and</span> <span class="n">self</span><span class="p">.</span><span class="n">age</span> <span class="o">==</span> <span class="n">other</span><span class="p">.</span><span class="n">age</span>
 
    <span class="k">def</span> <span class="nf">__add__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
        <span class="c1"># dog1 + dog2 uses this
</span>        <span class="c1"># Create a new dog with combined names
</span>        <span class="k">return</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="nf">max</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">age</span><span class="p">,</span> <span class="n">other</span><span class="p">.</span><span class="n">age</span><span class="p">))</span>
 
<span class="n">dog1</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Buddy</span><span class="sh">"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">dog2</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">(</span><span class="sh">"</span><span class="s">Max</span><span class="sh">"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
 
<span class="nf">print</span><span class="p">(</span><span class="nf">str</span><span class="p">(</span><span class="n">dog1</span><span class="p">))</span>    <span class="c1"># "Dog named Buddy"
</span><span class="nf">print</span><span class="p">(</span><span class="nf">repr</span><span class="p">(</span><span class="n">dog1</span><span class="p">))</span>   <span class="c1"># "Dog(name='Buddy', age=5)"
</span><span class="nf">print</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">dog1</span><span class="p">))</span>    <span class="c1"># 5
</span><span class="nf">print</span><span class="p">(</span><span class="n">dog1</span> <span class="o">==</span> <span class="n">dog2</span><span class="p">)</span> <span class="c1"># False
</span><span class="n">dog3</span> <span class="o">=</span> <span class="n">dog1</span> <span class="o">+</span> <span class="n">dog2</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog3</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>    <span class="c1"># "BuddyMax"
</span></code></pre></div></div>

<h3 id="composition-vs-inheritance-has-a-vs-is-a">Composition vs Inheritance: “HAS-A” vs “IS-A”</h3>

<p><strong>Inheritance:</strong> “Dog IS-A Animal”</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Animal</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">breathe</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Breathing...</span><span class="sh">"</span>
 
<span class="k">class</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">Animal</span><span class="p">):</span>
    <span class="c1"># Dog inherits breathe() from Animal
</span>    <span class="k">pass</span>
 
<span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="nf">breathe</span><span class="p">())</span>  <span class="c1"># "Breathing..."
</span></code></pre></div></div>

<p><strong>Composition:</strong> “Dog HAS-A Tail”</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Tail</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">wag</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sh">"</span><span class="s">Wagging...</span><span class="sh">"</span>
 
<span class="k">class</span> <span class="nc">Dog</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">tail</span> <span class="o">=</span> <span class="nc">Tail</span><span class="p">()</span>  <span class="c1"># Dog HAS a Tail
</span> 
<span class="n">dog</span> <span class="o">=</span> <span class="nc">Dog</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="n">dog</span><span class="p">.</span><span class="n">tail</span><span class="p">.</span><span class="nf">wag</span><span class="p">())</span>  <span class="c1"># "Wagging..."
</span></code></pre></div></div>

<p><strong>When to use:</strong></p>
<ul>
  <li>Inheritance if “IS-A” relationship (Dog IS-A Animal)</li>
  <li>Composition if “HAS-A” relationship (Dog HAS-A Tail)</li>
</ul>

<hr />

<h2 id="the-project-gradient-descent-from-scratch">The Project: Gradient Descent from Scratch</h2>

<p>Now you’ll build a real machine learning model using <strong>only</strong> lists, loops, functions, and classes. No NumPy, no PyTorch-just pure Python math.</p>

<h3 id="what-youre-building">What You’re Building</h3>

<p>A <code class="language-plaintext highlighter-rouge">LinearRegressionModel</code> that:</p>
<ol>
  <li>Makes predictions: <code class="language-plaintext highlighter-rouge">y = weight * x + bias</code></li>
  <li>Measures loss: mean squared error</li>
  <li>Trains itself using <strong>gradient descent</strong>: a real AI algorithm</li>
</ol>

<h3 id="the-math-plain-english">The Math (Plain English)</h3>

<ul>
  <li><strong>Prediction:</strong> Draw a line through data. The line is <code class="language-plaintext highlighter-rouge">y = weight * x + bias</code></li>
  <li><strong>Loss:</strong> How far is each prediction from the actual value? Average all the errors squared.</li>
  <li><strong>Gradient:</strong> “Which direction should I move weight and bias to reduce loss?” (calculus, but Python does it)</li>
  <li><strong>Training:</strong> Move in that direction, repeat 100 times. The model gets better each time.</li>
</ul>

<h3 id="complete-working-code">Complete Working Code</h3>

<p>Save this as <code class="language-plaintext highlighter-rouge">linear_regression.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">LinearRegressionModel</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">initial_weight</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">initial_bias</span><span class="o">=</span><span class="mf">0.0</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Initialize the model with weight and bias.
 
        weight = slope of the line (how steep)
        bias = y-intercept (where the line crosses y-axis)
        </span><span class="sh">"""</span>
        <span class="n">self</span><span class="p">.</span><span class="n">weight</span> <span class="o">=</span> <span class="n">initial_weight</span>
        <span class="n">self</span><span class="p">.</span><span class="n">bias</span> <span class="o">=</span> <span class="n">initial_bias</span>
        <span class="n">self</span><span class="p">.</span><span class="n">training_steps</span> <span class="o">=</span> <span class="mi">0</span>
 
    <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Make a prediction for a single input.
 
        Formula: y_pred = weight * x + bias
 
        This is just the equation of a line!
        </span><span class="sh">"""</span>
        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="n">weight</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="n">self</span><span class="p">.</span><span class="n">bias</span>
 
    <span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x_list</span><span class="p">,</span> <span class="n">y_list</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Calculate Mean Squared Error (MSE).
 
        MSE = average of (prediction - actual)^2
 
        Lower loss = better predictions.
        </span><span class="sh">"""</span>
        <span class="n">total_error</span> <span class="o">=</span> <span class="mf">0.0</span>
 
        <span class="c1"># For each training example
</span>        <span class="k">for</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="ow">in</span> <span class="nf">zip</span><span class="p">(</span><span class="n">x_list</span><span class="p">,</span> <span class="n">y_list</span><span class="p">):</span>
            <span class="n">prediction</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
            <span class="n">error</span> <span class="o">=</span> <span class="n">prediction</span> <span class="o">-</span> <span class="n">y</span>
            <span class="n">squared_error</span> <span class="o">=</span> <span class="n">error</span> <span class="o">**</span> <span class="mi">2</span>
            <span class="n">total_error</span> <span class="o">+=</span> <span class="n">squared_error</span>
 
        <span class="c1"># Return the average
</span>        <span class="n">n</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">x_list</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">total_error</span> <span class="o">/</span> <span class="n">n</span>
 
    <span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x_list</span><span class="p">,</span> <span class="n">y_list</span><span class="p">,</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Train the model using gradient descent.
 
        lr = learning rate (how big a step to take)
        epochs = how many times to go through the data
 
        Gradient descent: repeatedly move in the direction of lower loss.
        </span><span class="sh">"""</span>
        <span class="n">n</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">x_list</span><span class="p">)</span>
 
        <span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">):</span>
            <span class="c1"># Calculate gradients (direction to move)
</span>            <span class="n">dw</span> <span class="o">=</span> <span class="mf">0.0</span>  <span class="c1"># Change in weight
</span>            <span class="n">db</span> <span class="o">=</span> <span class="mf">0.0</span>  <span class="c1"># Change in bias
</span> 
            <span class="c1"># Go through all training examples
</span>            <span class="k">for</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="ow">in</span> <span class="nf">zip</span><span class="p">(</span><span class="n">x_list</span><span class="p">,</span> <span class="n">y_list</span><span class="p">):</span>
                <span class="n">prediction</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
                <span class="n">error</span> <span class="o">=</span> <span class="n">prediction</span> <span class="o">-</span> <span class="n">y</span>
 
                <span class="c1"># Gradient formulas for linear regression
</span>                <span class="c1"># (These come from calculus, but we just use them)
</span>                <span class="n">dw</span> <span class="o">+=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">x</span> <span class="o">*</span> <span class="n">error</span>
                <span class="n">db</span> <span class="o">+=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">error</span>
 
            <span class="c1"># Average the gradients
</span>            <span class="n">dw</span> <span class="o">=</span> <span class="n">dw</span> <span class="o">/</span> <span class="n">n</span>
            <span class="n">db</span> <span class="o">=</span> <span class="n">db</span> <span class="o">/</span> <span class="n">n</span>
 
            <span class="c1"># Update: move opposite to the gradient
</span>            <span class="c1"># (Opposite = towards lower loss)
</span>            <span class="n">self</span><span class="p">.</span><span class="n">weight</span> <span class="o">-=</span> <span class="n">lr</span> <span class="o">*</span> <span class="n">dw</span>
            <span class="n">self</span><span class="p">.</span><span class="n">bias</span> <span class="o">-=</span> <span class="n">lr</span> <span class="o">*</span> <span class="n">db</span>
 
            <span class="c1"># Track that we did a training step
</span>            <span class="n">self</span><span class="p">.</span><span class="n">training_steps</span> <span class="o">+=</span> <span class="mi">1</span>
 
            <span class="c1"># Print progress every 20 epochs
</span>            <span class="nf">if </span><span class="p">(</span><span class="n">epoch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">20</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
                <span class="n">current_loss</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="nf">loss</span><span class="p">(</span><span class="n">x_list</span><span class="p">,</span> <span class="n">y_list</span><span class="p">)</span>
                <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Epoch </span><span class="si">{</span><span class="n">epoch</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s">: Loss = </span><span class="si">{</span><span class="n">current_loss</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
 
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        User-friendly representation.
 
        Used by print(model)
        </span><span class="sh">"""</span>
        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">LinearRegressionModel(weight=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">weight</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">, bias=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">bias</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span>
 
    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Developer-friendly representation.
 
        Used by repr(model) for debugging
        </span><span class="sh">"""</span>
        <span class="k">return</span> <span class="sa">f</span><span class="sh">"</span><span class="s">LinearRegressionModel(weight=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">weight</span><span class="si">}</span><span class="s">, bias=</span><span class="si">{</span><span class="n">self</span><span class="p">.</span><span class="n">bias</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span>
 
    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="sh">"""</span><span class="s">
        Return number of training steps.
 
        Used by len(model)
        </span><span class="sh">"""</span>
        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="n">training_steps</span>
 
 
<span class="c1"># Example: Train the model
</span><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span>
    <span class="c1"># Create some sample data
</span>    <span class="c1"># The true relationship is: y = 2x + 1 (with noise)
</span>    <span class="n">x_data</span> <span class="o">=</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">]</span>
    <span class="n">y_data</span> <span class="o">=</span> <span class="p">[</span><span class="mf">3.1</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">7.2</span><span class="p">,</span> <span class="mf">9.1</span><span class="p">,</span> <span class="mf">10.9</span><span class="p">]</span>
 
    <span class="c1"># Create the model
</span>    <span class="n">model</span> <span class="o">=</span> <span class="nc">LinearRegressionModel</span><span class="p">()</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Before training: </span><span class="si">{</span><span class="n">model</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">()</span>
 
    <span class="c1"># Train it
</span>    <span class="n">model</span><span class="p">.</span><span class="nf">train</span><span class="p">(</span><span class="n">x_data</span><span class="p">,</span> <span class="n">y_data</span><span class="p">,</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">()</span>
 
    <span class="c1"># See the result
</span>    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">After training: </span><span class="si">{</span><span class="n">model</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Total training steps: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">model</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">()</span>
 
    <span class="c1"># Make predictions
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Predictions:</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">x_data</span><span class="p">:</span>
        <span class="n">prediction</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s">: predicted y=</span><span class="si">{</span><span class="n">prediction</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="how-to-run-it">How to Run It</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python3 linear_regression.py
</code></pre></div></div>

<h3 id="expected-output">Expected Output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Before training: LinearRegressionModel(weight=0.0000, bias=0.0000)
 
Epoch 20: Loss = 8.4521
Epoch 40: Loss = 3.1842
Epoch 60: Loss = 1.5921
Epoch 80: Loss = 0.8521
Epoch 100: Loss = 0.5621
 
After training: LinearRegressionModel(weight=1.9821, bias=1.0342)
Total training steps: 100
 
Predictions:
x=1.0: predicted y=3.02
x=2.0: predicted y=5.00
x=3.0: predicted y=6.98
x=4.0: predicted y=8.97
x=5.0: predicted y=10.95
</code></pre></div></div>

<p>Notice: The model learned weight ≈ 2 and bias ≈ 1, which matches the true relationship <code class="language-plaintext highlighter-rouge">y = 2x + 1</code>!</p>

<hr />

<h2 id="whats-next-day-5---type-hints--documentation">What’s Next: Day 5 - Type Hints &amp; Documentation</h2>

<p>Day 5 introduces <strong>type hints</strong>: annotations that tell Python (and other programmers) what types variables and functions expect.</p>]]></content><author><name>Edward Praveen</name></author><category term="dl-llm-systems" /><category term="python" /><category term="oop" /><category term="object-oriented" /><category term="collections" /><category term="control-flow" /><category term="ml-engineering" /><summary type="html"><![CDATA[Part of my 180-day AI Engineering journey - explained for beginners]]></summary></entry></feed>