<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Notes on Omar Karim Chtioui</title>
    <link>https://blog.okch.pw/categories/notes/</link>
    <description>Recent content in Notes on Omar Karim Chtioui</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 22 May 2026 09:30:00 +0200</lastBuildDate>
    <atom:link href="https://blog.okch.pw/categories/notes/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The RAM that never came back</title>
      <link>https://blog.okch.pw/posts/the-ram-that-never-came-back/</link>
      <pubDate>Fri, 22 May 2026 09:30:00 +0200</pubDate>
      <guid>https://blog.okch.pw/posts/the-ram-that-never-came-back/</guid>
      <description>Colima quietly held onto every gigabyte I ever handed it. Chasing why led me to open a Lima issue — and eventually to OrbStack.</description>
      <content:encoded><![CDATA[<p>I run Docker on my Mac through <a href="https://github.com/abiosoft/colima">Colima</a>,
which sits on top of <a href="https://github.com/lima-vm/lima">Lima</a> and boots a
small Linux VM using Apple&rsquo;s Virtualization.framework (the <code>vz</code> backend).
For a long time it just worked. Then I started noticing my Mac getting
slower the longer my workday went on — and the cause turned out to be a bug
worth writing down.</p>
<h2 id="the-symptom">The symptom</h2>
<p>Build a few images, run a memory-hungry container, do some real work. At
some point the VM touches the memory ceiling I configured for it. Fine —
that&rsquo;s what the ceiling is for.</p>
<p>The problem is what happens <em>after</em>. The containers exit, the build
finishes, the guest goes quiet — and the host process backing the VM stays
pinned at that peak. In Activity Monitor it shows up as a process holding
several gigabytes of RAM with nothing running inside it. It never comes
back down. The only thing that frees the memory is restarting the VM.</p>
<p>So my &ldquo;8 GB Docker VM&rdquo; wasn&rsquo;t an 8 GB <em>budget</em>. It was an 8 GB <em>high-water
mark</em>, and once I hit it, those 8 GB were gone from the rest of the system
until I remembered to bounce Colima.</p>
<h2 id="what-i-expected">What I expected</h2>
<p>A fresh VM idles at maybe 800 MB–1 GB. My mental model was simple: memory
usage should track what the guest is actually doing. Spike under load,
settle back down when idle. That&rsquo;s how a process is supposed to behave, and
it&rsquo;s how the VM&rsquo;s <em>guest</em> behaves — Linux inside the VM frees the pages
just fine.</p>
<p>The host side is what doesn&rsquo;t let go.</p>
<h2 id="opening-the-issue">Opening the issue</h2>
<p>I went looking and found I wasn&rsquo;t the first to hit this — there was already
a discussion describing the exact behavior I was seeing. It matched closely
enough that I opened a tracking issue on Lima to get it properly on the
radar:</p>
<blockquote>
<p><a href="https://github.com/lima-vm/lima/issues/2789"><strong>lima-vm/lima#2789</strong> — Memory is not being freed on VZ</a></p>
</blockquote>
<p>What I learned from the thread is the interesting part.</p>
<h2 id="why-it-happens">Why it happens</h2>
<p>The first answer from a maintainer reframed it for me: it is, in a sense,
normal for memory you <em>dedicate</em> to a virtual machine to stay locked. A VM
isn&rsquo;t an ordinary process; the host commits that memory to it. The
mechanism that would let it shrink back has a name,
<a href="https://en.wikipedia.org/wiki/Memory_ballooning">memory ballooning</a>, and
it isn&rsquo;t on by default.</p>
<p>The next answer was blunter, and the one that actually explained my Mac:
this is a <strong>macOS bug</strong>. It&rsquo;s not specific to Lima or Colima. Docker
Desktop exhibits the same thing, because they all sit on the same Apple
framework. The one tool that <em>doesn&rsquo;t</em> — <a href="https://orbstack.dev/">OrbStack</a> —
got there by building its own dynamic memory management.</p>
<p>OrbStack did write a <a href="https://orbstack.dev/blog/dynamic-memory">blog post about it</a>,
and it&rsquo;s worth reading — but notice what it doesn&rsquo;t say. It explains <em>that</em>
the VM&rsquo;s footprint grows and shrinks on demand, and why that matters; it
doesn&rsquo;t really explain <em>how</em> they pulled it off on the very same Apple
framework that leaves everyone else stuck. The post is about the result,
not the recipe. Whatever they&rsquo;re doing, they&rsquo;re keeping it to themselves —
which is fair, but it means the rest of the ecosystem can&rsquo;t just copy it.</p>
<h2 id="the-balloon-that-wont-deflate">The balloon that won&rsquo;t deflate</h2>
<p>Apple&rsquo;s framework <em>does</em> expose a balloon device
(<code>VZVirtioTraditionalMemoryBalloonDevice</code>), and Lima even creates one. In
theory you shrink the VM&rsquo;s footprint by lowering its
<code>targetVirtualMachineMemorySize</code> while it runs: the guest hands unused
pages back, the host reclaims them.</p>
<p>In practice, on the <code>vz</code> backend, it doesn&rsquo;t reclaim anything. My short
contribution to the thread was exactly that observation — the device is
<em>available</em>, it just doesn&rsquo;t <em>work</em>.</p>
<p>That&rsquo;s not me guessing anymore. Over a year later, someone posted an
instrumented test on macOS 26: a tiny VM, a controlled allocation, host RSS
sampled once a second. The guest cooperates fully and frees its memory; the
host process&rsquo;s memory footprint is <strong>monotonic across the whole run — it
never decreases</strong>. They&rsquo;ve since filed an Apple Feedback report. The issue
is still open.</p>
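<p>A measurement of that shape is easy to approximate. The sketch below is not the
test from the thread; it assumes <code>ps -o rss= -p PID</code> is available (it is on
macOS and Linux) and that you look up the PID of the host process backing the VM
yourself. The function names are mine. RSS is only a proxy here, and Activity
Monitor&rsquo;s number may not match it exactly.</p>

```python
import subprocess
import time


def sample_rss_kb(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, as reported by `ps`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def collect(pid: int, count: int, interval: float = 1.0) -> list[int]:
    """Take `count` RSS samples, sleeping `interval` seconds between them."""
    samples = []
    for _ in range(count):
        samples.append(sample_rss_kb(pid))
        time.sleep(interval)
    return samples


def never_decreases(samples: list[int]) -> bool:
    """True if every sample is >= the one before it."""
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))
```

<p>Run <code>collect()</code> while the guest allocates and then frees memory. If
<code>never_decreases()</code> still returns <code>True</code> once the guest is idle, the
host side is not giving anything back.</p>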
<h2 id="where-i-landed">Where I landed</h2>
<p>Knowing the root cause didn&rsquo;t fix my Mac. The realistic options were:</p>
<ul>
<li><strong>Restart the VM periodically</strong> to reclaim memory — a chore, and easy to
forget until the machine is already crawling.</li>
<li><strong>Cap the VM&rsquo;s memory low enough</strong> that the locked amount doesn&rsquo;t hurt —
which just trades one problem for another, since now heavy builds are
starved.</li>
<li><strong>Use the tool that solved it.</strong></li>
</ul>
<p>I switched to OrbStack. Its dynamic memory management is the entire feature
I was missing: the footprint expands under load and <em>actually contracts</em>
when the work is done. For the way I use containers on a laptop, that&rsquo;s not
a nice-to-have — it&rsquo;s the whole point.</p>
<h2 id="takeaways">Takeaways</h2>
<ol>
<li><strong>&ldquo;Locked&rdquo; and &ldquo;leaked&rdquo; look identical in Activity Monitor.</strong> A VM
holding its peak forever isn&rsquo;t leaking; it&rsquo;s a platform that never
reclaims. Same symptom, different cause — and the fix is different too.</li>
<li><strong>A feature existing in the SDK doesn&rsquo;t mean it works.</strong> Lima creates
the balloon device; the balloon still doesn&rsquo;t deflate. &ldquo;Available&rdquo; is
not &ldquo;functional.&rdquo;</li>
<li><strong>It&rsquo;s worth filing the issue even when it&rsquo;s not the project&rsquo;s fault.</strong>
#2789 isn&rsquo;t a Lima bug, strictly — but having a public, linkable thread
is how the next person finds the explanation in twenty minutes instead
of a week.</li>
</ol>
<p>The RAM still doesn&rsquo;t come back on <code>vz</code>. But at least now I know why — and
I&rsquo;m running something that doesn&rsquo;t need it to.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Instagram was slow on my Wi-Fi. The cause wasn&#39;t what I thought.</title>
      <link>https://blog.okch.pw/posts/instagram-slow-wifi/</link>
      <pubDate>Fri, 22 May 2026 08:00:00 +0200</pubDate>
      <guid>https://blog.okch.pw/posts/instagram-slow-wifi/</guid>
      <description>Instagram felt sluggish only on Wi-Fi. I blamed IPv6 and NextDNS, and was wrong. Here&#39;s how I diagnosed it with data instead of vibes.</description>
      <content:encoded><![CDATA[<p>My setup: Sky Wifi FTTH in Italy, Keenetic Titan (KN-1811) plugged directly
into the ONT and speaking MAP-T natively, NextDNS configured as the content
filter on the router. Instagram on my phone felt sluggish on Wi-Fi while
everything else — web browsing, video streaming, work tools — seemed fine.</p>
<p>First instinct: blame IPv6.</p>
<p>It was a reasonable instinct, and it was wrong. Here&rsquo;s how I worked through it.</p>
<h2 id="the-plausible-hypothesis-nextdns-over-ipv6">The plausible hypothesis: NextDNS over IPv6</h2>
<p>Three facts were stacked in favor of an IPv6-related explanation:</p>
<ol>
<li>
<p><strong>NextDNS has a documented history of slow resolution or timeouts over IPv6
from certain regions.</strong> The <a href="https://help.nextdns.io/t/g9yqkyq/timeouts-and-bad-performance-on-nextdns-with-ipv6">help center thread</a>
has reports from users in New Zealand, the Netherlands, and elsewhere
describing 10-second resolution stalls that disappear the moment IPv6 is
disabled.</p>
</li>
<li>
<p><strong>Sky Wifi is essentially an IPv6-native network.</strong> Sky adopted MAP-T to
deal with IPv4 exhaustion: the router statelessly translates IPv4 packets
into IPv6 (MAP-T translates; its sibling MAP-E is the one that encapsulates)
and sends them across Sky&rsquo;s network, where the carrier translates them back
and maps them to a shared pool of IPv4 addresses on egress. My Keenetic shows
the MAP-T address explicitly
(<code>198.51.100.42</code>, mask <code>255.255.255.255</code>) and a delegated IPv6 prefix
(<code>2001:db8:abcd::/48</code>).</p>
</li>
<li>
<p><strong>Instagram resolves to IPv6 by default.</strong> A quick <code>nslookup</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">$ nslookup -type=AAAA instagram.com
</span></span><span class="line"><span class="cl">instagram.com  has AAAA address 2a03:2880:f26d:e9:face:b00c:0:4420
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">$ nslookup -type=A instagram.com
</span></span><span class="line"><span class="cl">Address: 157.240.203.174
</span></span></code></pre></div><p>That <code>face:b00c</code> is Meta&rsquo;s signature. Modern phones use Happy Eyeballs:
when both A and AAAA records resolve in reasonable time, they almost always
prefer the AAAA result. So Instagram traffic from my phone goes over IPv6
end-to-end.</p>
</li>
</ol>
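<p>The preference in item 3 can be sketched as an ordering step. This is only a
simplified illustration, not what any particular phone runs: real Happy Eyeballs
(RFC 8305) also races connection attempts, giving the preferred family a short
head start. The function name is hypothetical.</p>

```python
import ipaddress


def interleave_by_family(addresses: list[str]) -> list[str]:
    """Order candidates IPv6 first, then alternate families while both remain."""
    v6 = [a for a in addresses if ipaddress.ip_address(a).version == 6]
    v4 = [a for a in addresses if ipaddress.ip_address(a).version == 4]
    ordered = []
    while v6 or v4:
        if v6:
            ordered.append(v6.pop(0))
        if v4:
            ordered.append(v4.pop(0))
    return ordered


candidates = ["157.240.203.174", "2a03:2880:f26d:e9:face:b00c:0:4420"]
print(interleave_by_family(candidates))
```

<p>With the two Instagram records above, the AAAA address comes out first, so it
is the one a client tries first.</p>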
<p>Chain them and you get: phone → DNS lookup of AAAA via NextDNS over IPv6 →
(if slow) → stalled page loads of Instagram specifically, while IPv4-only or
mixed sites feel fine. The shape of the bug matched the symptom.</p>
<p>Time to test it.</p>
<h2 id="the-data-nextdns-diag">The data: NextDNS diag</h2>
<p>NextDNS ships a CLI diagnostic that pings every PoP over both IPv4 and IPv6,
traces routes, and posts a shareable report:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sh -c <span class="s2">&#34;</span><span class="k">$(</span>curl -s https://nextdns.io/diag<span class="k">)</span><span class="s2">&#34;</span>
</span></span></code></pre></div><p>My report came back like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Testing IPv6 connectivity
</span></span><span class="line"><span class="cl">  available: true
</span></span><span class="line"><span class="cl">Fetching https://test.nextdns.io
</span></span><span class="line"><span class="cl">  status: ok
</span></span><span class="line"><span class="cl">  protocol: DOH
</span></span><span class="line"><span class="cl">  server: anexia-mil-1
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Fetching PoP name for ultra low latency primary IPv4 (ipv4.dns1.nextdns.io)
</span></span><span class="line"><span class="cl">  zepto-mil: 19.361ms
</span></span><span class="line"><span class="cl">Fetching PoP name for ultra low latency primary IPv6 (ipv6.dns1.nextdns.io)
</span></span><span class="line"><span class="cl">  zepto-mil: 18.738ms
</span></span><span class="line"><span class="cl">Fetching PoP name for anycast primary IPv4 (45.90.28.0)
</span></span><span class="line"><span class="cl">  zepto-mil: 18.734ms
</span></span><span class="line"><span class="cl">Fetching PoP name for anycast primary IPv6 (2a07:a8c0::)
</span></span><span class="line"><span class="cl">  zepto-mil: 18.314ms
</span></span></code></pre></div><p>In the excerpt above, all four probes landed on the same Milan PoP at roughly
18–19 ms over both stacks. No fetch errors, no timeouts, no packet loss. IPv6
latency was, if anything, a hair <em>better</em> than IPv4. The test fetch over DoH
against <code>anexia-mil-1</code> succeeded cleanly.</p>
<p>DNS was not the problem. Hypothesis dead.</p>
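<p>To compare the two stacks without eyeballing the numbers, the PoP lines can be
parsed. This is a rough sketch that assumes the report keeps the two-line layout
shown in the excerpt above; the regular expressions and function name are mine,
not part of the NextDNS tooling.</p>

```python
import re

REPORT = """\
Fetching PoP name for ultra low latency primary IPv4 (ipv4.dns1.nextdns.io)
  zepto-mil: 19.361ms
Fetching PoP name for ultra low latency primary IPv6 (ipv6.dns1.nextdns.io)
  zepto-mil: 18.738ms
Fetching PoP name for anycast primary IPv4 (45.90.28.0)
  zepto-mil: 18.734ms
Fetching PoP name for anycast primary IPv6 (2a07:a8c0::)
  zepto-mil: 18.314ms
"""

HEADER = re.compile(r"^Fetching PoP name for (?P<kind>.+?) (?P<family>IPv[46]) \(")
RESULT = re.compile(r"^\s+(?P<pop>[\w-]+): (?P<ms>[\d.]+)ms$")


def parse_latencies(report: str) -> dict:
    """Map (probe kind, address family) to (PoP name, latency in ms)."""
    results = {}
    current = None
    for line in report.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header["kind"], header["family"])
            continue
        result = RESULT.match(line)
        if result and current:
            results[current] = (result["pop"], float(result["ms"]))
            current = None
    return results


latencies = parse_latencies(REPORT)
for kind in ("ultra low latency primary", "anycast primary"):
    v4_ms = latencies[(kind, "IPv4")][1]
    v6_ms = latencies[(kind, "IPv6")][1]
    print(f"{kind}: IPv6 - IPv4 = {v6_ms - v4_ms:+.3f} ms")
```

<p>For this excerpt both differences come out negative, about 0.6 ms and 0.4 ms in
favor of IPv6.</p>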
<h2 id="brief-detour-the-ipv6-prefix-red-herring">Brief detour: the IPv6 prefix red herring</h2>
<p>While reading the diag I got briefly distracted by my IPv6 prefix
(<code>2001:db8:abcd::/...</code>) and the upstream hop (<code>2001:db8:f00d::1</code>) — I didn&rsquo;t
recognize them as belonging to Sky, and floated the theory that maybe a VPN
or tunnel was in the path. The Keenetic dashboard quickly killed that: the
connection is labeled &ldquo;Ethernet / MAP-T&rdquo; and the prefix is just Sky/Open Fiber&rsquo;s
allocation. Nothing exotic. Lesson re-learned: unrecognized prefix ≠ suspicious
prefix.</p>
<h2 id="what-was-actually-wrong-24-ghz-wi-fi">What was actually wrong: 2.4 GHz Wi-Fi</h2>
<figure>
    <img loading="lazy" src="instagram-waiting-skeleton.jpg"
         alt="The &#39;waiting skeleton&#39; meme — a skeleton sitting on a bench as if it has been waiting forever — captioned &#39;WAITING FOR INSTAGRAM TO LOAD OVER 2.4 GHz WI-FI&#39;"/> <figcaption>
            <p>The lived experience, dramatized.</p>
        </figcaption>
</figure>

<p>With DNS ruled out and the WAN side clean, the remaining suspects were the
Wi-Fi link and the Meta-side IPv6 path. The Keenetic dashboard made the Wi-Fi
issue obvious:</p>
<ul>
<li><strong>2.4 GHz: Channel 4, width 40 MHz.</strong> This is the big one. The 2.4 GHz band
has 11 channels in North America, 13 in Europe, but only <strong>three
non-overlapping 20 MHz channels: 1, 6, and 11</strong>. A 40 MHz wide channel
consumes two of those three slots, so in any building with neighbors running
Wi-Fi you&rsquo;ll collide with most of them. In an Italian apartment block,
that&rsquo;s a lot of collisions and a lot of retransmits.</li>
<li><strong>Same SSID on both 2.4 and 5 GHz with band steering set to &ldquo;By default&rdquo;.</strong>
&ldquo;By default&rdquo; in Keenetic is passive — it suggests, doesn&rsquo;t enforce. Phones,
especially after a sleep/wake cycle in a marginal 5 GHz spot, can settle on
2.4 GHz and stay there even after returning to a strong 5 GHz area.</li>
<li><strong>5 GHz on Channel 104, an 80 MHz DFS channel.</strong> DFS channels share spectrum
with weather and aviation radar. When the router detects radar, it has to
vacate the channel and clients are briefly dropped — devices that reconnect
in that window may land on 2.4 GHz and stick.</li>
<li><strong>15 wireless clients connected.</strong> Some inevitably on 2.4 GHz, all contending
for the same airtime.</li>
</ul>
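<p>The channel arithmetic behind the first bullet can be checked with a few lines.
This is a back-of-the-envelope sketch with assumptions stated up front: channel
<code>n</code> is centered at 2407 + 5n MHz, each channel is treated as exactly
20 MHz wide (real spectral masks are wider), and a 40 MHz channel is modeled as
two bonded 20 MHz channels four channel numbers apart. The helper names are
hypothetical.</p>

```python
def center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz channel (valid for channels 1-13)."""
    return 2407 + 5 * channel


def span_mhz(channels: list[int]) -> tuple[int, int]:
    """Range covered by one 20 MHz channel, or by two bonded into 40 MHz."""
    centers = [center_mhz(c) for c in channels]
    return min(centers) - 10, max(centers) + 10


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two frequency ranges share any spectrum."""
    return a[0] < b[1] and b[0] < a[1]


def clean_channels_hit(channels: list[int]) -> list[int]:
    """Which of the non-overlapping channels 1, 6, 11 the given setup touches."""
    wide = span_mhz(channels)
    return [c for c in (1, 6, 11) if overlaps(wide, span_mhz([c]))]


print(clean_channels_hit([1, 5]))  # [1, 6]
print(clean_channels_hit([4, 8]))  # [1, 6, 11]
```

<p>Bonding channels 1 and 5 covers two of the three clean slots. Under the same
assumptions, a 40 MHz channel on primary channel 4 has to extend upward to
channel 8 (there is no channel 0), which puts it in the middle of the band where
it clips the edges of all three.</p>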
<p>Instagram is exactly the kind of workload that suffers on a degraded link: lots
of TLS handshakes to image and video servers, large transfers per request,
intolerance to latency spikes. Meanwhile DNS, small requests, and even
single-stream video can paper over a flaky radio because their packets are
small or their transports buffer aggressively.</p>
<h2 id="the-fix">The fix</h2>
<p>Three changes:</p>
<ol>
<li><strong>2.4 GHz channel width: 40 MHz → 20 MHz</strong>, channel set to 1, 6, or 11
based on the <code>Wi-Fi Monitor</code> neighbor scan.</li>
<li><strong>Band steering: <em>By default</em> → <em>Prefer 5 GHz</em></strong> (the more aggressive
option in Keenetic). For known-troublesome clients, the alternative is to
split into two SSIDs and pin them to 5 GHz manually.</li>
<li><strong>5 GHz channel: 104 → 36</strong> (non-DFS), accepting slightly more potential
congestion in exchange for no radar evictions.</li>
</ol>
<p><code>802.11r/k/v</code> were already enabled, which is the right default for roaming
assist; the radio config was the problem, not the roaming logic.</p>
<figure>
    <img loading="lazy" src="keenetic-roaming-band-steering.png"
         alt="Keenetic Titan &#39;Roaming for Wireless Clients&#39; page: Fast transition (802.11r) enabled for both 2.4 and 5 GHz, Radio Resource &amp; BSS Transition Management (802.11k/v) checked, and Band Steering set to Prefer 5 GHz"/> <figcaption>
            <p>The roaming page after the fix: 802.11r/k/v left on, with band steering switched to <em>Prefer 5 GHz</em>.</p>
        </figcaption>
</figure>

<p>Result: Instagram returned to normal.</p>
<h2 id="takeaways">Takeaways</h2>
<ol>
<li><strong>Diagnose with data, not vibes.</strong> The IPv6/NextDNS theory was plausible
and ultimately wrong. Twenty seconds of <code>nextdns/diag</code> saved hours of
disabling IPv6 across the LAN and chasing nothing.</li>
<li><strong>DNS healthy ≠ network healthy.</strong> A clean DNS path tells you nothing
about the Wi-Fi link, the MTU on the WAN, or the CDN routing from ISP to
destination.</li>
<li><strong>&ldquo;Same SSID + band steering: By default&rdquo; is not a guarantee</strong> that
dual-band clients land on 5 GHz. Verify in the client list which radio
each device is on. Don&rsquo;t assume the router is making the right choice.</li>
<li><strong>40 MHz on 2.4 GHz is almost always wrong</strong> in a multi-tenant
environment. The default in some router UIs; an antipattern in practice.</li>
</ol>
<p>The IPv6 angle wasn&rsquo;t crazy — it just wasn&rsquo;t this story.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Notes on Keeping a Blog</title>
      <link>https://blog.okch.pw/posts/notes-on-keeping-a-blog/</link>
      <pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate>
      <guid>https://blog.okch.pw/posts/notes-on-keeping-a-blog/</guid>
      <description>A few principles I want to hold onto so this site doesn&#39;t go stale again.</description>
      <content:encoded><![CDATA[<p>Every personal site I&rsquo;ve ever started eventually went quiet. Here are the
rules I&rsquo;m writing down so this one doesn&rsquo;t.</p>
<h2 id="lower-the-bar">Lower the bar</h2>
<p>A post does not need to be an essay. A useful link, a fixed bug, a paragraph
of thinking — all of it counts. The blog is a notebook, not a magazine.</p>
<h2 id="publish-before-its-perfect">Publish before it&rsquo;s perfect</h2>
<p>Drafts that wait for polish never ship. I&rsquo;d rather publish something rough
and revise it in place than hoard it in a folder forever.</p>
<h2 id="write-for-one-reader">Write for one reader</h2>
<p>Usually that reader is me, six months from now, having forgotten everything.
If a post would help that person, it&rsquo;s worth writing.</p>
<h2 id="keep-the-machinery-boring">Keep the machinery boring</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">hugo new content posts/some-idea.md
</span></span><span class="line"><span class="cl"><span class="c1"># write</span>
</span></span><span class="line"><span class="cl">git commit -am <span class="s2">&#34;post: some idea&#34;</span> <span class="o">&amp;&amp;</span> git push
</span></span></code></pre></div><p>Three steps from idea to live. The moment publishing gets complicated, the
writing stops.</p>
<hr>
<p>That&rsquo;s the whole system. Check back and see whether I stuck to it.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
