Notes on Omar Karim Chtioui

The RAM that never came back

Fri, 22 May 2026 09:30:00 +0200

I run Docker on my Mac through Colima, which sits on top of Lima and boots a small Linux VM using Apple’s Virtualization.framework (the vz backend). For a long time it just worked. Then I started noticing my Mac getting slower the longer my workday went on — and the cause turned out to be a bug worth writing down.

The symptom

Build a few images, run a memory-hungry container, do some real work. At some point the VM touches the memory ceiling I configured for it. Fine — that’s what the ceiling is for.

The problem is what happens after. The containers exit, the build finishes, the guest goes quiet — and the host process backing the VM stays pinned at that peak. In Activity Monitor it shows up as a process holding several gigabytes of RAM with nothing running inside it. It never comes back down. The only thing that frees the memory is restarting the VM.

So my “8 GB Docker VM” wasn’t an 8 GB budget. It was an 8 GB high-water mark, and once I hit it, those 8 GB were gone from the rest of the system until I remembered to bounce Colima.

What I expected

A fresh VM idles at maybe 800 MB–1 GB. My mental model was simple: memory usage should track what the guest is actually doing. Spike under load, settle back down when idle. That’s how a process is supposed to behave, and it’s how the VM’s guest behaves — Linux inside the VM frees the pages just fine.

The host side is what doesn’t let go.

Opening the issue

I went looking and found I wasn’t the first to hit this — there was already a discussion describing the exact behavior I was seeing. It matched closely enough that I opened a tracking issue on Lima to get it properly on the radar:

lima-vm/lima#2789 — Memory is not being freed on VZ

What I learned from the thread is the interesting part.

Why it happens

The first answer from a maintainer reframed it for me: it is, in a sense, normal for memory dedicated to a virtual machine to stay locked. A VM isn’t an ordinary process; the host commits that memory to it. The mechanism that would let it shrink back has a name, memory ballooning, and it isn’t on by default.

The next answer was blunter, and the one that actually explained my Mac: this is a macOS bug. It’s not specific to Lima or Colima. Docker Desktop exhibits the same thing, because they all sit on the same Apple framework. The one tool that doesn’t — OrbStack — got there by building its own dynamic memory management.

OrbStack did write a blog post about it, and it’s worth reading — but notice what it doesn’t say. It explains that the VM’s footprint grows and shrinks on demand, and why that matters; it doesn’t really explain how they pulled it off on the very same Apple framework that leaves everyone else stuck. The post is about the result, not the recipe. Whatever they’re doing, they’re keeping it to themselves — which is fair, but it means the rest of the ecosystem can’t just copy it.

The balloon that won’t deflate

Apple’s framework does expose a balloon device (VZVirtioTraditionalMemoryBalloonDevice), and Lima even creates one. In theory you shrink the VM’s footprint by lowering the device’s targetVirtualMachineMemorySize while the VM runs: the guest’s balloon driver hands unused pages back, and the host reclaims them.

In practice, on the vz backend, it doesn’t reclaim anything. My short contribution to the thread was exactly that observation — the device is available, it just doesn’t work.

That’s not me guessing anymore. Over a year later, someone posted an instrumented test on macOS 26: a tiny VM, a controlled allocation, host RSS sampled once a second. The guest cooperates fully and frees its memory; the host process’s memory footprint is monotonic across the whole run — it never decreases. They’ve since filed an Apple Feedback report. The issue is still open.
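
If you want to check the same shape on your own machine, here is a minimal sketch of that kind of measurement (not the script from the thread): sample the host process’s RSS once a second with `ps` and ask whether the series ever goes down. It assumes you already know the PID of the process backing the VM (the process name varies by tool; Activity Monitor will show which one is holding the memory), and RSS is close to, but not the same as, the footprint Activity Monitor reports. The function names are my own.

```python
import subprocess
import time


def rss_kib(pid: int) -> int:
    """Resident set size of `pid` in KiB, as reported by `ps -o rss=`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def sample(pid: int, count: int, interval_s: float = 1.0) -> list[int]:
    """Take `count` RSS samples, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        samples.append(rss_kib(pid))
    return samples


def never_decreases(samples: list[int]) -> bool:
    """True if every sample is >= the one before it, i.e. a monotonic footprint."""
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))


def reclaimed_kib(samples: list[int]) -> int:
    """How far the last sample sits below the peak; 0 means nothing came back."""
    return max(samples) - samples[-1]
```

Run `series = sample(pid, 300)` while you allocate and then free memory inside the guest. On a host that reclaims, `reclaimed_kib(series)` should be well above zero once the guest goes idle; the behavior described in the thread corresponds to `never_decreases(series)` staying true for the whole run.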

Where I landed

Knowing the root cause didn’t fix my Mac. The realistic options were:

Restart the VM periodically to reclaim memory — a chore, and easy to forget until the machine is already crawling.
Cap the VM’s memory low enough that the locked amount doesn’t hurt — which just trades one problem for another, since now heavy builds are starved.
Use the tool that solved it.

I switched to OrbStack. Its dynamic memory management is the entire feature I was missing: the footprint expands under load and actually contracts when the work is done. For the way I use containers on a laptop, that’s not a nice-to-have — it’s the whole point.

Takeaways

“Locked” and “leaked” look identical in Activity Monitor. A VM holding its peak forever isn’t leaking; it’s a platform that never reclaims. Same symptom, different cause — and the fix is different too.
A feature existing in the SDK doesn’t mean it works. Lima creates the balloon device; the balloon still doesn’t deflate. “Available” is not “functional.”
It’s worth filing the issue even when it’s not the project’s fault. #2789 isn’t a Lima bug, strictly — but having a public, linkable thread is how the next person finds the explanation in twenty minutes instead of a week.

The RAM still doesn’t come back on vz. But at least now I know why — and I’m running something that doesn’t need it to.

Instagram was slow on my Wi-Fi. The cause wasn't what I thought.

Fri, 22 May 2026 08:00:00 +0200

My setup: Sky Wifi FTTH in Italy, Keenetic Titan (KN-1811) plugged directly into the ONT and speaking MAP-T natively, NextDNS configured as the content filter on the router. Instagram on my phone felt sluggish on Wi-Fi while everything else — web browsing, video streaming, work tools — seemed fine.

First instinct: blame IPv6.

It was a reasonable instinct, and it was wrong. Here’s how I worked through it.

The plausible hypothesis: NextDNS over IPv6

Three facts were stacked in favor of an IPv6-related explanation:

NextDNS has a documented history of slow resolution or timeouts over IPv6 from certain regions. The help center thread has reports from users in New Zealand, the Netherlands, and elsewhere describing 10-second resolution stalls that disappear the moment IPv6 is disabled.
Sky Wifi is essentially an IPv6-native network. Sky adopted MAP-T to deal with IPv4 exhaustion: the router statelessly translates IPv4 packets into IPv6 (MAP-T rewrites headers rather than encapsulating, which is what distinguishes it from MAP-E), they travel across Sky’s IPv6 network, and the carrier translates them back to IPv4 on egress, drawing on a shared pool of IPv4 addresses. My Keenetic shows the MAP-T address explicitly (198.51.100.42, mask 255.255.255.255) and a delegated IPv6 prefix (2001:db8:abcd::/48).
Instagram resolves to IPv6 by default. A quick nslookup:
```
$ nslookup -type=AAAA instagram.com
instagram.com  has AAAA address 2a03:2880:f26d:e9:face:b00c:0:4420

$ nslookup -type=A instagram.com
Address: 157.240.203.174
```
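That face:b00c is Meta’s signature. Modern phones use Happy Eyeballs (RFC 8305): when both A and AAAA records come back, the client tries the IPv6 address first and only starts an IPv4 attempt if IPv6 hasn’t connected after a short delay (the RFC recommends 250 ms). On a healthy dual-stack link IPv6 wins nearly every time, so Instagram traffic from my phone goes over IPv6 end-to-end.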
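
Python’s asyncio happens to implement Happy Eyeballs, so you can see which family wins from a laptop on the same network. A minimal sketch: `interleave_families` shows the address ordering RFC 8305 recommends (alternate families, IPv6 first), and `winning_family` lets asyncio race real connections with the RFC’s suggested 250 ms head start. Both function names are mine, and a laptop’s stack only approximates what the phone’s own implementation does.

```python
import asyncio
import socket
from itertools import zip_longest


def interleave_families(infos: list[tuple]) -> list[tuple]:
    """Order getaddrinfo() results IPv6-first, alternating families (RFC 8305, section 4)."""
    v6 = [info for info in infos if info[0] == socket.AF_INET6]
    v4 = [info for info in infos if info[0] == socket.AF_INET]
    ordered = []
    for six, four in zip_longest(v6, v4):
        ordered.extend(info for info in (six, four) if info is not None)
    return ordered


async def winning_family(host: str, port: int = 443) -> str:
    """Connect with asyncio's built-in Happy Eyeballs and report which family won."""
    # happy_eyeballs_delay: seconds to wait before racing the next address;
    # 0.25 matches the 250 ms that RFC 8305 recommends.
    _, writer = await asyncio.open_connection(host, port, happy_eyeballs_delay=0.25)
    try:
        family = writer.get_extra_info("socket").family
        return "IPv6" if family == socket.AF_INET6 else "IPv4"
    finally:
        writer.close()
        await writer.wait_closed()
```

`asyncio.run(winning_family("instagram.com"))` returns "IPv6" or "IPv4" depending on which connection completed first.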

Chain them and you get: phone → DNS lookup of AAAA via NextDNS over IPv6 → (if slow) → stalled page loads of Instagram specifically, while IPv4-only or mixed sites feel fine. The shape of the bug matched the symptom.

Time to test it.

The data: NextDNS diag

NextDNS ships a CLI diagnostic that pings every PoP over both IPv4 and IPv6, traces routes, and posts a shareable report:

```
sh -c "$(curl -s https://nextdns.io/diag)"
```

My report came back like this:

```
Testing IPv6 connectivity
  available: true
Fetching https://test.nextdns.io
  status: ok
  protocol: DOH
  server: anexia-mil-1

Fetching PoP name for ultra low latency primary IPv4 (ipv4.dns1.nextdns.io)
  zepto-mil: 19.361ms
Fetching PoP name for ultra low latency primary IPv6 (ipv6.dns1.nextdns.io)
  zepto-mil: 18.738ms
Fetching PoP name for anycast primary IPv4 (45.90.28.0)
  zepto-mil: 18.734ms
Fetching PoP name for anycast primary IPv6 (2a07:a8c0::)
  zepto-mil: 18.314ms
```

All four PoPs landed in Milan at 18–22 ms over both stacks. No fetch errors, no timeouts, no packet loss. IPv6 latency was if anything a hair better than IPv4 in some pings. Test fetch over DoH against anexia-mil-1 succeeded cleanly.

DNS was not the problem. Hypothesis dead.

Brief detour: the IPv6 prefix red herring

While reading the diag I got briefly distracted by my IPv6 prefix (2001:db8:abcd::/...) and the upstream hop (2001:db8:f00d::1) — I didn’t recognize them as belonging to Sky, and floated the theory that maybe a VPN or tunnel was in the path. The Keenetic dashboard quickly killed that: the connection is labeled “Ethernet / MAP-T” and the prefix is just Sky/Open Fiber’s allocation. Nothing exotic. Lesson re-learned: unrecognized prefix ≠ suspicious prefix.

What was actually wrong: 2.4 GHz Wi-Fi

The lived experience, dramatized.

With DNS ruled out and the WAN side clean, the remaining suspects were the Wi-Fi link and the Meta-side IPv6 path. The Keenetic dashboard made the Wi-Fi issue obvious:

2.4 GHz: Channel 4, width 40 MHz. This is the big one. The 2.4 GHz band has 11 channels in North America, 13 in Europe, but only three non-overlapping 20 MHz channels: 1, 6, and 11. A 40 MHz wide channel consumes two of those three slots, so in any building with neighbors running Wi-Fi you’ll collide with most of them. In an Italian apartment block, that’s a lot of collisions and a lot of retransmits.
Same SSID on both 2.4 and 5 GHz with band steering set to “By default”. “By default” in Keenetic is passive — it suggests, doesn’t enforce. Phones, especially after a sleep/wake cycle in a marginal 5 GHz spot, can settle on 2.4 GHz and stay there even after returning to a strong 5 GHz area.
5 GHz on Channel 104, an 80 MHz DFS channel. DFS channels share spectrum with weather and aviation radar. When the router detects radar, it has to vacate the channel and clients are briefly dropped — devices that reconnect in that window may land on 2.4 GHz and stick.
15 wireless clients connected. Some inevitably on 2.4 GHz, all contending for the same airtime.
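
The 1/6/11 rule falls out of simple arithmetic: 2.4 GHz channel centers sit only 5 MHz apart (channel n is centered at 2407 + 5n MHz), while each transmission is roughly 20 to 22 MHz wide. A minimal sketch, assuming the classic 22 MHz channel mask the three-channel rule is based on; the helper names are mine.

```python
def center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz channel 1-13 (channel 14 is a special case, not handled)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers channels 1-13")
    return 2407 + 5 * channel


def overlaps(a: int, b: int, width_mhz: int = 22) -> bool:
    """Two channels overlap if their centers are closer than one channel width."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz


def non_overlapping(max_channel: int, width_mhz: int = 22) -> list[int]:
    """Greedily pick channels, starting at 1, that overlap none already picked."""
    picked: list[int] = []
    for ch in range(1, max_channel + 1):
        if all(not overlaps(ch, p, width_mhz) for p in picked):
            picked.append(ch)
    return picked
```

`non_overlapping(11)` and `non_overlapping(13)` both give `[1, 6, 11]`, and `overlaps(4, 1)` and `overlaps(4, 6)` are both true: channel 4 sits between two of the three clean slots and collides with both.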

Instagram is exactly the kind of workload that suffers on a degraded link: lots of TLS handshakes to image and video servers, large transfers per request, intolerance to latency spikes. Meanwhile DNS, small requests, and even single-stream video can paper over a flaky radio because their packets are small or their transports buffer aggressively.

The fix

Three changes:

2.4 GHz channel width: 40 MHz → 20 MHz, with the channel set to 1, 6, or 11 based on the Wi-Fi Monitor neighbor scan.
Band steering: By default → Prefer 5 GHz (the more aggressive option in Keenetic). For known-troublesome clients, the alternative is to split into two SSIDs and pin them to 5 GHz manually.
5 GHz channel: 104 → 36 (non-DFS), accepting slightly more potential congestion in exchange for no radar evictions.
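
Whether a setting is DFS depends on the whole bonded block, not just the primary channel number the UI shows. A minimal sketch, assuming ETSI (EU) rules where channels 52 to 64 and 100 to 140 require DFS, and handling only 20, 40 and 80 MHz blocks in those lower bands; it does not check whether a block is actually permitted, other regulatory domains differ, and the helper names are mine.

```python
# Assumption: ETSI (EU) rules; channels 52-64 and 100-140 require DFS.
DFS_CHANNELS = set(range(52, 65, 4)) | set(range(100, 141, 4))


def bonded_block(channel: int, width_mhz: int) -> list[int]:
    """The 20 MHz channels covered by the 20/40/80 MHz block containing `channel`."""
    if width_mhz not in (20, 40, 80):
        raise ValueError("this sketch only handles 20, 40 and 80 MHz")
    if 36 <= channel <= 64:
        base = 36
    elif 100 <= channel <= 144:
        base = 100
    else:
        raise ValueError(f"channel {channel} is outside the bands this sketch covers")
    if (channel - base) % 4:
        raise ValueError(f"{channel} is not a valid 5 GHz channel number")
    count = width_mhz // 20      # number of 20 MHz channels in the block
    span = 4 * count             # channel numbers advance by 4 per 20 MHz
    start = base + (channel - base) // span * span
    return [start + 4 * i for i in range(count)]


def touches_dfs(channel: int, width_mhz: int) -> bool:
    """True if any 20 MHz channel in the bonded block requires DFS."""
    return any(ch in DFS_CHANNELS for ch in bonded_block(channel, width_mhz))
```

For this fix, `touches_dfs(104, 80)` is true (the block is 100 to 112) and `touches_dfs(36, 80)` is false: the whole 36 to 48 block sits outside the DFS range.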

802.11r/k/v were already enabled, which is the right default for roaming assist; the radio config was the problem, not the roaming logic.

The roaming page after the fix: 802.11r/k/v left on, with band steering switched to Prefer 5 GHz.

Result: Instagram returned to normal.

Takeaways

Diagnose with data, not vibes. The IPv6/NextDNS theory was plausible and ultimately wrong. Twenty seconds of nextdns/diag saved hours of disabling IPv6 across the LAN and chasing nothing.
DNS healthy ≠ network healthy. A clean DNS path tells you nothing about the Wi-Fi link, the MTU on the WAN, or the CDN routing from ISP to destination.
“Same SSID + band steering: By default” is not a guarantee that dual-band clients land on 5 GHz. Verify in the client list which radio each device is on. Don’t assume the router is making the right choice.
40 MHz on 2.4 GHz is almost always wrong in a multi-tenant environment. The default in some router UIs; an antipattern in practice.

The IPv6 angle wasn’t crazy — it just wasn’t this story.

Notes on Keeping a Blog

Thu, 21 May 2026 00:00:00 +0000

Every personal site I’ve ever started eventually went quiet. Here are the rules I’m writing down so this one doesn’t.

Lower the bar

A post does not need to be an essay. A useful link, a fixed bug, a paragraph of thinking — all of it counts. The blog is a notebook, not a magazine.

Publish before it’s perfect

Drafts that wait for polish never ship. I’d rather publish something rough and revise it in place than hoard it in a folder forever.

Write for one reader

Usually that reader is me, six months from now, having forgotten everything. If a post would help that person, it’s worth writing.

Keep the machinery boring

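```
hugo new content posts/some-idea.md
# write (if your archetype sets draft = true, flip it to false first)
git add content/posts/some-idea.md && git commit -m "post: some idea" && git push
```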

Three steps from idea to live. The moment publishing gets complicated, the writing stops.

That’s the whole system. Check back and see whether I stuck to it.