So picture this: it’s a Tuesday night and you’re hacking along on the new Ollama launch commands that came out. After downloading the brand new glm-4.7-flash model, you decide to give it a whirl with OpenCode.

Then disaster strikes! While your GPU is powerful enough to run the model, you sadly didn't stockpile enough RAM to run the larger 30B-parameter models. Your computer freezes and needs to be hard-powered off, and if I'd actually had code open, I might have lost or corrupted my work in the shutdown. How can we avoid such a tragedy?

It turns out… you can actually download more RAM. Well, not in a literal sense, but I wanted you to click on the blog post and read it (sorry!). We’re actually going to talk about swap files in Linux, which is a really cool concept.

As I've mentioned in other posts, I've been working on getting deeper into Linux, with essentially all my personal computers now running either Fedora or Arch, the exception being my MacBook (and only because Asahi Linux isn't mature enough for the M4 chipset yet!). During a recent exercise of installing Arch from scratch, I reached the point where you partition your drive to establish the boot, swap, and root partitions before installing the rest of the operating system. Naturally, I did plenty of Google/AI searching alongside the Arch Wiki to learn what all of that actually means.

## What Is Swap?

So, what is a swap file? Essentially, it's a way to set aside virtual memory on your computer that lives either in a file or a dedicated partition on your HDD or SSD. When your RAM starts getting too full, the kernel can page older, inactive memory out to swap so your system doesn't completely freeze like mine did.
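You can see what your own system is using for swap right now with a couple of stock util-linux commands, no root needed:

```shell
# Show overall memory and swap usage in human-readable units
free -h

# List active swap devices/files, their sizes, and priorities
swapon --show

# The raw kernel view of the same information
cat /proc/swaps
```

On a Fedora-style setup, `swapon --show` will typically list something like `/dev/zram0` as the active swap device.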

## Enter zram: Compressed Swap in RAM

In my case, I was running a Fedora variant with a few Chrome tabs and my terminal emulator open. An interesting thing about Fedora is that it defines its swap using something called zram.

Here’s what makes zram clever: instead of swapping to disk (which is slow), zram creates a compressed block device that lives in your RAM. When your system needs to free up memory, it compresses inactive pages and stores them in the zram device. Because compression typically achieves a 2-3x ratio, you effectively extend your usable memory without touching your slower disk storage.
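To make that ratio concrete, here's some back-of-the-envelope shell arithmetic. All the numbers are hypothetical, but the point stands: once a zram device fills up, it only costs `disksize / ratio` of real RAM to hold `disksize` worth of pages.

```shell
# Hypothetical numbers, purely for illustration
ram_gib=16            # physical RAM
zram_disksize_gib=8   # uncompressed capacity of the zram device
ratio=2               # assumed compression ratio (zram often sees 2-3x)

# A full zram device holds zram_disksize_gib of swapped pages but
# only occupies disksize/ratio of physical RAM to do so.
backing_cost=$((zram_disksize_gib / ratio))
effective=$((ram_gib - backing_cost + zram_disksize_gib))
echo "rough effective capacity: ${effective} GiB"
```

With these made-up numbers that works out to roughly 20 GiB of effective capacity from 16 GiB of physical RAM, which is exactly the "download more RAM" effect.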

zram can also be configured to store temporary files (/tmp), but its primary use case on most distros is as a swap device.
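On Fedora, this is wired up by systemd's zram-generator, which reads `/etc/systemd/zram-generator.conf`. A minimal config might look something like the sketch below; the exact defaults vary by release, so treat the values as illustrative:

```ini
# /etc/systemd/zram-generator.conf
[zram0]
# Cap the device at half of RAM, up to 4 GiB (a common default)
zram-size = min(ram / 2, 4096)
# zstd is the usual compression algorithm choice
compression-algorithm = zstd
```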

## The Fix

To run that model again without bringing my whole system down, I checked my current zram configuration:

```shell
zramctl
```

Then resized the zram device to give myself more headroom:

```shell
sudo zramctl --size 16G /dev/zram0
```

And voilà, it worked! It was slow (compressed RAM is still slower than uncompressed), but the model ran and my system stayed responsive enough to actually use. One caveat: the kernel won't change the size of a zram device while it's in use, so if zramctl complains that the device is busy, you may need to swapoff it first, reset and resize it, then run mkswap and swapon again.

## When to Use What

A quick rule of thumb:

  • zram: Great for systems with limited RAM where you want to squeeze out extra capacity without disk I/O penalties. Most modern distros enable this by default.
  • Disk-based swap: Still useful for hibernation support, or if you’re running workloads that genuinely exceed what compressed RAM can handle and you’d rather swap to an NVMe than OOM-kill processes.
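If you do want disk-backed swap, the classic recipe is a swap file. Here's a sketch; the path and size are just demo values (a real swap file would live somewhere like `/swapfile` and be sized in GiB), and only the activation step at the end needs root:

```shell
# Create a small demo swap file
swapfile=$(mktemp /tmp/demo-swapfile.XXXXXX)
dd if=/dev/zero of="$swapfile" bs=1M count=64 status=none

# Swap files must only be readable by root in real use
chmod 600 "$swapfile"

# Write the swap signature into the file
mkswap "$swapfile"

# Activating it requires root (plus an fstab entry to persist):
#   sudo swapon "$swapfile"
```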

Anyway, I’m sure someone could explain this in a more technically precise fashion, but I wanted to share something I found interesting about Linux. If you’re curious about my setup: it’s a custom build with an Nvidia 3060 Ti (8GB VRAM) and 16GB of DDR4 RAM.

Have you run into memory limits running local models? I’d be curious to hear how others are handling it.