Blasting AI into the past: modders get Llama AI working on an old Windows 98 PC

Remember when you were young, your responsibilities were far fewer, and you were still at least a little hopeful about the future potential of tech? Anyway! In our present moment, nothing appears to be safe from the sticky fingers of so-called AI—and that includes nostalgic hardware of yesteryear.

Exo Labs, an outfit with the mission statement of democratising access to AI, such as large language models, has lifted the lid on its latest project: a modified version of Meta’s Llama 2 running on a Windows 98 Pentium II machine (via Hackaday). Though not the latest Llama model, it’s no less head-turning—even for me, a frequent AI-naysayer.

To be fair, when it comes to big tech’s hold over AI, Exo Labs and I seem to be of a similarly wary mind. So, setting aside my own AI-scepticism for the moment, this is undoubtedly an impressive project, chiefly because it runs entirely on local hardware rather than relying on a power-hungry, environmentally unfriendly datacenter as a middleman.

The journey to Llama running on ancient-though-local hardware took some twists and turns: after securing the secondhand machine, Exo Labs had to track down compatible PS/2 peripherals, and then figure out how to get the necessary files onto the decades-old machine at all. Did you know FTP over an Ethernet cable is backwards compatible to this degree? I certainly didn’t!

Don’t be fooled, though: I’m making it sound far easier than it was. Even before the FTP finagling was figured out, Exo Labs had to find a way to compile modern code for a machine this ancient. Longer story short-ish, the team went with Borland C++ 5.02, a “26-year-old [integrated development environment] and compiler that ran directly on Windows 98.” However, modern C++ still threw up compatibility issues, so the team fell back on an older dialect of C and had to put up with declaring every variable at the start of each function. Oof.
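If you’ve only ever written modern C or C++, that restriction may sound alien. As a purely illustrative sketch (not the team’s actual code), old-school C of that era looks something like this:

/* Illustrative C89-style function: every variable must be declared
   at the top of the function body, before any other statements. */
#include <stdio.h>

static float dot_product(const float *a, const float *b, int n)
{
    int i;       /* loop counter, declared up front */
    float sum;   /* accumulator, also declared up front */

    sum = 0.0f;
    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

int main(void)
{
    float x[3] = { 1.0f, 2.0f, 3.0f };
    float y[3] = { 4.0f, 5.0f, 6.0f };

    printf("%f\n", dot_product(x, y, 3)); /* prints 32.000000 */
    return 0;
}

Dot products like this are the bread and butter of LLM inference, which is why the declare-everything-up-front rule gets tedious fast.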

Then, there’s the hardware at the heart of this project. For those needing a refresher, the Pentium II machine sports an itty-bitty 128 MB of RAM, while a full-size Llama 2 LLM boasts 70 billion parameters. Given those hefty constraints, the results are all the more interesting.
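For a rough sense of scale (assuming 16-bit weights, which this particular port needn’t actually use): 70 billion parameters × 2 bytes per parameter ≈ 140 GB, more than a thousand times the RAM on offer. Even a 1-billion-parameter model at that precision works out to around 2 GB, still roughly sixteen times more than the old machine can hold in memory at once.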

Unsurprisingly, Exo Labs had to craft a comparatively svelte version of Llama for this project, now available to tool around with yourself via GitHub. As a result of all of the above, the retrofitted LLM features 1 billion parameters and spits out 0.0093 tokens per second, which works out to nearly two minutes per token. Hardly blistering, but the headline take here really is that it works at all.

