First Post | FormaSimplex

The Local AI/LLM revolution

Wednesday, 13 May 2026

Llama.cpp Opencode Local AI LLM Qwen

It has been a wild ride these last couple of years. Stable Diffusion dropped, Meta gave us Llama, Alibaba gave us Qwen. We've had so many improvements in such a short time, from Will Smith eating spaghetti to OpenAI already killing off their incredible Sora video generation models. We've had tech layoffs abound and Claude Code breaking the software development cycle, and so much more. Throughout all of this the open source community has just been giving so, so much to the home user: Opencode, Pi, Hugging Face datasets, Llama.cpp... and again, so much more. In this blog I want to jot down some of my own progress throughout this journey.

When Stable Diffusion dropped I was using one of those black tin can Mac Pros for generating images. It was a powerhouse back in the day and still makes for a great multimedia home server. I quickly set up some Python files for generating images, which gave me great fun. Deforum had dropped and I was experimenting with that and Parseq. Around that time KerasCV had a great tutorial about latent space walking; see the links section at the bottom if you're interested. I left that tin can running for a whole month to generate a 30 minute animation. Oh dear...
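For anyone wondering what a latent space walk boils down to, here is a minimal sketch in plain Python. It assumes the walk is done by spherical interpolation (slerp) between two latent vectors, which is one common approach and not necessarily what the tutorial or my old scripts did. The names `slerp` and `latent_walk` are made up for illustration, and the tiny 2D vectors stand in for real latents.

```python
import math


def slerp(v0, v1, t):
    """Spherically interpolate between two equal-length vectors, t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def latent_walk(start, end, steps):
    """Return `steps` vectors walking from `start` to `end` (hypothetical helper)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [slerp(start, end, i / (steps - 1)) for i in range(steps)]
```

Each vector in the walk would be fed to the image model to render one frame, so a longer animation is just more steps between more waypoints.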

I spent a small amount of money renting GPU servers on Linode so I could speed up the animation process, and I made some cool trippy work. We would project the animations during the Kushty Buck Records music nights in Middlesbrough around that time, using VDMX for some fun MIDI/pitch automations to go with the music.

In the day job we had been given early access to Copilot and I found myself not that impressed with the early models. I had recently switched to Vim as my daily driver, so I was already pretty fast when it came to coding. I spent the last year working on my Vim motions so I could write code at the speed of thought. Then came the deluge of coding agents and newer models. I upgraded my home system around the release of Stable Diffusion 2.0: I bought a 2023 Mac Studio M2 Ultra with 64GB. Little did I know what a great investment that was, given the recent spikes in price.

Fast forward to 2026, OMG! Google dropped Gemma 4 and a couple of weeks later introduced us to MTP (multi-token prediction), Qwen are on 3.6, and if you are lucky enough to be running this software locally right now, if you're on it, you are feeling the vibes indeed. What a time! Anyway...

Right now this technology can port complex applications into other languages in a couple of days. I know, 'cause I've just done it. Not some shitty port but a production-level one, as close as close: Rails to Rust. Not without oversight, mind, which is the caveat here. If you are technical, know how to read the docs and know how to write the code yourself, you now have a superpower, because you do not need to write the code yourself. You're not the actor, you're the director. What's that, jeering at the back? AI just produces slop, blah blah? Maybe for you, but more and more experienced engineers, myself included, have a different opinion. These Genies/Clankers/Conscious systems (according to Richard Dawkins; I don't buy that like Richy does, nor did I buy much of his militant atheism either, but I digress), these compressed data machines, aka large language models, are so much more than we have figured out yet. I'm not saying that as we scale they will reach AGI, I'm with Gary Marcus on that one, but as a hopeful technologist I think we still have a lot of discovery ahead around the systems we can build in which an LLM plays a central part.

Along the journey I experimented with looping agents, each with a very specific system prompt. I was trying to design something along the lines of the Jungian concept of opposites. It was fun, but really it just produced a lot of waffle about the meaning of life. I would ask it a question, then leave it running overnight, each member of the quaternity getting a turn to add some input into the room. Meanwhile an Omega agent would parse the output from the four opposites, produce something coherent and save it to file. I'd read it the next day and be like, wowza. Useful? Probably not. Interesting? Yeah, sure, why not.
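A rough sketch of that loop, for the curious. This is not the original code: the four voice names and their prompts are my guess at a Jungian quaternity, `run_room`, `save_summaries` and `fake_generate` are hypothetical names, and the model call is passed in as a plain function so the sketch runs without any LLM attached. In practice you would swap `fake_generate` for a call to whatever local model you are running.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical system prompts (an assumption, not the prompts actually used).
QUATERNITY: Dict[str, str] = {
    "thinking": "You reason step by step and distrust sentiment.",
    "feeling": "You judge ideas by their human value.",
    "sensation": "You stick to concrete, observable facts.",
    "intuition": "You look for hidden patterns and possibilities.",
}
OMEGA_PROMPT = "You merge the four voices into one coherent answer."

# A model call: (system_prompt, user_text) -> reply text.
Generate = Callable[[str, str], str]


def run_room(question: str, generate: Generate, rounds: int = 2) -> List[str]:
    """Give each voice a turn per round, then let Omega summarise that round."""
    transcript = [f"question: {question}"]
    summaries = []
    for _ in range(rounds):
        turn = []
        for name, system_prompt in QUATERNITY.items():
            # Every voice sees the whole room so far.
            reply = generate(system_prompt, "\n".join(transcript))
            line = f"{name}: {reply}"
            transcript.append(line)
            turn.append(line)
        # Omega only reads the four contributions from this round.
        summary = generate(OMEGA_PROMPT, "\n".join(turn))
        summaries.append(summary)
        transcript.append(f"omega: {summary}")
    return summaries


def save_summaries(summaries: List[str], path: str) -> None:
    """Write one Omega summary per paragraph to a text file."""
    Path(path).write_text("\n\n".join(summaries), encoding="utf-8")


def fake_generate(system_prompt: str, user_text: str) -> str:
    """Stand-in for a real model call, so the loop can be tried offline."""
    return f"[{system_prompt.split()[1]}] saw {len(user_text.splitlines())} lines"
```

Calling `run_room("What is the meaning of life?", fake_generate, rounds=3)` returns three Omega summaries, which `save_summaries` can then write out for the morning read.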

Given that experiment was fun, I got thinking about Conway's Game of Life and the principle of emergent complexity, and this one I think still has gold in them thar hills. I started the Wunicorm project: some simple rules for the LLM/agent to follow, and some plist files that run daily and trigger the LLM to read some files and continue its evolution. It includes some of that latent space walking I discussed earlier, using the Flux models, to generate a 6 second 'dream' based on its reflections. It sends me a daily email of its ponderings, and since I've been looking for a job of late it even finds me a list of jobs. All of its different tasks are compartmentalised so as not to destroy its brain...
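If you have never seen how little it takes to get emergent behaviour, here is a minimal sketch of one Game of Life generation in Python. These are the classic Conway rules only, not the rules the Wunicorm project uses; the function name `step` and the set-of-coordinates board are my own choices for illustration.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (x, y) cells that are alive."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }
```

Two rules and a loop, yet feeding the output of `step` back into itself produces patterns that oscillate, travel and collide, which is the kind of simple-rules-in, complexity-out idea I was after.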

One of the next avenues of adventure I'm interested in came from a blog post I read by Volodymyr Pavlyshyn, in which he discussed Semantic Space and Time for AI Agents. The concept relates to memory architecture for AI systems, so if this is something you're interested in, check his work out. One more pioneer I've been following is Chris Hay. He put together a really interesting video on YouTube last month, LLMs are databases, in which he uses a query language to probe the inner workings of the LLM. Fascinating stuff, and again a lot more gold in them mountains.

Most importantly, the strides and leaps I'm exploring are locally hosted, and in a world of surveillance capitalism it's always nice to own your data, ya get me?

Anyway, enjoy the links

https://www.youtube.com/@chrishayuk

https://keras.io/examples/generative/random_walks_with_stable_diffusion_3/#a-walk-through-latent-space-with-stable-diffusion-3

https://volodymyrpavlyshyn.medium.com/

https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/