To be clear, I still use ChatGPT and the zip file development process for all of my current production development work, and I still spend only $20 per month for everything that gets accomplished with that workflow. I've never hit a rate limit, even during stretches when I worked many hours a day, seven days a week, for weeks at a time, running multiple simultaneous ChatGPT sessions, cranking away constantly, without any additional expense.
I've built some astoundingly complex systems over the past half year with that setup and have never run into a development challenge it couldn't handle, even on projects with more than 600 deployed versions. The zip file + .mhtml methodology has scaled to extraordinarily complex development goals, and there doesn't seem to be any end in sight to its ability to handle large-context, long-term development projects, all for a total of $20 per month. I'd certainly be spending several thousand dollars a month if I were completing the same volume of work with Claude Code and the Claude LLMs. The workflow requires zero software installed on any local machine, so I can switch between multiple local machines at any location (even my phone) to complete any development task, wherever I am. It's been a rock-solid, completely effective solution across a wide variety of projects.
My concern is that at some point, OpenAI will almost certainly be unable to keep providing that sort of capability without rate limits, and if they go out of business, my most productive approach to software development yet will evaporate.
So I'm making a strong push to build alternate solutions, and Nullclaw with gemini-3.1-flash-lite-preview may be the best one I've found yet. Nullclaw is tiny and extremely fast and light to set up, and gemini-3.1-flash-lite-preview is astoundingly capable and fast for the money. Nullclaw can even be set up on my Android phone, and I'm excited to think about using it in IoT environments where other agents would be far too large, bloated, and resource-intensive to be usable. And of course, there are plenty of other capable hosted LLM APIs available for use with Nullclaw (or any of the other claws), and OpenRouter makes it a piece of cake to switch between any of them.
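For what it's worth, the model switching really is nearly a one-line change: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so only the model ID in the request body differs from provider to provider. A minimal sketch of what such a request looks like (the model ID and key here are placeholders for illustration, not anything from Nullclaw's internals):

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Build headers and JSON body for an OpenRouter chat completion call.

    Because the endpoint is OpenAI-compatible, swapping between hosted
    models is just a change to the `model` string.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # provider-prefixed ID, e.g. "google/..." (illustrative)
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body)
```

Point the same helper at a different `model` string and nothing else in the workflow has to change.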
Beyond software development, Nullclaw is also useful for a pile of tasks that GPT can't help with. I've been using local agentic systems lately to do a lot more than write code. I just cleaned up a few old servers using agents (reclaiming disk space and memory from apps that were no longer needed, pruning stale Docker containers, killing old long-running processes, etc.), tasks which would have taken me many times longer, and been far more frustrating, without a little agentic helper. It's a lot nicer and more productive to speak plain English to an AI agent than to do all the work manually.
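As a rough illustration of the kind of step a cleanup agent scripts before deleting anything, here's a short Python sketch (my own example, not Nullclaw code) that sizes up the immediate subdirectories of a path so the agent, or a human, can decide what's worth reclaiming:

```python
import os

def largest_subdirs(root: str, top_n: int = 5):
    """Return the top_n immediate subdirectories of `root`, largest first,
    as (name, total_bytes) pairs. A reconnaissance step run before any
    deletion is proposed."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file vanished or unreadable; skip it
        sizes[entry.name] = total
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

The agent runs something like this, reads the output, and then proposes specific removals in plain English for approval.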
Beyond all that, the most recent workflow I ran with Nullclaw was pretty mind-blowing: it actually ran, tested, and updated the code of a project I had created with GPT, based on live interactions with a third-party web app, using Playwright code generated on the spot for each interaction. It truly would have taken me at least several weeks to perform all the iterations completed in that single autonomous run.
So I'm getting much more excited about what's possible with locally installed agentic systems, especially Nullclaw, because it's so tiny, requires no dependencies or installation, and is deeply configurable: as safely configured by default as any other agentic system I've seen, but easily given full permission so a frontier model can take full control of the system it's working on. And of course there are all the other features, like the ability to instantly enable communication with the agent through messaging systems like Telegram, WhatsApp, Signal, and self-hosted alternatives; the ability to instantly switch between all the commonly hosted LLM APIs as well as locally hosted models; and all the other built-in connectivity options.
And that leads me to the other purpose I have in mind for Nullclaw: building fully in-house, self-hosted development solutions using a variety of local open source models. None of those LLMs will be as fast or as immediately capable as a hosted frontier model like gemini-3.1-flash-lite-preview, but as you may have seen in my old GPT-4o case study, even less capable models can perform very specific, large-context development tasks when given enough iteration steps (GPT-4o is roughly equivalent in capability to many of the current mid-size open source models). And within a fully autonomous agentic workflow, using a framework like Nullclaw, locally hosted open source LLMs can complete those long tasks unattended.
The ability to spawn sub-agents, enabling effectively unlimited context size, and to automate the whole software development iteration process is what makes these agentic systems so capable. Even if a small locally hosted model can't write perfect code on the first shot, it can iterate entirely on its own, responding to application output, performing automated application interactions, reading debug errors, and so on, in an automated loop until the code is properly crafted.
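The core of that loop fits in a few lines of Python. This is my own simplified illustration, not Nullclaw's implementation: `generate` stands in for whatever model call produces a candidate script, and the stderr of each failed run is handed back to the model as feedback for the next attempt:

```python
import subprocess
import sys
import tempfile

def iterate_until_green(generate, max_rounds: int = 5):
    """Run the generate -> execute -> read-errors loop until a candidate
    script exits cleanly, or max_rounds is exhausted. Returns the working
    code, or None on failure."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(feedback)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode == 0:
            return code           # candidate runs cleanly; loop is done
        feedback = result.stderr  # hand the traceback back to the model
    return None
```

A real framework layers tool calls, sub-agents, and application-level checks on top, but the structure, generate, execute, read errors, repeat, is the same.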
There are plenty of open source models that can produce good-enough code on inexpensive consumer GPU hardware (qwen3coder next, GLM 4.7Flash, the various Qwen 3.6 and Gemma 4 models, Nemotron, GPT-OSS 120b and 20b, etc.) when given the opportunity to iterate autonomously. The eventual effect of autonomous iteration, even with less-than-perfect models, is the completion of long tasks. And with relatively affordable hardware like the Strix Halo ASUS ROG Flow Z13 laptop, very capable models can be used at home, on the road, or in situations where no Internet is available. Clustering multiple Strix Halo or GB10 systems, like the ASUS Ascent GX10, opens up the possibility of running closer-to-frontier-quality models without outrageous cost or electricity requirements.