An experimental, private, autonomous todo list
Products and services are easiest to explain when they are “plain vanilla with one difference.” (See: Flip the Script by Oren Klaff, an underrated-but-annoying-to-read book on pitching.) Think: “the same great thing you already love, but cheaper!” or “just like what you already use, but faster!”
Well: what I’m about to try to explain does not fit that model. It’s an experimental product I pulled together that has neither a clear use case nor a singular differentiator. But it’s interesting, and I feel like there is something worth extracting from it.

Here’s the sketch:
- a Trello / Kanban type interface for tasks
- with a long-running multi-AI-agent system that plans, executes, and reviews each one (including extremely long/complex tasks)
- and enables human/AI interaction through comments and reviews
- where the AI agents are deployed on a server/computer of their own and as such have access to the filesystem and certain programs, expanding their capacity
- and there are also local LLMs deployed on the server so tasks can optionally run privately without anything being sent to model providers
- and the whole thing is accessed via a zero-trust private network that only you can reach
So, uh, imagine Trello x n8n x Claude Code x Zero Trust x Llama, I guess?
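To make the shape of that concrete, here’s a minimal sketch of the core loop in Python. To be clear, this is not the actual implementation: the `Task` fields and the planner/executor/reviewer interfaces are names I made up, just to show how the pieces relate.

```python
from dataclasses import dataclass, field

# Entirely hypothetical sketch: the Task fields and agent interfaces are
# made-up names, just to show how the pieces relate.

@dataclass
class Task:
    title: str
    status: str = "todo"
    acceptance_criteria: list[str] = field(default_factory=list)
    subtasks: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)  # humans and agents both write here

def run(task: Task, planner, executor, reviewer, max_rounds: int = 5):
    plan = planner(task)                        # sets acceptance criteria + subtasks
    task.acceptance_criteria = plan.criteria
    task.subtasks = plan.subtasks
    for _ in range(max_rounds):                 # don't let the agents loop forever
        artifact = executor(task)               # works through subtasks; writes files to disk
        verdict = reviewer(task, artifact)      # checks the artifact against the criteria
        if verdict.approved:
            task.status = "done"
            return artifact
        task.comments.append(verdict.feedback)  # feedback drives the next execution pass
    task.status = "needs human review"          # punt to a human after too many rounds
```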
For example, I created a task to review every post on my Substack and pull out all the products I’ve recommended.
The planning agent structured the task, set acceptance criteria, and created subtasks. The execution agent then figured out how to find each post and downloaded them all to the filesystem to preserve its context window.
It parsed the downloaded files one by one into a data file, again preserving the context window, then turned that data file into an output file, and the review agent checked its work before approving.
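The pattern worth noticing is that the agent spills intermediate work to disk instead of holding everything in context. A rough sketch of that pattern in Python (note that `post_urls`, `fetch`, `filename_for`, and `llm` are stand-in helpers I’ve invented, not the system’s real API):

```python
import json
import pathlib

# post_urls, fetch(), filename_for(), and llm() are stand-in helpers,
# not real APIs from the system.

posts_dir = pathlib.Path("downloads")
data_file = pathlib.Path("recommendations.json")
posts_dir.mkdir(exist_ok=True)

# Step 1: fetch each post to disk exactly once. Later runs skip this,
# which is why the follow-up comment below was cheap to handle.
for url in post_urls:
    target = posts_dir / filename_for(url)
    if not target.exists():
        target.write_text(fetch(url))

# Step 2: parse one file at a time, so each LLM call only ever sees a
# single post rather than the whole archive.
records = [
    llm(f"Extract the product recommendations:\n{path.read_text()}")
    for path in sorted(posts_dir.iterdir())
]
data_file.write_text(json.dumps(records, indent=2))

# Step 3: generate the final output from the compact data file, not
# from the raw posts.
report = llm(f"Turn these into a readable list:\n{data_file.read_text()}")
```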
Here’s a snippet of the agents’ logs:

Then, connected through the private zero-trust network, I dropped a comment asking the agents to add a section with the recommendations ordered differently. The plan/execute/review cycle began again, but it built on the existing downloads and data files and quickly finished the job.

It was successful! It compiled every single recommendation from all 15 of my Substack posts. And this success was unique: I tried the same prompt on GPT-5.2 Pro, GPT-5.1 Pro, Claude Opus 4.5 (Extended Thinking), and Gemini 3 Pro Preview, and they all failed to cover everything.
GPT-5.2 Pro:

GPT-5.1 Pro:

Opus 4.5:

Gemini 3 Pro:

This product is really good at complex research tasks and ongoing projects that keep referring back to the same original materials.
As noted above, there are at least a few ideas in here all jumbled together:
- Kanban boards with comments as an interface to interact with AI for more complex tasks
- multi-agent plan/execute/review systems
- agents with access to a filesystem for persistence, common programs, and scripting ability
- local LLMs hosted on your own server, accessed via a zero-trust private network
(Plus other little touches: recurring tasks; AI vs. human differentiation via fonts/visual indicators; individual cost budgets for tasks; differentiating “needs human review” vs. “done”; subtasks created by the planner…)
I’m not quite sure what I’m going to do with this. Maybe some of those ideas will get integrated into interface0, or some other product.
I like the “private AI” angle; I like the “Claude Code for things other than code” angle; I like the “multi-agent judge setup” angle; I like the “alternatives to chat interfaces for complex tasks” angle. But do those all belong together? Probably not?
If you’re interested in this or have any ideas, drop me a note. I’m also thinking about cleaning it up and open-sourcing it.
The deployment process isn’t bad: I rented a server on OVHcloud for ~$50/month with good enough specs to run decent LLMs locally at a not-terrible speed, and I made a Kamal deploy flow that handles everything (besides setting up the Cloudflare Zero Trust tunnels and such, which are free). Once you’re up, you have access to a very-private AI “in the cloud,” which is pretty cool. You could also easily set it up on a homelab server in your closet.
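Once it’s running, talking to your private model looks like talking to any hosted one. A minimal sketch, assuming an Ollama-style server (which exposes an OpenAI-compatible chat endpoint) behind the tunnel; the hostname and model name here are placeholders:

```python
import requests

# Placeholder hostname and model; assumes an Ollama-style server behind
# the tunnel, which exposes an OpenAI-compatible chat endpoint.
resp = requests.post(
    "https://llm.your-private-domain.example/v1/chat/completions",
    json={
        "model": "llama3.1:70b",
        "messages": [{"role": "user", "content": "Summarize my open tasks."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```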
There are also a bunch more features I want to add: per-task command permission levels; external interactions like emails; transient GPU pods for better local LLM performance; integrations with Obsidian and other data sources; shared code libraries and scripts between tasks; and much more…
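To gesture at the first of those: per-task command permission levels could be as simple as an allowlist keyed by tier. Nothing like this exists in the product yet; it’s purely a hypothetical sketch:

```python
import shlex
import subprocess

# Hypothetical tiers: which programs a task's executor may invoke.
PERMISSION_LEVELS = {
    "read_only": {"ls", "cat", "grep", "head"},
    "standard": {"ls", "cat", "grep", "head", "curl", "python3"},
    "elevated": None,  # None = anything goes; a human opted in for this task
}

def run_command(task_level: str, command: str) -> str:
    """Run a shell command if the task's permission level allows it."""
    allowed = PERMISSION_LEVELS[task_level]
    program = shlex.split(command)[0]
    if allowed is not None and program not in allowed:
        raise PermissionError(f"{program!r} is not allowed at level {task_level!r}")
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout
```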
I’m excited to see more experimentation like this in AI interfaces and workflows.
I'd love to hear from you directly: andy@andybromberg.com