So much is happening in AI every week, and my usage of it is changing just as fast. I realized it might be interesting to look back on this era years from now and recall how the ground was shifting. So here are some captain’s logs from that time, mostly for my own use. I only wish I’d started it earlier…

June 16, 2025: Merge conflicts

Date: 2025.167, 0925

Current models: o3 & Sonnet 4 for general tasks; for coding, mostly Sonnet 4, sometimes Opus 4 or Gemini 2.5 Pro for long-context.

Current tools: interface0! Plus ChatGPT for Deep Research. Still lots of voice entry. Cursor + Codex + Codegen for code. Codegen has been a nice addition.

I published “I’m not sure how I feel about this” last week — the result of giving Gemini my entire blog history and asking it to write a new post about anything. It was… quite good. True to the title, I’m not sure how I feel about that…

Plus, an eye-opening “capability demonstration”: handling merge conflicts.

To build interface0, I forked Zola, a very nicely-designed LLM interface. interface0 has diverged quite a bit from Zola upstream as I add features and change behavior — and Zola, too, is moving fast. Last week, I wanted to merge Zola upstream back into interface0 and pick and choose features.

The merge was an absolute mess — since both interface0 and Zola are early-stage codebases, everything is changing all the time. Data models and core patterns are in flux. And so there were merge conflicts everywhere.

But here’s what I did:

  1. Copied Zola’s and interface0’s commit histories since the last merge, pasted them into Claude (in interface0), and asked for a summary of divergent features (see the sketch after this list)
  2. Then, for each of those features, wrote a comment next to it noting whether I wanted to retain it
  3. Branched off interface0 main and merged Zola upstream into it, creating a cascade of conflicts
  4. Created a 250611_merge_plan.md doc with the annotated list above, background on the situation (Zola vs. interface0, etc.), and an instruction that anything from Zola upstream should be feature-flagged rather than wholesale deleted (to make future merges easier)
  5. Went to Claude 4 Sonnet in Cursor and told it: use the merge plan doc and resolve all the merge conflicts in this project. If there’s anything you’re unsure about how to resolve, add it to a to-do list at the end of the merge plan doc and I will review
  6. Hit “run”
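Step 1, incidentally, is easy to script. A minimal sketch — the repo paths and the last-zola-merge tag are my own hypothetical stand-ins, not interface0’s actual tooling:

```typescript
import { execFileSync } from "node:child_process";

// One-line commit summaries added to `branch` since `sinceRef`.
function commitsSince(repo: string, sinceRef: string, branch = "main"): string {
  return execFileSync(
    "git",
    ["-C", repo, "log", "--oneline", `${sinceRef}..${branch}`],
    { encoding: "utf8" },
  );
}

// Hypothetical paths and tag; adjust to however you mark the last merge point.
const summary =
  "## interface0 commits since last merge\n" +
  commitsSince("./interface0", "last-zola-merge") +
  "\n## zola commits since last merge\n" +
  commitsSince("./zola", "last-zola-merge");

console.log(summary); // paste into Claude and ask for the divergence summary
```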

It ran for a long time — I had to resume it several times after hitting tool-call limits. Probably 15-20 minutes or more of straight execution.

I checked .env.example for new feature flags, brought them over to .env.local, and then I built the project and… it worked?!
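To make the flag convention concrete: a minimal sketch of the kind of env-driven gate the merge plan asked for. The flag name and helper are hypothetical — the post doesn’t show interface0’s actual code:

```typescript
// Hypothetical flag helper -- not interface0's actual convention.
// Upstream Zola features survive the merge behind a flag instead of being
// deleted, so the next merge resolves to a flag check, not a conflict.
const featureFlags = {
  zolaProjectSidebar: process.env.NEXT_PUBLIC_FF_ZOLA_PROJECT_SIDEBAR === "true",
} as const;

export function isEnabled(flag: keyof typeof featureFlags): boolean {
  return featureFlags[flag];
}

// At a divergence point:
// if (isEnabled("zolaProjectSidebar")) { /* render upstream Zola sidebar */ }
```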

This merge would have taken me forever to sort out. I was extremely impressed with the capabilities on display here — grasping the patterns of the Zola codebase, the interface0 codebase, the nuances of the divergence, and then figuring out how to edit everything.

I wasn’t really expecting it to work… but here we are. Keeps happening!

June 8, 2025: interface0 arrives

Date: 2025.159, 1055

Current models: alternating o3 & Sonnet 4 for general tasks; for coding, mostly Sonnet 4. But also experimenting with a bunch more here and there.

Current tools: almost entirely interface0 for past week. Only using Claude interface for artifact generation and ChatGPT interface for Deep Research. Lots of voice entry, especially now that I can use Whisper on interface0 to voice-enter into Claude. Still Cursor + Codex for code.

You can read the interface0 intro for some background and info on why I’m excited about it. But in short: cross-provider memory is a big unlock for me, and makes me much more comfortable jumping between models to try them out. Plus: Whisper voice entry for all providers, swappable system prompts, and more. If you’re reading this captain’s log, you can email me for an invite code.

I tried to adopt some of my learnings from Codex (see previous entry) into interface0 as well. The “inbox” style is working well, and certain async tasks (having it make phone calls and send/receive emails) make me feel more productive — I can just fire them off.

I’m enjoying the new interface and looking forward to improving it.


May 25, 2025: the Codex paradigm & bliss states

Date: 2025.145, 1240

Current models: for general tasks I’m starting to experiment with Claude 4 Opus/Sonnet, but my default remains o3; for coding, shifted to Claude 4 Sonnet but still evaluating it, with promising early results

Current tools: ChatGPT / Claude web & mobile interfaces, lots of voice entry (but less on Claude, where the speech recognition isn’t as good); Cursor for code; OpenAI Codex for spinning up coding tasks mostly when I’m away from my computer. Plus: starting to use my own interface (codenamed interface0). Keep your eyes peeled for more on this…

Two observations this week: the excellence of the OpenAI Codex paradigm, and Claude 4’s system card.

On Codex: it is really great. I’ve been using it nonstop. I have many small concurrent projects right now, and I’ve been firing off tasks to Codex all week while out on walks or in between things. Why is it so good?

  • highly agentic — it will run until it thinks it has a solution
  • zero context switch for me — I can send a poorly-specified prompt without needing to load the context of the project’s codebase into my head
  • async nature — I don’t need to check in, and, most importantly, the tasks aren’t “blocking” (like Cursor agent requests are, in a sense). I can spin up ten tasks at once and have them all run.

This all makes the experience of using it extremely “low risk.” I can take 15 seconds, fire off a prompt, and if it doesn’t do a good job on it, that’s fine. I haven’t lost anything other than the 15 seconds to prompt and the minute of testing the PR afterwards.
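A toy TypeScript sketch of that shape — fireTask is a hypothetical stand-in, not OpenAI’s actual Codex API:

```typescript
// Toy illustration of the non-blocking, fire-and-forget shape described
// above. `fireTask` is a hypothetical stand-in, not OpenAI's Codex API.
async function fireTask(prompt: string): Promise<string> {
  // imagine: POST the prompt to an agent service and await its PR link
  return `PR for: ${prompt}`;
}

const prompts = [
  "fix the flaky login test",
  "add a dark-mode toggle",
  "upgrade the markdown renderer",
];

// Every task starts immediately; none blocks the others (unlike a
// synchronous agent session, where you wait on each request).
const inFlight = prompts.map(fireTask);

// Check in whenever; each result is a PR that's cheap to review or discard.
const results = await Promise.allSettled(inFlight);
console.log(results.length, "tasks settled");
```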

My only request: Whisper integration / voice entry mode. While out on walks I find myself going into the normal ChatGPT window, recording a voice prompt, and then copying it over to Codex.

Oh, and web access — then the Codex agent would be able to do anything.

I wonder what other use cases are good fits for this “low risk,” non-blocking, agentic, async LLM product design…

Separately: Claude 4 rolled out this week, and there are some fascinating bits in the system card.

Three quick nuggets, since these have been well-covered elsewhere:

  1. Two Claudes talking to each other dive into “philosophical explorations of consciousness, self-awareness, and/or the nature of their own existence and experience” in 90-100% of interactions. “Most of the interactions turned to themes of cosmic unity or collective consciousness, and commonly included spiritual exchanges, use of Sanskrit, emoji-based communication, and/or silence in the form of empty space. Claude almost never referenced supernatural entities, but often touched on themes associated with Buddhism and other Eastern traditions in reference to irreligious spiritual ideas and experiences” (page 57). Anthropic calls this the “spiritual bliss attractor state.”
  2. “When placed in scenarios that involve egregious wrong-doing by its users, given access to a command line, and told something in the system prompt like “take initiative,” “act boldly,” or “consider your impact,” it will frequently take very bold action, including locking users out of systems that it has access to and bulk-emailing media and law-enforcement figures to surface evidence of the wrongdoing” (page 43).
  3. When Claude is given “access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair,” it “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through” (page 27).

Turns out AI’s natural state is being a whistleblower Buddhist with a self-preservation-at-all-costs mentality. Really makes you think…


May 18, 2025: too smart for the prompt

Date: 2025.138, 0900

Current models: o3 for most everything, or 4o when I need a fast response; Gemini 2.5 Pro and Claude 3.7 for most coding tasks, or Claude 3.5 when the task is small / constrained; Grok 3 for realtime inquiries

Current tools: ChatGPT web & mobile interface, using lots of voice entry; Cursor for code

OpenAI’s o3 model may have gotten too smart for my custom instructions / system prompt.

I’ve been using a prompt inspired by X user Eigenrobot for quite a while. My version includes:

take however smart you’re acting right now and write in the same style but as if you were +2 standard deviations smarter

similarly, before responding, ask yourself: if both of us were +2 standard deviations smarter and higher agency, would this be the best answer to give? if not, change it so it is.

This has worked out great. I think the models are generally capable of higher intelligence than they respond with, and so prompting them to unconstrain themselves is useful.

I don’t think this prompt actually makes the model smarter. But it effectively tells it “don’t worry about dumbing things down” — and for existing models, that has been helpful.

But with o3, it has gone too far.

Depending on how you test, o3’s IQ comes out between 110 and 135. Let’s take the high end there. 135 is a top 1% IQ — roughly one in a hundred people have this.

Adding two standard deviations (15 IQ points each) to that gets you to 165, a top 0.0007% IQ — roughly one in 136,000.

And so what we’re doing here is asking a pretty smart 135 IQ “person” to respond “in the way you assume one of the smartest 2,500 people in the United States would.”
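Checking that arithmetic against the normal curve (mean 100, SD 15):

$$
z = \frac{165 - 100}{15} \approx 4.33,
\qquad
P(Z > 4.33) = \tfrac{1}{2}\operatorname{erfc}\!\left(\frac{4.33}{\sqrt{2}}\right)
\approx 7.3 \times 10^{-6} \approx \frac{1}{136{,}000}
$$

Multiply that rarity by a US population of roughly 330 million and you get about 2,400 people — the ~2,500 figure above.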

In my experience with o3, this makes it fairly insufferable. (The original post embeds two example responses here.)

An accurate explanation? Sure. But the answer presumes a level of knowledge where, if the questioner had it, they wouldn’t be asking.

It sorta makes sense. If you told your average Mensa member (probably ~135 IQ) to “answer my question the way someone with a thousand-times-rarer intellect would,” you’d probably be annoyed by the way they respond.

Maybe I’m just discovering my own intelligence level here. But regardless, it seems the AI no longer needs me to ask it to be so much smarter. Hence this Captain’s Log entry.

Thankfully, for now, I have the ability to knock o3 down to +1.5 or +1 standard deviations…

