An introductory field guide to Context Engineering for LLM users
tl;dr: it’s not just “prompts” that matter for good AI results — context is king. LLM interfaces inject a lot of context themselves, and you can augment it. If you do so well, you’ll get better answers!
As I began to onboard friends to interface0, I realized many of them could use a primer on what information actually goes into an LLM call. A strong layman’s understanding of this allows you to get better results from AI.
Recently, the management of this information has taken on the moniker “context engineering,” popularized by tweets from Tobi Lutke and Andrej Karpathy.
LLMs have enormously complex networked “brains.” As users, we send inquiries (“prompts”) to those brains, along with “context” that augments the brains’ existing intelligence.
Without context, the brain only knows what it was trained on. That means that it doesn’t know about your preferences, your company’s specific data or practices, or anything else that isn’t on the open web.
Think of an LLM as a new, genius Ph.D.-level team member or assistant — but one for whom every day at work is their first. The LLM itself doesn’t remember or know anything about you or your company, so context is required. This post is about how you and the products you use can get that context to it.
Much has been said about “prompt engineering”: making the right inquiries of the model. But as Tobi and Andrej point out, equally critical is “context engineering”: making sure the right supplemental information goes to the model.
Background
Let’s say you want to ask an LLM a health question. We can imagine three levels of framing:
- Bad prompt, bad context: is giving blood good for you?
- Better prompt, bad context: does giving blood help with elevated SHBG levels?
- Better prompt, better context: does giving blood help with elevated SHBG levels? Here are attachments of my historical blood panels, dates of giving blood, and relevant gene testing results.
As the prompt and context improve, so too will the results — dramatically. The more complex your question, the more the right context will help.
But the context that goes into LLMs is typically opaque to us as users. Even when all you see is a simple chatbox, there is a lot of context going in behind the scenes.
I’ll put the bottom line up front: here’s my mental model for a typical LLM call. (Each one is different, of course! And this leaves out some of the more technical aspects; this post is meant to be more useful than correct.) There’s a lot here — don’t worry, we’ll break it down piece-by-piece.
"You are GPT 4-1..."
"The user is using interface0 to interact with you..."
"Be clear and concise..."
"[searchWeb], [sendEmail], ..."
"searchWeb: requires term, ..."
"searchWeb_1 returned..."
"User recently asked about..."
"User is named Andy, lives in..."
"Relevant docs: [file.docx], ..."
"User uploaded a file..."
Creativity level
Budget for length of prompt
Specific term bias
Tool-choice, etc.
Yes, all of that is included in even the simplest inquiry.

Conversation history & prompts
Your experience of using an LLM frontend (ChatGPT or Claude or interface0) mostly centers around the blue section in the diagram (“Conversation history + new prompt”).
That bit is straightforward enough, although we’ll revisit it at the end of this post — there are techniques for doing even that part of the process better.
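Under the hood, most frontends keep this blue section as something like a running list of role-tagged messages, with your newest prompt appended before each call. Here is a minimal sketch; the field names are illustrative, not any particular provider’s API:

```python
# A running conversation, as a frontend might store it (illustrative field names).
conversation = [
    {"role": "user", "content": "Does giving blood help with elevated SHBG levels?"},
    {"role": "assistant", "content": "Happy to help. Do you have recent labs you can share?"},
]

def add_user_turn(history: list[dict], new_prompt: str) -> list[dict]:
    """Append the latest user prompt; the whole list is sent along with every call."""
    return history + [{"role": "user", "content": new_prompt}]

conversation = add_user_turn(conversation, "Yes, attaching my last three blood panels.")
```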
Call parameters
Then there are the call parameters (orange in the diagram), which are rarely exposed to you as a user. Typically, whatever frontend you’re using selects these for you. The key ones are:
- Creativity level (often exposed as “temperature”)
- A length budget (“max tokens”)
- Bias toward or against specific terms
- Tool choice, among a few others
Regarding “max tokens”: each model has a maximum “context length” that it can handle. If the context exceeds the maximum context length of the LLM, it will fail (this is when you get the notification “this conversation is getting too long, please start a new one”). Some frontends handle this for you by beginning to compress or summarize the conversation history or context, thus making more space (but also possibly losing fidelity or details).
Context lengths are defined in “tokens.” For reasons beyond the scope of this post, 1 token is roughly 3/4 of an average word. For example, Claude Pro’s context window is 200k tokens at the time of writing. This is around 150k words, or almost two normal books’ worth of content.
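To make those knobs concrete, here is a hedged sketch of the parameters a frontend might set on your behalf, plus the rough token math from above. The names and numbers are illustrative, not any provider’s exact API:

```python
# Illustrative call parameters a frontend might choose for you.
call_params = {
    "temperature": 0.7,     # "creativity level"
    "max_tokens": 1024,     # length budget
    "logit_bias": {},       # bias toward/against specific terms
    "tool_choice": "auto",  # let the model decide when to use tools
}

def estimate_tokens(text: str) -> int:
    """Rough heuristic: 1 token is about 3/4 of an average word."""
    return round(len(text.split()) / 0.75)

CONTEXT_LIMIT = 200_000  # e.g. roughly a 200k-token context window

full_context = "..."  # imagine everything in the diagram concatenated here
if estimate_tokens(full_context) > CONTEXT_LIMIT:
    print("Context too long: summarize the history or start a new chat.")
```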
System prompt
Now let’s talk about the “system prompt,” which in my view is often made up of three sub-components:
- The model’s core identity, set by the model provider (“You are GPT 4-1...”)
- Instructions added by the application or interface you’re using (“The user is using interface0 to interact with you...”)
- “Personas” or “custom instructions” that you set yourself (“Be clear and concise...”)
The last of these — “personas” or “custom instructions” — is typically the one you are able to edit. In ChatGPT, go to Settings, then Personalization, then Custom Instructions. In Claude, go to Settings and fill in your Personal Preferences. In interface0, you can set a variety of Personas from the chat bar and swap them in.
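As a minimal sketch of how those three sub-components might get layered together by a frontend (the strings here are stand-ins, not what any product actually sends):

```python
# Hypothetical assembly of a system prompt from its three sub-components.
provider_identity = "You are GPT 4-1, a large language model..."                 # set by the model provider
interface_instructions = "The user is using interface0 to interact with you..."  # set by the application
custom_instructions = "Be clear and concise. Prefer bulleted answers."           # the part you can edit

system_prompt = "\n\n".join([provider_identity, interface_instructions, custom_instructions])
print(system_prompt)
```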
Let’s just revisit the components of the full prompt briefly. So far we’ve gone through “Conversation history + new prompt,” “Call parameters,” and “System prompt:”
"You are GPT 4-1..."
"The user is using interface0 to interact with you..."
"Be clear and concise..."
"[searchWeb], [sendEmail], ..."
"searchWeb: requires term, ..."
"searchWeb_1 returned..."
"User recently asked about..."
"User is named Andy, lives in..."
"Relevant docs: [file.docx], ..."
"User uploaded a file..."
Creativity level
Budget for length of prompt
Specific term bias
Tool-choice, etc.
What remains is injected knowledge and tooling. We’ll cover those, then recap what you can practically do about all this, and some tips for prompting & context management.
Injected knowledge / context
Injected knowledge is the meat of what we usually think of as “context.” It includes:
- Memories that the system has preserved about you
- Data the system retrieves from some sort of connected datastore
- Anything you upload
You typically have little control over #1 on a prompt-by-prompt basis — although in some applications, you can see a list of retained memories and add/delete them globally.
One differentiator of interface0 is that it preserves memory across providers — so if you say something to OpenAI’s o3, Claude Sonnet 4 will “remember” it when you talk to it later. This is actually illustrative of the overall point here: memory (and all this other context) does not happen at the model / LLM level. Rather, it all happens at the application level.
The models o3 and Sonnet 4 are not “remembering” anything — rather, the interface you’re using to interact with them is retrieving relevant memories and including them in the context before sending them off to the big-brain LLM.
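Here is a toy sketch of what “memory at the application level” might look like. The store and the relevance check below are stand-ins; a real product would use embeddings or a search index:

```python
# A memory store kept by the application, not by the model.
saved_memories = [
    "User is named Andy, lives in the US.",
    "User recently asked about SHBG levels.",
    "User prefers concise answers.",
]

def relevant_memories(prompt: str, memories: list[str]) -> list[str]:
    """Toy relevance check: keep memories sharing a word with the prompt.
    Real systems use embeddings or a search index instead."""
    prompt_words = set(prompt.lower().split())
    return [m for m in memories if prompt_words & set(m.lower().split())]

prompt = "Does giving blood help with elevated SHBG levels?"
memory_block = "Relevant memories:\n" + "\n".join(relevant_memories(prompt, saved_memories))
# memory_block gets prepended to the context before the call, regardless of which model answers.
```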
In much the same way, you can retrieve data (often called Retrieval-Augmented Generation, or “RAG”) from databases or elsewhere before sending the inquiry to the model. I’m very excited about next-generation retrieval engines like Lucenia that can power workflows like this (disclaimer: so excited that I recently invested!).
You can think of this happening in a business context: your AI-powered customer service application runs a retrieval operation across your internal knowledge base to find relevant articles, then attaches those articles as context in the inquiry to the LLM, which ultimately produces the message sent to the user. The LLM doesn’t “search” itself — rather, the infrastructure searches and finds the context to attach before shipping it all off to the LLM.
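A sketch of that retrieve-then-attach pattern; the knowledge base and the toy keyword “search” below are hypothetical placeholders for a real search engine or vector index:

```python
# Retrieval happens in the application, before the LLM ever sees the question.
knowledge_base = {
    "refund-policy": "Refunds are available within 30 days of purchase...",
    "shipping-times": "Standard shipping takes 5 to 7 business days...",
}

def search_articles(query: str, kb: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy keyword scoring; a real system would use a proper retrieval engine."""
    words = query.lower().split()
    return sorted(kb.values(), key=lambda text: -sum(w in text.lower() for w in words))[:top_k]

question = "How long do refunds take?"
articles = search_articles(question, knowledge_base)
llm_input = "Relevant articles:\n" + "\n---\n".join(articles) + "\n\nCustomer question: " + question
```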
Last, user-submitted data is obviously relevant. You can either attach files directly to a prompt, or you might use a feature like ChatGPT/Claude’s “Projects” or interface0’s “Knowledge,” where saved files and text can be attached across many related conversations.
Tools & MCP
Finally, we have “tools.” You may have heard buzz about “MCP,” or the Model Context Protocol. MCP fits into the tools category.
Tools can be anything from searching the web, to drafting emails, to writing and running code, or anything else you can imagine. But tools aren’t magical — nor do they even really have anything specifically to do with LLMs.
Anything that was programmable before LLMs — like, again, searching the web, or drafting emails — can be a “tool.”
And, in fact, the LLM itself doesn’t even use the tools (just like the LLM doesn’t “remember” things or “search”). The LLM is a floating brain-in-a-jar. Brains-in-jars don’t use tools.
Rather, there’s an “orchestrator” — a piece of backend infrastructure — that coordinates sending the ultimate inquiry to the LLM brain and receiving the results. And the LLMs have been trained in such a way that they can be given a list of tools (called “tool specifications”), and when they want to use a tool (“I think it’d be appropriate to search the web now”), they emit a certain sequence of tokens/words that indicates this.
Once they emit those tokens/words, the orchestrator takes over, and does the programmatic thing for the LLM. Once that thing is done, there is presumably some result (“success!” or “here are 10 restaurants in New York from Yelp” or whatever), and then the orchestrator passes that result back to the LLM to finish its work.
People sometimes call this type of architecture “agentic” — meaning that the LLM doesn’t always just spit back an answer immediately, but rather (managed by the non-LLM orchestrator) goes through some “loops” of trying different tools, thinking, and assessing its progress before ultimately saying “I’m done!” and returning the result back to the user.
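Here is a hedged sketch of that orchestrator loop. The shape of the model’s “tool call” output and the fake call_llm function are invented for illustration; real SDKs differ, but the control flow is the point:

```python
# The orchestrator, not the LLM, actually runs tools and loops until the model is done.

def search_web(term: str) -> str:
    """An ordinary function: the 'tool' is just pre-LLM-era code."""
    return f"Top results for {term!r}: ..."

TOOLS = {"searchWeb": search_web}

def call_llm(messages: list[dict]) -> dict:
    """Stand-in for the real model call. This fake asks for one web search,
    then answers, just to exercise the loop."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "searchWeb", "arguments": {"term": "restaurants in New York"}}}
    return {"tool_call": None, "content": "Here are some restaurants I found: ..."}

def run_agent(messages: list[dict], max_steps: int = 5) -> str:
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply["tool_call"] is None:
            return reply["content"]           # the model says "I'm done"
        name = reply["tool_call"]["name"]
        args = reply["tool_call"]["arguments"]
        result = TOOLS[name](**args)          # the orchestrator runs the tool...
        messages.append({"role": "tool", "name": name, "content": result})  # ...and feeds the result back
    return "Stopped after too many steps."

print(run_agent([{"role": "user", "content": "Find me dinner in New York"}]))
```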
There has been a lot of buzz about “MCP” technology. It is indeed cool (albeit with some risks). A full breakdown is out of scope here, but the simplest way to think about it is that “MCP” is just a standard by which tools can be implemented across many different LLMs. That is, it’s a unification of how services should respond to LLM-initiated calls and how they should expose their abilities to those LLMs and their orchestrators. “MCP” itself doesn’t really do anything — it’s a standard that other tools follow.
But LLMs can use non-MCP tools too! In fact, many LLM systems just take MCP specifications and transform them to more “traditional” tool specifications.
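As a rough illustration of that translation step, here is a sketch mapping an MCP-style tool listing onto a more “traditional” function-style spec. Treat the exact field names as approximate, since both formats evolve:

```python
# An MCP server describes a tool roughly like this: a name, a description,
# and a JSON Schema for its inputs.
mcp_tool = {
    "name": "searchWeb",
    "description": "Search the web for a term.",
    "inputSchema": {
        "type": "object",
        "properties": {"term": {"type": "string"}},
        "required": ["term"],
    },
}

def mcp_to_function_spec(tool: dict) -> dict:
    """Rename the fields into the function-style tool spec many LLM APIs expect."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }

print(mcp_to_function_spec(mcp_tool))
```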
Anyway. Let’s sum it all up by showing our diagram again. But this time, let’s highlight in red the parts that are actually in your control (typically):
"You are GPT 4-1..."
"The user is using interface0 to interact with you..."
"Be clear and concise..."
"[searchWeb], [sendEmail], ..."
"searchWeb: requires term, ..."
"searchWeb_1 returned..."
"User recently asked about..."
"User is named Andy, lives in..."
"Relevant docs: [file.docx], ..."
"User uploaded a file..."
Creativity level
Budget for length of prompt
Specific term bias
Tool-choice, etc.
All that explanation for just five parameters!
- Personas/custom instructions: I think this is underrated by people right now. You can ask the model to behave however you want it to! Use this! If you have stylistic or intelligence preferences, or anything else, put them in here. (I believe this is partially because most frontends make it hard to edit this on a message-by-message basis.)
- Available tools: interfaces are increasingly exposing a list of tools to you (connect to your Gmail, search the web, etc.). Intelligent usage here is key. Cursor and other AI IDEs make heavy use of tools, and that’s a big part of why they are so useful.
- RAG: you don’t directly choose the queries that lead to the retrieved documents, but, especially in enterprise settings, you can choose which datastores are connected and made queryable (as well as which search solutions you’re using). This is a key lever for performance. In consumer use cases this is less relevant.
- User-submitted data: more context is better! Attach files whenever you think it would be helpful. Again, features like Projects or interface0’s “knowledge” help make this easier, so you aren’t re-attaching all the time. Again, Cursor and other AI IDEs make it trivial to attach files that are relevant, and it makes an enormous performance difference.
- Your prompt: last of all, there’s what you ultimately send in the chatbox. This, of course, matters greatly.
Much has been written about the prompt itself, and so I’ll spare you the words. But the one thing to note is that different models perform best with different prompts. Models like o3 pro need a lot of detail to perform at their best (see this great writeup). Other models can get by with less.
Consider the model when putting together the prompt (p.s. this is why interface0’s “enhance prompt” button returns different enhancements depending on the model you’re using!).
The full picture
Here’s a rough sketch of the whole flow:
- You type in your prompt and attach files
- The application enriches that with memories or data it retrieves
- It then compiles the full context, including pulling together the various system prompts, the tools selected, the call parameters, all the injected context, and the prompt and conversation history
- It passes all of that to the orchestration / LLM services
- The orchestrator then fires off a request to the LLM brain itself, and they together may decide to call some tools, do some agentic looping, whatever
- Eventually the LLM decides it’s done, sends the results to the orchestrator, and you get your response
When you sum it all up, here’s what the full request ends up looking like. I’ve taken some creative liberties and removed some of the true technical details here, but it’s directionally accurate.
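In that same directionally-accurate spirit, here is a hedged sketch of what the assembled payload might look like; the field names and structure are illustrative, not any provider’s real API:

```python
# Everything from the diagram, bundled into one request (illustrative only).
full_request = {
    "model": "gpt-4.1",
    "call_params": {"temperature": 0.7, "max_tokens": 1024, "tool_choice": "auto"},
    "system_prompt": (
        "You are GPT 4-1...\n"
        "The user is using interface0 to interact with you...\n"
        "Be clear and concise..."
    ),
    "tools": [
        {"name": "searchWeb", "description": "Search the web for a term."},
        {"name": "sendEmail", "description": "Draft and send an email."},
    ],
    "injected_context": [
        "User is named Andy, lives in...",
        "User recently asked about...",
        "Relevant docs: [file.docx]",
        "User uploaded a file...",
    ],
    "messages": [
        {"role": "user", "content": "Does giving blood help with elevated SHBG levels?"},
    ],
}
```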
I’ve been shamelessly plugging interface0 throughout this post — but it’s because I have genuinely built it to solve my problems with making LLMs more performant. The Personas and Knowledge features (and Template Prompts) are a direct result of my frustration with the main interfaces to these models having poor context engineering capabilities.
I will mention one more feature I use a lot: Summarize Chat. Like I mentioned, models can run out of context capacity, and performance will start to degrade (or they will fail entirely). I built interface0 so that with one click, you can generate a summary of a long chat. That summary can then be turned into a “Knowledge entry,” which you can then tag in on any subsequent chat.
So I’ll often run short on context length, hit Summarize, add it as a new context, and then start a new chat, tagging in that context to get things rolling without missing a beat.
I’m looking forward to adding more context engineering features as well! If you have any ideas, I’m all ears.
Plus: any feedback on this post? Let me know.
And I'd love to hear from you directly: andy@andybromberg.com