# Ellie / Big Brain — LinkedIn drafts (v4)

Six posts, three variants each, long + short cuts. Voice: Ellie's (British, journal-like, anti-pitch).

Built end-to-end through the Snappy skill system. 18 editorial passes on Posts 01-03, fresh drafts for Posts 04-06, 18 short cuts.

---

## Post 01 — The 2023 → 2026 rebuild arc

### Long

#### Reflective

There's a project I've been carrying around for a couple of years. I started building it in late 2023, got most of the way there, ran out of money before I could finish, and went back to client work. Last month, with a bit of breathing room, I opened the file again.

The first version lived on Xano. It mostly worked. The architecture is the same architecture I'm using now, drawn on a whiteboard in 2023, and the thinking still holds up. But every conditional in the system had to be told, very explicitly, what to look for. Stitching it together took an outrageous amount of time. Setting up something like a regenerate flow was a day's work in Wized. I never quite got to the end of it.

Coming back to the same idea now, what's surprised me is how much of the labour the model does. The architecture I drew is the architecture that runs. The wiring that used to take a day takes about five minutes. The regenerate flow I never got to ship in 2023 was working before lunch on a Tuesday, and I noticed it because I'd gone to make a coffee expecting to come back to a half-broken state.

A consultant friend once told me ideas have a season. I think mine had one and I just didn't have a long enough runway to meet it. This doesn't really feel like a comeback. It feels more like the same plan finally being affordable to deliver.


---

If you've shelved something because the build cost too much, the file might be worth opening again. I'll share more of what I'm building over the next few weeks. I've been calling it Big Brain.


#### Build-log

Spent the last few weeks rebuilding something I first tried to make in late 2023. Same architecture, second attempt.

The first version lived on Xano: a routing endpoint that fired off internal endpoints depending on what the AI was being asked to do. Generating an audience segment, updating a problem statement, pulling context for a piece of copy, each one a separate dispatch with conditionals stacked underneath, template literals doing find-and-replace inside prompt strings. It worked, mostly. It was also slow and brittle, and I never got to the end of it before the money ran out.
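
Roughly the shape of that dispatch, sketched in TypeScript with invented intents, endpoint paths, and placeholders; the real thing was visual nodes in Xano, so treat this as the pattern rather than the artefact:

```ts
// Illustrative only: the shape of the 2023 dispatch, reconstructed from memory.
// Intents, endpoint paths, and placeholders are invented, not the real Xano config.
type Intent = "generate_segment" | "update_problem" | "pull_copy_context";

const PROMPTS: Record<Intent, string> = {
  generate_segment: "Create an audience segment for {{offer}} aimed at {{audience}}.",
  update_problem: "Rewrite the problem statement for {{audience}}: {{problem}}",
  pull_copy_context: "Summarise everything relevant to writing copy about {{offer}}.",
};

// Template literals doing find-and-replace inside prompt strings.
function buildPrompt(intent: Intent, vars: Record<string, string>): string {
  let prompt = PROMPTS[intent];
  for (const [key, value] of Object.entries(vars)) {
    prompt = prompt.replaceAll(`{{${key}}}`, value);
  }
  return prompt;
}

async function callInternalEndpoint(path: string, prompt: string) {
  const res = await fetch(`https://api.example.test${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  return res.json();
}

// The routing endpoint: one dispatch per intent, conditionals stacked underneath,
// every branch told explicitly what to look for.
async function route(intent: Intent, vars: Record<string, string>) {
  switch (intent) {
    case "generate_segment":
      return callInternalEndpoint("/internal/segments", buildPrompt(intent, vars));
    case "update_problem":
      return callInternalEndpoint("/internal/problems", buildPrompt(intent, vars));
    case "pull_copy_context":
      return callInternalEndpoint("/internal/context", buildPrompt(intent, vars));
  }
}
```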

This time I'm on Vercel, with Neon for storage, FalkorDB for the knowledge graph, and Gemini direct for the model calls. The architectural thinking is the same. What's changed is who's doing the wiring. I describe the behaviour I want, the model writes the code, and the things that used to take me a day take five minutes.

I'm not sure what to do with that observation other than note it. The bottleneck on this kind of work used to be my time stitching pieces together. It isn't really my time any more. It's the clarity of what I'm asking for, which turns out to be a different muscle entirely.


---

If you shelved a build for cost reasons, the maths might have moved. I've been calling this one Big Brain. More soon.


#### Lesson

I think a lot about ideas having seasons. Some land at the right moment, get built, do their thing. Others arrive too early. You carry them around without quite knowing what to do with them, and by the time the ground catches up you've moved on. Or you've run out of money trying to make them real, which is what happened to me in late 2023.

The project I shelved then is a context-management layer for AI work. The thinking was that if you give an AI proper context, your audience and your offer and your current problem statement and the threads connecting them, it stops being a smart-but-amnesiac assistant and starts being something closer to a colleague who already knows where you are. That feels obvious now. It was less obvious in 2023.

I've spent the last few weeks rebuilding it, and the architecture has barely changed. What's changed is the cost of the labour to wire it. The idea was right. I just couldn't afford to deliver it while the season was open, which is its own kind of lesson and not a very tidy one.


---

I've been calling the rebuild Big Brain. The lesson, if there is one, isn't that you should wait for the right moment. It's that if you have to shelve something, you should keep the notebook. The notebook is the thing that survives the gap.


### Short (~80 words)

#### Reflective

There's a project I've been carrying around for a couple of years. Started in late 2023, ran out of money, went back to client work. Last month, with a bit of breathing room, I opened the file again.

The architecture I drew on a whiteboard in 2023 is the architecture that runs now. Every conditional in the old version had to be told, very explicitly, what to look for, and stitching it together took an outrageous amount of time. The wiring that used to take a day takes about five minutes. The regenerate flow I never got to ship was working before lunch on a Tuesday. I noticed because I'd gone to make a coffee expecting to come back to a half-broken state.


---

Doesn't feel like a comeback. Feels more like the same plan finally being affordable to deliver. I've been calling it Big Brain.


#### Build-log

Spent the last few weeks rebuilding something I first tried to make in late 2023. Same architecture, second attempt.

First version was Xano: a routing endpoint dispatching to internal endpoints by intent, conditionals stacked underneath, template literals doing find-and-replace inside prompt strings. Slow and brittle, and I never got to the end of it before the money ran out. This time it's Vercel, Neon, FalkorDB, Gemini direct. I describe the behaviour I want, the model writes the code, and things that used to take me a day take five minutes.

The bottleneck on this kind of work used to be my time stitching pieces together. It isn't really my time any more. It's the clarity of what I'm asking for, which turns out to be a different muscle entirely.


---

If you shelved a build for cost reasons, the maths might have moved. Calling it Big Brain.


#### Lesson

I think a lot about ideas having seasons. Some land at the right moment and get built. Others arrive too early, and you carry them around without quite knowing what to do with them, or you run out of money trying to make them real, which is what happened to me in late 2023.

The project I shelved then is a context-management layer for AI work. I've spent the last few weeks rebuilding it. The architecture has barely changed. What's changed is the cost of the labour to wire it. The idea was right. I just couldn't afford to deliver it while the season was open.


---

The lesson isn't that you should wait for the right moment. It's that if you have to shelve something, keep the notebook. The notebook is the thing that survives the gap.


---

## Post 02 — The context-layer thesis

### Long

#### Conversational

I got tired of pasting my bio into ChatGPT every time I opened a new chat. The same orientation, copied in, every session: who the audience is, what the offer is, what I'm working on this week. Then the back-and-forth: no, not that audience, the other one; that's the offer, but the problem statement has shifted; try again. By the time I was actually working, I'd had the same conversation about who I am four times that week.

So I've been building a layer underneath the chat. A single source of truth I can talk to. It holds my audience segments, my offers, my problem statements, and the threads connecting them. The AI pulls from there instead of from a fresh paste-in.

The first time I noticed the difference, I was in the middle of a different piece of work entirely. I'd opened a chat to fix a bit of copy and realised I'd already been working for ten minutes without typing my standard preamble. It hadn't occurred to me to type it. The model wasn't asking. The way you sometimes carry a heavy bag for half an hour and only feel the relief once you put it down, that was the shape of it.

Most of the conversation about getting AI to know you ends up in technical territory: longer context windows, retrieval, embeddings. Those are the mechanics. The thing I'm chasing isn't a mechanism. It's what it feels like when the model joins you mid-sentence, already oriented, and you don't have to introduce yourself again.


---

Quietly different, in practice. Less rehearsing my own situation back to a model. More actual work.


#### Definition-led

There's a phrase floating around, "context is the new prompt", that I think is mostly right but mostly under-explored. People reach for it and immediately land on the technical reading: longer context windows, RAG, retrieval, embedding-based memory. Those are the mechanics. They are also the easiest part to talk about, which is probably why the phrase has flattened into an engineering note.

The interesting question is the lived one. What does it actually feel like to work with an AI that already knows who you are? Most of us don't find out. We start fresh. We paste in our bio, our audience, our current project, our voice, every session, and we treat the AI like a clever stranger and pay the cost of that re-orientation every time. The slogan is technically satisfied by a longer window. The experience it's pointing at isn't.

What I've been building is the layer underneath. A single source of truth I can talk to in natural language. Audience segments, offers, problem statements, the threads between them. The AI reads from there. I stop briefing and start working.
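
A minimal sketch of that mechanic, assuming a Postgres layout I've invented for the example; the real schema is richer:

```ts
// Minimal sketch, assuming an invented Postgres layout.
// The point: the model reads standing context from here, not from a paste-in.
import { neon } from "@neondatabase/serverless";

const sql = neon(process.env.DATABASE_URL!);

async function buildStandingContext(): Promise<string> {
  const segments = await sql`SELECT name, description FROM audience_segments`;
  const offers = await sql`SELECT name, summary FROM offers`;
  const problems = await sql`SELECT statement FROM problem_statements WHERE is_current`;

  // Assembled once per request and handed to the model as system context.
  return [
    "Audience segments:",
    ...segments.map((s) => `- ${s.name}: ${s.description}`),
    "Offers:",
    ...offers.map((o) => `- ${o.name}: ${o.summary}`),
    "Current problem statements:",
    ...problems.map((p) => `- ${p.statement}`),
  ].join("\n");
}
```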

The thing that's surprised me most isn't the speed, although the work is faster. It's that my underlying picture of who I'm serving has become sharper. The cost of changing my mind dropped, so I change it more, and the picture stays current. That second effect is the one I keep turning over in my head, because it didn't show up in the spec when I drew the architecture.


---

I've been calling this Big Brain. More soon.


#### Personal shift

Something has changed about how I work in the last few weeks and I'm still figuring out how to talk about it.

The change is that I no longer brief AI. The orientation I used to type out at the start of every session, in some form, every single day: who the audience is, what the offer is, what I'm working on. That work is done now. It happens once, lives in a database, and the model reads from there. When I open a chat I am already in the middle of the work.

It sounds small. The first time it happened I asked for a thing and got it without the four rounds of "no, the other audience, that's the offer from a different segment," and I sat with the result for a minute, slightly confused about why I had a clear afternoon ahead of me. There's a fluency that arrives when the system already knows what you mean. I hadn't noticed how much of my mental cycles were going on the warm-up until they stopped.

I've been building this layer for a few weeks now. A single source of truth, talkable, that holds my audience segments and offers and problem statements and the connections between them. I've been calling it Big Brain. Most of the discourse around getting AI to know you stays at the level of mechanism, longer windows and retrieval and memory plumbing, but the part I think is actually interesting is the lived experience of opening a chat and finding the AI already knows where you are.


---

If you're still pasting your bio into a fresh chat, you might be able to feel what I mean. There's a weight in that warm-up you only notice once it's gone.


### Short (~80 words)

#### Conversational

I got tired of pasting my bio into ChatGPT every time I opened a new chat. The same orientation, copied in every session, then the back-and-forth: no, not that audience, the other one. By the time I was actually working, I'd had the same conversation about who I am four times that week.

So I built a layer underneath the chat. Audience segments, offers, problem statements in a database. The AI reads from there. The first time I noticed the difference, I'd already been working for ten minutes without typing my standard preamble. It hadn't occurred to me to type it. The way you sometimes carry a heavy bag for half an hour and only feel the relief once you put it down, that was the shape of it.


---

Less rehearsing my own situation back to a model. More actual work.


#### Definition-led

"Context is the new prompt", people say, and then immediately land on longer windows and RAG. Those are the mechanics. They satisfy the slogan technically and miss what it's pointing at.

The interesting question is the lived one. What does it feel like to work with an AI that already knows who you are? Most of us don't find out, because we start fresh. We paste in our bio, our audience, our current project, our voice, every session, and we pay the cost of that re-orientation every time. So I've been building the layer underneath. A single source of truth I can talk to in natural language, holding my audience segments, my offers, my problem statements, and the threads between them. The AI reads from there. I stop briefing and start working.


---

I've been calling it Big Brain. More soon.


#### Personal shift

Something has changed about how I work in the last few weeks. I no longer brief AI. The orientation I used to type out at the start of every session, in some form, every single day, lives in a database now. The model reads from there, and when I open a chat I'm already in the middle of the work.

The first time I got a thing without four rounds of "no, the other audience, that's the offer from a different segment," I sat with the result for a minute, slightly confused about why I had a clear afternoon ahead of me. There's a fluency that arrives when the system already knows what you mean. I hadn't noticed how much of my mental cycles were going on the warm-up until they stopped.


---

There's a weight in that warm-up you only notice once it's gone.


---

## Post 03 — Talk-to-update-truth (the demo)

### Long

#### Story

Something happened the other day that I want to write down before I forget what surprised me about it.

I was in a chat, working through a client's problem statement, and I said something like "actually, the audience for this one is closer to mid-stage founders than early-stage." The audience-segment row updated, the change was logged against that conversation, and the corrected version got pulled into the next piece of work. I edited a database row by talking to it. No form, no admin panel, no context-switch.

It's the most boring possible demo to describe. A chat updates a row. I know. But using it has reorganised something about how I work, and I think the mechanic is worth noting.

Years ago I had a notebook habit that fell apart for the same reason. The thinking would happen in conversation with someone, and I'd promise myself I'd write it down later, and later was a different mode I never got back into. The act of leaving the conversation to record it was the thing that killed the practice. Most software has the same problem in miniature: maintaining the system and using the system are two separate activities, and the switching cost is small per-instance and enormous in aggregate. Once the chat is the editor, the maintenance happens inside the work, not next to it.

Two things I didn't expect. Every change is provenance-tagged automatically, so I can trace why a problem statement reads the way it does. And the cost of changing my mind has dropped enough that I edit far more often than I used to, which means the picture of who I'm serving stays current instead of slowly going stale.


---

Building on Vercel, Neon, and a knowledge graph in FalkorDB. If you're working on something similar I'd happily compare notes on the wiring.


#### Observation-led

The bigger surprise first. Every change I make is provenance-tagged automatically, and I can ask the system, in conversation, why a problem statement reads the way it does, and it tells me which prior conversation produced the wording. That kind of legibility used to require deliberate annotation, the kind you put off doing and then can't reconstruct. Now it's a free byproduct of doing the work.

The smaller surprise, although it might end up mattering more, is that I edit far more often. The cost of changing my mind has dropped. There's no admin panel to navigate to, no form to fill. If I notice mid-sentence that the audience description is no longer right, I say so, and the underlying row updates and gets logged. The picture of who I'm serving is sharper than it was when editing was a chore.

The plumbing itself is unremarkable. A chat that updates a database when I describe a change in conversation. "Actually, the audience for this one is closer to mid-stage founders" and the relevant row updates, with provenance attached, and the next piece of work pulls the new version. CRUD by talking. Boring on paper.

The general pattern feels worth naming. Forms make you context-switch out of the work to maintain the system. Once the chat is the editor, maintenance happens inside the work. The savings aren't really in clicks, they're in the cognitive cost of switching modes, which is the kind of cost you stop noticing once you've internalised it.


---

Building on Vercel, Neon, and FalkorDB. If you're working on something similar I'd happily compare notes on the wiring.


#### Mechanic

Most software splits the world into two activities: doing the work, and maintaining the system that holds the work. You write a thing, then you go to admin to update a record. You make a decision in conversation, then you copy it into the right form. The mental tax of switching between modes is small per-instance and enormous in aggregate, and most of us no longer notice it because we've internalised it as part of doing the work.

I built a small piece of plumbing recently to see if I could collapse the two. The setup is simple. A chat with my context layer in the back. When I describe a change in normal conversation ("the audience is closer to mid-stage founders, not early-stage") the audience-segment row updates, the edit is logged against that conversation, and the corrected version gets pulled into the next piece of work. The talking and the curating happen in the same pass.
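
In sketch form, with invented table names, the whole mechanic is a tool the model can call mid-conversation; edit and provenance happen in the same pass:

```ts
// Sketch of the collapse, with invented table names: the edit and its provenance
// land together, as one tool the model can call mid-conversation.
import { neon } from "@neondatabase/serverless";

const sql = neon(process.env.DATABASE_URL!);

async function updateSegment(segmentId: string, description: string, conversationId: string) {
  // Apply the change described in conversation.
  await sql`
    UPDATE audience_segments
    SET description = ${description}, updated_at = now()
    WHERE id = ${segmentId}`;

  // Provenance as a free byproduct: log which conversation produced the wording.
  await sql`
    INSERT INTO edit_log (entity, entity_id, conversation_id, new_value)
    VALUES ('audience_segment', ${segmentId}, ${conversationId}, ${description})`;
}

// Later, "why does this read the way it does?" is just a lookup.
async function provenance(segmentId: string) {
  return sql`
    SELECT conversation_id, new_value, created_at
    FROM edit_log
    WHERE entity = 'audience_segment' AND entity_id = ${segmentId}
    ORDER BY created_at DESC`;
}
```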

It's a small change in the plumbing. The effect on how the work feels is bigger than I expected. Two things I'm watching. Every edit is provenance-tagged automatically, which makes the system legible in a way it wasn't before. And I edit more often, because the cost of changing my mind dropped, so I update the picture as my thinking moves instead of waiting for the next strategy review.


---

The general pattern, I think, is something like the chat being the editor. I'm sure others are working on this from different angles. Building on Vercel, Neon, and a knowledge graph in FalkorDB. If you're working on something similar I'd happily compare notes on the wiring.


### Short (~80 words)

#### Story

I was in a chat, working through a client's problem statement, and said "actually, the audience for this one is closer to mid-stage founders than early-stage." The audience-segment row updated, the change was logged against that conversation, the next piece of work pulled the corrected version. I edited a database row by talking to it. No form, no admin panel, no context-switch.

Boring demo. But it's reorganised something about how I work. Years ago I had a notebook habit that fell apart because writing things down was a different mode I never got back into. The act of leaving the conversation to record it killed the practice. Most software has that problem in miniature. Once the chat is the editor, maintenance happens inside the work, not next to it.


---

Vercel, Neon, FalkorDB. Happy to compare notes on the wiring.


#### Observation-led

The plumbing is unremarkable on its own. A chat that updates a database when I describe a change in conversation. "Actually, the audience for this one is closer to mid-stage founders" and the relevant row updates, with provenance attached, and the next piece of work pulls the new version. CRUD by talking.

The bigger surprise: every change is provenance-tagged automatically. I can ask why a problem statement reads the way it does and the system tells me, in conversation, which prior conversation produced the wording. That kind of legibility used to require deliberate annotation. The smaller surprise, although it might end up mattering more, is that I edit far more often, because the cost of changing my mind dropped. The picture of who I'm serving stays current instead of slowly going stale.


---

Vercel, Neon, FalkorDB. Happy to compare notes on the wiring.


#### Mechanic

Most software splits the world into doing the work and maintaining the system that holds the work. You leave the work to update the work. The mental tax of switching between modes is small per-instance and enormous in aggregate, and most of us no longer notice it because we've internalised it as part of doing the work.

I built a small piece of plumbing to collapse the two. A chat with my context layer in the back. When I describe a change in normal conversation, the audience-segment row updates, the edit is logged against that conversation, and the corrected version gets pulled into the next piece of work. The talking and the curating happen in the same pass. Side effect: every edit is provenance-tagged automatically, so the system is legible in a way it wasn't before.


---

Vercel, Neon, FalkorDB. Happy to compare notes on the wiring.


---

## Post 04 — Gemini-direct dev tip

### Long

#### Tip

Quick note for anyone doing image generation through an aggregator. I've been routing image calls through Google Gemini direct rather than via OpenRouter, and the difference has been larger than I expected.

The setup is unfussy. A small custom router in front of my model calls. It defaults to Gemini for most things because the free tier is generous, and falls through to other providers for higher-tier work. For image generation specifically, going to Google directly cuts a lot of round-trip overhead. Same model, fewer hops.
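
A sketch of that router, trimmed to the decision itself. The endpoints are the public ones; the model names are placeholders that will have dated, so treat the specifics as assumptions:

```ts
// Router sketch. Endpoints are the public ones; model names are placeholders.
type Task = "image" | "document" | "text";

async function callModel(task: Task, prompt: string): Promise<unknown> {
  if (task === "image" || task === "document") {
    // Committed tasks go to the source: Gemini direct, fewer hops.
    const res = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${process.env.GEMINI_API_KEY}`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
      },
    );
    return res.json();
  }

  // Anything still in flux falls through to the aggregator for swap-friendliness.
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "anthropic/claude-3.5-sonnet",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```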

The wider point I keep noticing: aggregators are great for breadth, and great for quickly testing across models, but once you've decided on a primary, the source is usually faster, cheaper, and easier to debug. With image gen the gap is particularly visible because every second of latency is felt, in a way it isn't with text. You sit there watching a spinner. Document analysis benefits in the same way for the same reason.

I'm not arguing against aggregators. They earn their keep when you genuinely don't know which model is right yet, or when you want a single billing surface. Once you do know, the small custom router pattern is, in my experience, worth the half-day it takes to write.


---

Happy to share the router pattern if anyone's looking at the same trade-off.


#### Numbers

Image generation calls dropped from around sixty seconds to around ten when I stopped routing through an aggregator. Same model. Different path.

I had been calling Google's image models through OpenRouter because it kept billing simple and let me swap providers easily. Once I committed to Google for image work specifically, I rewrote that bit of my router to call the API directly. Latency fell to roughly a sixth of where it had been. Costs went down at the same time. Document analysis tracked similar gains.

I don't think this is an OpenRouter complaint exactly. Aggregators are doing real work; you pay a bit of latency and margin for breadth and a unified bill. The trade-off is fine when you're still deciding which model to use, and it stops making sense once you know the answer for a given task.

For image gen, the answer for me has been Gemini direct. The free tier is generous enough to absorb most of my volume, and the latency makes the experience feel like working with a tool, not waiting on one. For text I still use the router so I can swap models freely. Per-task routing turns out to be worth the small upfront cost, mostly because the calls that used to take long enough to break flow now don't.


---

If your image calls feel sluggish, the path is worth checking before the model is.


#### Provider thinking

There's a small architectural choice I keep coming back to: when to go to the source and when to go through the aggregator.

For a while I routed almost everything through OpenRouter. It's lovely for breadth. One billing surface, easy to swap models, low-friction way to test how a prompt behaves across providers. The cost is a little latency and a little margin, both fair.

Once I knew which model I wanted for which job, the calculus shifted. For image generation I now go to Google directly. The latency improvement is noticeable enough that the experience changes shape; calls that used to take long enough to break flow now don't. For document analysis, similar. For text generation across multiple models, I still keep the aggregator path because the value of swap-friendliness is real there.

The pattern, I think, is that aggregators are best treated like a discovery layer. You use them to figure out what you want. Once you know, you wire that specific route directly. The router in front of my calls now does both. It knows which providers each task should hit directly, and it falls back to the aggregator for anything still in flux.

It's a small architectural distinction, but it's the kind of thing that shows up in the experience of using your own tools. Most of the AI plumbing decisions I'm pleased with in retrospect are like this. Boring on paper, immediately felt in the day's work.


---

Happy to share the router pattern if anyone's working through the same trade-off.


### Short (~80 words)

#### Tip

Quick note for anyone doing image generation through an aggregator. I've moved my image calls to Google Gemini direct rather than going via OpenRouter, and the difference has been bigger than I expected. Same model, fewer hops, much faster, cheaper. The setup is unfussy: a small custom router in front of my model calls that defaults to Gemini because the free tier is generous, and falls through to other providers for higher-tier work.

Aggregators are great for breadth, and for quickly testing across models. Once you've picked a primary for a given task, the source is usually faster, cheaper, and easier to debug. With image gen the gap is particularly visible because every second of latency is felt, in a way it isn't with text. You sit there watching a spinner.


---

Happy to share the small router pattern if anyone's looking at the same trade-off.


#### Numbers

Image gen calls dropped from around sixty seconds to around ten when I stopped routing through an aggregator. Same model, different path. Costs fell at the same time. Document analysis tracked similar gains.

I had been calling Google's image models through OpenRouter because it kept billing simple and let me swap providers easily. Once I committed to Google for image work specifically, I rewrote that bit of my router to call the API directly. Aggregators are fine when you're still deciding which model is right. Once you know the answer for a given task, the source usually wins on latency and on price, often by a lot more than I'd assumed. The calls that used to take long enough to break flow now don't.


---

If your image calls feel sluggish, the path is worth checking before the model is.


#### Provider thinking

Small architectural choice I keep coming back to: source versus aggregator. Aggregators are lovely for breadth and for figuring out what you want. One billing surface, easy to swap models, low-friction way to test how a prompt behaves across providers.

Once I knew which model I wanted for which job, the calculus shifted. For image generation I now go to Google directly. For document analysis, similar. For text generation across multiple models, I still keep the aggregator path because the value of swap-friendliness is real there. The router in front of my calls now does both: direct for tasks I've committed to, aggregator for anything still in flux.

It's a small distinction, but the kind of thing that shows up in the experience of using your own tools. Boring on paper, immediately felt in the day's work.


---

Happy to share the pattern if anyone's working through the same trade-off.


---

## Post 05 — Xano post-mortem

### Long

#### Reflection

I want to be careful about how I talk about Xano because it's a tool I genuinely respect, even though it stopped being the right shape for what I was building.

The version of Big Brain I tried to make in late 2023 lived on Xano. A routing endpoint that dispatched to internal endpoints by intent: generating an audience segment, updating a problem statement, pulling context for a piece of copy. Inside those endpoints, conditionals checking the shape of the input, and template literals doing find-and-replace inside prompt strings. It worked. It also took an outrageous amount of time to wire.

The honest reckoning is that I was using a no-code tool to build something that wanted to be a small piece of code. Each time the requirements shifted I was paying the no-code tax twice: once to express the change in the builder, once to keep all the conditionals consistent. That cost was bearable for a while, then it wasn't. The money ran out before the system did.

None of that is Xano's fault. Xano is excellent for shaped, repeatable backend work. What I was building was less shaped and more recursive, and the mismatch was on me. I think about it the way you sometimes think about a flat you outgrew: nothing wrong with the flat, you just need different walls.


---

The tool that lets you ship the first version is sometimes the one that stops you shipping the third. It's worth thinking about that before you commit, not after.


#### Mismatch

Looking back, the reason my 2023 build kept falling over wasn't really the tool I built it on. It was that I'd picked a tool whose shape didn't match the shape of the problem.

I was building a system where the AI's needs change continuously. New audience segment shapes, new prompt structures, new pieces of context flowing in from new sources. The problem was inherently recursive: prompts referencing prompts, endpoints calling endpoints, behaviour deciding behaviour. The tool I'd picked was a visual backend builder optimised for the opposite shape, shaped requests and predictable schemas and well-defined CRUD.

It was like trying to draft a novel inside a spreadsheet. You can do it. The cells will hold the words. But every time the structure of the story changes you spend more time reformatting cells than writing prose, and eventually you give up and open a text editor.

The rebuild this year is on Vercel with Neon for storage and FalkorDB for the knowledge graph. Code, not visual nodes. The shape of the substrate matches the shape of the problem. Recursive logic stays in code, schema lives in the database, and the model writes most of the wiring. The thing that surprises me, having had a year off it, is how much of the friction I'd accepted as just-the-cost-of-building turned out to be friction with the tool, not with the work.


---

If you've been struggling with a build, the first question I'd ask isn't whether the tool is any good. It's whether the tool is the right shape for what you're trying to do. Different question, different answers.


#### Practical

If I were starting the same project again from scratch tomorrow, here's the substrate I'd reach for and why.

Postgres for the relational data, in my case Neon. Audience segments, offers, problem statements, anything with a shape I want to query in a structured way. A graph database for the relationships between those things, in my case FalkorDB. The graph is where the contextual richness lives, the threads that turn a list of facts into something the model can reason over. Code on Vercel for the application layer, deployed continuously, and a small custom router in front of model calls so I can pick the right provider per task.
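
In sketch form the split looks like this, with an invented schema; the FalkorDB calls follow its Node client as I understand it, so check the current docs before leaning on them:

```ts
// Sketch only: invented schema, and FalkorDB client usage as I understand its
// Node client (verify against current docs). Shaped facts in Postgres,
// the threads between them in the graph.
import { neon } from "@neondatabase/serverless";
import { FalkorDB } from "falkordb";

const sql = neon(process.env.DATABASE_URL!);

async function contextForCopy(offerName: string) {
  // Structured data, queried in a structured way.
  const [offer] = await sql`SELECT name, summary FROM offers WHERE name = ${offerName}`;

  // The contextual richness: offer -> segment -> problem, as a thread.
  const db = await FalkorDB.connect({ socket: { host: "localhost", port: 6379 } });
  const graph = db.selectGraph("big_brain");
  const threads = await graph.query(
    `MATCH (o:Offer {name: $name})-[:TARGETS]->(s:Segment)-[:FACES]->(p:Problem)
     RETURN s.name AS segment, p.statement AS problem`,
    { params: { name: offerName } },
  );

  return { offer, threads };
}
```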

What I wouldn't do is build the orchestration logic in a no-code backend. Not because no-code is bad, but because the problem I'm working on (a system whose internal shape keeps changing) isn't well-served by the constraints that make no-code productive elsewhere. Template literals as find-and-replace inside prompt strings work for the first version and become a quiet drag on every version after. I felt the drag for months before I named it.

The first time around I picked the tool because it let me build fast. That was the right instinct for the first ten percent and the wrong instinct for the next ninety, and the difficulty is that the wrong instinct doesn't announce itself until you're well past the point where switching is cheap.


---

The fortieth checkpoint is the one that kills you, not the first. Worth weighing the substrate against that, not the demo.


### Short (~80 words)

#### Reflection

I want to be careful about how I talk about Xano because it's a tool I genuinely respect, even though it stopped being the right shape for what I was building. My 2023 attempt at Big Brain lived on Xano, and it worked, but every conditional had to be told what to look for explicitly and stitching the system together took an outrageous amount of time.

The honest reckoning is that I was using a no-code tool to build something that wanted to be a small piece of code, paying the no-code tax twice each time the requirements shifted. None of that is Xano's fault. I think about it the way you sometimes think about a flat you outgrew. Nothing wrong with the flat, you just need different walls.


---

The tool that lets you ship the first version is sometimes the one that stops you shipping the third.


#### Mismatch

The reason my 2023 build kept falling over wasn't the tool. It was the shape of the tool against the shape of the problem. I was building a system whose AI needs change continuously, prompts referencing prompts and behaviour deciding behaviour. The tool I'd picked was a visual backend builder optimised for the opposite shape, shaped requests and predictable schemas. Like drafting a novel inside a spreadsheet. The cells will hold the words, but every time the structure of the story changes you spend more time reformatting cells than writing prose, and eventually you give up and open a text editor.

The rebuild is in code on Vercel with Neon and FalkorDB. The shape of the substrate matches the shape of the work, and most of the friction I'd accepted as just the cost of building turned out to be friction with the tool.


---

The question isn't whether the tool is any good. It's whether it's the right shape for what you're trying to do. Different question, different answers.


#### Practical

If I were starting the same project from scratch tomorrow: Postgres on Neon for the shaped data, FalkorDB for the relationship graph, code on Vercel for the application layer, and a small custom router in front of model calls so I can pick the right provider per task.

What I wouldn't do is build the orchestration in a no-code backend. Template literals as find-and-replace inside prompt strings work for the first version and become a quiet drag on every version after. First time round I picked the tool that let me build fast. Right instinct for the first ten percent, wrong instinct for the next ninety, and the wrong instinct doesn't announce itself until you're past the point where switching is cheap.


---

The fortieth checkpoint is the one that kills you, not the first.


---

## Post 06 — Self-learning skills with gates

### Long

#### Incident

A small thing went wrong in my build last week that I think is worth writing down.

I run my project with self-learning skill files: small markdown documents that tell the model how to work in a given area of the system. Design system, data layer, prompts, deployment. The model reads the relevant skill before it acts, which keeps it pointed in roughly the right direction.

Last week one of the skills wasn't enforcing the design system. The check was missing. The model didn't know it was supposed to use design tokens, so it inlined styles directly into components every time it touched the UI. Several components were quietly destroyed in the time it took me to make a coffee.

I went looking for the fix expecting something subtle. It wasn't. I added explicit lines to the skill: if you're touching a component, here are the tokens you must use; if a token doesn't exist, stop and ask, don't invent. I added a gate that flags inline styles as a deviation. Both checks belong in the skill file, because the skill file is what the model reads before it acts. Putting them anywhere else would have been theatre.
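
The gate itself is small. A sketch, with an invented token list and patterns; the real check runs over the diff before anything lands:

```ts
// A gate in miniature: run over the diff before anything lands.
// Token list and patterns are invented for the sketch.
const ALLOWED_TOKENS = ["--color-ink", "--color-paper", "--space-md"];

function checkDiffForDeviations(diff: string): string[] {
  const deviations: string[] = [];
  for (const line of diff.split("\n")) {
    if (!line.startsWith("+")) continue; // only added lines matter

    // Inline styles are a deviation: surface them, never silently allow them.
    if (/style=\{\{|style="/.test(line)) {
      deviations.push(`inline style added: ${line.trim()}`);
    }

    // Raw hex colours bypass the token set entirely.
    for (const colour of line.match(/#[0-9a-fA-F]{6}\b/g) ?? []) {
      deviations.push(`raw colour ${colour}: use a design token`);
    }

    // Token references must exist; if a token isn't in the set, stop and ask.
    for (const ref of line.match(/var\((--[\w-]+)\)/g) ?? []) {
      const name = ref.slice(4, -1);
      if (!ALLOWED_TOKENS.includes(name)) {
        deviations.push(`unknown token ${name}: don't invent, ask`);
      }
    }
  }
  return deviations;
}
```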


---

A missing check looks identical to a present check until something goes wrong. Worth auditing the skills you trust most.


#### Architecture

I keep coming back to gates as the bit of the self-learning-skills setup that does the load-bearing work, even though the skill file itself gets all the attention.

My setup is a set of skill files: small markdown documents that describe how to work in a given area. Design system, data layer, prompts, deployment. The model reads the relevant skill before it acts, and the skill is where rules and checks live together.

Without gates, a skill is essentially a suggestion. The model reads it and does its best, but on a long enough timeline it drifts in ways that are hard to spot until something obvious breaks. With gates, the same skill becomes a contract. "If you are touching a component, use these tokens, do not inline styles" up top, and at the bottom: "if the diff contains inline styles, this is a deviation, surface it." The model behaves better, and the more important shift is that the system becomes auditable. I can read the skill and know what the model was held to.

Gates also convert hand-wavy expectations into checkable ones. The act of writing the gate forces you to decide what "correct" actually means, which is a quietly demanding piece of work. A lot of the time the bug isn't that the model doesn't know the rule. It's that the rule was never stated precisely enough to be checked, and the gate is what surfaces the imprecision.


---

Cheapest piece of infrastructure I've added. Paid back most.


#### Governance

There's a debate quietly running through agentic-systems work that I keep wanting to weigh in on: do agents need rails or freedom?

The freedom argument is that constraints stifle the model, that you should give it tools and a goal and let it figure out the rest. The rails argument is that any non-trivial system has invariants that have to be preserved, and those invariants don't enforce themselves.

My experience is squarely on the rails side, although the rails are softer than the word suggests. In my build the rails are skill files: small markdown documents per area, with explicit checks and gates. The design system skill says which tokens to use and flags inline styles. The data skill says which columns are write-protected. The deployment skill says what cannot ship without a passing test run.

None of those rules diminish what the model can do. They define the surface within which what it does is acceptable. The first time I ran the system without a rail in the design-system skill, the model quietly destroyed several components by inlining styles in the time it took me to make a coffee. Not maliciously. A missing rule is, from the model's point of view, the same as no rule, and the model treated the absence as permission.

The freedom is what the rails make possible. Without them, the model isn't free, it's untethered, and there's a difference.


---

If you're nervous about giving an agent more autonomy, the question isn't how much freedom it can handle. It's what rails you haven't written yet.


### Short (~80 words)

#### Incident

I run my project with self-learning skill files: small markdown documents that tell the model how to work in a given area. Design system, data layer, prompts, deployment. Last week one of them wasn't enforcing the design system. The check was missing. The model didn't know it was supposed to use design tokens, so it inlined styles every time it touched the UI. Several components quietly destroyed in the time it took me to make a coffee.

I went looking for the fix expecting something subtle. It wasn't. Explicit lines in the skill, plus a gate that flags inline styles as a deviation. Both belong in the skill file because the skill file is what the model reads before it acts. Anywhere else would have been theatre.


---

A missing check looks identical to a present check until something goes wrong.


#### Architecture

Gates are the bit of the self-learning-skills setup that does the load-bearing work, even though the skill file itself gets all the attention. Without them, a skill is essentially a suggestion, and on a long enough timeline the model drifts in ways that are hard to spot until something obvious breaks. With them, the same file becomes a contract. "If the diff contains inline styles, this is a deviation, surface it." The system becomes auditable. I can read the skill and know what the model was held to.

Useful side-effect: writing the gate forces you to decide what correct actually means, which is a quietly demanding piece of work. Most bugs aren't ignorance. They're rules that were never stated precisely enough to check.


---

Cheapest infrastructure I've added. Paid back most.


#### Governance

Quiet debate in agentic-systems work: rails or freedom. The freedom argument is that constraints stifle the model. The rails argument is that any non-trivial system has invariants that don't enforce themselves.

My experience is squarely on the rails side, although the rails are softer than the word suggests. In my build the rails are skill files with explicit checks: which tokens to use, which columns are write-protected. They don't diminish what the model can do. They define the surface within which what it does is acceptable. First time I ran the system without a rail in the design-system skill, the model quietly destroyed several components by inlining styles in the time it took me to make a coffee. A missing rule is, from the model's point of view, the same as no rule.


---

The question isn't how much freedom. It's what rails you haven't written yet.


---
