Repost: Eight years of wanting, three months of building with AI


Posted by Rolei_zl · 2026-04-13 17:44:56

Eight years of wanting, three months of building with AI

Apr 5, 2026·Essay

For eight years, I’ve wanted a high-quality set of devtools for working with SQLite. Given how important SQLite is to the industry1, I’ve long been puzzled that no one has invested in building a really good developer experience for it2.

A couple of weeks ago, after ~250 hours of effort over three months3 on evenings, weekends, and vacation days, I finally released syntaqlite (GitHub), fulfilling this long-held wish. And I believe the main reason this happened was AI coding agents4.

Of course, there’s no shortage of posts claiming that AI one-shot their project or pushing back and declaring that AI is all slop. I’m going to take a very different approach and, instead, systematically break down my experience building syntaqlite with AI, both where it helped and where it was detrimental.

I’ll do this while contextualizing the project and my background so you can independently assess how generalizable this experience was. And whenever I make a claim, I’ll try to back it up with evidence from my project journal, coding transcripts, or commit history5.

Why I wanted it

In my work on Perfetto, I maintain a SQLite-based language for querying performance traces called PerfettoSQL. It’s basically the same as SQLite but with a few extensions to make the trace querying experience better. There are ~100K lines of PerfettoSQL internally in Google and it’s used by a wide range of teams.

Having a language which gets traction means your users also start expecting things like formatters, linters, and editor extensions. I’d hoped that we could adapt some SQLite tools from open source but the more I looked into it, the more disappointed I was. What I found either wasn’t reliable enough, fast enough6, or flexible enough to adapt to PerfettoSQL. There was clearly an opportunity to build something from scratch, but it was never the “most important thing we could work on”. We’ve been reluctantly making do with the tools out there but always wishing for better.

On the other hand, there was the option to do something in my spare time. I had built lots of open source projects in my teens7 but this had faded away during university when I felt that I just didn’t have the motivation anymore. Being a maintainer is much more than just “throwing the code out there” and seeing what happens. It’s triaging bugs, investigating crashes, writing documentation, building a community, and, most importantly, having a direction for the project.

But the itch of open source (specifically freedom to work on what I wanted while helping others) had never gone away. The SQLite devtools project was eternally in my mind as “something I’d like to work on”. But there was another reason why I kept putting it off: it sits at the intersection of being both hard and tedious.

What makes it hard and tedious

If I was going to invest my personal time working on this project, I didn’t want to build something that only helped Perfetto: I wanted to make it work for any SQLite user out there8. And this means parsing SQL exactly like SQLite.

The heart of any language-oriented devtool is the parser. This is responsible for turning the source code into a “parse tree” which acts as the central data structure anything else is built on top of. If your parser isn’t accurate, then your formatters and linters will inevitably inherit those inaccuracies; many of the tools I found suffered from having parsers which approximated the SQLite language rather than representing it precisely.

Unfortunately, unlike many other languages, SQLite has no formal specification describing how it should be parsed. It doesn’t expose a stable API for its parser either. In fact, quite uniquely, in its implementation it doesn’t even build a parse tree at all9! The only reasonable approach left in my opinion is to carefully extract the relevant parts of SQLite’s source code and adapt it to build the parser I wanted10.

This means getting into the weeds of SQLite source code, a fiendishly difficult codebase to understand. The whole project is written in C in an incredibly dense style; I’ve spent days just understanding the virtual table API11 and implementation. Trying to grasp the full parser stack was daunting.

There’s also the fact that there are >400 rules in SQLite which capture the full surface area of its language. I’d have to specify in each of these “grammar rules” how that part of the syntax maps to the matching node in the parse tree. It’s extremely repetitive work; each rule is similar to all the ones around it but also, by definition, different.

And it’s not just the rules but also coming up with and writing tests to make sure it’s correct, debugging if something is wrong, triaging and fixing the inevitable bugs people filed when I got something wrong…

For years, this was where the idea died. Too hard for a side project12, too tedious to sustain motivation, too risky to invest months into something that might not work.

How it happened

I’ve been using coding agents since early 2025 (Aider, Roo Code, then Claude Code since July) and they’d definitely been useful but never something I felt I could trust a serious project to. But towards the end of 2025, the models seemed to make a significant step forward in quality13. At the same time, I kept hitting problems in Perfetto which would have been trivially solved by having a reliable parser. Each workaround left the same thought in the back of my mind: maybe it’s finally time to build it for real.

I got some space to think and reflect over Christmas and decided to really stress test the most maximalist version of AI: could I vibe-code the whole thing using just Claude Code on the Max plan (£200/month)?

Through most of January, I iterated, acting as semi-technical manager and delegating almost all the design and all the implementation to Claude. Functionally, I ended up in a reasonable place: a parser in C extracted from SQLite sources using a bunch of Python scripts, a formatter built on top, support for both the SQLite language and the PerfettoSQL extensions, all exposed in a web playground.

But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti14. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision, never mind integrating it into the Perfetto tools. The saving grace was that it had proved the approach was viable and generated more than 500 tests, many of which I felt I could reuse.

I decided to throw away everything and start from scratch while also switching most of the codebase to Rust15. I could see that C was going to make it difficult to build the higher level components like the validator and the language server implementation. And as a bonus, it would also let me use the same language for both the extraction and runtime instead of splitting it across C and Python.

More importantly, I completely changed my role in the project. I took ownership of all decisions16 and used it more as “autocomplete on steroids” inside a much tighter process: opinionated design upfront, reviewing every change thoroughly, fixing problems eagerly as I spotted them, and investing in scaffolding (like linting, validation, and non-trivial testing17) to check AI output automatically.

The core features came together through February and the final stretch (upstream test validation, editor extensions, packaging, docs) led to a 0.1 launch in mid-March.

But in my opinion, this timeline is the least interesting part of this story. What I really want to talk about is what wouldn’t have happened without AI and also the toll it took on me as I used it.

AI is why this project exists, and why it’s as complete as it is

Overcoming inertia

I’ve written in the past about how one of my biggest weaknesses as a software engineer is my tendency to procrastinate when facing a big new project. Though I didn’t realize it at the time, it could not have applied more perfectly to building syntaqlite.

AI basically let me put aside all my doubts on technical calls, my uncertainty about whether I was building the right thing and my reluctance to get started by giving me very concrete problems to work on. Instead of “I need to understand how SQLite’s parsing works”, it was “I need to get AI to suggest an approach for me so I can tear it up and build something better”18. I work so much better with concrete prototypes to play with and code to look at than endlessly thinking about designs in my head, and AI lets me get to that point at a pace I could not have dreamed about before. Once I took the first step, every step after that was so much easier.

Faster at churning code

AI turned out to be better than me at the act of writing code itself, assuming that code is obvious. If I can break a problem down to “write a function with this behaviour and parameters” or “write a class matching this interface,” AI will build it faster than I would and, crucially, in a style that might well be more intuitive to a future reader. It documents things I’d skip, lays out code consistently with the rest of the project, and sticks to what you might call the “standard dialect” of whatever language you’re working in19.

That standardness is a double-edged sword. For the vast majority of code in any project, standard is exactly what you want: predictable, readable, unsurprising. But every project has pieces that are its edge, the parts where the value comes from doing something non-obvious. For syntaqlite, that was the extraction pipeline and the parser architecture. AI’s instinct to normalize was actively harmful there, and those were the parts I had to design in depth and often resorted to just writing myself.

But here’s the flip side: the same speed that makes AI great at obvious code also makes it great at refactoring. If you’re using AI to generate code at industrial scale, you have to refactor constantly and continuously20. If you don’t, things immediately get out of hand. This was the central lesson of the vibe-coding month: I didn’t refactor enough, the codebase became something I couldn’t reason about, and I had to throw it all away. In the rewrite, refactoring became the core of my workflow. After every large batch of generated code, I’d step back and ask “is this ugly?” Sometimes AI could clean it up. Other times there was a large-scale abstraction that AI couldn’t see but I could; I’d give it the direction and let it execute21. If you have taste, the cost of a wrong approach drops dramatically because you can restructure quickly22.

Teaching assistant

Of all the ways I used AI, research had by far the highest ratio of value delivered to time spent.

I’ve worked with interpreters and parsers before but I had never heard of Wadler-Lindig pretty printing23. When I needed to build the formatter, AI gave me a concrete and actionable lesson from a point of view I could understand and pointed me to the papers to learn more. I could have found this myself eventually, but AI compressed what might have been a day or two of reading into a focused conversation where I could ask “but why does this work?” until I actually got it.

This extended to entire domains I’d never worked in. I have deep C++ and Android performance expertise but had barely touched Rust tooling or editor extension APIs. With AI, it wasn’t a problem: the fundamentals are the same, the terminology is similar, and AI bridges the gap24. The VS Code extension would have taken me a day or two of learning the API before I could even start. With AI, I had a working extension within an hour.

It was also invaluable for reacquainting myself with parts of the project I hadn’t looked at for a few days25. I could control how deep to go: “tell me about this component” for a surface-level refresher, “give me a detailed linear walkthrough” for a deeper dive, “audit unsafe usages in this repo” to go hunting for problems. When you’re context switching a lot, you lose context fast. AI let me reacquire it on demand.

More than I’d have built alone

Beyond making the project exist at all, AI is also the reason it shipped as complete as it did. Every open source project has a long tail of features that are important but not critical: the things you know theoretically how to do but keep deprioritizing because the core work is more pressing. For syntaqlite, that list was long: editor extensions, Python bindings, a WASM playground, a docs site, packaging for multiple ecosystems26. AI made these cheap enough that skipping them felt like the wrong trade-off.

It also freed up mental energy for UX27. Instead of spending all my time on implementation, I could think about what a user’s first experience should feel like: what error messages would actually help them fix their SQL, how the formatter output should look by default, whether the CLI flags were intuitive. These are the things that separate a tool people try once from one they keep using, and AI gave me the headroom to care about them. Without AI, I would have built something much smaller, probably no editor extensions or docs site. AI didn’t just make the same project faster. It changed what the project was.

Where AI had its costs

The addiction

There’s an uncomfortable parallel between using AI coding tools and playing slot machines28. You send a prompt, wait, and either get something great or something useless. I found myself up late at night wanting to do “just one more prompt,” constantly trying AI just to see what would happen even when I knew it probably wouldn’t work. The sunk cost fallacy kicked in too: I’d keep at it even in tasks it was clearly ill-suited for, telling myself “maybe if I phrase it differently this time.”

The tiredness feedback loop made it worse29. When I had energy, I could write precise, well-scoped prompts and be genuinely productive. But when I was tired, my prompts became vague, the output got worse, and I’d try again, getting more tired in the process. In these cases, AI was probably slower than just implementing something myself, but it was too hard to break out of the loop30.

Losing touch

Several times during the project, I lost my mental model of the codebase31. Not the overall architecture or how things fitted together. But the day-to-day details of what lived where, which functions called which, the small decisions that accumulate into a working system. When that happened, surprising issues would appear and I’d find myself at a total loss to understand what was going wrong. I hated that feeling.

The deeper problem was that losing touch created a communication breakdown32. When you don’t have the mental thread of what’s going on, it becomes impossible to communicate meaningfully with the agent. Every exchange gets longer and more verbose. Instead of “change FooClass to do X,” you end up saying “change the thing which does Bar to do X”. Then the agent has to figure out what Bar is, how that maps to FooClass, and sometimes it gets it wrong33. It’s exactly the same complaint engineers have always had about managers who don’t understand the code asking for fanciful or impossible things. Except now you’ve become that manager.

The fix was deliberate: I made it a habit to read through the code immediately after it was implemented and actively engage with it by asking “how would I have done this differently?”

Of course, in some sense all of the above is also true of code I wrote a few months ago (hence the sentiment that AI code is legacy code), but AI makes the drift happen faster because you’re not building the same muscle memory that comes from originally typing it out.

The slow corrosion

There were some other problems I only discovered incrementally over the three months.

I found that AI made me procrastinate on key design decisions34. Because refactoring was cheap, I could always say “I’ll deal with this later.” And because AI could refactor at the same industrial scale it generated code, the cost of deferring felt low. But it wasn’t: deferring decisions corroded my ability to think clearly because the codebase stayed confusing in the meantime. The vibe-coding month was the most extreme version of this. Yes, I understood the problem, but if I had been more disciplined about making hard design calls earlier, I could have converged on the right architecture much faster.

Tests created a similar false comfort35. Having 500+ tests felt reassuring, and AI made it easy to generate more. But neither humans nor AI are creative enough to foresee every edge case you’ll hit in the future; there were several times in the vibe-coding phase where I’d come up with a test case and realise the design of some component was completely wrong and needed to be totally reworked. This was a significant contributor to my lack of trust and the decision to scrap everything and start from scratch.

Basically, I learned that the “normal rules” of software still apply in the AI age: if you don’t have a fundamental foundation (clear architecture, well-defined boundaries) you’ll be left eternally chasing bugs as they appear.

No sense of time

Something I kept coming back to was how little AI understood about the passage of time36. It sees a codebase in a certain state but doesn’t feel time the way humans do. I can tell you what it feels like to use an API, how it evolved over months or years, why certain decisions were made and later reversed.

The natural problem from this lack of understanding is that you either make the same mistakes you made in the past and have to relearn the lessons or you fall into new traps which were successfully avoided the first time, slowing you down in the long run. In my opinion, this is a similar problem to why losing a high-quality senior engineer hurts a team so much: they carry history and context that doesn’t exist anywhere else and act as a guide for others around them.

In theory, you can try to preserve this context by keeping specs and docs up to date. But there’s a reason we didn’t do this before AI: exhaustively capturing implicit design decisions is incredibly expensive and time-consuming. AI can help draft these docs, but because there’s no way to automatically verify that it accurately captured what matters, a human still has to manually audit the result. And that’s still time-consuming.

There’s also the context pollution problem. You never know when a design note about API A will echo in API B. Consistency is a huge part of what makes codebases work, and for that you don’t just need context about what you’re working on right now but also about other things which were designed in a similar way. Deciding what’s relevant requires exactly the kind of judgement that institutional knowledge provides in the first place.

Relativity

Reflecting on the above, the pattern of when AI helped and when it hurt was fairly consistent.

When I was working on something I already understood deeply, AI was excellent. I could review its output instantly, catch mistakes before they landed and move at a pace I’d never have managed alone. The parser rule generation is the clearest example37: I knew exactly what each rule should produce, so I could review AI’s output within a minute or two and iterate fast.

When I was working on something I could describe but didn’t yet know, AI was good but required more care. Learning Wadler-Lindig for the formatter was like this: I could articulate what I wanted, evaluate whether the output was heading in the right direction, and learn from what AI explained. But I had to stay engaged and couldn’t just accept what it gave me.

When I was working on something where I didn’t even know what I wanted, AI was somewhere between unhelpful and harmful. The architecture of the project was the clearest case: I spent weeks in the early days following AI down dead ends, exploring designs that felt productive in the moment but collapsed under scrutiny. In hindsight, I have to wonder if it would have been faster just thinking it through without AI in the loop at all.

But expertise alone isn’t enough. Even when I understood a problem deeply, AI still struggled if the task had no objectively checkable answer38. Implementation has a right answer, at least at a local level: the code compiles, the tests pass, the output matches what you asked for. Design doesn’t. We’re still arguing about OOP decades after it first took off.

Concretely, I found that designing the public API of syntaqlite was where this hit home the hardest. I spent several days in early March doing nothing but API refactoring, manually fixing things any experienced engineer would have instinctively avoided but AI made a total mess of. There’s no test or objective metric for “is this API pleasant to use” and “will this API help users solve the problems they have” and that’s exactly why the coding agents did so badly at it.

This takes me back to the days I was obsessed with physics and, specifically, relativity. The laws of physics look simple and Newtonian in any small local area, but zoom out and spacetime curves in ways you can’t predict from the local picture alone. Code is the same: at the level of a function or a class, there’s usually a clear right answer, and AI is excellent there. But architecture is what happens when all those local pieces interact, and you can’t get good global behaviour by stitching together locally correct components.

Knowing where you are on these axes at any given moment is, I think, the core skill of working with AI effectively.

Wrap-up

Eight years is a long time to carry a project in your head. Seeing these SQLite tools actually exist and function after only three months of work is a massive win, and I’m fully aware they wouldn’t be here without AI.

But the process wasn’t the clean, linear success story people usually post. I lost an entire month to vibe-coding. I fell into the trap of managing a codebase I didn’t actually understand, and I paid for that with a total rewrite.

The takeaway for me is simple: AI is an incredible force multiplier for implementation, but it’s a dangerous substitute for design. It’s brilliant at giving you the right answer to a specific technical question, but it has no sense of history, taste, or how a human will actually feel using your API. If you rely on it for the “soul” of your software, you’ll just end up hitting a wall faster than you ever have before.

What I’d like to see more of from others is exactly what I’ve tried to do here: honest, detailed accounts of building real software with these tools; not weekend toys or one-off scripts but the kind of software that has to survive contact with users, bug reports, and your own changing mind.

1. Note SQLite ships in every smartphone, every major browser and countless embedded systems. See Most Widely Deployed.

2. Explanation Devtools: a formatter, linter and language server (LSP). High-quality: tools I can trust to work on all SQLite SQL, e.g. a formatter which doesn’t “eat” comments, a linter supporting SQLite-specific features, and a language server featureful enough to give a similar experience to TypeScript in VS Code.

3. Note 36 days with commits between Jan 14 and Mar 18, plus the very early period (Dec 29-Jan 12) where I didn’t even bother committing code.

4. Qualification Justified in “AI is why this project exists, and why it’s as complete as it is”.

5. Note The journal runs to ~4,000 words with dated entries throughout the project.

6. Note See the comparison page for benchmarks against other SQLite tooling.

7. Note E.g. a video converter wrapping ffmpeg, and a semi-successful IRC client for Android.

8. Note A recurring tendency of mine: I’m rarely satisfied with the immediate problem I’m facing and can’t help but think “but what if I did something even more ambitious”.

9. Explanation SQLite goes straight from SQL text to bytecode without building an intermediate tree.
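To see what "straight from SQL text to bytecode" looks like in practice, here is a minimal sketch using Python's standard `sqlite3` module. It relies on SQLite's `EXPLAIN` prefix, which returns the compiled program as rows whose first two columns are the instruction address and opcode. The table `t` and the helper name `bytecode` are made up for illustration, and the exact opcodes vary between SQLite versions.

```python
import sqlite3

# In-memory database with a throwaway table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")


def bytecode(sql):
    """Return (addr, opcode) pairs for the program SQLite compiles `sql` into."""
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    # EXPLAIN columns: addr, opcode, p1, p2, p3, p4, p5, comment
    return [(row[0], row[1]) for row in rows]


program = bytecode("SELECT a FROM t WHERE b = 'x'")
for addr, opcode in program:
    print(addr, opcode)
```

What comes back is a flat list of virtual machine instructions rather than a tree, which is why a formatter or linter cannot simply borrow SQLite's own intermediate representation.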

10. Note In more detail:

Extract the tokenizer directly from SQLite’s sources
Extract the grammar rules and use SQLite’s Lemon parser generator to create our own parser.
Go through each of the ~400 extracted grammar rules and decide how they should be represented in a parse tree.
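The third step can be pictured as a table from grammar rule to tree-building action. The sketch below is only an illustration of that idea: the rule strings are written in a Lemon-like notation but are not copied from SQLite's grammar, and `Node`, `ACTIONS` and `reduce_rule` are hypothetical names, not syntaqlite's actual design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    kind: str
    children: List[object] = field(default_factory=list)


# One entry per grammar rule: given the right-hand-side values, build a node.
# Illustrative rules only; a real grammar would have hundreds of these.
ACTIONS: Dict[str, Callable[[list], object]] = {
    "expr ::= INTEGER": lambda rhs: Node("Literal", [rhs[0]]),
    "expr ::= expr PLUS expr": lambda rhs: Node("BinaryExpr", [rhs[0], rhs[1], rhs[2]]),
    # Parentheses add no node of their own: pass the inner expression through.
    "expr ::= LP expr RP": lambda rhs: rhs[1],
}


def reduce_rule(rule: str, rhs: list) -> object:
    """Apply the action registered for `rule`, as a parser would on a reduce."""
    return ACTIONS[rule](rhs)


# Hand-simulated reductions for the input "(1) + 2".
one = reduce_rule("expr ::= INTEGER", ["1"])
grouped = reduce_rule("expr ::= LP expr RP", ["(", one, ")"])
two = reduce_rule("expr ::= INTEGER", ["2"])
tree = reduce_rule("expr ::= expr PLUS expr", [grouped, "+", two])
print(tree)
```

Each entry is similar to its neighbours yet slightly different, which is the repetitive part of the work described above.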

11. Note Even after using the virtual table API for 8 years in Perfetto, I still don’t feel I have a perfect handle on all its nuances.

12. Note Another big challenge is supporting dialects like PerfettoSQL without needing to fork. I didn’t want to get into it because there’s a whole heap of complexity to explain around how to design the parser to be extensible but also fast.

13. Note Andrej Karpathy put it well: “coding agents basically didn’t work before December […] models have significantly higher quality, long-term coherence and tenacity that can power through large and long tasks, making them extremely disruptive to the default programming workflow.”

14. Journal From the journal, reflecting on the vibe-coding prototypes: “I quickly got exhausted and started accepting too much random code and once you mess up, it becomes very hard to get it back without throwing it away.”

15. Note The tokenizer and parser remained in C, extracted from SQLite’s sources. Everything above that (formatter, linter, validator, language server) is Rust.

16. Transcript Quoting from transcript at the time, the opening prompt was: “How I want to do this is that I want to be incharge of all decisions and direction and I want to tell you what to do. I don’t want you to plan, I don’t want you to be independent. Is that clear?”

17. Note One example: I wrote a TCL driver that hooks into SQLite’s own ~1,390 upstream test files. Every SQL statement is run through both real SQLite (sqlite3_prepare_v2) and syntaqlite’s parser side-by-side: if SQLite accepts a statement, we must accept it too; if SQLite rejects it, we must reject it too. This catches classes of bugs that hand-written tests never would.
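The accept/reject comparison can be sketched in a few lines of Python. This is not the TCL driver described above: it uses the standard `sqlite3` module, and it approximates "SQLite accepts the syntax" by preparing the statement with an `EXPLAIN` prefix and treating only parse-style error messages as rejections, which is a heuristic assumed here. `candidate_accepts` stands in for the parser under test, and `toy_candidate` is a deliberately bad one.

```python
import sqlite3

# Error-message fragments treated as "the parser rejected this" (heuristic).
SYNTAX_MARKERS = ("syntax error", "incomplete input", "unrecognized token")


def sqlite_accepts(sql: str) -> bool:
    """True if real SQLite parses the single statement `sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("EXPLAIN " + sql)  # compiles the statement without running it
        return True
    except sqlite3.OperationalError as exc:
        # Errors such as "no such table" mean the syntax was fine.
        return not any(marker in str(exc) for marker in SYNTAX_MARKERS)
    finally:
        conn.close()


def disagreements(statements, candidate_accepts):
    """Return (sql, sqlite_verdict, candidate_verdict) wherever the two differ."""
    out = []
    for sql in statements:
        expected = sqlite_accepts(sql)
        actual = candidate_accepts(sql)
        if expected != actual:
            out.append((sql, expected, actual))
    return out


def toy_candidate(sql: str) -> bool:
    # Stand-in "parser" that only knows statements starting with SELECT.
    return sql.upper().startswith("SELECT")


corpus = ["SELECT 1", "SELEC 1", "SELECT , 1", "CREATE TABLE t(a)"]
mismatches = disagreements(corpus, toy_candidate)
for sql, expected, actual in mismatches:
    print(f"{sql!r}: sqlite={expected} candidate={actual}")
```

The value of the approach is that the corpus does not have to be written by hand: any large body of existing SQL can be fed through both sides.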

18. Journal From the journal: “Use it to prototype changes really fast and then come back and delete all the code and then rebuild everything again in a more structured way properly.” AI turned abstract uncertainty into concrete artefacts to react to: the prototypes from vibe-coding proved the approach was viable even though the code itself was thrown away.

19. Journal From the journal: “I’ve both largely let AI code whole modules while also being picky and hand writing SIMD and looking at compiler explorer asm/machine code depending on the problem I’m solving.” The standard code is the vast majority; the hand-written pieces are the exceptions.

20. Journal From the journal: “If you’re using AI to generate code at industrial scale you have to refactor constantly and continuously. If you don’t, you immediately get out of hand.” Also: “After every large amount of code, it’s worth taking the time as a human to ask ‘is this ugly’ and if so, do something about it.”

21. Journal Verbatim from the project journal. The journal adds: “audits + refactors work amazingly well together.”

22. Journal From the journal: “Thankfully the cost of mistakes also goes down a lot if you have taste as AI can refactor also at industrial scale.”

23. Transcript Wadler-Lindig is an algorithm for pretty-printing that lets you describe document layout declaratively and have the printer decide where to break lines. In the formatter design transcript, Claude proposed “width-aware formatting from the start (Wadler-Lindig document model)” and I could immediately engage with the trade-offs rather than spending days discovering the approach.
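For readers who have not met the technique, here is a minimal sketch of a Wadler-style document model in Python. It is a teaching-sized illustration, not syntaqlite's formatter: the names `Text`, `Line`, `Concat`, `Nest`, `Group`, `fits` and `render` are chosen here, and the fit check is simplified to measure only a group's own contents.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Text:
    s: str


@dataclass(frozen=True)
class Line:
    """A space if the enclosing group fits on one line, else a newline."""


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Doc", ...]


@dataclass(frozen=True)
class Nest:
    indent: int
    doc: "Doc"


@dataclass(frozen=True)
class Group:
    doc: "Doc"


Doc = Union[Text, Line, Concat, Nest, Group]
FLAT, BREAK = "flat", "break"


def fits(width: int, doc: Doc) -> bool:
    """Would `doc`, laid out flat, fit in `width` columns?"""
    stack = [doc]
    while stack:
        if width < 0:
            return False
        d = stack.pop()
        if isinstance(d, Text):
            width -= len(d.s)
        elif isinstance(d, Line):
            width -= 1  # a flat Line renders as one space
        elif isinstance(d, Concat):
            stack.extend(reversed(d.parts))
        else:  # Nest or Group: look inside
            stack.append(d.doc)
    return width >= 0


def render(doc: Doc, width: int) -> str:
    out, col = [], 0
    stack = [(0, BREAK, doc)]
    while stack:
        indent, mode, d = stack.pop()
        if isinstance(d, Text):
            out.append(d.s)
            col += len(d.s)
        elif isinstance(d, Line):
            if mode == FLAT:
                out.append(" ")
                col += 1
            else:
                out.append("\n" + " " * indent)
                col = indent
        elif isinstance(d, Concat):
            stack.extend((indent, mode, p) for p in reversed(d.parts))
        elif isinstance(d, Nest):
            stack.append((indent + d.indent, mode, d.doc))
        else:  # Group: the printer, not the caller, decides flat vs. broken
            chosen = FLAT if fits(width - col, d.doc) else BREAK
            stack.append((indent, chosen, d.doc))
    return "".join(out)


columns = Concat((Line(), Text("a"), Text(","), Line(), Text("b")))
query = Group(Concat((Text("SELECT"), Nest(2, columns), Line(), Text("FROM t"))))
print(render(query, 80))
print(render(query, 10))
```

The same `query` document renders on one line at width 80 and breaks with a two-space indent at width 10; the caller only describes where breaks are allowed.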

24. Journal From the journal, on switching between domains: “I know performance on C++/Android really well. But I don’t know Python/Go/Rust tools at all. With AI, not a problem: the fundamentals are the same, the terminology is similar. I can now switch between domains so fast.”

25. Journal From the journal, on codebase audits: “You can control how in-depth you want to go: surface level overview for a quick refresher, detailed walkthrough for a deeper dive, targeted audit to hunt for problems.”

26. Note The launch prep phase covered VS Code extension, docs site, crates.io, PyPI, npm, Homebrew publishing, and a Zed extension. Each of these is a “weekend project” on its own.

27. Journal “Not just ‘what problem should I solve’ but ‘what error messages are most useful’ and ‘how do I make this really simple to use’.” Examples: rustc-style multi-error diagnostics with “did you mean” suggestions, quick-fix code actions, and a syntaqlite.toml config file so users don’t need CLI flags.

28. Note “I felt addicted at points to the ‘slots’ nature of it, draining time and (over the long term) health.”

29. Journal From the journal: “My specificity with AI is directly proportional to how tired I am. When I have lots of energy, I can be really precise and productive. But when I’m tired, I start saying ‘do X thing’ without much detail and the AI output gets much much much worse.”

30. Note “AI often was slower than if I had implemented something myself but it was too hard to break out of the ‘AI loop’.”

31. Note “Several times I ’lost touch’ with the codebase and there were surprising issues where I would just have to say ‘AI, please debug’, and I hated that feeling.” The fix: “I made it a habit to read code myself regularly to stay in ’touch’ with the system.”

32. Journal From the journal: “It becomes very difficult to communicate your intent of a change with the agent. Because you lose the mental thread of ‘what is going on’, it becomes impossible to communicate meaningfully and clearly and every exchange becomes longer and more verbose requiring the agent to do more work.”

33. Journal From the journal: “Instead of ‘change FooClass to do X’, you have to be like ‘change the thing which does Bar to do X’. And then the agent has to figure out Bar, how that maps to FooClass, sometimes it will get it wrong. Exactly the same complaint we’ve had forever with software engineering managers who don’t understand the code asking for fanciful things.”

34. Note “AI made me procrastinate about actually making key design decisions — because it was easy to refactor, several times I was able to just say ‘I can deal with this later’. But it corroded my ability to think clearly in the meantime because the codebase was confusing.”

35. Journal “Neither humans nor AI are creative enough to foresee the sort of crazy things you might hit in the future. If you don’t have some fundamental foundation, you will be left eternally chasing bugs as they happen.” The vibe-coded prototype had 500+ tests and still fell apart. The rewrite invested in idempotency tests and upstream SQLite validation instead of just adding more unit tests.
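An idempotency test checks a property rather than a specific output: formatting already-formatted text must change nothing. A minimal sketch, assuming a formatter is just a function from string to string; `toy_format` and `non_idempotent_inputs` are hypothetical names and the toy formatter is far simpler than a real one.

```python
KEYWORDS = {"select", "from", "where"}


def toy_format(sql: str) -> str:
    """Toy formatter: collapse whitespace and upper-case a few keywords."""
    tokens = sql.split()
    return " ".join(t.upper() if t.lower() in KEYWORDS else t for t in tokens)


def non_idempotent_inputs(formatter, corpus):
    """Return inputs where formatting twice differs from formatting once."""
    bad = []
    for sql in corpus:
        once = formatter(sql)
        if formatter(once) != once:
            bad.append(sql)
    return bad


corpus = ["select  a from t", "SELECT a\nFROM t where a = 1"]
print(non_idempotent_inputs(toy_format, corpus))
```

Because the check needs no expected output per input, it scales to any corpus of SQL without someone having to decide what the "right" formatting is for each case.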

36. Journal From the journal: “Models don’t have a sense of time. They see the codebase in a certain state and yes they can with context/docs/memory get a sense of it. But they don’t feel time in the same way as humans do. For example, I can tell you what it feels like to use an API and the progression over time and why things are the way they are. Models can only get this with explicit capture and that’s very expensive to do constantly.”

37. Journal From the journal: “Agent team was successful beyond my wildest dreams. Was able to build everything in one evening.” But I had to engineer the scaffolding first: restructuring the project so agents could work on different files, and building a diffing script that grouped errors into actionable feedback. Then the unavoidable manual pass: “need to go through every single one of the tests. Found a bunch of problems (flags not being correctly formatted, missing field names etc).”

38. Journal From the journal: “The more trivial a property it is for a human or AI to verify correctness, the better AI is at dealing with those tasks.” Also: “concrete codegen fast, abstract codegen really slow and inconsistent.”
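
Footnote 35 mentions idempotency tests without showing one. As a rough sketch of the property only (not syntaqlite's actual test harness), formatting already-formatted output should change nothing. The `toy_format` function below is a hypothetical stand-in for a real formatter and is deliberately naive; `check_idempotent` is likewise an illustrative name.

```python
from typing import Callable

# Hypothetical keyword set, for illustration only; a real formatter would
# work from a parse tree rather than whitespace-split tokens.
KEYWORDS = {"select", "from", "where", "and", "or"}


def toy_format(sql: str) -> str:
    """Toy formatter: collapse runs of whitespace and uppercase a few keywords."""
    tokens = sql.split()
    return " ".join(t.upper() if t.lower() in KEYWORDS else t for t in tokens)


def check_idempotent(formatter: Callable[[str], str], sql: str) -> bool:
    """Return True if formatting the formatter's own output is a no-op."""
    once = formatter(sql)
    twice = formatter(once)
    return once == twice


if __name__ == "__main__":
    sample = "select  a\nfrom t where x = 1"
    print(check_idempotent(toy_format, sample))

    # A deliberately broken formatter that adds indentation on every pass;
    # the same check flags it.
    print(check_idempotent(lambda s: "  " + s, sample))
```

The appeal of this kind of check is that it needs no hand-written expected output: any SQL corpus can be fed through it, which fits the footnote's point about properties that are cheap to verify.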

If you enjoyed this post, consider subscribing to my newsletter or following via RSS. You can also share it on Hacker News or Lobsters.

