On Coding with AI

Here are some of the practices and principles that I try to follow when writing code, and even more so when working with an AI assistant.

Lately I've found myself using Claude Code (app) and Codex (within VS Code), mostly for small refactors, performance optimizations, and to understand large codebases or even whole services: to see how each component fits with the others from a high-level, architectural point of view.

First things first: the legwork

For example, if I have to add a new feature to an existing service, first I ask what I'm looking at: from the controller, to services and use cases, all the way to repositories, entities, models and value objects. The works.

Other questions that I ask myself (and the assistant):

what db tables are involved here?
which dependencies are injected?
are we touching multiple modules?
what is affected by my potential modifications?
would i be breaking any existing tests?
should i add any tests? (to test new functionality)
would i be breaking some business rules? some invariants?
would i be following the current architectural and design conventions? (codebase-wise)
would i be introducing more complexity? sometimes code works but it's difficult to read and to reason about
can i explain what the new code does?

With those things in mind, I can begin to write some specs. Markdown is my preferred format for the assistant. The idea here is to provide thorough context and a good set of instructions.

Monitoring

Ok, with the specs ready and good context in place, I can ask my assistant to actually do something. So I ask, and it begins writing some code. Here I like to follow (read) every reasoning step:

did it grasp what i asked it to do? or did it go off the rails?
what is it trying to modify?
is it going straight to the point or is it over-engineering?
is it overthinking? (yes, it can do that too)
is it trying to install some new dependencies?

I do this so I can catch it where it goes wrong and steer the convo to where I want. Also I can save some tokens, and time.

Reviewing

Now let's jump to the point where it has finished implementing what I asked. Time for review:

does it work?
does it run?
does it build?
do tests, typechecks and linter pass?
can i understand what happened? i.e., what was modified? why?
can i understand (and explain) what the new logic does?
can i defend why i chose this approach and solution?
can i test the new functionality?
is it loosely-coupled? can i reuse this code?
would i do anything differently? if yes, i go through the loop all over again till i have what i want

If I can answer all of these questions confidently and successfully, then I go on and build a thorough PR. That's it.
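The mechanical part of that checklist (tests, typechecks, linter) can be scripted so it runs the same way after every round with the assistant. Here is a minimal sketch, assuming a Python project that uses pytest, mypy and ruff. Those tool choices and the names DEFAULT_CHECKS and run_checks are assumptions of mine for illustration, so swap in whatever your repo actually uses.

```python
import subprocess

# Hypothetical tooling for a Python project; replace with your own commands.
DEFAULT_CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("typecheck", ["mypy", "."]),
    ("lint", ["ruff", "check", "."]),
]


def run_checks(checks):
    """Run each (name, argv) check and return the names of the ones that failed."""
    failed = []
    for name, argv in checks:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            ok = result.returncode == 0
        except FileNotFoundError:
            # A tool that is not installed counts as a failed check.
            ok = False
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        if not ok:
            failed.append(name)
    return failed
```

Calling run_checks(DEFAULT_CHECKS) from the repo root gives a quick pass/fail list. The questions about understanding, explaining and defending the change still have to be answered by a human.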

Final Thoughts

Sometimes models fail confidently and give you false answers or wrong solutions. Things can break, so I cross-check and keep asking:

test the solution of MODEL-1 with MODEL-2
ask if it would do anything different, to stress test it, to improve it, to semantically compress it
could we implement this pattern here?
could we move these repetitive functions into a /helpers folder?
how would this scale with X users?
could a new hire understand this without much context? at least the logic

Here is where I think the technical fundamentals matter a lot, maybe even more with AI. If you know them, you can catch a lot of bad practices and reach for well-known, working solutions. Code can work and still be unreadable. It can work and still be very bad for performance. It can work and still not scale.

A codebase can also be syntactically clean but hard to read. The speed you move at with AI is undeniable, but it comes with a maintainability cost that can catch you off guard. You could walk into a refactor thinking it will take 3 or 4 days and end up spending 2 weeks on it because everything was tightly coupled: modules directly consuming the functions of other modules, helper logic repeated in multiple services, slightly different implementations of the same functions scattered all over, and so on. In architectural terms this is called implicit coupling, and it's a consequence of solving local optimization problems without a global perspective of the system. This usually happens when you go "prompt by prompt" without doing any review or actual thinking: what we know as vibe coding.
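To make the "slightly different implementations of the same function" problem concrete, here is a small sketch. All names in it (billing_normalize_email, signup_normalize_email, normalize_email, SignupService) are hypothetical and exist only for illustration. It shows two helpers that drifted apart, and one way to make the dependency explicit instead.

```python
from typing import Callable


# Drift: two services each grew their own "same" helper.
def billing_normalize_email(email: str) -> str:
    return email.strip().lower()


def signup_normalize_email(email: str) -> str:
    return email.lower()  # no strip(): subtly different behavior


# One shared helper that acts as the single source of truth.
def normalize_email(email: str) -> str:
    """Trim surrounding whitespace, then lowercase."""
    return email.strip().lower()


class SignupService:
    """Receives its dependency explicitly instead of reaching into another module."""

    def __init__(self, normalize: Callable[[str], str] = normalize_email):
        self._normalize = normalize
        self._emails = set()

    def register(self, email: str) -> bool:
        """Return True if the email is new, False if it was already registered."""
        key = self._normalize(email)
        if key in self._emails:
            return False
        self._emails.add(key)
        return True
```

With the two drifted helpers, an input with stray whitespace normalizes to different strings depending on which service handles it. That is exactly the kind of thing that only shows up once you try to refactor.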

Another thing that can wreck you is delegating a huge task or feature that touches a lot of existing, working code. Even greenfield development can silently set you up for failure, and I personally think you should be even more careful there, since you are laying the foundations of your codebase.

On the testing end, the problem is that AI coding tools tend to write tests that cover the code that exists rather than the behavior that should exist, i.e., business and domain rules. So be careful with test coverage theater, where the only thing being tested is the code that the AI wrote before writing the tests!
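Here is a rough sketch of the difference, using a made-up business rule: orders of 100.00 or more get 10% off. The rule, the constants and the function names are all assumptions for illustration. The first test just restates the implementation. The second one pins down the rule as the business would describe it, including the boundary.

```python
from decimal import Decimal

# Hypothetical domain rule (an assumption for this sketch):
# orders of 100.00 or more get 10% off; smaller orders get no discount.
DISCOUNT_THRESHOLD = Decimal("100.00")
DISCOUNT_RATE = Decimal("0.10")
CENTS = Decimal("0.01")


def order_total(subtotal: Decimal) -> Decimal:
    """Return the amount to charge after applying the discount rule."""
    if subtotal < 0:
        raise ValueError("subtotal cannot be negative")
    if subtotal >= DISCOUNT_THRESHOLD:
        return (subtotal * (1 - DISCOUNT_RATE)).quantize(CENTS)
    return subtotal.quantize(CENTS)


def test_mirrors_the_code():
    # Coverage theater: the expected value is recomputed with the same formula,
    # so this still passes if the threshold or the rate is wrong.
    subtotal = Decimal("150.00")
    expected = (subtotal * (1 - DISCOUNT_RATE)).quantize(CENTS)
    assert order_total(subtotal) == expected


def test_business_rule():
    # Behavior test: expected values come from the rule as stated, not the code.
    assert order_total(Decimal("99.99")) == Decimal("99.99")   # just below the threshold
    assert order_total(Decimal("100.00")) == Decimal("90.00")  # boundary gets the discount
    assert order_total(Decimal("150.00")) == Decimal("135.00")
```

Both tests give you the same line coverage, but only the second one fails if someone changes the threshold or the rate by accident.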

And on the matter of error handling, things can go south very easily. This is simply because AI can, and probably will, treat all errors the same way: an error is an error. Period. The thing is that errors should be handled according to the context they occur in. Each one should give the end user a clean, concrete message with actionable feedback, and give the engineer reading the logs a thorough trace to debug with.
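One way this could look, as a minimal sketch: each error type carries a safe message for the end user plus context for the logs, and the handler decides per type how loudly to log. The class names, status codes, messages and the place_order example are hypothetical choices of mine, not a prescribed design.

```python
import logging

logger = logging.getLogger("orders")


class AppError(Exception):
    """Base error: a safe user-facing message plus context for the logs."""

    user_message = "Something went wrong. Please try again."

    def __init__(self, detail: str, **context):
        super().__init__(detail)
        self.context = context


class ValidationError(AppError):
    user_message = "Some of the data you sent is invalid."


class UpstreamTimeout(AppError):
    user_message = "A service we depend on is slow right now. Please retry shortly."


def handle(action):
    """Run action and map failures to a (status, user-facing message) pair."""
    try:
        return 200, action()
    except ValidationError as err:
        # The caller's mistake: context is enough, no stack trace needed.
        logger.warning("validation failed: %s context=%s", err, err.context)
        return 400, err.user_message
    except UpstreamTimeout as err:
        # Our dependency's problem: keep the trace so it can be followed later.
        logger.error("upstream timeout: %s context=%s", err, err.context, exc_info=True)
        return 503, err.user_message
    except Exception:
        # Unknown failure: full trace for the engineer, generic text for the user.
        logger.exception("unexpected error")
        return 500, AppError.user_message


def place_order():
    # Hypothetical action that fails validation.
    raise ValidationError("quantity must be > 0", order_id="A-1", quantity=0)
```

Calling handle(place_order) returns a 400 with the validation message for the user, while the log line keeps the order id and quantity for whoever has to debug it.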

Finally, if you are running out of tokens or just want to save some, have the stronger model reason about the task and come up with the actual implementation plan. Then ask it to write the plan to a markdown file and hand that file to your lower-end model. You can do this with Ollama running locally on your computer, or you can use their cloud models (free ones like Gemma). You can also have Hermes do the work, and even add some dev skills at /.hermes/skills/. You can add these skills to a specific repository too, in the /.agents/skills/ folder.

As for agent swarms, the whole orchestrator agent and sub-agents thing, I haven't really implemented them yet. Still, I'd be somewhat skeptical about putting something that powerful into a codebase, since it can go bad very fast. I reckon some very good harnesses would be needed, along with strong policies and strict governance over the workflows, plus some evals to check how everything's going and iterate on that. For this you can check FDEKit.