These are my thoughts on LLMs in software engineering after using them in
a professional setting for about a year:
- Chatbots are amazing at codebase exploration.
- Chatbots are good at regression thought experiments, especially Codex.
- Claude is way better than the others at code quality.
- A possible use: having chatbots scan repositories and services for bugs at
  scale.
- Since LLMs have made code generation cheap, going out of the way for
  thoughtful tests, readability, and PR documentation is the least we can
  do.
- Code cannot be merged at the rate it is produced because you have to own
  what was generated. The main gain is the shift from generating code to
  checking it, which is faster but not a substitute for skill.
- Because you have to own the work, you have to be competent in that area. If
  LLMs are relied on too much, they can hinder your ability to develop enough
  competence to supervise the work.
- On the flip side, LLMs expose you to a wider set of problems much faster:
  fail fast → solve → get better (rapid iteration). In other words, they
  complement your agency. It remains an open question which of these two
  effects wins out for developing competence.
- Rapid comprehension appears to be the most standout capability of LLMs over
  humans. So the longer and richer the context, the more we can get out of
  LLMs.
- Local models aren't much help, not even for easier tasks. The models you can
  run locally on 16-24 GB of VRAM are underwhelming and slow. Agentic flows,
  especially, can build up big KV caches that are too much to handle locally.
  Economies of scale win here: hosted providers get the most value out of a
  given hardware capex. Models like Gemini Flash are fast, good, and cheap.
- The best open-source models can basically match the GPTs and Claudes of the
  world now, and at a fraction of the cost. Since they are too big for most
  people to run locally, the only viable option is the various third-party
  hosted versions, but those are often not trusted enough to be used with
  internal company codebases. This means we are mostly left with OpenAI,
  Anthropic, or Google’s models.
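
To put a rough number on the KV cache point above, here is a back-of-the-envelope sketch. It uses the standard per-token estimate (one key and one value vector per layer per KV head). The model dimensions are illustrative assumptions, roughly in the range of an 8B-class model with grouped-query attention; they are not measurements of any specific model, and the function name `kv_cache_bytes` is my own.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both a key and a value
    tensor in every layer. bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical model dimensions (assumed for illustration only).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for tokens in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions, a single 131,072-token context needs about 16 GiB for the cache alone, before counting the model weights. That is already most or all of a 16-24 GB card, which is why long agentic sessions are hard to sustain locally.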