Thoughts on LLMs in software development as of end of 2025

Ubaada | 23-12-2025

These are my thoughts on LLMs in software engineering after using them in a professional setting for about a year:

  • Chatbots are amazing at codebase exploration.
  • Chatbots are good at regression thought experiments, especially Codex.
  • Claude produces noticeably better code than the others.
  • There is potential in using chatbots to scan repositories and services for bugs at scale.
  • Since code generation is now cheap thanks to LLMs, going out of your way for thoughtful tests, readability, and PR documentation is the least that can be done.
  • Code cannot be merged at the rate it is produced because you have to own what was generated. The main gain is elevation from generation to checking, which is faster but not a substitute for skills.
  • Because you have to own the work, you have to be competent in that area. If LLMs are relied on too much, they can hinder your ability to develop enough competence to supervise the work.
  • On the flip side, LLMs allow greater exposure to the problem set much faster: fail fast → solve → get better (rapid iteration). In other words, they complement your agency. It remains an open question which of these two wins out for developing competence.
  • Rapid comprehension appears to be the most standout capability of LLMs over humans. So the longer and richer the context we can feed them, the more we can get out of LLMs.
  • Local models aren't much help, not even for easier tasks. The models you can run locally on 16-24 GB of VRAM are underwhelming and slow. Agentic flows, especially, build up big KV caches that are too much to handle locally. Economies of scale win here, extracting far more value out of a given capex spent on hardware. Models like Gemini Flash are fast, good, and cheap.
  • The best open-source models can now basically match the GPTs and Claudes of the world, and at a fraction of the cost. Since they are too big for most people to run locally, the only viable option is one of the various third-party hosts, but those are often not trusted enough to be used with internal company codebases. This means we are mostly left with OpenAI, Anthropic, or Google's models.
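The KV-cache point above can be made concrete with a back-of-envelope estimate. This sketch assumes hypothetical dimensions for a mid-size open model with grouped-query attention (32 layers, 8 KV heads, 128-dim heads, fp16); real models vary, but the scaling is the point:

```python
# Rough KV-cache size for a hypothetical local model.
# All dimensions below are illustrative assumptions, not a specific model.
LAYERS = 32      # transformer layers
KV_HEADS = 8     # key/value heads (grouped-query attention)
HEAD_DIM = 128   # dimension per head
BYTES = 2        # fp16/bf16 bytes per element

def kv_cache_gib(context_tokens: int) -> float:
    """GiB of KV cache: 2 tensors (K and V) per layer, per token."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # 128 KiB here
    return context_tokens * per_token / 2**30

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):5.1f} GiB")
# Under these assumptions, a 131k-token agentic session needs ~16 GiB
# for the KV cache alone, before counting the model weights.
```

Even with generous quantization, a long agentic context plus weights blows past a 16-24 GB consumer card, which is why hosted inference (with batching and paged caches amortized across users) wins on cost.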