AI is driving a fundamental shift in how software is built and used. Natural-language prompts can now effectively program computers via large language models (LLMs), a paradigm Andrej Karpathy calls “Software 3.0”. This new approach is poised to eat the existing codebase, requiring rewrites of vast amounts of software.
LLMs have become a ubiquitous utility-like resource. Tech firms invest tens of millions of dollars in training frontier models and offer them on cloud platforms for metered use, similar to electricity. With ChatGPT’s viral adoption – reaching 100 million users just two months after launch – AI capabilities have rapidly spread to millions of consumers, far ahead of corporate uptake.
Developers and tech firms are adapting to an AI-centric future. New tools pair human programmers with AI “co-pilots” for partial automation, while companies are redesigning products and documentation to collaborate with AI agents. This careful integration, rather than fully autonomous “AI agents” running wild, is the practical path forward in the coming decade.
Andrej Karpathy: Software Is Changing (Again):
The Evolution of Software: From 1.0 to 3.0
For decades, writing software meant writing explicit code in programming languages – what Karpathy calls Software 1.0. The 2010s saw the rise of Software 2.0, where many tasks were solved by training neural networks (whose learned weights serve as the code). Now, we are entering what Karpathy dubs Software 3.0, a new era in which programs are expressed in natural language and executed by AI models. In practical terms, the “code” is an English prompt that guides an LLM to perform a task.
Three Generations of Software Development:
Software 1.0: Human engineers write traditional code (C++, Python, etc.) with explicit logic and algorithms.
Software 2.0: Developers curate data and train neural networks; the program logic is encoded in model weights learned from examples (a concept Karpathy introduced in 2017).
Software 3.0: Developers (and even end-users) provide instructions in English (or other natural languages) to LLMs. The prompt becomes the program, and the AI’s pretrained knowledge does the rest.
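To make the contrast concrete, here is a minimal sketch (not from the talk) of the same task handled in the Software 1.0 and 3.0 styles. The llm_complete function is a hypothetical stand-in for any LLM API client.

```python
# Software 1.0: explicit, hand-written logic for sentiment classification.
POSITIVE_WORDS = {"great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "hate", "terrible"}

def classify_v1(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 2.0 (not shown): train a neural network on labeled examples;
# the learned weights, not hand-written rules, encode the logic.

# Software 3.0: the "program" is an English prompt handed to a pretrained LLM.
# llm_complete is a hypothetical placeholder for any LLM client.
def classify_v3(text: str, llm_complete) -> str:
    prompt = (
        "Classify the sentiment of the following review as positive, "
        f"negative, or neutral. Reply with one word.\n\nReview: {text}"
    )
    return llm_complete(prompt).strip().lower()
```

The point of the sketch is the shift in where the logic lives: in 1.0 it is spelled out by the engineer; in 3.0 it is described in English and carried out by the model's pretrained knowledge.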
Karpathy argues this is a once-in-a-generation shift. By his account, fundamental software paradigms remained largely unchanged for about 70 years – until the advent of modern AI. “Software 3.0 is eating 1.0/2.0,” he says, meaning the new AI-centric approach will steadily encroach on, and replace, many traditional codebases. A huge amount of existing software may need to be rewritten or refactored to incorporate AI capabilities.
Concrete signs of this transition emerged in fields like self-driving cars. Karpathy recalls that in Tesla’s Autopilot, many hand-coded modules (Software 1.0) were gradually supplanted by neural networks (Software 2.0) as the AI proved more capable. Similarly, the recent LLM boom suggests certain tasks that used to be coded manually might now be handled by prompting a model or fine-tuning one (Software 3.0).
This new paradigm also changes who can create software. Natural language is a higher-level interface than code, enabling a broader population to “program.” Karpathy notes that “the hottest new programming language is English.” Non-programmers can now describe what they want in plain language and get working software. For example, users have built simple apps by chatting with AI assistants – a trend dubbed “vibe coding.” People (even children in educational demos) have created games and apps just by telling the AI their idea, something unimaginable in the 1.0 era. “People can now vibe code apps... at an adequate capability without involving an additional expert,” Karpathy observes. In short, software creation is being democratized.
“AI is the new electricity” - Andrew Ng
LLMs: From Utility to New Operating System
Large language models have rapidly become a new kind of ubiquitous computing utility. AI labs like OpenAI, Google, and Anthropic pour enormous capital (CapEx) into training advanced models – akin to building power plants – and then make their AI “intelligence” available as a service via API, charging per usage (OpEx), much like an electric grid. Just as early electric utilities built infrastructure and sold power by the kilowatt-hour, AI providers invest in huge clusters of specialized chips to train models and rent out AI by the token or query. Training a state-of-the-art LLM can cost on the order of tens of millions of dollars (OpenAI’s GPT-4 is estimated at around $40 million), and industry leaders predict that top-end training runs could approach $1 billion within a couple of years. This heavy investment has created an AI race, but once models are trained, they become shared utilities – accessible over the cloud and scalable to millions of users.
Yet LLMs are more than just a faceless commodity service. Karpathy suggests they should also be seen as a new kind of computer or operating system. An LLM can be thought of as a general-purpose CPU that runs “programs” expressed in natural language. It has a working memory (the context window that holds the conversation or prompt) and can use tools or retrieve information as instructed – analogous to how an OS manages memory and peripherals. In this view, the major AI model providers resemble platform owners. The ecosystem is taking shape much like the personal computing market did: a few proprietary platforms (e.g. OpenAI’s GPT-4, Google’s PaLM/Gemini) alongside an open-source alternative (the Llama 2 family from Meta, often compared to Linux in this analogy). Meta’s decision in 2023 to release Llama 2 freely for research and commercial use gave the community a powerful model to build upon, providing a “free-of-charge alternative to pricey proprietary models” from OpenAI and Google. This open model approach, Meta argues, “drives innovation because it enables many more developers to build with new technology”, much as open-source operating systems did.
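A rough sketch of this operating-system analogy follows, with a hypothetical llm_complete function and tool registry standing in for a real model and its plugins; the CALL convention is invented purely for illustration.

```python
# The context window plays the role of working memory (RAM); tools act like
# peripherals the "CPU" (the LLM) can drive. Everything here is a placeholder.
def run_llm_program(task: str, llm_complete, tools: dict, max_steps: int = 5) -> str:
    context = [f"Task: {task}"]                    # "RAM": everything the model can see
    for _ in range(max_steps):
        reply = llm_complete("\n".join(context))   # one "CPU cycle"
        if reply.startswith("CALL "):              # model asks for a peripheral
            tool_name, _, arg = reply[5:].partition(" ")
            result = tools[tool_name](arg)         # e.g. web search, calculator
            context.append(f"{reply}\nRESULT: {result}")  # load result into memory
        else:
            return reply                           # final answer
    return "step limit reached"
```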
At the same time, today’s LLM “operating systems” are in a phase reminiscent of the 1960s mainframe era of computing. The models are so large and resource-intensive that most run on cloud servers in centralized data centers, not on local devices. Users essentially operate as thin clients (e.g. using chat interfaces) connected to a powerful brain in the cloud. It’s akin to time-sharing on a mainframe: many users share slices of a far-away supercomputer because owning one personally is impractical. Karpathy quips that we haven’t yet seen the “personal AI” revolution – the equivalent of the 1970s shift to personal computers – because the economics and technology aren’t there yet. However, early signs of personal AI computing are emerging. Researchers have managed to run smaller LLMs on high-end PCs and even smartphones, and Apple’s latest chips have neural engines hinting at future on-device AI. (Karpathy pointed to experiments like running LLMs on clusters of Mac minis as hints of this trend.) In the coming years, as efficiency improves, we may get closer to personal-scale AI that doesn’t require an internet connection – a modern “PC” equivalent to today’s cloud “mainframes.”
One unprecedented aspect of this AI wave is how bottom-up the adoption has been. Historically, transformative tech (from computing to the internet) started with government or corporate use before trickling down to consumers. AI has flipped this script. LLMs gained mainstream popularity first among everyday people. OpenAI’s ChatGPT, released to the public for free in late 2022, became the fastest-growing consumer application in history, hitting an estimated 100 million monthly users in just two months. By 2023–2024, hundreds of millions of individuals were using AI for writing, coding, translating, tutoring, and more – often for personal productivity or learning. “This isn’t a minor upgrade... it is a major multiplier to an individual’s power,” Karpathy writes of the phenomenon. In contrast, many large organizations have moved more slowly. Surveys show that while 97% of software engineers report using AI coding tools in some form, only about 40% of their employers officially encourage AI use in development. Companies face hurdles like data privacy, integration with legacy systems, and cultural resistance that keep them a step behind in adoption. Thus, at least in this initial phase, AI is leveling the playing field for individuals, giving solo developers and small startups capabilities that were once the domain of big tech. As Karpathy puts it, “the future is already here, and it is shockingly distributed” across the public.
What LLMs Can Do – and Their Limitations
LLMs possess impressive capabilities but also distinct weaknesses that set them apart from traditional software. These models have been trained on vast swaths of human knowledge – internet text, books, code – making them extraordinarily well-informed generalists. An LLM like GPT-4 or Claude can recall facts, translate languages, write code, and answer questions on almost any topic. This broad competence has drawn comparisons to savants. Indeed, Karpathy likens an LLM’s memory to “Rain Man”-like recall: they can regurgitate data (even seemingly random details or exact text) with uncanny accuracy. However, they lack the robust understanding and consistency of human reasoning, leading to what Karpathy describes as “jagged intelligence.”
In Karpathy’s words, state-of-the-art LLMs can “perform extremely impressive tasks… while simultaneously [struggling] with some very dumb problems.” For example, a model might solve a complex coding challenge or advanced math problem, yet incorrectly insist that 9.11 is greater than 9.9 – a mistake no ordinary person would make. These gaps in reliability stem from the way LLMs work (predicting likely text sequences rather than deploying a grounded world model) and the data they were trained on. They do not truly “know” when they are wrong, and they can confidently generate incorrect or fictitious information – the well-known phenomenon of AI hallucinations.
LLMs also have effectively no long-term memory or ability to learn incrementally once training is finished. Karpathy analogizes them to a colleague with anterograde amnesia: “they don’t consolidate or build long-running knowledge or expertise once training is over; all they have is short-term memory (the context window).” They start each new session largely from scratch, with no recollection of past interactions unless that context is explicitly provided. This means an AI assistant won’t remember instructions or corrections you gave it yesterday, unless those are included again today – a stark contrast to a human who accumulates experience. Efforts are underway to give models persistent memory (for instance, storing conversation histories or using database tools), but current LLMs still operate under a strict memory limit.
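A minimal sketch of the usual workaround, assuming a hypothetical llm_complete chat API: the application, not the model, persists the conversation and replays it into the context window each session.

```python
import json
from pathlib import Path

# The model retains nothing between sessions, so the app saves the history
# to disk and feeds it back into the prompt next time. llm_complete is a
# hypothetical stand-in for any chat-completion client.
MEMORY_FILE = Path("assistant_memory.json")

def chat(user_message: str, llm_complete) -> str:
    history = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    history.append({"role": "user", "content": user_message})

    # The model only "remembers" what is explicitly placed back in the prompt.
    prompt = "\n".join(f'{m["role"]}: {m["content"]}' for m in history)
    reply = llm_complete(prompt)

    history.append({"role": "assistant", "content": reply})
    MEMORY_FILE.write_text(json.dumps(history))
    return reply
```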
In summary, today’s LLMs come with several key limitations:
Hallucinations and Errors: They may produce plausible-sounding but incorrect information, from small factual errors to entirely made-up answers. Robust verification is required.
Inconsistent “Jagged” Reasoning: They can show superhuman skill on some tasks and fail at basic ones. This unpredictability means they aren’t fully reliable without oversight.
Lack of Ongoing Learning: They can’t update their knowledge base on the fly. An LLM won’t naturally improve or adapt from new information in real time (aside from what fits in its context window). Each session is stateless beyond that context.
Vulnerability to Prompt Attacks: Because they follow instructions blindly, maliciously crafted inputs can trick LLMs into ignoring safety rules or leaking data. This “prompt injection” risk requires careful guardrails on how AI systems are deployed.
Despite these shortcomings, LLMs remain extremely useful in practice when used with care. They excel at rapid first drafts, summarization, boilerplate generation, and answering FAQs – tasks where perfect accuracy is not mission-critical or can be double-checked. The key, experts emphasize, is to leverage their strengths while putting mechanisms in place to catch their mistakes.
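As an illustration of that "use with care" pattern, here is a rough sketch of wrapping an LLM call with basic guardrails and a human review step; the llm_complete client is hypothetical and the validators are deliberately simplistic, not a real defense against prompt injection.

```python
import re

def looks_like_injection(user_input: str) -> bool:
    # Crude screen for obvious prompt-injection attempts; real systems need far more.
    return bool(re.search(r"ignore (all|previous) instructions", user_input, re.I))

def draft_with_checks(user_input: str, llm_complete) -> str:
    if looks_like_injection(user_input):
        raise ValueError("Input rejected by guardrail")

    draft = llm_complete(f"Draft a reply to: {user_input}. Cite sources as URLs.")

    if "http" not in draft:                 # require verifiable citations
        draft = llm_complete("Rewrite the reply and include source URLs:\n" + draft)

    print(draft)                            # a human reviews before anything ships
    return draft
```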
AI-Augmented Software
Rather than aiming for fully self-driving “AI agents” in one leap, the industry is gravitating toward partial autonomy applications – traditional software enhanced with AI co-pilots that work under human supervision. In these AI-augmented tools, an LLM performs labor-intensive sub-tasks, but a human oversees and guides the process, intervening as needed. Karpathy suggests thinking of this like an “Iron Man suit”: the user is empowered by the AI’s strengths (augmented with superhuman knowledge and speed), while still in control of critical decisions. The suit (AI) even has some autonomous capabilities – at times acting on its own – but it is ultimately designed to assist a human operator, not replace one (at least not yet).
A hallmark of such systems is an “autonomy slider” that lets users choose how much agency to grant the AI. For example, consider coding with an AI assistant in a software IDE (integrated development environment). At the low-autonomy setting, the AI might simply autosuggest the next line of code or help debug an error – similar to an advanced autocomplete. At a higher setting, the user might select a function or file and ask the AI to refactor it, reviewing the diffs before accepting changes. At the extreme end, the user could let the AI generate an entire module or even run in an agent mode that attempts to build and test a feature with minimal guidance. Tools like Cursor (an AI-augmented IDE) implement exactly this spectrum: tab completion → selection-based edits → whole-file edits → “do everything” agent mode. Another example is the search engine Perplexity.ai, where a user can simply get a quick answer (low autonomy), ask for a multi-source analysis (medium), or request a “deep research” report that the AI compiles over several minutes (higher autonomy). Crucially, in all cases the user can inspect and override the AI’s output.
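A rough sketch of what an autonomy slider could look like in code, with a hypothetical assistant object (and its methods) standing in for the underlying AI backend:

```python
from enum import Enum

class Autonomy(Enum):
    COMPLETE_LINE = 1     # suggest the next line (advanced autocomplete)
    EDIT_SELECTION = 2    # rewrite a selected function, show a diff for review
    EDIT_FILE = 3         # propose changes across a whole file
    AGENT = 4             # plan, edit, and run tests with minimal guidance

def assist(request: str, level: Autonomy, assistant) -> str:
    # 'assistant' and its methods are hypothetical placeholders for an AI backend.
    if level is Autonomy.COMPLETE_LINE:
        return assistant.suggest_next_line(request)
    if level is Autonomy.EDIT_SELECTION:
        return assistant.propose_diff(request, scope="selection")
    if level is Autonomy.EDIT_FILE:
        return assistant.propose_diff(request, scope="file")
    return assistant.run_agent(request)   # highest autonomy, still human-reviewed
```

The design choice is that the same assistant is exposed at every level; the human decides how much to delegate for each task.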
Designing effective human-AI pairing involves optimizing the feedback loop between generation and verification. Karpathy emphasizes two principles: make verification fast and easy, and keep the AI on a tight leash. The interface should help the human quickly audit the AI’s suggestions. This means presenting information in structured, visual ways rather than a raw text dump. For instance, an AI coding assistant should show code changes in a colored diff view so the developer can see additions and deletions at a glance, accepting or rejecting them with a click. A research assistant should provide source citations (with links) for each factual claim, so the user can verify them. These design choices tap into human strengths (visual pattern recognition, judgment) to counter AI’s flaws. The other side of the coin is constraining the AI’s autonomy appropriately. Free-roaming agents that produce thousands of lines of code or execute long action sequences on their own tend to drift off course or make harmful errors, leaving the user with a massive, inscrutable output to sift through. It’s often more productive to have the AI tackle tasks in small, controlled increments – then pause, let the human verify, and proceed. By iterating in a tight loop, the combined human-AI team moves faster and stays in control of quality.
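The same idea expressed as a minimal sketch, assuming hypothetical propose_small_diff and apply_diff helpers: the AI proposes one small change, the human verifies it, and only then does work continue.

```python
def supervised_loop(goal: str, propose_small_diff, apply_diff, max_iters: int = 20):
    # Tight generate-verify loop: small, reviewable increments instead of one
    # huge autonomous change. Both helper functions are hypothetical.
    for _ in range(max_iters):
        diff = propose_small_diff(goal)          # small, reviewable increment
        if diff is None:
            print("Goal reached.")
            return
        print(diff)                              # shown as a colored diff in a real UI
        if input("Apply this change? [y/N] ").lower() == "y":
            apply_diff(diff)                     # accepted: incorporate and continue
        else:
            goal += "\nRejected last proposal; try a different approach."
    print("Iteration limit reached; stopping for human review.")
```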
The cautionary tale here comes from self-driving cars: even a spectacular demo of full autonomy does not mean a system is production-ready for all scenarios. Karpathy recounts riding in a Waymo autonomous car in 2013 that handled a complex 30-minute drive with zero interventions – an experience that made him think true self-driving was imminent. In hindsight, it took another decade of hard problems to get from that demo to commercial robo-taxi services, and even today those operate in limited areas with caveats. “Demo is works.any(), product is works.all(),” he notes, highlighting the vast gap between a controlled demo and a reliable product. By analogy, today’s AI agents can wow in scripted demonstrations, but in the real world of messy, open-ended tasks, keeping a human in the loop is often necessary. Karpathy expects the progression toward greater autonomy to be gradual – “this is the decade of agents,” not the year of agents, he says – requiring careful engineering and safety checks along the way.
Natural Language Coding and “Vibe” Development
One of the most exciting developments of the Software 3.0 era is how it lowers the barrier to entry for programming. If an AI can translate plain English into code, then anyone who can articulate what they want can potentially build software. This has given rise to the phenomenon of “vibe coding” – informally describing a program to an AI assistant and iteratively refining it. Karpathy’s tongue-in-cheek coinage of the term went viral, reflecting a real movement of newcomers tinkering with code through AI.
For example, hobbyists have used OpenAI’s ChatGPT or similar tools to create simple video games without writing code themselves – they just explain the game mechanics and the AI generates the code. Karpathy himself experimented with this: despite not being fluent in Swift (the language for iOS apps), he managed to build a basic iPhone app in a day by continually prompting an AI for code and debugging help. Children have been seen building Minecraft mods or websites via AI guidance, essentially learning programming concepts through natural language interaction. This new mode of software creation is more accessible and creative, though often less precise than hand-coding. It shines for prototyping. As Karpathy notes, he was able to get a toy project working quickly with AI assistance – for instance, he “vibe coded” a web app called MenuGen that generates dish images from a photo of a restaurant menu, something he built to solve a personal annoyance. The core functionality was up and running within hours thanks to AI help.
However, vibe coding can hit a wall when a project needs to be productionized. Karpathy discovered that while an AI can write the application logic, turning it into a real, deployable service involved many tedious steps that AI could not easily handle. Setting up cloud hosting, configuring authentication and databases, dealing with API keys – today these DevOps tasks often require clicking through web consoles, reading documentation, and making decisions that are not spelled out in code. “The reality of building web apps in 2025 is a disjoint mess of services… not accessible to AI,” Karpathy wrote after struggling to deploy his prototype. In one case, integrating a third-party login SDK required following a multi-step web tutorial – something a human can do, but an AI agent would not navigate properly because the instructions were written for a person. This highlights a critical point: much of our software infrastructure is designed for human developers, not AI agents.
To truly unlock end-to-end AI-driven development, the ecosystem will need to adapt tools and services for AI use. Some forward-looking platforms have started to do this. For instance, the web deployment service Vercel was praised by Karpathy for providing documentation in a clean, structured format (including command-line instructions) that an AI could consume, whereas another service’s docs were full of human-oriented prose and screenshots that confused the AI. Similarly, Stripe – known for developer-friendly docs – has experimented with making an LLM-readable reference for its API, so an AI agent could figure out how to perform payments or webhooks without human help. These are early examples of a broader trend: “building for agents.”
"2025-2035 is the decade of agents.” - Andrej Karpathy
Building for AI Agents: A New Digital Consumer
As Karpathy puts it, there is now a “new category of consumer and manipulator of digital information” to consider: not just human users or traditional software, but AI agents. Websites and APIs have long been designed for two audiences – humans (using graphical interfaces) and computers (using machine-readable endpoints). Now we have a hybrid audience: AI programs that operate with human-like reasoning but need structured data like a computer. This calls for a rethinking of interfaces and protocols.
One proposal is for websites to include a simple llm.txt file (analogous to robots.txt for web crawlers) that describes the site’s content or provides instructions in plain text. An LLM could read this to understand how to interact with the site’s pages or to retrieve specific information, rather than scraping through HTML designed for visual rendering. Likewise, documentation and help files could be offered in lightweight markdown or JSON formats specifically formatted for LLM consumption. Companies like Vercel and Stripe, as noted, have begun offering AI-oriented docs alongside human docs. Another idea is exposing more direct APIs for actions that currently require complex multi-step GUI interactions – essentially giving the agent a clear mechanism to do what a human user would do by clicking.
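A minimal sketch of how an agent might consume such a file, assuming the site actually serves one at /llm.txt (the format is still a proposal) and using a hypothetical llm_complete client:

```python
import urllib.request

def load_site_instructions(base_url: str) -> str:
    # Fetch the site's plain-text, agent-readable description.
    with urllib.request.urlopen(f"{base_url}/llm.txt", timeout=10) as resp:
        return resp.read().decode("utf-8")

def answer_about_site(question: str, base_url: str, llm_complete) -> str:
    instructions = load_site_instructions(base_url)   # site-provided, agent-readable
    prompt = (
        f"Site instructions:\n{instructions}\n\n"
        f"Using only the information above, answer: {question}"
    )
    return llm_complete(prompt)
```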
There is also a growing ecosystem of tools to bridge the gap from human-oriented content to AI-friendly input. Karpathy highlights context builders such as Gitingest and Cognition’s DeepWiki. These tools can, for example, take a large code repository or knowledge base and compress it into a summarized form that fits in an LLM’s context window, complete with structured outlines. Instead of expecting an AI to read an entire codebase file by file (which is slow and expensive), the AI can query a pre-digested overview. We can imagine analogous tools for documents, databases, or any large information source – effectively preparing the context so the AI agent can work efficiently.
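A rough sketch of the underlying idea (not any particular tool’s implementation): walk a repository, keep a structured outline of each file, and stop at a rough character budget for the context window.

```python
from pathlib import Path

def build_repo_context(repo_root: str, budget_chars: int = 20_000) -> str:
    # Produce a pre-digested overview: file names plus their leading lines,
    # truncated to stay within a crude stand-in for the context budget.
    chunks, used = [], 0
    for path in sorted(Path(repo_root).rglob("*.py")):
        head = "\n".join(path.read_text(errors="ignore").splitlines()[:15])
        chunk = f"### {path}\n{head}\n"
        if used + len(chunk) > budget_chars:
            break                      # keep the digest inside the budget
        chunks.append(chunk)
        used += len(chunk)
    return "\n".join(chunks)           # overview the LLM can query instead of raw files
```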
By meeting AI agents halfway, developers can dramatically improve their effectiveness. If an AI agent doesn’t have to waste effort figuring out interfaces meant for humans, it can focus on the real task. This means a future where, for instance, an AI customer service agent can read a company’s policies from an llm.txt file and use an official API to resolve a customer request, rather than trying to “read” a website like a person. It means an AI DevOps bot can deploy infrastructure by calling documented APIs instead of clicking a web dashboard. In essence, making our digital world AI-accessible is the next step after making it user-friendly and API-friendly.
Conclusion
The emergence of Software 3.0 marks an inflection point in the tech industry. Artificial intelligence is now deeply entwined with software development, from coding assistants and intelligent search to entire applications built around LLM capabilities. Andrej Karpathy and other leaders argue that this shift is both exciting and daunting in its scope. A vast amount of existing software will need to evolve – either by being augmented with AI or completely reimagined under this new paradigm. Developers entering the field today are well-advised to learn not only classical programming, but also how to train models (Software 2.0) and how to prompt or integrate LLMs (Software 3.0), using each approach to its best advantage.
Despite rapid progress, we are in the early stages of this transformation. Today’s LLMs, for all their prowess, work best as copilots rather than fully autonomous agents. The coming years will be about pushing the “autonomy slider” to the right gradually – expanding what AI can do on its own, while maintaining effective human oversight. The industry’s collective challenge is to build the “Iron Man suits” that make humans vastly more productive, rather than chasing fully independent robots that aren’t yet trustworthy. As Karpathy’s recent talk emphasized, success will require focusing less on flashy AGI demos and more on practical partial autonomy, better human-AI interfaces, and tooling built for agents from the ground up.
It’s an extraordinary time in software, with AI catalyzing a paradigm change that usually only comes once in a lifetime. In Karpathy’s words, “software is changing… again.” Those who adapt will help shape the next generation of tech – one where human creativity and machine intelligence work hand-in-hand to build things previously out of reach. The era of Software 3.0 has only just begun, and its story is ours to write.
Disclaimer
The content of Catalaize is provided for informational and educational purposes only and should not be considered investment advice. While we occasionally discuss companies operating in the AI sector, nothing in this newsletter constitutes a recommendation to buy, sell, or hold any security. All investment decisions are your sole responsibility—always carry out your own research or consult a licensed professional.