What is an AI-first software company?
Deterministic-native software organizations have been struggling to generate substantial enterprise value using GenAI. LLMs represent a shift from deterministic to stochastic software, and this shift has substantially taxed the ability of deterministic-native organizations to handle the increased burden of nebulosity injected by stochastic software. Unpacking this nebulosity provides an opportunity to better understand what an AI-first company looks like.
What do we mean by nebulosity?
Every problem has three layers - "why" (purpose), "what" (specification), and "how" (implementation). Most software employees operate at the "how" layer—writing code, creating designs, generating content. As LLMs increasingly automate the "how," companies must differentiate through "why" and "what."
However, the "why" and "what" questions are fundamentally nebulous. We use this term as coined by David Chapman—a concept is nebulous if it resists clear definition. For example, what makes customer service "helpful"? What makes a search result "relevant"? It’s difficult to cleanly pin these concepts down.
Companies navigate nebulosity by creating patterns of concrete implementations that get tested against the market. Google started with the nebulous purpose of "organizing the world's information," refined it into the pattern of a search engine, then originally implemented it through PageRank. Over time their search infrastructure grew to encompass far more than PageRank. Each implementation cycle deepened Google’s understanding of its original nebulous purpose, further opening up opportunities to instantiate more adaptive patterns within their environment.
The default for most companies is for this relationship between nebulosity and pattern to calcify. A classic example is Kodak: originally focused on the nebulous goal of “capturing memories,” Kodak calcified around film photography as its core pattern and failed to adapt to digital photography, ultimately losing relevance as the market shifted. Successful companies develop processes and rituals that keep this relationship fluid. The shift from waterfall to agile is a strong example of this.
Deterministic vs stochastic software
Traditional software engineering education has trained engineers to rapidly deliver deterministic software fulfilling specific business-logic contracts. Deviations from the specification could be treated as “bugs” and reliably, deterministically “fixed”. Rare edge cases consequential to a product’s success were easy to handle via explicit branches for specific situations. Even if a company had a relatively poor handle on the nebulosity of its product’s “why” and “what”, it could rapidly iterate its way to clarity thanks to strong guarantees about the software’s behavior at the level of “how”.
In contrast, machine learning (i.e. stochastic software) offers no such guarantees. It’s a tool of “last resort” that’s most impactful for problems that defy trivial specification: recognizing cats in images, understanding natural language, generating creative content. ML is useful precisely in settings where one can’t trivially describe the software’s intended behavior via a series of deterministic rules. Moreover, it’s impossible to fix “bugs” in a piece of stochastic software’s behavior with any guarantees, nor is it possible to trivially mitigate edge cases. Instead, one is forced to think statistically, in terms of shaping the overall distribution of system behaviors. It’s emotionally very challenging for many engineers to abandon the concrete guarantees offered by deterministic software.
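The contrast can be made concrete with a toy sketch. All names here are illustrative: `add` stands in for deterministic software, where a single assertion pins down behavior exactly, while `stochastic_classifier` stands in for a model whose individual outputs carry irreducible noise, so we can only assert properties of the distribution of its outputs.

```python
import random

# Deterministic software: one assertion fully specifies the behavior,
# and any deviation is a "bug" with a definite fix.
def add(a: int, b: int) -> int:
    return a + b

assert add(2, 3) == 5

# Stochastic software: a stand-in "model" that is right most of the time,
# with irreducible noise (a 10% chance of flipping its answer).
random.seed(0)  # fixed seed so the sketch is reproducible

def stochastic_classifier(x: float) -> bool:
    return (x > 0) if random.random() < 0.9 else (x <= 0)

# No single call is guaranteed correct; instead we sample many calls
# and constrain the overall distribution of behaviors.
samples = [stochastic_classifier(1.0) for _ in range(1000)]
accuracy = sum(samples) / len(samples)
assert accuracy > 0.85  # a statistical bound, not a per-call guarantee
print(f"accuracy = {accuracy:.3f}")
```

The shift in mindset is visible in the final assertion: it bounds an aggregate statistic over a thousand calls rather than the result of any one call.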
All companies face both technical risk (e.g. “Will this work?”) and market risk (e.g. “Will someone pay for this?”). Deterministic-native companies largely faced market risk and could substantially mitigate technical risk with careful hiring. For example, imagine a company is struggling to scale their databases to meet market demand. Even if they don’t know how to build the right database stack, they’d be able to largely eliminate such technical risk if they found a way to hire Jeff Dean or Sanjay Ghemawat. Such experts would likely be able to leverage their knowledge to rapidly create guarantees on a novel database’s performance. In contrast, stochastic software has substantially higher irreducible technical risk that’s difficult to similarly mitigate via careful hiring or via engineering processes. Even if you hired a core member of Gemini’s post-training code generation team, they wouldn’t be able to confidently tell you whether it’s even possible to improve a given Gemini prompt without substantial empiricism.
Nevertheless, companies will face increasing pressure to organize their teams such that they’re able to simultaneously weave together deterministic and stochastic software. Leads will be increasingly required to embrace the inherent nebulosity of this new world, and therefore grow wiser.
Evals create clarity
Suppose a company is building a customer service AI. To evaluate if the AI is “helpful” (a nebulous goal), the team creates a set of 100 common customer questions and asks the AI to answer them. Each answer is then rated by human reviewers on a scale from 1 to 5 for helpfulness.
Repeating this process, with the same set of questions and rating system each time the AI is updated, constitutes a repeatable evaluation that attempts to answer the question: “How helpful is our AI to customers?”
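This loop can be sketched as a minimal eval harness. Everything here is a placeholder: `ai_answer` stands in for the customer service AI under evaluation, `human_rating` stands in for a reviewer's 1–5 helpfulness score, and the question list would in practice hold the ~100 real customer questions.

```python
from statistics import mean

# Hypothetical eval set; in practice ~100 real customer questions.
EVAL_QUESTIONS = [
    "How do I reset my password?",
    "Why was I charged twice this month?",
    "Can I return an item after 30 days?",
]

def ai_answer(question: str) -> str:
    """Stand-in for the customer service AI under evaluation."""
    return f"Here is some guidance on: {question}"

def human_rating(question: str, answer: str) -> int:
    """Stand-in for a human reviewer's 1-5 helpfulness rating."""
    return 4  # fixed placeholder score for illustration

def run_eval() -> float:
    """Run every question through the AI and average the ratings."""
    return mean(human_rating(q, ai_answer(q)) for q in EVAL_QUESTIONS)

score = run_eval()
print(f"Mean helpfulness: {score:.2f} / 5")
```

The key property is that the harness is held fixed across model updates, so the score can be compared release over release.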
Evaluations are the joint point between nebulosity and pattern for organizations grappling with stochastic software. Producing a meaningful web of evals will be increasingly important for products seeking to ship stochastic software.
Product Requirements Documents (PRDs) have been the traditional workhorse within deterministic software engineering for specifying a project’s goals, requirements, legal/privacy considerations, etc. These would then get translated into UX mocks, engineering design docs, etc to reify concrete patterns that could subsequently be brought into contact with the market.
In contrast, evals act as the specification for a piece of stochastic software. They are the concrete enactive target of what the engineering team hill-climbs/optimizes the system against. The process of creating evals can be extremely clarifying for the entire company if it’s done with the right set of practices and rituals.
It’s intractable to evaluate the combinatorial explosion of inputs that could be fed into an LLM. In the best case, this forces teams to become far more intentional with what they measure to drive more concrete trade-offs in what the product offers its users. Such clarity in a product’s trade-offs can help the company achieve greater levels of differentiation.
Evals can also help identify incoherence within an organization. For example, it’s easy for a team to claim they’re building a “helpful customer service AI agent”. But this is an extremely nebulous goal. In practice, the engineering, product and UX teams might naturally default to instantiating very different patterns of such an agent in the world. The process of creating evals in a way that solicits feedback from all stakeholders can be an invaluable tool for leadership to force clarity and cohesion.
What is an AI-first software company?
A company is a pattern of interactions between its employees that gives rise to the company's various products, services, internal artifacts (e.g. evaluations) and external feedback. Moreover, it evolves as its products, services, internal artifacts and external feedback evolve. As popularized by Conway’s Law, “companies ship their org charts.”
Going from brick and mortar to the internet substantially shifted organizational interaction patterns. A company was internet-first to the extent that its inner workings were sensitive to the underlying improvements in technology (e.g. bandwidth, SaaS services, etc). A company is AI-first to the extent that these inner workings are sensitive to the underlying improvements of frontier LLMs. A company is adaptively AI-first to the extent that it immediately benefits from model updates. For example, if GPT-7 were to be suddenly released, a “true” AI-first company would experience non-trivial reorganization of its underlying processes.
We also distinguish AI-first companies from AI-native companies. An AI-native company is largely a deterministic-native company whose workflows have been accelerated with AI-based tools.
What is the process for building an adaptive AI-first startup?
The complete playbook for AI-first companies doesn't exist yet. But clear patterns seem to be emerging.
It’s useful for AI-first employees to be able to unambiguously express what they want with a high degree of empathy, so as to effectively communicate with someone who might not have much context. This is extremely useful for prompting models. It’s also useful for surfacing deep-seated differences between individuals, and therefore for gaining more clarity on the nebulous purposes animating the team and the patterns in which they ought to be expressed. Cultivating this within an organization requires substantial psychological safety and trust. However, integrating these differences often leads to substantially improved problem-solving.
In the same vein, increasing an AI-first company’s headcount is a double-edged sword. Each additional head is a tax on the existing pattern of interactions between employees (i.e. its culture). However, if integrated properly, each additional head has the potential to substantially increase the complex problem-solving abilities of the organization. It’s likely easier for a company to remain AI-first with a deep bias towards automation.
The company’s most critical user journey should run through LLM inference. However, the core product need not be a chatbot. The most important touch points between the company and the user must be a strong function of the quality of the company’s underlying models. It’s not necessary for the company’s modeling to be vertically integrated. But the company should have deep expertise in constructing evaluations to steer whichever models it uses. Moreover, every incremental expansion of the core product’s scope should be downstream of some underlying set of models. Companies will increasingly face a shift from placing an emphasis on achieving individual milestones/outcomes to being driven by an underlying process of evolution.
Evals are the joint point between a company’s nebulous purposes and concretely instantiated patterns. As companies complexify, their evals will need to increasingly become communal artifacts with rituals for incremental improvement. For example, a company’s evals could be stored in GitHub with appropriate UIs for even non-technical members to inspect, comment on and suggest improvements. Every employee will need to increasingly internalize that they bear a piece of some collective responsibility in steering the behaviour of the company’s models. That is, to participate in an ongoing dialog with the company’s web of AI systems.
OSS developments like the LAMP stack led to the rise of “full-stack engineers”: generalists who could effectively move between different layers of the tech stack. It’s been common career advice to cultivate a T-shaped model of competency: deep in one area and wide across others. Rather than “full-stack engineers”, LLMs will facilitate the rise of “full-stack employees”, who span job roles like PM/SWE/UX rather than achieving effectiveness within a narrow role. The T-shaped metaphor still applies; but augmented by LLMs, the base of the T will be far wider and the depth of the stem far deeper.
Where do we go from here?
It will be extremely challenging for deterministic-native companies to make this transition to becoming AI-first. Transitions in the broader economy will upend some of the world’s largest business models. It would have been extremely ambitious to expect the Yellow Pages, a brick and mortar company, to transform itself into Google, an early exemplar of internet-first thinking. Having said that, not all businesses will need to become AI-first to survive. E-commerce still hasn’t dwarfed brick and mortar sales. IBM still exists as a profitable company. Microsoft made a prodigious transition from the desktop to web services. There are still many businesses in my neighborhood in NYC that are cash-only. But the largest companies in the world are most certainly internet-first with a market capitalization that dwarfs companies from the previous era.
The AI-first economy will be substantially larger. AI systems will eventually be as fundamental to our cognition as the psychological technology of literacy.