mattwood.blog

The Cost of the Answer

2026-07-02T00:00:00Z

Most people keep both a checking account and a savings account, not because one is better but because they do different jobs. A checking account gives back what you put in: deposits go in, withdrawals come out, and the balance is roughly what you deposited. A savings account puts the balance itself to work. Interest earns interest, and given enough time the growth outweighs the original deposit.

Most teams open a checking account when they start with AI. Take a process, hand it to agents, and the work speeds up. The gain is proportional to the deposit: more effort into the design, more throughput out, and to gain again you go back in and deposit more. That is real value. It is also a ceiling. If all your investments are in checking, you grow only as fast as you can deposit.

What separates the two accounts is not the money. It is whether anything in the system can turn this period's balance into next period's growth. AI has the same dividing line: it compounds only when the organization lowers the cost of knowing whether the work got better. The cost that matters is not the cost of the work but the cost of the answer.

The machinery and the economics

The machinery of compounding is the loop: a system that does a piece of work, evaluates what it produced, and feeds the evaluation back into the next round. A first draft gets a critique, the critique produces a second draft, and each round starts where the last one ended rather than where the first one began. Anything that closes that circuit is a loop. Anything that doesn't, no matter how fast or how automated, is a single pass repeated.

Every loop has two halves: the thing producing the work, and the thing judging whether the work improved. Call the second one the evaluator. Agents have made the first half nearly free. The work itself, the drafting, the coding, the generating, now costs so little that it is no longer where the economics of the loop are decided; the evaluator is. A loop compounds only if something can reliably tell, round after round, that this version beats the last one, and feedback without that judgment does not compound. An unvalidated loop can reinforce a mistake as easily as fix one, and it can get more wrong every round, which is worse than never looping at all, because it looks like progress the whole way down.

Agents make the work cheap. Evaluators make the answer cheap. Evaluators that persist make judgment compound. Most teams are working hard on the first, and the returns have moved to the other two.

The theory of value

If validated feedback is what compounds, then the speed of any loop is set by how cheaply you can get the validation: how much it costs to know whether the last round was better. When the answer is free, you can check after every single round, and the loop compounds as fast as it can run. When the answer is expensive, you can only afford to check occasionally, and the loop crawls no matter how fast the work itself runs.

Answer cost, not work cost, decides which work compounds and which merely speeds up. Coding compounded first, and the tempting explanation is that coding gets its answer for free: the test suite runs itself, so the loop can iterate as often as it likes. But a test suite only tells you whether the code satisfied the tests, not whether the product judgment behind them was right, and tests do not write themselves. Coding compounded first because software engineering spent decades building cheap evaluators: tests, builds, linters, type systems, benchmarks, continuous integration. Fields compound when they invest in answer infrastructure. Writing has a little of that infrastructure: a read is quick, so a draft can get an answer nearly every round. Strategy and synthesis have almost none, which is why the verdict on a strategy can take months to arrive.

The work leaders are often held accountable for, the strategy, the synthesis, the judgment, is also the work that is difficult to check. Difficult is not the same as fixed, though. Answer cost depends on how much answer infrastructure a field, or a firm, has built, and infrastructure can be built. And the final answer is not the only answer. For a strategy, the verdict may genuinely stay expensive; you may not know for six months whether the call was right. But the verdict decomposes into intermediate answers that are cheap if you ask for them: whether this version is clearer than the last, whether the assumptions are explicit, whether the objections are addressed, whether the plan survives the plausible futures. None of these is the final truth. AI cannot know the final truth cheaply, and it does not need to: it can lower the cost of the intermediate answers that make the final judgment better, and those are the answers a loop runs on. The verdict stays with whoever owns the call; the loop hands them better material to make it with.

Making answers cheap, and keeping them honest

A cheaper answer is not automatically a good one, only a faster one, and a fast answer that is still wrong buys you nothing. An answer has to be cheap enough to run often and trustworthy enough that running often means learning more. Both properties can be engineered.

You lower cost by asking a smaller question. Judging whether a draft is good in any absolute sense is slow and expensive. Judging whether this version beats the last one is cheaper, and it is usually all the loop needs. Comparison can still be hard when dimensions conflict, one version clearer, the other bolder, but the question stays bounded: two candidates, one call, no need to define good in the abstract. Bounded questions can be asked often.

You lower it again by watching the plateau instead of waiting for a finish line. Nobody can tell you in advance what done looks like for a piece of strategy, so do not make the loop wait for that answer. Watch the size of the changes each round: when they shrink toward nothing, or the same fix keeps recurring, you have a signal, not a verdict, and it cost nothing but attention to a trend you were already tracking.
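Watching the plateau can be made concrete with very little machinery. What follows is a minimal sketch, not a prescribed method: the function names, the 2% threshold, and the two-round window are all assumptions chosen for illustration, and text similarity is only one possible stand-in for "the size of the changes."

```python
from difflib import SequenceMatcher


def change_size(previous: str, current: str) -> float:
    """Fraction of the text that changed between two drafts (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, previous, current).ratio()


def has_plateaued(drafts: list[str], threshold: float = 0.02, window: int = 2) -> bool:
    """True when each of the last `window` rounds changed less than `threshold`.

    The threshold and window are hypothetical defaults, not recommendations.
    """
    if len(drafts) < window + 1:
        return False  # not enough rounds yet to see a trend
    pairs = zip(drafts[-window - 1:-1], drafts[-window:])
    return all(change_size(a, b) < threshold for a, b in pairs)


history = ["draft one", "draft two", "draft two", "draft two"]
print(has_plateaued(history))  # the last two rounds changed nothing
```

The result is a signal, not a verdict: a plateau says the loop has stopped finding changes, not that the draft is right.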

You raise trustworthiness by requiring evidence before you count a win. A loop with no free answer can talk itself into progress that is not there, so make it show its work: the specific change, the specific result. This does not make any single check faster. It makes the check honest, which is what keeps you from compounding on a false gain. And you raise it again by waiting for a pattern to repeat before you trust it. A lesson drawn from one round might be noise wearing a lesson's clothes. Waiting for it to show up twice costs one extra round and buys an answer that is harder to fool.

The durable asset

A loop with a fixed evaluator makes the work better: this draft, this strategy, this website. The artifact improves until it ships, and the next artifact starts from zero. The artifact was never the asset. It ships, it leaves, and it starts depreciating the day it is done. The evaluator can stay.

A company writes a strategy memo. The first pass is an agent's draft. The second pass is an evaluator holding that draft against a rubric: assumptions explicit, options distinct, tradeoffs named. The third pass notices that the same objection keeps recurring across rounds, weak customer evidence, say, and starts demanding it earlier. The fourth pass writes that objection into the standing rubric, so the next memo, on a different question entirely, faces it from the first draft. Run this for a year and the company has not merely produced better memos. It has accumulated better strategic taste, held somewhere that does not walk out the door.
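The fourth pass, where a recurring objection is written into the standing rubric, might look something like the sketch below. Everything here is hypothetical: the function name, the data shapes, and the rule that an objection must appear in at least two rounds before it is promoted.

```python
from collections import Counter


def promote_recurring(standing_rubric: list[str],
                      objections_by_round: list[set[str]],
                      min_rounds: int = 2) -> list[str]:
    """Return the rubric plus any objection raised in at least `min_rounds` rounds."""
    # Each round is a set, so an objection counts at most once per round.
    counts = Counter(obj for round_objs in objections_by_round for obj in round_objs)
    promoted = sorted(obj for obj, n in counts.items()
                      if n >= min_rounds and obj not in standing_rubric)
    return standing_rubric + promoted


rubric = ["assumptions explicit", "options distinct", "tradeoffs named"]
rounds = [
    {"weak customer evidence", "too long"},
    {"weak customer evidence"},
    {"unclear owner"},
]
print(promote_recurring(rubric, rounds))
# the three original items, then "weak customer evidence"
```

An objection seen once stays out. The rubric only grows when a pattern repeats, which is the same discipline as waiting for a lesson to show up twice.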

The model may be exactly the same at the end of that year as it was at the start. What sharpened is the apparatus around it: the rubrics, the objection sets, the worked examples, the recorded failure cases, the standing comparisons. A loop that improves its own evaluative apparatus is lowering the cost of the answer on its own, permanently, for every future piece of work that passes through it. It makes more of the organization's work eligible to compound without anyone going back in to fix the economics by hand. AI is well suited to running that apparatus, because it can hold a draft against a rubric, yesterday's synthesis against today's, or two strategies against each other, fast enough to answer every round. The apparatus is what compounds, and it belongs to whoever built it.

Throughput is consumed, artifacts depreciate, and evaluators appreciate. They can also ossify: a rubric can harden into bureaucracy, an objection set can overfit to the last failure, accumulated taste can quietly become accumulated caution. The asset needs maintenance, new evidence, new counterexamples, an occasional reset. Kept honest and held long enough, an evaluator changes what the work is. A strategy stops being a document you finish and file, and starts being something you keep running: more variants worth generating because generating them got cheap, decisions revisited on a cadence instead of once a year, a living draft still answering to reality on the same terms it always did, just able to try more before reality weighs in.

AI does not compound because generation got cheap. It compounds when judgment gets cheap enough to run continuously, and durable enough to improve the next piece of work, not just the current one. That second investment is harder to fund than the first. Automation pays back in throughput, and throughput shows up in the first week; an evaluator looks like overhead right up until it becomes the asset. So the natural question, what work can I automate, keeps winning the budget, and it leads to checking accounts every time: you speed up the work, pocket the gain, and the next gain costs another deposit.

The better question is: where is the cost of the answer too high to loop? Those places are not failures of AI; they are the map of where to invest. Lower the cost of the answer and the work starts compounding. Lower it persistently and pieces of judgment that have sat in checking forever, one deposit at a time, start behaving like savings.

The Field and The Frontier

2026-06-23T00:00:00Z

Two dynamics are running simultaneously in AI, and they are easy to confuse because they are measured on different axes. One tracks what it costs to keep pushing the frontier. The other tracks the price of buying a fixed level of capability. Read together they look contradictory. Read apart they explain much of where enterprise value is going to come from.

At the frontier, budgets are rising. Not because any fixed unit of capability is getting more expensive, but because the definition of frontier work keeps moving. Each generation absorbs more reasoning, more tokens, more tool use, more context, more evaluation. The frontier never holds capability fixed; it spends every efficiency gain on greater ambition. So in practical terms frontier budgets do not shrink, even as the price of any specific capability falls, because what we ask of the frontier expands to fill the new envelope.

Behind the frontier, prices are collapsing. For any fixed level of capability, the price of achieving it is falling at rates that take a moment to absorb: not incrementally, but by orders of magnitude, year over year. Both are true at the same time. The first attracts nearly all the attention. The second is where much of the enterprise value will be created this year.

The numbers are not subtle. Epoch AI and Artificial Analysis track price per million tokens for models that hit specific benchmark thresholds: fixed-capability price curves. Benchmarks are imperfect proxies for work, but they are useful here because they hold a capability threshold constant while price moves. GPT-3.5 Turbo-level performance on general knowledge has fallen at roughly 9x per year, the slowest rate in the data. GPT-4-level performance on PhD-level science questions has fallen at roughly 40x per year: models that cost $30 per million tokens in early 2023 had equivalents below $1 by mid-2024, and below $0.10 by early 2025. Some tiers fall even faster, though on fewer data points. The 9x and 40x rates alone are enough. This is not an efficiency gain at the margin. It is a category change in the price of benchmark-equivalent inference.
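The arithmetic behind a 40x annual decline is easy to check. The sketch below assumes a smooth, constant rate of decline and treats early 2023 to mid-2024 as roughly a year and a half; both are simplifications, and the function name is invented for illustration.

```python
def price_after(start_price: float, annual_factor: float, years: float) -> float:
    """Price after `years` if it falls by a factor of `annual_factor` each year."""
    return start_price / (annual_factor ** years)


# $30 per million tokens in early 2023, falling 40x per year (constant rate assumed)
mid_2024 = price_after(30.0, 40.0, 1.5)    # roughly $0.12
early_2025 = price_after(30.0, 40.0, 2.0)  # roughly $0.019

print(f"mid-2024: ${mid_2024:.3f}, early 2025: ${early_2025:.4f}")
```

Under those assumptions the smooth curve lands under both thresholds quoted above, below $1 and then below $0.10, with room to spare.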

The mechanism is straightforward. As the frontier advances, the techniques that produced yesterday's leading models become available for optimization. Smaller models are trained to match larger predecessors; distillation and architectural improvements compress capability into fewer parameters; open-weight models proliferate; competition pushes commodity inference toward utility economics. Any specific capability level becomes radically cheaper over time, even as the frontier itself keeps moving forward.

For most of the history of this curve, the decline was interesting and practically irrelevant. The cheapest models were not good enough to perform useful knowledge work at the standard organizations need. That changed somewhere around GPT-4-class usefulness, first for a widening class of professional knowledge work: summarization, drafting, classification, synthesis, coding assistance, analysis, decision support. Once models cleared that bar on those tasks, falling price stopped being background noise and became deployable surface area.

Below the threshold, cheap AI is a curiosity. Above it, cheap AI is a material. Materials behave differently from tools. A tool gets picked up and put down. A material gets embedded: in infrastructure, in process, in the operating assumptions of a business. Cheap enough and capable enough, intelligence stops being an application you reach for and starts being a substrate you build on, closer to electricity or bandwidth than to software.

This is the field: the broad space behind the frontier where capabilities are no longer novel, but are cheap enough and available enough to be built into ordinary work. In almost every large enterprise the pattern is the same: experimentation is broad, production deployment is narrow, and process-level orchestration is rare. Forrester's 2025 automation predictions estimate that genAI will orchestrate less than 1% of core business processes, even as it affects process design, development, and data integration. The shortfall is not for lack of trying. It is that very few organizations have worked systematically across the whole operating surface of the business, rather than in scattered pilots. The field is wide open not because no one is looking, but because almost no one is covering it with discipline.

The hard problem has changed. Models have been accessible via API for some time; what is new is that capable inference is now cheap enough to be economically viable across workloads many organizations have not attempted yet, and that threshold drops every few months, pulling more organizations and more use cases into range. When capable inference was expensive, the model was the system. Now that it is cheap and getting cheaper, the model is a component, and the system is the work.

You can hear the shift in what customers ask. The question used to be: which model is best? The question now is: how do I select and chain models so the right one meets the right workload? That is the field thinking out loud. Customers have started doing the real work: proving capability against their own data, their own processes, and their own risk.

A workflow that saves an analyst fifteen minutes is not worth $5 in inference cost. At $0.05, the calculation changes. At $0.005, it changes again. Most untouched workflows were passed over not because automation was impossible, but because the whole system was not worth building: model cost was only one line in a budget that also included integration, exception handling, and ownership. Falling inference cost does not erase those other lines. It changes the return enough that many more workflows become worth instrumenting.
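A back-of-envelope version of that budget makes the point visible. The inference prices are the ones above; the analyst's hourly cost and the per-run overhead for integration, exception handling, and ownership are invented numbers, there only to show the shape of the calculation.

```python
def net_value_per_run(minutes_saved: float, hourly_cost: float,
                      inference_cost: float, overhead_per_run: float) -> float:
    """Value of the time saved, minus inference and the other budget lines."""
    value_of_time = minutes_saved / 60.0 * hourly_cost
    return value_of_time - inference_cost - overhead_per_run


HOURLY_COST = 40.0       # hypothetical loaded cost of an analyst hour
OVERHEAD_PER_RUN = 6.0   # hypothetical amortized integration, exceptions, ownership

for inference_cost in (5.0, 0.05, 0.005):
    net = net_value_per_run(15, HOURLY_COST, inference_cost, OVERHEAD_PER_RUN)
    print(f"inference ${inference_cost}: net ${net:.3f} per run")
```

With these made-up figures the workflow loses money at $5 and clears it at $0.05. By $0.005 the overhead line is nearly the whole cost, which is the sense in which falling inference prices change the return without erasing the rest of the budget.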

Three postures follow. Frontier exploration: learn what just became possible, because capabilities emerge unevenly and the only way to map the edge is to probe it. Frontier-first development: build at frontier prices on the assumption that price reduction follows capability, so that what is frontier when you start is field by the time you reach scale. Field-first deployment: survey the workflows you already run, and where the return is real, instrument them, evaluate them honestly, and deploy across a far wider surface than most organizations have attempted. The discipline is in choosing well, not in automating everything that is technically feasible. Most organizations are over-indexed on the first two. The third is where the untapped value sits.

Which points at who wins. When the model is a component, access alone stops being the differentiator. The advantage moves to two kinds of machinery. The first is technical: routing, evaluation, observability, cost controls, the plumbing that matches workloads to models and keeps them honest in production. The second is organizational, and harder to buy off a shelf: proprietary data, process knowledge, clear ownership, and the authority to change how work actually gets done. The first is necessary. The second is what separates the organizations that pull ahead from the ones that run impressive pilots.

None of these compete. The frontier creates capability; the price curve makes that capability ordinary. Today's frontier is next year's field.

For most enterprises, the most valuable work in AI over the next several years will be the patient, unglamorous business of proving and deploying known capabilities against local data, local workflows, and local risk, at prices that finally make the attempt worthwhile.

The frontier gets explored. The field gets cultivated.

What The Garden Is For

2026-06-10T00:00:00Z

New models that can work on their own for days, not minutes, arrived this week. As a system runs more of itself, the human role does not shrink. It concentrates.

In the first year, a garden is almost entirely labor, and almost all of that labor is knowledge. You are on your knees in it. You are learning how deep a bean wants to sit and how far apart to set tomatoes so that air moves between them and disease cannot settle. You are learning that watering little and often is a beginner's habit, that it keeps roots shallow and clustered near the surface where the first dry spell can reach them, and that a long soak less frequently sends them down to where the water keeps. You stake and you thin. You prune for a structure that will not show itself for two summers. You pull weeds out of soil you disturbed and so invited them into. The first year rewards knowing and punishes guessing, and there is no shortcut through it.

None of that knowing is the point of the garden.

It is the price of entry, the cost of building a system that will, in time, compound largely without you. The point is what stands there in the fifth year: a thing that feeds itself more than you feed it, that throws shade where shade is wanted and scent where you walk in the evening, that produces, season on season, for a fraction of the work the first year demanded. You did not garden in order to plant. Planting was the means. The end was always the living, standing, producing thing, and the quiet exchange of everything inside it keeping everything else alive.

Because a garden is not a collection of plants. It is a collection of systems acting on one another across years. There is the soil, which is itself alive, a dense traffic of fungi and bacteria and roots trading sugar for minerals underground. There is water and how it moves and where it sits. There is light and how it falls in April and in August, which are not the same. There is the community above the soil, the way one plant fixes nitrogen that the hungry one beside it will spend, the way a deep taproot opens the ground for shallower neighbors. There are the pollinators, and the pests, and the things that eat the pests. To see a single plant is to be a beginner. To see the web is to be a gardener, and the seeing only deepens with the years, which is why the knowledge of how it all works does not lose its value as the garden matures. It becomes the most valuable thing you own.

A garden built well begins to defend itself. A bed planted as a single crop has no defenses at all, and you will spend your life propping it up, spraying and feeding and replacing, because nothing in it checks anything else. A bed planted as a community builds its own balances: the ladybug arrives for the aphid, the bird for the caterpillar, the wasp you never see for the things you never see, and healthy soil simply refuses many of the diseases that ruin tired soil. The garden begins to catch its own errors. A plant in the wrong place fails plainly and tells you so, and a garden full of living feedback corrects more of itself each year, which means your hand is needed less for rescue and more for direction.

This does not mean the work ends; vigor has no morals. The fastest, strongest, most determined grower in a neglected bed is the bindweed and the bramble, pouring into the world exactly the energy you wished into your crop. A weed caught in its first week is a flick of the wrist. The same weed left a season puts down root that will regrow from a fragment the width of a thumbnail, and you will be answering for that one careless month for years. So the gardener never stops walking the beds. The attention does not end. It changes character, from doing toward noticing, from the work of the hands toward the work of the eye that knows what it is looking at.

What shifts, across the seasons, is where the hours go. Fewer of them to the weeds, never none, and more of them to the systems themselves. You stop reacting and start guiding. You plan a succession so that something is always coming on as something else is spent. You set a nitrogen-fixer this year where next year's hungry crop will stand. You move a shrub two feet into the light that actually falls there rather than the light you imagined. The work climbs, from keeping the garden alive toward making every part of it succeed more fully at once, the soil and the water and the light and the living community all pulling in the direction you have chosen. And the garden is never finished, which is not a failure of the garden but the whole of its nature. The ledger of it runs backward from most things you build: the early years are nearly all deposit and little return, and the mature years hand back far more than you ever put in.

The most ambitious gardens of all sit inside the most deliberate boundaries. The walled kitchen garden was never a cage. The south-facing brick hoards the day's heat and gives it back at night, and against it a peach or a fig will ripen that would never set fruit in the open ground a few feet away. The cold frame buys a month at each end of the year. The espalier folds a tree flat against the warmth and doubles what it bears. None of this is a constraint the garden suffers. It is the condition that lets the garden exceed itself. The romance of wild, unbounded growth has it exactly backward: nothing of value flourishes by being left unbounded, and the same vigor that runs to bramble in an open field becomes a wall of fruit when something deliberate is set around it. The wall is not a restriction. It is a climate you chose.

The same systems will serve almost any end you ask of them. The same soil, the same water and light, the same living balance, arranged one way becomes a garden built for scent that thickens as the sun goes down, arranged another way a plot measured honestly in pounds per bed, arranged another way a border that exists for no reason anyone can defend except that it is good to stand in front of. The systems do not prefer one over another. They will pour the same vigor into beauty or into fruit or into bramble, with the same indifference.

A garden will grow with all the vigor it has toward whatever you point it at, and toward nothing in particular, toward bramble and bindweed, if you point it nowhere. The more a garden can run on its own, the less it needs you as a pair of hands and the more it needs you as a source of judgment, and judgment, in the end, narrows to a single question the garden can never answer for itself. The soil cannot tell you what the garden is for, and neither can the seasons, and no depth of craft will answer it for you, because it was never a question of craft. Deciding what the garden is for, and standing behind that decision through the wet summer and the late frost and your own honest mistakes, is the one task that is never handed off to the soil. It belongs to the gardener, first and last, and it is the most human thing in the whole enterprise. Everything else, given time and knowledge and patience, the garden can be made to do for itself.

That one thing it leaves entirely to you.

The Silent Governor

2026-05-05T00:00:00Z

Scarcity has been the silent governor inside most large organizations. Nobody installed it, nobody maintained it, and almost nobody noticed it was there.

For most of management history, what an organization did not do was decided less by deliberate choice than by what it could not get to. The opportunity that could not be staffed did not need to be rejected. The product line that could not be scoped did not need to be debated. The market that could not be analyzed did not need to be prioritized. Bandwidth quietly governed the agenda, and strategy was, in practice, the residue of what couldn't be reached.

This was not always irrational. Scarcity can impose discipline, and constraints can force focus. But it also allowed organizations to avoid naming the tradeoffs they were making. Senior leaders complained about this for decades. They said they wanted to be doing the work of choosing, of setting direction, of making the calls that only they could make. Instead they spent their days pushing work through the organization, reviewing what came back, and explaining the things that never got done. The job they wanted was buried under the job they had, and the burial was so complete that most of them stopped distinguishing between the two.

Agents are starting to take the governor off.

When the cost of generating a credible option falls far enough, the things scarcity used to handle on your behalf come back to your desk. The road map is drafted. The competitive analysis arrives. The pricing exercise is done. And four adjacent opportunities are sitting in front of you, each scoped enough that a reasonable person could argue for any of them. The question stops being "can we do this?" and becomes "should we, and instead of what?"

That is a different kind of work, and it is the work most leaders said they wanted all along. The complaint was always that operational drag prevented strategy from happening. Agents are testing whether that was true. The easier it becomes to surface plausible options, the harder it becomes to avoid choosing among them. Bandwidth is no longer available as the answer.

This is the part that takes some adjustment. The leaders who already had a queue of "if only we had time" ideas are about to discover what leverage feels like. The ones who relied, perhaps without naming it, on scarcity to say no for them are about to find out what they actually believe. There is no operational excuse left to hide inside. The options are scoped. The analysis is done. The decision is yours.

The interesting question is not whether agents will make organizations faster. They will. The interesting question is what your organization will do once the governor is off. The scarcity of first-pass work is going away. What remains is the work leadership always claimed it wanted: choosing what matters, saying no to plausible things, and standing behind the direction it sets.

Beautiful Tension

2026-04-28T00:00:00Z

Organizations are built to seek equilibrium.

That is not a weakness. It is one of the reasons they exist. A young company can survive on force of will and proximity, with everyone close enough to the work to feel the same pressures at the same time. A large organization cannot. At scale, coordination becomes the work. Process accumulates because memory is needed. Governance appears because variance becomes expensive. Standards form because customers, employees, regulators, and partners need to trust that the system will behave in recognizable ways.

For most of the modern management era, this has been the mark of maturity. A successful organization finds a working model and learns to repeat it. It reduces unnecessary variance. It turns judgment into method, method into process, and process into habit. Done well, this is not bureaucracy in the pejorative sense. It is institutional intelligence: the way a company converts hard-won lessons into durable capability.

The difficulty is that equilibrium is not the same as health. The two can feel identical from inside the organization and diverge completely in the market.

Dissipative structures

In 1977, Ilya Prigogine won the Nobel Prize in Chemistry for his work on systems that organize themselves far from equilibrium. The canonical example is a thin layer of fluid heated from below. When the temperature gradient is small, the molecules move randomly. Raise the gradient past a certain threshold and something remarkable happens: the fluid spontaneously organizes into hexagonal convection cells. Order emerges, not in spite of the disturbance, but because of it. The structure is sustained by the flow of energy through the system. Turn off the heat and the pattern vanishes.

Prigogine called these dissipative structures. Life is the most elaborate example. A cell maintains its organization by continuously metabolizing the gradient between itself and its environment. A tornado, a flame, a coral reef, a city: each exists in the same narrow regime, far enough from equilibrium to organize, not so far that it tears itself apart. Each is a pattern held in place by motion.

The conceptual shift this offers leadership is small but load-bearing. It separates two ideas we usually conflate: order and equilibrium. Some kinds of order are static, produced by reducing motion. Other kinds of order are dynamic, produced by regulating it. A bicycle is stable because it is moving. A surfer is stable because they are continuously responding to the wave. A living organization is coherent not because it has frozen itself in place, but because it is continually generating new structure from the energy passing through it.

That distinction is becoming central because AI is pushing many institutions away from equilibrium faster than they can comfortably admit, and the instinct to restabilize is no longer reliably correct.

Consider what is actually happening beneath the surface of the current moment. The economics of expertise are being rewritten, which means the economics of organizational structure are being rewritten too. A great deal of the coordination overhead in a large company exists because specialized knowledge is scarce and expensive to apply, so we build layers of review, handoff, and synthesis around each expert. When the marginal cost of applied expertise falls sharply, those layers stop being scaffolding and start being drag. At the same time, the cadence of the underlying technology has compressed to a point where some important architectural assumptions now change faster than the planning cycles most enterprises use to govern them. The model you standardized on last quarter may not be the model you would choose this quarter, and the model you would choose this quarter may not be the one you would choose next. This is not a complaint about the pace of change. It is an observation about the mismatch between two timescales: the one your roadmap operates on and the one the frontier now moves on.

The old equilibrium may still function. That is part of the danger. It may still produce revenue, sustain careers, satisfy committees, and generate plans that look sensible from inside the current model. Decline rarely announces itself by making everything fail at once. More often, the existing system keeps working just well enough to defend itself. It remains coherent, legible, and familiar, even as the environment begins to demand something else.

Large organizations respond to this kind of disturbance through their immune systems. A new behavior appears and the organization asks whether it has been approved. A new team forms and the organization asks where it sits. A new method gains traction and the organization asks how it will scale. These are not foolish questions. In a scaled enterprise, uncontrolled change can confuse customers, duplicate work, fragment architecture, and exhaust people. The antibodies are not evidence that the organization is broken. They are evidence that the organization once learned how to survive.

The harder question is whether the immune system can tell the difference between infection and adaptation. When the environment is stable, suppressing difference protects the organism. When the environment is changing, suppressing difference may prevent the organism from becoming what survival now requires. The future usually enters an organization looking unauthorized. It does not arrive with the right committee structure, the approved taxonomy, the complete business case, and the operating model already settled. It arrives as pressure, anomaly, experiment, frustration, and inconvenient evidence.

Controlled disequilibrium

Controlled disequilibrium is the leadership discipline of holding an organization far enough from equilibrium to adapt, but not so far that it loses coherence. It is not chaos. It is not transformation theater. It is not a preference for disruption as an aesthetic. It is the deliberate regulation of productive tension so that a system can reorganize before the environment forces it to. The organization is not trying to stop the flow. It is trying to shape the structures the flow creates.

The word "controlled" matters because trust still matters. Customers still need reliability. Employees still need orientation. Regulators still need evidence. Disequilibrium without control becomes thrash, and thrash is its own form of paralysis: priorities change too quickly, teams lose confidence, and every experiment feels like another tax on attention. People stop learning because they are too busy bracing.

The word "disequilibrium" matters because control without movement becomes a different kind of paralysis. The organization learns to describe transformation better than it can perform it. It creates governance around the work rather than changing the work itself. It mistakes alignment for progress, and progress for the continued production of artifacts that make the current system feel in command. The opposite of controlled disequilibrium is not stability. It is either paralysis or thrash. One side overprotects the past. The other burns through trust.

The discipline is uncomfortable because it asks organizations to treat some of their most reassuring instincts with suspicion. A mature organization wants to know the plan. It wants to define the end state, assign owners, align stakeholders, set milestones, manage dependencies, and reduce uncertainty. Those disciplines remain useful, but they are insufficient when the work itself is discovery. The plan cannot simply be a route to a known destination. It has to become a way of learning fast enough to discover what the destination now requires.

In a stable environment, control can live comfortably in process. The organization knows enough about the work to prescribe the approved way to do it, then manage compliance against that way. In a dynamic environment, control has to move upward into clear principles and downward into fast instrumentation. People need to know the intent, the boundaries, and the tradeoffs when leaders are not in the room. The system needs to make movement visible enough that leaders can see what is being learned, what customers are experiencing, and which patterns deserve to become structure. Control cannot mean slowing everything down until it resembles the past. It has to mean making motion legible.

In practice, this looks less dramatic than the word disequilibrium suggests. It is the team that is allowed to change its technology choice mid-project because a better option emerged, without having to relitigate the original business case. It is the product experience that ships in a deliberately provisional form so that the organization learns what customers actually do with it, rather than what a committee predicted they would. It is the leader who declines to resolve a disagreement between two capable teams prematurely, because the disagreement is generating information the organization needs. It is the willingness to let a small part of the operating model look inconsistent for a while, because that inconsistency may be where the next version is being worked out. None of this is chaos. All of it is deliberate. The common thread is that the organization is choosing, in specific places, to let the system remain unresolved a little longer than comfort would prefer.

This is where many transformations go wrong. They try to produce agility by installing the visible rituals of movement: squads, standups, demos, sprints, incubators, transformation offices, steering groups, innovation portfolios. Some of these mechanisms are useful. None of them are agility. They are choreography. Agility is not the presence of motion. It is the capacity to reconfigure without losing coherence, and that capacity is emergent. It shows up when activity creates contact with reality, contact with reality creates learning, learning forces reconfiguration, and reconfiguration creates a new order that makes the next movement easier. You cannot install it. You can only create the conditions under which it appears.

The path to that new order often looks inefficient while it is happening. Teams may start something and then stop. A promising approach may fail. A project may return to the start line after weeks of effort. In a traditional operating model, these moments are embarrassing. They look like waste, indecision, or weak sponsorship. Once a project has a name, a budget, a steering committee, and a launch date, the institution starts to defend continuity as if continuity itself were the goal.

Going backwards is not the same as going nowhere

Sometimes returning to the start line is the act that preserves speed because it prevents the organization from compounding on a false premise. The point is not to celebrate failure. Wasted motion is still waste. The point is to distinguish a failed attempt from a failed direction. The ambition may remain entirely correct even when the chosen path proves wrong. Without that distinction, people protect the path in order to protect the ambition. They continue funding the wrong thing because stopping would appear to question the seriousness of the goal. Controlled disequilibrium allows a more mature posture: the vision can be durable while the route remains provisional.

None of this means every part of the organization should be equally unsettled. Some systems should remain highly stable. Security, compliance, financial controls, customer commitments, and core operational reliability may need even more discipline as the rest of the organization moves faster. Static stability is appropriate where variance creates harm without creating useful learning. Dynamic stability is required where learning is the work. Confusing the two creates predictable failure. If everything is locked down, the organization cannot adapt. If everything is in flux, the organization cannot be trusted. The mature version of controlled disequilibrium is differentiated movement: knowing where to hold firm, where to loosen, where to accelerate, and where to let new patterns prove themselves before they are scaled.

This also changes what it means to scale. Traditional scaling takes a successful pattern and reproduces it with minimal variation. That works when the pattern is well understood and the context is stable. In a more dynamic environment, scaling too early freezes the wrong answer. The organization takes an emerging practice, wraps it in process, generalizes it before it is ready, and then wonders why the energy disappears. The antibodies call this maturity. The market calls it delay. A more adaptive organization treats scaling as the conversion of learning into reusable capability, not the replication of surface behavior. Scale is not the enemy of agility. Poorly timed standardization is.

The goal is not permanent disequilibrium. No one wants to live forever in a state of unresolved transformation. The point of disequilibrium is to generate a better order, not to romanticize instability. The organization moves away from equilibrium so that it can discover a form of order better suited to the world it now inhabits, and then it holds that new order lightly enough to move again.

The most capable organizations in this era will therefore not be the most stable in the traditional sense, nor will they be the most chaotic. They will be the ones able to create new order while in motion. They will have enough continuity to maintain trust and enough movement to remain alive to reality. They will know that calm can be a symptom of health, and it can also be a symptom of avoidance. Leaders have to create safety without creating comfort. They have to create urgency without creating panic. They have to protect the ambition while allowing the path to change. They have to honor the accumulated wisdom of the institution without allowing that wisdom to become a veto over the future.

Tension is the transformation.

The tension is not a defect in transformation. The tension is the transformation.

AI makes this more urgent because it is not simply another technology wave to be adopted at the edge of the existing operating model. It reaches into the assumptions beneath the model: what expertise costs, how software is created, how customers are served, how decisions are supported, how quickly a capable organization can turn intent into action. When assumptions at that level begin to move, equilibrium becomes provisional. The organization can either defend its current shape or learn how to survive becoming something else.

That is the real discipline. Not disruption for its own sake. Not stability for its own comfort. Not innovation as theater, and not governance as refuge. The discipline is to hold the organization in a state where learning can become structure before the old structure becomes denial. That may be the defining organizational capability of this era: not the ability to preserve equilibrium, or even to predict the next one, but the ability to generate order while in motion.

Occam's Inversion

2026-04-14T00:00:00Z

Last week, the best AI system in the world could autonomously resolve about 81% of real-world software engineering tasks. This week, a new model hit 93.9%. Same evaluation, same problems, no change to the test. If you have been tracking AI progress as a smooth curve, that number should stop you. What is arriving now, across coding, reasoning, cybersecurity, and autonomous problem-solving, is a series of discontinuous jumps, each one resetting assumptions that were reasonable the month before.

Most people carry a mental model of AI progress that goes something like this: more data, bigger model, better results, diminishing returns, plateau. A recent paper from AI researchers Alessandro Achille and Stefano Soatto provides a formal framework for why that model no longer holds. Their argument is that we have crossed from systems that recognize patterns to systems that extract transferable reasoning structure from experience.

Consider a codebreaker. A pattern-matching approach studies thousands of past decryptions and memorizes which substitutions tend to appear. A structural approach is different: you learn that certain letters appear with predictable frequency in any language, that doubled letters constrain possibilities, that word boundaries create exploitable regularities. You extract principles that compress the search space for any cipher, including ones built on methods you have never encountered. The first approach improves linearly with more examples. The second compounds.
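A toy sketch of the structural approach, for readers who want to see it run. It assumes the simplest possible cipher (a Caesar shift) and an approximate table of English letter frequencies; the function names and the sample sentence are illustrative, not drawn from the paper. The code has never seen a past decryption. It knows only one principle, that some letters are more common than others, and that alone is enough to recover the key.

```python
import string

# Approximate relative frequencies of letters in English text (percent).
# These are rounded, commonly cited values, used here as an assumption.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
    "q": 0.10, "z": 0.07,
}


def shift_text(text, shift):
    """Shift every lowercase letter by `shift` places, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def english_score(text):
    """Higher scores mean the letter mix looks more like English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text)


def break_caesar(ciphertext):
    """Try all 26 shifts and keep the one whose output looks most like English."""
    best_shift = max(
        range(26),
        key=lambda s: english_score(shift_text(ciphertext, -s)),
    )
    return best_shift, shift_text(ciphertext, -best_shift)


plaintext = (
    "the navigator logged the heading and the speed every hour and "
    "revised the estimated position whenever new information arrived"
)
ciphertext = shift_text(plaintext, 7)
shift, recovered = break_caesar(ciphertext)
print(shift)
print(recovered)
```

A real cipher needs far more than this, but the shape of the advantage is the same: the principle transfers to messages and keys the solver has never encountered, which a memorized list of past substitutions cannot do.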

The "jagged" frontier of AI, where models are breathtakingly good at some tasks and oddly poor at adjacent ones, is the signature of this kind of learning: uneven but deep, closing gaps in bursts as structural insights land in specific domains. You cannot predict which gap closes next. You can only observe that the gaps are closing faster than anyone expected, and that each closure opens capabilities that were not on anyone's roadmap.

The underlying capability curve may be continuous at the level of training compute. But the experience of it is not. When a system extracts a structural insight that transfers across problems, the result is not gradual improvement. It is a phase change. Capabilities that were absent on Tuesday are present on Wednesday. A model that could not reliably plan a multi-step software fix last week can now resolve it autonomously. That is a step function, and the 81-to-94 jump is what one looks like from the outside. The organizational implications follow from the experience, not the math.

The Achille and Soatto paper contains a counterintuitive inversion. Classical machine learning theory, rooted in Occam's Razor, holds that simpler models generalize better. For pattern matching, this is true. For reasoning, the relationship reverses: the more complex the domain, the more structural learning pays off. Simple problems offer little transferable structure to extract. Complex ones are rich with it.

The hardest, messiest, most irreducibly complex problems are precisely where AI reasoning is improving fastest. The ones organizations have labeled "too hard for AI" or sheltered behind as a reason for measured caution are the ones most exposed to rapid gains. The complexity is not a shield. It is an accelerant.

And this is where something has shifted that most organizations have not yet absorbed. For years, the binding constraint on AI's impact was technical capability. The models were not good enough, not reliable enough, not versatile enough. That constraint is loosening rapidly. The binding constraint now, for most organizations, is the capacity to adapt. AI progress is increasingly gated not by what the technology can do but by how quickly institutions can absorb what it already does.

Posture

If you are leading an organization through this, the question is not whether the step functions will continue. The question is what posture you hold when they arrive.

The first posture is ahead. These organizations built internal AI capability before the latest jump forced the question. They ran real workloads against real models, learned where the tools failed, and developed institutional judgment about what to trust and what to verify. Each step function extends their lead because they have already paid the cost of learning. They are not predicting the next jump. They are running experiments that will tell them what the next jump means for their business within days of its arrival, not months.

The second is behind. Each step function arrives as a shock, triggering a scramble to understand its implications and mount a response. By the time that response is mobilized, the ground has shifted again. Behind is not a position you choose. It is a position you drift into by treating AI as a project rather than a posture. It produces turbulence, zigging when you should be zagging, made worse by the fact that most organizations are structurally slower than the rate of change requires.

The third is structurally resistant, and it is not a harder version of being behind. It is a different problem entirely. Regulated industries, large enterprises with deep governance, public sector institutions: these organizations are not slow by accident. They are slow by design. Their regulatory frameworks, risk cultures, approval processes, and procurement cycles were built to prevent rapid change, because in the world they were designed for, rapid change was synonymous with risk. The organizations that are behind face a velocity problem. They need to move faster. The structurally resistant face a physics problem. The institutional forces actively oppose the motion they need. You cannot accelerate a system whose design function is to resist acceleration. You have to change what the system is optimizing for. And the longer that takes, the wider the gap grows between what the technology makes possible and what the institution can act on.

Here is where everything in this piece converges. Occam's inversion tells us that the most complex domains are the ones where AI reasoning will advance most dramatically. The structurally resistant organizations tend to operate in those exact domains: healthcare, financial services, legal, critical infrastructure, government. The institutions least equipped to absorb discontinuous change are the ones most exposed to it. And the very complexity they have used to justify caution is the reason the step functions heading their way will be among the largest.

Were we wrong or just early? Yes

Some will ask whether they got it wrong. Whether the strategies and investments set in the last cycle were mistakes. Yes and no. Some of those bets are already stale, because nobody can predict what a step function looks like before it arrives. But the bets that hold are the ones that were never about a specific capability in the first place: building the muscle to evaluate new tools fast, keeping decision loops short, developing people who can distinguish signal from noise at the frontier. A strategy built on proximity to the frontier holds across discontinuities. A strategy built on a specific snapshot of what AI can do has a shelf life measured in months.

The instinct in large organizations facing this uncertainty is to standardize: pick an approach, codify it, enforce it, call it scale. In a world of continuous change, that works. In a world of step functions, it is scaling the wrong thing. The pilot that worked last quarter may be irrelevant after the next jump. The vendor you standardized on may be leapfrogged by a model that did not exist when you signed the contract. Scaling a solution across discontinuities produces organizations that have AI everywhere and insight nowhere. The alternative is to scale the capacity for judgment instead: the ability to evaluate, adopt, and abandon tools at the speed the technology demands.

The organizations that navigate this well will not be the ones that predicted the right curve. They will be the ones that built for a world where the ground shifts without warning, repeatedly, and each shift carries more consequence than the last. In a step-function world, the advantage belongs to those who can stand on new ground the fastest.

Dead Reckoning

2026-03-17T00:00:00Z

Before GPS, before satellite navigation, before any fixed reference point you could trust, sailors estimated their position the hard way. You logged your starting point, tracked your heading, measured your speed, and noted every change. No single observation told you where you were. The position emerged from the accumulated record: each small entry individually uncertain, collectively reliable enough to cross an ocean. You committed to a best estimate, acted on it, and updated when new information arrived. The method had a name: dead reckoning.
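For the curious, the method fits in a few lines. This is a minimal sketch, not a navigation tool: it assumes a flat sea measured in nautical miles, ignores current and leeway, and uses made-up names (LogEntry, dead_reckon) and made-up log values. Each entry is a heading, a speed, and a duration; the position is nothing more than the running sum of the entries.

```python
import math
from dataclasses import dataclass


@dataclass
class LogEntry:
    heading_deg: float  # compass heading, degrees clockwise from north
    speed_knots: float  # speed through the water, nautical miles per hour
    hours: float        # time spent on this heading at this speed


def dead_reckon(start, log):
    """Estimate position as (east, north) in nautical miles from a start point.

    Flat-plane approximation (an assumption of this sketch): reasonable
    over short distances, not for an ocean crossing.
    """
    east, north = start
    for entry in log:
        distance = entry.speed_knots * entry.hours
        heading = math.radians(entry.heading_deg)
        east += distance * math.sin(heading)
        north += distance * math.cos(heading)
    return east, north


# Two hours due north at 10 knots, then one hour due east at 5 knots.
log = [LogEntry(0, 10, 2), LogEntry(90, 5, 1)]
position = dead_reckon((0.0, 0.0), log)
print(f"east={position[0]:.1f} nm, north={position[1]:.1f} nm")
```

No single entry locates the ship. The estimate exists only because every entry was logged, and a small error in any one of them carries forward into every later position, which is why the navigator corrects the record whenever a better fix becomes available.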

It worked not because it was precise, but because it was disciplined. The navigator who waited for certainty before committing to a position was the navigator who never arrived. The one who tracked the heading, logged the small signals, and revised continuously was the one who arrived.

This is how transformative capabilities arrive: not a single moment you can point to, but hundreds of small updates and quiet capability jumps that most people process as noise. A model that handles a longer context window. A benchmark quietly crossed. A coding assistant that stops feeling like a toy and starts feeling like a colleague. None of these moments feel like history. All of them are.

The debate about what to call where all of this is heading is real and worthwhile, and it won't be settled anytime soon, which is fine, because it never really is. Superintelligence. AGI. Transformative AI. Pick your term: each has advocates, each has detractors, and none has a locked definition. Did we ever decide what "the cloud" was? Do we agree on what a photograph is now that every phone applies computational processing before the shutter sound even fires, and AI afterwards? Definitions are hard to pin down at the best of times, and significantly harder when the thing being defined keeps changing. That doesn't make the exercise pointless. Taking a swing at a definition is useful: it forces precision, surfaces assumptions, and gives you something to update as the evidence changes.

But a settled definition was never the prerequisite for navigation. The prerequisite is reading the small signals honestly, logging them, and asking what heading they imply.

Right now, three separate trails of breadcrumbs are converging. Here's what they are, where they've come from, and why their intersection is worth tracking.

The fuel is changing

For most of AI's recent history, the binding constraint on what models could do was data: you needed more of it, better labeled, more carefully curated. Compute mattered, but data was the ceiling. That relationship is inverting.

One concrete signal of this: a research project called NanoGPT Slowrun runs a deliberately inverted benchmark. The data budget is fixed (100 million tokens, no more) and compute is unlimited. Contributors compete to extract the most from that fixed supply, with no time or hardware constraints. Within days of launch, community contributions had pushed effective data efficiency to 5.5 times the baseline. The researchers estimate 10 times is reachable soon, and 100 times within a year. Whether those numbers land precisely is secondary. What matters is the revealed dynamic: flip the constraint, and progress accelerates in a single direction.

This is not an isolated project. Researchers across the field are increasingly designing for a world of practically infinite compute and scarce data, building learning algorithms that extract far more signal from far fewer examples. The results are arriving faster than most forecasts implied, compounding through open community contributions in ways that traditional lab-bound research doesn't. Targets that seemed distant at the start of a week look reachable by its end.

This matters because once compute becomes the primary input rather than data, the ceiling lifts in a qualitatively different way. Data is bounded by what humans have already done. Compute compounds on itself, through hardware improvements, capital investment, and algorithmic gains that stack on each generation of both. Compute has the far higher ceiling, and only compute gets reliably cheaper each year.

Generality is a side effect, not a design goal

The second trail follows directly from the first. When you build models in compute-constrained rather than data-constrained regimes, something unexpected happens: generality increasingly falls out as a consequence of scale, not as something anyone specified.

The clearest recent illustration comes from Standard Intelligence's FDM-1 project. Researchers trained the model on eleven million hours of internet video: screen recordings, coding streams, design sessions, the accumulated digital behavior of people working at computers. The goal was computer use. The result was something harder to categorize. The model learned to navigate complex software, to execute long sequences of actions across multi-hour workflows, to do basic CAD modeling and security testing. So far, that's impressive but bounded. Then the team fine-tuned it on less than one hour of driving data, and, in a constrained demonstration under supervision, it drove a car around a block in San Francisco.

Nobody designed a self-driving feature. Nobody included driving in the training objective. A model trained on screen recordings crossed a domain boundary that wasn't in the specification and transferred. That's not a benchmark result. That's a different kind of thing.

This is the pattern that keeps recurring as models move from data-constrained to compute-constrained development. Capability trained in one domain generalizes to another without anyone planning for it. The models aren't being designed to be general. They're becoming general because generality is what falls out when you scale compute against a sufficiently broad input.

General models are proving useful in the real world

The third trail is the one most people are already standing on, even if they don't think of it that way. General models are proving useful not in research environments, not on benchmarks, but in the actual work of people who have real things to get done.

Claude Code is the most visible current example: not a coding assistant that autocompletes lines, but a system that takes a goal and pursues it autonomously across a codebase, making decisions, running tests, correcting itself. General computer use is following the same trajectory, with models that can navigate interfaces, execute workflows, and complete tasks that until very recently required a human to be present at a keyboard. These aren't demos. They're in production, handling real work, and the rate of improvement is not slowing down.

The significance of this third trail is easy to underestimate. Research breadcrumbs are interesting; useful breadcrumbs compound. Once a general capability proves itself in real work, it attracts investment, use, feedback, and iteration. The loop tightens. What's useful today becomes the baseline for what's expected tomorrow, and the gap between expectation and capability closes faster than the previous cycle.

Three lines, one direction

What makes the current moment different from earlier periods of AI progress is not any single one of these trails. It's that they're converging.

The resource constraint shifting to compute means the fuel supply is no longer the binding limit. The emergence of generality from compute-bound training means the models being built are increasingly capable across domains rather than within them. And the proof of usefulness in real work means the feedback loop has closed: these capabilities are now being stress-tested against actual complexity, not synthetic benchmarks, and they're holding up.

When three independent lines of progress reinforce each other, the pace of the combined movement is not additive. Each trail makes the others more productive. More efficient training produces more general models, faster. More general models mean broader usefulness. Broader usefulness means sharper feedback about where the limits actually are, which feeds back into training. The breadcrumbs are coming faster, and they're no longer running in parallel.

None of this guarantees smooth progress. Real bottlenecks remain: in alignment, in energy infrastructure, in hardware supply chains, in the governance frameworks that don't yet exist, and in the organizational absorption rates of companies and institutions that are still figuring out what to do with the capabilities already in their hands. These are not trivial constraints. But none of them currently appear to be tightening faster than the underlying capabilities are improving. The heading holds, even if the speed is uncertain.

Which brings us back to the question of what to call the destination. If you define superintelligence as systems capable of outperforming humans across most cognitively valuable tasks, not a perfect definition but a working one, then the three trails described here are pointing toward it, not arriving yet, but pointing unmistakably. The distance is genuinely unknown. But dead reckoning doesn't require knowing the distance. It requires honest position estimates, regular updates, and the discipline to act on your best current read rather than waiting for the landmark that may not appear until you're already there.

Most people are experiencing this as a series of product updates. An assistant that writes better than it did six months ago. A tool that can hold a research thread across an afternoon. The improvements feel incremental because they arrive incrementally, which is exactly how structural shifts always feel from inside them. The sailors who first mastered dead reckoning didn't experience a revolution in navigation. They just kept logging the same small signals, updating their position, and arriving where they intended to go, until one day the method had quietly crossed an ocean, and the world was a different size than it used to be.

The lines are no longer parallel. They're converging. The discipline is the same as it always was: log the signals, track the heading, update your estimate. Dead reckoning assumes a navigator distinct from the vessel. For now, we are still the ones doing the reckoning. The navigator who waits for certainty doesn't avoid risk. He compounds it.

Seasons of Exceptions

2026-03-10T00:00:00Z

If you were close to cloud technology fifteen years ago, you lived through several seasons of exceptions.

The cloud is less sustainable. The cloud has worse total cost of ownership. The cloud is less secure. The cloud cannot scale. Each of these was presented not as an opinion but as a finding, backed by papers and panels and the kind of certainty that comes from measuring the present and projecting it forward in a straight line.

And each of them had basis. They were rooted in decades of hard-won understanding about how infrastructure worked: how it was priced, how it was deployed, how it was secured, how it scaled. That understanding was not wrong. It was the product of experience, and the people who held it were acting in good faith. But it assumed constraints that belonged to the old model. On-premises infrastructure had physical boundaries, capacity ceilings, procurement cycles, and security perimeters that shaped how every CTO and CIO thought about what was possible. When cloud computing arrived, the evaluation happened through that existing lens. And through that lens, the exceptions looked like facts.

In the best cases, these were blind spots. We did not think about what elastic pricing would do to total cost of ownership once workloads were designed for it rather than migrated to it. We did not anticipate that cloud infrastructure would eventually enable security investments that no single organization could justify alone. In the worst cases, they were something more stubborn: things we knew for certain that turned out not to be true.

The exceptions were seasonal. They arrived with conviction, held the room for a while, and then quietly fell away as the technology matured, as operational models adapted, as the sheer weight of investment and iteration closed the gaps. Efficiency improved. Cost models sharpened. Security postures in the cloud surpassed what most organizations could achieve on their own. Scale became normal.

Today, nobody serious argues that the cloud is fundamentally less secure or less sustainable or worse on total cost of ownership. You can find pockets where it is true. Specific workloads, specific configurations, specific contexts where on-premises might be workable. But those are third and fourth standard deviation cases. The exceptions prove the rule precisely because they exist only at the edges.

Same pattern, different lens

The organizations that succeeded were not the ones that waited for every exception to be fully resolved. They were the ones that recognized the trajectory early enough to build for where it was going. The skeptical had defensible positions. The cynical had valid concerns. And the teams that moved anyway, not recklessly but with conviction, captured value that the wait-and-see crowd never recovered. By the time the exceptions were settled, the advantage had already been allocated.

We are now in that same moment with AI.

The current season of exceptions sounds familiar, but the exceptions come from a different place. With the cloud, objections were about infrastructure: cost, security, scale, sustainability. They were rooted in how we thought about machines and systems and operations. With AI, the objections are about intelligence itself. AI cannot be creative. AI cannot exercise judgment. AI cannot be empathetic. AI cannot be trusted. These are not claims about infrastructure. They are claims about capabilities we consider to be fundamentally, perhaps uniquely, human.

I am not someone who finds it helpful to anthropomorphize technology, particularly in an enterprise context. But I do not think you can understand what is happening with AI adoption unless you recognize that the lens through which we evaluate it is increasingly the same lens through which we evaluate each other. When someone says AI cannot be creative, they are not making a technical claim about token prediction. They are drawing a bright line around what they believe is innately human, and declaring that no machine will cross it.

This is a different kind of exception than "the cloud is too expensive." It runs deeper. It feels more permanent. And that is precisely what makes it more dangerous to build strategy around.

Bright lines are fading

These bright lines follow the same pattern as the infrastructure exceptions, just from a more personal starting point. They are rooted in a lifetime of experience with the only intelligence we have ever known: our own and each other's. That understanding is not wrong. But it assumes constraints that belong to a particular kind of intelligence, and those constraints do not necessarily transfer to a new one.

The creativity objection assumes that recombination at scale cannot produce novelty. The judgment objection assumes that pattern recognition across thousands of cases cannot approximate contextual reasoning. The empathy objection assumes that a system which does not feel cannot meaningfully help someone who does. Each of these is a thing we know for sure. And the cost of being wrong is not that you tried something that failed. It is that you waited while others did not.

We are already watching the exceptions ease. AI systems are producing creative outputs that professional creatives find genuinely surprising, not because the AI understands what it is doing, but because the combinatorial space it explores is larger than what any individual could traverse alone. Businesses are using AI for judgment calls in underwriting, in triage, in strategic scenario planning, not because the AI replaces human judgment but because it introduces a consistency and breadth that human judgment alone cannot sustain at scale. Mental health support is the fastest growing use case for AI assistants in the consumer world, which tells you something about empathy that is worth sitting with: millions of people are choosing to talk to AI about their most personal struggles, and reporting that they feel heard. You can debate whether that is empathy. You cannot ignore that millions of people are finding value in it.

The trust story

The remaining exception is probably trust. But even that is shifting, and I want to tell you a story about how.

A year ago, I attended an industry event. The AI keynote was whimsical. It was a look at the funny robot falling over, a curated reel of AI getting things charmingly wrong. The audience laughed. The implicit message was clear: this is interesting, this is entertaining, this is not yet something you would rely on for anything that matters.

This year, the same organization presented analysis produced by Claude. Not with an apology. With authority. The framing was not "here is what an AI said, take it with a grain of salt." The framing was "this analysis is worth reading because it came from AI." The provenance was the credential, not the caveat.

Now, you can argue that the first presentation was late. That it underplayed value that was already obvious, that it chose comedy when the audience needed clarity. And you can argue that the second presentation was early. That the CEO was too bullish, that AI still makes factual errors, that presenting AI-generated analysis with authority carries risk. Both of those criticisms have merit. But even at the extremes of those two positions, the change in trajectory is unmistakable. In twelve months, the default posture moved from amusement to reliance. From "isn't that cute" to "we should pay attention to this."

That shift is not contained to your organization. Your clients' defaults are moving too. So are your competitors'. The organizations that move early on AI are not just improving their own operations. They are becoming the place where their clients' attention lands first. The organizations that wait are watching that attention move elsewhere.

Where trust goes, value follows

And where trust shifts, value follows. It always has. And as it did with cloud, that behavioral trust will formalize — in policy, in governance, in standards — long before every objection is resolved.

Trust in cloud security shifted, and the organizations that had already built cloud-native capabilities captured disproportionate value. Trust in mobile commerce shifted, and the companies that had already invested in mobile-first experiences pulled ahead. In both cases, the winners were not the ones who were right about every exception. They were the ones who were right about the direction, and who moved before the exceptions were fully settled.

The same window is open now. The bright lines that feel most permanent — creativity, judgment, empathy, trust — are easing on the same timeline the infrastructure objections did. The fact that they feel more personal, more fundamental, more tied to what it means to be human, does not make them more durable. If anything, it makes them harder to see clearly, because we are not evaluating a technology. We are evaluating a challenge to our assumptions about ourselves.

The seasons of exceptions are ending. Not all at once, and not uniformly. There will be pockets where the exceptions hold, specific domains, specific tasks, specific contexts where AI falls short in ways that matter. Those pockets will exist for years. They will be cited by people who need them to be true.

But the trajectory is clear. And the question is the same one it was with cloud, with mobile, with every platform shift that came before: are you building for the world where the exceptions still hold, or for the world where they no longer do? The skeptics were right about the details last time too. The shift happened anyway.

The Unbearable Burden of Being Right

2026-03-03T00:00:00Z

I tried to use the most advanced AI capability of 2026 on the most advanced device I own, and got told to go use my laptop.

Perplexity announced "Computer Use" this week, joining Anthropic, OpenClaw, Standard Intelligence, and others in a race to build AI agents that can operate your computer on your behalf. Not just answer questions or generate text, but actually click buttons, fill out forms, navigate between applications, move files around. The kind of thing that turns AI from a conversational partner into a coworker who can take things off your plate.

So naturally I pulled it up on my phone. And there it was, floating over a meadow rendered in the style of a screensaver from 2004: "Coming to mobile soon. Currently only available on desktop."

I laughed. Then I thought about it for a while.

Coming Soon

AI already does useful things on your phone. It transcribes your meetings, summarizes your emails, generates images, answers questions. What it cannot do is act autonomously across applications on your behalf. It cannot see your screen, decide what to click, open a different app, pull information from one place and put it in another. That kind of cross-application agency is what computer use is, and it is the capability that only works on the desktop.

The phone is the computer that won. It is the device that most people on earth use for most of their computing, most of the time. It is the thing the entire technology industry spent the last fifteen years optimizing for, building around, designing toward. And it is now, by architecture and by intention, the device least prepared for autonomous AI.

This is not a failure. That is what makes it interesting.

The reason computer use agents work on the desktop and not on your phone is the sandbox. Mobile operating systems were designed from the ground up to keep applications isolated from each other. Each app lives in its own container, unable to see what other apps are doing, unable to reach into the file system or manipulate another application's interface. This is why your phone does not get viruses the way your laptop does, why a rogue app cannot reach into your banking app, and why billions of people trust a pocket-sized device with their entire lives.

These were engineering and product choices that created security and trust at a scale no computing platform had achieved before. And they are the reason that autonomous AI agents cannot operate where most computing actually happens.

On a desktop, an AI agent can see the screen, interpret what is there, decide what to click, and execute that action across whatever application is relevant. This is possible because the operating system was never designed to prevent it. Desktop operating systems grew up in an era where interoperability between applications was a feature, not a threat. The boundaries are porous. The file system is shared. One application can, for better and worse, reach into the world of another.

Mobile took the opposite lesson from the same history. The chaos and insecurity of the desktop era was the problem that mobile set out to solve. And it solved it. Completely.

So here we are. The platform that is least secure, least controlled, and least modern is the one where the future works. The platform that represents a decade of disciplined, correct, compounding decisions is the one showing you a landing page that says "coming soon."

Mechanisms

This is not just a technology story. It is a structural pattern, and once you see it, you find it everywhere.

Here is the mechanism. Every good decision, if it works, gets built upon. The next decision assumes the first one will persist. The decision after that assumes both will. Over time, what began as a series of individual choices becomes architecture. And architecture has a property that individual decisions do not: it becomes load-bearing. The sandbox was a design choice. Then it became an API surface that developers built against. Then a revenue model that depends on the app store as the single point of entry. Each layer was added because the previous layer was sound. Each layer made the previous layer harder to revisit.

This is how local optimization creates global inflexibility. Each decision, viewed on its own, solves the problem in front of it and makes the immediate system better. The accumulation of those decisions, each one correct, produces a global architecture that is optimized for a context that may no longer apply. The system is not broken. It is locally optimal everywhere and globally stuck.

The economics make this worse. The sandbox is not just a security architecture. It is an economic architecture. App stores generate revenue. The permission model creates liability protection. When you ask why a phone maker does not simply open the sandbox for AI agents, part of the answer is security, but part of the answer is that every opened boundary is a revenue surface that becomes contestable and a liability perimeter that becomes ambiguous. The same is true inside any organization. Governance is never just about quality. It is about who approves, who bears risk, and who is exposed when something goes wrong. Incentives and architecture become mutually reinforcing, and the combination is what gives the structure its weight.

Once enough people and systems and economics depend on a structure, it develops its own center of mass. The engineers who built it have careers invested in maintaining it. The developers who designed around its constraints have codebases that assume it. The finance teams who model revenue have projections that depend on it. None of these people are being obstructionist. They are being rational. The structure rewards them for preserving it and penalizes them for questioning it. And by the time the external context shifts enough that the architecture needs to evolve, the internal context has organized itself entirely around the architecture as it stands.

This is the trap. Not incompetence, not resistance to innovation, not a failure of vision. The trap is that good decisions, made well and built upon faithfully, become the geology of an organization. Governance accumulates like sediment, each layer deposited by a rational process, and over time the strata harden into something that no strategy memo can reorganize.

And there is a timing problem that makes this particular moment especially difficult. Institutions compound safety over decades. AI compounds capability in months. The structures that organizations built to manage risk were designed for a world where the external environment moved at roughly the same pace as the internal one. That assumption no longer holds, and the mismatch is what makes the burden feel unbearable. The architecture is not just in the way. It is holding up the roof while the ground underneath it shifts. And that is why people are so reluctant to touch it. Not because they are lazy or unimaginative, but because they rightly fear what happens if the thing they depend on collapses while they are still inside it.

Geology

I have spent the last eighteen months focused on an AI transformation inside a large organization. The kind of place that has built processes and governance structures and risk frameworks over decades, each one in response to a real need, each one making the organization more resilient, more consistent, more trustworthy. Clients trust the organization in part because those systems exist. And every day you feel the weight of systems that are doing exactly what they were designed to do, at a pace that was set long before the world started moving this fast. You cannot fix something that is not broken. You can only recognize that the context around it has changed enough that what it produces is no longer what you need.

The instinct when you recognize this pattern is to argue that the old decisions were wrong. That the sandbox was a mistake, that the governance was too heavy, that the risk framework was too conservative. This is tempting because it gives you something to blame and something to dismantle. It is also wrong, and the people who built those systems know it is wrong, which is why they resist when you try it.

The harder and more honest path is to hold two things at once. The decisions were right. And they are now in the way. The tension between them is not something you can resolve by picking a side. It is something to lead through, together. And leading through it looks less like conviction and more like humility. It means telling the people who built the current system that they were right, and meaning it, while also making clear that being right then does not settle the question of what is right now. It also means accepting that the things you are building today will one day be someone else's geology. The new platforms, the new frameworks, the new ways of working — they will accumulate and harden and eventually resist whatever comes after them. This is not a problem you solve once. It is a condition you learn to operate inside.

So what do you do?

Two Speeds

You do not tear down the architecture. And you do not pretend it is not there. What I have seen work, in the technology version and the organizational version of this problem, is building parallel structures that operate under different rules while the existing architecture continues to do what it does.

The phone makers will not remove the sandbox. What they will likely do is create a controlled layer within it where agents can operate with explicit permissions, a space where cross-application access is possible without dismantling the security model that everything else depends on. An agent zone inside the compliance perimeter. The sandbox stays. The new capability gets a surface to run on. Both coexist, governed differently.

The same principle applies inside organizations. You do not replace the governance structure that took decades to build. You create a parallel track where AI-native work can move at its own speed, with its own risk model, inside boundaries that are defined clearly enough that the existing structure does not feel threatened. Two speeds, running side by side, with clear rules about what crosses between them. Over time, the boundary shifts as confidence builds and evidence accumulates.

This is not easy. Parallel structures create their own tensions. People in the existing structure feel bypassed. People in the new structure feel constrained. The leadership work is not in designing the parallel system. It is in maintaining the legitimacy of both long enough for them to converge, which they eventually do, but only if neither side has been delegitimized along the way.

The screenshot on my phone this week was a small thing. A landing page for a feature that will probably work on mobile within a few months. But I keep coming back to it because it captures something that I think matters about this moment.

The future does not get blocked by your mistakes. It gets blocked by your achievements.

The Inversion

2026-02-24T00:00:00Z

I.

If you walk through a modern office after sunset, the light reflecting off the glass walls is almost always green. Not the green of a terminal or a trading floor, but the muted grid of a spreadsheet. Formulas reference other formulas across dozens of tabs. Cells contain logic that took years to accumulate and minutes to break.

These workbooks are the quiet engines of the modern economy. In regulated industries, they are not documents but systems. They calculate tax positions, model risk exposure, allocate capital, and justify decisions that carry material consequences.

For a long time, artificial intelligence could not touch them. Not because the models lacked capability, but because a spreadsheet is not a document. It is a network of dependencies. Meaning lives in the relationships between cells, not in any sequence of words. Feed it to a model as text and the model sees numbers. The thousands of logical gates that connect input to output remain dark matter: present, consequential, invisible.

At PwC, the shift came when we stopped asking models to read spreadsheets and started teaching them to navigate structure. Decompose the workbook into regions. Map the dependencies between formulas. Let the system traverse the file as a graph rather than a page. It no longer needed to hold everything at once. It needed only to find the logic relevant to the question being asked.
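To make the idea of traversing a workbook as a graph concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: the toy workbook, the cell-reference regex, and the function names `build_dependencies` and `relevant_cells` are all assumptions made for the example, and it ignores ranges, multiple sheets, and named references.

```python
import re
from collections import deque

# Hypothetical toy workbook: cell address -> literal value or formula string.
WORKBOOK = {
    "A1": 100,
    "A2": 250,
    "B1": "=A1*0.2",
    "B2": "=A2*0.2",
    "C1": "=B1+B2",
    "D1": "=A1+A2",
    "E1": "=C1/D1",
    "F1": 42,  # not referenced by any formula
}

# Simplified pattern for single-sheet references such as A1 or BC12.
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")


def build_dependencies(workbook):
    """Map each cell to the set of cells its formula refers to."""
    deps = {}
    for cell, content in workbook.items():
        if isinstance(content, str) and content.startswith("="):
            deps[cell] = set(CELL_REF.findall(content))
        else:
            deps[cell] = set()  # literals depend on nothing
    return deps


def relevant_cells(deps, target):
    """Walk the graph breadth-first from target, collecting every cell it relies on."""
    seen = {target}
    queue = deque([target])
    while queue:
        cell = queue.popleft()
        for ref in deps.get(cell, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


deps = build_dependencies(WORKBOOK)
print(sorted(relevant_cells(deps, "C1")))
```

A question about `C1` touches only the five cells that feed it, so the rest of the workbook never has to be loaded into view. That is the sense in which the system needs to find the relevant logic rather than hold everything at once.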

What changed was not the spreadsheet but the representation. What had been a static artifact became something that could be queried. Logic that had been locked inside manual processes could now be inspected, tested, and reasoned about by a machine. The first inversion is the oldest: making hidden logic visible.

The data had always been there. We just couldn't see it until we changed how we looked.

II.

In the early phase of the current transition, the craft was in the instruction. Teams wrote prompts, assembled agent specifications, versioned them, debated their phrasing. The quality of the output depended on the precision of the input, so precision became the work.

This was not misguided. At that stage, small changes in wording produced material differences in outcome. The prompt functioned as code. It was the interface between what a person intended and what a machine produced.

Then the models improved. They became better at inferring context, resolving ambiguity, filling in what was left unsaid. The careful scaffolding of the prompt began to matter less. What once required elaborate instruction increasingly required only a clear statement of purpose. The mechanics of the interface receded from view, the way the details of network protocols disappeared once the web became usable without understanding how packets moved.

The instruction layer is compressing into the model itself. Prompts and code, once treated as durable artifacts, are increasingly the exhaust of the process, generated to fulfill a goal, discarded once it is achieved. The second inversion is quieter: the compression of instruction into capability.

The things we labored over most are becoming the things that matter least. The craft was real. The layer was temporary.

III.

There was a time when the bottleneck was access to code. Then it was access to infrastructure. Both constraints eased. What remains is upstream of either.

If machines can traverse our data and generate the instructions to act upon it, the limiting factor is no longer execution. It is knowing what to execute and why.

We tend to assume that intent is set at the start of a project. Requirements are gathered, specifications are written, and building begins. In practice, intent is not a starting condition. It is an emergent property. It surfaces through friction: through the encounter between what was specified and what was actually needed. Edge cases appear. Assumptions that felt solid dissolve on contact with use. The act of building has always been inseparable from the act of discovering what was really meant.

When building is slow, that discovery hides inside the cost of delivery. When building is fast, the discovery is the work. The third inversion is the most uncomfortable: the exposure of intent.

What we have treated as secondary artifacts, prompt iterations, design rationale, the trail of revised specifications, are records of that discovery. They document the moment an assumption became a question and a question became a constraint. They are evolutionary maps of how thinking actually developed, as opposed to how we later claimed it did. Without them, each project begins from the same starting assumptions. With them, the next project begins where the last one's understanding ended.

For decades, organizations rewarded execution because execution was scarce. As that scarcity ends, what remains scarce is the clarity to describe it.

The spreadsheet era taught us to preserve logic in cells. The software era taught us to preserve logic in code. This era may require us to preserve the logic of our own thinking: how objectives evolved, where assumptions broke, and why decisions were made.

That is not documentation. It is infrastructure for judgment.

It Won't Fail Because of Me

2026-02-17T00:00:00Z

AI success is as much about attitude as algorithms

When leaders inventory what this kind of change demands, they land on reasonable things like infrastructure, governance, skills, workflows, and measurement. All of those are real. But there is one component that almost everyone underestimates, partly because it sounds simple and partly because it feels like something you can just decide to have. That component is attitude.

Attitude is the most counterintuitive bottleneck in organizational change. It feels easy to generate. A good town hall, a compelling vision, a few early wins, and you assume the attitude follows. But attitude at scale is not enthusiasm or energy in a room. It is the durable, shared orientation that determines how thousands of people make small decisions when nobody is watching, including whether to test assumptions, document handoffs, and surface uncertainty early, especially when the work is unglamorous and the easy path is to cut a corner that nobody will notice until much later. With AI, the outputs are fast and plausible, which means the cost of a small corner cut often shows up later and somewhere else. That kind of attitude is one of the hardest things an organization can synthesize, and without it, every other investment in change underperforms.

Why attitude is harder than infrastructure

Organizations are reasonably good at buying things and building things. If the problem is infrastructure, you can fund it. If the problem is skills, you can train for them. If the problem is governance, you can design it. These are hard problems, but they are legible problems. You can put them on a roadmap, assign them to a team, and measure progress.

Attitude does not work that way. You cannot purchase it from a vendor, mandate it in a memo, or install it in a quarter. It is emergent. It arises from the accumulation of small signals: what gets celebrated, what gets ignored, what gets punished, who gets promoted, how leaders respond to bad news, and whether it is safe to say "I do not know" in a room full of people who are supposed to know. It compounds in both directions. Good attitude makes every other investment work better. Poor attitude makes every other investment work worse.

And here is the part that makes it genuinely difficult: attitude in an organization starts with leadership behavior, not leadership messaging. Most transformation programs are designed to change everyone else's behavior. But the signals that shape attitude flow from the top, which means the first people who have to change are the ones sponsoring the change. That is an uncomfortable inversion, and it is one reason attitude problems persist even in organizations that are doing many other things right.

Social acceleration is the mechanism, and it is irreducibly collective

There is a way to make this concrete rather than abstract. Think of it as two kinds of acceleration.

Technical acceleration is the pace of change outside your organization. It is the steady improvement of models, the expanding ecosystem, and the falling costs. This acceleration is real and widely accessible. It is also the same acceleration your competitors are seeing, which means it is not a strategy by itself.

Social acceleration is the pace of change inside your organization. It is how quickly your teams can learn new patterns, share them, create standards, build trust, make decisions, and reshape workflows without falling into chaos or paralysis.

Here is the critical point: social acceleration is irreducibly collective. It cannot be driven by a single team, a single leader, or a single initiative. It requires that people across the organization orient toward the same kind of rigor, the same kind of ownership, and the same willingness to surface problems early and treat downstream teams with respect. One team moving fast while three teams wait for permission is not acceleration. It is friction with a good story attached.

When AI initiatives stall, the limiting factor is usually social acceleration, not technical acceleration. The outside world is accelerating models; the inside world has to accelerate coordination. When social acceleration lags, the failure modes are predictable, and they often look technical on the surface while actually being about operating model and incentives.

Innovation by announcement. Leadership declares bold AI goals without changing the operating model to support them. The press release writes itself; the execution never materializes. Teams learn quickly that the safest move is to build impressive demos that satisfy the narrative without bearing the weight of real workflows.

Consensus paralysis. In organizations built on partnership and consensus, the question "should we do this?" circulates through so many stakeholders that by the time alignment is reached, the opportunity has moved. AI rewards fast learning cycles; consensus culture rewards slow certainty. The gap between these two clocks is where ambition goes to die.

Governance theater. Risk and compliance functions, under genuine pressure to protect the organization, layer review processes that were designed for a different era of technology. The result is not safety but the appearance of safety, with three-month approval cycles for a system that will behave differently after the next model update anyway. The point is not ceremony, it is feedback. Real governance is continuous, adaptive, and close to the work. Theater is periodic, rigid, and far from it.

Prototype graveyard. Teams can build prototypes faster than the organization can absorb them. Without clear paths from experiment to production, including who owns ongoing reliability, who pays for maintenance, and who is accountable when it breaks, brilliant prototypes accumulate in a graveyard of "we tried that."

Each of these is a symptom of insufficient social acceleration, and none of them yield to technology alone.

The good news is that social acceleration is built through specific mechanisms. In practice, it shows up as a few repeatable moves.

The best organizations build permission structures that make it safe to experiment and safe to raise concerns early, because hidden problems compound faster than visible ones.

They establish shared language around quality, risk, and value, so that the builder, the risk lead, and the business owner are not shipping to three different definitions of "ready."

They shorten decision cycles by clarifying ownership, so that speed comes from accountable decisions rather than committee consensus.

They institutionalize learning loops that include honest accounts of what broke and why, because an organization that only circulates success stories is an organization that repeats its failures.

They instrument trust with evidence through evaluation, monitoring, and auditability, so that confidence comes from data rather than enthusiasm.

This is the kind of work that looks like culture from a distance but behaves like engineering when you do it properly.

The attitude that makes it real

So what does the right attitude actually look like when it is operating at scale?

NASA offers the clearest example. It built trust through disciplined review rituals and explicit go/no-go gates. Risks are logged, reviewed, and assigned an owner before launch, and handoffs are designed so that issues surface early. The posture underneath it was simple: "it won't fail because of me."

The phrase is not about perfectionism. It says my work is part of a larger system, and I am going to do my part with enough care that the system has a chance to succeed. I will surface risks early, test assumptions, leave clear handoffs, and treat downstream teams and users with respect. If something goes wrong, it will not be because I cut corners, hid uncertainty, or left the next person guessing.

In AI work, this translates to a bias toward evaluation, clarity on human accountability, and production-grade handoffs.

What makes this phrase powerful is not that it is aspirational. It is that it is practical. It translates directly into the behaviors that drive social acceleration. When enough people hold this posture, the organization changes. Not because someone mandated the change, but because the accumulated weight of thousands of small, careful decisions starts to compound.

In a high-trust team, people do not protect themselves by staying vague. They protect the mission by getting specific. They bring uncertainty into the open early, and they treat that as professionalism rather than weakness.

It becomes normal to ask: what could go wrong, and how will we know when it does.

It becomes normal to ask: what is the human role in this workflow, and are we setting them up for success or for clean-up duty.

It becomes normal to ask: what are we measuring, and what evidence would increase our confidence over time.

And it becomes normal to ask: are we building a capability, or are we building a dependency.

Over time, those questions turn into habits, and the habits become an operating system for how the team thinks about drift, ownership, and long-term reliability. The attitude is not heavy. It is liberating, because it allows teams to move quickly without relying on luck. It replaces anxious speed with confident speed.

The advantage is not the model

The world will keep delivering better models, and that is a gift. But it is also a trap if it causes organizations to delay building the internal capabilities that turn models into outcomes. The teams that win will not be the ones who waited for the perfect model. They will be the ones who built the collective attitude that makes imperfect tools useful and improving tools transformative.

Attitude is the thing many leaders assume they can generate on demand and few organizations actually sustain. It is the part of systemic change that has no line item, no vendor, no implementation timeline. And it is, more often than not, the part that determines whether everything else works.

"It won't fail because of me" is not only a standard. It is a gift you give to your team, your users, and your future self.

Algorithms move fast, but attitudes decide what sticks.

The Coffee Walk

2026-02-10T00:00:00Z

Last month I walked through three floors of an office I was visiting for the first time. On the first floor, a gleaming espresso machine sat untouched, a laminated sign taped above it: "OUT OF ORDER – TICKET SUBMITTED." The sign had yellowed slightly. On the second floor, someone had wedged a French press between a microwave and a pile of shipping boxes, next to a hand-labeled bag of beans. On the third floor, there was a person, not a machine, making coffee, chatting with people in line, clearly known and clearly appreciated.

Three floors. Three coordination styles. One building.

Coffee is one of the easiest places to see how an organization actually works, not because coffee matters in some grand way, but because it's small, frequent, and shared, and it lives right at the boundary between formal operations and informal norms. It creates a daily moment where people coordinate without a meeting, maintain a commons without a committee, and express care without a slide deck.

That's the through-line: coffee exposes how an organization treats shared infrastructure, including who owns it, what "good" looks like, how friction gets resolved, whether defaults are trusted, and whether the commons quietly degrades or gets maintained with pride. The same underlying variable shows up later in every attempt to roll out a shared platform, change a workflow, or introduce a new capability that only works if people adopt it together.

And that's why it's a useful proxy for AI adoption. When I say "AI success" in this essay, I don't mean universal usage or a spike in tool installs; I mean the ability to make AI a maintained, routine capability that reliably improves real workflows, creates spillover benefits for teams, and keeps getting better instead of quietly decaying. In practice, the model is the espresso machine: impressive, capable, and ultimately secondary if the ownership, norms, support, escalation paths, and day-two maintenance aren't there.

Coffee is a low-stakes rehearsal for those dynamics.

The coffee walk

If you want to understand how a company works, take a coffee walk. Treat it as anthropology: a random walk through a system that exists in almost every office and is rarely discussed with precision. The goal is not to fix anything. The goal is to notice what is true.

Start with questions that sound trivial until you watch how quickly they turn into a map of the organization. Let the walk guide you in a natural sequence. Notice where coffee lives and how visible it is (one place, many places, hidden places, executive places) and then pay attention to who makes it and who maintains it. “Everyone” and “a named owner” are very different worlds.

Listen for what people mean when they say it's "working." That definition tells you what's valued here: speed, quality, availability, ritual, status. It also tells you what people are willing to tolerate, and what they quietly route around.

Then watch what happens when it breaks, and whether the fix travels through a ticketing system, a Slack thread, a shrug, or a hero who quietly makes it all work. Pay attention to the norms and how they're enforced (signs, vibes, silent expectations), and to the behaviors people don't advertise: hoarding pods, labeling milk, hiding mugs, bringing a personal setup, or keeping a backup stash because they don't trust the commons.

Finally, notice the economics and the entitlement model. Is coffee treated as a shared baseline, a subsidized perk, a BYO situation, or something that varies by floor? In most organizations, that answer isn't really about coffee. It's about how they think access and equity work in practice.

Coffee surfaces the shape of ownership and the texture of coordination. It shows where the organization expects self-service, where it expects service delivery, and how it handles shared infrastructure when nobody is explicitly in charge. If you can learn to see culture in coffee, you can learn to see it in other shared systems too: the internal platforms you want teams to adopt, the workflow changes you want to land, and the operating rhythms you assume will hold.

Four archetypes, and what to do with each

Most coffee cultures cluster around patterns. In practice, most companies are blends: one dominant shape and a few competing subcultures. None are good or bad. Each has strengths and predictable failure modes.

Self-serve standardization. A central station, reliable defaults, optimized for throughput, with someone who clearly owns the experience and the repair loop, and with people who largely trust the default.

This kind of organization tends to do well with AI platforms that have strong guardrails and strong defaults. Adoption becomes routine when the happy path is genuinely happy, but the risk is that edge cases get ignored and innovators route around the platform if it can't flex. The move here is to make the default genuinely lovable and then protect it like infrastructure. Great templates, safe guardrails, clear ownership, fast repairs, and a steady release cadence will beat a big launch followed by drift, and you win by making it boring in the best way.

Artisan autonomy. Many micro-stations, personal preferences, local hacks, and people bringing their own gear, which produces pockets of excellence fast and often creates genuine pride in craft.

The scaling challenge is predictable: duplication everywhere, little shared memory, constant reinvention of the same wheel. AI adoption follows the same shape: impressive local workflows, uneven quality, and a hard time turning "what worked for me" into "what works for us." The move here is not to centralize everything first, but to connect it so local excellence can travel. Interoperability, shared components, and lightweight standards will do more than heavy process, as long as you also make sharing easy and rewarding without turning it into a compliance program.

Perk signaling. High-end machines, premium beans, and "experience" as message, in a culture that can move fast on procurement and optics and often has leadership attention.

The trap is confusing investment with outcomes, and AI adoption can drift into a portfolio of tools without durable workflows, simply because it's easy to buy capability and surprisingly hard to embed it. The move here is to treat AI like any other shiny thing that becomes shelfware unless it is anchored in real work. Define "done" in workflow terms, measure outcomes, and make integration unavoidable so it becomes part of how work happens rather than a collection of tools people demo.

Concierge service. Coffee as a service relationship, with a person or small team delivering it with pride, with people who know them, and with an experience that creates community because the commons is cared for in a visible, human way.

This archetype maps cleanly to what most organizations actually need from AI: enablement wrapped around capability, with ownership and support made explicit rather than assumed. The move here is to lean into that service model through office hours, coaching, champions, playbooks, and a trusted steward who owns the experience end-to-end. The cultural move is not "make everyone an AI person," but "care for the commons," so the organization gets shared capability without requiring everyone to become an expert.

An example: service as a cultural primitive

At PwC, coffee isn't only a machine in a kitchen. In at least one office, it has a person at the center of it: someone who is liked, trusted, and appreciated. People greet them by name. Regulars have their usual. Newcomers get gently pulled into the line and the rhythm of the place. When the office moved, that coffee service moved too, seamlessly. The system recognized what mattered and treated it as real infrastructure. The heart of the experience wasn't a device. It was a human service delivered well.

That's a signal.

It suggests a service ethos that is relational, not transactional. It suggests that invisible work gets noticed. It suggests that "experience" is a maintained commitment, not a one-time purchase.

AI succeeds in cultures like this when it ships the same way, wrapped in enablement, owned by someone people trust, and maintained as a commitment rather than a launch. Concierge coffee cultures don't require every individual to become a coffee expert. They require the commons to be cared for. AI works the same way.

The coffee halo

Not everyone drinks coffee. The culture still reaches them.

Coffee creates a halo: a shared ritual that generates benefits beyond the direct consumers. It creates a place where people cross paths. It creates a moment where newcomers learn norms without a formal introduction. It creates small chances for help, context, and belonging. It changes the pace of a day. It creates a shared reference point.

This is a helpful correction for AI strategy, because it reframes what success looks like. The goal is not universal usage. The goal is distributed capability with spillover benefits. A small number of power users can raise the quality of meetings, drafts, onboarding, and handoffs. A well-designed assistant can reduce repeated questions even for people who never open the tool. A team can benefit from one person who can summarize, structure, and synthesize quickly.

The halo is the outcome. Usage is an input.

Coffee as a leading indicator

Coffee also helps you notice change. When norms shift, coffee patterns often shift early: hybrid work changes where people gather, cost pressure changes what gets maintained, office redesign changes how teams collide, and leadership changes alter what is valued and what is tolerated. A small ritual becomes a symbol quickly when expectations are under negotiation, and coffee is often where that negotiation becomes visible.

Coffee is a sensor. It reveals the delta between the culture you describe and the culture you live. It does it without requiring a formal cultural analysis, because it's culture at human scale: ordinary, repeated, and emotionally charged in just the right amount.

The point

Coffee is not important because it is coffee.

It's important because it's a shared system people touch every day. It sits in the commons. It collects norms. It exposes ownership. It generates a halo. And it changes early when the culture changes.

If you can learn to see culture in coffee, you can learn to see it in AI adoption. The same dynamics show up, with higher stakes and less patience. Three floors in one building can hold three different operating systems at the same time, and you'll see the same thing with AI: one team thriving on a default, another building personal stacks, another waiting on a ticket that never quite closes.

A coffee walk is practice. It's also a reminder: durable change rarely starts with policy. It starts with the routines people keep, the commons they maintain, and the service they deliver to each other without being asked.

OK Computer

2026-02-03T00:00:00Z

For sixty years, computing has worked in essentially the same way.

The computer offers capabilities, and you learn what's available. You then figure out how to invoke them in the right sequence to get what you want. Want expense reports from receipt photos? You find OCR software, learn how it works, extract the data, write formulas to structure outputs, and format the results. The computer can do all of this (the capability exists in the system) but you have to know what's possible, which tools to use, and how to orchestrate them into a working solution.

Supply-side computing means the machine says, "Here are my functions. Compose them to achieve your goal."

Entire industries emerged around this model. UI and UX disciplines focused on making functions discoverable and learnable. Documentation teams explained what existed and how to use it. Training programs taught people the tools. Solution architects translated business intent into technical execution. None of this exists because computing is poorly designed. It exists because this is how the relationship between humans and computers has always worked.

The computer offered supply. You learned to use it. The translation burden fell entirely on you.

The Shift to Demand-Side

Several things shipped in the last few months. Together, they point to the same underlying change.

Research showed that when large models are given access to a computational environment, they spontaneously learn how to use it. File systems become memory-management tools. Scripts become execution mechanisms. External resources get pulled in when needed. No retraining is required; capability emerges once the environment is available and the model can explore what's possible.

Claude Code lets developers delegate whole tasks rather than individual keystrokes. Not "help me think about this refactor," but "refactor this codebase," while they go work on something else. The model plans the work, executes the changes, handles errors, and reports back when it's done.

Cowork applies the same pattern to non-technical users. You point it at a folder, describe what you want to accomplish, and watch it organize files, extract data from images, and generate reports. It's the same agent architecture, packaged for people who don't write code.

Other tools are doing similar work across workflow automation in different domains.

These aren't demos or research previews in the traditional sense. They're shipping products, running in production environments, used daily by thousands of people doing real work.

Together, they reflect a fundamental shift toward demand-driven computing. You express what you want to accomplish, and the system figures out how to deliver it. When you say "organize these files by content," the agent determines which tools to use, in what sequence, and how to adapt when something doesn't work the first time. You don't need to know the supply exists. You express demand, and the system sources the appropriate supply to fulfill it.

How Utilities Actually Work

Real utilities follow a specific pattern, and it's worth being precise about it.

When you flip a light switch, you don't provision generation capacity. You don't route power through substations or balance load across the grid. You don't need to understand three-phase distribution, transformer ratios, or transmission-line capacity.

You express demand through the simplest possible interface (I want light) and everything else (generation, transmission, distribution, delivery) is handled by infrastructure you never see and don't need to understand. The interface is trivial. The orchestration is invisible. That's what makes it a utility rather than industrial infrastructure.

Cloud as Latent Supply

Cloud infrastructure gave us something unprecedented: elastic supply at massive scale.

Compute, storage, accelerators, databases, analytics services, AI capabilities (all globally distributed, available on demand, and scalable from zero to nearly infinite within minutes). The supply side became extraordinary, elastic in ways previous generations of infrastructure could never match, powerful enough to handle almost any computational workload, ready and waiting for demand.

But accessing that supply still required significant expertise. You needed to make architecture decisions about which services to use and how they should connect. You had to handle provisioning, manage orchestration across multiple services, and constantly optimize the tradeoff between cost and performance. The supply existed, and it was genuinely impressive in scope. What was missing was a demand-side interface that didn't require specialized knowledge to use.

The Workload Everyone Missed

Much of the infrastructure conversation over the last two years has focused on GPU buildouts.

Training clusters capable of handling trillion-parameter models. Inference capacity to serve millions of concurrent users. Power and cooling infrastructure to support extreme computational density. Data center expansion to house specialized hardware. The debate has centered on whether we're building enough capacity to support the next generation of foundation models and the applications built on top of them.

That demand is real, and the infrastructure requirements are substantial. But it isn't the whole story, and it may not even be the largest one.

Alongside training and inference, a different workload is emerging (one few people were explicitly planning for): general-purpose computing orchestrated by AI agents at scale.

Not training new models. Not serving conversational inference. Instead, running analyses that never ran before because setup costs were too high. Automating workflows that were too complex to justify with traditional approaches. Orchestrating millions of small, heterogeneous tasks across files, services, APIs, and systems (the kind of work that requires judgment about what to do next rather than executing a predefined sequence).

Agents turn out to be uniquely good at this kind of work. It's also exactly what cloud infrastructure was originally built to handle.

Over two decades, cloud providers invested heavily in elastic CPU capacity, distributed storage, networking infrastructure, managed databases, message queues, batch-processing systems, and sophisticated orchestration layers. These systems were designed to handle bursty, parallel, irregular workloads (the kinds of computational patterns humans were too slow and too expensive to fully utilize). The infrastructure was built for elasticity and variety, not just raw throughput.

While attention fixated on GPUs and the specialized infrastructure required for model development, the most immediate explosion in compute demand is landing on the general-purpose fabric that already exists. Agents don't just consume intelligence in the form of model inference. They consume infrastructure across the full range of cloud services, and they turn out to be remarkably good at keeping that infrastructure busy in ways humans never could.

The Interface Layer

The missing piece is now in place.

Local computer-use agents make the pattern clear. You express intent in natural language, the agent translates it into computational actions, and it delivers results. On a local machine, the supply side is necessarily finite (limited to what's installed, what hardware exists, what a single system can do).

Connect that same agent capability to cloud infrastructure, and something fundamental changes. Elastic supply meets trivial demand expression, and the constraints that shaped computing for sixty years begin to disappear.

You can say something like: "Process these ten million customer records, identify purchasing patterns, generate personalized recommendations for each segment, and send them via email by tomorrow morning. Budget is $500, and accuracy matters more than speed."

The agent takes that specification and handles everything that follows. It determines what supply is required (databases, processing services, compute capacity, storage). It provisions the appropriate resources, orchestrates execution, parallelizes the work, scales up and down as needed, delivers the results in the requested format, and tears everything down when it's complete.

You never see the architecture that was built to fulfill your demand. You never provision resources or write orchestration code or choose specific services. You expressed demand with constraints, and the system sourced supply, orchestrated delivery, and optimized for your priorities.
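One way to picture "demand with constraints" is as a small data structure that names the goal and the limits but says nothing about services or architecture. The sketch below is illustrative only: `DemandSpec`, `Plan`, `choose_plan`, the cost and accuracy estimates, and the 16-hour reading of "by tomorrow morning" are all assumptions made for the example, not part of any real agent product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Priority(Enum):
    ACCURACY = "accuracy"
    SPEED = "speed"


@dataclass(frozen=True)
class DemandSpec:
    """What the requester states. Nothing here names a service or an architecture."""
    goal: str
    budget_usd: float
    deadline_hours: float
    priority: Priority


@dataclass(frozen=True)
class Plan:
    """A hypothetical candidate plan that an agent might propose."""
    name: str
    est_cost_usd: float
    est_hours: float
    est_accuracy: float  # assumed estimate between 0.0 and 1.0


def choose_plan(spec: DemandSpec, candidates: list[Plan]) -> Optional[Plan]:
    """Drop plans that break a hard constraint, then rank by the stated priority."""
    feasible = [
        p for p in candidates
        if p.est_cost_usd <= spec.budget_usd and p.est_hours <= spec.deadline_hours
    ]
    if not feasible:
        return None
    if spec.priority is Priority.ACCURACY:
        return max(feasible, key=lambda p: p.est_accuracy)
    return min(feasible, key=lambda p: p.est_hours)


spec = DemandSpec(
    goal="Segment customers and email personalized recommendations",
    budget_usd=500.0,
    deadline_hours=16.0,  # assumed stand-in for "by tomorrow morning"
    priority=Priority.ACCURACY,
)
candidates = [
    Plan("fast-sampled", est_cost_usd=120.0, est_hours=2.0, est_accuracy=0.81),
    Plan("full-batch", est_cost_usd=430.0, est_hours=9.0, est_accuracy=0.93),
    Plan("oversized", est_cost_usd=900.0, est_hours=4.0, est_accuracy=0.95),
]
chosen = choose_plan(spec, candidates)
print(chosen.name if chosen else "no feasible plan")
```

The design choice worth noticing is that budget and deadline act as hard limits while the priority only breaks ties among plans that fit. The most accurate plan here is rejected because it exceeds the budget, which is the kind of tradeoff the requester never has to see.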

Agent-defined architecture means infrastructure emerges from demand rather than constraining what's possible. Architecture becomes an implementation detail, the same way power generation becomes an implementation detail when you flip a light switch.

Utility Computing, Realized

Cloud providers built the foundation for this moment over many years. The elastic supply was always there, waiting for a way to make it genuinely accessible.

What changed is that the interface finally became trivial in the way utilities require.

Not: learn cloud architecture, understand service tradeoffs, provision resources, wire systems together, write orchestration logic, monitor performance, and continuously optimize.

Just: state what you want to accomplish, specify constraints around time, cost, and quality, and receive results that match your requirements.

The agent handles everything in between. Demand-driven computing at cloud scale means utility computing finally behaves like an actual utility. The interface is trivial. The orchestration is invisible. Computational capability is consumed the same way electricity is consumed.

Revealed Demand

The economic consequences become clear when you consider latent demand.

There is enormous latent demand for computation today (work that should happen but doesn't because translation costs make it uneconomical). Analysis that could provide genuine value but isn't worth the weeks required to learn tools and build systems. Automation that makes sense but would require expertise the organization doesn't have and can't easily acquire. Processing that remains manual because orchestration is too complex relative to the benefit.

All the "I'd do this if I knew how" work never gets done, not because it lacks value, but because the interface cost exceeds the expected return.

When the interface becomes trivial, latent demand becomes revealed demand. This isn't new desire appearing from nowhere. It's suppressed execution finally finding expression. The work always made sense. The computational capability always existed. What was missing was an economical way to connect intent to execution.

Agents amplify this effect further because they operate continuously. No context switching. No nights or weekends. No waiting for someone to have time to set things up. Demand can be expressed and executed at machine speed, twenty-four hours a day. Existing infrastructure gets used far more fully than human operation ever allowed.

We're not building dramatically more supply. We're finally accessing what already exists at the rate and intensity it was designed to handle.

What's Changing Now

The shift is already visible in production systems.

Developers are delegating meaningful work to agents they trust with real codebases. Non-technical users are orchestrating complex data transformations. Models are operating reliably inside computational environments, making consequential decisions about tool usage and resource allocation. The transition from assistant to operator is happening at scale.

Cowork was built by Claude Code in ten days (agents building the tools that democratize agents).

The implications cascade across multiple dimensions. For organizations, cycle time compresses dramatically. The path from idea to deployed solution shrinks from months to days or hours. Competitive advantage shifts away from "who has the best architects" toward "who can articulate the clearest demands and constraints." The bottleneck moves from implementation to judgment.

For cloud providers, deep and broad service catalogs become true competitive moats. Elastic infrastructure finally sees sustained utilization across the full range of services.

For work, effort shifts from implementation toward direction. Less time figuring out how to make systems do things, more time deciding what's worth doing.

For the economics of computing, interface cost collapses. Organizations pay closer to the actual resources consumed rather than the blended cost of resources plus scarce expertise. Computation becomes unbundled from the cost of knowing how to compute.

Why This Happened Now

Three things converged.

Cloud platforms reached a level of maturity and reliability that made agent-defined architecture viable. Foundation models became capable of translating between human intent and computational supply with sufficient accuracy. Agent architectures crossed the reliability threshold required to delegate real work in production environments.

Each was necessary. None was sufficient alone. Together, they inverted the relationship between humans and computers.

Supply-side computing required humans to learn what machines could do. Demand-side computing requires machines to understand what humans want. That inversion has now occurred.

The Era Shift

For sixty years, you had to learn what your computer could do before you could make it useful. That constraint shaped software, teams, training, and entire industries. It was so fundamental that it faded into the background.

That constraint just ended.

Computers can now understand intent expressed in natural language, reason about available supply across vast service catalogs, orchestrate execution across distributed systems, and deliver results that match demand. The interface became trivial. The orchestration became invisible. Computing became a utility.

This isn't a future possibility. It's already happening, in production, at scale. Cloud providers built the supply over decades. Agent architectures built the interface in a matter of years. Revealed demand is about to meet elastic infrastructure in ways that will surprise anyone still focused exclusively on training and inference.

Everything accelerates from here.

The Michelin Paradox

2026-01-29T00:00:00Z

Davos has a way of compressing the future into a few days. This year, the contrast was unusually sharp.

On one side: the model providers. They are moving at an extraordinary rate. Capability jumps are arriving faster than most enterprises can absorb, and the confidence in more general intelligence has quietly shifted from if to when. You can feel it in the demos, in the roadmaps, and in the commercial momentum.

On the other side: the enterprise leaders I spoke with. Smart, motivated, well-funded. And still stuck. Not stuck on whether the technology works, but on how to use it consistently, safely, and at scale. Many are wedged between what these systems can do and what their organizations can reliably deliver.

That gap is not primarily a technology constraint. It is an operating constraint.

And here’s the uncomfortable part: many AI products assume a level of coordination and readiness that most organizations have never needed before. Those assumptions are often implicit. That’s not an excuse. It’s a diagnosis that demands action. To win, we have to make those assumptions explicit and build the systems to meet them.

A kitchen story about AI

Here’s the metaphor I keep coming back to. In a Michelin-star kitchen, you can buy the best knives and ovens in the world and still cook badly. Because the craft is not the knife.

The craft is the system: prep, timing, standards, coordination, station ownership, quality checks, service cadence, and the discipline of doing the same thing the same way until it becomes muscle memory. Great tools do not create a great kitchen. They amplify a great kitchen.

AI is similar. AI tools are not the work. The work is the operating system around them.

So what do leaders do, practically, to close the gap?

Make speed a strategy, not a slogan

When technology advances faster than your operating model, “we’re moving fast” becomes meaningless. What matters is cycle time from decision to shipped capability, measured in weeks, not quarters.

Moves that actually change speed:

Run fewer bets at a time. If everything is priority-one, nothing is. Pick a small number of outcomes and starve the rest on purpose.

Shorten the distance between decision and build. Tighten the handoff chain. Fewer committees. Fewer alignment loops. Fewer approval layers.

Create a single weekly service cadence. A recurring rhythm where you review what shipped, what broke, what you learned, and what you are changing next week. No theater. Just evidence.

Treat reuse as a first-class deliverable. Don’t fund dozens of near-identical copilots. Fund shared capabilities that can be consumed across teams.

A useful test: if you paused all new planning for two weeks and asked, “What can we ship with what we already have?” would anything meaningful land? If not, you have an operating problem.

Optimize for outcomes, not checklist progress

Many organizations are building AI programs that feel comforting and produce artifacts: frameworks, policies, training decks, steering committees. Some of that is necessary. But it is also an easy way to look busy without changing work.

Moves that keep you honest:

Pick business measures you already care about (cycle time, defect rate, time-to-insight, win rate, customer satisfaction) and force AI to move those measures.

Stand up tight feedback loops where real users evaluate AI in real workflows every week. Not surveys. Observed use.

Define what good looks like per workflow in plain terms. Example: “Draft a first pass in two minutes, cite sources, flag uncertainty, hand off cleanly to review, and cut review time by 30%.”

Instrument the work, not the narrative. Track adoption, completion, latency, error modes, escalation rates, and human time saved in the flow of work.

If AI progress cannot be seen in existing operational metrics, it is not progress yet.
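As a rough illustration of instrumenting the work rather than the narrative, the sketch below rolls per-run records up into a few of the measures named above. Everything in it is hypothetical: the `RunEvent` fields, the `summarize` helper, the sample data, and the 20-minute review baseline are assumptions chosen for the example, and a real team would substitute its own event log and baseline.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunEvent:
    """One AI-assisted run inside a workflow. All fields are hypothetical."""
    workflow: str
    completed: bool        # did the run reach a usable handoff?
    escalated: bool        # did a human have to take over or intervene?
    latency_s: float       # time from request to first draft
    review_minutes: float  # human review time spent on the output


def summarize(events: list[RunEvent], baseline_review_minutes: float) -> dict[str, float]:
    """Roll raw run records up into rates a weekly review can track."""
    if not events:
        raise ValueError("no events to summarize")
    n = len(events)
    done = [e for e in events if e.completed]
    avg_review = (
        sum(e.review_minutes for e in done) / len(done) if done else float("nan")
    )
    return {
        "completion_rate": len(done) / n,
        "escalation_rate": sum(e.escalated for e in events) / n,
        "median_latency_s": median(e.latency_s for e in events),
        # Fraction by which review time fell against the pre-AI baseline.
        "review_time_cut": 1 - avg_review / baseline_review_minutes,
    }


events = [
    RunEvent("first-draft", completed=True, escalated=False, latency_s=90.0, review_minutes=12.0),
    RunEvent("first-draft", completed=True, escalated=False, latency_s=110.0, review_minutes=16.0),
    RunEvent("first-draft", completed=True, escalated=True, latency_s=150.0, review_minutes=14.0),
    RunEvent("first-draft", completed=False, escalated=True, latency_s=240.0, review_minutes=0.0),
]
summary = summarize(events, baseline_review_minutes=20.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this sample data the review-time cut works out to about 30 percent, which is the kind of plain-terms target the workflow definition above asks for. The point of the sketch is that the target and the measurement live in the same units.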

Build the kitchen, not just the menu

AI does not just add a tool. It changes how decisions get made, how work gets decomposed, and what quality means. The hard part is not building one impressive demo. The hard part is making one new way of working repeatable.

Moves that institutionalize change:

Make a small number of behaviors non-negotiable. Document decisions. Use shared components. Log exceptions. Ship weekly. Measure outcomes.

Clarify decision rights. Who approves production use? Who owns risk calls? Who can reallocate engineering capacity? Ambiguity creates delay.

Build line-cook roles, not just head-chef roles. You need operators who can deploy, monitor, improve, and support AI in production, not only visionaries.

Create a single front door for production AI. One path with clear gates: safety, privacy, evaluation, monitoring, and support. Standardize the boring parts (identity, data access, prompt management, eval harnesses, observability, audit logs) so teams can move fast on the interesting parts.
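A single front door can be pictured as one checklist that every production request passes through. The sketch below is a minimal illustration under stated assumptions: the gate names come from the list above, while `Submission`, `review`, the sample request, and the idea of recording evidence as short text references are hypothetical details added for the example.

```python
from dataclasses import dataclass, field

# Gate names taken from the text; the ordering is an arbitrary choice.
GATES = ("safety", "privacy", "evaluation", "monitoring", "support")


@dataclass
class Submission:
    """A hypothetical request to put an AI workflow into production."""
    name: str
    # Maps a gate name to a pointer at its evidence (a doc, a dashboard, an owner).
    evidence: dict[str, str] = field(default_factory=dict)


def review(submission: Submission) -> tuple[bool, list[str]]:
    """Go only when every gate has non-empty evidence; otherwise list what is missing."""
    missing = [g for g in GATES if not submission.evidence.get(g, "").strip()]
    return (not missing, missing)


request = Submission(
    name="contract-summary-assistant",
    evidence={
        "safety": "red-team notes",
        "privacy": "data access review",
        "evaluation": "eval report",
        "monitoring": " ",  # blank entries count as missing
    },
)
approved, missing = review(request)
print("go" if approved else f"no-go, missing: {', '.join(missing)}")
```

The value of a structure like this is less in the code than in the fact that the gates are the same for every team, so a no-go always comes with a specific, short list of what to fix.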

AI success looks boring when it’s real. It looks like consistency.

A note to the model providers

Model providers have earned their momentum. But there is a growing mismatch between the pace of model iteration and the operational scaffolding enterprises need to succeed.

It is not tenable to ship increasingly powerful systems and leave customers to assemble the operating system alone.

Moves where providers can help:

Make operational assumptions explicit. What governance model does this require? What skills? What monitoring? What failure modes should teams expect?

Ship evaluation and observability as defaults. Not optional best practices, but baked-in primitives.

Offer migration-safe interfaces. Enterprises need stability across versions, predictable behavior changes, and clear upgrade paths.

Provide reference workflows, not just reference prompts. Show complete patterns: data access, tool use, escalation, human review, auditability, and measurement.

The winners will not just build better models. They will build better conditions for success. Not for nothing, but the customers I spoke to are desperate for the help.

Bringing it back to the kitchen

My takeaway from Davos is simple: the knives are getting sharper every month. But most kitchens are not set up to serve at that speed.

Leaders who close that gap will treat AI like a craft discipline, not a tool rollout. They will build the prep stations, define the standards, instrument the line, run a weekly service cadence, and make reuse the default.

Great tools do not make a great kitchen. But in a great kitchen, great tools change what’s possible.

And right now, the opportunity is to become the kitchen that can actually use them. Rare opportunity ahead.

AI Gifts for 2026, Part 3: Closing the Series

2025-12-19T00:00:00Z

*If you’re just joining the series, Parts 1 and 2 are linked in the comments.*

This final installment focuses on what separates experiments from operating models. These are the gifts that determine whether AI stays contained or becomes industrialized:

🎁 Gift #6: Escalation Is a Product Feature

In well-designed AI systems, escalation isn’t a failure mode. It’s a capability.

The most effective workflows don’t treat escalation as a last resort. They design for it explicitly:

agent → augmented human → full human.

Fast. Visible. Reversible.

Most organizations overlook this, but escalation is the skeleton of industrialized AI workflows. Without it, systems either overreach or underdeliver. With it, autonomy can grow safely.

When escalation is intentional, agents know when to ask for help, humans know when to step in, and responsibility is never ambiguous. The system doesn’t stall, it adapts.

*The gift here is recognizing that how work hands off matters just as much as who does it.*
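One way to picture the agent → augmented human → full human ladder is as an explicit routing rule. The sketch below is a minimal illustration under stated assumptions: that the system produces a confidence score between 0 and 1, and that two thresholds (the hypothetical `agent_min` and `augmented_min`) separate the tiers. The names and numbers are placeholders, not a recommended policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AGENT = "agent"
    AUGMENTED_HUMAN = "augmented human"
    FULL_HUMAN = "full human"


@dataclass(frozen=True)
class Handoff:
    tier: Tier
    reason: str  # recorded so every handoff stays visible and traceable


def route(confidence: float, agent_min: float = 0.9, augmented_min: float = 0.6) -> Handoff:
    """Pick who handles the work based on a confidence score (assumed to be in [0, 1])."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= agent_min:
        return Handoff(Tier.AGENT, f"confidence {confidence:.2f} meets agent threshold")
    if confidence >= augmented_min:
        return Handoff(Tier.AUGMENTED_HUMAN, f"confidence {confidence:.2f} needs assisted review")
    return Handoff(Tier.FULL_HUMAN, f"confidence {confidence:.2f} is below the review threshold")
```

Because the thresholds are plain parameters, they can be loosened or tightened as evidence accumulates, which is one simple way to keep the handoff reversible.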

🎁 Gift #7: The “Two Clocks” Mindset

Technology clocks beat organizational clocks. They always have.

AI capabilities are advancing at a pace no planning cycle can match. Waiting for perfect clarity doesn’t reduce risk, it creates it. The real leadership challenge now is shrinking the adoption gap between what’s possible and what’s operational.

*This is the gift of tempo.*

Organizations that move well don’t try to synchronize the clocks. They accept the mismatch and design around it, building feedback loops, shortening cycles, and creating space to learn while moving forward.

Progress comes from motion, not certainty.

🎁 Gift #8: Collapse the Last Mile

The model isn’t the hard part anymore.

The hard part is everything around it: operating model, culture, evaluation, risk posture, quality loops. This is the “last mile” where most AI initiatives stall, and where real differentiation now lives.

Collapsing the last mile means designing AI into how work actually gets done, not treating it as something adjacent. It means aligning incentives, clarifying ownership, and embedding quality and risk management into the workflow itself.

This is less about technology and more about intent. Firms that invest here don’t just deploy AI, they operationalize it.

🎁 Gift #9: Build for Verifiable Value

Clients aren’t asking, “Will it work?”

They’re asking:

How will I know it worked?

How do I measure AI-driven productivity?

How do I avoid AI slop and shadow processes?

*The gift is to instrument value, not imply it.*

That means designing systems that make outcomes visible, where productivity gains can be measured, decisions can be traced, and improvements can be verified. AI that can’t prove its impact eventually loses trust, no matter how impressive the demo.

Value that’s visible scales. Value that’s assumed does not.

🎁 Gift #10: Let the Frontier Work for You

As capabilities accelerate, many organizations fall into the same trap: scaling too slowly.

They wait for stabilization. They delay for certainty. And in doing so, they miss the compounding effect of building early.

This final gift is a commitment:

build ahead

ship early

improve continuously

and let the underlying models get cheaper and better underneath you.

It’s the antidote to “wait until it stabilizes.” The frontier will keep moving whether you engage with it or not. The advantage comes from letting that motion work in your favor.

Closing the Series

Taken together, these 10 gifts outline a different way of thinking about AI in 2026, not as a collection of tools, but as an environment organizations must learn to operate within.

The firms that succeed won’t be the ones with the flashiest demos. They’ll be the ones that design for learning, build for change, and move with intent as the frontier advances.

Thanks for following along, and Happy Holidays!

AI Gifts for 2026: Part 2

2025-12-16T00:00:00Z

In Part 1, we explored two foundational gifts for an AI-driven 2026: designing beyond today’s limits and prioritizing depth over breadth. *(If you missed Part 1 of this series, I’ve linked it in the comments below.)*

Both point to the same truth: we are no longer designing for static systems. We are designing for moving frontiers, technologies that continue to evolve while we are actively building on top of them.

In static systems, governance is mostly about control—rules, approvals, and fixed workflows. In moving systems, those tools decay quickly. What matters instead is how the system learns, how judgment is applied, and how quickly yesterday’s insight becomes tomorrow’s default behavior.

This week, we turn to three gifts that shape how organizations build, govern, and learn with AI systems as those frontiers expand:

🎁 Gift #3: Build With Agents, Not Apps

For decades, enterprise software followed a familiar pattern. You built an app, wrapped it in an interface, defined a workflow, and asked humans to operate it. Even today, much of AI is still wedged into that mold: a prompt box here, a chatbot there, each one a destination rather than a participant in the work itself.

But the app metaphor is starting to break.

Apps scale with the number of people using them. Agents scale with compute. That single shift changes the character of the system you are building.

Agents are not a new UI. They are a new actor.

They can take in a goal, decide what to do next, call tools, revise their plan, escalate when needed, and continue forward. They do not wait passively for instructions, and they do not assume a single, linear path through a workflow. They behave more like colleagues, ones who operate at machine tempo and learn with every cycle.

Consider a familiar enterprise task: preparing an audit plan, structuring a complex tax analysis, responding to a regulatory inquiry. In an app-centric world, humans navigate screens, stitch together outputs, and carry context from step to step. In an agent-centric world, the system holds the goal, coordinates the steps, pulls in data, flags uncertainty, and escalates only when judgment is truly required. The work still happens, but the burden shifts.

The design question moves from “How do humans navigate this?” to “How does the system orchestrate this on their behalf?”

It is a quiet but profound reorientation. Apps ask humans to adapt to the system. Agents adapt the system to the human.

*The gift is the shift in imagination: designing for a world where humans still set direction, but agents increasingly handle the path there.*

🎁 Gift #4: Human-in-the-Loop as a System, Not a Vibe

As AI systems grow more capable, the instinct in many organizations is to lean harder on human oversight, as though “someone reviewing it” is enough to guarantee safety, quality, or correctness.

But human-in-the-loop is not a checkpoint. And it is certainly not a vibe.

Left undefined, it becomes the place where ambiguity accumulates. Review queues grow. Senior experts become permanent bottlenecks. Costs flatten instead of falling. And the system never improves, because every correction evaporates once the work is done.

When human oversight is treated as a system, something different happens.

Human judgment becomes a strategic resource: targeted, trackable, and increasingly rare as the system learns. Decisions about when humans intervene, what triggers escalation, and how confidence is measured stop being implicit. They become part of the architecture, observable and improvable.

Each intervention becomes signal. Each correction becomes momentum. Oversight stops being rework and starts becoming propulsion.

The gift here is clarity: human oversight not as a fallback, but as a designed, measurable, evolving part of the system. It is how organizations avoid hard-coding permanent human bottlenecks into workflows that should otherwise scale.

🎁 Gift #5: Every Human Touchpoint Teaches the System

Gifts #3 and #4 raise a deeper question: what should human effort actually do in an AI system?

Too often, humans are positioned as fixers: stepping in to correct errors, override decisions, or move work along when the system falls short. The problem is not the intervention. It is that the intervention disappears the moment it is done.

If a human intervenes and the system does not learn from it, that was wasted effort. Not because the intervention was not valuable, but because its value stopped at that moment. In many organizations, this is the hidden tax of AI adoption: humans fix, systems forget, and the same problems return at machine speed.

In high-performing AI systems, every human touchpoint is treated as instruction. A correction is not just a fix; it is a lesson. A review is not just approval; it is training signal. Over time, the system requires fewer interventions precisely because it has absorbed the reasoning behind them.

This is where productivity actually compounds. Effort shifts from repeatedly doing the work to permanently improving how the work is done. Without that shift, AI accelerates output, but not progress.

This reframes the human role entirely. People are not there to prop up the system indefinitely. They are there to teach it.

*The gift is the mindset shift: turning humans from perpetual fixers into teachers, and ensuring the system is designed to listen.*

What Comes Next

These three gifts share a common thread: they treat AI not as a tool to be used, but as a system to be taught. Organizations that internalize this will see their effort compound. Those that do not will keep solving the same problems, just faster.

Gifts 6–10 explore how these patterns scale: structuring escalation, managing tempo, and closing the gap between AI capability and real, verifiable value.

AI Gifts for 2026

2025-12-11T00:00:00Z

This holiday season, I’m sharing a different kind of gift guide: AI gifts you can give your team, your firm, and your organization.

Most gift guides focus on what to buy. This one focuses on what to build—the practices, patterns, and architectural choices that make AI easier to use and scale. As we move into an AI-driven 2026, these internal upgrades matter far more than any individual model announcement.

Over the next few weeks, I’ll share gifts that strengthen your organization’s ability to operate, learn, and deliver at the tempo AI now enables. Think of them as compounding capability upgrades.

Let’s start with the first two:

🎁 Gift #1: Design Beyond Today’s Limits

In the early computing era, chip performance was improving so quickly that the best software teams didn't design to the hardware they had—they designed to the hardware they knew was coming.

Moore's Law created a simple rule of thumb: by the time your product shipped, the capability curve would catch up. AI is moving much faster. If you design to today's constraints (latency, cost, context windows, model accuracy), your solution will feel outdated the moment it ships. The frontier is advancing too fast.

*The gift is this: design for the experience you want to deliver, not the bottlenecks that exist today. The capabilities will catch up.*

The opportunity cost of under-building is now far higher than the cost of building ahead. Give your teams permission to design for the curve, not the moment.

🎁 Gift #2: Depth Over Breadth

Many organizations celebrate AI adoption by counting how many people have access. It feels like progress because it’s easy to measure. But wide access without deep engagement doesn’t materially shift how work gets done.

Shallow usage creates the illusion of adoption without the outcomes.

Here’s the real pattern emerging across enterprises: the organizations seeing meaningful impact aren’t the ones with the most seats—they’re the ones where people use AI across multiple workflows, day after day, with increasing sophistication.

Why? Because depth changes the economics of work.

When someone uses AI only occasionally, they get incremental convenience. When they use it across the full arc of their work—analysis, writing, coding, planning, reviewing—they get multiplicative gains. Work compresses. Turnaround cycles shorten. Teams move from “occasionally sped up” to “consistently accelerated.”

The gap between these two worlds is widening. In almost every enterprise dataset, the people who lean in deeply—across more tasks, with more advanced capabilities—pull away from the median user. Their workflows become both faster and better. They also unlock categories of work they previously couldn’t do, which alters role boundaries and team dynamics.

That’s the so what: Depth doesn’t just make individuals more productive—it reshapes how the organization performs. It changes what a team is capable of. It influences who does which work. It affects delivery speed, client experience, and ultimately competitive position.

The gift, then, is a commitment to intensity over distribution.

Start by choosing the workflows where depth matters most—where repetition, judgment, and variability intersect—and build the conditions for mastery: shared patterns, reusable workflows, expert examples, consistent reinforcement.

Breadth expands access. Depth expands capability. And capability, not coverage, is what compounds.

More Gifts to Come

In the next installments, I’ll share additional gifts—each one focused on a structural pattern that helps organizations scale AI with clarity, speed, and confidence. They build on one another. Together, they form an operating mindset for 2026: how to design, deploy, and improve AI systems in ways that compound value over time.

More next week.

AI The Hard Way

2025-10-21T00:00:00Z

When a new technology arrives, most people look for the smoothest way through it. They plan for fewer obstacles, clearer rules, and better timing. It feels sensible. But in moments of real change, that instinct points the wrong way.

With AI, there is no frictionless path. Every shortcut hides the real work: learning to move while the ground is still shifting. Waiting for it to get easier is another way of standing still.

Transformation is not about comfort or certainty. It's about building motion inside uncertainty. The teams that learn fastest are the ones that start moving before it feels ready, using what they learn to steady themselves as they go.

The goal is to keep learning, to stay in motion, and to use what's hard as fuel. What looks like friction is often energy waiting to be redirected. The organizations that move through the hard parts, not around them, are the ones that reach the other side first.

Ease vs. Energy

Most change efforts start with the search for better process. We want to optimize, simplify, and align. But AI transformation is not a process problem. It's an energy problem.

Every new system creates turbulence. Data is messy, roles are unclear, and the tools keep changing. The instinct is to slow down until the path smooths out. But that pause drains energy from the system. The job of leadership is to keep energy moving even when stability isn't possible.

Momentum built in instability is what turns uncertainty into progress. Energy that moves becomes learning; energy that waits becomes decay. The organizations that master that conversion—turning turbulence into motion—are the ones that actually change.

The Trap of Waiting for Easy

When people say “it'll be easier when…,” they are usually protecting the old system. “When we have better data,” “when leadership aligns,” “when we finish the pilot.” Each statement sounds reasonable, but all share a hidden belief: change should feel stable first.

Real transformation doesn’t wait for permission or perfect conditions. It starts as a small rebellion—a refusal to stay still while waiting for comfort. The organizations that grow are the ones that keep moving before alignment is guaranteed. They act their way into clarity instead of planning their way toward it.

Every major shift begins with that dissonance: the uncomfortable sense of doing something before the playbook exists. The hard part is not starting—it’s staying in motion when certainty never arrives.

Friction vs. Drag

Not all difficulty is created equal. Organizational drag—the politics, the broken approval chains, the data locked in legacy systems—that’s real, and it should be eliminated ruthlessly. Drag slows everything down without teaching anything.

But here’s the trap: many organizations exhaust themselves removing drag and call it transformation. They optimize the wrong things. They mistake getting better at the old game for learning the new one.

Friction is different. It’s the discomfort of learning new tools with old instincts. The tension of making decisions without full information. The awkwardness of changing how you work before you know if it will work. That friction is the transformation. Remove it and you remove the learning.

Drag should be killed on sight. Friction should be studied. Most organizations confuse the two and end up optimizing for the wrong kind of smooth.

Friction as Advantage

Ease shows up after the learning has been absorbed. It's the result of friction already processed. Early on, every step feels awkward because people are using new tools with old reflexes. But the path through awkwardness is the only way to mastery.

Smoothness gained too soon is often a sign that nothing new is happening. The goal isn't to make adoption simple. It's to make experimentation safe and repeatable, so the organization can learn faster than the pace of change.

But friction doesn’t just build capability. It also builds distance.

The harder something is to integrate—culturally, operationally, technically—the fewer competitors will persist long enough to do it. The friction becomes the barrier. The ones who cross it build credibility, capability, and insight that can’t be copied quickly.

Ease levels the field. Friction creates distance. The organizations that can keep moving through rough terrain build resilience that compounds. What starts as struggle becomes advantage, because others give up when the same resistance appears.

In AI transformation, resilience replaces scale as the differentiator. Scale used to win by doing more, faster, cheaper. Now, the edge belongs to those who can stay in motion through ambiguity, re-architecture, and discomfort. Relentlessness compounds; comfort decays.

Listen to the Friction

When things feel hard, that’s information. The friction is feedback. It signals where an organizational muscle is weak—decision speed, trust, data flow, or courage.

The task is to stay with it long enough to learn what it’s saying. Avoid the reflex to smooth it over or declare victory too soon. Every patch of difficulty is a live diagnostic on how your system actually works when tested.

The pain is the data. Ease is silence.

If it’s hard, listen to it. If it’s easy, question it. The job of leadership is not to remove friction, but to ensure it teaches something worth knowing.

AI will keep raising the difficulty level. The organizations that thrive won’t be those that make it easy, but those that make it through.

Two Clocks

2025-10-14T00:00:00Z

Two clocks shape how organizations adapt to AI.

Clock 1 measures how fast capabilities improve. New features and model releases arrive every few months. Each new version writes better, plans better, and handles fuzzier instructions than the last. This pace is set by labs, vendors, and the wider research community. It keeps moving whether you're ready or not.

Clock 2 measures how fast your organization changes how it works because of those capabilities. It includes policy, process, training, confidence, and the work of turning experiments into everyday practice. This clock moves much more slowly in most places.

When Clock 1 runs much faster than Clock 2, a gap opens. That gap is where your organization's fate lies. The smaller the gap, the better.

The catch is that Clock 1 isn't under your control. It's almost entirely external. Conversely, Clock 2 is almost entirely internal. You can't slow the first one down. But you can set the speed of the second.

Clock 1: AI Capability

What Speeds It Up

Most technologies follow an S-curve: a slow start, a period of rapid progress as forces compound (techniques combine, infrastructure matures, costs drop, talent deepens), then a plateau as returns diminish.

You might look at AI today, with new models, techniques, and applications arriving almost every week, and think we’re already in the steep part of the curve. In my view we’re still early, but three forces are pushing us toward that inflection point:

Reasoning and planning. Models can now break problems into steps, test alternatives, and adjust when they fail. That makes them useful on genuinely novel tasks, not just variations on familiar ones.

Orchestration across systems. Software can route parts of a problem to different specialized models or tools and then combine their outputs. One system may handle legal text, another financial calculations, another plain-language summaries. An orchestration layer breaks down a task, sends each piece to the right system, and integrates the results. A single improvement in one area can lift hundreds of workflows that use it.

AI used to build better AI. Teams use AI to write parts of training code, generate synthetic data, test outputs, and speed up experiments. Each contribution is narrow and supervised, but the cumulative effect is faster development. When the tools that build the next generation improve each cycle, the next generation arrives sooner. It is not AI autonomously improving itself; it is engineers using AI as a powerful tool, and even small productivity gains shorten improvement loops.

These forces reinforce one another. Better reasoning makes AI more useful for complex development tasks. Coordination across systems tackles bigger parts of the process. As AI contributes more to its own development, cycle times shrink. Each step makes the next faster. That is what makes a J-curve possible instead of the traditional S-curve plateau.

What Slows It Down

There are brakes, but they mostly delay rather than stop the curve.

Compute and energy constraints limit how large and fast models can grow. Training the most advanced models requires enormous amounts of specialized hardware and electricity.

Regulation may slow releases or constrain what data can be used. Different jurisdictions are setting different rules, adding complexity and compliance costs.

Economics, especially hardware and power costs, can cap how much companies invest. If training runs get expensive enough, fewer organizations can afford the frontier.

Even with these brakes, Clock 1 is unlikely to slow to the pace at which most organizations adapt. The gap between what is possible and what you are doing will keep growing unless you actively work to close it.

Clock 1 Is Not Your Constraint

Clock 1 is fun to watch. The frontier keeps advancing, new capabilities land weekly, and it is easy to imagine what might come next. There is a natural pull to track model releases, debate timelines, and speculate.

Clock 1 is not under your control, and it is not your constraint.

Consider a thought experiment: even if we had AI that was indistinguishable from perfect (infinite capability, zero cost, completely reliable), most organizations would still struggle. Not because the technology failed them, but because they never built the capacity to absorb it. The constraint is not what AI can do. The constraint is what your organization can do with AI.

Treat Clock 1 as the environment, not a project. You cannot manage it. You can only operate within it. Which means Clock 2 is where all your leverage lives.

Clock 2: Organizational Absorption

Clock 2 measures how quickly you turn potential into practice, the time between “this is possible” and “this is how we work.” You can speed it up by understanding what accelerates it and what holds it back.

What Speeds It Up

Speeding up Clock 2 starts with recognizing that different groups move at different speeds and need different things from you.

A small number adopt immediately, maybe two to five percent. These are your innovators: curious, comfortable with ambiguity, and willing to handle rough edges. They need permission and visibility, not much else.

A larger group, maybe fifteen to twenty percent, adopts once they see the innovators succeeding. These early adopters are pragmatic. They want examples, guidance, and reasonably polished tools.

Then comes the majority, maybe sixty to seventy percent. They adopt when it becomes standard practice, when not adopting feels like the exception. They need it to be easy, reliable, and expected.

A small group resists until they have no choice. They need clear expectations and consequences. The hard jump is moving from early adopters to the majority, and most efforts fail because those groups need different things.

Your absorption strategy should match this reality: one approach for innovators and early adopters, and a different one for the majority.

For the innovators and early adopters

These are the people who start the campfires. Your job is to make those fires visible and help them spread.

Give them permission to explore. Celebrate what they are learning, not just what they have proven. Make experiments visible through short demos, quick posts, and casual show-and-tell. The goal is not perfect documentation. The goal is judgment—what fits, what does not, how to verify outputs, how to weave AI into real workflows. People copy what they can see working near them; visibility beats memos.

Remove friction. Give access quickly. Let them try without approvals for every attempt. Shield them from bureaucracy. Their value is speed of learning, and friction erodes it.

But do not stop at exploration. Move them from experiments to expeditions. Experiments are about learning; expeditions are about reaching a destination. Hold them accountable for adoption, not just discovery. Success is measured by whether others follow, not just by finding something interesting.

For the majority

The majority will not adopt on curiosity or excitement. They adopt when it is the normal way to work, the tools are reliable, and expectations are clear.

For this group, adoption is an expectation. New hires learn it in onboarding. Managers are accountable for how their teams use it. Performance conversations include whether the work is being done with the best available tools.

What Slows It Down

Several predictable habits keep Clock 2 slow, and none of them are structural. They are all choices.

Endless pilots mean learning forever, changing nothing. Perfectionism means waiting for certainty, which guarantees you stay behind. Certainty only comes from production use at scale. Tool fixation means measuring access instead of changed work. Optional culture means making adoption voluntary, which ensures uneven pockets. Each feels safe in the moment. Over time they make slowness permanent unless you actively change them.

Clock 1 explains the pressure; Clock 2 explains the response. When they finally align, the pace of change becomes manageable, and meaningful progress starts to feel normal.

When the Clocks Sync

When your absorption speed roughly matches capability speed, everything gets easier and the work itself changes.

Anxiety drops because you are not always behind. New releases become routine rather than overwhelming. You can evaluate each development on its merits instead of treating everything as urgent.

Judgment improves because you have used the tools enough to know what matters. You are deciding based on what you have actually learned, not guessing from demos.

Real growth appears, not just efficiency gains. Most conversation focuses on efficiency: automation, speed, headcount. Those are real, but they miss the bigger opportunity.

The bigger opportunity is work that was not possible before. Spreadsheets did not just speed up math; they made financial modeling and scenario planning routine. AI offers a similar shift, only larger in scope. When drafting, analysis, or planning takes seconds, you iterate more, personalize more, and document more. You ask questions you never asked and explore ideas you would have dismissed.

That expansion of what is possible is where growth comes from. Not replacing people, but enabling higher-value work. Not doing the same with fewer resources, but doing more with the same. Not just cutting costs, but increasing revenue and impact.

Absorption compounds. Each workflow you modernize builds skills, templates, and trust that make the next one easier. Processes become reusable. A culture of continuous adaptation sustains itself.

The gap between fast and slow “absorbers” does not stay constant; it widens. An organization six months behind today may be a year behind next year and two the year after if rates stay the same. The fast absorber keeps the edge and extends it.

The inverse is also true. If you are behind and raise your absorption rate, you can close the gap quickly. You can skip to current capabilities and build around those. Being behind only becomes permanent if you stay slow.
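The arithmetic behind the widening and closing gap can be sketched in a few lines. This is a toy model, not a forecast: it assumes the gap is multiplied by a constant factor each year, and the factor of 2 is simply read off the six-months, one-year, two-years example above. The `project_gap` function and its parameter names are hypothetical.

```python
def project_gap(initial_gap_months: float, yearly_factor: float, years: int) -> list[float]:
    """Project the gap year by year, assuming it is multiplied by a constant factor.

    A factor above 1 means the gap widens; a factor below 1 means it closes.
    """
    if initial_gap_months < 0 or yearly_factor < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    gaps = [initial_gap_months]
    for _ in range(years):
        gaps.append(gaps[-1] * yearly_factor)
    return gaps


# Widening: six months behind, doubling each year (assumed factor of 2).
print(project_gap(6, 2.0, 2))    # [6, 12.0, 24.0]

# Closing: the same model with a factor below 1 after absorption speeds up.
print(project_gap(24, 0.5, 2))   # [24, 12.0, 6.0]
```

The only lever in this model is the factor, which stands in for how fast the organization absorbs relative to how fast capability moves.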

Wrapping up

You cannot set the pace of capability improvement, but you can set the pace of absorption.

Clock 1 will keep ticking. The capabilities will keep improving. The pace might slow a little because of external constraints, but it will not slow to match most organizations’ current absorption rate. If you want these clocks to sync, you have to speed up Clock 2.

Start by acknowledging that different groups in your organization move at different speeds. Support the innovators differently than the majority. Make the campfires visible. Build reliable tools for people who are not enthusiasts. Make adoption an expectation, not an option. Think in tasks, not tools. Make AI a reflex, not a special consideration.

Over the next few years, absorption speed will shape your trajectory more than any single model release. The organizations that thrive will not be the ones with perfect information or zero risk, but the ones that decide quickly, change how work gets done, and learn in motion.

The gap between these two clocks will define who grows and who stagnates, not because of any specific AI application, but because of compounding advantages.

Clock 1 will keep ticking whether you act or not. Clock 2 is all on you. Sync the clocks while the gap is still small enough to close.

The Telescope Problem

2025-07-07T00:00:00Z

Why comparing AI to humans can only take us so far.

Seeing Further, Not Just More Clearly

When early telescopes were developed, they were judged by how well they extended ordinary sight. The clearer and sharper the image, the better the tool. At first, the goal was to see farther using the same frame of reference — just more of it. But over time, telescopes changed. They began to detect forms of light and energy the eye couldn’t perceive. Radio waves, ultraviolet bands, gravitational signatures. These instruments were no longer just improving vision. They were shifting how we observed and what we could know.

Artificial intelligence is moving through a similar transition. In the beginning, we evaluated systems by comparing them to familiar human abilities. Could a model solve math problems like a student? Could it write like a journalist or reason like a doctor? These were understandable questions. They helped define scope and give a sense of progress. But like early judgments of telescopes, they reflect a limited frame.

The Human Benchmark

Comparison to human performance has been a useful starting point. It provides orientation when technology is new and evolving quickly. In many cases, it is still the best available proxy for usefulness or safety. But if it becomes the dominant lens, it can mislead us. Systems that appear “better than human” at a specific task may still be hard to trust, difficult to use, or poorly integrated into existing workflows.

Framing AI systems as human equivalents also affects how people respond to them. Telling someone a model is more accurate or more efficient may signal that their role is at risk. When systems are introduced this way — even implicitly — they can generate friction or disengagement. This is especially true in professional settings where identity, experience, and responsibility are closely held.

Terms like “digital worker” or “agentic workforce” often reinforce this problem. They imply symmetry where there is none. They suggest that the goal is to replace rather than support. The result is that the people who are essential to the success of the system may become the ones most reluctant to use it.

Adoption Follows Usefulness

The most widely adopted AI systems today do not succeed because they outperform people on standardized tasks. They succeed because they are adaptable, accessible, and easy to apply to a wide range of real-world situations. ChatGPT is one example. It offers general-purpose assistance without asserting authority. It allows for experimentation without requiring buy-in. GitHub Copilot is another. It integrates into a developer’s existing workflow, offering suggestions without interrupting pace or control.

These systems do not compete with human expertise. They offer support in areas where speed, scale, or flexibility matters. Their success is less about capability in isolation and more about context — how they fit, how they feel, and how quickly they become part of a rhythm.

The telescope analogy applies here, too. The early instruments that tried to sharpen what we already saw were eventually surpassed by those that helped us observe things we could not otherwise detect. But those tools were only valuable if people were willing to use them — and if the insights they generated could be acted on.

The Shrinking Mirror

There is also a limit approaching in how far human comparison can take us. Many benchmark tasks — language exams, coding problems, visual classification challenges — have already been met or surpassed. In more complex domains, there may be no single human baseline to compare to. And in many frontier areas — such as modeling proteins or genomes — systems now operate in ways that have no direct human analogue at all.

As the models become more capable, the frame of human equivalence becomes less meaningful. Not just because it underestimates the systems, but because it under-describes what they are doing. We do not evaluate a radio telescope by how closely it replicates sight. We evaluate it by whether it reveals something we would otherwise miss.

New Instruments

If AI is to reach its full potential, we will need new ways to assess what matters. Instead of asking whether a model is better than a person, we might ask: Does it improve the quality of decisions? Does it reduce the time to insight? Does it surface options that were not previously considered? Does it make people more confident in uncertain conditions?

These are quieter questions. They don’t lend themselves to headlines or leaderboards. But they are the kinds of questions that emerge when AI becomes embedded in actual work, not just in abstract contests.

This shift won’t happen all at once. Comparison to human ability may remain useful for certain regulatory, safety, or onboarding decisions. But over time, our understanding of what makes a system valuable will evolve — just as our understanding of what makes a telescope powerful evolved when we stopped asking it to behave like a better eye.

A Shared Direction

The most powerful tools do not just reflect our capabilities back to us. They help us work in new ways, see new patterns, and act with new perspective. To do that, they need to be introduced with care. Not as replacements for human roles, but as extensions of human systems.

We cannot build communities around systems that feel like threats. But we can build shared momentum around tools that help people feel more capable, more informed, and more involved in the work ahead.

The telescope didn’t make the eye obsolete. It changed what the eye could reach.

AI might do the same — if we stop asking it to mirror us, and start asking what it might help us see next.

Prompt Reengineering

2025-06-16T00:00:00Z

Why the next era of AI is built on context, not clever inputs

“The medium is the message.” — Marshall McLuhan

As McLuhan understood, technologies reshape not just what we can do, but how we think about doing it. When a new technology enters the mainstream, it brings with it not just new capabilities, but new patterns of interaction, ways of thinking, working, and making sense of what’s possible. Some of those patterns endure. Others are transitional: scaffolding that helps us get from one stage to the next.

Prompting has played that role in the early stages of language model adoption. It emerged as both an interface and a skillset, offering a way to bridge the gap between open-ended models and human intent. In the absence of structure, we learned to create our own: writing prompts that resembled mini-programs, layering clarity, tone, and goals into a few lines of carefully constructed text. It worked, but only for those who knew how to do it.

But prompting, as a primary interaction model, reflects a particular moment in the evolution of these systems. It belongs to an era when the model knew very little beyond what was placed directly in front of it, when context was thin, and memory (if it existed at all) was short-lived. In that environment, the burden was placed almost entirely on the user to be precise, complete, and imaginative.

That’s beginning to change. Not because prompting no longer works, but because the system is starting to carry more of the weight. Context now stretches across sessions. Models are increasingly grounded in external sources of truth. Tools and APIs provide structured access to knowledge and actions. And intent detection, once brittle, is growing more robust. All of this shifts the balance between what the user must supply and what the system can infer.

The result isn’t the disappearance of prompting, but its gradual reframing. And that transition, subtle as it may seem, opens the door to a much broader and more inclusive range of uses.

This progression is easiest to see in everyday use. Take a simple task: analyzing last quarter's sales data. Even a few months ago, this required a paragraph of context: “You are a business analyst. Here is our sales data in CSV format. Please calculate year-over-year growth...” Today, you might simply ask: “How did we do last quarter?” The system already knows who you are, can access your data, and remembers what metrics you care about from previous conversations.

This isn’t a rejection of what came before. It’s a continuation, another point on a trajectory that has always moved toward interfaces that ask less, understand more, and quietly adapt to the people using them.

What Changed (and Why Now)?

The move away from explicit prompting isn’t the result of a single breakthrough. It’s the outcome of several quiet but compounding changes in how these systems are architected and deployed. Together, they reduce the amount of work required from the user, not by asking less of the system, but by changing how the system understands what it’s being asked to do.

Perhaps the most significant change is the expansion of context. In the earliest language models, every interaction began from a blank slate. Whatever understanding the model had was confined to a single prompt and the static parameters of its training. Today, context stretches further. Models retain memory across sessions. They reference prior exchanges, retrieve relevant data from external sources, and incorporate tool outputs into their reasoning. This broader context window means that the user no longer needs to re-establish the conversation every time. You can pick up where you left off, just as you would with a colleague.

This development is also deeply tied to the emergence of structured grounding. Rather than relying purely on general training data, models now frequently draw from curated sources: search results, documents, APIs, and internal tools. These connections allow the model to respond not just based on patterns, but based on facts, anchoring responses in specific, verifiable inputs. That means the user’s prompt doesn’t need to carry all the necessary information. It can simply point the model in the right direction.

The third transformation is in intent recognition. Early models could be powerful but brittle, hyper-literal in some cases, distractible in others. Today’s models are better at inferring what you’re likely trying to do and filling in the blanks with appropriate defaults. It’s not perfect, but it’s improving rapidly. And as that capacity strengthens, the need for the user to spell everything out declines.

What we’re seeing isn’t just smarter models, it’s a quiet reallocation of effort. The system is being asked to do more. The user, by design, is asked to do less. The prompt still exists, but it no longer carries the full weight of the interaction. It’s more like a pointer than a payload, enough to set the direction, but not to define the path.
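One way to picture "pointer, not payload" is a rough sketch in which the system wraps a short prompt in context it already holds. Everything here is an assumption made for illustration: `UserContext`, `assemble_request`, the field names, and the sample values are hypothetical and do not describe how any particular product works.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    """Hypothetical state a system might retain between sessions."""
    name: str
    role: str
    preferred_metrics: list[str] = field(default_factory=list)
    data_sources: dict[str, str] = field(default_factory=dict)


def assemble_request(prompt: str, ctx: UserContext) -> str:
    """Combine a short user prompt with context the system already holds."""
    lines = [
        f"User: {ctx.name} ({ctx.role})",
        "Metrics this user cares about: " + ", ".join(ctx.preferred_metrics),
    ]
    for label, location in ctx.data_sources.items():
        lines.append(f"Available data: {label} at {location}")
    # The user's own words come last and stay short: the pointer.
    lines.append(f"Request: {prompt}")
    return "\n".join(lines)


# Illustrative values only.
ctx = UserContext(
    name="Sam",
    role="business analyst",
    preferred_metrics=["year-over-year growth", "revenue by region"],
    data_sources={"quarterly sales": "sales_q.csv"},
)
full_request = assemble_request("How did we do last quarter?", ctx)
print(full_request)
```

The user typed six words; the remaining lines came from what the system remembered. That is the reallocation of effort in miniature.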

Of course, this transition isn't uniform. Complex creative tasks, specialized domains, and novel requests still benefit from careful prompting. But for everyday interactions (the queries and tasks that make up most usage), the burden is increasingly moving from user to system.

That realignment isn’t always visible, but it’s foundational. It changes what the system needs to be good at. And it changes what the user needs to do to be successful.

The Path Ahead

On the surface, these changes can seem subtle. Less prompting. More memory. A bit more helpfulness in how the model interprets intent. But taken together, they mark a deeper evolution, one with wide-ranging implications for accessibility, adoption, and system design.

First, there’s the matter of access. When success with AI depends on knowing how to phrase things “just right,” the circle of effective users stays narrow. As prompting recedes, the interface becomes more forgiving. You don’t have to be clever. You don’t have to be precise. You just have to show up and ask. That lowers the barrier to entry, not just technically, but psychologically.

It also opens up new patterns of use. Interactions that once felt like single transactions start to behave more like ongoing relationships. You don’t need to reestablish context each time. You can build on what’s already there. That continuity allows for more compound, cumulative work and makes it easier to embed AI into the rhythm of existing workflows.

For organizations, this prompts a rethinking of how systems are evaluated and built. If prompting is no longer the bottleneck, then the emphasis moves to what surrounds the model: memory, orchestration, tool routing, knowledge integration. These aren’t peripheral concerns, they’re the new core. Designing the environment around the model becomes just as important as tuning the model itself.

This transition also matters because it makes room for a different kind of scale. Systems that rely on carefully crafted input don’t scale well across a diverse user base. In fact, strong prompting requirements often work against broad adoption. Systems that learn from usage, that hold state, that retrieve and route intelligently, those can be used broadly without specialized instruction. They’re more robust to variation. More resilient to ambiguity. More aligned with how people actually work.

This movement points toward a future where AI isn't just more capable, but more equitable. When grandparents can get the same results as prompt engineers. When non-native speakers aren't penalized for phrasing. When expertise means knowing your domain, not how to talk to the model, that's when these tools achieve their real promise.

This is how AI becomes something more than a tool to be mastered. It becomes an environment to step into: one that adapts to you, not the other way around.

Think Different

2025-05-28T00:00:00Z

Agents Don't Work Because They’re Like Us. They Work Because They Aren’t.

The path to AI agents in the enterprise isn't about making them more like us—it's about understanding why their fundamental differences create value.

Writing Amplified, AI Transforms

As writing took hold in ancient Greece, Socrates feared it would destroy memory. And in a sense, he was right: we stopped memorizing epic poems and genealogies. But writing didn't diminish us; it freed our minds to build complex arguments, track ideas across centuries, and, surprisingly, write more than ever.

Today, the average person writes more in a week than their grandparents did in a year. The tool we built to record thought transformed us into people who think through writing. We didn't just adapt to the technology; we co-evolved with it.

AI agents are poised to drive a similar transformation—not by mimicking us, but by amplifying us precisely because they're fundamentally different.

Amplification Becomes Transformation

Throughout history, tools initially amplify human capabilities and then profoundly transform them. Clocks, built for simple timekeeping, enabled synchronized human behavior, industrial civilization, and the very concept of "being late." The printing press, intended to replicate manuscripts faster, democratized knowledge, creating newspapers, scientific journals, and mass literacy. Microscopes, designed for enhanced sight, revealed invisible worlds, reshaping our view of humanity as ecosystems rather than isolated individuals.

Each innovation followed the same pattern: amplification led to transformation, and each tool's uniquely nonhuman qualities became central to new human capabilities. This same pattern is unfolding now with AI agents.

Three Tiers of Intelligence

Early enterprise AI efforts tried making agents think like humans. But successful adopters have found greater value by embracing AI's fundamentally different intelligence, settling into three distinct operational tiers:

Tier 1: Operational Liberation (Agent-to-Agent). AI agents handle routine tasks with tireless patience and consistency. No ego, no fatigue, no boredom: they coordinate supply chains and financial systems without flagging. This doesn't replicate human thinking; it liberates human attention from tasks that never benefited from human judgment.

Tier 2: Consequence-Free Exploration (Human-to-Agent). Collaboration with AI agents creates an intellectual safe space. Agents feel no shame, fear no embarrassment, and protect no reputation. Humans can freely explore risky ideas (testing radical strategies, reorganizations, or business models) without social consequence.

Tier 3: Enhanced Human Judgment (Human-to-Human). Human interactions become richer because people arrive better prepared. After extensive agent collaboration, humans engage more deeply with the complex, nuanced decisions that genuinely move markets.

These tiers aren’t hierarchical but complementary, leveraging the nonhuman strengths of AI to amplify human cognitive capacity.

Compound Cognitive Capacity

Each tier compounds human capacity uniquely:

Operational clarity from agent-to-agent interactions frees cognitive bandwidth.

Human-agent collaboration expands cognitive exploration, enabling rapid iteration without social costs.

Human-to-human interactions become more sophisticated, focusing on complex judgments and decisions rather than basic information transfer.

Successful enterprises recognize that AI's lack of ego, comfort with repetition, and immunity to social pressure are strengths, not limitations. They're building systems around these differences as cognitive multipliers.

The Necessary Cultural Shift

Technology alone won't drive transformation. Organizations struggling with AI adoption often try integrating AI into existing human workflows. Successful adopters redesign workflows around complementary intelligences.

This cultural shift requires:

Clearly distinguishing "laboratory" work (human-agent) from "public" work (human-human), reframing failure with agents as iteration rather than risk.

Adjusting performance metrics, rewarding quality of problems explored and contributions to collective judgment, not just individual output.

Cultivating dual trust: trust in AI agents to explore freely and trust in colleagues to value and respect refined thinking.

Whither Humanity?

AI won't simply push humans "up" to creativity and empathy. Instead, humans will increasingly engage with irreducibly human dilemmas (ethical choices, market-entry risks, stakeholder negotiations) where multiple valid solutions exist. These messy, complex interactions become central to value creation, precisely because they're human.

The frictionless clarity provided by AI makes these genuinely difficult challenges more visible, frequent, and essential. Advanced adopters already find that work feels harder because what's left—the complex human dilemmas—is inherently difficult and valuable.

Preparing for Ambiguity

Leading organizations are shifting talent strategies, valuing people comfortable with ambiguity, conflicting ideas, and incomplete data. Expertise becomes cheaper; judgment becomes premium.

Teams move away from pure functional expertise toward "productive tension": diverse perspectives generating insight through friction. Humans excel in divergent thinking (finding the right questions) while AI handles convergent thinking (finding correct answers).

Amplifying the Irreducibly Human

AI doesn't work because it thinks like us—it works precisely because it doesn’t. The enterprises succeeding with AI understand this deeply. They're not waiting for better AI; they're creating cultures and structures that amplify human capabilities through AI’s unique strengths.

Just as writing transformed humanity by amplifying memory, AI agents will transform us by amplifying our most profoundly human capabilities. The question isn't whether AI will transform work. It's whether we'll transform ourselves to harness what AI uniquely makes possible. And that transformation—like every one before it—isn't about technology. It's about embracing tools whose differences reveal human strengths we never knew existed.

The New Builder Revolution (Won't Be Televised)

2025-04-29T00:00:00Z

Software creation is changing. Counterintuitively, the next wave of innovation probably won’t come from where you expect.

A Quiet Start

Not every major shift announces itself. Sometimes change builds slowly, in the background, before most people realize what has happened. Today, AI is starting to reshape how software is built. The tools are spreading beyond traditional developers, lowering barriers and making it possible for more people to create working digital products.

At first glance, this shift is easy to dismiss. The workflows feel unfamiliar. The early outputs seem rough. But like past shifts in technology, the real change happens well before it looks polished.

A New Class of Builders Is Emerging

AI is expanding the circle of who can build functional software. New builders are emerging from adjacent fields like design, product management, and operations. Some are coming from entirely different backgrounds. They’re using AI tools to generate code, refine it, and ship real applications. They may not follow traditional development workflows, but they are solving real problems and delivering real products.

In many ways, they behave more like digital creators than engineers. They move quickly, iterate openly, and focus less on writing perfect code and more on achieving real outcomes.

There are two early signs that a real shift is underway: first, the reaction of the old guard; and second, the speed at which tool innovation is outpacing institutional adoption.

1. The Old Guard Is Reacting Predictably

Every time technology shifts, those most invested in the old ways react with skepticism. When cloud storage first appeared, storage engineers dismissed S3 as unreliable. When EC2 launched, many IT leaders said it might be useful for development work but would never be trusted for production. Today, as AI-assisted development grows, traditional engineers often respond in much the same way. They label it “vibe coding” and question its seriousness.

These concerns shouldn’t be dismissed outright — they often reflect deep knowledge about stability, scalability, and security. But similar resistance has accompanied every major shift in computing. Historically, the louder the skepticism, the more profound the change tends to be.

2. Tool Velocity Is Outrunning Institutions

The second indicator is the pace of tool innovation. AI tools are improving faster than institutions can evaluate, integrate, or absorb them. Every few months, new models, frameworks, and assistants appear — each expanding what individuals can accomplish. Organizations, built around slower cycles of evaluation and adoption, are falling behind.

When technology moves faster than institutions can adapt, advantage shifts. The newcomers — unburdened by legacy systems and process — are often better positioned to act.

Innovation Will Come from Unexpected Places

The most successful organizations in this new era won’t necessarily be the largest or the best known. They’ll be the ones that recognize change early and move without friction. They’ll have enough experience to understand where older methods still matter — but enough flexibility to abandon them when better options appear. They’ll avoid being trapped by process for its own sake. They’ll focus on outcomes over orthodoxy.

This is not a new pattern. Amazon built AWS before “cloud computing” was even a noun — simply because it identified an internal need and moved without waiting for external validation. The teams that succeed now will share that same DNA: pragmatic, unencumbered, and willing to rethink what building looks like.

Skills and Mindsets Are Evolving

The skill set required to succeed is changing. Deep technical knowledge remains important, but it is no longer the only path to meaningful contribution. Skills like problem framing, judgment, iteration speed, and the ability to steer AI tools are becoming just as essential.

Building effective software now looks less like crafting every detail by hand and more like shaping and refining through fast feedback loops. The comparison to early YouTube is instructive: the most successful creators weren’t the ones with the best cameras or editing tools, but those who understood how to work with the medium as it evolved.

The same is happening with software. Teams that can experiment rapidly, respond to feedback, and discard failing ideas without friction will move faster than those tied to traditional rhythms. Playfulness — in the sense of low-cost experimentation and resilience to failure — is becoming a competitive advantage.

This Isn’t a Zero-Sum Shift

This isn’t a binary handoff where traditional developers are replaced by AI-assisted newcomers. Instead, a spectrum is emerging, with innovation happening across the full range of contributors.

Traditional software development is experiencing its own renaissance — powered by tools like Warp, Claude Code, Q Chat, and GitHub Copilot X. These tools aren’t replacing expertise — they’re amplifying it. Debugging is faster. Documentation is easier. Complex implementations are more efficient. Developers are integrating AI into their workflows, becoming dramatically more productive while maintaining professional standards.

At the same time, product managers, designers, and domain experts are using AI to build solutions that once required dedicated engineering teams. The result isn’t substitution — it’s expansion. More people building. More ideas shipped. More creative and technical leverage across the board.

While some predict AI will reduce the need for developers, history suggests otherwise. When creation becomes more accessible, we don’t see fewer creators — we see more creation, more specialization, and more innovation.

Software Is Becoming a Creative Medium

Software development is no longer just a technical exercise. It’s becoming more fluid, expressive, and collaborative. Lower barriers to entry are pulling more people into the act of building. The cost of trying new ideas is dropping. Entirely new categories of digital products and workflows are starting to emerge.

And the shift is happening not just at the tool level, but at the platform level too. Canva Code, launched recently, is a striking example — a visual, drag-and-drop environment where anyone can assemble real software experiences without writing traditional code. It points to a future where digital creation is as accessible — and as creatively open-ended — as building a presentation or editing a video.

The change mirrors what happened with photography when smartphones put cameras in everyone’s hands: not just better pictures, but new genres, new habits, and new forms of expression. Software is following the same path.

Alli McKee launching Canva Code at Canva Create 2025

The Future Is Already Being Built

The new builders are not waiting for approval. They’re using the tools available now, moving quickly, and learning in public. Their work doesn’t always look familiar or polished — but history suggests that by the time these changes are widely recognized, the landscape will already have shifted.

For organizations navigating this moment, the way forward isn’t wholesale replacement — it’s thoughtful integration. The most successful will:

Identify where AI amplification delivers the most immediate value

Create space for experimentation outside traditional workflows

Build bridges between old and new approaches

Focus relentlessly on outcomes, not methods

The key insight isn’t that traditional development is going away — it’s that the surface area of who can build, and what they can build, is expanding dramatically. The organizations that thrive will embrace both the rigor of established practices and the possibility of new approaches.

The revolution won’t happen overnight, and it won’t erase what came before. But those who recognize the shift — and lean into it with clarity and intent — will help define what comes next.

The revolution won’t be televised. But it’s already underway.

Revert to Type: The Unexpected Rise of Text-First Interfaces

2025-03-10T00:00:00Z

For decades, the trajectory of technology has been to make interactions more visual, more graphical, and seemingly more intuitive. From GUIs to touchscreens to voice interfaces, we’ve optimized for ease of use. And yet, there’s an unexpected countertrend emerging: text is becoming the interface of choice for some of the most advanced workflows.

At the same time, the cost of working with text—generating it, analyzing it, translating it, summarizing it—is collapsing toward zero. Large language models can now write functional code, generate detailed reports, extract insights, and automate complex tasks in seconds.

Counterintuitively, even as text-based work is increasingly automated, the value of text as an interface is on the rise.

The iPhone and the Cost of High-Fidelity Interfaces

The rise of the iPhone was largely due to a revolutionary user interface—rich, fluid, and intuitive. Early on, it was even skeuomorphic: ornamental design elements derived from real-world counterparts (the Notes app resembling a yellow notepad, or the Calendar app looking like a leather-bound desk calendar). This approach helps build a 'bridge' to new interfaces, making it easy for users to explore, understand, and interact intuitively. It also looks great.

This level of fidelity, however, comes at a high cost. Rich interfaces require significant design effort, engineering resources, and ongoing maintenance. But because the value is also high, great apps with polished, visually rich interfaces have thrived on the iPhone as a platform.

Contrast that with today’s new platforms—LLMs and AI-powered systems. For these, the need for high-fidelity graphical interfaces is dramatically lower, sometimes even superfluous. Their power comes not from glossy visuals but from raw functionality, flexibility, and intelligence. A well-crafted prompt can accomplish more than an elaborate UI, and systems like Claude Code and Amazon Q demonstrate that plain-text interactions can be remarkably effective.

Why Text as an Interface Works So Well

There’s a reason tools like Claude Code, Amazon Q, Warp, and terminal-based development environments are gaining traction: text is a powerful, flexible, and efficient medium for interaction. Consider the following benefits:

Transparency & Inspectability – Text naturally preserves history. You can scroll back, see every step of reasoning, every command executed, and every output. No need for special debugging tools; the record is just there.

Composability & Chaining – Unix popularized the idea that small, simple text-based commands can be strung together to perform sophisticated operations. Text-based AI tools are reviving that philosophy, making it easy to pipe the output of one process into another. Indeed, this is how most agent systems work.

Ease of Iteration – Graphical interfaces often require clicking through layers of menus to make small changes. With text, making modifications is as simple as editing a line and rerunning a command.

Rapid Development & Deployment – Building features in text interfaces is often faster and less resource-intensive than designing full graphical experiences. This agility is crucial for AI-assisted workflows, where iteration speed is key.

Universal Compatibility – APIs, scripts, and automation tools all natively understand text. A text-based interface is inherently easier to integrate with other systems than a proprietary graphical one.

Personalization & Adaptability – Rigid graphical interfaces tend to be polarizing—you either love them or hate them. Text-based interfaces, on the other hand, can dynamically adapt to user preferences. Whether you prefer more or less information, a terse or verbose style, or a structured or freeform layout, text can be customized in ways that fixed UI elements cannot.
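The composability point in the list above can be sketched in a few lines. This is a minimal illustration, not a description of any real tool: `grep`, `cut`, `sort_unique`, and `pipeline` are hypothetical Python helpers loosely modeled on the Unix commands of similar name, and the log lines are made up.

```python
from functools import reduce
from typing import Callable, Iterable

# A stage takes lines of text and returns lines of text, so stages chain freely.
Stage = Callable[[Iterable[str]], Iterable[str]]


def grep(needle: str) -> Stage:
    """Keep only the lines that contain `needle`."""
    return lambda lines: (line for line in lines if needle in line)


def cut(index: int, sep: str = " ") -> Stage:
    """Keep a single field from each line."""
    return lambda lines: (line.split(sep)[index] for line in lines)


def sort_unique() -> Stage:
    """Drop duplicates and sort what remains."""
    return lambda lines: sorted(set(lines))


def pipeline(*stages: Stage) -> Stage:
    """Feed the output of each stage into the next, left to right."""
    return lambda lines: reduce(lambda acc, stage: stage(acc), stages, lines)


# Made-up sample input.
log = [
    "ERROR db timeout",
    "INFO cache warm",
    "ERROR api refused",
    "ERROR db timeout",
]
failing = pipeline(grep("ERROR"), cut(1), sort_unique())
result = list(failing(log))
print(result)  # ['api', 'db']
```

Because every stage speaks plain text, adding a step means adding one argument to `pipeline`, with no change to the stages around it.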

Text-First Interfaces: A Time-Tested Approach

While this trend might seem novel, text-based expert systems have long been dominant in domains where speed, accuracy, and flexibility matter most. Consider:

Vim & Command Line Tools – Developers have long preferred text-based environments for coding because they are fast, scriptable, and infinitely extensible.

Airport Consoles – Airline employees rely on cryptic but powerful text-based reservation and scheduling systems to manage millions of passengers efficiently.

Bloomberg Terminals – Financial professionals depend on dense, text-heavy interfaces to access real-time market data, execute trades, and analyze trends.

Hotel & Banking Systems – Many backend systems in the hospitality and financial industries still prioritize text interfaces due to their speed, reliability, and structured workflows.

All of these examples demonstrate that when performance, transparency, and speed matter, text can provide a superior interface. Now, with AI making text manipulation nearly effortless, we are seeing a renaissance of these principles applied to a broader range of tools and use cases.

Chatting with Amazon Q on the command line (retaining anthropomorphic, first-person personality, and an interesting "acceptall" mode)

A New Design Language is Emerging

We are witnessing the rapid development of a new design paradigm—one that is text-heavy and function-first. Prompts are just the beginning. Expert AI systems like Claude and Q are moving beyond the simple chatbot model into minimalistic, structured text-based interfaces that prioritize clarity and efficiency over personality or aesthetics. Unlike the anthropomorphic approach of early AI, these tools are designed for precision, offering concise responses, deep insights, and structured outputs with minimal fluff.

This shift reflects a move away from designing interfaces for casual consumers toward interfaces built for power users who demand efficiency, transparency, and control. Instead of asking AI to mimic human conversation, these systems are being honed to function more like powerful Unix-style utilities—modular, predictable, and capable of being combined in creative ways.

The Discoverability Tradeoff: Why It Matters Less in Expert Systems

One of the biggest arguments in favor of graphical interfaces is discoverability—icons, menus, and tooltips guide users through what’s possible. With text-based systems, the concern is that users need to know what to type or ask for in order to unlock the system’s potential.

But in enterprise and expert contexts, this matters far less:

Expectation of Training – Unlike consumer apps, where zero onboarding is the goal, expert systems assume a level of proficiency. Bloomberg terminals, hotel and airline reservation systems work because professionals are trained on them.

Embedded Knowledge & AI Assistance – AI-infused text interfaces can provide inline suggestions, autocomplete, and adaptive prompts to help users discover functionality naturally over time.

Efficiency Over Exploration – In expert settings, users often prioritize speed and precision over discovery. A financial analyst doesn’t want to browse; they want a fast path to the exact data they need.

This suggests that while text interfaces may feel "hidden" at first, they actually align well with environments where mastery, not casual exploration, is the priority.

Claude Code, complete with interesting text-first design elements (note lack of anthropomorphic features or first-person commentary)

The Economics of Text-First Interfaces: Faster, Cheaper, More Capabilities

One of the most overlooked advantages of text-based interfaces is their economic impact. Lowering the cost of delivering new capabilities doesn’t just make existing workflows more efficient—it fundamentally changes the rate at which innovation happens.

Lowering the Cost of Feature Delivery

Traditional software development requires designing, testing, and maintaining complex graphical interfaces, which can be a bottleneck for shipping new features.

Text-based systems, especially AI-driven ones, allow for rapid deployment of new commands, integrations, and workflows with minimal UI overhead.

This means developers and enterprises can push new capabilities faster and at lower cost, without worrying about UI constraints or redesigns.

The Flywheel Effect of Faster Integration

Because adding new features is cheaper, more features get built.

Because more features get built, users demand (and expect) more integrations.

Because integration is easier in a text-based paradigm, systems become more interconnected and composable.

The result? An acceleration of innovation cycles, where new functionality is not just possible, but inevitable.

A Shift in Software Economics

In the GUI era, software differentiation often came from how features were presented and designed.

In a text-first paradigm, differentiation comes from what features exist and how seamlessly they integrate with other tools.

The winners will be the platforms that enable rapid, frictionless innovation—where capabilities, not just UI, define competitive advantage.

This economic shift suggests that as text-based expert systems gain traction, we won’t just see more efficient workflows—we’ll see an explosion of new capabilities emerging faster than ever before.

Warp's new 'Dispatch' mode includes help with installation, coding, and explanations.

The Future of Text-Based Interaction

We’re at an inflection point where interacting with computers via text isn’t just about writing prompts; it’s about treating text as the interface itself. The traditional terminal is being reimagined with AI-assisted workflows, and we’re seeing the resurgence of command-line-like efficiency in new contexts.

Rather than designing ever-more-complex visual experiences, we might find that the most powerful interfaces are the simplest ones—just words on a screen. And as AI continues to drive the cost of text manipulation toward zero, the strategic advantage will shift toward those who know how to harness text’s unique strengths.

The question isn’t whether text will replace graphical interfaces entirely—it won’t. But in an age where AI can work seamlessly with natural language, we may find that returning to text-based interactions gives us more control, clarity, and speed than we ever expected.

Epilogue & Notes

A few further thoughts.

1. The Future of Text-Based Interfaces: Dynamic UX on Demand

Right now, the shift toward text-based expert systems is driven by efficiency, transparency, and adaptability. But as AI advances, we may see an evolution where traditional UX isn't eliminated, it just becomes dynamically generated when needed. Instead of static apps with predefined user experiences, AI could generate tailored interfaces on demand, optimized for each task, user, or context. AI systems could infer your preferred level of detail, structure, and interactivity, delivering a UI that adapts in real time (I'm pretty certain this is the path Apple and others will take, building on SwiftUI and UIKit).

The paradox here is that while text-based interfaces may replace some traditional UI, AI may make UI more fluid and context-aware than ever before—rendering it precisely when and where it’s actually useful.

2. The Limits of Text-First UX: Where Visual Richness Still Wins

There are, of course, entire fields where text-first interaction simply isn’t enough, or where visual interfaces are not just useful, but necessary. It’s hard to imagine a text-first Photoshop where you type “increase contrast” instead of using sliders and tools. While AI is advancing in creative domains, visual interfaces remain essential for tasks like design, video editing, and animation.

Architecture, engineering, and game development rely on manipulating complex visual objects. AI might assist, but text alone won’t replace these workflows. Still other software isn’t about efficiency but discovery. Think of music production, video editing, or scientific visualization—fields where interaction and iteration are inherently non-linear.

I could also imagine a hybrid world: could we see a text-first Photoshop for expert users, where someone types “remove background, enhance shadows, sharpen edges” instead of manually adjusting layers? Maybe. I'm sure someone will try (and I'll be first in line).

:wq

Reclaim The Silo

2025-03-04T00:00:00Z

Why specialized teams—not isolated kingdoms—accelerate enterprise AI

For years, "silos" have been a dirty word in enterprise strategy. The idea that different teams, departments, or systems operate in isolation (building their own solutions, using separate tools, and failing to communicate) has been framed as a fundamental flaw.

But when it comes to AI impact, we need to rethink this. Let's be clear: I'm not advocating for communication breakdowns or duplicate work. I'm championing focused teams with specialized expertise that can move fast and build AI that actually works.

These aren't your grandfather's silos. These are centers of excellence with the freedom to innovate.

Why Silos Work for AI

The right kind of silos—specialized teams with deep domain knowledge and the autonomy to act—accelerate AI impact in ways centralized approaches can't match.

The rapid evolution of AI isn't happening in a centralized, top-down manner. Instead, it's emerging from specialized applications where expert knowledge meets specific business challenges. AI adoption isn't just about rolling out a general-purpose chatbot across the enterprise; it's about building tailored, high-value expert systems that deliver meaningful impact.

And for that, silos are not a bug—they’re a feature. Here’s why.

Hyper-Specialization Drives Better AI Models. AI thrives on domain expertise. A finance department using AI to detect fraud has vastly different needs from a customer support team deploying AI-driven ticket resolution. When teams work independently, they can fine-tune models to their exact requirements rather than being constrained by a one-size-fits-all solution.

Faster Experimentation, Less Bureaucracy. Centralized AI initiatives often get bogged down in committees, policy reviews, and cross-functional debates. But when teams have autonomy, they can move quickly: experimenting with different models, integrating them with their workflows, and refining them based on real-world use cases. This speeds up adoption where it matters most.

Localized AI Wins Create Enterprise-Wide Momentum. Success in AI is often incremental. A high-performing AI model in one department can serve as a proof of concept for others, leading to organic adoption rather than forced mandates. When one silo successfully deploys AI, others take notice and follow suit, adapting the lessons learned to their specific needs.

Resilience Through Redundancy. A single, monolithic AI system is a single point of failure. If it underperforms, the entire organization suffers. But when multiple teams develop their own AI solutions, an underwhelming chatbot in one division doesn’t derail an effective supply chain optimization model in another. AI silos create resilience by ensuring that innovation is diversified across the organization.

These advantages of specialized teams—better models, faster experimentation, organic momentum, and resilience—create a compelling case for domain-focused AI. But to truly maximize impact, organizations need a framework that balances autonomy with strategic coordination.

Rethinking AI Strategy: From Adoption to Impact

The real advantage of specialized teams isn't just faster AI adoption—it's delivering transformative business outcomes that centralized approaches struggle to achieve. Here's how domain-focused teams drive superior AI impact:

Domain-specific innovation outperforms generalized solutions. When teams deeply embedded in specific business functions build AI, they discover breakthrough applications that central AI teams would never identify.

Accountability drives real-world performance. Specialized teams feel the direct consequences of their AI's performance, creating a powerful incentive for continuous improvement. When the same team that builds the AI must live with its results, models evolve faster and perform better.

Business metrics replace technical benchmarks. Domain-embedded teams naturally optimize for business outcomes rather than technical metrics. They're less likely to celebrate model accuracy improvements that don't translate to business performance.

Specialized teams break through the last-mile problem. The biggest challenge in AI isn't building models—it's integrating them into workflows where they create value. Teams with deep domain expertise overcome this by designing solutions that fit seamlessly into existing processes.

One potential risk of embracing specialized teams is the unintended creation of fragmented organizational structures, where duplication of effort and knowledge silos lead to inefficiencies. But specialized doesn’t have to mean disconnected. By explicitly designing clear channels for regular cross-team communication—such as quarterly cross-functional AI showcases or knowledge-sharing hubs—organizations can avoid reinventing the wheel.

The Data Paradox: Boundaries Create Flow

The conventional wisdom suggests that breaking down all data boundaries leads to better AI. The reality is far more nuanced and sometimes contradictory. Unlike traditional IT projects, AI development benefits from data friction—when applied strategically. Consider these counterintuitive principles:

Strong governance accelerates innovation. The most innovative AI teams thrive not with unlimited data access, but with crystal-clear boundaries. When governance clearly defines what's permissible, teams spend less time in legal consultations and more time building.

Data curation trumps data pooling. Specialized teams excel at creating high-quality, contextually relevant datasets that outperform massive but unfocused data lakes. Finance teams don't need access to all customer data; they need the right customer data. Domain experts applying rigorous standards to smaller datasets frequently produce more powerful models than centralized teams working with everything.

Controlled experimentation requires boundaries. When every team can access all data, governance becomes so complex that innovation slows to a crawl. Specialized teams with clearly defined data domains can move faster because their scope is manageable.

Domain understanding creates better data products. When teams own both their AI solutions and the data that powers them, they create more consumable data products for others. They understand how data will actually be used, not just how it's structured. Rather than raw data dumps, they produce contextually enriched information that adds value across the organization.

This doesn't mean all centralized data lakes should be abandoned—only that specialized teams should prioritize curated data quality and relevancy over sheer volume.

The Bottom Line

Let's reclaim the word "silo." In AI, we need specialized teams with the freedom to innovate within their domains: not isolated islands, but focused powerhouses that share knowledge while maintaining their specialized edge.

The instinct to break down all organizational boundaries is understandable, but when it comes to AI impact, a more nuanced approach is needed. AI works best when it is specific, contextual, and tightly integrated with the expertise of the teams using it. Strategic specialization makes that possible.

Instead of flattening everything into a single, slow-moving AI initiative, let each specialized team become a launchpad for innovation. The organizations that recognize this will implement AI faster, more effectively, and in ways that actually matter.

The Trust Deadlock

2025-02-24T00:00:00Z

In the early days of any groundbreaking technology, there’s a familiar stumbling block: the “trust deadlock.” We want to take advantage of the new capabilities, but we’re not entirely sure how or whether they’ll work reliably. Without trust, people are hesitant to try—or they try it only once and abandon it at the first hiccup. Yet trust can’t be built without real-world usage. It’s a classic chicken-and-egg problem: How do you get adoption when trust is low, and how do you build trust without adoption?

To see how we might solve this problem in the context of AI, it’s useful to look at another technology that once felt equally experimental—ride-sharing and delivery apps. Let’s see how ride-sharing overcame this ‘trust deadlock,’ and what that means for AI today.

A Personal Anecdote: The Early Days of Uber and DoorDash

I still remember when I first started using Uber. It was a revelation: I could press a button, and in moments, a driver on the other side of town would begin heading my way. In theory, this saved me a lot of effort. No more standing in the rain trying to hail a cab. But in practice, I spent just as much time glued to my phone, watching the tiny car icon move across the map.

Why? Because my trust was low and my inexperience was high:

Unfamiliar process. This was completely new territory: I had no idea how accurate the estimates were or whether the driver would take the right route.

Lack of confidence in outcomes. Would they arrive at the right entrance? Was my delivery going to get lost?

In those first dozen rides, I monitored everything. And while it cost me the same amount of time as sitting in a taxi might have, I gained something more valuable: intuition. I saw when delays happened (traffic, missed turns, other drop-offs), and I learned how the system behaved. Little by little, that knowledge boosted my confidence. I trusted the platform more, so I used it more—and over time, the process became second nature. Eventually, I only checked the app if something felt unusual. My trust had caught up with my usage, and that’s when the efficiency really kicked in.

AI in the Same Boat

Today, we see a parallel journey unfolding with AI. From large language models to agents, people are intrigued but not always sure how—or if—they should rely on these systems.

Unfamiliar Process: Many of us don’t fully understand how AI reaches its conclusions, much like not knowing what goes on under the hood of a ride-share app. It can feel unsettling to trust something so opaque.

Uncertain Outcomes: We worry about AI “hallucinations,” bias, or errors. Until we see enough successful outcomes for ourselves, we remain cautious.

Building Intuition: Just as we once hovered over the map to confirm our driver was on track or critiqued the route used to deliver our ramen, AI adopters rightly scrutinize every output to see if it “makes sense.” This vigilance, while time-consuming at first, is how we build intuition—and, eventually, trust.

Why Transparency Matters

A big reason we grew comfortable with ride-sharing is transparency. We see who’s picking us up, watch their route in real time, and get notifications if there’s a delay. With AI, a similar kind of visibility can help break the trust deadlock:

Some AI tools and research prototypes allow us to see a “chain of thought” or the step-by-step reasoning process the AI uses to arrive at an answer. It’s the equivalent of watching the driver navigate on the map. If you can see how the system is reasoning, you gain a deeper understanding of potential bottlenecks or errors, which builds trust.

While models themselves may be hard to inspect, we can build intuition and trust into our systems by exposing 'the route' as the models make progress. By making AI systems less of a black box and more of a transparent system, developers help users build that crucial intuition. Just as a driver stuck at a pickup location might be a sign something’s amiss, a model’s sudden spike in perplexity, or contradictory chain-of-thought steps, can be an early indicator that you need to step in.
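For readers who want to see what a "spike in perplexity" could look like in practice, here is a rough sketch. Perplexity is the exponential of the negative mean log-probability per token; the log-probabilities below are made up, and the `flag_spikes` helper, its window, and its threshold are illustrative assumptions rather than any particular product's API.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity: exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def flag_spikes(scores: list[float], window: int = 3, ratio: float = 2.0) -> list[int]:
    """Return indices whose score exceeds `ratio` times the mean of the previous `window` scores."""
    flagged = []
    for i in range(window, len(scores)):
        baseline = sum(scores[i - window:i]) / window
        if scores[i] > ratio * baseline:
            flagged.append(i)
    return flagged

# Made-up per-step token log-probabilities, purely illustrative.
steps = [
    [-0.2, -0.3, -0.1],
    [-0.4, -0.2, -0.3],
    [-0.3, -0.3, -0.2],
    [-2.5, -3.0, -2.8],  # the model is suddenly much less sure of itself
    [-0.3, -0.2, -0.4],
]
scores = [perplexity(step) for step in steps]
print(flag_spikes(scores))
```

Only the fourth step stands out against its recent history, which is the moment a person watching "the route" would want to step in.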

Practical Steps to Break the AI Trust Deadlock

Offer Low-Stakes Environments: Create sandboxes or pilot programs where people can test AI solutions without risking major consequences. In Uber, for example, the app gives you an overview of nearby drivers. It's not 100% accurate, but it gives me a very quick sense of what to expect without asking anything of me.

Real-Time Feedback and Notifications: It's hidden by default, but watching the chain of thought unfold in real time is a big win for transparency and trust. Same for Perplexity showing the steps it will take to answer a question and the sources inspected. It's not visual chrome, it's very deliberate transparency designed to engender trust.

Focus on Solving Real Problems: People flocked to Uber and DoorDash because they solved genuine, everyday hassles in a compelling way. The technology (GPS, mobile apps, and so on) was an implementation detail.

Highlight Early Successes: A quick “aha” moment can turn skeptics into believers. Surface examples of where AI performs exceptionally well: accurate translations, spot-on recommendations, or time-saving automations. These small wins pave the way for deeper trust. Deliver early. Deliver often.

Encourage Incremental Adoption: Just as I eventually stopped watching the map after I learned how rides usually go, AI adoption can progress from meticulous supervision to a more laid-back approach. At first, users might check every output for accuracy. Over time, they’ll gain the confidence to let the AI work more autonomously (while still checking the output).

Err on the side of transparency: Some worry transparency could create information overload. However, like ride-sharing apps that thoughtfully surfaced only key information (driver location, ETA) while hiding complexity, AI interfaces can be designed to show meaningful insights without overwhelming users. The greater risk lies in insufficient transparency rather than too much.

While thoughtful transparency provides the foundation for trust, equally important is helping users develop the knowledge and skills to interpret what they're seeing.

User education plays a crucial role in establishing trust. Much like how ride-sharing companies created simple tutorials and clear expectations, AI adoption requires calibrated expectations, guided first experiences, progressive disclosure, and creating spaces where users can share tips and best practices.

A model wrapped in an agent wrapped in a web app probably won't do the job. Studies show that users with even basic AI literacy are more likely to form appropriate trust levels—neither over-relying on AI nor dismissing its capabilities entirely.

We’re Still in the Early Stages

It’s important to remember that today’s AI technology, despite its astonishing capabilities, is still in its infancy. We’re figuring out the best ways to integrate models like ChatGPT into our daily workflows and how to measure and present metrics like perplexity so that they’re meaningful to end users. This journey will likely involve iterative improvements in transparency, user experience, and reliability.

At the heart of this evolution is the realization that trust begets usage, and usage begets trust. We saw it happen with ride-sharing and delivery services: once enough people experienced a smooth ride—or a perfectly delivered meal—they came back for more, enabling the platform to grow and refine its offerings.

To track progress in breaking the trust deadlock, consider metrics that capture both behavioral and attitudinal aspects: frequency of use, depth of engagement, willingness to try new AI features, how often users override or modify AI suggestions (a rate that decreases over time indicates growing trust), explicit user feedback on their confidence in AI outputs, and so on.
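As one way to picture the override metric, here is a small sketch that computes a weekly override rate and checks whether it is falling. The event log and its shape (week number, overridden or not) are hypothetical; a real system would pull these from its own telemetry.

```python
from collections import defaultdict

# Hypothetical event log: (week number, whether the user overrode the AI suggestion).
events = [
    (1, True), (1, True), (1, False), (1, True),
    (2, True), (2, False), (2, False), (2, True),
    (3, False), (3, False), (3, True), (3, False),
]

def override_rate_by_week(log):
    """Fraction of suggestions overridden in each week, in week order."""
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for week, overridden in log:
        totals[week] += 1
        overrides[week] += int(overridden)
    return {week: overrides[week] / totals[week] for week in sorted(totals)}

rates = override_rate_by_week(events)
weekly = list(rates.values())
# A strictly falling override rate is one rough signal of growing trust.
is_declining = all(earlier > later for earlier, later in zip(weekly, weekly[1:]))
print(rates, is_declining)
```

A falling rate on its own can also mean users have stopped checking, so it is worth reading alongside the attitudinal measures rather than instead of them.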

Trust Varies by Context and Stakes

While the ride-sharing analogy helps us understand basic trust-building mechanisms, AI applications span a much wider spectrum of risk and impact. Different contexts demand different levels of trust.

In medical diagnosis, financial decisions, or safety-critical systems, users rightfully demand near-perfect reliability and extensive transparency. The trust bar is high and may require formal verification, regulatory approval, and extensive real-world testing.

For business intelligence, customer service, casual assistance, or productivity tools, by contrast, users typically need confidence in overall reliability while accepting occasional minor errors.

Each context requires tailored trust-building approaches. For high-stakes applications, extensive pre-deployment testing, third-party validation, and robust monitoring may be necessary. For lower-stakes tools, highlighting the system's limitations while emphasizing its benefits can set appropriate expectations.

We haven't fully figured out these context-specific trust mechanisms yet, but as AI matures and becomes more integrated into critical systems, we'll develop more nuanced frameworks for building and recovering trust across different domains.

The Road Ahead

For AI, the same dynamic is playing out. We need to break the trust deadlock through a combination of compelling use cases, transparent interfaces, and reliable performance. The technology under the hood can be cutting-edge and revolutionary, but if end users don’t trust it—or don’t know how to use it effectively—it won’t gain widespread adoption.

As developers, innovators, or simply curious users of AI, our mission is to provide enough openness and reassurance so that everyone can comfortably take that first “ride,” watch for potential hiccups, and eventually incorporate AI into their daily routines without a second thought. With each successful ride—or accurate AI output—trust grows, and the technology becomes another seamless part of life’s journey.

The ROI Trap

2025-02-17T00:00:00Z

Are we measuring the wrong things?

\"What's the ROI on electricity?\" Factory owners asked in 1910s America. As historian Paul David noted in his seminal work on technology adoption, it took decades for American factories to realize electricity's true value—not because the technology wasn't transformative, but because they were measuring the wrong things.

Samuel Insull, Thomas Edison's former secretary who built Chicago's electrical system, recalled endless meetings with factory owners fixated on comparing the direct cost per horsepower of electric motors versus steam engines. "They couldn't see that the real advantage had nothing to do with the cost of power," he wrote. "It was about restructuring the entire way we thought about manufacturing."

This question, absurd as it sounds today, was commonly asked by factory owners in the early 1900s. Many initially saw electricity as merely a replacement for steam power, calculating its value through the narrow lens of energy costs. They missed the revolutionary impact it would have on manufacturing, the workplace, and productivity.

Today, we may be making a similar mistake with AI.

Historical Parallels

Consider how we think about internet connectivity in modern organizations. When was the last time you saw a CFO ask for an ROI calculation for providing high-speed internet to employees? We understand intuitively that network connectivity is a fundamental enabler of productivity, innovation, and employee satisfaction. Today, if the internet isn't available, millions of people literally can't get their jobs done. This isn't just a utility—it's a necessity. Companies that tried to economize on internet connectivity in the early 2000s quickly found themselves at a disadvantage. Slow internet became more than an inconvenience; it became a reason why talented employees left for competitors. The same pattern is likely to emerge with AI.

And again with the Cloud...

The evolution of cloud computing offers another instructive parallel. In the early 2010s, many organizations approached cloud adoption with the same ROI-focused mindset we see with AI today. They created detailed cost comparisons between on-premises servers and cloud installations, before concluding that cloud was "like renting" and "too expensive."

What these calculations missed was transformational impact. The real value wasn't in cost savings; it was in the ability to experiment rapidly, scale instantly, and innovate continuously. Organizations that invested in the cloud and made its capabilities ubiquitous for their staff found that velocity grew and invention flourished.

Valid Cost Concerns

While this post argues against over-focusing on ROI, the financial reality of AI adoption, especially for smaller organizations, cannot be ignored. Small and medium-sized businesses face legitimate constraints: annual AI licensing costs, significant upfront investment in training and integration, and security and compliance measures, all of which add overhead.

However, organizations can address these challenges through thoughtful adoption strategies. Consider starting with specific high-impact departments or use cases, then expanding based on demonstrated value. Cloud-based AI services with consumption-based pricing offer more flexible entry points than traditional enterprise software. The key is viewing AI investment not as an all-or-nothing proposition, but as a scalable journey that can begin modestly and grow with your organization's needs and capabilities.

Beyond Traditional Metrics

Traditional ROI calculations usually fail to capture the transformative nature of technologies which are different at a foundational level. They focus on more easily measurable direct impacts while missing the broader organizational effects (you may know better ones):

Productivity Amplification: Just as high-speed internet enables workflows that weren't possible with dial-up, AI amplifies cognitive tasks in ways that transform how work gets done. The value isn't just in time saved—it's in the new possibilities opened up.

Cultural Impact: Organizations that provide broad access to powerful AI tools send a clear message about valuing their employees' time and capabilities. Those that don't risk being seen as technologically regressive, much like companies that still restrict internet access.

Innovation Enablement: When AI capabilities are universally available, employees find novel applications that weren't part of the original business case. This organic innovation is hard to capture in traditional ROI calculations.

Not everything that counts can be counted, and not everything that can be counted counts

Creating artificial scarcity around AI access—through restrictive licensing or tiered access models—introduces hidden costs that can outweigh any apparent savings:

Shadow AI: Employees denied access to corporate AI tools will find alternatives, creating security risks and fragmented workflows.

Productivity Drag: When workers toggle between AI-enabled and AI-restricted tasks, they experience the same cognitive friction as switching between high-speed and dial-up internet.

Innovation Barriers: Limited AI access creates two classes of workers: those who can leverage AI for innovation and those who cannot.

This isn't an argument against measurement; quite the opposite. As management guru Peter Drucker famously noted, "What gets measured, gets managed." The challenge lies not in whether to measure, but in what and how we measure. Traditional ROI calculations excel at capturing incremental improvements in existing processes. We can and should measure direct cost savings, time saved on existing tasks, reduction in errors, and so on. Transformational metrics are harder to measure but critical to understand; they include new capabilities enabled, innovation velocity, employee satisfaction and retention, and organizational adaptability.

The key is maintaining a balanced perspective. While we should absolutely measure AI's impact, we must avoid the trap of optimizing solely for what's easily measurable.

The Path Forward

The companies that will thrive in the AI era won't be those that found the perfect ROI calculation. They'll be the ones that recognized AI as a fundamental business utility—like electricity, internet connectivity, or mobile devices. They'll make it universally available, focusing not on controlling access but on enabling responsible use. This doesn't mean abandoning financial prudence. But it does mean shifting from a scarcity mindset to an enablement mindset. The question isn't whether to provide AI access, but how to do it effectively and responsibly. The real risk isn't overspending on AI—it's underinvesting and falling behind. Just as no modern company tries to compete with restricted internet access, future organizations won't be able to compete with restricted AI capabilities. The time to build this foundation isn't when you're already behind—it's now.

Agent Provocateur

2025-02-11T00:00:00Z

“Gradually, then suddenly.” That’s how Hemingway described going bankrupt, and it’s how exponential technologies tend to transform industries. At first, change feels incremental—until we find ourselves in the steepest part of the curve, where everything shifts seemingly overnight.

Most technologies follow an S-curve in their development: slow initial progress, followed by rapid acceleration, and finally a plateau as the technology matures. AI’s rapid leaps in capability, efficiency, and cost-effectiveness suggest we may be hitting that inflection point right now. But how can we recognize it with confidence?
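To make the shape concrete, here is a minimal sketch that models the S-curve as a logistic function. The logistic form and the parameter names (ceiling, rate, midpoint) are illustrative assumptions, not something measured from AI progress. It shows that growth is slow at first, fastest at the midpoint, and slow again near the plateau.

```python
import math


def logistic(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Level of the technology at time t under an assumed logistic S-curve."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def growth_rate(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Slope of the logistic curve at time t (its analytic derivative)."""
    y = logistic(t, ceiling, rate, midpoint)
    return rate * y * (1.0 - y / ceiling)


if __name__ == "__main__":
    # Hypothetical time steps on either side of the inflection point at t = 0.
    for t in (-4, -2, 0, 2, 4):
        print(f"t={t:+d}  level={logistic(t):.3f}  growth={growth_rate(t):.3f}")
```

The inflection point is where the growth rate peaks. From inside the curve, the level alone does not reveal where you are; the change in growth rate does, which is why the signals that follow focus on acceleration rather than absolute capability.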

Signals of Exponential Growth

Let's break down how we can recognize these moments of acceleration, and why AI agents are positioned to drive the next phase of adoption.

Rapid Improvements in Performance: A telltale sign of exponential growth is when fundamental performance metrics (processing power, model accuracy, or efficiency) start improving at a pace that outstrips predictions. When breakthroughs arrive faster than anticipated year over year, it signals a shift into high gear.

Sharp Declines in Cost per Capability: Major cost reductions often coincide with technical breakthroughs. In AI, the combination of optimized compute architectures (GPUs, TPUs, and specialized AI chips) and more efficient model training has slashed the cost per computation and inference. When both cost and performance improve simultaneously, it’s a strong indication that adoption will surge.

Expanding Ecosystem & Network Effects: A technology becomes truly exponential when an ecosystem builds around it—think open-source collaborations, third-party integrations, and complementary hardware/software. These network effects fuel adoption, making the technology more valuable as more participants join.

Industry “Pull” Instead of “Push”: When entire industries start demanding a technology—rather than its creators having to push it into the market—it’s often a harbinger of mass adoption. Competitive pressures, fear of missing out, and proven ROI drive this shift from early adoption to necessity.

These signals have historically preceded major technological transformations. But how do they apply to AI's current trajectory? Let's examine the evidence.

AI: Into The Exponential?

The AI industry isn't just displaying the four signals of exponential growth—it's showing them at an unprecedented pace and scale:

Performance: The acceleration in model capabilities has become stunning. While updates to OpenAI o1 and Claude 3.5 Sonnet showed remarkable progress, recent developments suggest we're entering an even steeper part of the curve. Consider DeepSeek's R1, which achieved state-of-the-art results at a fraction of the traditional cost, or S1's breakthrough in achieving competitive performance with just 1,000 training samples versus the typical hundreds of thousands. These aren't just improvements—they're fundamental shifts in how we achieve AI capability.

Cost: The economics of AI are being fundamentally rewritten at both ends of the spectrum. At the frontier, DeepSeek achieved state-of-the-art performance for $5.6 million using lower-tier hardware - a fraction of traditional costs. Meanwhile, S1 demonstrated that even a few dollars of compute can now yield impressive results through efficient fine-tuning. These cost breakthroughs are particularly significant because they suggest we're approaching a critical threshold where AI deployment becomes economically viable across a much broader range of applications and industries.

Ecosystem: The rapid proliferation of open-weight models, efficient training techniques, and novel architectures is creating a virtuous cycle of innovation. When DeepSeek can achieve competitive results without premium hardware, and research labs can fine-tune powerful models in minutes, it suggests we're entering a phase where innovation will compound dramatically.

Pull: Industries aren't just adopting AI—they're restructuring around it. The demand isn't just for better models, but for transformative capabilities that can be deployed at scale.

It may feel like we're seeing exponential improvements, but current developments suggest we're only at the beginning of the curve. Consider that today's breakthroughs—training high-performance models for millions instead of billions, or achieving state-of-the-art results with minimal fine-tuning—are still primarily about making existing approaches more efficient. The real exponential growth will likely come from fundamental innovations in architecture and training methods that we're just beginning to glimpse, combined with the emergence of new deployment mechanisms like agents that can multiply the impact of these improvements.

License To Skill

Foundational AI models provide raw intelligence, but AI agents represent the most promising mechanism for translating that intelligence into practical value at scale. Whether automating workflows, making decisions, or interacting with tools, agents will be the primary vector through which businesses deploy AI capabilities.

Agents may also be a more promising vector for exponential growth than models alone. Here's why, measured against the same four signals, agents may deliver greater scale economies and impact:

Performance: Agents compound improvements by combining model capabilities with real-world tools, data, and APIs, creating multiplicative gains in practical capability. Unlike traditional software that requires manual updates to incorporate new capabilities, well-designed agents can automatically leverage improvements in their underlying models. For example, when a new language model becomes available, every agent using it immediately gains enhanced capabilities across all their tasks - from better reasoning to improved tool use. This creates a powerful multiplier effect: the more agents you deploy, the greater the aggregate impact of each model improvement.

Cost: Agents currently face two major cost barriers: high development costs due to the complexity of ensuring reliable performance, and significant operational overhead from running sophisticated models. However, several trends point to rapidly declining costs: emerging standardized frameworks are simplifying development, while specialized models are reducing runtime costs for common agent tasks. As these mature and democratization accelerates, we expect to see agent deployment costs plummet - similar to how containerization and microservices transformed software economics by making deployment both cheaper and more scalable.

Ecosystem: Agents benefit from exponentially growing tool integrations, APIs, and frameworks, creating network effects that models alone cannot achieve. But the real power lies in emerging multi-agent systems, since they allow agents to work with other agents, autonomously. For instance, Microsoft's AutoGen framework enables multiple specialized agents to collaborate on complex tasks—one agent might handle user interaction, while others specialize in code generation, debugging, and documentation. These agent networks can dynamically reconfigure based on the task at hand, creating a level of adaptability and scalability impossible with traditional software.

Pull: Agents have a much more direct alignment with business processes and workflows, which will drive broader adoption. Unlike raw models that require significant integration work, agents can be designed to plug directly into existing business processes and tools.

The key insight is that while models represent exponential improvement in raw capability, agents represent exponential improvement in practical value delivery. This suggests that while model improvements are a crucial enabler, agents are likely to be the primary mechanism through which AI's exponential growth manifests in practical terms. They represent not just a "next thing" but a fundamental multiplier on the value of underlying model improvements.

The evidence for this is already emerging: while models like DeepSeek's R1 and S1 demonstrate remarkable technical achievements, it's the packaging of these capabilities into autonomous agents that's likely to drive the next wave of adoption and value creation. The real exponential curve may not be in model parameters or training efficiency, but in the compound effects of agents combining improving models with expanding tool sets and API ecosystems.

The Path Forward

History teaches us that waiting for perfect clarity around technological transitions is a recipe for falling behind. The signs of AI's exponential acceleration are starting to emerge, but the exact timing and shape of the transformation will only be obvious in hindsight. Rather than trying to perfectly time the market, the more practical approach is to start small but start now—building experience with AI agents through focused experiments and real-world applications.

The goal isn't to transform everything overnight, but to develop the muscle memory and organizational learning that will be critical when adoption accelerates. Pick specific, bounded problems where AI agents could add value. Test, learn, and iterate. Build familiarity with the technology's real capabilities and limitations. This hands-on experience will prove far more valuable than any abstract strategy when the exponential curve hits its steepest point.

After all, successful adaptation to exponential change rarely comes from perfectly timing a single big bet. It comes from accumulated experience and the ability to recognize opportunities as they emerge. The time to begin that learning process isn't when AI agents are already transforming industries—it's now, while we still have the luxury of learning through deliberate experimentation rather than desperate reaction.

Abundance Is the Catalyst

2025-01-29T00:00:00Z

When Resources Become Abundant, Innovation Becomes Inevitable

Initially, the promise of faster internet seemed straightforward: files would download more quickly, webpages would load instantly, and tasks that once took minutes could be completed in seconds. Logically, you might imagine that our expectation was to spend less time online. That's not quite how it played out.

As bandwidth grew more abundant (faster, cheaper, wider), so did the scope of what was possible. Streaming high-definition video became normal. Remote collaboration tools moved from niche to essential. Entire industries, from cloud computing to social media, took shape. Bandwidth abundance didn’t just improve what we were already doing—it transformed how we used the internet and what we expected from it.

This is a pattern we see time and again. When a resource that was once scarce becomes abundant, it doesn’t lead to less use or stagnation. It leads to entirely new ways of thinking and working.

The Jevons Paradox: Why Efficiency Drives More Use

The idea that abundance drives greater usage isn’t just an observation—it’s a phenomenon economists have studied for over a century. In 1865, the British economist William Stanley Jevons identified what is now called the Jevons Paradox. He noticed that as coal-powered steam engines became more efficient, coal consumption didn’t decrease. In fact, it increased dramatically.

The logic seems counterintuitive: wouldn’t making a resource more efficient reduce how much we use it? But Jevons realized that efficiency lowers costs and removes barriers, creating new demand and applications. As steam engines became more efficient, they became more affordable, versatile, and widespread, fueling industrial growth and ultimately increasing coal consumption.
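A small worked example makes the mechanism visible. This is a sketch under a stated assumption: demand for useful work follows a constant-elasticity curve, which is a common textbook simplification and not something taken from Jevons's own analysis. The function name, the scale constant, and the elasticity values are hypothetical.

```python
def fuel_consumed(efficiency, elasticity, fuel_price=1.0, scale=100.0):
    """Total fuel burned when demand for useful work responds to its cost.

    efficiency: units of useful work obtained per unit of fuel
    elasticity: how strongly demand for work rises as its cost falls
                (assumed constant; values above 1 mean elastic demand)
    """
    # Better engines make each unit of work cheaper.
    cost_per_unit_of_work = fuel_price / efficiency
    # Assumed constant-elasticity demand: cheaper work means more work demanded.
    work_demanded = scale * cost_per_unit_of_work ** (-elasticity)
    # Fuel needed to deliver that much work at the given efficiency.
    return work_demanded / efficiency


if __name__ == "__main__":
    for elasticity in (0.5, 1.5):
        before = fuel_consumed(efficiency=1.0, elasticity=elasticity)
        after = fuel_consumed(efficiency=2.0, elasticity=elasticity)
        print(f"elasticity={elasticity}: fuel before={before:.1f}, after={after:.1f}")
```

Under this model, doubling efficiency cuts fuel use when demand is inelastic (elasticity below 1) and raises it when demand is elastic (elasticity above 1). The paradox is the second case: the new demand unlocked by lower cost outweighs the savings per unit of work.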

This same principle applies to bandwidth, computing, and now artificial intelligence. When a resource becomes more accessible, people find new ways to use it. Abundance leads to expansion, not contraction.

Jevons in the Modern Era

In the 1990s, as bandwidth expanded, the cost of transferring data dropped. This didn’t result in people using less internet—it fueled exponential growth. Streaming, video conferencing, and cloud computing all became possible because lower costs and greater availability removed barriers to entry.

The same is true for computing power. Early computers were scarce and expensive, so they were only used for mission-critical tasks. As costs dropped and processing power grew, computing became central to everything from mobile apps to advanced research. The abundance of computing didn’t reduce our reliance on it—it made it indispensable.

AI and the Next Phase of Abundance

Today, we’re on the cusp of another shift in abundance—this time with artificial intelligence. Per Jevons: far from capping our reliance on AI, greater efficiency and abundance will integrate it further into our lives.

At the moment, AI is often framed as a tool for efficiency. It helps automate tasks, analyze data, and generate content. These are valuable applications, but they are just the beginning. As AI becomes more accessible and widely deployed, its role will expand, just as bandwidth, electricity, and computing did before it.

Imagine an education system where every student has access to a personalized tutor, one that adapts to their learning style and pace. Picture a healthcare system where diagnostics are not only faster but also more accurate, guided by AI models trained on global datasets. Consider industries that don’t yet exist—applications of AI that we can’t fully anticipate today but will feel essential in the decades to come.

This is the potential of AI abundance. Its value won’t just be in making existing processes more efficient. It will be in unlocking new possibilities and reshaping industries in ways we don’t yet understand.

Abundance Changes the Questions We Ask

When a resource is scarce, the focus is on efficiency. How can we make the most of what we have? How can we optimize within the limits of what’s available? But when a resource becomes abundant, the questions change. We start to ask: What else can we do? What problems can we solve that were previously out of reach?

With bandwidth, these questions led to streaming services, video calls, and real-time collaboration. With computing, they led to artificial intelligence, mobile devices, and the digital economy. With AI, the answers are still unfolding, but the pattern is clear.

The abundance of AI won’t mean less reliance on it. It will mean greater integration into every aspect of life. Over time, it will likely move from being a tool we use to a foundational infrastructure we depend on, much like electricity or the internet.

A Measured Perspective

This isn’t to say that abundance is without challenges. With every technological shift, there are questions of equity, access, and unintended consequences. But history shows us that abundance tends to expand opportunity, not limit it. The key is to ensure that the benefits are widely distributed and that we think carefully about how to navigate the risks.

The Takeaway

Abundance doesn’t reduce demand—it creates new opportunities. Whether it’s bandwidth, electricity, or compute power, history shows that when scarcity gives way to abundance, the result is not just greater efficiency but greater innovation.

AI is poised to follow this same trajectory. As it becomes more abundant, it won’t just accelerate what we’re already doing. It will transform how we work, learn, and create, opening the door to possibilities we can’t yet fully imagine.

The shift from scarcity to abundance isn’t the end of the story. It’s where the story begins.

The Aperture of Utility

2025-01-23T00:00:00Z

Why Some Technologies Expand While Others Narrow

I've been attending the World Economic Forum in Davos this week - an event with an eclectic agenda, an energetic community, and a unique opportunity to learn. Last year, enthusiasm around AI was palpable, and I expected it might naturally wane over time.

**I could not have been more wrong.**

This year, AI dominated every conversation, with even more energy and depth than before.

One of the most intriguing aspects of emerging technologies is how they evolve as we learn more about them. Some technologies, like AI, seem to reveal an ever-expanding array of use cases the deeper we explore. Others, like blockchain or 3D printing, generate early excitement only to see their scope of practical application narrow over time.

This dynamic—the way a technology’s “aperture of utility” widens or narrows—shapes the trajectory of innovation, adoption, and long-term impact. In this post, I’ll explore how this concept applies to AI and other technologies, highlighting why some continue to surprise us with their versatility while others settle into a narrower range of applications.

Defining the Aperture of Utility

The “aperture of utility” describes the breadth of credible use cases for a given technology. In its early stages, a technology may have a narrow aperture, either because it’s immature or poorly understood. Over time, one of two things happens:

The aperture expands as new advancements or insights enable a broader range of applications.
The aperture contracts as technical, economic, or practical limitations reveal themselves.

A widening aperture reinforces excitement, investment, and experimentation, creating a self-reinforcing cycle of innovation. Conversely, a contracting aperture tends to temper expectations, often leading to slower adoption or disillusionment.

Expanding Apertures: AI, Cloud Computing, and the Internet

Generative AI: Generative AI has evolved from a content creation tool to a reasoning partner, autonomous agent, and decision-support system. Advances in multistep reasoning, agents, fine-tuning, and collaboration are driving this transformation.

Generative AI is moving beyond static outputs, evolving into a collaborator capable of reasoning through complex problems. It employs techniques like chain-of-thought prompting to break down logical steps, integrates external tools like APIs for richer analysis, and enables agents that can plan, adapt, and execute tasks independently. These advancements allow generative AI to amplify human decision-making and reshape entire workflows.

What sets generative AI apart is its ability to evolve from a tool into a cognitive partner. Advances in reasoning, autonomous problem-solving, and fine-tuning are transforming industries, unlocking value across domains, and redefining the boundaries of what AI can achieve.

What’s most remarkable is that each breakthrough compounds upon the last, unlocking new applications in ways that feel both inevitable and surprising. As these models grow in reasoning power, their aperture of utility will continue to expand—reshaping industries and redefining what we consider possible in AI.

Cloud Computing: Cloud computing’s aperture also expanded rapidly. Initially seen as a way to rent remote servers, cloud services now underpin innovations in serverless computing, machine learning, IoT, and edge computing. By reducing the costs and complexity of infrastructure, cloud platforms unlocked use cases no one anticipated at their inception.

The Internet: The Internet is perhaps the quintessential example of expanding utility. Starting as a tool for research and communication, it evolved into the backbone of e-commerce, entertainment, remote work, and more. Its success lies in its open, adaptable infrastructure, which encourages continuous reinvention.

Lessons from Contracting Apertures

Initial enthusiasm for new technologies often gives way to more focused applications as limitations become apparent. Blockchain, once seen as a universal disruptor, has found meaningful yet constrained applications in cryptocurrencies and DeFi. Similarly, 3D printing excels in prototyping and custom manufacturing but has not achieved the mass-production revolution many envisioned.

Time Dimension of Aperture Changes

The process of aperture expansion or contraction follows distinct temporal patterns. The telegraph's aperture expanded rapidly in its first decade but plateaued within 30 years as its limitations became clear. Similarly, Segway's initial promise of revolutionizing personal transportation contracted sharply within just a few years of launch, ultimately finding only niche applications in tourism and security patrols. Contraction is not always final: blockchain ledgers show promise in supply chain applications, and 3D printing continues to find new applications in aerospace and medical fields.

Some technologies undergo multiple cycles of expansion and contraction. Virtual worlds surged with Second Life in the early 2000s, contracted sharply, and then resurged nearly two decades later with the metaverse. This pattern highlights how timing and technological readiness often determine whether a technology can sustain its utility.

The timeline of aperture changes often follows a revealing pattern: technologies that maintain expanding apertures typically show practical utility within their first 2-3 years. Those that fail to demonstrate concrete value in this window frequently experience rapid contraction.

This "utility window" appears particularly critical in enterprise technology: if businesses can't find practical applications within initial pilots, broader adoption rarely follows. Understanding this utility window is crucial, but timing alone doesn't tell the whole story. What underlying factors determine whether a technology can demonstrate value quickly and sustain its expansion? Or conversely, what causes some initially promising technologies to contract despite early enthusiasm?

Why Do Apertures Expand or Contract?

Several key factors shape a technology's trajectory during and after this critical utility window, determining whether it can unlock new opportunities or hits fundamental limits.

Technological Maturity: Expanding technologies often gain from breakthroughs that lower costs, enhance usability, or unlock new possibilities. For instance, AI’s expansion is rooted in innovations like transformer architectures (e.g., GPT), which enable scaling to unprecedented capabilities. These technical leaps broaden not just performance but the scope of potential applications.

In contrast, technologies with slower progress may stagnate. Blockchain, for instance, faces persistent scalability challenges that limit its adoption for high-throughput applications like payments or supply chain tracking. Without breakthroughs in performance or cost-efficiency, its aperture remains constrained.

Ecosystem Support: The success of a technology is rarely determined in isolation—it depends on the ecosystem of tools, standards, and partnerships surrounding it. AI, for instance, thrives because of open-source frameworks (e.g., PyTorch, TensorFlow) and accessible platforms (e.g., AWS, Hugging Face, and Google Cloud). These ecosystems reduce barriers to entry and empower a broad range of developers to experiment and build, accelerating the discovery of new use cases.

Conversely, technologies with fragmented or immature ecosystems often struggle to scale. In 3D printing, for example, inconsistent standards for materials and software interoperability have slowed broader adoption, keeping the aperture focused on niche applications rather than mass production.

Economic Fit: A technology’s ability to align with real-world economic needs often determines the breadth of its utility. Expanding technologies tend to have scalable, cost-efficient deployment models that allow adoption across industries. For example, cloud computing thrives because it eliminates the need for upfront infrastructure investments, making it universally appealing to businesses of all sizes.

On the other hand, technologies that fail to offer a compelling value proposition see their apertures contract. Blockchain voting systems, for instance, face significant barriers: they’re often costlier, slower, and more complex than existing centralized alternatives, undermining their practical appeal.

Public and Market Perception: Expanding technologies tend to align with clear, practical needs, while those with misaligned or overhyped promises risk disillusionment. AI has maintained its momentum because it consistently delivers tangible value—whether in automating mundane tasks, improving customer experiences, or accelerating scientific research.

In contrast, technologies like VR/AR have faced a perception gap. While their promise of immersive experiences is compelling, the practical reality—bulky hardware, high costs, and limited killer apps—has tempered widespread adoption. The aperture narrows when hype outpaces practical utility.

Implications of the Aperture of Utility

Understanding whether a technology’s aperture is expanding or contracting has practical implications for businesses, investors, and researchers:

Adoption Strategies: For expanding technologies like generative AI, success depends on experimentation and exploring diverse use cases—whether in healthcare, finance, or education. By contrast, contracting technologies like 3D printing thrive when focused on niches like prototyping or custom medical devices.

R&D Investment: For expanding technologies like AI, broad, long-term research efforts can uncover new applications and extend their momentum. Contracting technologies, like 3D printing, benefit from focused investments in proven applications that deepen their value in specific niches.

Hype Management: Understanding the aperture’s trajectory helps leaders set realistic expectations—embracing AI’s expansive potential while focusing blockchain efforts on its most viable applications.

Closing thoughts

The aperture of utility offers a lens for understanding how technologies evolve and why some thrive while others falter. AI’s aperture is still opening, with every breakthrough revealing new possibilities across industries. But sustaining this momentum will require not only innovation but responsibility—ensuring these tools solve real-world problems while remaining aligned with human needs.

The technologies that succeed are not just the most powerful but the most adaptable, accessible, and useful. For AI, the aperture is still expanding—and the opportunities it unlocks may only grow from here.

Thought Leadership

2025-01-03T00:00:00Z

Is there truly no moat in AI, or does post-training behavior create one after all?

Trust is a key factor in successful AI adoption, often intertwined with more commonly discussed measures of quality. While quality can be evaluated using metrics such as accuracy and latency, trust is often influenced by subtler elements. Among these elements, behavior—the manner in which a model responds, reasons, and interacts—may be one of the most decisive in determining whether users embrace or reject a system.

First, we discuss the notion of a post-training moat. Then we examine why behavior shapes trust, work through an example of my own, and close with where this may take us this year.

Actually, maybe there is a “moat” after all...

There is a widespread notion that “there is no moat” in AI because large-scale models often use the same public data for pre-training. While it is true that pre-training on openly available data creates fewer barriers to entry, focusing solely on pre-training overlooks the significance of post-training. During the alignment or fine-tuning stage, models develop distinctive behaviors that can be difficult to replicate. These behaviors often draw on privately held or proprietary data, shaping how the model “speaks,” how it balances opposing views, and how it adapts to user needs. This post-training process can create a practical moat, anchored not in the publicly available data but in the unique ways a model has been refined.

Why Behavior Shapes Trust

Behavior is about more than correctness; it reflects how models present information, handle ambiguity, and address diverse viewpoints. For example, consider two AI systems evaluating an analogy or concept you have created. One system takes a “glass half full” approach, emphasizing clever aspects and a creative spin. Another offers a “glass half empty” view, pointing out flaws and potential oversights. Neither system is inherently wrong, but each elicits a distinct emotional or cognitive response from the user. Overly affirmative behavior can appear insincere or sycophantic, eroding trust if users suspect the AI is merely echoing their own ideas. Conversely, overly critical behavior can feel abrasive, particularly if users seek constructive support rather than blunt critique.

Striking a balance that invites engagement without alienating the user is an important part of building trust. This balancing act mirrors human interactions: we often look to trusted advisors for both encouragement and respectful criticism.

Post-Training and the “Trust Moat”

The same public dataset that informs broad linguistic competence cannot guarantee trust-building behavior. Instead, post-training efforts—such as reinforcement learning from human feedback (RLHF) or domain-specific fine-tuning—incorporate user preferences and contextual constraints. These processes develop the “trust moat,” in which the refined behavior of the AI system sets it apart from more generic models. Because the data and feedback loops that inform these refinements are often proprietary or at least highly specialized, replicating them is not straightforward.

Model providers develop proprietary datasets based on human feedback that capture nuanced preferences about how their AI should respond. These datasets aren't just about right/wrong answers, but about style, tone, and judgment calls in complex situations. The iterative process of refining behavior (through techniques like RLHF) involves countless micro-adjustments based on user interactions and expert feedback. It's similar to how a person's communication style is shaped by years of social interactions - you can't simply copy the end result without going through a similar learning process.

The result is a set of behaviors that differ from model to model and are likely to diverge further over time. Whether these differences amount to a competitive advantage remains to be seen.

Measuring Behavior in Passing: The Role of “Eval”

Though traditional metrics still matter, specialized evaluation strategies enable us to track not just what a model says, but how it says it—and that ‘how’ is the foundation of trust. As I wrote in a previous post on “eval,” selecting the right benchmarks for both quality and trust is key. Standard tests quantify performance, whereas human-focused evaluations are usually necessary to reveal how users perceive a model’s helpfulness, neutrality, or tone. Although these processes can be more qualitative, they offer vital insights into how behavior shapes acceptance and trust.

Example in Action

My recent experience with two models illustrated the importance of behavioral “fit.” The “glass half full” model praised my analogy for its ingenuity, while the “glass half empty” model highlighted flaws and clumsiness. Neither response was universally better; each spurred a different reaction in me, the author. I appreciated the flattery yet also valued the candor; the differences between the two were striking and meaningful.

On one hand, affirmation can breed skepticism if it feels disingenuous. On the other hand, excessive negativity can discourage continued exploration of an idea. Which should I choose? The answer, predictably, is 'both'. This interaction reinforced what humans know to be true, but we are still learning in AI: diverse opinions lead to stronger outcomes. As the 'human in the loop' in this case, I had a chance to weigh both perspectives, pick a path, and then stand by it.

For the record: in this instance, I selected the more optimistic interpretation, largely because it was the holiday season and I was inclined to maintain a festive mood—though I still recognized the critical feedback as valid.

A Path Ahead

As AI systems become more integrated into everyday workflows, trust will hinge on the balance between factual accuracy and perceived alignment with user needs. Behavior, shaped by post-training data and processes, can serve as a durable moat for AI developers, distinguishing one model from another even if both are trained on the same public corpus. By embracing rigorous yet nuanced evaluation methods (“eval”) and refining how models behave in varied situations, developers can ensure that their AI systems foster trust—ultimately guiding users toward meaningful, sustained adoption.

The balance of quality, trust, and nuanced behavior is a path forward for AI adoption—giving users not just results, but results they can rely on.

The Thinking Press

2024-09-26T00:00:00Z

Note: this is an abridged version of a talk I gave to Slim Foundation students and scholarship winners in Mexico City last month (alongside Bill Clinton, Bradley Cooper, James Clear, and others).

What Does It Take To Change The World?

Some might say that changing the world starts with a sudden flash of inspiration, sent from the divine and received by a genius who always knows exactly what to do with it. There are stories like this throughout history: Newton, a falling apple, and gravity; Archimedes, a bathtub, and eureka; James Watt, a kettle, and the steam engine; Emmett "Doc" Brown, a slip in the bathroom, and time travel. Is this the way the world changes?

If only it were that simple. As great as these stories sound, they are just stories.

So let's try again…

What Does It Take To Change The World? (redux)

Change is a cycle, not an event. It typically requires hundreds (if not thousands) of threads from hundreds of people all coming together at the right time and place. Change exists on a continuum, but we can probably pick out some of the key characteristics and paint an arc through them all.

1. Insight. Rather than a 'eureka' moment, change usually starts with something much more subtle. Not divine inspiration, but an inkling. An insight. An observation. The identification of a problem which is worth solving.

2. Context. Every insight or observation happens in the context of dozens (nay, hundreds) of other insights from other people, or groups of people. In most cases, every new insight adds to the context of all, but once in a while, all these threads start to come together in unusual or unexpected ways.

3. Breakthrough. Resulting in a new product, service, idea, or tool which represents a major step forward for the world, and everyone in it. A step forward which heralds a new dawn, opening up new possibilities which just weren't possible before. Possibilities which aren't just a little bit better - faster, cheaper, more productive - but are significantly better than what came before. A step function change which amplifies human activity, creativity, and productivity in significant ways.

4. Accessibility. To truly change the world, it is insufficient for these improvements to be limited to a small group - they need to be available to increasingly large groups and populations. To everyone, over time.

5. Iteration. The very best new capabilities also get better through this increased accessibility, via positive feedback loops.

6. Foundation. Providing building blocks for an ecosystem of new inklings, ideas, observations, and insights, so that others can build on top of them in the future.

Leading to new ideas and a broader context, where threads come together in ways which open up major new opportunities, for all, which get better through use, and provide building blocks for the next generation.

With so many concurrent requirements, it's no wonder that these scenarios happen rarely. One example I'm sure you are familiar with is the printing press; surely an idea which most would agree has gone on to change the world.

The Printing Press

The printing press was born of a realization that handwritten manuscripts were just not going to cut it as a way to distribute information: they were too slow, too expensive, and required significant skills which were rare at the time. That put the acquisition and distribution of knowledge firmly within reach of only the wealthiest and most influential organizations (most prominently, religious institutions).

Let's see how it stacks up against the elements in the arc above.

1. Insight. Hand-copied manuscripts weren't going to cut it.

2. Context. Many people and groups were working on parts of a solution: printing blocks, moveable type, various innovations in inks, and so on.

3. Breakthrough. It was the Gutenberg Press which brought all of these threads together, and successfully mechanized the use of type on vertical sheets of paper in a reproducible and cost effective way.

4. Accessibility. The press dramatically reduced the marginal cost of publishing, ushering in (amongst many other things) the scientific revolution.

5. Iteration. Presses were not perfect out of the gate - and even with a huge opportunity ahead, there were lots of barriers to overcome (not least that revenues from printing were low because literacy levels were also low, reducing the size of the addressable market and limiting a valuable source of investment in future improvements).

6. Foundation. The broad distribution of information in this way created a remarkable foundation for almost all improvements thereafter. Indeed, you can draw a pretty straight line from the printing press to scientific, industrial, and digital revolutions, and potentially, to whatever comes next.

From Ink to Intelligence

Just as the printing press revolutionized information spread in the 15th century, we now face another potential world-changing innovation. In our digital age, the next frontier isn't just about sharing knowledge—it's about processing, understanding, and creating it at new scales.

Enter Artificial Intelligence: a technology that could be as transformative now as the printing press was then. Like Gutenberg's invention, AI may democratize access to capabilities once limited to a few, enhancing human intellect and creativity in unexpected ways.

Let's apply our framework for world-changing innovations to AI and compare it to its historical predecessor.

The Thinking Press

It feels like we are on the precipice of that next cycle of world-changing revolution, with the maturation of artificial intelligence.

AI is probably the single biggest technical shift in how we are going to interact with data, information, and each other, since the advent of the earliest internet. Organizations that invested in the early internet went through a period of remarkable growth in the past 30 years, which raises the question: is there a chance that AI will drive similar levels of growth (and transformation) in the next 30 years?

Let's see how it stacks up against our change criteria.

1. Insight. Systems that learn by example (instead of being explicitly programmed with rules) can exhibit behavior which approximates human intelligence.

2. Context. Many people and groups were working on parts of a solution: machine learning research and development, faster processing from accelerated computing, and access to swathes of computing resources as a utility in the cloud.

3. Breakthrough. AI brings these threads together into a series of new mathematical models which have remarkable emergent capabilities for reasoning, recall and creativity.

4. Accessibility. These models - made available as chat assistants and agents - are extremely easy to use: using them is never harder than explaining your intent.

5. Iteration. Feedback loops abound in AI. AI improves through its usage and can be combined with human feedback to build faster, smarter, and more efficient models.

6. Foundation. AI provides completely new primitive, low-level software components which can be used to build entirely new categories of products, improve existing apps, and accelerate processes through automation. Models can also be combined in new and interesting ways, making existing systems better as new models are built.

On the surface, AI checks a lot of boxes, which raises a broader question.

Will Artificial Intelligence Change The World?

For the intellectually honest, today's answer is: maybe.

Technology capability tends to follow an 'S-curve' over time: starting with a period of incremental improvements which compound into an exponential increase in capability, followed by a period of diminishing returns as the natural limits of the technology are realized (and the next generation starts to mature).

Unfortunately, you can only see the curve behind you, so it's hard to accurately position where we are. Given the speed at which artificial intelligence is progressing, it is tempting to pitch us right in the middle of the curve, in the high-gradient zone.

But it is more likely that we are in the bottom left hand corner - since it is still very, very early in the journey.

Artificial Intelligence may yet change the world.

The journey of world-changing innovations, from the printing press to AI, shows us that transformative change is rarely instant. It's a process of insights building on each other, breakthroughs becoming accessible, and iterations creating new foundations. As we stand at the frontier of AI's potential, we must remember that its impact will depend not just on the technology itself, but on how we choose to develop and apply it.

For students and newcomers to AI, this is an exciting time of opportunity. Your fresh perspectives and ideas could be the catalyst for the next breakthrough. Engage with AI, experiment with it, and most importantly, think critically about its implications and possibilities. The future of AI isn't predetermined – it's waiting to be shaped by curious and innovative minds like yours.

Whether AI will truly change the world remains to be seen, but one thing is certain: your engagement with it today could help steer its course tomorrow. So dive in, explore, and be part of the next chapter in the ongoing story of human innovation.

Note: huge thanks to the Slim Foundation for the invitation to a unique event and experience!

Curvonomics (and the rise of AGI)

2024-07-30T00:00:00Z

"World-changing ideas generally evolve over time as slow hunches rather than sudden breakthroughs"

In Where Good Ideas Come From, Steven Johnson tried to answer questions like: What sparks the flash of brilliance? How does groundbreaking innovation happen?

His counterintuitive conclusion? That world-changing ideas generally evolve over time as 'slow hunches' rather than sudden breakthroughs. Breakthrough ideas don't just appear fully formed like in the movies (arc reactors, flux capacitors, double-decker couches, shrink rays); in reality they evolve slowly, sometimes over years or even decades. They start as vague notions or partial insights that linger in the back of one's mind. Over time, these hunches connect with other ideas, experiences, and information, gradually taking shape and maturing.

Slow hunches take patience (good ideas often need time to develop fully), open-mindedness (new contributions to the state of the art usually come from multiple sources in multiple disciplines), connectivity (letting new ideas interconnect, overlap, and dissect one another), and persistence.

There are lots of examples of this: Johnson cites the theory of evolution, the invention of the telephone and the web, discovery of penicillin, and more. On my recent appearance on the Big Technology podcast, the host Alex and I were joshing that the search for extraterrestrial life is probably another one.

I suspect the path to artificial general intelligence - AGI - will play out this way, too.

The Path to AGI is S-Shaped

It takes time for all the pieces of any invention to fully form: the language by which they are described, the layers of abstraction, the core capabilities and supporting structures required for success, the compounding factors which contribute more than the sum of their parts to its broad applicability.

I've talked before about how the arc of technology capability tends to follow an S-curve over time.

In the early stages of a technology's development, progress is often slow and incremental. This phase is marked by experimentation, prototype development, and the establishment of foundational knowledge.

As the technology matures, key capabilities compound on one another, resulting in a period of rapid improvement (and adoption). This phase is characterized by exponential growth - a "hockey stick" on the path toward a steep upward trajectory.

Eventually, the rate of improvement slows as the technology approaches its theoretical limits or market saturation. This results in a flattening of the curve at the top of the "S".
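For readers who like to see the shape in numbers, here is a minimal sketch of the three phases above. It assumes the S-curve is a logistic function, which is one common way to model it and not the only one; the parameter names (ceiling, midpoint, rate) are my own illustrative choices.

```python
import math


def s_curve(t, ceiling=1.0, midpoint=0.0, rate=1.0):
    """Capability at time t, modeled (as an assumption) by a logistic function."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def growth_rate(t, ceiling=1.0, midpoint=0.0, rate=1.0):
    """Slope of the logistic curve at time t: how fast capability is improving."""
    y = s_curve(t, ceiling, midpoint, rate)
    return rate * y * (1.0 - y / ceiling)


# Early (slow), middle (steep), late (flattening) points on the curve.
for t in (-6, -3, 0, 3, 6):
    print(f"t={t:+d}  capability={s_curve(t):.3f}  slope={growth_rate(t):.3f}")
```

The slope is small at both ends and peaks at the midpoint, which is the "hockey stick" followed by the flattening at the top of the S.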

Retrospective Inevitability

AGI is not a singular, monolithic endeavor, but rather a complex interplay of advancements across multiple technological and scientific domains. And so, while AGI is likely to follow its own combined S-curve over time, the multi-dimensional nature of AGI progress suggests that its emergence will be the result of numerous S-curve advancements in various fields, each progressing at slightly different rates and phases.

Various components contributing to AGI development (natural language processing, machine learning algorithms, hardware capabilities, and knowledge representation) will each follow their own S-curve trajectories. These curves are likely to be slightly out of sync with one another, creating a tapestry of progress rather than a single, uniform advancement.

As these technological components advance along their respective S-curves, the markers of progress towards AGI are likely to become increasingly apparent, but separate: a new demo one week; a press release another; a new pre-print article a month later. Over time, we'll see sufficient advancement in enough places, that AGI - instead of arriving in a thunderclap instant - appears more and more inevitable. The aggregate process required to realize the technology will lessen. The gaps will narrow.
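A rough sketch of why out-of-sync curves feel gradual rather than sudden: average several logistic curves whose midpoints are staggered, and compare the biggest one-step jump in the aggregate with the biggest jump in a single component. The component names come from the text above, but the midpoints are made up purely for illustration.

```python
import math


def s_curve(t, midpoint, rate=1.0):
    """A single component's progress, modeled (as an assumption) as a logistic curve."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical midpoints: each field hits its steep phase at a different time.
COMPONENT_MIDPOINTS = {
    "natural language processing": 2.0,
    "machine learning algorithms": 4.0,
    "hardware capabilities": 6.0,
    "knowledge representation": 8.0,
}


def aggregate(t):
    """Average progress across all components at time t."""
    curves = [s_curve(t, m) for m in COMPONENT_MIDPOINTS.values()]
    return sum(curves) / len(curves)


def largest_step(curve, times):
    """Biggest increase between consecutive time steps."""
    values = [curve(t) for t in times]
    return max(later - earlier for earlier, later in zip(values, values[1:]))


TIMES = range(0, 13)
single = largest_step(lambda t: s_curve(t, 4.0), TIMES)
combined = largest_step(aggregate, TIMES)
print(f"largest jump, single component: {single:.3f}")
print(f"largest jump, aggregate:        {combined:.3f}")
```

Under these assumptions the aggregate never jumps as sharply as any one component does, which is the "no thunderclap" point in miniature.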

By the time AGI is achieved, it may appear rather obvious and unsurprising, as the groundwork will have been laid visibly over an extended period. It may even be perceived as boring. A retrospective inevitability. This is a healthy and positive thing overall (except maybe for the breathless commentators who wonder aloud, 'what did Ilya see?').

Whither, AGI?

Environments that foster slow hunches - by providing time, resources, and diverse inputs - are likely to be more conducive to innovation than those focused on any single immediate breakthrough.

If you agree, this perspective has implications for research strategies and resource allocation. Rather than focusing solely on breakthrough moments, it is more likely that the field would benefit from sustained, parallel efforts across multiple domains, with an emphasis on integration and synergy between advancements.

This lens of 'curvonomics' - multi-dimensional, asynchronous progress - suggests that while the journey to AGI may be long and complex, it is likely to be marked by visible, incremental advancements that collectively pave the way for this transformative technology. Counterintuitively, a single-minded focus on AGI may actually turn out to be a disadvantage rather than an advantage, to finding the steady path which delivers it, whatever 'it' turns out to be. I am deliberately leaving 'it' undefined here as today, your definition is at least as good as mine (if not better), and while for any given definition the specifics may be different, the general S-shaped curve to our path remains.

Carpe Exponentia

2024-06-26T00:00:00Z

Seize the Snowball

The capability of any technology tends to follow an s-curve over time, starting out with the basics, but growing extremely quickly in a short amount of time after an inflection point which starts with an exponential "hockey stick" of fast-paced improvements. Over time, the pace of change slows down and incremental improvements continue… until the next hockey stick inflection point.

You never really know where you are on that curve, and the only view is backwards. Some might claim that the improvements in AI are slowing down, but far more would guess we are in the high-gradient part of the curve, with new techniques, models, and approaches appearing almost every day. Indeed, by the time you finish reading this (and definitely by the time I finish writing it), there will be new capabilities worthy of our attention.

As counterintuitive as it may seem, for my money it's more likely that we are in the early phase of the s-curve and that in spite of an extraordinary rate of invention and improvements - we haven't hit that hockey stick inflection point yet.
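One way to see why the backwards view is so unhelpful: early on, an s-curve and a pure exponential are almost indistinguishable. The sketch below assumes a logistic s-curve with made-up parameters and compares it with the exponential that matches its early growth.

```python
import math


def logistic(t, ceiling=100.0, midpoint=10.0, rate=1.0):
    """An s-curve, modeled (as an assumption) as a logistic function."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def exponential(t, ceiling=100.0, midpoint=10.0, rate=1.0):
    """The pure exponential that the logistic curve resembles while t is well below the midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))


def relative_gap(t):
    """How far the exponential overshoots the s-curve, as a fraction of the s-curve."""
    return (exponential(t) - logistic(t)) / logistic(t)


for t in (0, 4, 8, 10):
    print(f"t={t:2d}  s-curve={logistic(t):10.4f}  exponential={exponential(t):10.4f}")
```

Well before the midpoint the two curves agree closely; by the midpoint the exponential is double the s-curve. Data from the early phase alone cannot tell you which one you are on.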

An Intimidating Prospect

How do you spot the inflection point if you can only look backwards? With limited resources, how do you time it just right? Too early, and there is an opportunity cost as you could have spent your resources on something else. Too late, and you miss out on growth and the opportunity to shape the curve to your advantage.

Organizations that place their bets well will be able to ride the capability curve all the way up. But place them incorrectly, and there is a chance you'll be playing catch up when the tide eventually turns. Deciding where to invest poses challenges and opportunities.

The bad news: these events are rare, unique, and hard to spot. Almost by definition, the techniques that have driven improvements so far are unlikely to deliver exponential future growth on their own.

The good news: there are some tell-tale signs. Let's take a look.

Snowball Capabilities

The most common indicators to look for are those which indicate a technique has the potential to start a snowball rolling - a set of improvements which compound on one another to grow the absolute capability in aggregate. This cascade changes the trajectory of capability more than any single improvement. There are usually three elements.

1 - the pre-requisite: Discovery of new technology with improved utility & performance. Simple tools to plant seeds in soil. Browsers. Smart phones. Very few trajectory-defining moments don't start with a meaningful new invention. These are often very general technologies because the opportunity to specialize doesn't exist without the invention existing in the first place. These inventions are a pre-requisite for growth; without them, there is no opportunity at all. They are the cold weather that precedes the snowfall.

2 - the x-factor: Deeper specialization for individual tasks. Building and using the initial invention to provide deep specialization: different tools for planting different seeds contribute to much-improved crop yield. Web-based SaaS applications specialized in oh-so-many different directions covering niche and broad areas. Smart phone apps gave rise to entirely new categories of increasingly specialized products. Useful, fun, remarkable even, although in isolation, they are beautiful and unique snowflakes.

3 - the combinator: Increasingly collaborative systems of work & play. Through efficiencies, farms could combine the use of specialized tools to grow more diverse crops, and in turn create much larger farms. The growth of one crop compounded the other. SaaS applications grew in popularity and integrated through APIs, growing the web's usefulness beyond the impact of any single web app. And for mobile devices, apps specialized to the point of becoming entirely personal (through photos, calendars, social media, etc.), and interoperate through content, sharing, memes, in ways which make every app (and the device they run on) far more useful than the sum of its parts. This is the snowball and the hill and the push-over-the-edge.

In turn - these collaborative, specialized capabilities combine together to reveal a new universal truth, insight or invention and the cycle continues. They build on each other to something significant and new.

The Autonomy Snowball

Let's play out our model with respect to generative AI.

Pre-requisite: Foundation models fit the bill perfectly for a new technology with improved utility and performance. It is the frontier models which have proved the most remarkable in debuting a software component which can perform reasoning and integration in new (albeit, limited) ways. Models themselves are seductive (and useful and fascinating!), but they are likely the cold front before the snowfall, not the snowball itself.

X-factor: Agents - AI systems which use generative AI models to work toward an objective, automatically - offer deep specialization for individual tasks. Agents can develop specialized strategies and plans to move toward a stated goal and then execute that plan, automatically.

Want to add a new feature to your specific app? There's an agent for that. Need to migrate a codebase from one version of Java to another? There's an agent for that. Want to optimize your supply chain? Progress your research? Modify your biomolecule to reduce toxicity? There's an agent for that (or there soon will be).

Combinator: Multi-agent systems - systems which can integrate, orchestrate, and collaborate with other agents - are maturing very quickly. Most are not ready for the limelight just yet, but it's only a matter of time. The point at which they are capable of combining agents into more complex systems is the point at which the capability derived from any individual agent rises - and with it, the utility and value of all agents, and the models they run on.

With these three elements in place, agents (a.k.a. "agentic systems") seem well poised to drive the same kind of compounding capability growth that SaaS apps did for the web, or apps did for mobile. I think this makes a pretty compelling case for experimentation, prototyping and investment today (Amazon Bedrock is a great place to do this, naturally). The agents you create will be useful immediately, and as multi-agent systems mature, you'll be ready to compound their value - exponentially.

Seize the snowball.

🇨🇦 This post is based on a short talk I gave at the Collision conference in Toronto this morning. Thanks to Nick Walsh and the Amazon Bedrock team for their help.

Models, Rock, Paper, Scissors

2024-06-12T00:00:00Z

Rock. Paper. Scissors. Shoot!

A game as old as decision making itself - Rock Paper Scissors is a bastion of school yards, road trips, and board rooms the world over. Paper beats rock. Rock beats scissors. Scissors beat paper. Elegant. Simple. And an example of intransitive superiority.

Intransitive superiority is a fascinating concept that changes the way we think about something as being the "best" in absolute terms - paper beats rock. Rock beats scissors. Scissors beat paper - and the limitations of side-by-side comparisons of more complex systems or choices.

These counterintuitive systems are not uncommon outside of the playground: you can find the same system in evolution (species A outcompetes species B for resources, species B outcompetes species C, species C outcompetes species A), sports leagues (team A beats team B, team B beats team C, team C beats team A), video games (earth magic < water magic, water magic < fire magic, fire magic < earth magic), and - and you knew I was getting there - generative AI foundation models (model A outperforms model B, model B outperforms C, model C outperforms A, depending on the task at hand).

These cycles arise when the relationship between elements is non-transitive, leading to cyclical patterns of superiority, where each element is superior to one other element while being inferior to another.

It becomes pretty important to examine the specific pairwise interactions between elements rather than relying solely on overall rankings or presuming transitivity. Failure to account for intransitive superiority can lead to suboptimal decisions and strategies, as the superiority of an element may vary depending on the specific context and the other elements involved.
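Examining pairwise interactions can be made concrete. Here is a minimal sketch that takes a table of "who beats whom" and reports any three-way cycles. The data structure and function name are illustrative assumptions, not a standard tool.

```python
from itertools import permutations


def intransitive_triples(beats):
    """Find cycles where a beats b, b beats c, and c beats a.

    `beats` maps each name to the set of names it beats.
    Each cycle is reported once, starting from its alphabetically first member.
    """
    names = sorted(set(beats) | {x for losers in beats.values() for x in losers})
    found = []
    for a, b, c in permutations(names, 3):
        starts_at_smallest = a < b and a < c
        is_cycle = (
            b in beats.get(a, set())
            and c in beats.get(b, set())
            and a in beats.get(c, set())
        )
        if starts_at_smallest and is_cycle:
            found.append((a, b, c))
    return found


rock_paper_scissors = {
    "rock": {"scissors"},
    "scissors": {"paper"},
    "paper": {"rock"},
}
podium = {
    "gold": {"silver", "bronze"},
    "silver": {"bronze"},
    "bronze": set(),
}

print(intransitive_triples(rock_paper_scissors))  # one cycle
print(intransitive_triples(podium))               # no cycles: a true ranking exists
```

If the function returns anything at all, a single overall ranking is hiding information you probably care about.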

The same is true for generative AI foundation models. Traditional transitive superiority would have us believe that a single absolute "best" model exists, and that there is a hierarchy of silver and bronze winners after that. Indeed, we kind of fool ourselves into believing this by (over?) indexing on benchmarking scores. Traditional transitive superiority assumptions make for great headlines (and a win is a win in any category!), but it's only part of the story - and it misses the broader context which reveals a counterintuitive truth.

There is no one model to rule them all.

General "world" models like GPT-4o and Claude 3 Opus are great! They have a ton of utility in reasoning across general information and knowledge, analyzing data, interpreting code, and so on. These broad, general language tasks are common - and having access to awesome models like these is a big part of why customers are excited about generative AI in the first place. They even "win" in many benchmarks, and for general tasks will outperform specialized models. But!

A specialized model will beat a general model at a specific task - with better answers, less risk of hallucination, and often at lower cost. But!

An ensemble of specialized models working in concert will outperform a specialized model at a specialized task (since they usually have more context to pull from multiple forms of specialization). This is - in part - why mixture of experts models work so well (used - famously - by Mistral AI in Mixtral). But!

General models will beat ensemble models in general tasks. And so the cycle continues.

Assuming model transitivity - that an absolute "best" model exists irrespective of context - often leads to suboptimal performance of an AI system, and higher costs in aggregate. But! If we assume model intransitivity - the value shifts from finding the "best" to using the "many" for the right task at the right time.
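To make the difference tangible, here is a toy comparison between picking one "best" model by average score and picking the best model per task. The model labels, task names, and scores are all invented for illustration; they are not benchmark results, and this is not any particular provider's API.

```python
# Hypothetical evaluation scores (higher is better), made up for this sketch.
SCORES = {
    "general": {"summarize": 0.80, "extract_invoice": 0.70, "triage_ticket": 0.72},
    "specialized": {"summarize": 0.60, "extract_invoice": 0.90, "triage_ticket": 0.65},
    "ensemble": {"summarize": 0.70, "extract_invoice": 0.75, "triage_ticket": 0.88},
}


def best_overall(scores):
    """The 'transitive' choice: one model, chosen by its average score."""
    return max(scores, key=lambda m: sum(scores[m].values()) / len(scores[m]))


def best_per_task(scores):
    """The 'intransitive' choice: a routing table with the top model for each task."""
    tasks = next(iter(scores.values())).keys()
    return {task: max(scores, key=lambda m: scores[m][task]) for task in tasks}


def mean_score(scores, routing):
    """Average score achieved when each task goes to the model named in `routing`."""
    return sum(scores[model][task] for task, model in routing.items()) / len(routing)


winner = best_overall(SCORES)
single_model = {task: winner for task in SCORES[winner]}
routed = best_per_task(SCORES)

print("one model everywhere:", winner, round(mean_score(SCORES, single_model), 3))
print("routed per task:     ", routed, round(mean_score(SCORES, routed), 3))
```

With these made-up numbers, every model wins somewhere, and routing each task to its winner beats sending everything to the model with the best average.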

Intransitive models in the real world

It is this insight which informed our design of Amazon Bedrock, where different models from different families are available behind a consistent API with evaluation tools to help pick the right one at the right time. This collection of models creates a wide circle of intransitivity from which you can pick the "best" model for increasingly specialized tasks.

This includes world models like Claude 3 Opus for general tasks, ensemble models like Mixtral, or specialized natural language models such as Titan's text models. Also, specialization capabilities let you fine-tune existing models to hone and focus on specialized use cases, further increasing the circumference of the cycle of intransitivity. For each use case or task, you can create or find the best model, and vary that choice as your needs inevitably change.

It's also what informed the architecture of Apple Intelligence, where specialized on-device models are used when possible, specialized ensemble models (which have been fine-tuned on personal data types like messages, calendars, emails, and so on) are combined in the cloud when appropriate, and general tasks are routed to generalized world systems like ChatGPT (and others like it in the future).

Rock. Paper. Scissors. Specialized. General. Ensemble.

I think this intransitive characteristic is likely to remain an immutable, stable feature of the menagerie of generative AI models for the foreseeable future.

I would bet we will have many more model families (and that the models in each category will continue to diversify) over time, making mastery of this intransitive choice, and selection of the right model for the right use case, one of the biggest levers most of us can pull to drive successful AI workloads. Organizations that build the muscles to make the right choice based on the right evaluation criteria are going to be well poised to move quickly as the models themselves also continue to improve.

Exciting times.

Further reading

You can read about model choice in Amazon Bedrock and Apple's model adaptation approach, here:

https://aws.amazon.com/bedrock/developer-experience/

https://machinelearning.apple.com/research/introducing-apple-foundation-models

This Newsletter is Counterintuitive

2024-05-21T00:00:00Z

Counterintuitive (adjective): contrary to what one would intuitively expect.

The arrival of Generative AI has ushered in a period of extraordinary discontinuous change. While not unheard of, events like this are rare and have the remarkable ability to upend conventional wisdom and drive counterintuitive results.

During times of discontinuous change, the rules that once governed a system or industry may no longer apply. If you're not careful, the sudden and drastic nature of these shifts can catch you off guard, leading to unexpected outcomes that defy traditional logic.

How will AI challenge long-held assumptions? Or create entirely new paradigms?

What lessons can we learn from past periods of discontinuous change?

What new opportunities or markets could emerge? How can we position ourselves to capitalize on them?

How might the needs, preferences, and behaviors of our customers or stakeholders shift in response to this change?

It’s these sorts of questions that I’m keen to try and answer. There are lots of ways to stay up to date on the day-to-day (sometimes hour-by-hour!) updates on AI, but here, I want to try and put the changing landscape into context, discuss the strategic implications, and the longer term first- and second-order effects of this remarkable new technology. Be they intuitive or not.

I hope you'll join me for the ride.

I'll plan to send an update around once or twice a week (starting this week with some observations about how larger organizations are adopting AI successfully).

PS: I'll also continue to post updates, news, and 'field notes' on more tactical learnings on LinkedIn. Taken together, I expect to create a library of opinions, observations, and outlooks - both short and long term - which will be helpful to anyone working with AI (although you may know better ones).