AI Engineering Leaders: The Next 12-24 Months of AI Transformation
Executive Summary
The landscape of software engineering is undergoing a seismic shift. Sherwin Wu, Head of Platform Engineering at OpenAI, shares an insider's perspective on how artificial intelligence is fundamentally changing what it means to be an engineer—and what lies ahead for the next 12-24 months. This comprehensive guide explores the transition from traditional coding to AI-assisted development, the challenges teams face when adopting AI agents, and why many organizations are still missing critical opportunities in their AI deployments.
Key Insights
- Engineers as wizards: Software engineers are evolving from code writers to AI orchestrators, managing multiple AI agents simultaneously while maintaining strategic control
- AI-authored code at scale: At OpenAI, nearly all new code is generated with Codex, and 100% of pull requests are reviewed by AI across the organization
- The one-person billion-dollar startup is real: AI's efficiency gains are enabling individuals to build massively valuable companies, with second and third-order effects creating unprecedented opportunities for specialized B2B SaaS
- Customer feedback can mislead you: In rapidly evolving AI fields, blindly following customer requests to improve specific scaffolding (like vector stores) can lead you away from where models are actually heading
- Negative ROI is common: Many non-tech companies attempting AI deployment experience negative returns because they lack bottom-up adoption and proper internal AI expertise teams
- Models will eat your scaffolding: As AI models improve, previous workarounds and frameworks become obsolete, requiring constant reinvention of development approaches
The Evolution of Software Engineering: From Code Writers to AI Orchestrators
The fundamental nature of engineering work has shifted dramatically in just the past couple of years. What was once a job focused on writing every line of code has transformed into something that feels almost magical—orchestrating AI agents to perform complex tasks with minimal direct intervention.
When engineers adopted tools like Codex, the impact was immediate and measurable. Engineers using AI-assisted development open significantly more pull requests—approximately 70% more—compared to those working traditionally. This isn't just a productivity bump; it represents a fundamental change in how work gets accomplished. As these tools continue to improve and engineers gain comfort with them, this productivity gap continues to widen.
The experience at OpenAI is particularly instructive. Nearly every engineer on the platform engineering team uses Codex for virtually all their tasks. While it's difficult to assign precise percentages due to attribution challenges, the organization has essentially moved to a model where AI generates the vast majority of production code. What's particularly striking is the scope of AI involvement: 100% of pull requests undergo review by Codex, meaning every line of code that makes it to production has been vetted by an AI system trained to identify bugs, suggest improvements, and recommend best practices.
This shift has fundamentally changed the engineer's day-to-day experience. Instead of sitting down to write a feature from scratch, engineers now spend their time providing context, steering AI agents toward solutions, and validating the work. For many, this feels like a return to something almost mystical—hence the "wizard casting spells" metaphor that resonates throughout the AI community. Engineers describe checking in on their AI agents like a sorcerer might check on multiple magical processes happening simultaneously, each one working independently but under loose supervision.
Managing Fleets of AI Agents: The New Engineering Reality
Today's engineering managers and senior individual contributors find themselves managing what they describe as "fleets" of AI agents. This isn't a future scenario—it's happening right now within organizations that have embraced AI-first development practices. Engineers on the platform engineering team at OpenAI routinely manage 10-20 parallel coding threads simultaneously, with each thread representing an independent AI agent working on a specific task or feature.
This creates an entirely new skillset requirement for engineers. The challenge isn't writing code anymore; it's knowing how to give the right instructions, maintain sufficient context for the AI to operate effectively, and recognize when an agent has gone off track. This requires both deep technical knowledge and a kind of management sensibility that hasn't traditionally been part of engineering curricula.
One particularly revealing experiment at OpenAI involves a team maintaining a 100% Codex-written codebase—they've removed the traditional "escape hatch" of manual code writing. When their AI agents encounter problems, the team can't simply roll up their sleeves and fix things manually. Instead, they must solve issues by improving how they communicate with the AI, providing better context, and refining their instructions. This constraint has become a powerful learning tool.
What they've discovered is that most AI agent failures don't stem from the model being incapable; rather, they result from insufficient context or poorly specified requirements. When engineers face a stuck agent, the solution often involves adding more documentation, improving code comments, or encoding tribal knowledge that previously lived only in their heads. This has led to fascinating new practices: comprehensive README files, detailed .md documentation files, well-structured code comments, and even specialized context management strategies that help AI agents understand not just what to do, but the broader business and technical context they're operating within.
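The idea that a stuck agent is fixed by improving its context, not by hand-editing code, can be made concrete with a small sketch. This is an illustrative helper, not an OpenAI API: the file names (`README.md`, `AGENTS.md`, `docs/architecture.md`) and the function itself are hypothetical, showing how checked-in documentation becomes the material an agent reasons from.

```python
from pathlib import Path

def build_agent_context(repo_root: str, task: str) -> str:
    """Assemble the context a coding agent sees before starting a task.

    Hypothetical helper: the file names and structure are illustrative.
    The point is that tribal knowledge lives in checked-in docs, so a
    struggling agent is improved by editing these files, not the code.
    """
    root = Path(repo_root)
    sections = []
    for doc in ("README.md", "AGENTS.md", "docs/architecture.md"):
        path = root / doc
        if path.exists():
            sections.append(f"## {doc}\n{path.read_text()}")
    sections.append(f"## Task\n{task}")
    return "\n\n".join(sections)
```

Under this model, "debugging" an agent means noticing which section of the assembled context was missing or misleading, and fixing the document it came from.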
The stress engineers feel when their agents aren't working is real and significant. It's a different kind of pressure than traditional debugging—when you can't simply take over and fix the problem manually, you're forced to become better at communication and context provision. Over time, this constraint becomes a strength, as teams develop increasingly sophisticated approaches to working with AI agents.
The Art and Science of Code Review in the AI Era
Code review, historically one of the most tedious and time-consuming aspects of software engineering, has undergone a radical transformation. At OpenAI, this change is comprehensive: Codex reviews 100% of all pull requests before they're merged to production. This seemingly simple change has profound implications for how development teams operate.
The traditional code review process is notoriously slow and often frustrating. Senior engineers managing central systems frequently found themselves inundated with 20-30 code reviews each morning, which could balloon to 50 or more if they fell behind. Each review requires context-switching, understanding someone else's code, considering edge cases, and providing constructive feedback. It's valuable work, but it's also grinding and repetitive.
Codex excels at this exact task. The AI can review code rapidly, identifying style issues, potential bugs, optimization opportunities, and architectural concerns. Crucially, it can do this consistently and without fatigue. Where human review might take 10-15 minutes per pull request, AI review often takes 2-3 minutes, with comprehensive suggestions already baked in.
The interesting discovery is how this has changed the role of human reviewers. For many smaller pull requests, teams have found that human review becomes optional—the AI suggestions are comprehensive enough that the original author reviewing Codex's feedback provides sufficient quality assurance. For larger or more complex changes, human review still plays a valuable role, but it's now augmented by AI insights rather than being the sole source of verification.
This same AI-driven approach extends beyond code review into the entire CI/CD pipeline. Lint errors, testing failures, and deployment issues that traditionally required manual intervention can now be handled by Codex. When a lint error appears, Codex can identify it, suggest fixes, and patch the code automatically, then restart the CI process. The result is engineers spending less time on the mechanical aspects of getting code to production and more time on the creative work of problem-solving and architecture.
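The lint-fix-retry loop described above can be sketched in a few lines. Both callables here are stand-ins: `run_lint` would wrap a real linter invocation, and `fix_with_agent` is a hypothetical Codex-style call that patches files given the error output; neither is a real OpenAI interface.

```python
def lint_fix_loop(run_lint, fix_with_agent, max_rounds: int = 3) -> bool:
    """Hand lint failures to a coding agent and retry until the check passes.

    `run_lint` returns (passed, error_output) -- e.g. a wrapper around a
    linter subprocess. `fix_with_agent` is a hypothetical agent call that
    edits files given the errors. Returns True once lint passes.
    """
    for _ in range(max_rounds):
        passed, errors = run_lint()
        if passed:
            return True
        fix_with_agent(errors)  # agent patches files; next iteration re-checks
    return False
```

The bound on rounds matters in practice: an agent that cannot converge should escalate to a human rather than loop forever.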
The concern that this creates a circular problem—Codex writing code and then reviewing that same code—is worth addressing. OpenAI has thoughtfully managed this by reducing human review from 100% to approximately 30%, ensuring that human engineers still provide meaningful oversight without becoming a bottleneck. Additionally, the organization uses internal variants of models to offer different perspectives on code quality, creating a system of checks and balances that prevents any single AI system from having unchecked authority.
Why Most AI Deployments Fail: The ROI Problem
Despite the enthusiasm around AI in tech circles, a sobering reality exists when you look beyond Silicon Valley: many organizations attempting to implement AI are experiencing negative ROI. This isn't because AI technology isn't capable; rather, it's because the implementation approaches these organizations are taking are fundamentally flawed.
The root cause often comes down to a simple distinction: top-down mandates versus bottom-up adoption. When executives decree that their organization will become "AI-first," purchase expensive AI tools, and perhaps even tie performance reviews to AI tool usage, they create a situation where employees use these tools because they're required to, not because they understand how or why. The result is cargo cult adoption—the motions are performed, but the value never materializes.
Compare this to the organizations where AI implementation has genuinely worked. These typically combine executive support with genuine enthusiasm from the people doing the actual work. At OpenAI, the shift accelerated not just because leadership wanted it, but because Codex gave individual engineers the ability to apply AI directly to their work in tangible, valuable ways. Engineers could immediately see the benefits—they could build more, faster, with fewer errors. This created natural enthusiasm and knowledge-sharing.
The organizations that struggle are often those where the AI adoption is completely divorced from the actual work processes. A typical scenario: leadership mandates AI adoption, employees receive minimal training, and they're left to figure out how to use these tools for work that's fundamentally different from what they do in tech companies. When a support team operations lead is told to use AI but given no guidance on how to apply it to their specific processes, of course adoption fails.
The solution, according to insights from OpenAI's experience, is to create what's called a "tiger team"—a dedicated group of talented, enthusiastic people who explore AI's capabilities deeply, discover practical applications for specific workflows, and then evangelize these approaches to the broader organization. Interestingly, these tiger team members aren't typically software engineers. While engineers understand AI conceptually, they're expensive, difficult to find, and often not as naturally excited about the technology. The people who tend to get most enthusiastic are those already technically inclined but outside traditional engineering: operations leads who use Excel extensively, support team members who understand process automation, finance professionals comfortable with data. These individuals light up around AI precisely because they see how it can transform their specific domain.
The key is then to empower these people, rather than spreading them thin across the entire organization. Give them time to explore, run hackathons, create knowledge-sharing sessions, and build internal excitement. The enthusiasm spreads naturally when people see their peers achieving real productivity gains.
Building for Where Models Are Going, Not Where They Are
One of the most counterintuitive pieces of advice from leaders at the forefront of AI is this: don't always listen to your customers about what to build next. This seems to contradict standard product development wisdom, but in the AI field, it's often correct—and here's why.
The AI field is changing at a pace that makes following customer feedback shortsighted. In 2022, when models were still relatively limited, the industry built extensive scaffolding around them: agent frameworks, vector stores, retrieval systems, and numerous specialized tools designed to help compensate for model limitations. Developers had to do enormous amounts of work to make models useful for specific tasks.
As models have improved, something remarkable has happened: they've literally consumed this scaffolding. The specialized tools and frameworks that seemed essential have become unnecessary as the models themselves became capable enough to handle more of the work directly. This dynamic is ongoing. Current fashionable approaches like elaborate skills management and file-based context systems might become obsolete as models improve further.
Consider the trajectory of vector stores. In 2023, the entire AI developer ecosystem was obsessed with vector databases and RAG (Retrieval-Augmented Generation) systems. The assumption was that you needed specialized vector stores to bring organizational context into models. Teams invested heavily in vectorizing their document corpora, optimizing search algorithms, and building complex retrieval systems. But as models improved, simpler approaches often worked just as well. A generic search tool, even basic file system search, often proved sufficient. The specialized scaffolding turned out to be unnecessary—it was crutches for models that couldn't perform well enough without them.
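To make the "generic search tool" point concrete, here is a deliberately simple sketch: a case-insensitive substring search over markdown files. Everything here (the function, the `.md` scope, the result format) is illustrative, but it captures the claim above—for many tasks, exposing a tool this basic to a capable model does the job a vector-store pipeline was built for.

```python
from pathlib import Path

def search_files(root: str, query: str, max_hits: int = 5) -> list[str]:
    """A minimal retrieval tool: case-insensitive substring search over
    markdown files, returning "path:line_number: line" strings. The point
    of the sketch is that a capable model given a tool like this can often
    replace a specialized vector-store pipeline."""
    hits = []
    q = query.lower()
    for path in Path(root).rglob("*.md"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for i, line in enumerate(text.splitlines(), 1):
            if q in line.lower():
                hits.append(f"{path}:{i}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The asymmetry is instructive: this tool took minutes to write and has no embedding index to keep in sync, while the model supplies the "understanding" that the retrieval layer used to approximate.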
This creates a dilemma for product builders: customers will naturally ask for improvements to what exists. They'll request better vector stores, more sophisticated agent frameworks, improved retrieval mechanisms. If you listen to these requests exclusively, you build better solutions within an outdated paradigm. You're optimizing for a local maximum.
The alternative is building for where the models are going. This requires staying closely attuned to model capabilities, understanding the trajectory of improvement, and being willing to simplify or abandon tools as they become unnecessary. It requires looking past what customers are asking for and thinking about what will actually matter in 12-24 months as the models continue their rapid evolution.
This isn't a blanket rejection of customer feedback. Rather, it's a call for balance. Listen to customers to understand their problems, but then think independently about whether their proposed solutions align with where the technology is actually heading. Sometimes the answer will be to build what they ask for. Often, it will be to build something simpler that will serve them better as models improve.
The Rise of One-Person Billion-Dollar Companies and the B2B SaaS Gold Rush
One of the most underappreciated second-order effects of AI advancement is its implication for company structure and the startup ecosystem. The phrase "one-person billion-dollar startup" has emerged to describe what becomes possible when individual leverage increases to extreme levels. One person, armed with AI tools, can accomplish what previously required a large team.
But the question that follows isn't immediately obvious: if one person can achieve billion-dollar results, what does that mean for everything else? The answer reveals an enormous opportunity that Silicon Valley often overlooks.
Consider a one-person startup that builds an AI-powered customer support solution. This founder uses AI tools to build the product, deploy it, and scale it to millions of dollars in revenue. They accomplish with AI tools what would have previously required a 50-person engineering team. But now, what happens when this one-person founder needs something specialized?
They might need custom integrations, bespoke features for their specific market, or tailored solutions that don't exist in the general market. This creates an opportunity for another startup—perhaps a one or two-person team—to build exactly that specialized software. Because AI makes building custom software so much cheaper, a tiny team can profitably serve a niche market that was previously too small to be economically viable.
Extend this logic, and you can imagine a future where dozens or hundreds of small, specialized B2B SaaS companies serve each one-person billion-dollar startup. Each of these supporting startups might have just a few people, might serve a narrow niche—say, payment processing integrations for podcast hosting platforms, or custom analytics dashboards for newsletter creators. Individually, these are small markets. Collectively, they represent an enormous opportunity that's currently invisible to most investors and entrepreneurs.
This is what creates a potential "golden age of B2B SaaS." Not because B2B SaaS is new, but because the unit economics have shifted radically. A market that previously required 10 people to serve can now be served profitably by one person with AI tools. Niches that were too small to be worth building for now have viable economics.
The implication for the startup ecosystem is profound: the era of forced consolidation and mega-companies might give way to an era of thousands of small, specialized, incredibly profitable companies. There will be far more opportunities, but they'll be smaller, more specialized, and easier to build.
Distribution and Network Effects: The Next Competitive Advantage
An interesting third-order effect emerges from the combination of easy product building and massive competition for attention. As the cost of building software products approaches zero, competition intensifies. If anyone can build anything, how do you win?
The answer increasingly points to distribution and audience. The entrepreneurs and creators who already have established audiences—people who can reach thousands or millions directly—gain an enormous advantage. A creator with a large email list, a popular podcast, or a substantial social media following can launch AI-powered products into a ready-made audience. Someone without distribution faces the traditional startup challenge of building an audience for their product.
This creates an interesting dynamic where traditional creator platforms and audience-building become more valuable, not less. A podcaster with 100,000 listeners has more startup potential than a brilliant engineer without an audience. Someone with a popular newsletter can launch AI-powered tools to their subscribers. The game isn't just "can you build something people want?" anymore; it's "can you build something people want and get it in front of the right audience?"
This might seem to advantage established creators over new entrants, but it also points to the importance of community, network effects, and audience building as core startup skills. The most successful AI-era startups won't just be those with the best technology; they'll be those that combine good technology with effective distribution.
The Future of Engineering Management
While individual contributors have undergone dramatic transformation, engineering managers face a different challenge. There's no "Codex for managers" yet, though managers increasingly use AI tools for certain tasks. The changes happening in management are more subtle than for engineers, but potentially more significant long-term.
The clearest trend is that AI amplifies the impact of top performers. Someone who truly understands how to work with AI tools, who leans into the technology, and who has high agency becomes exponentially more productive. This creates an increasing performance gap between those who master AI tools and those who don't.
For managers, the implication is clear: allocating more than 50% of management time to top performers and genuinely empowering them becomes even more critical. This aligns with principles from "The Mythical Man-Month," which likens the best engineers to surgeons—the center around which everything else organizes. In an AI-enhanced world, this principle becomes even more pronounced.
The management philosophy that works best is proactive enablement. Rather than waiting for engineers to identify blockers, effective managers in the AI era look around corners to identify and remove obstacles before they become problems. This means actively monitoring Slack messages, Notion documents, and team dynamics to catch friction points early. It means ensuring top performers have every tool, decision, and resource they need before they need to ask.
There's an intriguing possibility emerging here: what if you could use AI to help identify these blockers? Imagine hooking a ChatGPT instance up to your company knowledge base, asking it to identify active blockers for your team or predict what might become a blocker in the coming months. It could scan Notion documents and Slack messages to surface issues before they become critical. This represents another layer of AI augmentation—using AI not just to build products, but to manage organizations more effectively.
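A minimal sketch of that idea, assuming the official `openai` Python package and an `OPENAI_API_KEY` in the environment: the data sources (Slack excerpts, Notion notes), the prompt wording, and the model name are all illustrative, and in practice you would feed in real exports from those systems.

```python
def build_blocker_prompt(snippets: list[str]) -> str:
    """Fold recent team notes (e.g. Slack or Notion excerpts) into one
    prompt asking the model to surface current and upcoming blockers."""
    joined = "\n---\n".join(snippets)
    return (
        "You are helping an engineering manager. From the team notes below, "
        "list anything that is blocking work now or is likely to block it in "
        "the next month, each with a one-line suggested unblock.\n\n" + joined
    )

def find_blockers(snippets: list[str], model: str = "gpt-4o-mini") -> str:
    """Send the assembled prompt to the Chat Completions API.

    Import is deferred so the prompt-building half works without the SDK
    installed; model choice here is an assumption, not a recommendation."""
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_blocker_prompt(snippets)}],
    )
    return resp.choices[0].message.content
```

The interesting design question is cadence: run this weekly over the last sprint's notes and the output starts to look like the "looking around corners" habit described above, automated.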
Real Work vs. Open-Ended Work: Where AI Creates the Most Value
A distinction worth understanding is between open-ended knowledge work and repeatable business processes. Most software engineering falls into the open-ended category: when you build a feature, you're solving a novel problem, not executing a predetermined process. This is exactly where tools like Codex excel—they support exploration, generate novel solutions, and help engineers think through complex problems.
But as you move beyond software engineering, vast portions of the economy operate differently. Consider customer support, accounting, operations, or logistics. Much of this work involves following established procedures, applying known rules, and executing well-documented processes. Deviation from the procedure is generally not desired; adherence to it is what creates reliable, predictable outcomes.
AI has barely scratched the surface of this category of work. The focus in Silicon Valley has been on open-ended knowledge work—software engineering, data science, strategic finance—because these are the jobs of tech workers. But the global economy is dominated by repeatable processes. These represent an enormous, largely untapped opportunity for AI to create value.
The challenge with applying AI to these processes is different than with open-ended work. You need high determinism: the AI should reliably execute the process the same way every time. You need integration with business data and enterprise systems. You need the ability to handle exceptions and edge cases while maintaining the core process. This is less about "intelligent exploration" and more about "reliable, rule-based execution at scale."
There's genuine opportunity in this space, and it's likely to drive significant value creation over the next few years. While sexy AI applications in creative fields get attention, the real ROI may come from mundane process automation in enterprises. Automating a process that 10,000 people currently perform, even if it's not creative or exciting, creates massive value.
Competing with OpenAI: The Ecosystem Perspective
A natural question that founders ask is: how can I build a startup without being crushed by OpenAI, Google, or other large AI labs? The answer, from someone inside OpenAI, is surprisingly reassuring.
First, the market is simply enormous. The opportunity available through AI is so vast that it has fundamentally shifted what investors are willing to fund. VCs are now investing in companies that directly compete with each other because the total market opportunity is so large that competition doesn't eliminate the chance for success. Multiple companies can thrive in the same space.
Second, success for startups depends far more on execution and market fit than on raw model capability. Consider Cursor, now a massive success in the AI coding space, competing directly in a space where OpenAI operates. Cursor succeeded because it built something developers genuinely loved—the product experience was superior, the integration with workflows was seamless, and the team maintained focus on what users actually needed.
The lesson: don't obsess over where OpenAI is headed or what Google might build. Build something people genuinely want. If you achieve product-market fit, you'll find your niche.
Finally, OpenAI fundamentally operates as an ecosystem platform company. The API is the core product, and the strategic interest lies in enabling thousands of applications to be built on top, not in building competing applications. Every model released gets made available through the API, eventually. Access to models is granted freely to ecosystem participants. Tools like "Sign In with ChatGPT" are designed to support the ecosystem, not lock people in.
This reflects OpenAI's founding mission: to ensure AI benefits all of humanity. As a single company, they can't build solutions for every niche, use case, and geographic market. Instead, by making powerful tools available to developers globally, they amplify their impact. This benefits OpenAI through a growing API business and benefits everyone else through unprecedented AI capabilities.
Practical Advice for Building with AI Tools Today
For developers and teams looking to work with OpenAI's capabilities, there's a stack of tools available at increasing levels of abstraction.
At the lowest level is the Completions API—the most basic primitive. You give it text, it processes for a while, and you get a response back. It's unopinionated and maximally flexible. This is what most developers use, and for good reason: it works for nearly any application.
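A sketch of that lowest-level primitive, using the Chat Completions shape from the official `openai` Python package. The client is passed in rather than constructed, purely so the sketch is easy to exercise; the model name is illustrative.

```python
def ask(client, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Text in, text out: the basic primitive most applications build on.

    `client` is expected to be an `openai.OpenAI()` instance (or anything
    with the same `chat.completions.create` shape); model name is an
    illustrative choice, not a recommendation."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Everything higher in the stack—agent loops, widgets, evals—ultimately bottoms out in calls shaped like this one, which is why the low level stays unopinionated.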
One level up is the Agents SDK, designed specifically for building AI agents that operate in loops, delegate to sub-agents, and orchestrate complex workflows. This adds useful abstractions while still maintaining significant flexibility.
Above that are UI components and widgets—pre-built user interface elements that leverage either the API or Agents SDK, allowing developers to quickly build beautiful interfaces for their AI applications.
There are also Evals products for testing and validating that your agents and workflows perform as intended, offering quantitative ways to measure AI system performance.
The architectural philosophy is revealing: offer maximum flexibility at the low level, and increasingly opinionated abstractions as you move up the stack. Teams can choose whether to work at the lowest level of abstraction (maximum control, more work) or use higher-level abstractions (faster development, more opinions).
The Next 12-24 Months: An Era of Rapid Change and Opportunity
Looking ahead, the consensus from those building at the forefront of AI is clear: the next 12-24 months are going to be some of the most exciting and transformative in the history of technology and startups. This isn't hyperbole; it's a statement grounded in the pace of capability improvements and the expanding possibilities.
For comparison, consider the previous cycle. The years from around 2014 to 2020 included the rise of cloud computing, mobile apps, and early AI, but they didn't feel like transformation at the fundamental level. Then came the last three years—the introduction of ChatGPT, the explosion of AI-native applications, the shift in how engineering works, the emergence of new business models. It's been relentless and exhilarating.
The challenge is that many people are approaching this moment with either skepticism or anxiety. Some dismiss AI hype as overblown. Others feel overwhelmed by the pace of change and worry about falling behind. Both responses risk causing people to disengage exactly when engagement matters most.
The practical advice is to engage actively without becoming obsessed. You don't need to know every new tool, follow every announcement, or master the latest framework. You need to understand the capabilities and limitations of current tools, experiment with a couple of them, and stay oriented to how they're improving. Installing ChatGPT and connecting it to a few internal data sources like Notion or Slack is more than sufficient. Using a Codex client and exploring what it can do is a good starting point.
The key is not taking this moment for granted. In a few years, when AI capability is even more advanced and the transformation is complete, looking back on this period will feel like looking back on the early days of the internet or mobile computing. You'll wish you'd engaged more deeply, experimented more boldly, built more fearlessly.
Conclusion
The transformation of software engineering and work more broadly is already underway, and the next 12-24 months will likely see acceleration. Engineers are becoming AI orchestrators, managing fleets of agents rather than writing code directly. Companies that deploy AI thoughtfully, with both executive support and bottom-up enthusiasm, are unlocking tremendous value. The market opportunity is expanding in unexpected directions, from one-person billion-dollar startups to a golden age of specialized B2B SaaS.
For those working in tech, the advice is clear: lean into this transformation. Experiment with the tools. Understand their capabilities and limitations. Build on top of them. Don't get overwhelmed by the pace of change; focus on understanding the fundamentals. And don't take this moment for granted—the next few years will define the trajectory of technology for the next decade. The time to engage is now.
Original source: OpenAI’s head of platform engineering on the next 12-24 months of AI | Sherwin Wu