How AI is Reshaping Product Management: The Future of Work in 2026
Key Insights
- 90% of OpenAI employees use Codex — not just engineers, but the entire company across all departments
- Product implementation is no longer the expensive part — taste, curation, and decision-making are now the critical bottlenecks
- The design process hasn't died; it's evolved — prototypes and documents both have value, but their purpose has fundamentally changed
- AI struggles with design because taste is hard to quantify — human preference and cultural context remain irreplaceable
- Team structures are collapsing into fluid roles — individuals now function as PMs, engineers, and designers simultaneously based on what needs doing
The New Shape of Product Work
The landscape of product development has undergone a seismic shift with the rise of frontier AI models. At OpenAI, the adoption of Codex has transcended traditional engineering boundaries. With 90% of the entire company using the tool—from marketing and finance to legal and communications—it's clear that AI-powered development tools are becoming as essential as email in the modern workplace.
The assumption that drove product work for decades has inverted. Previously, teams invested heavily in research, ideation, and prototyping to derisk the expensive implementation phase. Documents, wireframes, and prototypes served as cheap stand-ins for actual development. Today, implementation is no longer the bottleneck; curation and taste are.
Anyone with access to modern AI models can now build almost anything from scratch. This democratization of capability has created an environment where "everybody is building everything." The expensive part isn't writing the code—it's deciding what's good, what to fold into other features, how to present information, and how many iterations to evaluate before choosing the right direction. This is the new frontier of product work: not execution, but judgment.
Why AI Struggles With Design (And What That Means for Your Team)
The question of why frontier models aren't yet excellent at design reveals something fundamental about human work that won't be automated away anytime soon. Design is fundamentally harder to grade than software because the human element of taste is integral to the feedback loop.
For code, the feedback is binary: Does it compile? Does it work? These are objective, verifiable criteria that AI models can optimize against. Design lacks this clarity. What makes something "good design" involves cultural context, novelty, systems thinking, and an understanding of where a product is headed. When a model generates design output, it often replicates existing patterns—think of how many websites looked like Linear's after its popular redesign. For code, over-indexing on known patterns is desirable; for design, it's the opposite. Novelty and differentiation matter.
There's also an abstraction layer that current AI struggles with: the relationship between visual design and the codebase architecture. A rebrand might seem like a simple task of updating 263 components, but at a deeper level, it's about the semantic relationships between design elements and code structures. If two UI elements that look visually different actually represent the same interaction pattern in the code, that abstraction layer is something AI still finds challenging. This is the frontier of AI-assisted design—not surface-level aesthetics, but structural understanding.
Research investment patterns also explain why some fields advance faster than others. Models get trained intensively on tasks that accelerate AI research itself. Early coding models were obvious investments because better code generation directly advanced research capabilities. The same case isn't as clear for design, even though it's important. These are practical limitations that will likely erode over time, but the murkier problems—novelty, cultural context, abstraction layers—will persist longer.
The Evolution of Product Processes: PRDs, Prototypes, and Medium Selection
The debate about whether PRDs (Product Requirement Documents) are dead misses a crucial nuance. Both PRDs and prototypes remain valuable; what's changed is when and why you use each medium.
Previously, the medium itself conveyed information about the stage of development. A polished prototype signaled that assumptions had been derisked and decisions were final. A detailed PRD meant the feature was well-defined and ready for implementation. This signaling function helped teams stay aligned. Today, building something is so inexpensive that visually polished prototypes can emerge from early-stage exploration. This creates confusion—something that looks production-ready might actually be a rough concept test.
The solution isn't abandoning one format for another; it's becoming intentional about medium selection. If you're exploring a vague concept and need clarity around the problem space, a document might be the best tool. The act of writing forces precision. If you want to test interaction patterns and get tactile feedback, a prototype makes sense. The critical skill now is choosing the right medium for your specific communication goal and being explicit about what stage of development you're in.
At OpenAI, many teams have adopted a "prototype-first" approach because implementation costs have plummeted. Instead of writing lengthy PRDs, teams create 5-10 different prototypes from similar initial ideas, compare them visually, and pick a direction. This is efficient for visual exploration but can lead to premature decisions if teams don't recognize they're still in the exploratory phase. The primal mark principle—the first sketch that influences all subsequent work—means jumping into prototyping too early can anchor thinking in one direction.
The future involves fluidity: knowing when documentation adds clarity, when prototypes enable testing, and when to abandon either approach in favor of direct conversation. This is taste in action.
Understanding Taste: The Most Valuable Skill in 2026
Taste has become the most sought-after and misunderstood skill in product work. It's not about aesthetics or personal preference. A person with great taste—even if they wear cargo shorts—understands systems thinking, foresight, effective communication, and context.
Taste involves knowing what to work on in a sea of possibilities. With AI capable of generating hundreds of feature variations, the bottleneck shifts to human judgment about direction. It's about understanding how something fits into a larger system, where a product is headed, and what themes matter. It's about choosing the right medium to convey information. It's about recognizing that a fully functional prototype might be an early exploration rather than a launch-ready product.
In practice, good taste means asking the right questions: What is the goal? How do we get there? What should we build when we can build almost anything? These aren't technical questions—they're strategic and creative ones. As AI handles more execution, the people who thrive will be those with strong conviction about direction combined with intellectual humility to change course when evidence suggests they should.
This is why teams are gravitating toward hiring "high-agency, high-taste" individuals who can shepherd ideas from conception to completion. These people understand their discipline deeply but can cross functional boundaries when needed. They have opinions, but they're informed by research, user feedback, and business strategy rather than ego.
Team Structure in the Age of AI: Roles Are Collapsing, But Disciplines Aren't Disappearing
The "role collapse" happening at OpenAI—where individuals function simultaneously as engineers, designers, and product managers—is worth examining closely. It's real, but it doesn't mean traditional disciplines are becoming obsolete.
The Codex team at OpenAI has an engineering headcount in the double digits, roughly half as many designers, and a handful of product people. But "product" functions more like zone defense in basketball than traditional project management. Instead of pairing two product managers on a single project, the team spreads them out to maximize coverage and identify gaps. This requires product-minded engineers who can think beyond code and designers who understand technical constraints.
The term "member of technical staff" has emerged as a description for individuals who don't fit into traditional org boxes. Their role is defined by what they do, not their title. If you spend 70% of your time doing product work, you're a PM. If you spend 60% coding and 40% on product, you're both. This is powerful for high-agency individuals who can move fluidly between responsibilities, but it creates risk if it's used as an excuse to eliminate specialized knowledge.
The danger of the "everyone is a builder" mentality is that it can obliterate specialized disciplines. When companies say "we're eliminating the product role," they risk losing accumulated best practices, frameworks, and institutional knowledge about how to structure product work. Similarly, treating design as something any engineer can do underestimates the depth of design thinking. Yes, you can use Figma if you can use software, but that doesn't make you a designer.
The productive path forward balances fluidity with specialization. Enable talented people to move across boundaries and bring their full skill set to problems. Hire product-minded engineers and engineer-literate designers. But don't mistake accessibility (design tools are easier to pick up) for expertise (understanding why certain designs work). The best teams will have people who have gone deep in one discipline and can now fluently apply those insights across others.
Building Features That Don't Work Yet: The Long Game Strategy
One of the most counterintuitive lessons from Codex's evolution is the value of building features that aren't ready yet. The original Codex interface asked users to delegate tasks to the model and wait for completion. The model would go off and complete the task, then return. This was too "AGI-pilled"—too ambitious for the models' actual capabilities at that moment.
When Copilot launched with a different interaction model—asking questions, providing suggestions, letting the user stay in control—it performed dramatically better. Same underlying model capability; different product form factor matched to actual AI abilities. This taught a crucial lesson: features don't need to work perfectly today if you have conviction they'll work when models improve.
At OpenAI, teams intentionally prototype features that will only become viable when models reach a certain capability level. Instead of asking "Is this useful right now?" they ask "Will this be useful when models are 2x better?" This requires documenting why a feature didn't work, what capability gap prevented success, and clear metrics for when to revisit it.
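One way to keep that documentation honest is to store it as a structured record instead of a loose note. The sketch below is only an illustration of the three things the paragraph asks for (why it failed, the capability gap, and a revisit metric). The class, field names, metric name, and numbers are all hypothetical and are not taken from OpenAI's process.

```python
from dataclasses import dataclass


@dataclass
class DeferredFeature:
    """A prototype parked until models improve (all fields are hypothetical)."""
    name: str
    why_it_failed: str        # what went wrong with today's models
    capability_gap: str       # the model ability that was missing
    revisit_metric: str       # name of the eval to re-run on new models
    revisit_threshold: float  # score at which the feature is worth retrying


def ready_to_revisit(feature: DeferredFeature, eval_scores: dict[str, float]) -> bool:
    """Return True once the tracked eval meets the feature's threshold."""
    score = eval_scores.get(feature.revisit_metric)
    return score is not None and score >= feature.revisit_threshold


# Illustrative values only; none of these numbers come from the source.
feature = DeferredFeature(
    name="delegate-and-wait tasks",
    why_it_failed="long tasks drifted away from the original request",
    capability_gap="reliable completion of long-running tasks",
    revisit_metric="long_task_success_rate",
    revisit_threshold=0.8,
)

print(ready_to_revisit(feature, {"long_task_success_rate": 0.55}))
print(ready_to_revisit(feature, {"long_task_success_rate": 0.85}))
```

The point of the threshold is that "revisit when models are better" becomes a check someone can run against each new model release, instead of a judgment call made from memory.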
This approach conflicts with traditional product thinking, which emphasizes shipping only what works today. But in an environment where model capabilities double every few months, the calculus changes. Building a feature six months early and waiting for models to catch up might be faster than shipping something suboptimal today and iterating. It's not carte blanche to build anything; it requires discipline about which bets matter and why.
The implication is profound: shipping isn't the final form anymore. Shipping is creating an artifact that you can test against future models. It's building infrastructure for where you're headed, not optimizing for where you are.
How AI Is Changing Leadership and Planning
Traditional product roadmaps have become nearly obsolete in environments where model capabilities shift every few months. A detailed nine-month plan offers only false precision; everything you planned last November might be invalidated by model improvements in January.
At OpenAI, planning works differently. Short-term plans (1-2 months) have high detail because the variables are more stable. Medium-term plans (3-6 months) remain deliberately hazy. Long-term plans (9+ months) are strategic directions, not feature commitments. The team tracks what capabilities models will likely have and on what timeline, then maps features to those capability windows.
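The mapping from features to capability windows can be sketched as a small lookup. This is an illustration under stated assumptions, not a description of OpenAI's tooling: the feature names, capability names, and arrival estimates are invented, and because the text leaves the 2 to 3 month and 6 to 9 month ranges unspecified, this sketch assigns those gaps to the later bucket.

```python
def horizon(months_out: float) -> str:
    """Bucket a time offset using the ranges described in the text.

    Assumption: offsets in the unspecified gaps (2 to 3 and 6 to 9 months)
    fall into the later bucket.
    """
    if months_out <= 2:
        return "short-term: detailed plan"
    if months_out <= 6:
        return "medium-term: deliberately hazy"
    return "long-term: strategic direction"


def plan(feature_needs: dict[str, str], capability_eta_months: dict[str, float]) -> dict[str, list[str]]:
    """Group features by the horizon in which their required capability is expected."""
    buckets: dict[str, list[str]] = {}
    for feature, capability in feature_needs.items():
        bucket = horizon(capability_eta_months[capability])
        buckets.setdefault(bucket, []).append(feature)
    return buckets


# Hypothetical forecast: months until each capability is expected to be good enough.
capability_eta_months = {
    "long-running tasks": 1,
    "design critique": 5,
    "self-directed roadmap": 12,
}

# Hypothetical features and the capability each one depends on.
feature_needs = {
    "overnight refactors": "long-running tasks",
    "automated design review": "design critique",
    "self-directed backlog": "self-directed roadmap",
}

for bucket, features in plan(feature_needs, capability_eta_months).items():
    print(bucket, features)
```

When a capability forecast moves, only the estimate changes and the features re-sort themselves, which fits the idea of planning around capabilities instead of fixed dates.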
This requires different leadership skills. Instead of top-down planning, effective leaders create space for bottom-up exploration. They cultivate a culture where people identify opportunities and experiment quickly. They recognize that today's blocker might be solved by tomorrow's model improvement, so they focus on building optionality rather than betting everything on current capabilities.
For leaders personally, AI is becoming a tool for pattern recognition and coordination. Andrew Ambrosino, who leads Codex at OpenAI, describes waking up to a daily brief automatically generated from 3,000 Slack channels, identifying which issues need attention. He can then coach the system ("Deemphasize this, worry about that instead") to refine what gets surfaced. This isn't about letting AI make decisions; it's about using AI to help humans prioritize better.
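The coaching loop can be pictured as per-topic weights that the reader nudges up or down. The sketch below is a guess at the general shape, not the system described above: the `Item` fields, the urgency scores (assumed to come from some upstream summarizer), and the halving and doubling factors are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    channel: str
    topic: str
    urgency: float  # 0..1, assumed to be produced by an upstream summarizer


def daily_brief(items: list[Item], topic_weights: dict[str, float], top_n: int = 2) -> list[Item]:
    """Rank items by urgency scaled by the reader's topic weight (default 1.0)."""
    ranked = sorted(
        items,
        key=lambda item: item.urgency * topic_weights.get(item.topic, 1.0),
        reverse=True,
    )
    return ranked[:top_n]


def coach(topic_weights: dict[str, float], deemphasize=(), emphasize=()) -> dict[str, float]:
    """Return new weights after feedback; the 0.5 and 2.0 factors are arbitrary."""
    updated = dict(topic_weights)
    for topic in deemphasize:
        updated[topic] = updated.get(topic, 1.0) * 0.5
    for topic in emphasize:
        updated[topic] = updated.get(topic, 1.0) * 2.0
    return updated


# Illustrative items; channel names and scores are invented.
items = [
    Item("#billing", "billing", 0.9),
    Item("#infra", "latency", 0.6),
    Item("#design", "rebrand", 0.3),
]

before = daily_brief(items, {})
weights = coach({}, deemphasize=["billing"], emphasize=["latency"])
after = daily_brief(items, weights)

print([item.topic for item in before])
print([item.topic for item in after])
```

The human still decides what matters; the weights only change the order in which issues are surfaced.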
The challenge is scalability. Right now, this kind of automation requires someone deeply familiar with the domain to set up and refine. The future state would be a system that any knowledge worker could configure through conversation, without needing to understand APIs or automation platforms. This remains an open problem.
The Latest Frontier: Autonomy and Loops
The question of fully autonomous development—"set it up, come back when it's done"—remains unsolved. Current models excel at individual tasks but struggle with two critical challenges: knowing which features to build and how to maintain codebase quality.
On the feature selection front, models can't yet understand which requested features matter, which should be grouped together, or how different requests fit into product strategy. A model running unsupervised could generate 100 features, each technically correct, but the product would be incoherent.
On the codebase front, AI models tend to increase complexity. They'll solve individual problems elegantly but miss opportunities to delete unnecessary code or refactor for clarity. They don't have the systems perspective that comes from maintaining a codebase over years. A truly autonomous system would need to understand not just "implement this feature" but "implement this feature while maintaining these abstractions and simplifying this other part of the code."
This is still an active area of exploration. Harness engineering—building systems that guide AI toward better decisions—is showing promise. But we're not yet at a place where you can tell a system "grow" or "win" and have it autonomously improve your product. That doesn't mean it's impossible; it means we're still learning what the right interfaces and feedback mechanisms look like.
Why Codex Succeeded Where Other Interfaces Failed
The desktop application form factor mattered more than people expected. When OpenAI launched browser-based agent tools (Operator, agent mode in ChatGPT), they worked reasonably well but couldn't achieve the same adoption or satisfaction as the desktop Codex app. Why?
Part of it was latency and control. A local app running in Electron provided snappier interaction than browser-based approaches. Part of it was intentional scope—the desktop app was clearly a development tool, not trying to be a general-purpose assistant. Users knew what to expect.
The company tried to generalize the interface for non-technical users, building alternative versions for marketing, finance, and legal teams. Nobody used them. Instead, people kept coming back to Codex because it was the right shape for the work: clear, purpose-built, and powerful. The lesson was that trying to be everything to everyone (the "super app" trajectory) was less effective than being excellent at one thing and then thoughtfully expanding.
The current thinking combines these insights: Codex functions as a home base where you start work, coordinate across tools, and hand off to specialized applications when needed. It can open Excel for financial modeling, connect to external APIs, or launch specialized apps. It's not trying to replace Premiere Pro; instead, it can understand Premiere Pro and help you use it better. It's not trying to replace Linear or Notion; it can interact with them intelligently.
This architecture respects user preferences while expanding capability. It's maximally flexible for how people actually work rather than how we imagine they should work.
Conclusion
The transformation of product work in 2026 centers on a simple inversion: we've moved from a world where implementation was expensive and thinking was cheap, to one where implementation is cheap and taste is expensive. This shifts the entire value proposition of product teams, design disciplines, and leadership itself.
The professionals who thrive will be those who develop strong taste—not just aesthetic taste, but strategic judgment, systems thinking, and the wisdom to know what to build when you can build almost anything. They'll be comfortable with fluidity across disciplines while respecting the depth within each one. They'll build features that don't work yet but will work soon. They'll plan in ways that accommodate uncertainty while maintaining direction.
The good news is that this shift makes work more interesting. Instead of endless meetings about requirements before implementation, you're prototyping, testing, learning, and refining. Instead of rigid org structures, you have permission to work across domains. Instead of false precision in planning, you're building optionality and staying responsive.
If you're building products, the immediate actionable shift is to think about what medium (document, prototype, conversation) is best for what you're trying to accomplish, rather than defaulting to one approach. If you're leading teams, focus on hiring taste and agency more than specific skill sets—the skills can adapt faster than the judgment can develop. If you're using AI in your work, start by identifying the repetitive, low-judgment parts of your job that AI can handle better than you, freeing you to spend time on the parts that require taste.
The future of work isn't "everyone does everything"—it's everyone doing more of what they're genuinely good at, supported by AI handling the rest. That's worth building toward.
Original source: OpenAI Codex lead on the new shape of product work | Andrew Ambrosino