Claude Code AI: How Garry Tan Built the Ultimate Agent Software Stack
Key Takeaways
- G Stack revolutionizes AI-assisted coding by implementing a team-based approach with specialized skills and roles, mirroring real development teams
- The agent era demands proper scaffolding, not raw model intelligence—thin harness, fat skills is the winning strategy
- Office Hours skill replicates YC's proven methodology, using forcing questions to validate ideas before building
- Browser automation and adversarial review catch critical issues early, improving quality from 6/10 to 8/10 automatically
- Parallel processing enables 10-15 concurrent projects, with 80-90% time spent on planning and review rather than raw coding
- The barrier to building software has collapsed—the only question remaining is what you'll create
The Agent Era: Why Traditional Coding is Obsolete
We've entered a completely new era of software development. Garry Tan, President and CEO of Y Combinator, spent a decade as a full-time engineer at Palantir, co-founded Posterous (acquired by Twitter), and built YC's internal knowledge platform Bookface. His credentials are unquestionable. And his assertion is radical: the way to build with AI agents mirrors how humans always built—as teams, with roles, with process, with review.
The catalyst came when Tan heard legends like Andrej Karpathy and Boris Cherny admit they'd stopped writing code manually. He started experimenting with Claude Code in January and became "completely hooked." In just two months, he coded more than he did in all of 2013, the last year he worked intensively as an engineer. The result? He essentially rebuilt Posterous—a platform that originally took two years, a co-founder, and ten engineers—largely through Claude Code sessions.
But here's the critical insight: raw model intelligence isn't the bottleneck. Out of the box, Claude produces plausible-looking code that silently breaks because the model "wanders" without understanding your specific data, architecture, and context. It guesses at scale. This is where most AI coding efforts fail. The solution isn't a smarter model; it's better scaffolding.
G Stack: Thin Harness, Fat Skills Framework
Three weeks after identifying the problem, Tan built G Stack—an open-source repository that transforms Claude Code into a legitimate AI engineering team. The result shocked everyone: it gained more GitHub stars than Ruby on Rails in weeks. This isn't because G Stack is flashy. It's because it works.
The philosophical approach is elegant: thin harness, fat skills. Instead of asking one model to do everything poorly, G Stack creates specialized skills that act like team members with specific expertise. Each skill handles one job exceptionally well. The framework manages orchestration, context passing, and quality gates. The model focuses on its lane.
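The article doesn't publish G Stack's internals, but the thin-harness/fat-skills split can be sketched as a small dispatcher: the harness only registers skills and passes context between them, while each skill owns one deep job. Everything below (names, signatures, the toy skills) is an illustrative assumption, not G Stack's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Harness:
    """Thin harness: registers skills and routes context between them."""
    skills: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, name: str, skill: Callable[[dict], dict]) -> None:
        self.skills[name] = skill

    def run(self, name: str, context: dict) -> dict:
        # The harness only orchestrates; the skill does the actual work.
        result = self.skills[name](dict(context))
        # Results merge back into context for whichever skill runs next.
        return {**context, **result}

# "Fat" skills: each encapsulates a single competency. These are stubs.
def office_hours(ctx: dict) -> dict:
    # The real skill interrogates the idea; this stub just records one.
    return {"validated_idea": f"wedge for {ctx['idea']}"}

def review(ctx: dict) -> dict:
    return {"review_score": 8}

harness = Harness()
harness.register("office-hours", office_hours)
harness.register("review", review)

ctx = harness.run("office-hours", {"idea": "1099 aggregation"})
ctx = harness.run("review", ctx)
```

The design choice is the point: adding a skill never touches the harness, and no skill needs to know about any other.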
The signature skill is "Office Hours"—directly modeled after how YC partners work with founders. It's the distilled version of thousands of hours YC's 16 partners have spent perfecting their methodology. But here's what makes it powerful: it's not a generic feature brainstorm. It's a forcing function that reframes the problem before you write a single line of code.
Office Hours: The Forcing Function That Validates Ideas First
Walk through a real example with Tan. He opens Conductor (the G Stack interface) and decides to build a startup idea: a tool that extracts 1099-INT tax documents from Gmail and financial institutions. Simple concept. Most developers would jump straight to building. Office Hours doesn't allow that.
The skill asks six forcing questions. The first one cuts to the heart of everything: "What's the strongest evidence you have that someone actually wants this—not 'is interested,' not 'signed up for a waitlist,' but would be genuinely upset if it disappeared tomorrow?"
The follow-up demands specificity: "Have you personally lost track of a 1099-INT? Has someone you know gotten dinged by the IRS for missing one? Is there a specific person or moment that made you think 'this needs to exist?' RECOMMENDATION: Be as specific as possible. Names, dollar amounts, consequences. 'Tax season is stressful' is not demand. 'I owed the IRS $847 in penalties because Ally Bank sent a 1099-INT I never saw'—that's demand."
Tan confirms he has personal experience. The model probes deeper: How many accounts? Which banks? What was the consequence? When the answer comes back—multiple accounts, manual hunting every tax season, accountants sending annoyed emails—the model identifies something crucial: the pain is real but it's annual friction, not a crisis. That changes the product calculus.
This is where Office Hours separates wheat from chaff. It doesn't accept surface-level answers. It asks why existing solutions don't work. TurboTax has 1099 import. H&R Block does too. Plaid connects to banks. Why do none of these solve the problem? This questioning forces the founder to think bigger. The model recognizes the real opportunity: 1099 aggregation isn't the product; it's the wedge. The actual business might be a CPA marketplace where you earn 10-15% of tax preparation fees instead of charging $2-5 annually for document aggregation. The revenue per user shifts by more than an order of magnitude.
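The reframing is easy to make concrete with back-of-envelope arithmetic. The $2-5 and 10-15% figures come from the conversation above; the $500 tax-preparation fee is my own illustrative assumption.

```python
# Wedge product: charge for document aggregation directly.
aggregation_revenue = 5       # $ per user per year, top of the quoted range

# Marketplace: take a cut of the tax-preparation fee instead.
tax_prep_fee = 500            # $; illustrative assumption, not from the source
take_rate = 0.15              # top of the quoted 10-15% range
marketplace_revenue = tax_prep_fee * take_rate   # $75 per transaction

multiple = marketplace_revenue / aggregation_revenue   # 15x per user
```

Even at the conservative ends of both ranges ($2 vs. 10% of a smaller fee), the marketplace model earns an order of magnitude more per user.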
This conversation—this thinking—is what you get from YC office hours. It's the part of startup building that separates ideas that eventually fail from ideas that work. And G Stack delivers it before you've written any code. Most developers would have already built a document scraper and wondered why nobody paid for it. Instead, the model helped reframe the entire business model.
Three Approaches: How AI Evaluates Solutions
After Office Hours reshapes the idea, the model proposes three completely different approaches:
Approach A: Gmail OAuth + Search is the smallest scope. Search Gmail for tax document notifications, output a checklist of banks. No browser automation. Small effort, small risk. As Tan notes, this sounds "interesting, but doesn't sound big enough for me to actually work on." He could do this himself in a weekend.
Approach B: Full Stack with Browser Automation connects the wedge to the marketplace. Use Gmail OAuth to find 1099s, then employ browser automation to log into banks, download PDFs, and route documents to CPAs. The revenue potential scales because you're enabling transactions. This is the version that excites Tan.
Approach C: CPA-First (Flip the Go-To-Market) takes the unconventional route. Instead of OAuth complexity, what if you simply prompt the user to open Gmail? A specialized browser agent searches their inbox locally, identifies banks from their notification emails, and handles the automation without storing credentials. It's elegant and sidesteps major infrastructure headaches.
What's remarkable is the model doesn't force one answer. It presents options with honest assessments of effort, risk, and potential. It flags that "Approach B sounds appealing, but the hybrid approach could be more effective." The thinking mirrors what an experienced engineering lead does: evaluate trade-offs, not prescribe solutions.
Browser Automation: The Unexpected Power Move
The conversation gravitates toward browser automation, an unconventional solution that becomes the star of the planning session. Here's why: you can't reliably parse tax documents from arbitrary banks. But you can teach an AI agent to navigate a browser, search Gmail, identify document notifications, prompt the user about other accounts they maintain, handle multi-factor authentication, download PDFs, and send summaries to their CPA. All while the user watches on their actual machine—not in the cloud.
This is the second critical insight: trust. Users want to see what the AI is doing. Cloud-hosted automation runs on someone else's computer and feels risky. Running it locally, visible, auditable—that's a different experience entirely. G Stack's browser capability removes major trust friction from AI-assisted workflows.
Adversarial Review: Automated Quality Gates
After the planning phase solidifies the approach, the next skill runs "adversarial review"—a multi-step quality gate that automatically hunts for problems. It doesn't ask the model to build perfect code. It asks the model to build code, then attacks it.
The review identified 16 issues on the first pass: missing error handling for network failures, incomplete privacy disclosures, no two-factor authentication handoff plan, race conditions in parallel bank login attempts, and more. Instead of flagging and stopping, the system attempted auto-fixes. The quality score improved from 6/10 to 8/10. Only minor issues remained.
This is a game-changer for teams that historically spent weeks in QA. The adversarial review skill acts like a senior engineer who has seen every way a feature breaks and knows to test for it. It runs before a human ever touches the code.
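The loop the article describes—find issues, attempt auto-fixes, re-score—can be sketched as follows. The issue list echoes the ones named above, but the scoring heuristic (ten minus one point per open major issue) is invented here for illustration; the real skill's scoring is not documented.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Issue:
    description: str
    severity: str        # "major" or "minor"
    auto_fixable: bool

def score(issues: List[Issue]) -> int:
    """Crude quality score: 10 minus one point per open major issue."""
    open_major = sum(1 for i in issues if i.severity == "major")
    return max(0, 10 - open_major)

def adversarial_review(issues: List[Issue]) -> Tuple[List[Issue], int]:
    """Apply auto-fixes where possible, then re-score what remains."""
    remaining = [i for i in issues if not i.auto_fixable]
    return remaining, score(remaining)

found = [
    Issue("no error handling for network failures", "major", True),
    Issue("incomplete privacy disclosure", "major", True),
    Issue("no 2FA handoff plan", "major", False),
    Issue("race condition in parallel bank logins", "major", False),
    Issue("inconsistent button labels", "minor", False),
]

before = score(found)                         # 6/10 with four open majors
remaining, after = adversarial_review(found)  # 8/10 after two auto-fixes
```

The structure matters more than the numbers: generation and criticism are separate passes, so the critic is free to attack without being anchored to the builder's assumptions.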
Design Shotgun: Multiple Variants in Minutes
Instead of a traditional CEO review (in practice, usually one person's opinion), G Stack offers "design shotgun"—a tool that generates multiple AI design variants simultaneously. For the tax app's main dashboard, three distinct options emerged:
Option A: Command Center presented detailed bank and document status in a technical layout—powerful for engineers, potentially overwhelming for average users managing taxes.
Option B: Card-Based Interface showed progress and missing documents in a user-friendly format. This resonated immediately. Tan rated it highly.
Option C: Complex Multi-panel tried to show too much. Less intuitive.
The team selected Option B as the design direction. This happened in minutes, not weeks of design cycles. The parallel agent approach generates candidates; human judgment selects the winner.
The Sprint Process: Planning, Building, Reviewing, Shipping
G Stack codifies the entire development lifecycle into a sprint process: Think → Plan → Design → Build → Review → QA → Ship. Each phase has a dedicated skill that feeds into the next.
For those who want less hands-on involvement, "autoplan" automatically runs CEO, engineering, design, and developer reviews based on default recommendations. After code is built, the "review" skill performs a staff-level bug check. Then "qa" runs browser-based testing—taking screenshots, performing complex interactions, filling forms, downloading media, running full regression tests, checking JavaScript and CSS issues. Finally, "ship" prepares the pull request for integration.
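The phase sequence reduces to a simple pipeline in which each stage's output feeds the next. Stage names follow the article; the one-line implementations are stand-ins for what are, in G Stack, full skills.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[dict], dict]

def pipeline(stages: List[Tuple[str, Stage]], ctx: dict) -> dict:
    """Run each stage in order, merging its output into shared context."""
    for name, stage in stages:
        ctx = {**ctx, **stage(ctx), "last_stage": name}
    return ctx

# Stand-in stages; each real skill would do substantial work here.
stages = [
    ("think",  lambda c: {"idea": "1099 aggregation"}),
    ("plan",   lambda c: {"approach": "B"}),
    ("design", lambda c: {"design": "card-based"}),
    ("build",  lambda c: {"built": True}),
    ("review", lambda c: {"score": 8}),
    ("qa",     lambda c: {"qa_passed": True}),
    ("ship",   lambda c: {"pr": "ready"}),
]

result = pipeline(stages, {})
```

Because every stage reads and writes one shared context, a human can inspect or veto the state between any two phases—which is exactly where the 80-90% of review time goes.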
The entire workflow is available through 28 different commands within G Stack. It's been adopted broadly, with users reporting they spend 80-90% of their time in planning and review stages, not raw coding. This represents a fundamental shift: quality is built upstream, in planning and review, not downstream in debugging production failures.
Parallel Processing: Managing 10-15 Concurrent Projects
Here's where G Stack transforms from clever tooling to multiplier technology: Tan runs 10-15 parallel Claude Code sessions simultaneously across different projects. Sometimes he runs multiple sessions on the same project. Each worktree represents a new work item. When an idea strikes, a bug report arrives, or he sees frustration about G Stack on X (formerly Twitter), he creates a new worktree and runs office-hours to flesh it out.
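Git's built-in worktrees make the one-worktree-per-task pattern cheap: each checkout gets its own directory and branch, so parallel agent sessions never collide. A minimal helper that builds the command (the slug and branch-naming convention here are my own, not G Stack's):

```python
import re
from typing import List

def worktree_cmd(task: str, base_dir: str = "../worktrees") -> List[str]:
    """Build a `git worktree add` command for a fresh task branch."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return ["git", "worktree", "add", f"{base_dir}/{slug}", "-b", f"task/{slug}"]

cmd = worktree_cmd("Fix 1099 parser bug")
# Inside a repository you would then run: subprocess.run(cmd, check=True)
```

Each worktree shares the repository's object store, so fifteen of them cost little more disk than one clone.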
The bottleneck he initially hit was QA. Planning and design accelerated massively, but he found himself manually performing QA—the least enjoyable part of development. By wrapping Playwright at the CLI level, G Stack can now use a full browser for comprehensive testing. This transformed G Stack into what Tan calls "Level 7 efficiency"—multiple concurrent efforts streamlined through the entire development workflow.
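Driving a real browser for QA can look roughly like this with Playwright's Python sync API. This is a sketch under assumptions: the check and report shapes are invented, and `qa_check_page` requires `playwright` to be installed (plus `playwright install` for the browser binaries); G Stack's actual wrapper is not published in this article.

```python
from typing import List

def qa_check_page(url: str, screenshot_path: str) -> dict:
    """Load a page in a real browser, screenshot it, collect console errors."""
    from playwright.sync_api import sync_playwright  # optional dependency
    errors: List[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("console",
                lambda msg: errors.append(msg.text) if msg.type == "error" else None)
        page.goto(url)
        page.screenshot(path=screenshot_path)
        browser.close()
    return {"url": url, "console_errors": errors}

def qa_report(results: List[dict]) -> dict:
    """Pure aggregation: pass only if no page produced console errors."""
    failing = [r["url"] for r in results if r["console_errors"]]
    return {"passed": not failing, "failing": failing}

# Aggregating two hypothetical page results:
report = qa_report([
    {"url": "https://example.com/dashboard", "console_errors": []},
    {"url": "https://example.com/upload", "console_errors": ["TypeError: x is undefined"]},
])
```

Splitting the browser-driving step from the pure report step keeps the noisy part (network, rendering) isolated from the part an agent or CI gate actually branches on.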
He no longer uses a traditional to-do list. Instead, each idea or bug becomes a worktree. He can process 10, 15, 20, or even 50 pull requests daily, depending on his meeting schedule. This isn't because he's superhuman. It's because the bottleneck shifted from development capacity to review and decision-making capacity. The agents handle coding; Tan handles strategy and approval.
Supply Chain Security: A Growing Concern
One concern shadows this optimistic vision: supply chain attacks. When code is generated at scale, how do you verify it's secure? Tan admits being "quite paranoid" about this. The good news is G Stack provides security tooling built into the framework. Still, this remains an area requiring vigilance as AI-generated code scales.
The Collapse of the Barrier to Building
Tan's broader thesis is deceptively simple: the barrier to building software has collapsed. Building Posterous originally required two years, a co-founder, and ten engineers. Today, one person with G Stack and Claude Code can achieve similar scope in months. The leverage is staggering.
This isn't because the technology is "just a few percentage points better." It's because scaffolding, workflows, and team dynamics matter more than raw intelligence. When you remove the friction of context switching, force ideas through validation gates, run automated quality checks, and enable parallel work, you multiply human effectiveness exponentially.
The question Tan poses is therefore existential: "The only question left is what are you going to build?" The technical barrier is gone. The opportunity is unlimited. The time to act is now.
Accessing G Stack: Your Path Forward
G Stack is available as an open-source repository at github.com/garrytan/gstack. The first command to run is '/office-hours'—you're literally getting a version of the product thinking YC does with founders, including similar pushback and reframing, all before you ever apply to YC or pitch to investors.
The thinking behind G Stack reflects a profound truth: great teams don't work because one person is brilliant. They work because process, roles, and review force better thinking. AI-assisted development scales this principle. The model becomes the junior developer. The skills become the senior engineers. The human becomes the decision-maker and strategist.
Conclusion
We are witnessing a fundamental shift in how software gets built. Garry Tan's G Stack demonstrates that the limiting factor in AI-assisted development isn't model intelligence—it's scaffolding, workflow, and process. By implementing team dynamics (Office Hours, adversarial review, design shotgun, QA automation) through modular skills, G Stack transforms Claude Code from a code-completion tool into a legitimate AI engineering team capable of handling complex, production-grade projects.
The era of manual coding isn't over—it's transformed. The developers who thrive will be those who architect workflows, validate ideas rigorously, and make strategic decisions while agents handle execution. For anyone building software today, exploring G Stack isn't optional. It's how the next generation of products will be built. Visit github.com/garrytan/gstack and start your first office-hours session. The barrier to building has collapsed. The question now is: what will you create?
Original source: Inside Garry Tan's Claude Code Setup