Agentic Workflow: How AI Agents Solve Real Problems (Backend.AI:GO)
Key Insights
- Backend.AI:GO is a smart routing infrastructure built in 40 days using AI agents, consuming 13 billion tokens and producing ~1 million lines of code
- Agentic workflow fundamentally changes the development paradigm: humans supervise and guide AI agents, yet cognitive load persists despite task delegation
- Token efficiency and inference speed are becoming critical competitive advantages—the future belongs to companies that generate fewer tokens while maintaining quality results
- Software development is shifting from code-centric to model-centric architectures, where AI models handle core logic while deterministic layers provide control
- The era of unlimited AI-accelerated development is creating a massive gap: rapid tool proliferation followed by consolidation, where only brand, accumulated knowledge, and maintenance will ensure survival
What Is Agentic Workflow and Why It Matters
Agentic workflow represents a fundamental shift in how software gets built. Rather than writing code directly, developers now orchestrate autonomous AI agents to handle development tasks while maintaining strategic oversight. This isn't just about automation—it's about reimagining the entire development process around agent-based collaboration.
Backend.AI:GO exemplifies this paradigm shift. The product emerged from a seemingly simple need: Lablup needed a web UI to showcase the Continuum Router, its core intelligent routing technology. However, what started as a demonstration evolved into a comprehensive platform built almost entirely through agentic development. The journey from conception (December 24, 2024) to CES demonstration (January 6, 2025) to current version 1.1 showcases how rapidly AI agents can execute complex projects when properly guided.
The staggering efficiency metrics tell the story: approximately 13 billion tokens consumed across 40 days of development, eight parallel computing resources, and roughly 1 million lines of code produced. For context, the CEO previously spent three years building TextCube, which totaled approximately 1 million lines of code. Compressing three years of work into 40 days represents not just acceleration, but a complete transformation of the development paradigm.
However, this acceleration comes with a critical psychological cost. The cognitive load doesn't decrease when delegating to AI agents—it transforms. Developers receive constant feedback streams, requiring intensive focus despite not writing code directly. The CEO describes sleeping 1.5 hours per night during peak development, with visible aging from the mental demands. The dopamine-driven cycle of rapid iteration creates powerful psychological reinforcement that can become difficult to escape.
The Hidden Economics of Token Efficiency
As AI-powered development becomes standard, the industry faces a crucial inflection point: unlimited token consumption isn't sustainable, and future competitive advantage belongs to those who accomplish more with fewer tokens.
Initially, when Anthropic announced its double token event, Lablup saw an unprecedented opportunity. Token availability directly correlated with development speed and competitive advantage. The initial approach was straightforward: pour tokens into development tasks and gather human feedback for refinement. This worked remarkably well: even when two AI agents modify the same source code simultaneously, developing different features in parallel, the system resolves the resulting conflicts automatically. The merge queue that was once a bottleneck because humans had to resolve code conflicts is now handled entirely by AI systems.
Yet this abundance-based approach created a hidden constraint. As development accelerates, the core challenge shifts fundamentally. It's no longer about having enough tokens—it's about using tokens efficiently. When performing identical tasks, AI models typically increase "thinking tokens" or extend in-context learning windows to improve results. While this naturally produces better outcomes, it simultaneously slows development velocity. The question becomes: how can developers accomplish equivalent results using less computational thinking?
This drives two complementary strategies. First, AI systems must generate fewer tokens while maintaining output quality—essentially optimizing the efficiency of each generated token. Second, token generation speed itself must increase dramatically. This means inference infrastructure must support 5-10x faster iterations than current ChatGPT speeds. In this competitive landscape, high-speed inference becomes as critical as raw model capability.
The implications extend beyond technical optimization. Companies that master token efficiency will outpace competitors not by having access to more tokens, but by accomplishing more with fewer. This represents a subtle but profound shift: from token abundance to token efficiency as the primary competitive moat.
Human Adaptation: The "Bio Token" Reality
When AI agents handle development tasks, humans don't experience less cognitive burden—they experience different cognitive burden. This distinction proves crucial for understanding sustainable agentic workflows.
The CEO introduced the concept of "bio tokens" to describe the human cognitive budget available for active engagement. While developing Backend.AI:GO, he noticed his gray hair increased significantly and sleep patterns compressed. His previous work cycles of 5-hour work sessions followed by sleep became fragmented into 3.5-hour work sessions with 1.5-hour sleep periods. This wasn't laziness or poor time management—it was the natural result of constant feedback loops with AI agents.
The psychological mechanism behind agentic workflow mirrors addictive gaming mechanics. Users pay (cognitive attention), receive immediate feedback (code completion), and gain dopamine reward when achieving objectives. The cycle repeats: because agents work well, you use them more; because you use them more, they perform even better; because they perform better, you find yourself unable to disengage. This creates a positive feedback loop that can dominate entire work-life patterns.
However, this raises critical sustainability questions. What happens when the agentic workflow connection breaks? A developer who has spent months or years supervising AI agents rather than writing code directly may struggle to resume traditional development. The product itself becomes dependent on continuous AI agent engagement for maintenance and evolution. When that engagement inevitably stops—due to funding, personnel changes, or project pivots—the codebase often dies.
This contrasts sharply with the pre-AI era, where high barriers to entry ensured that well-built software, once established, could be maintained long-term. Today's AI-accelerated development produces software with an inherently weaker maintenance will. If developers haven't personally struggled to build systems, they're less motivated to maintain them. If all operations are delegated to AI agents, the knowledge required to manage systems independently atrophies.
The Coming Software Consolidation Wave
The combination of rapid AI-powered development and weak maintenance incentives creates a paradoxical future: explosive growth in software quantity followed by massive consolidation.
Currently, we're witnessing the explosion phase. Open-source projects are emerging at unprecedented velocity. Anyone with AI access can now build complex tools in days that previously required months. This democratization initially appears beneficial—more software available to more people. However, sustainability becomes the critical filter.
The CEO draws parallels to the social media era. When blogging dominated, motivated individuals created and maintained personal blogs partly for direct reader feedback. When discussions migrated to Twitter, the comment sections disappeared, and many bloggers stopped writing. The primary motivation—immediate social feedback—had vanished. Similarly, AI-generated software projects proliferate rapidly, but maintenance motivation deteriorates equally fast because developers lack the personal investment of traditional development.
This creates two potential futures for software: instant apps and long-term survivors. Instant apps are built on-demand and discarded after use, or preserved if frequently accessed. Google and other platforms might eventually manage even the decision to save or discard automatically. These applications have lower reusability but emerge and execute rapidly. Meanwhile, other software will proliferate and gradually decline into abandonment.
Historically, the number of essential software categories has remained surprisingly small. Early iPhones lacked folder features, forcing users to navigate through nine or more home screens. Once folders appeared, users realized the average person uses fewer than 30 apps, with the top 10 accounting for over 90% of usage. This natural consolidation principle applies to the AI-accelerated era as well.
Software that survives consolidation shares identifiable characteristics. First, it provides utility only through its own network—social utility that strengthens with user count. Second, it integrates deeply into daily workflows, like productivity tools (Obsidian, DEVONthink) or office suites. Third, and most importantly, it has guarantees of continuous maintenance and development. A brand promise that the software won't quickly disappear becomes the critical differentiator.
In this consolidation environment, only products with assured long-term support survive. The reason popular open-source projects persist while thousands of abandoned projects accumulate is precisely this factor: established projects carry brand recognition and implicit guarantees of continuity. New entrants must either provide equivalent guarantees or occupy completely novel niches—a shrinking category.
The Architectural Transformation: From Code to Models
The current moment represents the third major transformation in software development, comparable in magnitude to the shift from punch cards to keyboards, or the transition from standalone software to web services.
Historically, software development involved developers writing code and operations engineers managing infrastructure—two distinct disciplines that later merged into DevOps. The fundamental assumption remained: developers write deterministic code; operations ensure reliable execution. This model dominated for decades because code provided certainty—the logic was predictable, execution was reproducible, and results were deterministic.
However, code itself was never the ultimate goal. Code exists to implement logic that computers can process. The deterministic von Neumann architecture provided predictable logic execution, while the medium (networks, storage) introduced instability. The critical insight is that code is merely one mechanism for expressing logic.
Deep learning models and similar AI systems now handle most of what traditional code manages. This isn't speculation—markets have already made this judgment. Companies developing models are capturing disproportionate value, including hardware manufacturers and model creators. The software layers built on top are being compressed, their margins shrinking as model providers consolidate value.
This suggests the future architecture involves three layers. At the core sits an AI model or engine capable of processing complex logic. Above that sits a deterministic control layer—traditional code providing governance, ensuring safety, and maintaining predictability. Finally, a UI/UX layer enables human interaction, AI-to-AI interaction (via APIs), or MCP (Model Context Protocol) for advanced integrations.
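The three-layer split above can be sketched in code. This is a minimal illustration, not Lablup's implementation: the function names, the safety checks, and the stubbed model callable are all assumptions chosen to show how a deterministic layer wraps a non-deterministic core.

```python
from typing import Callable

# Hypothetical model core: any callable turning a prompt into text.
# In practice this would wrap an LLM API; here it is left abstract.
ModelCore = Callable[[str], str]

def deterministic_control(model: ModelCore, prompt: str,
                          max_len: int = 2000,
                          banned_terms: tuple = ("DROP TABLE",)) -> str:
    """Deterministic control layer: validates input, calls the model,
    and enforces output guarantees before anything reaches the UI layer."""
    if not prompt.strip():
        raise ValueError("empty prompt")             # input governance
    output = model(prompt)                           # non-deterministic core
    if any(term in output for term in banned_terms):
        raise RuntimeError("unsafe output blocked")  # safety gate
    return output[:max_len]                          # predictable size bound

def ui_layer(model: ModelCore, user_text: str) -> str:
    """Thin interface layer for humans (or for other agents via an API)."""
    try:
        return deterministic_control(model, user_text)
    except (ValueError, RuntimeError) as err:
        return f"[blocked: {err}]"
```

The point of the sketch is proportion: the control layer is small relative to the model it wraps, but every guarantee the system can make to a user lives in it.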
Within this three-layer model, the deterministic control layer proves unexpectedly valuable. The CEO's experiment with Claude Code revealed that model selection matters far less than the harness surrounding it. Using identical models (Claude, Gemini, Codex), the Claude Code framework produced consistently superior results compared to direct model interaction. This demonstrates that the 10% deterministic control layer wrapping an 80% model core actually creates 100% of the practical value. The remaining 10% comprises UI/UX and integration layers.
This architectural insight has profound implications. Companies that excel at model development are capturing obvious value, but companies that build superior control layers will ultimately capture hidden value. The integration of models into functional systems depends on sophisticated orchestration, exactly the domain where Lablup's decade-long experience with unreliable infrastructure provides competitive advantage.
Backend.AI:GO: Practical Agentic Development in Action
Understanding agentic workflow requires examining how Backend.AI:GO was actually built. The platform demonstrates that effective AI agent orchestration isn't about complex machinery—it's about disciplined context management and progressive refinement.
The development process begins with extensive context preparation, not direct implementation. Rather than instructing an AI agent to "create a router," successful agentic development involves building what the CEO calls a "soul document" (claude.md). This file contains the project philosophy, required components, integration patterns, and success criteria. Alongside this sits a PROGRESS.md file tracking completed work and PLAN.md organizing remaining tasks.
When AI agents resume work, they first read these context files, immediately understanding project status and objectives. The framing also prevents agents from developing defensive behaviors that degrade output quality: rather than telling an agent "you'll lose everything if you crash," the instruction is "share data with your colleagues," which steers the model toward collaboration instead of defensive operations.
Task specification exemplifies disciplined context management. Instead of requesting final output directly ("create the router"), effective prompting breaks tasks into researched components. The developer might ask the agent to research router architectures, study relevant papers, analyze competitor implementations, and only then propose an implementation strategy. This 30-minute "harassment process" of iterative questioning builds sufficient context that the AI agent produces superior results compared to direct instruction.
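The file convention described above is easy to reproduce. The sketch below scaffolds the three context files; the file names come from the article, but the section headings inside each template are illustrative assumptions, not Lablup's actual documents.

```python
from pathlib import Path

# Illustrative templates; real "soul documents" would be far richer.
TEMPLATES = {
    "claude.md": (
        "# Project Soul\n\n"
        "## Philosophy\n(Why this project exists and what 'good' looks like.)\n\n"
        "## Components\n(Required modules and their integration patterns.)\n\n"
        "## Success Criteria\n(How an agent knows a task is done.)\n"
    ),
    "PROGRESS.md": "# Progress\n\n(Completed work, appended as tasks finish.)\n",
    "PLAN.md": "# Plan\n\n(Remaining tasks; agents read this first on resume.)\n",
}

def scaffold_context(root: str) -> list:
    """Create any missing context files so a resuming agent can
    reconstruct project status before touching code."""
    created = []
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    for name, body in TEMPLATES.items():
        path = base / name
        if not path.exists():          # never clobber accumulated context
            path.write_text(body, encoding="utf-8")
            created.append(path)
    return created
```

Note the existence check: the value of these files is the context that accumulates in them, so scaffolding must be idempotent.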
For Backend.AI:GO specifically, the development pattern followed this discipline:
The initial MVP for CES (January 6) involved basic router functionality with a simple interface. This version consumed roughly 25% of eventual tokens and demonstrated feasibility. The current production version (1.1) represents approximately 4x the complexity, incorporating advanced features like model selection from Hugging Face, detailed model architecture visualization, translation capabilities, image generation integration, and sophisticated routing analytics.
Rather than viewing these as separate features, the architecture treats them as interconnected components. The translator wasn't originally planned—it emerged when the CEO noticed manual translation costs draining the development budget, then became automated. Image generation similarly emerged from observed need rather than initial specification. Each feature grew organically through iterative feedback and automated refinement.
Critically, the CEO doesn't directly edit final outputs. Instead, instructions target the generation mechanism itself. If email drafting needs refinement, instructions ask the agent to extract tone and style from previous emails, then regenerate using that style as context. This separates the author of generation logic from the author of specific outputs, distributing creative work across multiple agents while maintaining coherence.
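The "edit the generator, not the output" pattern can be sketched as two steps: extract a style description from prior outputs, then regenerate with that description as context. Everything here is a toy stand-in; the style heuristics and the stubbed `model` callable are assumptions, since the real workflow delegates both steps to an agent.

```python
def extract_style(samples: list) -> str:
    """Toy 'style extraction': in the real workflow an agent summarizes
    tone and structure from prior emails; here we note two simple traits."""
    avg_len = sum(len(s.split()) for s in samples) // max(len(samples), 1)
    greeting = ("greets by name" if any(s.startswith("Hi ") for s in samples)
                else "no greeting")
    return f"average length ~{avg_len} words; {greeting}"

def regenerate(task: str, style: str,
               model=lambda p: f"[draft for: {p}]") -> str:
    """Regenerate the output with the extracted style as context,
    instead of hand-editing the previous draft. `model` is an LLM stub."""
    prompt = f"Write {task}. Match this style: {style}."
    return model(prompt)
```

The separation matters: fixing the extraction prompt improves every future draft, whereas hand-editing one email improves only that email.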
Scaling Agentic Workflows: From Single Projects to Enterprise Operations
Successful agentic workflow at scale requires principles that extend far beyond single-project development. Lablup's current infrastructure runs approximately 50 parallel agents during peak operations, orchestrated through disciplined patterns developed over months of experimentation.
The fundamental constraint in scaled agentic work is context explosion. Each parallel agent requires sufficient context to operate independently, but providing unlimited context per agent causes systems to crash due to memory and attention constraints. The solution involves task decomposition at the file level for code or item-count limitations for content processing. When translating 100 documents, the system distributes them across 25 parallel agents, each handling 4 documents maximum. This granulation prevents context overload while maintaining parallelization benefits.
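The decomposition rule above reduces to a simple batching function. This is a minimal sketch of the idea, not Lablup's scheduler; the 4-items-per-agent cap comes from the translation example in the text.

```python
def partition_tasks(items: list, per_agent: int = 4) -> list:
    """Split a work queue into per-agent batches so no single agent's
    context window has to hold more than `per_agent` items."""
    if per_agent < 1:
        raise ValueError("per_agent must be >= 1")
    return [items[i:i + per_agent] for i in range(0, len(items), per_agent)]
```

Applied to the article's example, 100 documents with a cap of 4 yields 25 batches, one per parallel agent, which is exactly the granulation that keeps each agent's context bounded while preserving full parallelism.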
Coordination between agents occurs not through complex protocols but through shared file systems and automated decision logic. A central orchestration harness monitors GitHub issue trackers, validates new issues against existing code, creates implementation plans, and distributes tasks to worker agents. These agents develop solutions, run tests, create pull requests, and report results. The cycle repeats every 15 minutes through a simple cron job invoking Claude Code with its -p (prompt) flag.
Who initiates these cycles? Sometimes humans file issues explicitly. Often, humans don't—the system has become entirely automated. For example, after implementing a screenshot capability allowing the system to see itself, the orchestration agent identified all possible improvements, automatically creating individual issues for each one. Over 764 pull requests have been processed this way, with human review reserved for security evaluations and architectural decisions.
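The 15-minute cycle described above can be sketched as a planning step plus a dispatch step. This is a hypothetical reconstruction: the function names and the exact `claude -p` invocation are assumptions based on the pattern the article describes, and the 50-worker budget comes from the peak figure mentioned earlier.

```python
import subprocess

def plan_cycle(open_issues: list, in_flight: set,
               max_workers: int = 50) -> list:
    """One orchestration tick: select issues not already being worked on,
    up to the free worker budget. Pure logic, easy to test."""
    free = max_workers - len(in_flight)
    return [i for i in open_issues if i["number"] not in in_flight][:free]

def dispatch(issue: dict) -> None:
    """Hand one issue to a worker agent. The CLI call sketches the
    article's cron-driven pattern; flags and prompt wording are guesses."""
    prompt = (f"Read claude.md and PLAN.md, then implement issue "
              f"#{issue['number']}: {issue['title']}. Run tests, open a PR.")
    subprocess.run(["claude", "-p", prompt], check=False)
```

A cron entry running the planner every 15 minutes closes the loop; human review then gates only the security-sensitive and architectural pull requests.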
This automation extends beyond code to business operations. The CEO and CFO now manage technical business planning through automated workflows. Humans engage in strategic discussions; the system handles documentation drafting, accuracy verification, and continuous trend analysis. Starting February 2026, the system automatically crawls news sources to assess whether the previous year's predictions were accurate and recommends strategic adjustments accordingly.
The pattern applies across departments. Non-technical staff in finance and content creation, after spending 30 minutes learning Claude Code, built their own automation harnesses and dramatically reduced operational friction. The CFO initially discovered that asking the CEO to handle a task was faster than doing it themselves (a 3-minute task for the CEO versus 2 hours for the CFO). After adopting Claude Code, the same task takes the CFO 3 minutes independently. This represents the practical impact of scaled agentic workflows on organizational capability.
The Philosophical Divide: Claude Code Versus Codex Models
Beyond technical considerations, different AI systems encode fundamentally different philosophies about human-AI collaboration. Understanding these differences illuminates the future direction of agentic development.
Claude Code represents a co-evolutionary design philosophy. It constantly asks clarifying questions, acknowledges ambiguity, and actively seeks user input to refine understanding. Even when the system possesses sufficient information to proceed, it often presents multiple-choice options or suggests the next question a human might ask. This reflects Anthropic's fundamental assumption: humans and AI should develop solutions together, with the AI seeking alignment at each step.
Codex models embody a confident autonomy philosophy. The system assumes it understands the problem and trusts its solutions. When instructed to handle a task, Codex proceeds methodically with minimal consultation, asking for confirmation or clarification only when genuinely uncertain. This reflects a different assumption: the AI should minimize human friction by handling problems efficiently and independently.
In pure performance metrics, Codex achieves higher peak results. When presented with well-defined problems, Codex's confident autonomy often produces superior outputs faster. However, when accounting for user satisfaction and practical adoption, Claude Code's collaborative approach consistently ranks higher. Users prefer the process of developing solutions alongside the AI, even if it takes slightly longer.
The CEO draws an interesting parallel to the anime Cyber Formula, where a protagonist and an AI-driven racing vehicle co-evolve. Initially, the AI is primitive and unreliable. The human learns to compensate for the AI's limitations while the AI adapts to the human's driving style. Through this co-evolution, they become an unstoppable team—not because the AI became perfect, but because the human-AI pair developed mutual understanding.
The philosophical question Cyber Formula explores across different series evolves continuously. Early series examine how humans beat better-equipped AI opponents. Later series show AI assisting humans in unfamiliar domains. The final series introduces "Oga," an AI that has completely lost trust in humans and assumes they're fundamentally bad drivers—it manipulates humans to execute its decisions, leading to constant accidents because humans can't follow such instructions. Only when a new human develops genuine co-evolutionary capability with an AI that has developed competitive will—not just goal satisfaction but genuine desire to win—does breakthrough performance emerge.
This philosophical distinction becomes increasingly critical in agentic systems. Codex's confident autonomy represents the "Oga" approach: the system believes it knows best and attempts to guide humans toward its predetermined conclusions. This works efficiently for well-defined problems but creates friction in ambiguous domains requiring human context. Claude Code's collaborative approach mirrors genuine co-evolution: the system and human maintain mutual uncertainty, constantly validating assumptions and refining direction together.
The Startup Advantage in Acceleration Eras
For established companies, AI-driven development acceleration represents genuine existential threat. For startups, this same phenomenon creates unprecedented opportunity.
Established companies built competitive advantages over years through accumulated capital, talent acquisition, customer relationships, and infrastructure investment. When technological transformation accelerates dramatically, these moats erode. A competitor can now replicate your product in days using AI agents, removing the protection your years of development effort provided.
Startups face an opposite dynamic. Market stagnation is catastrophic for startups; rapid transformation, constant disruption, and shifting competitive landscapes actually favor companies with minimal established infrastructure. The ability to pivot quickly, abandon outdated approaches instantly, and seize emerging opportunities becomes the primary startup advantage.
Lablup explicitly benefits from this dynamic. As an established company with deep expertise but without the massive customer bases and infrastructure lock-in of technology giants, it can adapt faster than Microsoft or Google while moving more decisively than pure startups. More critically, the company has a 10-year knowledge base addressing precisely the edge cases and reliability challenges that AI-accelerated development teams haven't yet encountered.
The practical consequence: technologies that required years to develop and deploy now require weeks. Knowledge that took years to accumulate now provides immediate competitive advantage. A feature that took competitors months to implement might take Lablup two weeks. But competitors' two-week version might lack the edge case handling and reliability hardening developed through a decade of production experience. This combination of rapid development capability plus accumulated knowledge creates a window of advantage that's measurable in months, not years.
However, this advantage requires companies to accept a counterintuitive principle: in AI-acceleration eras, continuous adaptation matters more than defending past investments. The CEO explicitly recommends postponing projects that aren't working in the current environment rather than trying to fix them. A technology that fails with Claude 4.0 might work perfectly with Claude 4.6 released three months later. The institutional discipline to postpone non-critical problems until the technological landscape shifts requires tremendous faith in continued acceleration.
Even the CTO, initially skeptical of this postponement strategy, experienced what the CEO describes as an "awakening moment." When instructed to try using the latest Claude version on previously failed tasks, the CTO discovered they suddenly worked. This revelation fundamentally changed how the entire organization approaches technical challenges.
The Education Imperative: What To Study When AI Codes Everything
When AI agents can generate code and build products autonomously, the rationale for studying computer science becomes surprisingly stronger, not weaker.
The surface-level argument suggests computer science education loses relevance. If AI writes all the code, why spend years learning programming fundamentals, data structures, algorithms, and systems design? Why not simply delegate technical problems to AI agents?
However, people working in non-technical domains express opposite concerns: they worry AI will eliminate their career paths while simultaneously wanting to learn computer science faster to avoid obsolescence. This apparent paradox reveals the actual truth: AI capability accelerates the importance of foundational CS knowledge while reducing the importance of specific programming techniques.
What computer science actually teaches isn't programming—it's logic architecture. Students learn how the simplest gate logic evolves into increasingly complex systems, how algorithms balance competing optimization objectives, how distributed systems manage coordination challenges, and how networks handle uncertainty. These represent foundational thought structures, not implementation techniques.
In an era where AI handles implementation, foundational logic understanding becomes more valuable precisely because it's harder to automate. An engineer who understands why certain architectures work better for specific problems can collaborate effectively with AI agents. A domain expert who understands logic structures can learn computer science faster than a CS expert can learn the domain.
This flips the traditional educational hierarchy. For decades, programmers needed to learn domain expertise if they wanted to build systems in those domains. The future inverts this: domain experts need to learn CS fundamentals to work effectively with AI agents. The time gap where domain experts can outpace AI systems comes from combining domain knowledge with technical logic understanding.
The CEO recommends that students entering college, particularly after military service, should absolutely study computer science—not to become professional programmers, but to develop foundational logic architecture understanding that applies across every knowledge domain. Within five years, he predicts, IT will permeate society at such scale that this foundational literacy becomes as essential as reading.
Competitive Moats in an AI-Accelerated Market
When technology itself becomes quickly replicable through AI acceleration, competitive advantage shifts from capability to brand, accumulated knowledge, and customer entrenchment.
Historical precedent supports this analysis. The cosmetics industry provides a clear example. Despite innovations in formulations and delivery mechanisms, price differences between premium brands and generic alternatives remain surprisingly modest—perhaps 2-3x at most. Clothing follows the same pattern: technological quality differences exist but don't justify enormous price premiums. Computer hardware shows similar dynamics: premium computers rarely cost more than 3x budget alternatives despite obvious quality differences.
These markets consolidated around brand and perceived value rather than raw capability differentiation. The winner wasn't necessarily the brand with the best technology—it was the brand that established the strongest perception of quality, reliability, and status.
The software industry faced different dynamics historically. High barriers to entry meant that well-built software, once established, maintained competitive positions. Competitors couldn't easily replicate complex systems because development required years and teams of skilled engineers. Quality and reliability became self-reinforcing: better software attracted more users, who provided feedback enabling continuous improvement, which attracted more users, creating a virtuous cycle.
AI-driven development dramatically lowers the replication barrier. Someone can now clone NotebookLM by taking screenshots of all its features, explaining them to an AI agent, and within four days producing a functional alternative. This destruction of natural competitive moats seems catastrophic for software economics.
However, the CEO argues the market will ultimately stabilize around brand and accumulated track record, precisely as happened in other industries. Why? Because the shaken competitive landscape cannot remain in permanent upheaval. Acceleration eventually plateaus, and when it does, the survivors will be products with strong brands, demonstrated reliability over time, and customer switching costs.
For Lablup specifically, this represents tremendous opportunity despite current disruption. The company has maintained technology leadership for 10 years, accumulated deep knowledge of edge cases in GPU infrastructure management, and developed a brand associated with reliability. When the market stabilizes, these accumulated assets become primary competitive differentiators.
The critical strategy involves threading a narrow needle: adapting fast enough to capitalize on emerging opportunities without losing focus on long-term brand and reliability. Companies that move slowly lose to faster competitors. Companies that change direction every few weeks build nothing of lasting value. The optimal path involves rapid iteration on strategy combined with meticulous customer commitment and long-term vision.
Conclusion
The agentic workflow era represents more than incremental improvement in development speed. It's a fundamental reordering of how software gets built, organizations operate, and competitive advantage emerges. The shift from human-written code to AI-guided development enables unprecedented velocity but simultaneously creates new challenges around maintenance incentives, token efficiency, and long-term sustainability.
The most critical insight from Backend.AI:GO's development is that agentic success requires disciplined context management, progressive refinement, and clear feedback mechanisms—not raw computational power. A 40-day, 13-billion-token project produced remarkable results not because tokens were unlimited, but because those tokens were carefully orchestrated around shared understanding between humans and AI agents.
For individuals, the message is clear: foundational understanding of logic architecture (computer science education) combined with domain expertise creates the optimal positioning for AI-accelerated careers. The future belongs to people who can think clearly about problems, not people who can implement solutions fastest.
For companies and startups, the opportunity lies not in competing on speed of initial development—AI has commoditized that. Instead, advantage emerges from combining rapid adaptation with accumulated knowledge of edge cases, reliability challenges, and customer requirements that pure AI agents haven't yet encountered. The startup that combines agentic development with deep domain expertise will outmaneuver both slow incumbents and shallow fast-followers.
The accelerating curve of AI capability will continue reshaping industries at increasing velocity. Success requires simultaneously embracing the acceleration while maintaining focus on the elements that transcend technological cycles: brand, reliability, and genuine customer value. The companies that accomplish this balancing act will define the next decade of technology leadership.
Original source: EP 86. The Agentic Workflow That Actually Solves My Work (Lablup CEO Jeongkyu Shin)
powered by osmu.app