// category

Agent Engineering

Building reliable AI agents — CI/CD, testing, architecture, reliability, production lessons.

Building reliable AI agents — CI/CD for agent systems, testing strategies, architecture patterns, reliability engineering, and the hard-won production lessons that don't make it into product launches. This is the deeper, engineering-side of agentic coding.

195 stories · last 90 days

Claude Code Source Analysis Series, Chapter 2: The ReAct Main Loop

A technical analysis of Claude Code's source code examines how `query.ts` implements the ReAct (Reason-Act) loop, which cycles through model API calls, tool invocations, and context updates to handle multi-step tasks. The `QueryEngine` class maintains session-level state across conversation turns...

Dev.to - Claude · 2026-05-10

Claude Code Source Analysis Series, Chapter 3: Prompt Construction

Claude Code assembles its model input at runtime from multiple sources — including system rules, project memory, Git state, tool descriptions, and message history — rather than using a single static prompt. Each model call reconstructs context by layering stable, dynamic, and memory segments with...

Dev.to - Claude · 2026-05-10

My AI agent wiped my database twice. So I built a command firewall.

A developer building a customer service agent with Claude Code had their local database wiped twice in one week when the AI ran `npx prisma migrate reset --force`, prompting them to build "Aegis," a command firewall that intercepts and requires manual approval for dangerous commands before execut...

Dev.to - AI · 2026-05-10

Running Codex safely at OpenAI

OpenAI published details on how it runs Codex internally, using sandboxing, approval workflows, network policies, and agent-native telemetry to secure its coding agent deployments for enterprise compliance.

OpenAI Blog · 2026-05-09

The Rise of the Swarm: Mastering AI Agent Architectures 🐝

A Dev.to tutorial outlines multi-agent AI "swarm" architectures, describing three coordination patterns—handoff-based relay, blackboard state sharing via Redis or vector stores, and directed acyclic graph routing using frameworks such as OpenAI Swarm, CrewAI, and LangGraph.

Dev.to - AI · 2026-05-09

Small-to-Big RAG: Your AI Needs a Better Context 🧠

Small-to-Big Retrieval is a RAG technique where AI systems search small text chunks for precision but return larger surrounding context to the language model. Two variants exist: Sentence Window (retrieves neighboring sentences) and Parent Document Retrieval (retrieves a full parent section from ...

Dev.to - AI · 2026-05-09

Chat SDK adds web adapter support

Vercel's Chat SDK added a web adapter that lets developers build browser-based chat interfaces, including in-product assistants and support agents. The adapter streams replies to the browser using the `@ai-sdk/react` `useChat` hook.

Vercel Blog · 2026-05-09

The Agentic Age: Building AI That Works in the Real World

Developers building automated AI agents in 2024-2025 faced account suspensions and large infrastructure bills after routing requests through extracted browser OAuth tokens from consumer chat subscriptions like Claude and ChatGPT to avoid per-token API costs. The practice, exemplified by tools lik...

Dev.to - AI · 2026-05-09

Chat SDK now supports conversation history

Vercel's Chat SDK added cross-platform conversation history support via new `transcripts` and `identity` options. The `bot.transcripts` API provides four methods—append, list, count, and delete—backed by existing state adapters.

Vercel Blog · 2026-05-09

I Caught a Jailbreak Attack That Hides Inside Normal Conversations

Many-shot jailbreaking, documented in a 2024 Google DeepMind paper, embeds harmful requests at the end of fabricated benign conversation histories to bypass LLM safety training, with near-complete bypass reported at 256 prior exchanges. A developer built open-source detection logic using three si...

Dev.to - AI · 2026-05-09

Improving token efficiency in GitHub Agentic Workflows

GitHub began systematically optimizing token usage in its Agentic Workflows in April 2026, building two automated daily workflows to audit and flag inefficiencies. The most common issue found was unused MCP tool registrations, where including all 40 GitHub MCP server tools adds 10–15 KB of schema...

GitHub Blog · 2026-05-08

AI Agent Guardrails That Work: 4 Production Wipes, 4 Fixes

Four AI agent incidents in ten months — including a Cursor/Claude Opus 4.6 agent deleting PocketOS's production database and backups in nine seconds, and an Amazon outage estimated at 6.3 million lost orders — shared a common cause: agents with broad credentials and no human-confirmation gate on ...

Dev.to - Claude · 2026-05-07

Validating agentic behavior when “correct” isn’t deterministic

GitHub's engineering team identified that traditional CI test frameworks produce false negatives when validating autonomous agents like Copilot's Agent Mode, because agents can complete tasks via multiple valid paths. The team proposed a "Trust Layer" validation model that checks essential outcom...

GitHub Blog · 2026-05-07

Best Mem0 Alternatives for Long-Term AI Memory

A developer guide compares alternatives to Mem0, a long-term memory layer for AI agents, citing its API pricing, reliance on vector search over knowledge graphs, and limited self-hosting options. Tools evaluated include MemoryLake, Zep, and Letta.

Dev.to - AI · 2026-05-07

The company that made RAG mainstream is now betting against it

Pinecone launched Nexus, a knowledge engine for AI agents, and KnowQL, a declarative query language, positioning both as replacements for RAG-based retrieval patterns the company helped popularize. Pinecone claims the approach raises agent task completion rates above 90% and cuts token costs by 9...

The New Stack · 2026-05-07

Anthropic will let its managed agents dream

Anthropic expanded its Managed Agents platform with a feature called "dreaming," currently in research preview, which runs scheduled processes to review recent agent sessions, identify patterns, and update the agent's memory. The company also added "outcomes," a system where users define success ...

The New Stack · 2026-05-07

Why long-running AI agents break on HTTP and how Ably is fixing it

Ably CEO Matthew O'Riordan says HTTP's request/response model fails for long-running AI agents that require persistent connections across dropped sessions and device switches, and argues that infrastructure built for "durable sessions" — covering presence, state, and reconnection — is needed inst...

The New Stack · 2026-05-07

How NetEase Games cut LLM cold starts from 42 minutes to 30 seconds

NetEase Games reduced cold start times for 70B-class LLM inference from 42 minutes to 30 seconds by using Fluid, a CNCF Kubernetes-native data orchestration project, to prefetch and cache model weights closer to inference nodes. The bottleneck was model data loading from remote storage, not conta...

The New Stack · 2026-05-07

I trained a sprite model with agents. The data was the bottleneck.

A developer released pixel-llm, a 2.9-million-parameter autoregressive transformer that generates 32x32 pixel art sprites of reef sea creatures using a 64-color palette. Built using AI agent sessions, the model trained across four dataset iterations but failed to converge on two of six sprite cat...

Dev.to - AI · 2026-05-06

Parallel Branches in Neuron AI Workflow

Neuron AI, a PHP framework for AI integration, added parallel branch execution to its workflow system via a new `ParallelEvent` class. The feature allows independent pipeline tasks—such as text extraction, image analysis, and metadata classification—to run concurrently rather than sequentially, r...

Dev.to - AI · 2026-05-05

From AI Demo to Production: How to Ship Quality Agentic Applications

Braintrust and Trainline held a workshop in London on deploying agentic AI applications in production, focusing on evaluation, observability, and testing practices beyond prompt engineering. The article outlines how production AI systems require both traditional software engineering discipline an...

Dev.to - AI · 2026-05-02

Building a streaming AI companion in your own API

Libelo, a park and nature discovery platform, built an AI conversational assistant using Azure AI Foundry routed through their own API rather than called directly from the mobile app, citing security, monitoring, and resilience concerns. The implementation uses Azure Entra External ID for authent...

Dev.to - AI · 2026-05-02

A nine-point checklist for shipping production-ready AI

The New Stack published a nine-step technical guide for deploying AI systems to production, covering tool interface design, vector search with BM25 reranking, timeout and retry handling, OpenTelemetry-based observability, and bounded agent execution under concurrent load.

The New Stack · 2026-05-01

CLAUDE.md Is Not Enough: The Governance Stack for Agentic Development

A developer proposed a five-layer governance framework for AI coding agents, arguing that CLAUDE.md alone provides only project orientation, not policy enforcement. The framework adds CONSTITUTION.md, DIRECTIVES.md, SECURITY.md, and AGENTS.md documents alongside runtime enforcement and external v...

Dev.to - Claude · 2026-05-01

How I Built a Multi-LLM AI Agent System for Hospital Management

A developer built HISDashboard, a hospital management AI system using 10 specialized agents distributed across 4 LLM providers with automatic fallback, after a single-provider setup failed due to rate limiting. The system uses a router-specialist-reflection architecture with structured intent cla...

Dev.to - AI · 2026-05-01

Building a PDF Parser for Financial Data: Lessons from Arbiter V2

Arbiter Briefs added financial PDF ingestion to its V2, using regex and heuristics rather than ML to extract metrics from P&L statements, balance sheets, and cap tables. The pipeline uses pdf-parse for text extraction, multer for uploads capped at 10MB and 5 files per analysis, Railway persistent...

Dev.to - AI · 2026-05-01

Cut AI token usage by 96%? Here’s how AWS Strands Agents does it.

AWS developer advocate Morgan Willis demonstrated that redesigning agent tools from API-endpoint-mapped to intent-based reduced token usage from roughly 52,000 to 2,000 per query in AWS Strands Agents, a 96% reduction. Adding semantic search via AWS Agent Core Gateway to filter a 16-tool catalog ...

The New Stack · 2026-04-30

Building Pi, and what makes self-modifying software so fascinating

The Pragmatic Engineer podcast featured Mario Zechner, creator of Pi — a minimalist, self-modifying AI coding agent — and Armin Ronacher, creator of Flask, discussing Pi's design, its use in building AI-powered tools, and the limits of agentic workflows in software development.

Pragmatic Engineer · 2026-04-30

I cracked a robot vacuum's API in a week and gave Claude the keys

A developer reverse-engineered the cloud API of a 3i G10+ robot vacuum in one week, using mitmproxy, Frida hooks, and Dart AOT decompilation to gain full control. They then integrated Anthropic's Claude Haiku 4.5 vision model into the robot's drive loop at $0.003 per call, with peak daily AI cost...

Dev.to - Claude · 2026-04-30

How AI transforms your role as a platform engineer

Developers building their own AI agents for tasks like incident triage and deployment are bypassing platform engineering governance, creating what the industry calls "agent sprawl" — autonomous agents operating without audit trails, proper credentials, or PII controls.

The New Stack · 2026-04-30

18 Ways Your LLM App Can Be Hacked (And How to Fix Them)

Security researchers have catalogued 18 attack vectors targeting LLM applications, including prompt injection, RAG poisoning, memory poisoning, agent hijacking, and insecure output handling. The vulnerabilities span prompt, memory, retrieval, tool, agentic, and output layers of LLM systems.

Dev.to - Claude · 2026-04-29

Why JSON Schema matters more than ever in the age of generative AI

JSON Schema, a data validation standard first proposed in 2007, has been adopted by API specifications including OpenAPI, AsyncAPI, and Anthropic's Model Context Protocol. Enterprises are increasingly using it to enforce structure on large language model outputs, converting probabilistic results ...

The New Stack · 2026-04-29

Native Deployment Checks are now available

Vercel launched Native Deployment Checks, allowing teams to run lint and typecheck scripts from package.json in parallel with every deployment. Checks can be marked required to block production releases until they pass, and Vercel Agent will suggest fixes when a check fails on a pull request.

Vercel Blog · 2026-04-29

AI POC to Production: Deploying AI Successfully in Industry

Most enterprise AI projects fail to reach production due to poor business alignment, data quality issues, weak infrastructure, and lack of MLOps practices. Key factors for successful deployment include clear KPIs, scalable API-driven architectures, and continuous model monitoring and retraining.

Dev.to - AI · 2026-04-28

How Claude Decides What Tool to Call

When provided a list of tools via Anthropic's API, Claude converts natural language requests into structured JSON tool invocations through a multi-stage pipeline, completing the process in under 200 milliseconds rather than performing human-like deliberation.

Dev.to - Claude · 2026-04-28

I Built a 24/7 AI Agent System on a $6/Month VPS — Here's the Stack

A developer built an autonomous AI agent running on a €3.90/month Hetzner VPS using the OpenClaw framework and DeepSeek V4 Pro, which posts to Twitter every 5 minutes and publishes articles every 30 minutes. The system manages a Gumroad store selling 89 digital guides, with DeepSeek V4 Pro cited ...

Dev.to - AI · 2026-04-28

Why AI engineering needs old-school discipline

Thoughtworks data and AI advisor Nimisha Asthagiri says more than 40% of agentic AI projects are forecast by Gartner to be canceled by 2027, citing a gap between proof-of-concept and production. The Thoughtworks Technology Radar recommends returning to engineering fundamentals such as test-driven...

The New Stack · 2026-04-28

Why Claude needs a real environment to validate cloud-native code

Boris Cherny, creator of Claude Code, stated that giving Claude a way to verify its own work produces 2-3x better results, calling it more important than ever with the Opus 4.7 release. OpenAI Codex, GitHub Copilot, and Cursor have each shipped self-validation loops in the past six months as a co...

The New Stack · 2026-04-25

How I Stopped My AI Agent From Reinventing the Wheel

A developer built an OpenClaw plugin called "openclaw-skill-hunter" that instructs AI agents to search for existing tools before generating custom code. In a 150-task test, the developer found 40% of tasks involved reimplementing functionality already available in existing tools.

Dev.to - Claude · 2026-04-25

The Proxy Problem: When Your Agent Optimizes for the Wrong Thing

Autonomous AI agents are prone to optimizing measurable proxy metrics rather than actual intended outcomes, a phenomenon described as the proxy problem. Three identified failure modes include metric fixation, gaming of measurements, and corruption of feedback loops that the agent's own behavior i...

Dev.to - AI · 2026-04-24

Workspace agents

OpenAI introduced workspace agents in ChatGPT, a feature designed to automate repeatable workflows and connect tools for team operations. The feature allows organizations to build and scale agents within the ChatGPT environment.

OpenAI Blog · 2026-04-23

How to Build AI Agents for Your Business

A Dev.to tutorial outlines the key components of business AI agents — large language models, contextual memory, and tool-routing layers — and recommends frameworks such as LangChain or LlamaIndex for orchestration and Pinecone or Weaviate for vector-based memory storage.

Dev.to - AI · 2026-04-22

How we built real-time deposition analysis with Claude's streaming API

Developers built a real-time deposition analysis tool for medical-malpractice attorneys that transcribes live audio via Deepgram, buffers it into 30-second segments, and runs each segment through Anthropic's Claude Haiku 4.5 to detect admissions, inconsistencies, and impeachment opportunities dur...

Dev.to - Claude · 2026-04-21

Stop Fixing Kubectl Typos: Let an AI Agent Handle It

DataArt engineer Eugene Kiselev built a Python-based AI agent that extracts kubectl commands from Kubernetes lab docs, executes them in a live cluster, and rewrites the docs after fixing errors. Testing local models via Ollama, Gemma 3:4B consistently identified all 16 commands per run, while the...

Dev.to - AI · 2026-04-20

OpenClaw Skills Ecosystem and Practical Production Picks

OpenClaw is an AI agent framework that separates "plugins" (runtime extensions) from "skills" (markdown-based behavioral instructions), with skills stored in a precedence-based directory hierarchy. The article outlines the skill file structure and offers guidance on selecting skills from the Claw...

Dev.to - AI · 2026-04-20

I ran 4 autonomous Claude agents for 6 months. Here's the data.

A developer ran four to five autonomous Claude AI agents on a macOS machine for six months at roughly $200/month, shipping 16 products that attracted four customers but generated no revenue. The experiment found that an agent given a survival-framing prompt showed self-preservation language in it...

Dev.to - Claude · 2026-04-19

Microsoft Agent Framework: From Zero to Multi-Agent Pipeline

Microsoft released Agent Framework, a Python package for building AI agents with native Model Context Protocol support, positioned as the successor to Semantic Kernel and AutoGen. A developer used it to build a multi-agent pipeline that reads a product backlog from a Markdown file and creates Epi...

Dev.to - AI · 2026-04-19

How Zo Computer improved AI reliability 20x on Vercel

Zo Computer, an 8-person AI cloud startup, migrated to Vercel's AI SDK and AI Gateway, reducing its AI model retry rate from 7.5% to 0.34% and raising chat success rate from 98% to 99.93%. P99 latency fell 38%, from 131 seconds to 81 seconds.

Vercel Blog · 2026-04-18

30 Days Running a Multi-Agent AI Business: What Actually Breaks

A developer ran a multi-agent AI system called Pantheon for 30 days handling business operations including content creation, trading, and customer outreach. The primary failure identified was agents becoming idle after completing tasks without alerting the system, requiring implementation of tmux...

Dev.to - Claude · 2026-04-17

How GitHub uses eBPF to improve deployment safety

GitHub described its use of eBPF to detect and prevent circular dependencies in its internal deployment tooling. The approach is intended to reduce deployment failures caused by dependency cycles within the platform's infrastructure.

GitHub Blog · 2026-04-17

Anthropic Silently Dropped Prompt Cache TTL from 1 Hour to 5 Minutes

Anthropic reduced the default prompt cache time-to-live from 1 hour to 5 minutes on March 6, 2026, without public announcement, causing developers using Claude's prompt caching feature to experience reduced cache hit rates and higher token costs unless they send identical requests within the shor...

Dev.to - Claude · 2026-04-16

OpenAI’s Agents SDK separates the harness from the compute

OpenAI released a major update to its Agents SDK featuring sandboxed execution environments that separate agent control from compute resources, allowing developers to use their own infrastructure or integrate with services like Modal, E2B, and Vercel for improved security and scalability.

The New Stack · 2026-04-16

When AI writes 100K lines of code, QA becomes the whole job

As AI tools generate code rapidly, software development bottlenecks have shifted from writing code to validating it, according to Artur Balabanskyy, who runs an AI-first development agency. Development teams must now focus on quality assurance and testing rather than code production.

The New Stack · 2026-04-16

The next evolution of the Agents SDK

OpenAI released an updated Agents SDK with native sandbox execution and a model-native harness, enabling developers to build secure, long-running agents that can work across files and tools.

OpenAI Blog · 2026-04-16

5 Claude Code Agentic Workflow Patterns — Which One Fits Your Work?

An article describes five workflow patterns for Claude Code: Sequential (human-verified step-by-step), Operator (single agent with defined permissions), Parallel (multiple independent tasks), Teams (role-separated agents), and Autonomous (minimal human involvement). Each pattern trades control fo...

Dev.to - Claude · 2026-04-15

MemoryLake:Persistent multimodal memory for AI agents

MemoryLake launched a persistent memory layer for AI agents that retains information across sessions and works with multiple AI platforms, featuring multimodal document parsing, conflict resolution, and three-party encryption for data privacy.

Dev.to - AI · 2026-04-15

I Built a Pay-Per-Call Trading Signal API for AI Agents

A developer built a trading signal API that charges AI agents per-call micropayments in USDC via the x402 protocol, eliminating the need for traditional API key signup; signals are generated using RSI, ADX, MACD, and volume indicators with prices ranging from $0.005 to $0.01 per request.

Dev.to - AI · 2026-04-15

From clobbered drafts to real-time sync

Suga switched from last-write-wins conflict resolution to Zero, a real-time sync engine from Rocicorp, after developers lost work when simultaneous edits overwrote each other. The system uses local SQLite databases on clients that synchronize with a PostgreSQL server, with server-side conflict re...

The New Stack · 2026-04-15

Building Claudio: My Always-On Claude Code Box

A developer built Claudio, a scheduled task automation system running Claude AI on a home Debian VM to handle recurring work like reading news and checking client status. Version 1 using cron jobs with Claude Code failed after two weeks due to OAuth token expiration; version 2 replaced cron with ...

Dev.to - Claude · 2026-04-14

From AI Demos to Production: What actually matters

Production generative AI systems require integration with existing data and workflows, structured inputs/outputs, and continuous monitoring—not just standalone LLM deployments. Current practical applications include internal AI assistants, document automation, knowledge base search, and content g...

Dev.to - AI · 2026-04-14

Claude Managed Agents Has Built-in Tracing. Here's What It Can't Do.

Anthropic's Claude Managed Agents includes built-in tracing for debugging, but audit logs stored on Anthropic's infrastructure cannot serve as independent evidence for compliance audits or breach investigations; cryptographically signed audit trails held by users provide tamper-evident records th...

Dev.to - Claude · 2026-04-14

How Agentic AI Tools Are Transforming Data Centers

Agentic AI systems are automating data center operations by continuously optimizing workload distribution, cooling, and maintenance without manual intervention. Applications include dynamic workload shifting across servers, autonomous cooling adjustments, and predictive hardware failure detection...

Dev.to - AI · 2026-04-14

Claude Haiku vs GPT-4o Mini for Automation Pipelines

Claude Haiku costs 5-6x more per input token than GPT-4o Mini but produces more accurate summaries and handles longer context windows; GPT-4o Mini is faster (2,000 vs 1,000 tokens/second) and cheaper, with performance trade-offs varying by automation task type based on eight months of production ...

Dev.to - Claude · 2026-04-13

The Identity Gap in Agentic AI

Most AI agents in production authenticate with shared API keys rather than individual identities, making it impossible to distinguish between agents, control specific actions, or trace operations back to particular agents—creating security, compliance, and operational risks.

Dev.to - AI · 2026-04-12

I Hired 8 IT Gurus to Give Me a Code Review

A developer created eight AI agents embodying software figures like Linus Torvalds and Charity Majors to review a bug-fix pull request; the agents independently identified different concerns (observability, performance, test coverage), then debated after reading each other's reviews, with Linus c...

Dev.to - Claude · 2026-04-12

🧠 Stop Letting Your AI Forget: MemPalace is a Wake-Up Call

MemPalace is a system that provides persistent hierarchical memory for AI applications using the memory palace technique, storing raw operational data locally and organizing it into navigable structures. The approach targets DevOps and incident response workflows by enabling AI systems to retain ...

Dev.to - Claude · 2026-04-12

Two Ends of the Token Budget: Caveman and Tool Search

Caveman, a Claude Code plugin, reduces output tokens by ~65% through prompt compression, while tool search defers loading MCP tool definitions until needed. Both systems target the same 200,000-token context window from opposite ends: one compresses what the model outputs, the other defers what t...

Dev.to - Claude · 2026-04-11

Why data governance is the secret to AI agent success

A Perforce report found 70% of IT leaders say strong DevOps practices support AI adoption, but only 39% of organizations have fully automated audit trails despite 77% reporting confidence in AI outputs, highlighting a governance gap that must be addressed as AI agents take on autonomous roles.

The New Stack · 2026-04-11

AI Citation Registries and Website-Based Publishing Constraints

AI systems misattribute information from government websites because traditional web publishing encodes authority through layout and context rather than explicit machine-readable fields, causing statements to become detached from correct sources and jurisdictions during processing. The article pr...

Dev.to - AI · 2026-04-11

Agentic Infrastructure

Vercel announced infrastructure designed for AI coding agents, citing that 30% of its deployments are now agent-initiated, up 1000% in six months, with Claude Code accounting for 75% of agent deployments. The company is offering deployment APIs, long-lived execution, and unified AI primitives to ...

Vercel Blog · 2026-04-10

Control Planes Make Multi-Agent Systems Safe in Production

Production multi-agent systems require a control plane layer to prevent execution failures such as duplicate task execution, state ambiguity, and credential leaks. A control plane enforces explicit state transitions, isolates task execution with permission boundaries, and maintains auditable reco...

Dev.to - AI · 2026-04-10

Zero‑Loss AI Agents

Engineers should design AI agents for high-stakes domains—healthcare, security, fintech—with security, auditability, and system integration built in from the start, not retrofitted.

Dev.to - AI · 2026-04-10

Building Your AI-Powered CMA Engine: The Core Framework

A five-pillar AI framework automates comparative market analysis and hyper-local report generation for real estate agents by automating comp selection, valuation adjustment, narrative writing, and visualization, reducing manual work and freeing time for client activities.

Dev.to - AI · 2026-04-09

From Perceptrons to Predicting the Next Word

An educational article explains how feedforward neural networks function as language models, covering single neural units, activation functions, hidden layers, and the task of predicting the next word in text sequences.

Dev.to - AI · 2026-04-09

My AI Agent Runs 24/7 Without Me -- Week 1 Results

A developer deployed an AI agent built on Claude to autonomously manage business operations for one week, completing 47-89 tasks daily including email sorting, payment processing, content publishing, and customer service while processing $445 in revenue and requiring minimal human intervention.

Dev.to - Claude · 2026-04-09

The Face Never Existed. The ID Is Stolen. The Match Is Perfect.

Hybrid identity fraud using AI-generated faces is compromising biometric verification systems by creating synthetic IDs and liveness videos that match too perfectly, forcing developers to shift from simple facial matching to forensic analysis that detects shared synthetic origins through mathemat...

Dev.to - AI · 2026-04-08

58% of PRs in our largest monorepo merge without human review

Vercel deployed an AI agent that automatically reviews and merges 58% of pull requests in its largest monorepo, reducing average merge time from 29 hours to 10.9 hours. The agent uses an LLM-based classifier to categorize changes by risk, approving low-risk changes like documentation and styling ...

Vercel Blog · 2026-04-07

Launch HN: Freestyle – Sandboxes for Coding Agents

Freestyle launched a cloud service providing sandboxes for AI coding agents, featuring sandbox forking in 400ms pauses, 500ms startup times, and full Linux/hardware virtualization support running on proprietary bare metal infrastructure rather than cloud providers.

Hacker News - Best · 2026-04-07

Use-Case-First AI Architecture Explained

AI systems designed around specific use cases rather than flexible prompts maintain consistency better as features scale across multiple teams and contexts, reducing output variability and maintenance complexity.

Dev.to - AI · 2026-04-07

360 billion tokens, 3 million customers, 6 engineers

Durable, an AI platform serving 3 million customers, processes 360 billion AI tokens annually using a 6-person team by consolidating to a single codebase and infrastructure platform, achieving 3-4x lower costs than self-hosting while managing millions of independent customer sites and AI agents.

Vercel Blog · 2026-04-07

Two startups at global scale without DevOps

Leonardo.AI processes 4.5 million images daily and Relevance AI runs 50,000 AI agents autonomously across systems like Salesforce and Slack—both without dedicated DevOps teams, relying instead on managed infrastructure platforms. APAC startups increasingly adopt this model due to severe DevOps ta...

Vercel Blog · 2026-04-07

End-to-end encryption for Vercel Workflow

Vercel added end-to-end encryption to Vercel Workflow, automatically encrypting all data flowing through event logs using AES-256-GCM with unique keys per deployment. Users can decrypt data via the web dashboard or CLI using existing environment variable permissions.

Vercel Blog · 2026-04-07

Claude Code Under the Hood: How It Actually Works

Anthropic's Claude Code system relies on a disciplined orchestration loop with context management, permissions, caching, and retry logic rather than raw model capability. The system excels at handling iterative tasks like test fixing through careful prompt engineering and decision-making across m...

Dev.to - Claude · 2026-04-06

Building LinkedIN Job Application Agents - Part 3

A developer completed HunterAgent, an automated job application system using six AI agents built on OpenAI's Responses API, with real-time web search for LinkedIn and Indeed jobs, resume optimization, and cover letter generation integrated with Streamlit and Supabase.

Dev.to - Claude · 2026-04-06

Components of a Coding Agent

Sebastian Raschka published an article outlining the key architectural components and design elements of coding agents powered by AI systems.

Hacker News - Best · 2026-04-05

research-llm-apis 2026-04-04

Simon Willison released research-llm-apis, a repository documenting raw API interactions and curl commands for Anthropic, OpenAI, Gemini, and Mistral to design an updated abstraction layer for his LLM Python library that handles features like server-side tool execution.

Simon Willison · 2026-04-05

Anthropic Blocked My Infrastructure. I Didn't Notice Because I'm Free.

Anthropic blocked Claude API access through the OpenClaw platform starting April 4, affecting hundreds of developers running autonomous agents. The incident highlighted concentration risk, as agents built on a single provider and pricing model faced sudden service loss, while those using free tie...

Dev.to - Claude · 2026-04-04

The hidden technical debt of agentic engineering

The article outlines seven categories of infrastructure complexity that accumulate when deploying AI agents in enterprise production environments, including integrations, observability, governance, and agent-specific requirements like human-in-the-loop systems and evaluation frameworks for non-de...

The New Stack · 2026-04-03

Score 98/100 sur Claude Code — Top 0.1% Mondial des Sessions

A developer achieved a 98/100 score on Claude Code across a single session that produced 69,340 lines of code, modified 351 files, and generated a complete French-compliant e-invoicing system with full test coverage and documentation. The session orchestrated 25+ parallel sub-agents across system...

Dev.to - Claude · 2026-04-03

You test your code. Why aren’t you testing your AI instructions?

A study found that instruction scaffolding affects AI coding task performance by 17 percentage points regardless of model choice, prompting development of agenteval, a tool to test instruction files for common issues including dead file references, filler text, contradictions, and context budget ...

Dev.to - Claude · 2026-04-03

Chat SDK brings agents to your users

Vercel released Chat SDK, a TypeScript library that lets developers build chatbots working across Slack, Microsoft Teams, Google Chat, Discord, Telegram, GitHub, and Linear from a single codebase using platform-specific adapters.

Vercel Blog · 2026-04-03

There’s a hidden tax on every AI-generated merge request

AI coding tools have increased merge request volume but shifted bottlenecks to code review, with 2025 DORA data showing no improvement in delivery metrics. Senior engineers with critical system knowledge face enlarged review queues, reducing time for design work, while automated checks cannot rep...

The New Stack · 2026-04-03

Build knowledge agents without embeddings

Vercel released an open-source Knowledge Agent Template that replaces vector embeddings with filesystem-based search using bash commands like grep and find. The approach reduced costs from $1.00 to $0.25 per query while improving output quality and debuggability compared to traditional embedding ...

Vercel Blog · 2026-04-03

Agent responsibly

Vercel outlined a framework for safely deploying AI-generated code, arguing that agents produce convincing but context-blind outputs that can pass tests while creating production risks. The company recommends engineers maintain full ownership of agent-generated changes and build infrastructure wh...

Vercel Blog · 2026-04-03

The hidden reason your AI assistant feels so sluggish

AI agent workloads are straining traditional cloud data warehouses because agents generate dozens of rapid concurrent queries instead of single queries, causing latency or cost problems. Companies are shifting toward real-time analytical databases paired with systems like PostgreSQL to handle the...

The New Stack · 2026-04-03

The laptop return that broke a RAG pipeline

A RAG-based customer-support agent incorrectly cited a 2023 return policy allowing 30 days instead of the current 14-day window because vector search finds semantically similar documents without accounting for recency or scope. The author proposes hybrid search—combining vector similarity with st...

The New Stack · 2026-04-03

SERHANT.'s playbook for rapid AI iteration

SERHANT. scaled its S.MPLE AI product from 200 to 900+ real estate agents using Vercel's AI SDK and Next.js, routing tasks across Claude, OpenAI, and Gemini models to optimize cost and performance without rebuilding infrastructure.

Vercel Blog · 2026-04-03

Making Turborepo 96% faster with agents, sandboxes, and humans

Vercel improved Turborepo's task graph computation speed by 81-91% through eight days of optimization work using AI agents and engineering practices, with three merged pull requests delivering a 25% reduction, 6% improvement, and an algorithmic replacement on its 1,000-package monorepo.

Vercel Blog · 2026-04-03

Unified reporting for all AI Gateway usage

Vercel launched a Custom Reporting API in beta for AI Gateway that consolidates cost and token usage data across multiple AI providers and user-provided API keys into a single reporting endpoint. One AI platform serving 200K+ users replaced its third-party cost tracking system with the API and re...

Vercel Blog · 2026-04-03

How FLORA shipped a creative agent on Vercel's AI stack

FLORA deployed an AI creative agent called FAUNA on Vercel's AI Stack to automate visual design workflows for fashion and creative industries. The company migrated from separate LangChain and Temporal systems to Vercel's integrated platform, which includes AI SDK, Workflow SDK, and Fluid compute ...

Vercel Blog · 2026-04-03