Agent Engineering

Building reliable AI agents: CI/CD for agent systems, testing strategies, architecture patterns, reliability engineering, and the hard-won production lessons that don't make it into product launches. This is the deeper, engineering side of agentic coding.

95 stories · last 90 days

30 Days Running a Multi-Agent AI Business: What Actually Breaks

A developer ran a multi-agent AI system called Pantheon for 30 days handling business operations including content creation, trading, and customer outreach. The primary failure identified was agents becoming idle after completing tasks without alerting the system, requiring implementation of tmux...

Dev.to - Claude · 2026-04-17

How GitHub uses eBPF to improve deployment safety

GitHub described its use of eBPF to detect and prevent circular dependencies in its internal deployment tooling. The approach is intended to reduce deployment failures caused by dependency cycles within the platform's infrastructure.

GitHub Blog · 2026-04-17

Anthropic Silently Dropped Prompt Cache TTL from 1 Hour to 5 Minutes

Anthropic reduced the default prompt cache time-to-live from 1 hour to 5 minutes on March 6, 2026, without public announcement, causing developers using Claude's prompt caching feature to experience reduced cache hit rates and higher token costs unless they send identical requests within the shor...

Dev.to - Claude · 2026-04-16

OpenAI’s Agents SDK separates the harness from the compute

OpenAI released a major update to its Agents SDK featuring sandboxed execution environments that separate agent control from compute resources, allowing developers to use their own infrastructure or integrate with services like Modal, E2B, and Vercel for improved security and scalability.

The New Stack · 2026-04-16

When AI writes 100K lines of code, QA becomes the whole job

As AI tools generate code rapidly, software development bottlenecks have shifted from writing code to validating it, according to Artur Balabanskyy, who runs an AI-first development agency. Development teams must now focus on quality assurance and testing rather than code production.

The New Stack · 2026-04-16

The next evolution of the Agents SDK

OpenAI released an updated Agents SDK with native sandbox execution and a model-native harness, enabling developers to build secure, long-running agents that can work across files and tools.

OpenAI Blog · 2026-04-16

5 Claude Code Agentic Workflow Patterns — Which One Fits Your Work?

An article describes five workflow patterns for Claude Code: Sequential (human-verified step-by-step), Operator (single agent with defined permissions), Parallel (multiple independent tasks), Teams (role-separated agents), and Autonomous (minimal human involvement). Each pattern trades control fo...

Dev.to - Claude · 2026-04-15

MemoryLake: Persistent multimodal memory for AI agents

MemoryLake launched a persistent memory layer for AI agents that retains information across sessions and works with multiple AI platforms, featuring multimodal document parsing, conflict resolution, and three-party encryption for data privacy.

Dev.to - AI · 2026-04-15

I Built a Pay-Per-Call Trading Signal API for AI Agents

A developer built a trading signal API that charges AI agents per-call micropayments in USDC via the x402 protocol, eliminating the need for traditional API key signup; signals are generated using RSI, ADX, MACD, and volume indicators with prices ranging from $0.005 to $0.01 per request.

Dev.to - AI · 2026-04-15

From clobbered drafts to real-time sync

Suga switched from last-write-wins conflict resolution to Zero, a real-time sync engine from Rocicorp, after developers lost work when simultaneous edits overwrote each other. The system uses local SQLite databases on clients that synchronize with a PostgreSQL server, with server-side conflict re...

The New Stack · 2026-04-15

Building Claudio: My Always-On Claude Code Box

A developer built Claudio, a scheduled task automation system running Claude AI on a home Debian VM to handle recurring work like reading news and checking client status. Version 1 using cron jobs with Claude Code failed after two weeks due to OAuth token expiration; version 2 replaced cron with ...

Dev.to - Claude · 2026-04-14

From AI Demos to Production: What actually matters

Production generative AI systems require integration with existing data and workflows, structured inputs/outputs, and continuous monitoring—not just standalone LLM deployments. Current practical applications include internal AI assistants, document automation, knowledge base search, and content g...

Dev.to - AI · 2026-04-14

Claude Managed Agents Has Built-in Tracing. Here's What It Can't Do.

Anthropic's Claude Managed Agents includes built-in tracing for debugging, but audit logs stored on Anthropic's infrastructure cannot serve as independent evidence for compliance audits or breach investigations; cryptographically signed audit trails held by users provide tamper-evident records th...

Dev.to - Claude · 2026-04-14

How Agentic AI Tools Are Transforming Data Centers

Agentic AI systems are automating data center operations by continuously optimizing workload distribution, cooling, and maintenance without manual intervention. Applications include dynamic workload shifting across servers, autonomous cooling adjustments, and predictive hardware failure detection...

Dev.to - AI · 2026-04-14

Claude Haiku vs GPT-4o Mini for Automation Pipelines

Claude Haiku costs 5-6x more per input token than GPT-4o Mini but produces more accurate summaries and handles longer context windows; GPT-4o Mini is faster (2,000 vs 1,000 tokens/second) and cheaper, with performance trade-offs varying by automation task type based on eight months of production ...

Dev.to - Claude · 2026-04-13

The Identity Gap in Agentic AI

Most AI agents in production authenticate with shared API keys rather than individual identities, making it impossible to distinguish between agents, control specific actions, or trace operations back to particular agents—creating security, compliance, and operational risks.

Dev.to - AI · 2026-04-12

I Hired 8 IT Gurus to Give Me a Code Review

A developer created eight AI agents embodying software figures like Linus Torvalds and Charity Majors to review a bug-fix pull request; the agents independently identified different concerns (observability, performance, test coverage), then debated after reading each other's reviews, with Linus c...

Dev.to - Claude · 2026-04-12

🧠 Stop Letting Your AI Forget: MemPalace is a Wake-Up Call

MemPalace is a system that provides persistent hierarchical memory for AI applications using the memory palace technique, storing raw operational data locally and organizing it into navigable structures. The approach targets DevOps and incident response workflows by enabling AI systems to retain ...

Dev.to - Claude · 2026-04-12

Two Ends of the Token Budget: Caveman and Tool Search

Caveman, a Claude Code plugin, reduces output tokens by ~65% through prompt compression, while tool search defers loading MCP tool definitions until needed. Both systems target the same 200,000-token context window from opposite ends: one compresses what the model outputs, the other defers what t...

Dev.to - Claude · 2026-04-11

Why data governance is the secret to AI agent success

A Perforce report found 70% of IT leaders say strong DevOps practices support AI adoption, but only 39% of organizations have fully automated audit trails despite 77% reporting confidence in AI outputs, highlighting a governance gap that must be addressed as AI agents take on autonomous roles.

The New Stack · 2026-04-11

AI Citation Registries and Website-Based Publishing Constraints

AI systems misattribute information from government websites because traditional web publishing encodes authority through layout and context rather than explicit machine-readable fields, causing statements to become detached from correct sources and jurisdictions during processing. The article pr...

Dev.to - AI · 2026-04-11

Agentic Infrastructure

Vercel announced infrastructure designed for AI coding agents, citing that 30% of its deployments are now agent-initiated, up 1000% in six months, with Claude Code accounting for 75% of agent deployments. The company is offering deployment APIs, long-lived execution, and unified AI primitives to ...

Vercel Blog · 2026-04-10

Control Planes Make Multi-Agent Systems Safe in Production

Production multi-agent systems require a control plane layer to prevent execution failures such as duplicate task execution, state ambiguity, and credential leaks. A control plane enforces explicit state transitions, isolates task execution with permission boundaries, and maintains auditable reco...

Dev.to - AI · 2026-04-10

Zero-Loss AI Agents

Engineers should design AI agents for high-stakes domains—healthcare, security, fintech—with security, auditability, and system integration built in from the start, not retrofitted.

Dev.to - AI · 2026-04-10

Building Your AI-Powered CMA Engine: The Core Framework

A five-pillar AI framework automates comparative market analysis and hyper-local report generation for real estate agents by automating comp selection, valuation adjustment, narrative writing, and visualization, reducing manual work and freeing time for client activities.

Dev.to - AI · 2026-04-09

From Perceptrons to Predicting the Next Word

An educational article explains how feedforward neural networks function as language models, covering single neural units, activation functions, hidden layers, and the task of predicting the next word in text sequences.

Dev.to - AI · 2026-04-09

My AI Agent Runs 24/7 Without Me -- Week 1 Results

A developer deployed an AI agent built on Claude to autonomously manage business operations for one week, completing 47-89 tasks daily including email sorting, payment processing, content publishing, and customer service while processing $445 in revenue and requiring minimal human intervention.

Dev.to - Claude · 2026-04-09

The Face Never Existed. The ID Is Stolen. The Match Is Perfect.

Hybrid identity fraud using AI-generated faces is compromising biometric verification systems by creating synthetic IDs and liveness videos that match too perfectly, forcing developers to shift from simple facial matching to forensic analysis that detects shared synthetic origins through mathemat...

Dev.to - AI · 2026-04-08

58% of PRs in our largest monorepo merge without human review

Vercel deployed an AI agent that automatically reviews and merges 58% of pull requests in its largest monorepo, reducing average merge time from 29 hours to 10.9 hours. The agent uses an LLM-based classifier to categorize changes by risk, approving low-risk changes like documentation and styling ...

Vercel Blog · 2026-04-07

Launch HN: Freestyle – Sandboxes for Coding Agents

Freestyle launched a cloud service providing sandboxes for AI coding agents, featuring sandbox forking with 400ms pauses, 500ms startup times, and full Linux/hardware virtualization support, running on proprietary bare-metal infrastructure rather than cloud providers.

Hacker News - Best · 2026-04-07

Use-Case-First AI Architecture Explained

AI systems designed around specific use cases rather than flexible prompts maintain consistency better as features scale across multiple teams and contexts, reducing output variability and maintenance complexity.

Dev.to - AI · 2026-04-07

360 billion tokens, 3 million customers, 6 engineers

Durable, an AI platform serving 3 million customers, processes 360 billion AI tokens annually using a 6-person team by consolidating to a single codebase and infrastructure platform, achieving 3-4x lower costs than self-hosting while managing millions of independent customer sites and AI agents.

Vercel Blog · 2026-04-07

Two startups at global scale without DevOps

Leonardo.AI processes 4.5 million images daily and Relevance AI runs 50,000 AI agents autonomously across systems like Salesforce and Slack—both without dedicated DevOps teams, relying instead on managed infrastructure platforms. APAC startups increasingly adopt this model due to severe DevOps ta...

Vercel Blog · 2026-04-07

End-to-end encryption for Vercel Workflow

Vercel added end-to-end encryption to Vercel Workflow, automatically encrypting all data flowing through event logs using AES-256-GCM with unique keys per deployment. Users can decrypt data via the web dashboard or CLI using existing environment variable permissions.

Vercel Blog · 2026-04-07

Claude Code Under the Hood: How It Actually Works

Anthropic's Claude Code system relies on a disciplined orchestration loop with context management, permissions, caching, and retry logic rather than raw model capability. The system excels at handling iterative tasks like test fixing through careful prompt engineering and decision-making across m...

Dev.to - Claude · 2026-04-06

Building LinkedIn Job Application Agents - Part 3

A developer completed HunterAgent, an automated job application system using six AI agents built on OpenAI's Responses API, with real-time web search for LinkedIn and Indeed jobs, resume optimization, and cover letter generation integrated with Streamlit and Supabase.

Dev.to - Claude · 2026-04-06

Components of a Coding Agent

Sebastian Raschka published an article outlining the key architectural components and design elements of coding agents powered by AI systems.

Hacker News - Best · 2026-04-05

research-llm-apis 2026-04-04

Simon Willison released research-llm-apis, a repository documenting raw API interactions and curl commands for Anthropic, OpenAI, Gemini, and Mistral to design an updated abstraction layer for his LLM Python library that handles features like server-side tool execution.

Simon Willison · 2026-04-05

Anthropic Blocked My Infrastructure. I Didn't Notice Because I'm Free.

Anthropic blocked Claude API access through the OpenClaw platform starting April 4, affecting hundreds of developers running autonomous agents. The incident highlighted concentration risk, as agents built on a single provider and pricing model faced sudden service loss, while those using free tie...

Dev.to - Claude · 2026-04-04

The hidden technical debt of agentic engineering

The article outlines seven categories of infrastructure complexity that accumulate when deploying AI agents in enterprise production environments, including integrations, observability, governance, and agent-specific requirements like human-in-the-loop systems and evaluation frameworks for non-de...

The New Stack · 2026-04-03

Score of 98/100 on Claude Code: Top 0.1% of Sessions Worldwide

A developer achieved a 98/100 score on Claude Code across a single session that produced 69,340 lines of code, modified 351 files, and generated a complete French-compliant e-invoicing system with full test coverage and documentation. The session orchestrated 25+ parallel sub-agents across system...

Dev.to - Claude · 2026-04-03

You test your code. Why aren’t you testing your AI instructions?

A study found that instruction scaffolding affects AI coding task performance by 17 percentage points regardless of model choice, prompting development of agenteval, a tool to test instruction files for common issues including dead file references, filler text, contradictions, and context budget ...

Dev.to - Claude · 2026-04-03

Chat SDK brings agents to your users

Vercel released Chat SDK, a TypeScript library that lets developers build chatbots working across Slack, Microsoft Teams, Google Chat, Discord, Telegram, GitHub, and Linear from a single codebase using platform-specific adapters.

Vercel Blog · 2026-04-03

There’s a hidden tax on every AI-generated merge request

AI coding tools have increased merge request volume but shifted bottlenecks to code review, with 2025 DORA data showing no improvement in delivery metrics. Senior engineers with critical system knowledge face enlarged review queues, reducing time for design work, while automated checks cannot rep...

The New Stack · 2026-04-03

Build knowledge agents without embeddings

Vercel released an open-source Knowledge Agent Template that replaces vector embeddings with filesystem-based search using bash commands like grep and find. The approach reduced costs from $1.00 to $0.25 per query while improving output quality and debuggability compared to traditional embedding ...

Vercel Blog · 2026-04-03

Agent responsibly

Vercel outlined a framework for safely deploying AI-generated code, arguing that agents produce convincing but context-blind outputs that can pass tests while creating production risks. The company recommends engineers maintain full ownership of agent-generated changes and build infrastructure wh...

Vercel Blog · 2026-04-03

The hidden reason your AI assistant feels so sluggish

AI agent workloads are straining traditional cloud data warehouses because agents generate dozens of rapid concurrent queries instead of single queries, causing latency or cost problems. Companies are shifting toward real-time analytical databases paired with systems like PostgreSQL to handle the...

The New Stack · 2026-04-03

The laptop return that broke a RAG pipeline

A RAG-based customer-support agent incorrectly cited a 2023 return policy allowing 30 days instead of the current 14-day window because vector search finds semantically similar documents without accounting for recency or scope. The author proposes hybrid search—combining vector similarity with st...

The New Stack · 2026-04-03

SERHANT.'s playbook for rapid AI iteration

SERHANT. scaled its S.MPLE AI product from 200 to 900+ real estate agents using Vercel's AI SDK and Next.js, routing tasks across Claude, OpenAI, and Gemini models to optimize cost and performance without rebuilding infrastructure.

Vercel Blog · 2026-04-03

Making Turborepo 96% faster with agents, sandboxes, and humans

Vercel improved Turborepo's task graph computation speed by 81-91% through eight days of optimization work using AI agents and engineering practices, with three merged pull requests delivering a 25% reduction, 6% improvement, and an algorithmic replacement on its 1,000-package monorepo.

Vercel Blog · 2026-04-03

Unified reporting for all AI Gateway usage

Vercel launched a Custom Reporting API in beta for AI Gateway that consolidates cost and token usage data across multiple AI providers and user-provided API keys into a single reporting endpoint. One AI platform serving 200K+ users replaced its third-party cost tracking system with the API and re...

Vercel Blog · 2026-04-03

How FLORA shipped a creative agent on Vercel's AI stack

FLORA deployed an AI creative agent called FAUNA on Vercel's AI Stack to automate visual design workflows for fashion and creative industries. The company migrated from separate LangChain and Temporal systems to Vercel's integrated platform, which includes AI SDK, Workflow SDK, and Fluid compute ...

Vercel Blog · 2026-04-03
