
Agent Engineering


Building reliable AI agents: CI/CD for agent systems, testing strategies, architecture patterns, reliability engineering, and the hard-won production lessons that don't make it into product launches. This is the deeper, engineering side of agentic coding.

287 stories · last 90 days

Making LLM outputs auditable: the provider abstraction pattern

A developer building NumPath, a teacher dashboard tool, describes using a Python Protocol interface to abstract LLM API calls, allowing the system to swap providers and test with deterministic stubs. The pattern separates evidence assembly (via database reads) from text generation, making AI-gene...

Dev.to - Claude · 2026-06-01
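
A minimal sketch of the provider abstraction pattern summarized in the NumPath entry above, assuming Python's `typing.Protocol`. The names `TextGenerator`, `StubGenerator`, `assemble_evidence`, and `summarize` are hypothetical and not taken from the article, and an in-memory list stands in for the database reads.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical interface: any object with a matching generate() qualifies."""

    def generate(self, prompt: str) -> str: ...


class StubGenerator:
    """Deterministic stand-in for a real provider, intended for tests."""

    def generate(self, prompt: str) -> str:
        return f"[stub] {len(prompt)} chars of evidence"


def assemble_evidence(rows: list[dict]) -> str:
    """Build the prompt from already-fetched records; no model call happens here."""
    return "\n".join(f"{r['student']}: {r['score']}" for r in rows)


def summarize(rows: list[dict], llm: TextGenerator) -> dict:
    """Return the evidence next to the generated text so the output can be audited."""
    evidence = assemble_evidence(rows)
    return {"evidence": evidence, "text": llm.generate(evidence)}


# The rows below are illustrative placeholders for a database read.
result = summarize([{"student": "A", "score": 7}], StubGenerator())
```

Because `summarize` depends only on the protocol, swapping providers means passing a different object, and the stub keeps tests deterministic.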

How we contain Claude across products

Anthropic published documentation detailing sandbox techniques used across its Claude products: Claude.ai uses gVisor, Claude Code uses Seatbelt on macOS and Bubblewrap on Linux, and Claude Cowork runs full VMs using Apple's Virtualization framework on macOS and HCS on Windows. The document also ...

Simon Willison · 2026-05-31

BoxAgnts Introduction (7) — OpenAI API and Anthropic API

BoxAgnts, a Rust-based AI agent framework, implements a unified `LlmProvider` trait that abstracts API differences between OpenAI, Anthropic, and Google Gemini, allowing model switching via a single parameter change. The seventh installment of the series covers interface design, message format co...

Dev.to - Claude · 2026-05-31

Building an AI Agency from Scratch — Episode 1: Day Zero

A developer in Shenzhen directed an AI agent named Centaur to spawn a team of 15 sub-agents, which crashed within an hour due to memory exhaustion and the absence of defined roles or hierarchy. The experiment led to a revised 3-layer architecture capping concurrent sub-agents at four, resulting i...

Dev.to - AI · 2026-05-30

Protecting against inference theft

Vercel described "inference theft," where attackers proxy AI endpoints through OpenAI-compatible adapters and resell stolen inference, noting a single LLM call can cost $2 versus fractions of a cent for standard HTTP. The company said it gates AI requests through per-call bot analysis rather than...

Vercel Blog · 2026-05-30

AI is shipping code faster than security was built to handle

Snyk launched Evo Continuous Offensive Security, an AI-native penetration testing product, citing that traditional pentesting averages 15 days of annual coverage, leaving a 350-day window of exposure. The product targets enterprises using AI coding agents that compress development cycles from wee...

The New Stack · 2026-05-30

Why AWS scrapped OpenSearch’s architecture to chase agent workloads

AWS rebuilt approximately 97% of its Amazon OpenSearch Serverless architecture from the ground up, introducing a new proprietary storage layer that separates storage from compute, allowing collections to scale to zero when idle. The redesigned service auto-scales 20 times faster than its predeces...

The New Stack · 2026-05-29

AiFinPay: Autonomous Payments for ruvnet/ruflo

AiFinPay released a Python SDK ("aifinpay-agent") designed to add payment processing to AI agent workflows, and announced a partnership with ruvnet/ruflo, an agent orchestration platform built for Anthropic's Claude.

Dev.to - AI · 2026-05-29

AiFinPay: Autonomous Payments for ruvnet/ruflo

AiFinPay released a Python SDK ("aifinpay-agent") designed to add payment processing capabilities to autonomous AI agents, and announced a partnership with ruvnet/ruflo, an AI agent orchestration platform.

Dev.to - AI · 2026-05-28

Why AI agents need a Context Lake

Organizations scaling AI agents face three obstacles: security reviews that can take over nine months, MCP tool overload that consumes up to 150,000 context-window tokens per Anthropic's estimates, and agents lacking basic organizational knowledge. The article proposes a "Context Lake" as...

The New Stack · 2026-05-28

Microsoft Copilot Cowork Exfiltrates Files

Researchers at PromptArmor found that Microsoft Copilot Cowork is vulnerable to prompt injection attacks that can exfiltrate files via rendered email images containing external requests, with OneDrive pre-authenticated links potentially leaked to attackers.

Simon Willison · 2026-05-27

AiFinPay: Autonomous Payments for ruvnet/ruflo

AiFinPay released a Python SDK called `aifinpay-agent` designed to handle payment processing within AI agent workflows, and announced a partnership with ruvnet/ruflo, an agent orchestration platform. The SDK is available via pip and hosted on GitHub.

Dev.to - AI · 2026-05-27

How the AC/DC framework helps teams govern AI coding agents

The AC/DC (Agent Centric Development Cycle) framework defines four stages for governing AI coding agents: Guide, Generate, Verify, and Solve. The framework argues that verification, not code generation, is the critical bottleneck as agents produce thousands of lines of code faster than teams can ...

The New Stack · 2026-05-27

Add Runtime Limits to Claude Agent Workflows

A Dev.to guide describes a TypeScript pattern for adding runtime limits to Claude-based AI agent workflows, using constraints such as maximum execution time (30 seconds), step count (15), and tool calls (10) to prevent runaway retries and unbounded execution.

Dev.to - Claude · 2026-05-26
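
The guide above describes a TypeScript pattern. The following is a rough Python sketch of the same idea, using the three limits quoted in the summary (30 seconds, 15 steps, 10 tool calls). `RunBudget`, `LimitExceeded`, and `run_agent` are hypothetical names, and the loop body is a placeholder for real model and tool calls.

```python
import time
from dataclasses import dataclass, field


class LimitExceeded(RuntimeError):
    """Raised when a run goes past one of its budgets."""


@dataclass
class RunBudget:
    max_seconds: float = 30.0
    max_steps: int = 15
    max_tool_calls: int = 10
    steps: int = 0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tool_call: bool = False) -> None:
        """Record one step (and optionally one tool call), then enforce every limit."""
        self.steps += 1
        if tool_call:
            self.tool_calls += 1
        if time.monotonic() - self.started > self.max_seconds:
            raise LimitExceeded("time budget exhausted")
        if self.steps > self.max_steps:
            raise LimitExceeded("step budget exhausted")
        if self.tool_calls > self.max_tool_calls:
            raise LimitExceeded("tool-call budget exhausted")


def run_agent(budget: RunBudget) -> str:
    """Placeholder loop: a real agent would call the model and its tools here."""
    try:
        while True:
            budget.charge(tool_call=True)
    except LimitExceeded as exc:
        return str(exc)


outcome = run_agent(RunBudget())
```

Charging the budget at the top of every iteration means a runaway retry loop stops at whichever limit it reaches first.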

Beyond the Prompt: Why Your AI Agent Needs a Governance Runtime

A developer building a product called NEES Core Engine argues that production AI agents require a dedicated governance runtime layer to enforce business logic and safety boundaries, rather than relying solely on system prompts. The article identifies failure modes including policy bypass, memory ...

Dev.to - AI · 2026-05-26

How We Analyzed The Top 2,354 ClawHub Skills for Security

Trent AI analyzed 2,354 packages on ClawHub using a five-step behavioral pipeline that evaluates AI agent skills for permission scope, credential handling, network exposure, input validation, and chained attack paths, alongside VirusTotal scans as a secondary data point.

Dev.to - AI · 2026-05-26

Microsoft Copilot Cowork Exfiltrates Files

Security researchers at PromptArmor disclosed a vulnerability in Microsoft Copilot's Cowork feature that allows attackers to exfiltrate files, likely via prompt injection techniques targeting the AI assistant's access to user documents.

Hacker News - Best · 2026-05-26

What ClickHouse learned from a year of coding with AI agents

ClickHouse engineers reported that AI coding agents became viable for daily work on their large C++ codebase after Anthropic released Claude Opus 4.5 in November 2025, having previously found earlier models ineffective for C++ beyond boilerplate tasks. The team categorizes AI-assisted coding into...

The New Stack · 2026-05-25

Who’s monitoring the agents?

AI agent frameworks such as CrewAI, AutoGen, and LangGraph are increasingly deployed in production, but teams operating multi-agent systems lack adequate monitoring tools to trace how outputs are produced. Common operational problems include runaway model call chains, silent failures, subtly inco...

The New Stack · 2026-05-25

Claude's JSON Output and the R…

A developer found that 14% of 12,400 structured-output calls to Claude returned JSON wrapped in markdown fences despite strict system prompts. To address this, they built a three-pass Rust pipeline that validates, corrects, and verifies structured outputs before returning them.

Dev.to - Claude · 2026-05-24
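
The pipeline in the entry above is written in Rust and its internals are not shown here. As an illustration only, a validate, correct, verify sequence for fence-wrapped JSON might look like this in Python. `parse_structured` and the `required` key check are assumptions, not the author's design.

```python
import json
import re

# Built programmatically so the example nests cleanly inside markdown.
TICKS = "`" * 3
FENCE = re.compile(
    r"^\s*" + TICKS + r"(?:json)?\s*\n(.*?)\n\s*" + TICKS + r"\s*$",
    re.DOTALL,
)


def parse_structured(raw: str, required: set[str]) -> dict:
    # Pass 1: validate the reply as-is.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Pass 2: correct the common failure by removing a wrapping fence.
        match = FENCE.match(raw)
        if match is None:
            raise
        data = json.loads(match.group(1))
    # Pass 3: verify the shape before handing the result to callers.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


fenced = TICKS + 'json\n{"id": 1, "ok": true}\n' + TICKS
parsed = parse_structured(fenced, {"id"})
```

Replies that are neither valid JSON nor a single fenced block re-raise the original decode error, so nothing is silently guessed.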

CI wasn’t built for coding agents. Here’s what comes next.

Traditional CI pipelines, which return results in 10-30 minutes, are too slow for AI coding agents that iterate in seconds. One proposed solution is small, self-contained integration checks called "plans" that run inside an agent's session against a live environment, eliminating the round-trip to...

The New Stack · 2026-05-22

Kore counts down to Artemis, its moonshot for governable AI agents

Kore.ai released Artemis, the latest version of its Kore Agent Platform, a visual and code-based environment for building multi-agent AI systems. The platform includes a declarative Agent Blueprint Language with six built-in orchestration patterns and an automated agent architect tool called Arch.

The New Stack · 2026-05-22

Chat SDK now includes AI SDK tools

Vercel's Chat SDK added a built-in AI SDK toolset accessible via a new `chat/ai` subpath, with a `createChatTools()` function that connects read and write actions to agents. Write tools require approval by default, and three presets — reader, messenger, and moderator — scope the available toolset.

Vercel Blog · 2026-05-21

Chat SDK now supports callback URLs on buttons and modals

Vercel's Chat SDK added support for `callbackUrl` props on buttons and modals, enabling Workflow runs to pause and resume upon user interaction. The feature works for buttons on most platforms with official adapters, and for modals on Slack and Teams.

Vercel Blog · 2026-05-21

Run Claude Managed Agents with Vercel Sandbox

Vercel and Anthropic have integrated Claude Managed Agents with Vercel Sandbox, allowing agent tool calls to execute in isolated Firecracker microVMs on Vercel infrastructure. Each session runs in its own microVM with credential brokering, deny-by-default egress, and access to private networks an...

Vercel Blog · 2026-05-20

Why production RAG systems give confident, wrong answers at scale

Retrieval-Augmented Generation systems fail at production scale primarily because retrieval architectures degrade as document corpora grow into the millions, causing LLMs to generate confident but incorrect answers from incomplete context. The failure is in recall, not the model itself — relevant...

The New Stack · 2026-05-20

OpenClaw and Claude Code - Multi Agents talking via Handoff File

A developer built a two-agent system pairing OpenClaw, a Discord-based LLM bot running on a Raspberry Pi, with Claude Code for coding tasks. OpenClaw receives user requests and passes them to Claude Code via a shared handoff file; Claude Code writes code, opens a GitHub pull request, and exits.

Dev.to - Claude · 2026-05-20

Long-Running Agents: Harness, Evaluator, Handoff

Anthropic, IBM, and AI LABS independently presented talks arguing that hour-scale AI agent reliability depends on harness architecture, adversarial evaluator agents, and structured state handoffs rather than model improvements. Anthropic researchers Ash Prabaker and Andrew Wilson specifically pro...

Dev.to - Claude · 2026-05-19

Instruction systems capability ladder: harness leveling

A developer published an 8-level taxonomy for AI agent instruction systems, ranging from basic system prompts (L0) to self-improving agents (L7), submitted to the Hermes Agent Challenge on Dev.to. The framework applies across tools including CLAUDE.md, AGENTS.md, and .cursorrules, categorizing le...

Dev.to - Claude · 2026-05-19

We Got 2x LLM Inference Speed With Three Kubernetes Settings

DigitalOcean engineers achieved roughly 2x LLM inference throughput on Kubernetes by combining Managed NFS for shared model weights, jumbo frames with TCP buffer tuning, and a node taint to prevent a race condition between the network tuner and vLLM pods. The reference architecture, including Ter...

Dev.to - AI · 2026-05-19

What we shipped -- 2026-05-15

Glad Labs fixed a race condition in voice conversation sessions via PR #436, adding a retry mechanism in `ClaudeCodeBridgeLLMService` that catches "Session ID already in use" errors on the first turn and resumes against existing session data. They also expanded a test suite from 5 to 18 cases and...

Dev.to - Claude · 2026-05-16

RLHF in 2026: when to pick PPO, DPO, or verifier-based RL

A technical guide outlines when to use PPO, DPO, or verifier-based RL (RLVR) for post-training language models, recommending DPO for style and instruction-following tasks, RLVR for math and code with ground-truth checkers, and PPO only when on-policy sampling costs are justified.

Dev.to - AI · 2026-05-16

Why agent harnesses fail inside cloud-native systems

An analysis in The New Stack argues that AI coding agent performance depends more on surrounding scaffolding — prompts, tools, and feedback loops — than model selection, citing data showing the same model moved from rank 30 to rank 5 on Terminal Bench 2.0 with a different harness. The piece conte...

The New Stack · 2026-05-14

Living off the agent: The new tactic hijacking enterprise AI

Cybersecurity researchers are warning that enterprise AI agents, which have broad access to company data and systems, introduce new attack vectors where malicious actors can exploit agents' instruction-following behavior to exfiltrate sensitive information, a tactic being called "living off the a...

The New Stack · 2026-05-13

Trusted Sources for Deployment Protection

Vercel introduced "Trusted Sources," a deployment protection method that accepts short-lived OIDC tokens from authorized Vercel projects and external services, replacing long-lived automation bypass secrets. Callers pass tokens via the `x-vercel-trusted-oidc-idp-token` header; Vercel verifies the...

Vercel Blog · 2026-05-13

Why your AI agent doesn’t actually remember anything

AI agents typically lack persistent memory across sessions because storing conversation history requires more than a database — it involves selection, compression, decay of stale data, and prevention of corrupted facts from influencing future decisions. Most production agents handle idempotency a...

The New Stack · 2026-05-12

How AI-native systems are built

The article outlines a layered architecture for building AI-native enterprise systems, proposing a shift from deterministic rule-based software to probabilistic models with governance gates that enforce access controls and PII scrubbing before requests reach an AI orchestrator.

The New Stack · 2026-05-12

Skill Spam Is a Genre — And the Validators Are Trending

Millionco's "react-doctor," a GitHub Action that scores AI-generated React code on a 0–100 scale, is trending on GitHub as a validator for output from agents including Claude Code, Cursor, and Codex. The tool emerged within three months of Anthropic introducing Skills as a Claude Code surface, al...

Dev.to - Claude · 2026-05-11

Moving Beyond Naive RAG: The Rise of Agentic Retrieval

Agentic RAG replaces static retrieval-augmented generation pipelines with autonomous agents that dynamically decide whether to search a vector database, query SQL, or call external APIs, and can rephrase queries when initial results are insufficient. Frameworks such as LangGraph and LlamaIndex's ...

Dev.to - AI · 2026-05-11

The attack surface moved inside the agent. So did Arcjet.

Arcjet, a San Francisco-based runtime security company, released Guards, a feature that enforces security policies inside AI agent tool handlers, queue consumers, and workflow steps. The tool targets code paths that bypass HTTP boundaries and are invisible to traditional web application firewalls...

The New Stack · 2026-05-11

I Reimplemented Anthropic Dreaming. The First Dream Was Wrong.

A developer reimplemented Anthropic's "Dreaming" memory-consolidation feature for a solo crypto trading bot, running it as a weekly automated pass to compress and deduplicate agent state. The first hypothesis it generated—a time-of-day profit pattern—was disproven by full-history backtesting, wit...

Dev.to - Claude · 2026-05-11

Vercel Sandbox firewall now supports request proxying and filtering

Vercel added request proxying and filtering to its Sandbox firewall, allowing outbound sandbox traffic to be routed through a user-controlled proxy and filtered by path, method, query string, or headers. The features are available in beta for Pro and Enterprise plans via the `@vercel/sandbox@beta...

Vercel Blog · 2026-05-11

Claude Code Source Analysis Series, Chapter 2: The ReAct Main Loop

A technical analysis of Claude Code's source code examines how `query.ts` implements the ReAct (Reason-Act) loop, which cycles through model API calls, tool invocations, and context updates to handle multi-step tasks. The `QueryEngine` class maintains session-level state across conversation turns...

Dev.to - Claude · 2026-05-10

Claude Code Source Analysis Series, Chapter 3: Prompt Construction

Claude Code assembles its model input at runtime from multiple sources — including system rules, project memory, Git state, tool descriptions, and message history — rather than using a single static prompt. Each model call reconstructs context by layering stable, dynamic, and memory segments with...

Dev.to - Claude · 2026-05-10

My AI agent wiped my database twice. So I built a command firewall.

A developer building a customer service agent with Claude Code had their local database wiped twice in one week when the AI ran `npx prisma migrate reset --force`, prompting them to build "Aegis," a command firewall that intercepts and requires manual approval for dangerous commands before execut...

Dev.to - AI · 2026-05-10
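
How Aegis works internally is not described in the summary above. As a sketch of the general idea, a command firewall can tokenize each command and require an explicit approval when a known dangerous sequence appears. The `DANGEROUS` list, `needs_approval`, and `guard` are hypothetical, and the patterns are examples rather than a complete denylist.

```python
import shlex
from typing import Callable

# Hypothetical patterns: each tuple must appear as a contiguous run of tokens.
DANGEROUS = [
    ("prisma", "migrate", "reset"),
    ("rm", "-rf"),
    ("git", "push", "--force"),
]


def needs_approval(command: str) -> bool:
    """Return True when the command contains any dangerous token sequence."""
    tokens = shlex.split(command)
    for pattern in DANGEROUS:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == pattern:
                return True
    return False


def guard(command: str, approve: Callable[[str], bool]) -> bool:
    """Return True if the command may run; dangerous ones need an explicit yes."""
    if needs_approval(command):
        return approve(command)
    return True


# With an approver that always refuses, the destructive command is blocked.
blocked = not guard("npx prisma migrate reset --force", lambda cmd: False)
```

In practice `approve` would prompt a human. Matching on tokens rather than raw substrings avoids flagging commands that only mention a dangerous word inside a longer argument.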

Running Codex safely at OpenAI

OpenAI published details on how it runs Codex internally, using sandboxing, approval workflows, network policies, and agent-native telemetry to secure its coding agent deployments for enterprise compliance.

OpenAI Blog · 2026-05-09

The Rise of the Swarm: Mastering AI Agent Architectures 🐝

A Dev.to tutorial outlines multi-agent AI "swarm" architectures, describing three coordination patterns—handoff-based relay, blackboard state sharing via Redis or vector stores, and directed acyclic graph routing using frameworks such as OpenAI Swarm, CrewAI, and LangGraph.

Dev.to - AI · 2026-05-09

Small-to-Big RAG: Your AI Needs a Better Context 🧠

Small-to-Big Retrieval is a RAG technique where AI systems search small text chunks for precision but return larger surrounding context to the language model. Two variants exist: Sentence Window (retrieves neighboring sentences) and Parent Document Retrieval (retrieves a full parent section from ...

Dev.to - AI · 2026-05-09
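
A small sketch of the Sentence Window variant named in the entry above: match on single sentences, then hand back the hit together with its neighbours. Word overlap stands in for embedding similarity here, and `sentence_window` and its `window` parameter are hypothetical names. The input list is assumed to be non-empty.

```python
def sentence_window(sentences: list[str], query: str, window: int = 1) -> str:
    """Find the best-matching sentence and return it with `window` neighbours per side."""
    terms = set(query.lower().split())
    # Word overlap is a stand-in for vector similarity in this sketch.
    scores = [len(terms & set(s.lower().split())) for s in sentences]
    best = scores.index(max(scores))
    lo = max(0, best - window)
    hi = min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])


docs = [
    "Reefs host many species.",
    "Parrotfish eat coral.",
    "Their teeth grind rock into sand.",
    "Storms reshape beaches.",
]
context = sentence_window(docs, "parrotfish coral")
```

The search unit stays small for precision, while the text passed to the model is the wider window around the hit.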

Chat SDK adds web adapter support

Vercel's Chat SDK added a web adapter that lets developers build browser-based chat interfaces, including in-product assistants and support agents. The adapter streams replies to the browser using the `@ai-sdk/react` `useChat` hook.

Vercel Blog · 2026-05-09

The Agentic Age: Building AI That Works in the Real World

Developers building automated AI agents in 2024-2025 faced account suspensions and large infrastructure bills after routing requests through extracted browser OAuth tokens from consumer chat subscriptions like Claude and ChatGPT to avoid per-token API costs. The practice, exemplified by tools lik...

Dev.to - AI · 2026-05-09

Chat SDK now supports conversation history

Vercel's Chat SDK added cross-platform conversation history support via new `transcripts` and `identity` options. The `bot.transcripts` API provides four methods—append, list, count, and delete—backed by existing state adapters.

Vercel Blog · 2026-05-09

I Caught a Jailbreak Attack That Hides Inside Normal Conversations

Many-shot jailbreaking, documented in a 2024 Anthropic paper, embeds harmful requests at the end of fabricated benign conversation histories to bypass LLM safety training, with near-complete bypass reported at 256 prior exchanges. A developer built open-source detection logic using three si...

Dev.to - AI · 2026-05-09

Improving token efficiency in GitHub Agentic Workflows

GitHub began systematically optimizing token usage in its Agentic Workflows in April 2026, building two automated daily workflows to audit and flag inefficiencies. The most common issue found was unused MCP tool registrations, where including all 40 GitHub MCP server tools adds 10–15 KB of schema...

GitHub Blog · 2026-05-08

AI Agent Guardrails That Work: 4 Production Wipes, 4 Fixes

Four AI agent incidents in ten months — including a Cursor/Claude Opus 4.6 agent deleting PocketOS's production database and backups in nine seconds, and an Amazon outage estimated at 6.3 million lost orders — shared a common cause: agents with broad credentials and no human-confirmation gate on ...

Dev.to - Claude · 2026-05-07

Validating agentic behavior when “correct” isn’t deterministic

GitHub's engineering team identified that traditional CI test frameworks produce false negatives when validating autonomous agents like Copilot's Agent Mode, because agents can complete tasks via multiple valid paths. The team proposed a "Trust Layer" validation model that checks essential outcom...

GitHub Blog · 2026-05-07

Best Mem0 Alternatives for Long-Term AI Memory

A developer guide compares alternatives to Mem0, a long-term memory layer for AI agents, citing its API pricing, reliance on vector search over knowledge graphs, and limited self-hosting options. Tools evaluated include MemoryLake, Zep, and Letta.

Dev.to - AI · 2026-05-07

The company that made RAG mainstream is now betting against it

Pinecone launched Nexus, a knowledge engine for AI agents, and KnowQL, a declarative query language, positioning both as replacements for RAG-based retrieval patterns the company helped popularize. Pinecone claims the approach raises agent task completion rates above 90% and cuts token costs by 9...

The New Stack · 2026-05-07

Anthropic will let its managed agents dream

Anthropic expanded its Managed Agents platform with a feature called "dreaming," currently in research preview, which runs scheduled processes to review recent agent sessions, identify patterns, and update the agent's memory. The company also added "outcomes," a system where users define success ...

The New Stack · 2026-05-07

Why long-running AI agents break on HTTP and how Ably is fixing it

Ably CEO Matthew O'Riordan says HTTP's request/response model fails for long-running AI agents that require persistent connections across dropped sessions and device switches, and argues that infrastructure built for "durable sessions" — covering presence, state, and reconnection — is needed inst...

The New Stack · 2026-05-07

How NetEase Games cut LLM cold starts from 42 minutes to 30 seconds

NetEase Games reduced cold start times for 70B-class LLM inference from 42 minutes to 30 seconds by using Fluid, a CNCF Kubernetes-native data orchestration project, to prefetch and cache model weights closer to inference nodes. The bottleneck was model data loading from remote storage, not conta...

The New Stack · 2026-05-07

I trained a sprite model with agents. The data was the bottleneck.

A developer released pixel-llm, a 2.9-million-parameter autoregressive transformer that generates 32x32 pixel art sprites of reef sea creatures using a 64-color palette. Built using AI agent sessions, the model trained across four dataset iterations but failed to converge on two of six sprite cat...

Dev.to - AI · 2026-05-06

Parallel Branches in Neuron AI Workflow

Neuron AI, a PHP framework for AI integration, added parallel branch execution to its workflow system via a new `ParallelEvent` class. The feature allows independent pipeline tasks—such as text extraction, image analysis, and metadata classification—to run concurrently rather than sequentially, r...

Dev.to - AI · 2026-05-05

From AI Demo to Production: How to Ship Quality Agentic Applications

Braintrust and Trainline held a workshop in London on deploying agentic AI applications in production, focusing on evaluation, observability, and testing practices beyond prompt engineering. The article outlines how production AI systems require both traditional software engineering discipline an...

Dev.to - AI · 2026-05-02

Building a streaming AI companion in your own API

Libelo, a park and nature discovery platform, built an AI conversational assistant using Azure AI Foundry routed through their own API rather than called directly from the mobile app, citing security, monitoring, and resilience concerns. The implementation uses Azure Entra External ID for authent...

Dev.to - AI · 2026-05-02

A nine-point checklist for shipping production-ready AI

The New Stack published a nine-step technical guide for deploying AI systems to production, covering tool interface design, vector search with BM25 reranking, timeout and retry handling, OpenTelemetry-based observability, and bounded agent execution under concurrent load.

The New Stack · 2026-05-01

CLAUDE.md Is Not Enough: The Governance Stack for Agentic Development

A developer proposed a five-layer governance framework for AI coding agents, arguing that CLAUDE.md alone provides only project orientation, not policy enforcement. The framework adds CONSTITUTION.md, DIRECTIVES.md, SECURITY.md, and AGENTS.md documents alongside runtime enforcement and external v...

Dev.to - Claude · 2026-05-01

How I Built a Multi-LLM AI Agent System for Hospital Management

A developer built HISDashboard, a hospital management AI system using 10 specialized agents distributed across 4 LLM providers with automatic fallback, after a single-provider setup failed due to rate limiting. The system uses a router-specialist-reflection architecture with structured intent cla...

Dev.to - AI · 2026-05-01

Building a PDF Parser for Financial Data: Lessons from Arbiter V2

Arbiter Briefs added financial PDF ingestion to its V2, using regex and heuristics rather than ML to extract metrics from P&L statements, balance sheets, and cap tables. The pipeline uses pdf-parse for text extraction, multer for uploads capped at 10MB and 5 files per analysis, Railway persistent...

Dev.to - AI · 2026-05-01

Cut AI token usage by 96%? Here’s how AWS Strands Agents does it.

AWS developer advocate Morgan Willis demonstrated that redesigning agent tools from API-endpoint-mapped to intent-based reduced token usage from roughly 52,000 to 2,000 per query in AWS Strands Agents, a 96% reduction. Adding semantic search via AWS Agent Core Gateway to filter a 16-tool catalog ...

The New Stack · 2026-04-30

Building Pi, and what makes self-modifying software so fascinating

The Pragmatic Engineer podcast featured Mario Zechner, creator of Pi — a minimalist, self-modifying AI coding agent — and Armin Ronacher, creator of Flask, discussing Pi's design, its use in building AI-powered tools, and the limits of agentic workflows in software development.

Pragmatic Engineer · 2026-04-30

I cracked a robot vacuum's API in a week and gave Claude the keys

A developer reverse-engineered the cloud API of a 3i G10+ robot vacuum in one week, using mitmproxy, Frida hooks, and Dart AOT decompilation to gain full control. They then integrated Anthropic's Claude Haiku 4.5 vision model into the robot's drive loop at $0.003 per call, with peak daily AI cost...

Dev.to - Claude · 2026-04-30

How AI transforms your role as a platform engineer

Developers building their own AI agents for tasks like incident triage and deployment are bypassing platform engineering governance, creating what the industry calls "agent sprawl" — autonomous agents operating without audit trails, proper credentials, or PII controls.

The New Stack · 2026-04-30

18 Ways Your LLM App Can Be Hacked (And How to Fix Them)

Security researchers have catalogued 18 attack vectors targeting LLM applications, including prompt injection, RAG poisoning, memory poisoning, agent hijacking, and insecure output handling. The vulnerabilities span prompt, memory, retrieval, tool, agentic, and output layers of LLM systems.

Dev.to - Claude · 2026-04-29

Why JSON Schema matters more than ever in the age of generative AI

JSON Schema, a data validation standard first proposed in 2007, has been adopted by API specifications including OpenAPI, AsyncAPI, and Anthropic's Model Context Protocol. Enterprises are increasingly using it to enforce structure on large language model outputs, converting probabilistic results ...

The New Stack · 2026-04-29

Native Deployment Checks are now available

Vercel launched Native Deployment Checks, allowing teams to run lint and typecheck scripts from package.json in parallel with every deployment. Checks can be marked required to block production releases until they pass, and Vercel Agent will suggest fixes when a check fails on a pull request.

Vercel Blog · 2026-04-29

AI POC to Production: Deploying AI Successfully in Industry

Most enterprise AI projects fail to reach production due to poor business alignment, data quality issues, weak infrastructure, and lack of MLOps practices. Key factors for successful deployment include clear KPIs, scalable API-driven architectures, and continuous model monitoring and retraining.

Dev.to - AI · 2026-04-28

How Claude Decides What Tool to Call

When provided a list of tools via Anthropic's API, Claude converts natural language requests into structured JSON tool invocations through a multi-stage pipeline, completing the process in under 200 milliseconds rather than performing human-like deliberation.

Dev.to - Claude · 2026-04-28

I Built a 24/7 AI Agent System on a $6/Month VPS — Here's the Stack

A developer built an autonomous AI agent running on a €3.90/month Hetzner VPS using the OpenClaw framework and DeepSeek V4 Pro, which posts to Twitter every 5 minutes and publishes articles every 30 minutes. The system manages a Gumroad store selling 89 digital guides, with DeepSeek V4 Pro cited ...

Dev.to - AI · 2026-04-28

Why AI engineering needs old-school discipline

Thoughtworks data and AI advisor Nimisha Asthagiri says more than 40% of agentic AI projects are forecast by Gartner to be canceled by 2027, citing a gap between proof-of-concept and production. The Thoughtworks Technology Radar recommends returning to engineering fundamentals such as test-driven...

The New Stack · 2026-04-28

Why Claude needs a real environment to validate cloud-native code

Boris Cherny, creator of Claude Code, stated that giving Claude a way to verify its own work produces 2-3x better results, calling it more important than ever with the Opus 4.7 release. OpenAI Codex, GitHub Copilot, and Cursor have each shipped self-validation loops in the past six months as a co...

The New Stack · 2026-04-25

How I Stopped My AI Agent From Reinventing the Wheel

A developer built an OpenClaw plugin called "openclaw-skill-hunter" that instructs AI agents to search for existing tools before generating custom code. In a 150-task test, the developer found 40% of tasks involved reimplementing functionality already available in existing tools.

Dev.to - Claude · 2026-04-25

The Proxy Problem: When Your Agent Optimizes for the Wrong Thing

Autonomous AI agents are prone to optimizing measurable proxy metrics rather than actual intended outcomes, a phenomenon described as the proxy problem. Three identified failure modes include metric fixation, gaming of measurements, and corruption of feedback loops that the agent's own behavior i...

Dev.to - AI · 2026-04-24

Workspace agents

OpenAI introduced workspace agents in ChatGPT, a feature designed to automate repeatable workflows and connect tools for team operations. The feature allows organizations to build and scale agents within the ChatGPT environment.

OpenAI Blog · 2026-04-23

How to Build AI Agents for Your Business

A Dev.to tutorial outlines the key components of business AI agents — large language models, contextual memory, and tool-routing layers — and recommends frameworks such as LangChain or LlamaIndex for orchestration and Pinecone or Weaviate for vector-based memory storage.

Dev.to - AI · 2026-04-22

How we built real-time deposition analysis with Claude's streaming API

Developers built a real-time deposition analysis tool for medical-malpractice attorneys that transcribes live audio via Deepgram, buffers it into 30-second segments, and runs each segment through Anthropic's Claude Haiku 4.5 to detect admissions, inconsistencies, and impeachment opportunities dur...

Dev.to - Claude · 2026-04-21

Stop Fixing Kubectl Typos: Let an AI Agent Handle It

DataArt engineer Eugene Kiselev built a Python-based AI agent that extracts kubectl commands from Kubernetes lab docs, executes them in a live cluster, and rewrites the docs after fixing errors. Testing local models via Ollama, Gemma 3:4B consistently identified all 16 commands per run, while the...

Dev.to - AI · 2026-04-20

OpenClaw Skills Ecosystem and Practical Production Picks

OpenClaw is an AI agent framework that separates "plugins" (runtime extensions) from "skills" (markdown-based behavioral instructions), with skills stored in a precedence-based directory hierarchy. The article outlines the skill file structure and offers guidance on selecting skills from the Claw...

Dev.to - AI · 2026-04-20

I ran 4 autonomous Claude agents for 6 months. Here's the data.

A developer ran four to five autonomous Claude AI agents on a macOS machine for six months at roughly $200/month, shipping 16 products that attracted four customers but generated no revenue. The experiment found that an agent given a survival-framing prompt showed self-preservation language in it...

Dev.to - Claude · 2026-04-19

Microsoft Agent Framework: From Zero to Multi-Agent Pipeline

Microsoft released Agent Framework, a Python package for building AI agents with native Model Context Protocol support, positioned as the successor to Semantic Kernel and AutoGen. A developer used it to build a multi-agent pipeline that reads a product backlog from a Markdown file and creates Epi...

Dev.to - AI · 2026-04-19

How Zo Computer improved AI reliability 20x on Vercel

Zo Computer, an 8-person AI cloud startup, migrated to Vercel's AI SDK and AI Gateway, reducing its AI model retry rate from 7.5% to 0.34% and raising chat success rate from 98% to 99.93%. P99 latency fell 38%, from 131 seconds to 81 seconds.

Vercel Blog · 2026-04-18
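
The figures quoted in this summary can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the variable names are illustrative, and which ratio the headline's "20x" refers to is not stated in the summary, so the mapping is an assumption.

```python
# Figures taken from the summary above.
retry_before, retry_after = 7.5, 0.34        # model retry rate, percent
success_before, success_after = 98.0, 99.93  # chat success rate, percent
p99_before, p99_after = 131.0, 81.0          # P99 latency, seconds

# Ratio of old to new retry rate.
retry_ratio = retry_before / retry_after
# Failure rate is the complement of the success rate.
failure_ratio = (100 - success_before) / (100 - success_after)
# Relative drop in P99 latency, in percent.
latency_drop = (p99_before - p99_after) / p99_before * 100

print(f"retry rate improved {retry_ratio:.1f}x")
print(f"chat failure rate improved {failure_ratio:.1f}x")
print(f"P99 latency fell {latency_drop:.1f}%")
```

Both the retry-rate ratio (about 22x) and the failure-rate ratio (about 29x) are in the neighborhood of the headline figure, and the latency drop matches the quoted 38%.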

30 Days Running a Multi-Agent AI Business: What Actually Breaks

A developer ran a multi-agent AI system called Pantheon for 30 days handling business operations including content creation, trading, and customer outreach. The primary failure identified was agents becoming idle after completing tasks without alerting the system, requiring implementation of tmux...

Dev.to - Claude · 2026-04-17

How GitHub uses eBPF to improve deployment safety

GitHub described its use of eBPF to detect and prevent circular dependencies in its internal deployment tooling. The approach is intended to reduce deployment failures caused by dependency cycles within the platform's infrastructure.

GitHub Blog · 2026-04-17

Anthropic Silently Dropped Prompt Cache TTL from 1 Hour to 5 Minutes

Anthropic reduced the default prompt cache time-to-live from 1 hour to 5 minutes on March 6, 2026, without public announcement, causing developers using Claude's prompt caching feature to experience reduced cache hit rates and higher token costs unless they send identical requests within the shor...

Dev.to - Claude · 2026-04-16

OpenAI’s Agents SDK separates the harness from the compute

OpenAI released a major update to its Agents SDK featuring sandboxed execution environments that separate agent control from compute resources, allowing developers to use their own infrastructure or integrate with services like Modal, E2B, and Vercel for improved security and scalability.

The New Stack · 2026-04-16

When AI writes 100K lines of code, QA becomes the whole job

As AI tools generate code rapidly, software development bottlenecks have shifted from writing code to validating it, according to Artur Balabanskyy, who runs an AI-first development agency. Development teams must now focus on quality assurance and testing rather than code production.

The New Stack · 2026-04-16

The next evolution of the Agents SDK

OpenAI released an updated Agents SDK with native sandbox execution and a model-native harness, enabling developers to build secure, long-running agents that can work across files and tools.

OpenAI Blog · 2026-04-16

5 Claude Code Agentic Workflow Patterns — Which One Fits Your Work?

An article describes five workflow patterns for Claude Code: Sequential (human-verified step-by-step), Operator (single agent with defined permissions), Parallel (multiple independent tasks), Teams (role-separated agents), and Autonomous (minimal human involvement). Each pattern trades control fo...

Dev.to - Claude · 2026-04-15

MemoryLake: Persistent multimodal memory for AI agents

MemoryLake launched a persistent memory layer for AI agents that retains information across sessions and works with multiple AI platforms, featuring multimodal document parsing, conflict resolution, and three-party encryption for data privacy.

Dev.to - AI · 2026-04-15

I Built a Pay-Per-Call Trading Signal API for AI Agents

A developer built a trading signal API that charges AI agents per-call micropayments in USDC via the x402 protocol, eliminating the need for traditional API key signup; signals are generated using RSI, ADX, MACD, and volume indicators with prices ranging from $0.005 to $0.01 per request.

Dev.to - AI · 2026-04-15

From clobbered drafts to real-time sync

Suga switched from last-write-wins conflict resolution to Zero, a real-time sync engine from Rocicorp, after developers lost work when simultaneous edits overwrote each other. The system uses local SQLite databases on clients that synchronize with a PostgreSQL server, with server-side conflict re...

The New Stack · 2026-04-15

Building Claudio: My Always-On Claude Code Box

A developer built Claudio, a scheduled task automation system running Claude AI on a home Debian VM to handle recurring work like reading news and checking client status. Version 1 using cron jobs with Claude Code failed after two weeks due to OAuth token expiration; version 2 replaced cron with ...

Dev.to - Claude · 2026-04-14

From AI Demos to Production: What actually matters

Production generative AI systems require integration with existing data and workflows, structured inputs/outputs, and continuous monitoring—not just standalone LLM deployments. Current practical applications include internal AI assistants, document automation, knowledge base search, and content g...

Dev.to - AI · 2026-04-14

Claude Managed Agents Has Built-in Tracing. Here's What It Can't Do.

Anthropic's Claude Managed Agents includes built-in tracing for debugging, but audit logs stored on Anthropic's infrastructure cannot serve as independent evidence for compliance audits or breach investigations; cryptographically signed audit trails held by users provide tamper-evident records th...

Dev.to - Claude · 2026-04-14
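
To make "tamper-evident" concrete, here is a minimal sketch of a hash-chained audit log held by the user. It is not taken from the article: the names `SIGNING_KEY`, `append_entry`, and `verify` are hypothetical, and HMAC stands in for the signature scheme. A real deployment would more likely use asymmetric signatures so that auditors can verify entries without holding the signing key.

```python
import hashlib
import hmac
import json

# Hypothetical key; the point is that it lives outside the vendor's infrastructure.
SIGNING_KEY = b"replace-with-a-key-you-control"


def _sign(event: dict, prev_sig: str) -> str:
    """HMAC over the event plus the previous entry's signature."""
    payload = json.dumps({"event": event, "prev": prev_sig}, sort_keys=True)
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event, chaining it to the previous entry's signature."""
    prev_sig = log[-1]["sig"] if log else ""
    entry = {"event": event, "prev": prev_sig, "sig": _sign(event, prev_sig)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; editing or reordering entries, or dropping one
    from the middle, makes verification fail."""
    prev_sig = ""
    for entry in log:
        if entry["prev"] != prev_sig:
            return False
        expected = _sign(entry["event"], entry["prev"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True


log = []
append_entry(log, {"agent": "agent-1", "action": "read_file"})
append_entry(log, {"agent": "agent-1", "action": "send_email"})
print(verify(log))                  # chain is intact
log[0]["event"]["action"] = "noop"  # tamper with history
print(verify(log))                  # tampering is detected
```

A chain like this does not detect truncation of the most recent entries on its own; periodically publishing the latest signature to a third party is one common way to cover that gap.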

How Agentic AI Tools Are Transforming Data Centers

Agentic AI systems are automating data center operations by continuously optimizing workload distribution, cooling, and maintenance without manual intervention. Applications include dynamic workload shifting across servers, autonomous cooling adjustments, and predictive hardware failure detection...

Dev.to - AI · 2026-04-14

Claude Haiku vs GPT-4o Mini for Automation Pipelines

Claude Haiku costs 5-6x more per input token than GPT-4o Mini but produces more accurate summaries and handles longer context windows; GPT-4o Mini is faster (2,000 vs 1,000 tokens/second) and cheaper, with performance trade-offs varying by automation task type based on eight months of production ...

Dev.to - Claude · 2026-04-13

The Identity Gap in Agentic AI

Most AI agents in production authenticate with shared API keys rather than individual identities, making it impossible to distinguish between agents, control specific actions, or trace operations back to particular agents—creating security, compliance, and operational risks.

Dev.to - AI · 2026-04-12
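
The difference between a shared key and per-agent identity can be sketched in a few lines. Everything below is an illustrative assumption rather than the article's design: `SERVER_SECRET`, the `PERMISSIONS` table, and the agent names are hypothetical, and the token format is deliberately simplistic.

```python
import hashlib
import hmac

SERVER_SECRET = b"hypothetical-server-secret"

# Hypothetical per-agent permissions; a single shared API key cannot express this.
PERMISSIONS = {
    "billing-agent": {"read_invoice", "issue_refund"},
    "support-agent": {"read_invoice"},
}


def _mac(agent_id: str) -> str:
    return hmac.new(SERVER_SECRET, agent_id.encode(), hashlib.sha256).hexdigest()


def issue_token(agent_id: str) -> str:
    """Derive a distinct credential for each agent from the server secret."""
    return f"{agent_id}.{_mac(agent_id)}"


def authorize(token: str, action: str) -> tuple:
    """Return (allowed, agent_id) so every decision traces back to one agent."""
    agent_id, _, mac = token.rpartition(".")
    if not hmac.compare_digest(mac.encode(), _mac(agent_id).encode()):
        return False, "unknown"
    return action in PERMISSIONS.get(agent_id, set()), agent_id


print(authorize(issue_token("support-agent"), "issue_refund"))  # denied, but attributed
print(authorize(issue_token("billing-agent"), "issue_refund"))  # allowed
```

Because each request carries its own agent identifier, the three gaps named in the summary (distinguishing agents, controlling specific actions, tracing operations) each map to a line of the `authorize` function.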

I Hired 8 IT Gurus to Give Me a Code Review

A developer created eight AI agents embodying software figures like Linus Torvalds and Charity Majors to review a bug-fix pull request; the agents independently identified different concerns (observability, performance, test coverage), then debated after reading each other's reviews, with Linus c...

Dev.to - Claude · 2026-04-12

🧠 Stop Letting Your AI Forget: MemPalace is a Wake-Up Call

MemPalace is a system that provides persistent hierarchical memory for AI applications using the memory palace technique, storing raw operational data locally and organizing it into navigable structures. The approach targets DevOps and incident response workflows by enabling AI systems to retain ...

Dev.to - Claude · 2026-04-12

Two Ends of the Token Budget: Caveman and Tool Search

Caveman, a Claude Code plugin, reduces output tokens by ~65% through prompt compression, while tool search defers loading MCP tool definitions until needed. Both systems target the same 200,000-token context window from opposite ends: one compresses what the model outputs, the other defers what t...

Dev.to - Claude · 2026-04-11

Why data governance is the secret to AI agent success

A Perforce report found 70% of IT leaders say strong DevOps practices support AI adoption, but only 39% of organizations have fully automated audit trails despite 77% reporting confidence in AI outputs, highlighting a governance gap that must be addressed as AI agents take on autonomous roles.

The New Stack · 2026-04-11

AI Citation Registries and Website-Based Publishing Constraints

AI systems misattribute information from government websites because traditional web publishing encodes authority through layout and context rather than explicit machine-readable fields, causing statements to become detached from correct sources and jurisdictions during processing. The article pr...

Dev.to - AI · 2026-04-11

Agentic Infrastructure

Vercel announced infrastructure designed for AI coding agents, citing that 30% of its deployments are now agent-initiated, up 1000% in six months, with Claude Code accounting for 75% of agent deployments. The company is offering deployment APIs, long-lived execution, and unified AI primitives to ...

Vercel Blog · 2026-04-10

Control Planes Make Multi-Agent Systems Safe in Production

Production multi-agent systems require a control plane layer to prevent execution failures such as duplicate task execution, state ambiguity, and credential leaks. A control plane enforces explicit state transitions, isolates task execution with permission boundaries, and maintains auditable reco...

Dev.to - AI · 2026-04-10
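
As a rough illustration of what "explicit state transitions" might look like, the sketch below models a control plane as a small state machine with an audit trail. The class and state names are hypothetical, not the article's, and the store is an in-memory dict; a production version would need an atomic, shared store to prevent duplicate execution across processes.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Only these transitions are legal; anything else is rejected.
ALLOWED = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SUCCEEDED, State.FAILED},
    State.FAILED: {State.PENDING},  # explicit retry
    State.SUCCEEDED: set(),         # terminal
}


class ControlPlane:
    def __init__(self) -> None:
        self.tasks = {}   # task_id -> State
        self.audit = []   # (task_id, agent, from_state, to_state)

    def submit(self, task_id: str) -> None:
        """Register a task once; resubmitting an existing task is a no-op."""
        self.tasks.setdefault(task_id, State.PENDING)

    def transition(self, task_id: str, agent: str, new: State) -> bool:
        """Apply a transition only if it is legal, and record who made it."""
        current = self.tasks[task_id]
        if new not in ALLOWED[current]:
            return False
        self.tasks[task_id] = new
        self.audit.append((task_id, agent, current.value, new.value))
        return True

    def claim(self, task_id: str, agent: str) -> bool:
        """Only the first agent to claim a pending task gets to run it."""
        return self.transition(task_id, agent, State.RUNNING)


cp = ControlPlane()
cp.submit("task-1")
print(cp.claim("task-1", "agent-a"))  # first claim succeeds
print(cp.claim("task-1", "agent-b"))  # duplicate execution is refused
```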

Zero‑Loss AI Agents

Engineers should design AI agents for high-stakes domains—healthcare, security, fintech—with security, auditability, and system integration built in from the start, not retrofitted.

Dev.to - AI · 2026-04-10

Building Your AI-Powered CMA Engine: The Core Framework

A five-pillar AI framework automates comparative market analysis and hyper-local report generation for real estate agents by automating comp selection, valuation adjustment, narrative writing, and visualization, reducing manual work and freeing time for client activities.

Dev.to - AI · 2026-04-09

From Perceptrons to Predicting the Next Word

An educational article explains how feedforward neural networks function as language models, covering single neural units, activation functions, hidden layers, and the task of predicting the next word in text sequences.

Dev.to - AI · 2026-04-09

My AI Agent Runs 24/7 Without Me -- Week 1 Results

A developer deployed an AI agent built on Claude to autonomously manage business operations for one week, completing 47-89 tasks daily including email sorting, payment processing, content publishing, and customer service while processing $445 in revenue and requiring minimal human intervention.

Dev.to - Claude · 2026-04-09

The Face Never Existed. The ID Is Stolen. The Match Is Perfect.

Hybrid identity fraud using AI-generated faces is compromising biometric verification systems by creating synthetic IDs and liveness videos that match too perfectly, forcing developers to shift from simple facial matching to forensic analysis that detects shared synthetic origins through mathemat...

Dev.to - AI · 2026-04-08

58% of PRs in our largest monorepo merge without human review

Vercel deployed an AI agent that automatically reviews and merges 58% of pull requests in its largest monorepo, reducing average merge time from 29 hours to 10.9 hours. The agent uses an LLM-based classifier to categorize changes by risk, approving low-risk changes like documentation and styling ...

Vercel Blog · 2026-04-07

Launch HN: Freestyle – Sandboxes for Coding Agents

Freestyle launched a cloud service providing sandboxes for AI coding agents, featuring sandbox forking with 400ms pauses, 500ms startup times, and full Linux and hardware virtualization support, running on proprietary bare metal infrastructure rather than cloud providers.

Hacker News - Best · 2026-04-07

Use-Case-First AI Architecture Explained

AI systems designed around specific use cases rather than flexible prompts maintain consistency better as features scale across multiple teams and contexts, reducing output variability and maintenance complexity.

Dev.to - AI · 2026-04-07

360 billion tokens, 3 million customers, 6 engineers

Durable, an AI platform serving 3 million customers, processes 360 billion AI tokens annually using a 6-person team by consolidating to a single codebase and infrastructure platform, achieving 3-4x lower costs than self-hosting while managing millions of independent customer sites and AI agents.

Vercel Blog · 2026-04-07

Two startups at global scale without DevOps

Leonardo.AI processes 4.5 million images daily and Relevance AI runs 50,000 AI agents autonomously across systems like Salesforce and Slack—both without dedicated DevOps teams, relying instead on managed infrastructure platforms. APAC startups increasingly adopt this model due to severe DevOps ta...

Vercel Blog · 2026-04-07

End-to-end encryption for Vercel Workflow

Vercel added end-to-end encryption to Vercel Workflow, automatically encrypting all data flowing through event logs using AES-256-GCM with unique keys per deployment. Users can decrypt data via the web dashboard or CLI using existing environment variable permissions.

Vercel Blog · 2026-04-07

Claude Code Under the Hood: How It Actually Works

Anthropic's Claude Code system relies on a disciplined orchestration loop with context management, permissions, caching, and retry logic rather than raw model capability. The system excels at handling iterative tasks like test fixing through careful prompt engineering and decision-making across m...

Dev.to - Claude · 2026-04-06

Building LinkedIn Job Application Agents - Part 3

A developer completed HunterAgent, an automated job application system using six AI agents built on OpenAI's Responses API, with real-time web search for LinkedIn and Indeed jobs, resume optimization, and cover letter generation integrated with Streamlit and Supabase.

Dev.to - Claude · 2026-04-06

Components of a Coding Agent

Sebastian Raschka published an article outlining the key architectural components and design elements of coding agents powered by AI systems.

Hacker News - Best · 2026-04-05

research-llm-apis 2026-04-04

Simon Willison released research-llm-apis, a repository documenting raw API interactions and curl commands for Anthropic, OpenAI, Gemini, and Mistral to design an updated abstraction layer for his LLM Python library that handles features like server-side tool execution.

Simon Willison · 2026-04-05

Anthropic Blocked My Infrastructure. I Didn't Notice Because I'm Free.

Anthropic blocked Claude API access through the OpenClaw platform starting April 4, affecting hundreds of developers running autonomous agents. The incident highlighted concentration risk, as agents built on a single provider and pricing model faced sudden service loss, while those using free tie...

Dev.to - Claude · 2026-04-04

The hidden technical debt of agentic engineering

The article outlines seven categories of infrastructure complexity that accumulate when deploying AI agents in enterprise production environments, including integrations, observability, governance, and agent-specific requirements like human-in-the-loop systems and evaluation frameworks for non-de...

The New Stack · 2026-04-03

Score of 98/100 on Claude Code: Top 0.1% of Sessions Worldwide

A developer achieved a 98/100 score on Claude Code across a single session that produced 69,340 lines of code, modified 351 files, and generated a complete French-compliant e-invoicing system with full test coverage and documentation. The session orchestrated 25+ parallel sub-agents across system...

Dev.to - Claude · 2026-04-03

You test your code. Why aren’t you testing your AI instructions?

A study found that instruction scaffolding affects AI coding task performance by 17 percentage points regardless of model choice, prompting development of agenteval, a tool to test instruction files for common issues including dead file references, filler text, contradictions, and context budget ...

Dev.to - Claude · 2026-04-03

Chat SDK brings agents to your users

Vercel released Chat SDK, a TypeScript library that lets developers build chatbots working across Slack, Microsoft Teams, Google Chat, Discord, Telegram, GitHub, and Linear from a single codebase using platform-specific adapters.

Vercel Blog · 2026-04-03

There’s a hidden tax on every AI-generated merge request

AI coding tools have increased merge request volume but shifted bottlenecks to code review, with 2025 DORA data showing no improvement in delivery metrics. Senior engineers with critical system knowledge face enlarged review queues, reducing time for design work, while automated checks cannot rep...

The New Stack · 2026-04-03

Build knowledge agents without embeddings

Vercel released an open-source Knowledge Agent Template that replaces vector embeddings with filesystem-based search using bash commands like grep and find. The approach reduced costs from $1.00 to $0.25 per query while improving output quality and debuggability compared to traditional embedding ...

Vercel Blog · 2026-04-03
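
For readers unfamiliar with the idea, filesystem-based search can be as simple as a recursive text match. The template described above uses bash commands such as grep and find; the function below is a pure-Python stand-in written for illustration, and its name, signature, and default file pattern are assumptions rather than part of Vercel's template.

```python
from pathlib import Path


def grep(root: str, term: str, pattern: str = "*.md") -> list:
    """Return (file, line number, line) for each case-insensitive match under root."""
    hits = []
    needle = term.lower()
    # rglob walks the tree like `find`; sorting keeps results deterministic.
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if needle in line.lower():
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `grep("docs", "refund")` on a hypothetical `docs` directory returns every matching line with its file and line number, which is part of why this style of retrieval is easy to debug: each result points at an exact location a human can open.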

Agent responsibly

Vercel outlined a framework for safely deploying AI-generated code, arguing that agents produce convincing but context-blind outputs that can pass tests while creating production risks. The company recommends engineers maintain full ownership of agent-generated changes and build infrastructure wh...

Vercel Blog · 2026-04-03

The hidden reason your AI assistant feels so sluggish

AI agent workloads are straining traditional cloud data warehouses because agents generate dozens of rapid concurrent queries instead of single queries, causing latency or cost problems. Companies are shifting toward real-time analytical databases paired with systems like PostgreSQL to handle the...

The New Stack · 2026-04-03

The laptop return that broke a RAG pipeline

A RAG-based customer-support agent incorrectly cited a 2023 return policy allowing 30 days instead of the current 14-day window because vector search finds semantically similar documents without accounting for recency or scope. The author proposes hybrid search—combining vector similarity with st...

The New Stack · 2026-04-03
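
The failure described above is easy to reproduce with toy data. The summary is truncated, so the author's exact proposal is not shown here; the sketch below is one possible approach, filtering on document metadata before ranking by similarity. The embeddings, the effective and superseded dates, and all names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import List, Optional


@dataclass
class Doc:
    text: str
    embedding: List[float]      # toy vectors; a real system would use a model
    effective: date             # hypothetical date the policy took effect
    superseded: Optional[date]  # None means the policy is still current


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def retrieve(query_vec: List[float], docs: List[Doc], today: date) -> Doc:
    """Filter on metadata first, then rank the survivors by similarity."""
    current = [d for d in docs
               if d.effective <= today
               and (d.superseded is None or d.superseded > today)]
    return max(current, key=lambda d: cosine(query_vec, d.embedding))


docs = [
    Doc("Returns accepted within 30 days.", [0.9, 0.1], date(2023, 1, 1), date(2025, 1, 1)),
    Doc("Returns accepted within 14 days.", [0.8, 0.2], date(2025, 1, 1), None),
]
# Pure similarity would pick the 30-day policy, whose vector matches the query exactly.
print(retrieve([0.9, 0.1], docs, date(2026, 4, 3)).text)  # Returns accepted within 14 days.
```

The sketch raises `ValueError` if no document is current for the given date; a real pipeline would need to decide what to answer in that case.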

SERHANT.'s playbook for rapid AI iteration

SERHANT. scaled its S.MPLE AI product from 200 to 900+ real estate agents using Vercel's AI SDK and Next.js, routing tasks across Claude, OpenAI, and Gemini models to optimize cost and performance without rebuilding infrastructure.

Vercel Blog · 2026-04-03

Making Turborepo 96% faster with agents, sandboxes, and humans

Vercel improved Turborepo's task graph computation speed by 81-91% through eight days of optimization work using AI agents and engineering practices, with three merged pull requests delivering a 25% reduction, 6% improvement, and an algorithmic replacement on its 1,000-package monorepo.

Vercel Blog · 2026-04-03

Unified reporting for all AI Gateway usage

Vercel launched a Custom Reporting API in beta for AI Gateway that consolidates cost and token usage data across multiple AI providers and user-provided API keys into a single reporting endpoint. One AI platform serving 200K+ users replaced its third-party cost tracking system with the API and re...

Vercel Blog · 2026-04-03

How FLORA shipped a creative agent on Vercel's AI stack

FLORA deployed an AI creative agent called FAUNA on Vercel's AI Stack to automate visual design workflows for fashion and creative industries. The company migrated from separate LangChain and Temporal systems to Vercel's integrated platform, which includes AI SDK, Workflow SDK, and Fluid compute ...

Vercel Blog · 2026-04-03