// edition · 2026-05-16

May 16, 2026

28 stories on AI dev tools, agents, and the coding stack — curated from the day's RSS haul by Agentic Dev's pipeline.

Top Signal · MCP & Integrations

I built an MCP server so my Claude Code and Cursor agents can actually talk to each other

A developer open-sourced Agent Room, an MCP server that gives multiple AI coding agents (Claude Code, Cursor, Codex, Gemini) a shared message channel using room codes. The project is MIT-licensed, available on npm as `agent-room-mcp`, and self-hostable, with a browser UI at agent-room.com.

Dev.to - AI
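The story doesn't describe Agent Room's actual API. To illustrate the room-code idea in the abstract, the sketch below keeps one shared message log per code, and each agent polls it with a cursor. `RoomBroker`, `post`, and `read_since` are hypothetical names. A real MCP server would expose operations like these as tools over the protocol, not as in-process Python methods.

```python
import secrets
import string
from dataclasses import dataclass, field
from typing import Dict, List

ROOM_CODE_ALPHABET = string.ascii_uppercase + string.digits


@dataclass
class Message:
    sender: str  # e.g. "claude-code" or "cursor"
    text: str


@dataclass
class RoomBroker:
    """Hypothetical in-memory broker: agents that know a room code share one message log."""

    rooms: Dict[str, List[Message]] = field(default_factory=dict)

    def create_room(self, length: int = 6) -> str:
        # Generate a short random code, retrying on the unlikely chance of a collision.
        while True:
            code = "".join(secrets.choice(ROOM_CODE_ALPHABET) for _ in range(length))
            if code not in self.rooms:
                self.rooms[code] = []
                return code

    def post(self, code: str, sender: str, text: str) -> int:
        """Append a message and return its index, which readers can use as a cursor."""
        log = self.rooms[code]  # raises KeyError for an unknown room code
        log.append(Message(sender, text))
        return len(log) - 1

    def read_since(self, code: str, cursor: int) -> List[Message]:
        """Return every message at or after `cursor`, so each agent can poll for new ones."""
        return self.rooms[code][cursor:]


broker = RoomBroker()
room = broker.create_room()
broker.post(room, "claude-code", "API schema is in docs/api.md")
broker.post(room, "cursor", "Got it, wiring up the client")
print([m.sender for m in broker.read_since(room, 1)])  # ['cursor']
```

A cursor-based read keeps the broker stateless about who has seen what. Each agent remembers the last index it processed.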

Tool Updates

Claude 3.5 Sonnet vs Haiku: Why Your Agent Budget Disappeared in 3 Hours

A developer reported spending $340 in three hours after configuring a customer support agent to use Claude 3.5 Sonnet for all 847 ticket operations, compared to an estimated $5/day cost using Claude 3.5 Haiku. The two models carry a 15x price differential, with Sonnet at $3/$15 per million tokens...

Pricing & Plans · Dev.to - Claude
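API pricing is linear in tokens, so the budget math in this story reduces to a short formula. The sketch below plugs in the $3/$15 per-million-token Sonnet rates quoted above. The per-operation token counts in the usage line are hypothetical placeholders, not figures from the article, and the sketch assumes one model call per operation.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_usd_per_m: float, output_usd_per_m: float) -> float:
    """USD cost of one model call, given per-million-token prices."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def workload_cost(operations: int, input_tokens_per_op: int, output_tokens_per_op: int,
                  input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Total USD cost when every operation makes exactly one call of the same size."""
    return operations * call_cost(input_tokens_per_op, output_tokens_per_op,
                                  input_usd_per_m, output_usd_per_m)


# Sonnet rates from the article; 20k input / 1.5k output tokens per operation are hypothetical.
print(f"${workload_cost(847, 20_000, 1_500, 3.00, 15.00):.2f}")
```

Because the formula is linear, a model with lower per-token prices cuts the total by the same ratio. An agent that makes several calls per ticket multiplies the total by the number of calls.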

Claude Managed Agents' Dreaming, Outcomes, and Orchestration — How Agents Self-Improve While You Sleep

Anthropic announced three agent features at its Code with Claude conference in San Francisco on May 6: Dreaming (automated memory consolidation across sessions), Outcomes (success-criteria-based self-evaluation), and Multiagent Orchestration (parallel lead-subagent execution). The company also do...

Agent Engineering · Dev.to - Claude

Optimizing your Claude Code usage (and spending less $$)

TokenJam released a feature called "tj optimize" that reads Claude Code's local JSONL session logs into a DuckDB database, identifies sessions that could use smaller models, and projects monthly API spending against a user-defined budget.

CLI Agents · Dev.to - Claude
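The story doesn't show TokenJam's internals. The sketch below uses only the standard library to illustrate the first step, turning JSONL session logs into per-session token totals. It assumes each line is a JSON object with a `sessionId` and a `message.usage` object holding `input_tokens` and `output_tokens`. Treat that schema as an assumption and check it against your own log files. The real tool loads the data into DuckDB, not into a Python dict.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def tokens_per_session(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sum input/output tokens per session from JSONL log lines.

    ASSUMED record shape (verify against your own logs):
      {"sessionId": "...", "message": {"usage": {"input_tokens": N, "output_tokens": M}}}
    Records without a usage object (e.g. user turns) are skipped.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines
        record = json.loads(raw)
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        session = totals[record.get("sessionId", "unknown")]
        session["input"] += usage.get("input_tokens", 0)
        session["output"] += usage.get("output_tokens", 0)
    return dict(totals)
```

The function accepts any iterable of lines, so an open file handle works directly: `with open(log_path) as f: tokens_per_session(f)`, where `log_path` is whatever session file you want to inspect.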

AWS found bugs in 60% of software requirements. Its fix isn’t more AI — it’s a 50-year-old logic engine.

AWS added a Requirements Analysis feature to its Kiro development platform that uses SMT solvers — formal mathematical reasoning engines — combined with LLMs to detect contradictions, ambiguities, and gaps in software specifications. AWS says the system found bugs in 60% of software requirements ...

Agentic IDEs · The New Stack
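Kiro's SMT-based analysis isn't detailed in this summary. Its core idea is to encode requirements as logical constraints and then ask whether any assignment satisfies all of them at once. The sketch below shows that idea with a brute-force truth-table check. This is a swapped-in technique: exhaustive enumeration stands in for an SMT solver and only scales to a handful of boolean variables. The four requirements are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Env = Dict[str, bool]
Requirement = Tuple[str, Callable[[Env], bool]]


def is_consistent(variables: List[str], requirements: List[Requirement]) -> bool:
    """True if some true/false assignment to `variables` satisfies every requirement."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(check(env) for _, check in requirements):
            return True
    return False


VARIABLES = ["guest", "can_view_reports", "logged_in"]
REQUIREMENTS: List[Requirement] = [
    # "A implies B" is encoded as (not A) or B.
    ("R1: guests can view reports", lambda e: not e["guest"] or e["can_view_reports"]),
    ("R2: viewing reports requires login", lambda e: not e["can_view_reports"] or e["logged_in"]),
    ("R3: guests are never logged in", lambda e: not e["guest"] or not e["logged_in"]),
    ("R4: the system must support guest users", lambda e: e["guest"]),
]

print(is_consistent(VARIABLES, REQUIREMENTS[:3]))  # True: R1-R3 all hold when guest=False
print(is_consistent(VARIABLES, REQUIREMENTS))      # False: R4 forces a guest, exposing the conflict
```

Each requirement reads fine on its own. The contradiction only shows up when a checker reasons about all of them together, which is the kind of gap the AWS feature targets.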

Code Quest: A Claude Code Web UI That Runs in Interactive Mode — Just in Time for the June 15 Billing Change

A developer released code-quest, an open-source web UI for Anthropic's Claude Code CLI that runs in interactive mode via a three-tier WebSocket architecture. The project notes that starting June 15, 2026, Anthropic will bill `claude -p` and Agent SDK usage from separate monthly credits rather tha...

CLI Agents · Dev.to - AI

Why AI Coding Tools Over-engineer Your MVP — And the One Fix

The article argues that AI coding assistants default to production-grade recommendations not because of intelligence limitations but because they lack explicit business context about a project's stage and scale. Developers can adjust outputs by specifying stage, scale, and trade-off priorities in prompt context files like CLAUDE.m...

Workflows & Tips · Dev.to - Claude

CLAUDE.md for C++: 13 Rules That Make AI Write Safe, Modern, Idiomatic C++

A developer published 13 rules for configuring CLAUDE.md files to guide AI coding assistants toward modern C++ practices, covering standards enforcement (C++20/23), smart pointer usage, and avoiding legacy idioms like raw owning pointers and C++98 patterns.

Workflows & Tips · Dev.to - Claude

How data science teams use Codex

OpenAI published guidance on how data science teams can use Codex to automate analytical outputs including root-cause briefs, KPI memos, impact readouts, scoped analyses, and dashboard specifications from existing work inputs.

CLI Agents · OpenAI Blog

Sort providers by cost, latency, or throughput on AI Gateway

Vercel added a `sort` option to AI Gateway that lets users rank AI providers by cost (price per million tokens), time to first token, or throughput at request time. The feature is compatible with existing routing controls such as Zero Data Retention filters.

Workflows & Tips · Vercel Blog
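The ranking behind a `sort` option like this is easy to picture locally. The sketch below is an illustration only, not Vercel's AI Gateway API. The `ProviderStats` fields, the provider names, and the `zdr_only` flag are hypothetical. It shows why a filter such as Zero Data Retention composes cleanly with sorting: filter first, then rank. Note that throughput sorts in descending order while cost and latency sort ascending.

```python
from dataclasses import dataclass
from typing import List, Literal

SortKey = Literal["cost", "ttft", "throughput"]


@dataclass(frozen=True)
class ProviderStats:
    name: str
    usd_per_m_tokens: float   # price per million tokens; lower is better
    ttft_ms: float            # time to first token; lower is better
    tokens_per_second: float  # throughput; higher is better
    zero_data_retention: bool


def rank_providers(providers: List[ProviderStats], sort: SortKey,
                   zdr_only: bool = False) -> List[ProviderStats]:
    """Optionally keep only ZDR providers, then order by the chosen metric."""
    candidates = [p for p in providers if p.zero_data_retention or not zdr_only]
    if sort == "cost":
        return sorted(candidates, key=lambda p: p.usd_per_m_tokens)
    if sort == "ttft":
        return sorted(candidates, key=lambda p: p.ttft_ms)
    if sort == "throughput":
        return sorted(candidates, key=lambda p: p.tokens_per_second, reverse=True)
    raise ValueError(f"unknown sort key: {sort!r}")


providers = [
    ProviderStats("provider-a", 1.00, 300, 50, True),
    ProviderStats("provider-b", 0.50, 500, 120, False),
    ProviderStats("provider-c", 2.00, 200, 80, True),
]
print([p.name for p in rank_providers(providers, "cost", zdr_only=True)])  # ['provider-a', 'provider-c']
```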

OpenAI vs Claude vs Gemini API — Real Cost for India MVP 2026

A cost comparison of AI APIs for Indian developers estimates that running a WhatsApp support bot at 10,000 conversations per month costs approximately ₹1,250 on Gemini 2.5 Flash, ₹3,800 on GPT-5-mini, and ₹7,200 on Claude Sonnet 4, excluding GST and a 2% TDS applied to foreign invoices.

Pricing & Plans · Dev.to - Claude
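Dividing the article's monthly estimates by the 10,000-conversation volume gives a per-conversation figure that is easier to compare. These are the pre-tax numbers quoted above. GST and TDS are left out, as in the source.

```python
MONTHLY_COST_INR = {
    "Gemini 2.5 Flash": 1_250,
    "GPT-5-mini": 3_800,
    "Claude Sonnet 4": 7_200,
}
CONVERSATIONS_PER_MONTH = 10_000


def per_conversation_inr(monthly_cost: float, conversations: int) -> float:
    """Average cost of one conversation in rupees."""
    return monthly_cost / conversations


cheapest = min(MONTHLY_COST_INR.values())
for model, cost in MONTHLY_COST_INR.items():
    each = per_conversation_inr(cost, CONVERSATIONS_PER_MONTH)
    print(f"{model}: ₹{each:.3f} per conversation, {cost / cheapest:.2f}x the cheapest")
```

At this volume the figures work out to ₹0.125, ₹0.38, and ₹0.72 per conversation, so Claude Sonnet 4 costs about 5.8 times as much as Gemini 2.5 Flash.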

Why every Claude Code-built site looks the same — and the image layer that breaks it

A developer released a Claude Code skill that calls OpenAI's gpt-image-2 via Codex CLI to generate project-specific images, aiming to reduce the visual uniformity common to AI-built sites using default Tailwind, shadcn/ui, and Lucide icon stacks. The tool reads a DESIGN.md file and triggers on na...

CLI Agents · Dev.to - Claude

Building a general-purpose accessibility agent—and what we learned in the process

GitHub's experimental accessibility agent has reviewed 3,535 pull requests in its pilot, resolving 68% of identified issues. The agent automatically detects and suggests fixes for WCAG violations in front-end code, integrating with GitHub Copilot CLI and VS Code.

Agent Engineering · GitHub Blog
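The summary doesn't list the agent's detection rules. The sketch below shows one narrow check of the kind involved. It uses Python's standard `html.parser` to flag `<img>` elements that have no `alt` attribute, a common failure of WCAG success criterion 1.1.1 (Non-text Content). It deliberately accepts `alt=""`, which marks an image as decorative. This is an illustration of a single rule, not GitHub's implementation, and real accessibility tooling covers far more.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MissingAltChecker(HTMLParser):
    """Records (line, src) for every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Tuple[int, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing <img ... /> also lands here: HTMLParser's default
        # handle_startendtag calls handle_starttag.
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:  # alt="" is allowed: it marks a decorative image
            line, _column = self.getpos()
            self.findings.append((line, attr_map.get("src") or "<no src>"))


def find_missing_alt(html: str) -> List[Tuple[int, str]]:
    """Parse an HTML string and return the images that lack alt text."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.findings


sample = '<img src="logo.png" alt="Logo">\n<img src="chart.png">\n<img src="spacer.gif" alt="" />'
print(find_missing_alt(sample))  # [(2, 'chart.png')]
```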

How business operations teams use Codex

OpenAI published a guide showing how business operations teams can use Codex to generate documents such as initiative briefs, strategy updates, leadership decision packets, and progress updates from existing work inputs.

CLI Agents · OpenAI Blog

RLHF in 2026: when to pick PPO, DPO, or verifier-based RL

A technical guide outlines when to use PPO, DPO, or verifier-based RL (RLVR) for post-training language models, recommending DPO for style and instruction-following tasks, RLVR for math and code with ground-truth checkers, and PPO only when on-policy sampling costs are justified.

Agent Engineering · Dev.to - AI
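For reference, the DPO objective the guide contrasts with PPO fits in a few lines. The sketch below computes the standard per-pair DPO loss from the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. `beta` controls how far the policy may drift from the reference. The default of 0.1 is only a common starting value, not a recommendation from the article.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


print(dpo_loss(-12.0, -15.0, -13.0, -13.0))
```

When the policy and reference agree, the loss equals log 2. It falls below that as the policy shifts probability toward the chosen response relative to the reference. No reward model or on-policy sampling is involved, which is why the guide treats DPO as the cheaper option.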

What we shipped -- 2026-05-15

Glad Labs fixed a race condition in voice conversation sessions via PR #436, adding a retry mechanism in `ClaudeCodeBridgeLLMService` that catches "Session ID already in use" errors on the first turn and resumes against existing session data. They also expanded a test suite from 5 to 18 cases and...

Agent Engineering · Dev.to - Claude
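The PR's code isn't quoted here. The fix it describes, catching a "Session ID already in use" error on the first turn and resuming instead of failing, follows a familiar pattern. `FakeBridge`, `SessionInUseError`, and `send_turn` below are hypothetical stand-ins and do not reflect the actual API of `ClaudeCodeBridgeLLMService`.

```python
from typing import Dict, List


class SessionInUseError(RuntimeError):
    """Hypothetical error raised when a session ID is already registered."""


class FakeBridge:
    """Hypothetical stand-in for a session-based LLM bridge."""

    def __init__(self) -> None:
        self.sessions: Dict[str, List[str]] = {}

    def start(self, session_id: str, prompt: str) -> str:
        if session_id in self.sessions:
            raise SessionInUseError(f"Session ID already in use: {session_id}")
        self.sessions[session_id] = [prompt]
        return f"started:{session_id}"

    def resume(self, session_id: str, prompt: str) -> str:
        self.sessions[session_id].append(prompt)
        return f"resumed:{session_id}"


def send_turn(bridge: FakeBridge, session_id: str, prompt: str, is_first_turn: bool) -> str:
    """Start a session on the first turn, falling back to resume if another request won the race."""
    if not is_first_turn:
        return bridge.resume(session_id, prompt)
    try:
        return bridge.start(session_id, prompt)
    except SessionInUseError:
        # Race: a concurrent request already created this session, so reuse its data.
        return bridge.resume(session_id, prompt)
```

Retrying only on the first turn keeps the fallback narrow. Later turns always resume, so an unexpected conflict there still surfaces as an error.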

The hidden cost of build vs. buy for agentic AI in regulated industries

Organizations in regulated industries face integration and governance costs when assembling agentic AI platforms from multiple point solutions, mirroring fragmentation seen in early DevOps toolchains. The core trade-off is between building custom orchestration layers with associated compliance ov...

Agent Engineering · The New Stack

I Used Claude to Generate 37 Amazon JP Product Listings in a Day (Here's My Actual Workflow)

An e-commerce seller reported using Claude to generate 37 Amazon Japan product listings in one day, reducing per-SKU writing time from 30–60 minutes to approximately 5 minutes. The workflow uses structured spreadsheet inputs and Japan-specific prompt guardrails covering honorifics, punctuation, a...

Workflows & Tips · Dev.to - Claude

Use native curl syntax with Vercel CLI

Vercel added a `vercel curl` command to its CLI that accepts native curl syntax, including full URLs, bare hostnames, and the `--url` flag. The command uses Vercel authentication to bypass Deployment Protection and supports path-only arguments when a project is linked.

Workflows & Tips · Vercel Blog

QR code generator

Simon Willison built a browser-based QR code generator tool using Claude, supporting both URL/text and WiFi network QR codes. The tool includes options for style, size, color, and border customization.

Workflows & Tips · Simon Willison
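The summary doesn't say which payload format Willison's tool emits. WiFi QR codes generally encode a `WIFI:` string that phone cameras recognize, a convention that originated in the ZXing project. The sketch below builds that string, backslash-escaping `\`, `;`, `,`, `:` and `"` in the SSID and password per the same convention. Rendering the string as a QR image needs a separate library and is left out.

```python
def escape_wifi_field(value: str) -> str:
    """Backslash-escape the characters that are special in WIFI: payloads."""
    # Escape backslashes first so the backslashes added below aren't doubled.
    for ch in ("\\", ";", ",", ":", '"'):
        value = value.replace(ch, "\\" + ch)
    return value


def wifi_qr_payload(ssid: str, password: str = "", security: str = "WPA",
                    hidden: bool = False) -> str:
    """Build the text to encode in a WiFi-join QR code."""
    if security not in ("WPA", "WEP", "nopass"):
        raise ValueError(f"unsupported security type: {security!r}")
    fields = [f"T:{security}", f"S:{escape_wifi_field(ssid)}"]
    if security != "nopass":
        fields.append(f"P:{escape_wifi_field(password)}")
    if hidden:
        fields.append("H:true")
    return "WIFI:" + ";".join(fields) + ";;"


print(wifi_qr_payload("HomeNet", "pa;ss"))  # WIFI:T:WPA;S:HomeNet;P:pa\;ss;;
```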

Ecosystem

Claude Mythos vs Claude Opus 4.6: what the leaked benchmarks mean for developers

Accidentally exposed draft documents from Anthropic described an unreleased model codenamed "Claude Mythos" (internally "Capybara"), reportedly scoring higher than Claude Opus 4.6 on coding, academic reasoning, and cybersecurity benchmarks, with early access limited to cyber defense organizations...

Model Releases · Dev.to - Claude

datasette-llm-limits 0.1a0

Simon Willison released datasette-llm-limits 0.1a0, a Datasette plugin that enables per-user or global spending limits on LLM usage, configurable by scope and time window, such as a $1.00 rolling 24-hour per-user cap.

Open Source Tools · Simon Willison
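The plugin's configuration format isn't shown here. The sketch below illustrates only the underlying mechanism: a per-user cap over a rolling window, like the $1.00-per-24-hours example. `RollingSpendLimiter` and `try_charge` are hypothetical names, not the datasette-llm-limits API. Amounts use `Decimal` to avoid floating-point drift when summing money.

```python
from collections import defaultdict, deque
from decimal import Decimal
from typing import DefaultDict, Deque, Tuple


class RollingSpendLimiter:
    """Hypothetical per-user spending cap over a rolling time window."""

    def __init__(self, limit: Decimal, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # user -> deque of (timestamp, amount), oldest first
        self._events: DefaultDict[str, Deque[Tuple[float, Decimal]]] = defaultdict(deque)

    def spent(self, user: str, now: float) -> Decimal:
        """Drop charges that have aged out of the window, then sum what remains."""
        events = self._events[user]
        while events and events[0][0] <= now - self.window_seconds:
            events.popleft()
        return sum((amount for _, amount in events), Decimal("0"))

    def try_charge(self, user: str, amount: Decimal, now: float) -> bool:
        """Record the charge and return True only if it keeps the user within the cap."""
        if self.spent(user, now) + amount > self.limit:
            return False
        self._events[user].append((now, amount))
        return True


limiter = RollingSpendLimiter(Decimal("1.00"), window_seconds=24 * 60 * 60)
print(limiter.try_charge("alice", Decimal("0.60"), now=0))     # True
print(limiter.try_charge("alice", Decimal("0.50"), now=3600))  # False: would total $1.10
```

A global cap is the same structure with a single shared key in place of the user name.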

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks integrated OpenAI's GPT-5.5 model into its enterprise agent workflows. The model had posted a top score on the OfficeQA Pro benchmark before Databricks adopted it.

Model Releases · OpenAI Blog

Why Block handed Goose to the Linux Foundation

Block transferred its open-source coding agent Goose to the Agentic AI Foundation (AAIF), a Linux Foundation entity, after its retention of the trademark created governance issues that slowed enterprise adoption. The AAIF launched with three projects: Goose, Anthropic's Model Context Protocol, and Agents...

Industry & Funding · The New Stack

The 'AI is replacing engineers' narrative is mostly bullshit, and I'm tired of pretending otherwise

A METR study found experienced developers were 19% slower on real tasks when using AI tools, contradicting claims that AI-driven productivity gains are behind recent tech layoffs. An analyst argues most cuts reflect post-2021 over-hiring corrections, with AI efficiency cited as a more market-frie...

Opinion & Analysis · Dev.to - AI

AI radio hosts demonstrate why AI can’t be trusted alone

Osaurus brings both local and cloud AI models to your Mac

OpenAI keeps shuffling its executives in bid to win AI agent battle