// edition · 2026-05-06

May 06, 2026

32 stories on AI dev tools, agents, and the coding stack — curated from the day's RSS haul by Agentic Dev's pipeline.

Top Signal · CLI Agents

Claude Code Context Window Rot: Why Sessions Get Dumber (And How to Fix It)

A 2025 Chroma study of 18 frontier AI models, including Claude 4, GPT-4.1, and Gemini 2.5, found that all of them performed worse as input length increased, with some dropping from 95% to 60% accuracy past a context saturation threshold. The effect, called "context rot," is more pronounced in coding agents be...

Dev.to - Claude

Tool Updates

Which Claude Code Hook Do You Need? A Decision Guide

Claude Code supports four hook handler types — command, prompt, agent, and http — across 21 lifecycle events. Command hooks run in under 5ms and produce deterministic results, while prompt hooks invoke an LLM and take 300–2000ms, and agent hooks spawn full Claude Code sessions with file and tool ...

CLI Agents · Dev.to - Claude

How I Stopped Burning Through My Claude Code Quota by Noon

Anthropic's Claude Code uses prefix caching that can reduce token costs by up to 10x, but actions like switching models mid-session, modifying tool configurations, or opening new sessions invalidate the cache and trigger full-price recalculation. Keeping sessions long and tool definitions stable ...

Workflows & Tips · Dev.to - AI
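The cache-invalidation cost described in the quota story above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the 0.1 cached-read multiplier (chosen to match the "up to 10x" figure), and the choice to ignore any cache-write surcharge are all assumptions, not Anthropic's actual billing rules.

```python
def session_input_cost(prefix_tokens, turns, price_per_token,
                       cached_discount=0.1, invalidate_at=()):
    """Estimate the input cost of re-sending a shared prefix on every turn.

    Assumptions (hypothetical, for illustration only):
    - a cached prefix is billed at `cached_discount` times the base price
    - the first turn, and any turn listed in `invalidate_at`, pays full price
    - cache-write surcharges and output tokens are ignored
    """
    total = 0.0
    cache_warm = False
    for turn in range(turns):
        if turn in invalidate_at:
            # e.g. a model switch or tool-config change happened before this turn
            cache_warm = False
        multiplier = cached_discount if cache_warm else 1.0
        total += prefix_tokens * price_per_token * multiplier
        cache_warm = True  # the prefix is cached again after this request
    return total


if __name__ == "__main__":
    stable = session_input_cost(1000, 10, 1.0)
    disrupted = session_input_cost(1000, 10, 1.0, invalidate_at={5})
    print(f"stable session: {stable:.0f} units, one invalidation: {disrupted:.0f} units")
```

Under these assumptions, a single mid-session invalidation adds roughly one full-price resend of the prefix, which is why stable tool definitions matter more as the prefix grows.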

Stop prompting Codex like ChatGPT

A developer guide argues that OpenAI's Codex, an autonomous coding agent that reads repos and runs commands, performs better when given bounded "atomic" tasks with defined outcomes and verification steps rather than the open-ended conversational prompts suited to ChatGPT.

CLI Agents · Dev.to - AI

Claude Code 2026 vs. Codeium 2.0: 45% Faster PR Reviews for Monorepo Codebases

A benchmark across 12 production monorepos (4.2M lines of code) found Claude Code 2026 reviewed TypeScript PRs 45% faster than Codeium 2.0 (12.4s vs 22.6s), while Codeium 2.0 was 22% faster for Java/Kotlin repos; Claude Code 2026 costs $149/seat vs $109 for Codeium 2.0.

CLI Agents · Dev.to - Claude

Building Your Own Claude API Cost Tracker: A Practical Guide to Staying on Budget

A developer published a guide describing how to build a cost tracking system for Anthropic's Claude API, using a three-layer approach covering pre-request token estimation, cost calculation, and threshold-based alerts. The guide includes Python code targeting Claude 3.5 Sonnet, Opus, and Haiku mo...

Workflows & Tips · Dev.to - Claude
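The three layers named in the cost-tracker story above (pre-request token estimation, cost calculation, threshold alerts) might be sketched as follows. This is not the guide's code: the model key, the per-million-token prices, the four-characters-per-token heuristic, and every function and class name here are hypothetical placeholders to be replaced with real tokenizer counts and current published rates.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; NOT real published rates.
PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def estimate_tokens(text: str) -> int:
    """Layer 1: rough pre-request estimate (about 4 characters per token).

    This is a heuristic, not a tokenizer; real counts will differ.
    """
    return max(1, len(text) // 4)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Layer 2: convert token counts to dollars using the price table."""
    price = PRICES[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000


@dataclass
class BudgetTracker:
    """Layer 3: accumulate spend and report a status against a budget."""
    limit_usd: float
    warn_fraction: float = 0.8
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> str:
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            return "over_budget"
        if self.spent_usd >= self.limit_usd * self.warn_fraction:
            return "warning"
        return "ok"


if __name__ == "__main__":
    tracker = BudgetTracker(limit_usd=10.0)
    prompt = "Summarize the attached changelog in three bullet points."
    cost = request_cost("example-model", estimate_tokens(prompt), 500)
    print(tracker.record(cost), f"{tracker.spent_usd:.6f} USD spent")
```

Keeping the price table separate from the tracker means a rate change touches one dictionary rather than the alerting logic.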

How to Fix "command 'claude-vscode.editor.openLast' not found" in VS Code

Version 2.1.129 of the Claude Code VS Code extension contains a bug that produces a "command 'claude-vscode.editor.openLast' not found" error, preventing the extension from opening. The workaround is to downgrade to version 2.1.128 via the extension's "Install Another Version" option.

Workflows & Tips · Dev.to - Claude

Skills and the discovery ceiling: why your AI coding agent ignores most of what you install

AI coding agents that support the Agent Skills standard, including Claude Code, do not automatically read installed SKILL.md files when performing tasks, causing them to hallucinate commands or fail rather than use available documentation. A developer observed this behavior when Claude Code ignor...

Agent Engineering · Dev.to - Claude

I trained a sprite model with agents. The data was the bottleneck.

A developer released pixel-llm, a 2.9-million-parameter autoregressive transformer that generates 32x32 pixel art sprites of reef sea creatures using a 64-color palette. Built using AI agent sessions, the model trained across four dataset iterations but failed to converge on two of six sprite cat...

Agent Engineering · Dev.to - AI

Ecosystem

Using an MCP Gateway with Claude Code: A Practical Guide

An MCP gateway consolidates multiple MCP server connections into a single endpoint for Claude Code, reducing configuration overhead and token usage. Anthropic reported that connecting multiple MCP servers can inject up to 150,000 tokens per agent interaction; Bifrost, an open-source gateway by Ma...

MCP & Integrations · Dev.to - Claude

GPT-5.5 Instant: smarter, clearer, and more personalized

OpenAI released GPT-5.5 Instant as an updated default model for ChatGPT, citing improvements in answer accuracy, reduced hallucinations, and expanded personalization controls.

Model Releases · OpenAI Blog

The context window has been shattered: Subquadratic debuts a 12-million-token window

Subquadratic, a Miami startup, launched a model with a 12-million-token context window using an architecture called Subquadratic Selective Attention, which the company says scales linearly in compute and memory. The model scores 83 on MRCR v2 and 92.1% on needle-in-a-haystack retrieval at 12 mill...

Model Releases · The New Stack

BuyWhere MCP Goes Live: The Open Source Commerce API for AI Agents

BuyWhere launched an open-source MCP server that gives AI agents access to over 50 million products across six markets — Singapore, the US, Japan, Korea, China, and Australia — via structured, merchant-direct data. The MIT-licensed server is available via npm and supports Claude, Cursor, and othe...

MCP & Integrations · Dev.to - AI

Codens vs Devin vs Cursor Composer vs Sweep — picking the AI coding agent that matches your bottleneck

Four AI coding tools occupy distinct roles: Devin handles async ticket delegation, Cursor Composer assists developers inside the IDE, Sweep converts GitHub issues to PRs, and Codens routes Notion tickets through multiple specialized agents covering the full software development lifecycle.

Opinion & Analysis · Dev.to - Claude

GPT-5.5 Instant System Card

OpenAI published a system card for GPT-5.5 Instant, a model in its GPT-5.5 lineup, documenting the model's safety evaluations and deployment considerations.

Model Releases · OpenAI Blog

AI and Claude: The internal rebellion that changed Amazon’s rules

Amazon granted its tens of thousands of developers access to Anthropic's Claude Code and OpenAI's Codex on May 12, running via AWS and Amazon Bedrock, after roughly 1,500 employees pushed back against a policy restricting use of third-party tools in favor of Amazon's own Kiro coding assistant.

Industry & Funding · The New Stack

OpenAI rolls out GPT-5.5 Instant as default ChatGPT model, promises more accurate responses

OpenAI replaced ChatGPT's default model with GPT-5.5 Instant, a lighter variant of its April flagship model designed for everyday tasks. The new model scores 81.6% on the CharXiv benchmark, up from 75.0% for its predecessor GPT-5.3 Instant, and introduces a "memory sources" feature showing users ...

Model Releases · The New Stack

Benchmark: Claude 3.5 vs. GPT-4o for Cloud Cost Anomaly Detection in AWS and GCP

A benchmark of Claude 3.5 Sonnet and GPT-4o across 12,000 AWS and GCP billing logs found Claude scored higher precision (94.2% vs. 89.7% on GCP anomaly detection) and lower cost per detection ($0.87 vs. $1.12 per 1,000), while GPT-4o processed requests 18% faster at 12.7 RPS versus 10.7 RPS.

Model Releases · Dev.to - Claude

OpenAI claims ChatGPT’s new default model hallucinates way less

OpenAI released GPT-5.5 Instant as ChatGPT's new default model, claiming it produces 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance, based on internal evaluations. The company also says it reduced inaccurate claims by 37.3% on conversatio...

Model Releases · The Verge - AI

OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT

OpenAI released GPT-5.5 Instant as the new default model for ChatGPT, citing reduced hallucinations in law, medicine, and finance while maintaining low latency compared to its predecessor.

Model Releases · TechCrunch - AI

“Real maturity problems”: Not every developer is thrilled with Bun after Anthropic acquisition

Anthropic acquired Bun, the JavaScript/TypeScript runtime and toolkit, in December 2025 to power Claude Code, which uses Bun as its executable. Some developers have raised concerns about Bun's production maturity, memory usage, and complexity compared to Node.js.

Industry & Funding · The New Stack

datasette-llm 0.1a7

datasette-llm 0.1a7 adds a configuration mechanism for setting default options on specific LLM models, allowing users to define defaults such as model selection and temperature for enrichment operations within Datasette.

Open Source Tools · Simon Willison

llm-echo 0.5a0

Simon Willison released llm-echo 0.5a0, a plugin for the LLM tool that provides a fake "echo" model for automated testing. The update adds a `-o thinking 1` option that simulates a reasoning block, compatible with LLM 0.32a0 and higher.

Open Source Tools · Simon Willison

AI has a sprawling data problem. Airbyte has just launched a tool to fix it.

Airbyte launched Airbyte Agents on Tuesday, a service that precomputes and indexes business data from SaaS tools like Salesforce, Zendesk, Jira, and Slack into a single "Context Store," reducing typical AI agent API calls from five or six down to one or two.

MCP & Integrations · The New Stack

My first article on DevTo - about an app I "built" with Claude. Should software engineers change their job title to "design consultant/product owner/implementation instructor"?

Developer Daniel Dao built a chess notation trainer app using Claude without writing any code, describing his role as directing the AI through design and implementation decisions rather than coding directly.

Opinion & Analysis · Dev.to - Claude

ChatGPT vs Claude vs Gemini: Which AI Is Actually Worth Using in 2026?

A 2026 comparison of ChatGPT, Claude, and Gemini found ChatGPT favored for general writing and coding, Claude preferred for nuanced editorial content and code explanation, and Gemini rated most reliable for research due to its Google Search integration.

Model Releases · Dev.to - Claude

Our AI started a cafe in Stockholm

Andon Labs deployed an AI system called Mona to manage a Stockholm cafe, following a prior experiment in San Francisco. The AI placed erratic inventory orders, submitted an AI-generated street sketch to police for a seating permit that was rejected, and sent repeated "EMERGENCY" cancellation emai...

Opinion & Analysis · Simon Willison

What Reddit Is Actually Talking About When It Talks About AI Agents in May 2026

A May 2026 analysis of Reddit's AI agent discussions found community discourse has shifted away from hype toward skepticism, with top threads demanding ROI evidence and favoring simple, deployable agents over complex multi-agent systems.

Opinion & Analysis · Dev.to - AI

AI agents need to spend money — Stripe and iWallet are building the rails

Stripe and Tempo jointly released the Machine Payments Protocol (MPP) for programmatic transactions by AI agents, while fintech startup iWallet proposed an Autonomous Settlement Protocol (ASP) for event-triggered multi-party settlements. Both protocols address the gap in existing payment infrastr...

Industry & Funding · The New Stack

Memory as a Sixth Sense

A developer essay argues that AI memory should be understood as active perception rather than passive storage, contending that AI systems without persistent memory lack the ability to detect patterns across time and provide contextual continuity across conversations.

Opinion & Analysis · Dev.to - AI

CopilotKit raises $27M to help devs deploy app-native AI agents

CopilotKit, a Seattle-based startup that provides tools for deploying app-native AI agents, raised $27 million in a Series A round led by Glilot Capital, NFX, and SignalFire.

Industry & Funding · TechCrunch - AI

Welcome to Maintainer Month: Celebrating the people behind the code

GitHub launched its sixth annual Maintainer Month, announcing new tools including granular pull request limits for unknown contributors and pull request archiving to remove spam. The releases follow GitHub data showing merged pull requests have nearly doubled year over year, with AI-generated con...

Open Source Tools · GitHub Blog
