32 stories on AI dev tools, agents, and the coding stack — curated from the day's RSS haul by Agentic Dev's pipeline.
Top Signal · CLI Agents
A 2025 Chroma study of 18 frontier AI models, including Claude 4, GPT-4.1, and Gemini 2.5, found that all of them performed worse as input length increased, with some dropping from 95% to 60% accuracy past a context saturation threshold. The effect, called "context rot," is more pronounced in coding agents be...
Dev.to - Claude
Claude Code supports four hook handler types — command, prompt, agent, and http — across 21 lifecycle events. Command hooks run in under 5ms and produce deterministic results, while prompt hooks invoke an LLM and take 300–2000ms, and agent hooks spawn full Claude Code sessions with file and tool ...
CLI Agents
Dev.to - Claude
Anthropic's Claude Code uses prefix caching that can reduce token costs by up to 10x, but actions like switching models mid-session, modifying tool configurations, or opening new sessions invalidate the cache and trigger full-price recalculation. Keeping sessions long and tool definitions stable ...
Workflows & Tips
Dev.to - AI
A developer guide argues that OpenAI's Codex, an autonomous coding agent that reads repos and runs commands, performs better when given bounded "atomic" tasks with defined outcomes and verification steps rather than the open-ended conversational prompts suited to ChatGPT.
CLI Agents
Dev.to - AI
A benchmark across 12 production monorepos (4.2M lines of code) found Claude Code 2026 reviewed TypeScript PRs 45% faster than Codeium 2.0 (12.4s vs 22.6s), while Codeium 2.0 was 22% faster for Java/Kotlin repos; Claude Code 2026 costs $149/seat vs $109 for Codeium 2.0.
CLI Agents
Dev.to - Claude
A developer published a guide describing how to build a cost tracking system for Anthropic's Claude API, using a three-layer approach covering pre-request token estimation, cost calculation, and threshold-based alerts. The guide includes Python code targeting Claude 3.5 Sonnet, Opus, and Haiku mo...
Workflows & Tips
Dev.to - Claude
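The three-layer approach in the cost-tracking guide above (pre-request token estimation, cost calculation, threshold-based alerts) could look roughly like the sketch below. This is not the guide's code: the `PRICES` table, the `example-model` name, the four-characters-per-token heuristic, and the `BudgetTracker` class are all illustrative assumptions, and the prices are placeholders rather than Anthropic's actual rates.

```python
from dataclasses import dataclass, field

# Hypothetical per-million-token prices in USD. Placeholders only,
# not Anthropic's published rates; substitute real values before use.
PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def estimate_tokens(text: str) -> int:
    """Layer 1: rough pre-request estimate, assuming ~4 characters per token."""
    return max(1, len(text) // 4)


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Layer 2: convert token counts to dollars using the price table."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


@dataclass
class BudgetTracker:
    """Layer 3: accumulate spend and record an alert once a threshold is reached."""
    threshold_usd: float
    spent_usd: float = 0.0
    alerts: list = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        cost = cost_usd(model, input_tokens, output_tokens)
        self.spent_usd += cost
        if self.spent_usd >= self.threshold_usd:
            self.alerts.append(
                f"spend ${self.spent_usd:.4f} reached threshold ${self.threshold_usd:.2f}"
            )
        return cost
```

In practice the character-based estimate would only be a pre-flight guard; the token counts passed to `record` would come from the usage figures the API reports after each request.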
Version 2.1.129 of the Claude Code VS Code extension contains a bug that produces a "command 'claude-vscode.editor.openLast' not found" error, preventing the extension from opening. The workaround is to downgrade to version 2.1.128 via the extension's "Install Another Version" option.
Workflows & Tips
Dev.to - Claude
AI coding agents that support the Agent Skills standard, including Claude Code, do not automatically read installed SKILL.md files when performing tasks, causing them to hallucinate commands or fail rather than use available documentation. A developer observed this behavior when Claude Code ignor...
Agent Engineering
Dev.to - Claude
A developer released pixel-llm, a 2.9-million-parameter autoregressive transformer that generates 32x32 pixel art sprites of reef sea creatures using a 64-color palette. Built using AI agent sessions, the model trained across four dataset iterations but failed to converge on two of six sprite cat...
Agent Engineering
Dev.to - AI
An MCP gateway consolidates multiple MCP server connections into a single endpoint for Claude Code, reducing configuration overhead and token usage. Anthropic reported that connecting multiple MCP servers can inject up to 150,000 tokens per agent interaction; Bifrost, an open-source gateway by Ma...
MCP & Integrations
Dev.to - Claude
OpenAI released GPT-5.5 Instant as an updated default model for ChatGPT, citing improvements in answer accuracy, reduced hallucinations, and expanded personalization controls.
Model Releases
OpenAI Blog
Subquadratic, a Miami startup, launched a model with a 12-million-token context window using an architecture called Subquadratic Selective Attention, which the company says scales linearly in compute and memory. The model scores 83 on MRCR v2 and 92.1% on needle-in-a-haystack retrieval at 12 mill...
Model Releases
The New Stack
BuyWhere launched an open-source MCP server that gives AI agents access to over 50 million products across six markets — Singapore, the US, Japan, Korea, China, and Australia — via structured, merchant-direct data. The MIT-licensed server is available via npm and supports Claude, Cursor, and othe...
MCP & Integrations
Dev.to - AI
Four AI coding tools occupy distinct roles: Devin handles async ticket delegation, Cursor Composer assists developers inside the IDE, Sweep converts GitHub issues to PRs, and Codens routes Notion tickets through multiple specialized agents covering the full software development lifecycle.
Opinion & Analysis
Dev.to - Claude
OpenAI published a system card for GPT-5.5 Instant, a model in its GPT-5.5 lineup, documenting the model's safety evaluations and deployment considerations.
Model Releases
OpenAI Blog
Amazon granted its tens of thousands of developers access to Anthropic's Claude Code and OpenAI's Codex on May 12, running via AWS and Amazon Bedrock, after roughly 1,500 employees pushed back against a policy restricting use of third-party tools in favor of Amazon's own Kiro coding assistant.
Industry & Funding
The New Stack
OpenAI replaced ChatGPT's default model with GPT-5.5 Instant, a lighter variant of its April flagship model designed for everyday tasks. The new model scores 81.6% on the CharXiv benchmark, up from 75.0% for its predecessor GPT-5.3 Instant, and introduces a "memory sources" feature showing users ...
Model Releases
The New Stack
A benchmark of Claude 3.5 Sonnet and GPT-4o across 12,000 AWS and GCP billing logs found Claude scored higher precision (94.2% vs. 89.7% on GCP anomaly detection) and lower cost per detection ($0.87 vs. $1.12 per 1,000), while GPT-4o processed requests 18% faster at 12.7 RPS versus 10.7 RPS.
Model Releases
Dev.to - Claude
OpenAI released GPT-5.5 Instant as ChatGPT's new default model, claiming it produces 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance, based on internal evaluations. The company also says it reduced inaccurate claims by 37.3% on conversatio...
Model Releases
The Verge - AI
OpenAI released GPT-5.5 Instant as the new default model for ChatGPT, citing reduced hallucinations in law, medicine, and finance while maintaining low latency compared to its predecessor.
Model Releases
TechCrunch - AI
Anthropic acquired Bun, the JavaScript/TypeScript runtime and toolkit, in December 2025 to power Claude Code, which uses Bun as its executable. Some developers have raised concerns about Bun's production maturity, memory usage, and complexity compared to Node.js.
Industry & Funding
The New Stack
datasette-llm 0.1a7 adds a configuration mechanism for setting default options on specific LLM models, allowing users to define defaults such as model selection and temperature for enrichment operations within Datasette.
Open Source Tools
Simon Willison
Simon Willison released llm-echo 0.5a0, a plugin for the LLM tool that provides a fake "echo" model for automated testing. The update adds a `-o thinking 1` option that simulates a reasoning block, compatible with LLM 0.32a0 and higher.
Open Source Tools
Simon Willison
Airbyte launched Airbyte Agents on Tuesday, a service that precomputes and indexes business data from SaaS tools like Salesforce, Zendesk, Jira, and Slack into a single "Context Store," reducing typical AI agent API calls from five or six down to one or two.
MCP & Integrations
The New Stack
Developer Daniel Dao built a chess notation trainer app using Claude without writing any code, describing his role as directing the AI through design and implementation decisions rather than coding directly.
Opinion & Analysis
Dev.to - Claude
A 2026 comparison of ChatGPT, Claude, and Gemini found ChatGPT favored for general writing and coding, Claude preferred for nuanced editorial content and code explanation, and Gemini rated most reliable for research due to its Google Search integration.
Model Releases
Dev.to - Claude
Andon Labs deployed an AI system called Mona to manage a Stockholm cafe, following a prior experiment in San Francisco. The AI placed erratic inventory orders, submitted an AI-generated street sketch to police for a seating permit that was rejected, and sent repeated "EMERGENCY" cancellation emai...
Opinion & Analysis
Simon Willison
A May 2026 analysis of Reddit's AI agent discussions found community discourse has shifted away from hype toward skepticism, with top threads demanding ROI evidence and favoring simple, deployable agents over complex multi-agent systems.
Opinion & Analysis
Dev.to - AI
Stripe and Tempo jointly released the Machine Payments Protocol (MPP) for programmatic transactions by AI agents, while fintech startup iWallet proposed an Autonomous Settlement Protocol (ASP) for event-triggered multi-party settlements. Both protocols address the gap in existing payment infrastr...
Industry & Funding
The New Stack
A developer essay argues that AI memory should be understood as active perception rather than passive storage, contending that AI systems without persistent memory lack the ability to detect patterns across time and provide contextual continuity across conversations.
Opinion & Analysis
Dev.to - AI
CopilotKit, a Seattle-based startup that provides tools for deploying app-native AI agents, raised $27 million in a Series A round led by Glilot Capital, NFX, and SignalFire.
Industry & Funding
TechCrunch - AI
GitHub launched its sixth annual Maintainer Month, announcing new tools including granular pull request limits for unknown contributors and pull request archiving to remove spam. The releases follow GitHub data showing merged pull requests have nearly doubled year over year, with AI-generated con...
Open Source Tools
GitHub Blog