26 stories on AI dev tools, agents, and the coding stack — curated from the day's RSS haul by Agentic Dev's pipeline.
Top Signal · Agent Engineering
AWS developer advocate Morgan Willis demonstrated that redesigning agent tools from API-endpoint-mapped to intent-based reduced token usage from roughly 52,000 to 2,000 per query in AWS Strands Agents, a 96% reduction. Adding semantic search via AWS Agent Core Gateway to filter a 16-tool catalog ...
The New Stack
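To make the endpoint-mapped versus intent-based distinction in the item above concrete, here is a minimal sketch. It does not use the Strands Agents API. The in-memory data, the function names, and the use of character counts as a rough stand-in for tokens are all assumptions made for illustration.

```python
import json

# Hypothetical in-memory backend standing in for a REST API.
USERS = {"ana@example.com": {"id": 7, "name": "Ana", "tier": "gold", "created": "2021-03-02"}}
ORDERS = {7: [
    {"id": 101, "status": "delivered", "items": ["kettle", "mug"], "total": 48.0},
    {"id": 102, "status": "shipped", "items": ["lamp"], "total": 30.0},
]}

# Endpoint-mapped style: one tool per API call, raw JSON handed back to the model.
def get_user(email: str) -> str:
    return json.dumps(USERS[email])

def list_orders(user_id: int) -> str:
    return json.dumps(ORDERS[user_id])

def endpoint_context(email: str) -> str:
    """Everything the model would have to read when chaining the two raw tools."""
    user_json = get_user(email)
    orders_json = list_orders(json.loads(user_json)["id"])
    return user_json + orders_json

# Intent-based style: one tool per user goal, returning only the answer.
def latest_order_status(email: str) -> str:
    user = USERS[email]
    latest = max(ORDERS[user["id"]], key=lambda order: order["id"])
    return f"Order {latest['id']} for {user['name']}: {latest['status']}"

if __name__ == "__main__":
    email = "ana@example.com"
    print(len(endpoint_context(email)), "chars via endpoint-mapped tools")
    print(len(latest_order_status(email)), "chars via the intent-based tool")
```

The intent-based tool does the joining and filtering in code, so the model sees one short sentence instead of two raw payloads. Savings in a real system will depend on the payload sizes involved.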
A benchmark test on a 10,000-line Go 1.24 codebase with 412 injected bugs found Claude Code 2.5 detected 89.3% of defects versus 76.1% for Codeium 1.8, though Codeium processed files 50% faster and costs $20 less per seat monthly.
CLI Agents
Dev.to - Claude
LLMs have fixed context windows that cause "lost in the middle" failures, where information buried in large prompts is ignored and hallucinations increase as memory fills. Retrieval-Augmented Generation (RAG) is proposed as a fix, decoupling the model from its data source rather than relying on l...
Workflows & Tips
Dev.to - AI
Anthropic launched Claude Design, a research preview tool that converts text prompts into interactive prototypes, landing pages, slide decks, and internal tool mockups. It is powered by Claude Opus 4.7, integrates with Claude Code for developer handoff, and is available on Pro, Max, Team, and Ent...
Workflows & Tips
Dev.to - Claude
A developer contrasted Claude Code's Telegram Plugin, which executes commands remotely on demand, with a separate autonomous agent fleet running on systemd timers that completed 47 tasks in 24 hours without human input, using local Ollama inference.
Agent Engineering
Dev.to - Claude
A developer reverse-engineered the cloud API of a 3i G10+ robot vacuum in one week, using mitmproxy, Frida hooks, and Dart AOT decompilation to gain full control. They then integrated Anthropic's Claude Haiku 4.5 vision model into the robot's drive loop at $0.003 per call, with peak daily AI cost...
Agent Engineering
Dev.to - Claude
A developer built a pull request risk evaluation engine for a SaaS product that runs a deterministic rules engine first, then applies an LLM advisory layer only for high-risk PRs, with the AI restricted to posting comments and never blocking merges. The system uses four rule match types: file pat...
Agent Engineering
Dev.to - Claude
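The rules-first, LLM-second flow in the pull request risk item above can be sketched roughly as follows. This is not the author's implementation. The glob rules, weights, threshold, and the `advise` callback are hypothetical, and the sketch uses one illustrative glob-based rule kind because the excerpt's list of four match types is truncated.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    glob: str   # hypothetical: glob matched against each changed file path
    score: int  # hypothetical risk weight


# Hypothetical rule set and threshold, chosen only for the example.
RULES = [Rule("migrations/*", 5), Rule("auth/*", 4), Rule("docs/*", 0)]
HIGH_RISK = 5


def risk_score(changed_files: List[str]) -> int:
    """Deterministic pass: sum the highest matching rule weight per file."""
    total = 0
    for path in changed_files:
        total += max((r.score for r in RULES if fnmatch(path, r.glob)), default=0)
    return total


def evaluate(changed_files: List[str], advise: Callable[[List[str]], str]) -> Dict:
    """Run the rules first and call the LLM advisory layer only for high-risk PRs."""
    score = risk_score(changed_files)
    comments = []
    if score >= HIGH_RISK:
        # The advisory layer can only add a comment; nothing here blocks a merge.
        comments.append(advise(changed_files))
    return {"score": score, "comments": comments}
```

Passing `advise` in as a plain callable keeps the model call out of the deterministic path, so low-risk PRs cost nothing and the scoring can be tested without a network.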
The Pragmatic Engineer podcast featured Mario Zechner, creator of Pi — a minimalist, self-modifying AI coding agent — and Armin Ronacher, creator of Flask, discussing Pi's design, its use in building AI-powered tools, and the limits of agentic workflows in software development.
Agent Engineering
Pragmatic Engineer
Developers building their own AI agents for tasks like incident triage and deployment are bypassing platform engineering governance, creating what the industry calls "agent sprawl" — autonomous agents operating without audit trails, proper credentials, or PII controls.
Agent Engineering
The New Stack
A developer used AI assistance to port an Excel VBA script to Go, producing SentryScript, a Windows app that monitors YouTube channel subtitles for user-defined keywords using yt-dlp and local subtitle parsing across 11 languages.
Workflows & Tips
Dev.to - AI
A developer published a bash script that customizes the Claude Code status line to display token usage pace using tortoise and hare emojis, where the tortoise marks expected consumption at the current time in the 5-hour window and the hare marks actual usage.
Workflows & Tips
Dev.to - Claude
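The pacing idea in the status line item above comes down to comparing two fractions of the 5-hour window. The published script is bash. The Python sketch below is a re-imagining under assumptions: the bar layout, the `width` parameter, and receiving usage as a ready-made fraction are all hypothetical.

```python
WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour usage window from the item above


def clamp(value: float) -> float:
    """Keep a fraction inside [0, 1]."""
    return min(max(value, 0.0), 1.0)


def pace_bar(elapsed_seconds: float, used_fraction: float, width: int = 20) -> str:
    """Draw a bar with the tortoise at expected usage and the hare at actual usage.

    elapsed_seconds: time since the window opened.
    used_fraction:   share of the window's allowance already consumed (assumed
                     to be supplied by the caller; how to obtain it is not shown).
    """
    expected = clamp(elapsed_seconds / WINDOW_SECONDS)
    actual = clamp(used_fraction)
    cells = ["-"] * width
    tortoise = min(int(expected * width), width - 1)
    hare = min(int(actual * width), width - 1)
    cells[tortoise] = "🐢"
    cells[hare] = "🐇"  # drawn last, so the hare wins when both land on one cell
    return "".join(cells)


if __name__ == "__main__":
    # Halfway through the window with a quarter of the allowance used.
    print(pace_bar(elapsed_seconds=2.5 * 60 * 60, used_fraction=0.25))
```

A hare to the left of the tortoise means usage is running behind the even-spend pace, and a hare to the right means the allowance is being consumed faster than the clock.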
Anthropic launched Claude Managed Agents in public beta, a suite of APIs providing infrastructure for production AI agents including secure sandboxing, long-running sessions, and credential management, priced at $0.08 per session hour plus standard API token rates. The company added persistent me...
Industry & Funding
The New Stack
A 1,200-test benchmark translating TypeScript 5.6 to Rust 1.83 found GPT-5 achieved a 94.2% first-pass compilation rate, compared to 91.7% for Claude 4.0 and 82.3% for Llama 4 70B, though Claude 4.0 led on memory safety checks at 96.4%.
Model Releases
Dev.to - Claude
Simon Willison released 0.32a0, an alpha build of `llm`, his open-source command-line tool for interacting with large language models.
Open Source Tools
Simon Willison
Warp, maker of a Rust-based agentic development environment, released its client as open source under the AGPL license, with OpenAI named as founding sponsor of the repository. The agent workflows powering the platform are built on GPT models, and the company cited faster community-driven develop...
Open Source Tools
The New Stack
A developer tutorial describes building an MCP server that charges USDC per tool call using x402, an HTTP payment protocol that uses the 402 status code to require crypto payment before serving requests. Payments settle on the Base blockchain via Coinbase's facilitator, using EIP-3009 signatures ...
MCP & Integrations
Dev.to - Claude
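The pay-before-serve flow in the x402 item above can be sketched as a single decision: answer 402 until a valid payment proof arrives, then run the tool. The sketch below is not the x402 wire format or any real SDK. The header name, response fields, price, and the `verify_payment` and `run_tool` callbacks are assumptions for illustration, and settlement through a facilitator is left out.

```python
from typing import Callable, Dict, Tuple

PAYMENT_HEADER = "X-PAYMENT"  # assumed header name; check the x402 spec before relying on it
PRICE_USDC = "0.01"           # hypothetical per-call price


def handle_tool_call(
    headers: Dict[str, str],
    verify_payment: Callable[[str], bool],
    run_tool: Callable[[], str],
) -> Tuple[int, Dict[str, str]]:
    """Return (status, body): 402 with payment terms, or 200 with the tool result."""
    proof = headers.get(PAYMENT_HEADER)
    if proof is None or not verify_payment(proof):
        # No proof, or proof rejected: state the terms and do not run the tool.
        return 402, {"error": "payment required", "price_usdc": PRICE_USDC, "network": "base"}
    return 200, {"result": run_tool()}


if __name__ == "__main__":
    def always_valid(proof: str) -> bool:
        return True

    def echo_tool() -> str:
        return "tool output"

    print(handle_tool_call({}, always_valid, echo_tool))
    print(handle_tool_call({PAYMENT_HEADER: "signed-proof"}, always_valid, echo_tool))
```

A client would be expected to read the 402 response, sign a payment, and retry the same request with the proof attached.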
Simon Willison released llm 0.32a1, a bug fix for the prior 0.32a0 release, correcting an issue where tool-calling conversations were not correctly restored from SQLite storage.
Open Source Tools
Simon Willison
Simon Willison released LLM 0.32a0, an alpha version of his Python library and CLI tool for accessing LLMs. The update refactors the core abstraction so model inputs can be represented as a sequence of messages and responses can include multiple typed parts, replacing the previous single-prompt, ...
Open Source Tools
Simon Willison
A Dev.to article cites Stack Overflow 2025 survey data showing 84% of developers use AI coding tools, while trust in AI-generated code fell from 40% to 29% over the same period. The author argues senior engineers should focus on building verification and constraint systems around AI tools rather ...
Opinion & Analysis
Dev.to - Claude
OpenAI published a post-mortem on "goblin outputs," unusual personality-driven behaviors observed in GPT-5, detailing their origin, how they spread through the model, and the fixes applied to address them.
Model Releases
OpenAI Blog
AWS added OpenAI's GPT-5.4, GPT-5.5, and Codex to Amazon Bedrock in limited preview, announced by CEO Matt Garman in San Francisco. The deals carry infrastructure commitments: OpenAI agreed to approximately 2 gigawatts of Trainium3/4 capacity, while Anthropic separately committed over $100 billio...
Industry & Funding
The New Stack
A developer documented switching from a self-hosted Claude API proxy (claude-max-api-proxy) to a $29/month managed service after experiencing recurring outages from CLI token rotation, version mismatches, and maintenance overhead estimated at 2–4 hours monthly.
Opinion & Analysis
Dev.to - Claude
An informal evaluation of two AI agents, Openclaw and Hermes, both running on MiniMax 2.7, scored them 68 and 58 respectively out of 147 points across eight capability categories, with Claude Opus 4.7 scoring 82 as a reference. Hermes lost the most ground in browser/web control tasks, while Openc...
Opinion & Analysis
Dev.to - Claude
The Zig programming language project bans LLM-assisted contributions to issues, pull requests, and bug tracker comments, with the stated rationale that reviewing PRs serves to develop trusted contributors rather than just land code. Bun, a Zig-based JavaScript runtime acquired by Anthropic in Dec...
Opinion & Analysis
Simon Willison
Anaconda announced the acquisition of Outerbounds, the company behind the Metaflow ML orchestration framework originally developed at Netflix. Anaconda cited data indicating AI-generated code produces 1.7 times more defects than human-written code, and that 80% of dependencies recommended by AI c...
Industry & Funding
The New Stack
A developer-focused website published a ranked list of 12 AI coding tools for 2026, based on stated criteria of developer reviews and performance. No specific tools are named in the available excerpt.
Opinion & Analysis
Dev.to - AI