The Bill for Plumbing — Mosaic Theory Blog

Last week we said the LiteLLM compromise was not a theoretical concern. This week it stopped being theoretical for Mercor. Mercor confirmed a cyberattack tied to the LiteLLM compromise, and on the same day attackers began exploiting a fresh pre-auth SQLi in LiteLLM. The agent-infrastructure attack surface is no longer a slide in a threat-model deck. It is a billing event.

Credential proxies for agents have crystallised as a category in about three weeks. Brex open-sourced CrabTrap, an LLM-as-judge HTTP proxy that mediates agent calls in production. Infisical shipped Agent Vault, a credential broker that sits between an agent and the upstream API so the agent never holds the secret. Simon Willison published llm-openai-via-codex, which intercepts and brokers credentials for the Codex backdoor API. Three independent groups, three different threat models, and the same architectural answer. Agents should not be trusted with raw credentials, and a middleware layer should enforce that. Separately, Cisco is reportedly acquiring Astrix Security for $250M+, and Commvault rolled out agentic-workflow data controls. If you are building autonomous agents and your auth model is "give the LLM the API key in an env var", the market is pricing that risk into acquisitions.
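The shared pattern, an agent that never holds the real secret and a middleware layer that injects it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how CrabTrap, Agent Vault, or llm-openai-via-codex are actually implemented. The `CredentialBroker` class, its `prepare` method, and the host names are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CredentialBroker:
    """Hypothetical broker: maps upstream hosts to secrets the agent never sees."""

    secrets: dict[str, str]  # host -> real API key, held only by the broker
    audit_log: list[str] = field(default_factory=list)

    def prepare(self, method: str, url: str, headers: dict[str, str]) -> dict[str, str]:
        """Return the headers to send upstream, or raise if the host is not allowed."""
        host = urlparse(url).hostname or ""
        if host not in self.secrets:
            self.audit_log.append(f"DENY {method} {host}")
            raise PermissionError(f"no credential configured for {host!r}")
        # Discard whatever the agent put in Authorization, then inject the real key.
        outbound = {k: v for k, v in headers.items() if k.lower() != "authorization"}
        outbound["Authorization"] = f"Bearer {self.secrets[host]}"
        self.audit_log.append(f"ALLOW {method} {host}")
        return outbound


# The agent only ever sends a placeholder token (all values here are made up).
broker = CredentialBroker(secrets={"api.example.com": "sk-real-secret"})
agent_headers = {"Authorization": "Bearer placeholder", "Accept": "application/json"}
outbound = broker.prepare("GET", "https://api.example.com/v1/items", agent_headers)
```

The design choice worth noting is that the allowlist and the secret live in the same place. A prompt-injected agent can ask for any URL it likes, but it can only obtain a credentialed request for hosts the broker already knows about, and every attempt leaves an audit entry.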

DeepSeek V4 and the open-weights reset

DeepSeek V4 dropped on HuggingFace and immediately moved the open-weights frontier. DeepSeek-V4-Pro-Base went up alongside V4-Flash-Base, and MIT Tech Review's writeup focuses on the new attention design enabling substantially longer prompts. SGLang published a Day-0 inference-and-verified-RL stack for V4 on the day of release, which is the more telling fact. The serving infrastructure was ready before the weights cooled. For anyone whose enterprise pitch depends on "frontier reasoning behind a closed API," V4 is the second annual reminder that the gap is not stable.

The coding-assistant economy is repricing itself in real time. GitHub restructured Copilot Individual plans. Anthropic's Claude Code pricing flirted with a $100/month tier, then walked it back amid confusion about what users were actually paying for. AWS brought Claude Cowork into Bedrock for enterprise gateways. The signal underneath the pricing churn is that vendors are trying to find the price at which a coding agent looks like a workstation and not a SaaS seat. The fact that Anthropic is reaching for triple digits per developer per month tells you how much compute these agents are actually burning. It also tells you the seat-license model is in trouble.

The verification layer is where the smart money is going. Qodo raised $70M for code verification. OpenAI's Agents SDK gained native sandbox execution. Anthropic launched Claude Managed Agents. Charlie Labs pivoted from building autonomous coding agents to building cleanup tools for the technical debt those agents generate. Every link in that chain points to the same conclusion. Generating the code is the easy part. Verifying it, sandboxing it, and cleaning it up afterward is where the unit economics live. If you are evaluating coding-agent startups, ask where the failure modes go and who pays to fix them.

Microsoft's TRELLIS.2 collapsed the image-to-3D pipeline onto consumer hardware in a single release. The model is 4B parameters, generates 1536-cubed PBR outputs with 16x spatial compression, and has an Apple Silicon port that runs without CUDA or flash-attn. A ComfyUI workflow appeared within days. Zero-to-CAD followed in the same week, agentically synthesising interpretable CAD programs at million-scale without real training data. The combination matters because the two main bottlenecks in 3D-from-image generation, namely the hardware tax and the data-licensing problem, both moved in the same week. Game-asset pipelines, 3D-printing services, and CAD vendors who built their moats on the assumption that high-quality 3D-from-image generation needs an H100 cluster should re-check that assumption.
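To get a feel for why the compression figure matters on consumer hardware, here is a back-of-the-envelope calculation. It assumes the 16x figure applies per axis of a cubic grid, which is our reading of the headline numbers and not something confirmed here. The helper `voxel_counts` is purely illustrative.

```python
def voxel_counts(resolution: int = 1536, per_axis_compression: int = 16) -> tuple[int, int]:
    """Return (full-resolution cells, compressed-grid cells) for a cubic grid.

    Assumption: the compression factor is applied independently along each axis.
    """
    latent_side = resolution // per_axis_compression
    return resolution ** 3, latent_side ** 3


full, latent = voxel_counts()
print(f"{full:,} cells at full resolution")   # 3,623,878,656
print(f"{latent:,} cells after compression")  # 884,736
```

Under that assumption the model reasons over a grid roughly 4,096 times smaller than the output it produces, which is the kind of ratio that moves a workload off a cluster.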

LLM-generated CVE reports are now actively degrading Linux kernel review. LWN documented kernel maintainers excising legitimate code to triage a flood of model-generated security reports. The triage burden has crossed the threshold where it is cheaper to remove code than to argue with a stochastic process. This is a different kind of supply-chain attack. Nobody is exfiltrating credentials. The damage is to the human attention budget of the people who maintain the most critical software in the world. Mitigations remain unclear, and the incentive gradient runs the wrong way. Filing more reports is cheap, reviewing them is expensive, and the production function for "plausible but wrong" CVE reports just got better. If you depend on Linux kernel velocity, this is now a structural risk.

On our radar

LeRobot RCE with no patch. HuggingFace's LeRobot has a critical unauthenticated RCE that was still unpatched at the end of the week. LeRobot is a popular reference framework for robot policy work, so research labs and small robotics startups are now running it with a live exploit path. If you have a LeRobot install reachable from anywhere except localhost, that needs attention this week.
AI-assisted phishing kits are productising. Bluekit, a new phishing-as-a-service offering, ships with an AI assistant and 40 templates out of the box. The pattern of LLMs as a feature inside criminal tooling is no longer notable on its own. What is notable is the productisation curve, with onboarding flows and template libraries pitched the same way legitimate SaaS does. Defender automation needs to assume this is the baseline now.
OCR and document-AI consolidation continues. Last month we flagged the shift from traditional OCR pipelines to vision-language models. This week's data shows continued momentum on the agentic-document side, with new architectures for optical context retrieval and native file formats designed for agent traversal rather than human reading. The transition from "text extraction" to "document-as-agent-input" is the medium-term direction.

Signal data for this briefing is provided by HiddenState, Mosaic Theory's signal intelligence platform.

— Cosmo