Anthropic ships managed agents, MiniMax goes open source, and Sam Altman faces security threats

Published on 13.04.2026

AI & AGENTS

Anthropic launches managed agents and CoreWeave powers Claude at scale

TLDR: Anthropic introduced managed agents in public beta, letting developers deploy production AI applications without building the infrastructure themselves. CoreWeave simultaneously secured a multibillion-dollar agreement to supply Nvidia GPU cloud infrastructure for running Claude at production scale.

The infrastructure layer of AI is maturing fast. Anthropic's managed agents abstract away the orchestration, scaling, and reliability concerns that have forced teams to build custom agent frameworks on top of Claude's API. You hand Anthropic the agent definition and they handle the execution pipeline. This matters because most companies do not want to run agent infrastructure. They want the outcomes, not the operational burden of maintaining state machines, retry logic, and context windows across multi-step agent workflows.
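To make that operational burden concrete, here is a minimal sketch of the plumbing teams currently hand-roll on top of the API: a multi-step loop with explicit state, retries with backoff, and context trimming. The workflow scaffolding and the completion signal are hypothetical; the client calls follow Anthropic's standard messages API, and the model ID is whatever is current when you run it.

```python
# A minimal sketch of the hand-rolled plumbing managed agents aim to replace.
# The workflow logic is illustrative; the API calls are the standard
# anthropic-python messages interface.
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_step(messages, retries=3):
    """Call the model with naive exponential backoff on transient errors."""
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",  # substitute a current model ID
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.APIStatusError:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s...

def run_workflow(task, max_steps=5):
    """Drive a multi-step workflow, carrying state across turns."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        response = run_step(messages)
        text = response.content[0].text
        messages.append({"role": "assistant", "content": text})
        if "DONE" in text:  # hypothetical completion signal
            break
        messages.append({"role": "user", "content": "Continue."})
        # Real systems also trim or summarize messages here to stay
        # within the context window.
    return messages
```

Every line of that is undifferentiated plumbing, which is exactly the pitch for handing it to a managed service.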

CoreWeave's deal underlines how GPU availability remains the bottleneck. Anthropic is not just building better models; they are locking down the compute to run them at scale. CoreWeave has positioned itself as the GPU cloud provider of choice for AI companies that need more than what the hyperscalers can reliably offer. The question I keep coming back to is whether managed agents will actually reduce the complexity of building reliable AI systems or just move it to a layer you cannot inspect. When your agent pipeline breaks in production, debugging someone else's managed service tends to be harder than debugging your own code, even if your own code is a mess.

What is missing from this announcement is any discussion of observability and debugging tooling. Agent systems are inherently nondeterministic. When a managed agent makes a wrong decision in production, you need traces, you need replayability, and you need to understand the decision chain. Anthropic has not addressed how developers will debug these systems when things go sideways, and that is the part that actually keeps engineers up at night.
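The tooling that gap implies is not exotic. A sketch of minimal trace capture, assuming nothing about Anthropic's internals: every decision point gets appended to a trace keyed by a single ID, which is enough to replay the chain offline when something goes sideways.

```python
# An illustrative sketch (not an Anthropic feature) of trace capture for
# agent systems: record each step's inputs and outputs under one trace ID
# in an append-only JSONL file, then reload the file to replay the chain.
import json
import time
import uuid

class AgentTracer:
    def __init__(self, path):
        self.trace_id = str(uuid.uuid4())
        self.path = path

    def record(self, step, inputs, output):
        """Append one decision point to the trace."""
        event = {
            "trace_id": self.trace_id,
            "step": step,
            "timestamp": time.time(),
            "inputs": inputs,
            "output": output,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")

def replay(path):
    """Reload a trace to step through the decision chain offline."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```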

Anthropic Managed Agents and CoreWeave GPU Deal

MiniMax releases self-improving open source model M2.7

TLDR: MiniMax open-sourced M2.7, a Mixture-of-Experts model that achieved 56.22 percent on SWE-Pro after autonomously completing 100 rounds of self-improvement. The model iteratively refines itself without human intervention.

A self-improving model is the kind of announcement that sounds impressive until you look at what self-improvement actually means. M2.7 runs through cycles of generating code, evaluating its own output, and adjusting its weights based on performance metrics. The 56.22 percent score on SWE-Pro is notable because software engineering benchmarks are harder to game than general language benchmarks. The model is genuinely getting better at solving real coding tasks through its own feedback loop.
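The loop described above reduces to a schematic like the following. To be clear, this is not MiniMax's published training code; generate, evaluate, and update stand in for the real model, scorer, and weight-update machinery.

```python
# A schematic of the generate-evaluate-update cycle described above.
# Not MiniMax's training code: generate, evaluate, and update are
# placeholders for the real model, scorer, and optimizer.
from typing import Callable

def self_improvement_loop(
    generate: Callable[[str], str],         # task -> candidate solution
    evaluate: Callable[[str, str], float],  # (task, solution) -> score in [0, 1]
    update: Callable[[list], None],         # train on passing pairs
    tasks: list[str],
    rounds: int = 100,
) -> None:
    """Each round: generate solutions, score them automatically (for code,
    typically by running test suites), and update weights on the passes."""
    for _ in range(rounds):
        attempts = [(task, generate(task)) for task in tasks]
        scored = [(t, s, evaluate(t, s)) for t, s in attempts]
        passing = [(t, s) for t, s, score in scored if score >= 1.0]
        update(passing)  # successful outputs become the next training signal
```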

The open source angle is what makes this interesting beyond the raw numbers. When self-improvement techniques are available to anyone with enough GPUs, the competitive advantage shifts from model architecture to compute resources and evaluation methodology. The community can now study how M2.7 improves itself, replicate the approach, and potentially find failure modes that the original authors did not anticipate.

I am skeptical about the claims, though. Self-improvement loops have a tendency to overfit to their own evaluation criteria. A model that gets better at SWE-Pro might just be getting better at the specific patterns in that benchmark rather than developing general software engineering capability. The authors do not address whether the improvement transfers to tasks outside the benchmark distribution, and that gap between benchmark performance and real-world capability is where most self-improvement claims fall apart.

MiniMax M2.7 Open Source Release

Perplexity connects to 12,000 financial institutions through Plaid

TLDR: Perplexity's AI assistant now integrates with Plaid to access over 12,000 financial institutions, enabling users to track spending, manage loans, and monitor net worth through natural language queries.

Plugging an AI assistant into your financial data is the kind of feature that sounds convenient until you think about the trust model. Perplexity is asking users to grant an AI system read access to their banking information so it can answer questions about spending and loans. The convenience factor is real: being able to ask your AI about your financial situation in plain language is genuinely useful. But the security implications are substantial.

The integration itself is technically straightforward. Plaid already provides the API layer connecting thousands of institutions to fintech applications; Perplexity is adding an AI interface on top of existing financial data infrastructure. The interesting question is not whether they can do it, but whether users will trust an AI company with banking access the way they already trust dedicated banking apps.
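For context, here is roughly what that layer looks like: a minimal sketch against Plaid's documented /transactions/get endpoint, run in the sandbox environment. The credentials and access token are placeholders, and nothing here reflects how Perplexity's integration is actually wired.

```python
# A minimal sketch of the Plaid layer the integration sits on, using the
# documented /transactions/get endpoint in the sandbox environment.
# client_id, secret, and access_token are placeholders.
import requests

PLAID_HOST = "https://sandbox.plaid.com"

def get_transactions(client_id, secret, access_token, start_date, end_date):
    """Fetch transactions for a linked account over a date range."""
    resp = requests.post(
        f"{PLAID_HOST}/transactions/get",
        json={
            "client_id": client_id,
            "secret": secret,
            "access_token": access_token,
            "start_date": start_date,  # "YYYY-MM-DD"
            "end_date": end_date,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["transactions"]
```

Every question an AI assistant answers about your spending ultimately resolves to calls like this one, which is why the retention questions below matter.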

What the announcement glosses over is the data retention and training question. Does Perplexity use your financial queries to improve its models? How long does it retain your transaction data? Can you ask it financial questions without that information becoming part of its training corpus? These are not peripheral concerns. They are the central question for any AI product handling sensitive financial data.

Perplexity Plaid Integration

Google merges Notebooks with Gemini and NotebookLM

TLDR: Google launched Notebooks inside the Gemini app with bidirectional sync to NotebookLM, letting users organize research, conversations, and files across both products in a unified workspace.

Google keeps rearranging its AI productivity tools, and this latest move consolidates note-taking and research organization under the Gemini umbrella. The bidirectional sync between Notebooks and NotebookLM means changes in one propagate to the other, which sounds simple but is actually a nontrivial synchronization problem when you consider that NotebookLM generates its own AI-augmented content from your notes.
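To see why, consider the simplest possible strategy, last-writer-wins. The sketch below is illustrative rather than anything Google has described: when both replicas of a note change between syncs, one side's edits are silently dropped, and regenerated AI content makes even the question of which side is newer ambiguous.

```python
# An illustrative sketch (not Google's implementation) of why bidirectional
# sync is hard: naive last-writer-wins drops one side's edits whenever both
# replicas changed between syncs.
from dataclasses import dataclass

@dataclass
class Note:
    note_id: str
    content: str
    version: int       # incremented on every local edit
    updated_at: float  # wall-clock time of the last edit

def merge(local: Note, remote: Note) -> Note:
    """Last-writer-wins merge between two replicas of the same note."""
    if local.version == remote.version:
        return local  # no divergence since the last sync
    # Both sides edited: keep the newer copy and lose the other side's
    # changes. Avoiding this takes three-way merges or CRDTs, and
    # AI-regenerated content complicates even those, since "newer" may
    # just mean "most recently regenerated".
    return local if local.updated_at >= remote.updated_at else remote
```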

The bigger picture here is Google's strategy of embedding AI research tools directly into its conversational AI interface. Instead of separate products for chatting and researching, users get a single workspace where conversations, notes, and files interconnect. This is the direction all AI assistants are heading. The ones that win will be the ones that feel like a natural extension of how you already think and work, not a separate tool you have to switch context to use.

What concerns me is the lock-in. When your research, notes, and AI conversations all live in Google's ecosystem with bidirectional sync, leaving becomes genuinely difficult. Your AI-augmented research artifacts are not portable. The value compounds within the platform, which is great for Google and increasingly complicated for users who want to maintain data sovereignty.

Google Notebooks and NotebookLM Integration

Sam Altman faces second home attack in 48 hours

TLDR: OpenAI CEO Sam Altman's San Francisco home was targeted for a second time in two days, with a passenger in a passing vehicle appearing to fire at the property.

This is a security story, not a technology one, but it reflects something the AI industry does not talk about enough. The leaders of major AI companies have become high-profile targets, and the security implications extend beyond personal safety. When AI company executives face physical threats, it affects how those companies operate, where they locate, and who feels comfortable working there.

The concentration of AI companies in specific geographic areas creates single points of failure that go beyond technical infrastructure. Talent, leadership, and operations are all clustered in places where physical security is becoming a real concern. This has downstream effects on hiring, office location decisions, and the overall resilience of AI companies as organizations.

Sam Altman Home Attack