Enterprise AI Costs, Job Market Realities, and the Node.js Streams Trap
Published on 26.05.2026
Microsoft's Quiet Claude Code Retreat and the Real Cost of Enterprise AI
TLDR: Microsoft is killing most Claude Code licenses in its Experiences and Devices division, redirecting engineers to GitHub Copilot CLI by June 30. The driver is cost, not philosophy. Token-based pricing for agentic AI tools is breaking enterprise budget models in ways that per-seat licensing never did.
This one caught my attention because it puts a number on something we've all been hand-waving about. Uber's CTO reportedly said his company burned through its entire 2026 AI coding budget in four months. Individual engineers were spending between $500 and $2,000 per month on tokens alone. That's not a rounding error in a software budget; that's a line item that triggers a conversation with finance.
The structural problem here is that token-based pricing behaves like a metered utility. Think of it like your water bill versus your Netflix subscription. Per-seat SaaS is Netflix: predictable, budgetable, easy to forecast. Tokens are water: usage spikes without warning, especially when you give agentic systems more autonomy. And the uncomfortable truth is that agentic systems don't consume fewer tokens as they get better; they consume more, because they do more per task.
Gartner placing generative AI in the trough of disillusionment tracks with what I'm hearing from teams everywhere. The experimental "absorb costs for learning" phase that kicked off in 2023 is over. Enterprises are moving toward capped budgets, tiered access, and AWS-style billing models. If you're building an AI-assisted workflow for your team right now, you need to model token consumption as seriously as you model database query costs.
The Microsoft move is less about Claude being bad and more about consolidation under a single toolchain they already own. GitHub Copilot is a known quantity with predictable pricing. That's a CFO argument, not an engineering one, and CFO arguments tend to win.
Key takeaways:
- Agentic AI tools scale token usage with task complexity, making costs unpredictable and often shocking at enterprise scale
- Token-based billing is fundamentally different from per-seat licensing and requires different budget planning
- Enterprise AI toolchain consolidation is accelerating, with cost predictability winning over capability breadth
Why do I care: This is the moment where the "AI will just be free eventually" optimism collides with actual financial reality. As architects and senior engineers, we need to start treating AI tool costs as infrastructure costs, with the same scrutiny we give to cloud compute. If you're not tracking token burn per team or per workflow, you're flying blind, and at Uber-scale that blindness costs millions.
Microsoft's quiet Claude Code retreat and the real cost of enterprise AI
Retrospective on My Job Search
TLDR: A senior frontend engineer and former VP of Engineering spent five months applying to 60 jobs, got rejected without an interview 78% of the time, and was ghosted by 23% of companies. The search eventually ended at a company called Duna, praised specifically for running a humane hiring process.
Read this one slowly if you're hiring, because the numbers are genuinely grim. Of 60 applications, only 13 led to any conversation at all. That's a 22% response rate in a field where, not five years ago, a senior engineer with executive leadership experience would have had recruiters in their inbox constantly. Something has changed structurally in how companies filter candidates, and the culprit the author points to is AI-automated screening.
When your resume is being evaluated by a system optimized to find keyword matches and eliminate outliers, someone who has done both IC frontend work and VP-level leadership looks confusing rather than impressive. The system doesn't know what to do with you, so it rejects you. This is a real problem and it's getting worse as more companies bolt AI onto their ATS pipelines without thinking carefully about what signals those systems actually capture.
The over- and under-qualified paradox the author describes is one I've heard from many people making IC comebacks after leadership stints. You're too expensive for the junior teams and too rusty (in the system's view) for the senior ones, even if the reality is that a year as VP of Engineering teaches you more about what makes code maintainable than most senior engineers learn in five years of pure IC work.
What gives me some hope here is the ending: the company that actually hired this person ran a hiring process described as humane. That's not a common word choice and it says something important. Good hiring processes still exist, they're just rarer than they should be. If you're building a hiring process right now, the bar for "humane" is apparently low enough that meeting it makes you stand out.
Key takeaways:
- AI-automated screening is narrowing the funnel in ways that actively disadvantage experienced candidates with non-linear careers
- The 78% no-interview rejection rate suggests most applications never reach a human reviewer
- Companies with thoughtful, respectful hiring processes are a competitive advantage in attracting experienced talent
Why do I care: The frontend job market in 2026 is harder than it was in 2022 and the tooling companies are selling to recruiters is making it harder in specific, fixable ways. If your team is using AI screening, I'd strongly recommend auditing what it's actually filtering out. The candidates it rejects most aggressively are often the most interesting ones.
Retrospective on My Job Search
How soon is now in PostgreSQL?
TLDR: PostgreSQL's now() returns the transaction start time, not the wall clock, and it stays frozen for the entire transaction duration. This caused a subtle distributed locking bug in the Emmett framework where retry logic became useless because the timeout predicate always evaluated the same stale timestamp.
This is one of those bugs that I love reading about precisely because it's the kind of thing that passes every reasonable code review and only surfaces in production under a specific combination of conditions. The bug: a stored procedure was being called in a retry loop, all inside a single transaction. The procedure checked now() to see if a previous processor owner had gone stale. Because now() is frozen at transaction start, no matter how many retries you ran, the staleness check always saw the same timestamp. Retries were effectively a no-op.
The fix is clock_timestamp(), which reads the actual wall clock on every invocation regardless of transaction boundaries. One word change, completely different semantics. But knowing which word to reach for requires understanding a distinction that PostgreSQL doesn't exactly advertise loudly in beginner documentation.
What I find more interesting than the fix is the post-mortem on why the tests missed it. Unit tests for the stored procedure worked fine because they only tested the procedure in isolation, not inside a retry policy wrapping a stale row scenario. End-to-end tests didn't hit the specific combination of crash, new instance ID, and retry timeout. The seam between those two test levels is where the bug lived, invisible to both.
This is a broader lesson about distributed systems testing. The interactions between your retry policy, your timeout logic, and your time source are exactly the kind of thing that integration tests at a higher level need to cover explicitly. Writing a test that says "what happens when a node crashes, a new one comes up, and the retry fires within the stale window" isn't glamorous work, but it's the work that prevents this class of bug.
Key takeaways:
now()in PostgreSQL is transaction-scoped and frozen; useclock_timestamp()when you need actual wall clock time inside a long-running transaction- Test seams matter: bugs often live in the interaction between components, not within any single component tested in isolation
- Retry logic with time-based predicates needs explicit testing at the retry-plus-stale-state boundary, not just at the stored procedure level
Why do I care: I've seen variants of this PostgreSQL time trap bite teams more than once. The now() vs clock_timestamp() distinction is genuinely non-obvious and the consequences in distributed locking scenarios can be severe: you think you have a working failover mechanism and you actually have a system that silently refuses to fail over. Worth adding to your PostgreSQL onboarding checklist.
How soon is now in PostgreSQL?
Your Node.js Streams Aren't Backpressuring. They're Silently Eating Your Memory.
TLDR: Node.js streams don't automatically enforce backpressure, and most code ignores the cooperative protocol that makes them memory-safe. The result is unbounded heap growth that only shows up at scale, usually in production, usually at the worst possible time.
The core issue is one of those things that's technically documented but rarely internalized: .write() returns false when the internal buffer is full, but nothing in the runtime stops you from calling it again immediately. You have to check the return value yourself and wait for the drain event before writing more. If you don't, the buffer grows until your process runs out of memory. Node.js 22 bumped the default highWaterMark by 4x, which made this problem quieter (your buffer can absorb more before signaling) without making it go away.
The objectMode detail is worth flagging specifically: when streams operate on objects rather than bytes, the highWaterMark counts objects, not bytes. A stream of large objects can silently consume enormous amounts of memory while keeping a perfectly innocent-looking buffer count. This is the kind of thing that makes production memory debugging feel like a mystery novel where the clues are all in the appendix.
The .pipe() section should permanently change how you write stream code. .pipe() swallows errors and leaks file descriptors. It's been the default example in Node.js documentation for years and it is not safe for production use in any code path that handles large data or error conditions. pipeline() from node:stream/promises does the same job with proper error propagation and cleanup. If you have .pipe() anywhere in code that handles file uploads, large exports, or streaming responses, that's a refactor worth doing.
The final section on connection pool starvation is something I hadn't thought through explicitly before. Once you correctly implement backpressure, you expose a secondary problem: slow clients hold database cursors open for longer, which can exhaust your connection pool. The solution space includes query timeouts, dedicated worker pools, and queue-based offloading for large exports. Backpressure doesn't eliminate resource contention; it moves it downstream.
Key takeaways:
- Backpressure in Node.js streams is opt-in: you must check
.write()return values and awaitdrain, or your memory usage is unbounded - Replace
.pipe()withpipeline()fromnode:stream/promisesin any production code that needs reliable error handling and cleanup - Correct backpressure implementation can expose database connection pool starvation; plan for that as a secondary effect
Why do I care: Large file exports and streaming APIs are standard features in most production systems and the Node.js stream API has been quietly dangerous for years because the safe path requires understanding implementation details most developers never encounter in tutorials. This post is the kind of documentation that should be required reading before anyone ships a streaming endpoint to production. Bookmark it and share it with your team.
Your Node.js Streams Aren't Backpressuring. They're Silently Eating Your Memory.