Published on 03.02.2026
TLDR: Small teams adopt AI dev tools based on how developers feel about them during two-week trials; mid-to-large companies wrestle with security reviews and vendor lock-in concerns; and across every company size, measuring actual productivity improvement remains unsolved despite growing AI tool adoption.
The developer tooling ecosystem has undergone a seismic shift in just eighteen months. What was once a straightforward decision—buy GitHub Copilot and use ChatGPT—has exploded into a complex landscape with tools like Cursor, Claude Code, Codex, and Gemini CLI competing for attention. Add to that the emergence of specialized AI code review tools like CodeRabbit, Graphite, and Greptile, not to mention the growing ecosystem of MCP integrations powering agentic tools, and you're looking at a decision matrix that makes enterprise procurement committees sweat.
This explosion of choice, while offering developers genuine improvements in capability and velocity, has created a new problem: how do you actually choose the right tools for your specific context? A 2024 AI tooling survey showed Copilot and ChatGPT dominating mentions, but that dominance has fractured completely. Today's engineering leaders face an overwhelming array of options, each with different strengths, different pricing models, and different integration capabilities.
The real insight from surveying ten tech companies—ranging from a five-person seed-stage startup to a publicly listed 1,500-person fintech—is that there is no universal answer. What works brilliantly for one organization is often loathed in another. More importantly, the selection criteria, decision velocity, and measurement approaches vary dramatically based on company size and maturity.
For small teams with fewer than sixty engineers, AI tool adoption is remarkably straightforward. Decision-making is fast, informal, and heavily influenced by how individual developers feel about the tools. Trial periods are typically just two weeks, and individual developers have outsized influence on whether something "sticks" or gets replaced.
At a seed-stage logistics startup with five engineers, the head of engineering describes their approach as refreshingly high-trust: "We agreed to try new tools for 2 weeks and see how everyone felt. We didn't use any hard-and-fast measurement. TLDR: I trust our devs and their opinion is a big part of this." This isn't recklessness; it's pragmatism. When a team is small enough, developers can quickly assess whether a tool genuinely improves their workflow or just creates friction.
For code review, this startup tried Korbit first, but it "felt off" to the developers. Within a few days of testing CodeRabbit, they noticed developers actually embracing its suggestions, whereas Korbit's had been ignored. That signal—developer adoption without a mandate—was sufficient. The tool stuck.
The broader tooling evolution at this startup illustrates how interconnected modern developer workflows have become. They use Figma for design, Linear for ticketing with cross-functional UX collaboration, Claude Code and Cursor for coding (connected to Linear via MCP), and CodeRabbit for code review. Claude Code even writes tickets, creating a continuous context flow downstream to the code review tool. They've explicitly separated company-wide tools (everyone uses Claude and CodeRabbit) from personal environments (IDE choice, terminal setup remain individual choices).
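For readers curious what that Claude Code–Linear wiring involves in practice, here is a minimal sketch. It assumes Claude Code's `claude mcp add` command and Linear's hosted MCP server; the transport flag and endpoint URL are assumptions that may have changed, so treat this as illustrative rather than a record of the startup's actual configuration.

```
# Assumption: register Linear's hosted MCP server with Claude Code so the agent
# can read and write tickets; verify the flag and URL against the current
# Claude Code and Linear MCP documentation before relying on this.
claude mcp add --transport sse linear https://mcp.linear.app/sse
```

With a connection like this in place, a prompt such as "write a Linear ticket for this bug, then open a PR that references it" lets ticket context flow downstream into the code review step described above.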
Interestingly, when they had a developer who refused to use either Cursor or Claude Code, they didn't force adoption. But the signal was unmistakable: everyone else seemed to ship more code with better quality. That organic peer pressure, based on visible results rather than edict, eventually drove adoption.
A Series A startup with fifteen engineers shows similar patterns but with a twist. They discovered that increased code output—enabled by Claude Code and Cursor—was creating code review bottlenecks. Pre-Opus 4.5, quality was suffering. Their evaluation of code review tools (Cursor's Bugbot, Graphite, and Greptile) led them to Greptile, which offers confidence scoring on reviews. Notably, they maintain extensive Agents.md and Claude.md documentation files that serve as a single source of truth, usable by their AI coding tools, code review tools, and developers alike. This organizational clarity makes tool integration and adoption more effective.
At the larger end of the small-company spectrum, a Series D observability company with sixty engineers found that Claude Code became "the most definite value-add" among the tools they tried. Tellingly, non-engineers—product managers, solutions engineers, technical account managers—adopted Claude Code even more enthusiastically than the median engineer, using it to open PRs directly for customer bug fixes. This signals that the tool's utility extends beyond traditional coding tasks into knowledge work automation more broadly.
Everything changes at the 150-plus engineer scale. Tool decisions stop being about gut feel and start becoming entangled with security reviews, compliance requirements, executive budgeting decisions, and sometimes existing vendor relationships that lock you in.
An EU-based software company with 500 people illustrates the perils of leadership declaring an AI-first strategy without implementation planning. In summer 2025, leadership announced an AI-first pivot and issued Copilot Business subscriptions to anyone who requested them. Problem solved, right? Absolutely not. Developers immediately asked about alternatives. Cursor had supporters. Claude Code had early adopters. Gemini CLI had interested parties. But the organization became gridlocked. Legal and IT couldn't align on governance. The EU AI Act created regulatory uncertainty. Six months later, they were stuck, unable to approve new tools, while developers were quietly paying for preferred tools out of pocket. Meanwhile, their default Copilot configuration was still running GPT-4.1 from ten months prior. Nobody knew if they could change models or enable coding agents. This created a vicious cycle: the approved tool felt underwhelming, which suppressed adoption and made it harder to justify further investment.
A cloud infrastructure company with 900 people experienced similar tensions. They started with Copilot (easy to procure because they were already an M365 customer), but switching to Cursor took forever. Pricing creates constant friction. Their executives repeatedly ask "why aren't we on Claude Code?" but when presented with Claude's team plan at roughly $150 per month versus Cursor at $65 versus Copilot at $40, the C-level hesitates. The principal engineer responsible for AI tooling worries about cost escalation: "Claude Code and Codex are definitely eating the costs right now… we all know that won't last. If my execs push me on this, I will need to say — okay, our developers got much slower in 6 months, but now we need to pay $250 per month, per developer, to get higher limits."
Vendor lock-in concerns also weigh heavily. A public travel company with 1,500 people and 800 engineers explicitly stated they want to avoid being locked into a single solution, so they're cautiously evaluating alternatives to Copilot while remaining skeptical about the per-engineer costs of tools like Claude Code.
Security requirements add another layer. A 2,000-person tech company focused on productivity identified security and compliance as the biggest hurdle: dev tool startups frequently don't prioritize either until they reach late Series A or B. This company evaluates tools through a structured process: word-of-mouth from peers at other companies, community chatter on social platforms, a deliberate effort to cut through the hype, and, crucially, two-week beta trials before deciding whether to expand or terminate. Every tool must move a measurable metric; those directly impacting existing metrics get a faster approval path.
Here's what unites every single company in this research regardless of size: nobody has solved the problem of measuring whether AI tools actually work.
Executives want data. Engineers distrust most available metrics. Vendors' own measurement approaches are mostly useless.
The EU-based software company debated what to measure and concluded that every option ranged from bad to worse. "Lines of code written by AI" creates perverse incentives and is doubly problematic because it says nothing about whether those lines create actual business value. Some of the most valuable AI uses—research, ideation, debugging, learning—don't show up as generated code at all. Yet this company eventually defaulted to measuring lines of code generated by Copilot, an "official" metric that developers despised. They hated it because it relies on telemetry from specific IDEs, so using Copilot CLI or maxing out premium requests doesn't even register in the measurement system.
The principal engineer at the cloud infrastructure company was blunt: "My engineering org is getting hooked on AI, but execs want metrics on value-add. I don't want to push vanity usage metrics just to justify spend, but outside of vanity metrics, I have nothing of value to show!" He dismissed commercial developer productivity vendors' approaches as circular logic: "They are just DORA plus velocity metrics combined with whatever they can extract from APIs. Sure, it looks good on paper: Team A is faster and they use AI. But is AI usage and speed actually correlated?" The fundamental question remains unanswered: "How can we make effective use of our AI agent subscriptions?"
Wealthsimple, a Canadian fintech with 600 engineers, offers a counter-narrative. Their CTO, Diederik van Liere, led a 2-month structured evaluation for code review tools, running a "shootout" process that resulted in selecting Graphite. For AI coding tools, the decision was more conviction-driven, but backed by usage data from Jellyfish. They rolled out Claude Code across their engineering organization.
The specificity of their evaluation process—running actual comparative measurements across multiple tools—stands in sharp contrast to other organizations' struggles. This structured approach is time-consuming, but it produces data-driven confidence in tool selection rather than a hope that things will work out.
Several consistent patterns emerge across all these organizations:
Developer trust drives adoption more than mandates. You can't force teams to use tools they don't believe in. The developers who visibly ship more code and maintain better quality with a particular tool become organic advocates.
There's a well-trodden migration path: Copilot → Cursor → Claude Code. This reflects the evolution of tool capability and the pain points developers experience with each generation.
Nobody has cracked productivity measurement. This is the elephant in the room. Every organization is flying somewhat blind on whether they're getting actual return on investment in AI tools, despite widespread adoption.
Vendor relationships and existing technology stacks matter more than you'd expect. Being already locked into Microsoft (M365) made Copilot a natural default. But this creates path dependency that slows adoption of potentially better tools.
Different company sizes require different decision-making approaches. Small teams can move fast based on developer preference. Large organizations need structured evaluation processes, security reviews, and formal approval mechanisms. The middle is messy—you have enough structure to slow decisions but not enough discipline to make them efficiently.
Deepdive: How 10 Tech Companies Choose the Next Generation of Dev Tools