AI as Amplifier and the Will Smith Spaghetti Benchmark

You Have to Care - Because the Amplifier Won't

TLDR: AI is an amplifier that makes good work and bad work louder in equal measure. The mechanism has no opinion about where you point it, so the responsibility for taste, judgment, and caring about the outcome stays with you. The piece sits against a backdrop of wild contradictions in the market: the software sector shedding $800 billion in market cap over three months while AI companies raise money faster than ever.

Summary: I like this framing because it cuts through a lot of the noise around whether AI is making engineers better or worse. The author's argument is that the tooling is fundamentally neutral. Point it at a clear problem with a clear sense of what good looks like, and it amplifies that into something genuinely productive. Point it at sloppy thinking, vague requirements, or work you don't actually understand, and it amplifies that too, except now the mess ships faster and louder. The amplifier doesn't know the difference, and it never will. That part of the job stays human.

The article opens with a set of deliberately contradictory headlines. The software sector lost eight hundred billion dollars in market capitalization in three months. AI companies are raising money faster than at any point in history. Programmers have never been more productive, and programmers are losing productivity. Everyone is hiring and nobody is hiring. The point of stacking these against each other is to show that the macro story is genuinely confused right now, and that you can find evidence for almost any narrative you want to believe. The author's response is to stop trying to read the tea leaves and instead focus on the one variable you control, which is whether you actually care about the quality of what you produce.

What I find compelling is the implied warning to teams that are treating AI as a way to stop thinking. If you offload judgment to the model, you don't get a smaller version of yourself, you get an amplified version of whatever process you fed it. A weak code review culture amplified by AI becomes a firehose of plausible-looking pull requests that nobody can actually vet. A strong one becomes a team that ships more without lowering the bar. The tool didn't decide which of those you got. You did, long before you opened the editor.

Where the piece is a little thin is on the mechanics. It leans hard on the amplifier metaphor but doesn't dig much into what caring concretely looks like day to day, beyond the general exhortation. The author is also avoiding the uncomfortable possibility that for some kinds of work, the amplified mediocre output is commercially good enough, and the market will reward speed over care regardless of how any individual engineer feels about it. That tension between craft and what actually gets funded is the harder question lurking under the optimism.

Key takeaways:

AI tooling is neutral; it amplifies the quality of your inputs, your process, and your judgment rather than supplying any of those itself.
The current market sends contradictory signals, so chasing the macro narrative is a distraction from the work you can control.
Offloading judgment to a model doesn't reduce your responsibility, it raises the volume on whatever process you already had.

Why do I care: For anyone running a frontend team or owning architecture decisions, this is a useful gut check on how you adopt AI assistance. The risk isn't that the model writes bad code, it's that it lets a team scale a weak review culture or fuzzy requirements faster than anyone can catch. If you're a senior dev, your leverage now is less about typing speed and more about being the person who still cares about correctness, taste, and whether the thing should exist at all. I'd treat this as a prompt to invest in review discipline and clear specs before you hand the amplifier to a junior who hasn't built that muscle yet.

You Have to Care - Because the Amplifier Won't

How Will Smith Eating Spaghetti Became a Benchmark for AI Video Progress

TLDR: The cursed 2023 clip of an AI-generated Will Smith devouring spaghetti has quietly turned into an informal stress test for text-to-video models. Because everyone remembers how grotesque the original was, regenerating the same prompt over the years gives a vivid, intuitive read on how fast the technology is improving.

Summary: This one is more fun, but there's a real point underneath. Back in 2023, text-to-video generation was still visibly primitive. Image generation had already gone mainstream and produced convincing stills, but video stayed unstable, slippery, and strange. The original Will Smith spaghetti clip was funny precisely because it was bad. Faces melted, the fork phased through the bowl, the pasta behaved like a living organism. It became a meme because it captured exactly how far the technology had to go.

The clever twist is that the clip stayed bad in very specific, memorable ways, and that consistency is what made it useful. When a meme is burned into everyone's memory, regenerating the same prompt on a newer model gives an instant before-and-after that needs no chart or metric. You just look at it. Does the spaghetti still writhe? Does Will Smith still have the wrong number of teeth? The article treats this as an accidental benchmark, a shared cultural reference point that tracks progress in a way formal evaluations struggle to communicate to a general audience.

What makes the spaghetti test genuinely hard, and therefore a decent informal probe, is the combination of challenges packed into it. You've got a recognizable human face that has to stay consistent frame to frame, fine motor action with a fork and mouth, the physics of a messy deformable food, and increasingly the alignment of audio with the chewing motion. Those are exactly the things that early models flubbed. The topic tags on the piece point at multimodal generation and audio-video alignment, which signals that the 2026 version of this test is no longer just about whether the picture looks right, but whether the sound and motion agree.

The weakness here, and the article doesn't quite confront it, is that a meme is not a rigorous benchmark. It measures vibes. A model could nail the spaghetti clip because that specific scene leaked into training data while still failing on novel prompts. Vibe-based evaluation is great for communicating progress to the public and terrible for actually deciding which model to ship in a product. It's a fun lens, not a procurement tool, and it's worth keeping that distinction straight.

Key takeaways:

The Will Smith spaghetti prompt works as an informal benchmark because its 2023 failure mode is so memorable that regenerating it shows progress at a glance.
The scene stresses several hard problems at once: facial consistency, fine motor action, deformable physics, and now audio-video alignment.
Meme-based evaluation communicates progress intuitively but measures vibes, not the rigorous generalization you'd need for a production decision.

Why do I care: Most frontend folks aren't shipping text-to-video models, so this is mostly a cultural and product-awareness story. But if you build anything that surfaces generated media, the takeaway matters: the quality bar for AI video is rising fast enough that "it looks obviously fake" is no longer a safe assumption, which has implications for trust, moderation, and provenance signaling in any UI that displays user-generated or AI-generated clips. I'd also flag the meta-lesson for engineers generally, which is to be skeptical of vibe-based benchmarks. The spaghetti test is delightful, but if you ever pick a model or a library based on a demo that looks impressive, remember the difference between a memorable demo and reproducible quality on your actual inputs.

How Will Smith Eating Spaghetti Became a Benchmark for AI Video Progress