Tech Blog

Insights on SME automation, production AI, and reliable engineering systems.

AI/ML · 2026-05-13

The VLM That Scored a Collapsing Robot 62/100

After metrics kept missing degenerate gaits and LLM-iterated rewards hit the survival cliff, we tried a vision-language model as the fitness scorer. The VLM was harsher than metrics and produced actionable failure descriptions — and it scored a collapsing robot 62/100. The case study in honest fitness, plus the four-layer evaluation stack we landed on.

AI/ML · 2026-05-06

When LLMs Iterate on Rewards: A Negative Result From Humanoid Locomotion

We ran NVIDIA's Eureka methodology, in which an LLM iteratively proposes reward functions for an RL agent, on cold-start humanoid bipedal walking. Across two rounds, all 24 candidates fell by step 70. The LLM was thoughtful; the survival cliff was state-space coverage, not reward design.

AI/ML · 2026-05-02

Five Ways a Humanoid Cheats at Walking

Pure RL, physics priors, single-image poses, adversarial structural rewards, LLM-iterated rewards: five attempts to train a humanoid walker without mocap, and five distinct ways the policy cheated. Includes the v16 'flamingo hopping' retraction the user caught.

AI/ML · 2026-04-18

The VLM That Scored a Collapsing Robot 62/100

When LLMs Iterate on Rewards: A Negative Result From Humanoid Locomotion

Five Ways a Humanoid Cheats at Walking

Two Poses Are Enough: How Much Mocap Data Does a Humanoid Need to Walk?

The VecNormalize Trap: Two Silent Bugs That Hid a Working Walking Policy

I Ran Karpathy's Autoresearch on a $1,299 MacBook — Here's What Happened

The Agent Harness Inflection Point: What's Actually Feasible, What's Coming, and How to Adapt

Case Study: Building an Autonomous EV Charging Robot (And What We Learned When We Couldn't)

Case Study: From CE Advisor to Engineering Lead at a Healthcare Robotics Startup

Two Weeks, 40 Commits, and an AI That Remembers My Preferences

Fractional CTO for Deep Tech Startups: When You Need Technical Leadership but Not a Full-Time Hire

How I Built a Cross-Tool Memory and Skill System for AI-Assisted Development

Claude Code Skills: How to Set Them Up and Use Them