Show HN: PhAIL – Real-robot benchmark for AI models. The gap to humans is 20x
6 by vertix | 7 comments on Hacker News.
I built this because I couldn't find honest numbers on how well VLA models actually work on commercial tasks. I come from search ranking at Google, where you measure everything, and in robotics nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking – one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.

Best model: 64 UPH (units per hour). Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public – every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions.

Happy to answer questions about methodology, the models, or what we observed.
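For anyone checking the 20x figure in the title against the numbers above, here is a minimal sketch of the arithmetic. The constant and function names are illustrative only and are not taken from the PhAIL codebase. The by-hand figure is quoted as "1,300+", so the second ratio is a lower bound.

```python
# Throughput figures quoted in the post, in units per hour (UPH).
# Names are illustrative assumptions, not taken from the PhAIL code.
BEST_MODEL_UPH = 64
HUMAN_TELEOP_UPH = 330
HUMAN_BY_HAND_UPH = 1300  # the post says "1,300+", so treat this as a lower bound


def gap(reference_uph: float, model_uph: float) -> float:
    """Return how many times faster the reference is than the model."""
    return reference_uph / model_uph


teleop_gap = gap(HUMAN_TELEOP_UPH, BEST_MODEL_UPH)
by_hand_gap = gap(HUMAN_BY_HAND_UPH, BEST_MODEL_UPH)

print(f"Teleoperated human vs. best model: {teleop_gap:.1f}x")
print(f"Human by hand vs. best model: at least {by_hand_gap:.1f}x")
```

So the headline 20x compares the best model with a human working by hand. Against a human teleoperating the same robot, the gap is closer to 5x.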
Humans Times
Stay up to date with hourly news from Humanstimes
Tuesday, March 31, 2026
Monday, March 30, 2026
Sunday, March 29, 2026
Saturday, March 28, 2026
Friday, March 27, 2026
New top story on Hacker News: Telnyx package compromised on PyPI
9 by ramimac | 49 comments on Hacker News.
https://ift.tt/TKWqnr3 https://ift.tt/tpuQsVW...