Discussion about this post

Alvin Ånestrand

Updates on time horizons:

For an early version of Mythos Preview, METR estimates a 50% time horizon of "at least 16hrs (95% CI 8.5hrs to 55hrs)", refraining from providing a specific estimate due to the high uncertainty:

"Of the 228 tasks in our suite, only 5 are estimated as 16+ hours long, making measurements at this range unstable and less meaningful than at ranges with better task coverage. Thus, we are not highlighting exact estimates for models above 16 hours measured with our current suite."

(Source: https://x.com/METR_Evals/status/2052896623852929510)

The 80% time horizon for this early version is estimated at 3h 6min. (Source: https://metr.org/time-horizons/)

This corresponds to roughly 1.4 doublings from the 1h 10min 80% time horizon of Opus 4.6 (log2(186 min / 70 min) ≈ 1.41). Since the early version of Mythos Preview was evaluated in March, this implies an extremely short doubling time.
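For anyone who wants to redo the arithmetic, here is a minimal sketch that converts two time horizons into a number of doublings and an implied doubling time. The elapsed time between the two measurements is an assumption you have to supply yourself: the post only says the early Mythos Preview was evaluated "in March", so the month values below are hypothetical, and this two-point estimate ignores the wide confidence intervals METR reports.

```python
import math


def doublings(h_old_min: float, h_new_min: float) -> float:
    """Number of doublings to go from h_old to h_new (both in minutes)."""
    return math.log2(h_new_min / h_old_min)


def implied_doubling_time(h_old_min: float, h_new_min: float, months_elapsed: float) -> float:
    """Months per doubling, assuming exponential growth over months_elapsed."""
    return months_elapsed / doublings(h_old_min, h_new_min)


# 80% time horizons quoted in the comment, converted to minutes.
opus_4_6_80 = 1 * 60 + 10      # 1h 10min = 70 min
mythos_early_80 = 3 * 60 + 6   # 3h 6min = 186 min

n = doublings(opus_4_6_80, mythos_early_80)
print(f"{n:.2f} doublings")  # prints "1.41 doublings"

# Hypothetical gaps between the two evaluations (NOT from the source).
for months in (1.0, 1.5, 2.0):
    t = implied_doubling_time(opus_4_6_80, mythos_early_80, months)
    print(f"{months} months elapsed -> {t:.2f} months per doubling")
```

With any gap of a month or two, the implied doubling time comes out at roughly one to one and a half months.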

The 95% confidence interval for the 50% time horizon is surprisingly low compared to what the ECI score would suggest, while the 80% time horizon seems roughly consistent with the ECI score. I'm uncertain what to make of this.

Note that the final version announced on April 7 might be even more capable.

torchbearercommunity

Excellent breakdown of the ECI trend break, Alvin. The jump from a 3-month to a 1-month doubling time for the 50% time horizon is staggering, and it feels like we're moving from incremental progress to vertical takeoff. Your point about Mythos being able to automate its own R&D makes the Jan/Feb 2027 timeline for a Superhuman AI Researcher feel much more grounded in data than in mere speculation. Great work quantifying how extreme this jump is.
