Opus 4.5 is disgustingly good
Date: December 21st, 2025 12:59 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.
Opus 4.5 is up to 4 hours and 49 minutes on the METR time horizon benchmark. this benchmark measures the length of task (in terms of human work time) that models can complete on SWE/AI research type projects. big increase over 5.1 max, which was at 2 hours 53 minutes, and faster than the overall trend of doubling every 7 months. with any luck, models will be capable of substantially automating AI research before 2030 and will set off an intelligence explosion.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
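For a rough sense of scale, the two figures quoted above can be converted into "months of trend growth". This is only a back-of-the-envelope sketch: it uses just the numbers in this post (4h49m, 2h53m, 7-month doubling), and the variable names are made up for illustration. The post gives no release dates for either model, so the sketch cannot confirm "faster than trend" by itself; it only shows how many months apart the releases would need to be for the jump to sit exactly on the trend line.

```python
import math

def to_minutes(hours, minutes):
    """Convert an hours/minutes time horizon into total minutes."""
    return hours * 60 + minutes

# Figures as quoted in the post (not independently verified).
opus_45 = to_minutes(4, 49)      # 289 minutes
prev_51_max = to_minutes(2, 53)  # 173 minutes

# Multiplicative jump between the two time horizons.
ratio = opus_45 / prev_51_max

# Under "doubling every 7 months", horizon grows as 2 ** (t / 7),
# so a given ratio corresponds to t = 7 * log2(ratio) months.
DOUBLING_MONTHS = 7
trend_months = DOUBLING_MONTHS * math.log2(ratio)

print(f"{ratio:.2f}x increase, about {trend_months:.1f} months of trend growth")
```

If the gap between the two releases was shorter than the printed number of months, the jump is ahead of the 7-month doubling trend; if it was longer, it is behind.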
(http://www.autoadmit.com/thread.php?thread_id=5804093&forum_id=2#49527643)

Date: December 21st, 2025 1:53 PM
Author: ,.,....,..,.,.,,,,..,..,.,..,.,.,.,...
They are always claiming this for new model releases and no objective evidence ever materializes for it. It’s almost certainly a psychological bias rather than reality. There is substantial variation in how models respond to a particular problem just based on how they are prompted. It’s hard for a user to reliably measure model capabilities over time based on intuition alone.
(http://www.autoadmit.com/thread.php?thread_id=5804093&forum_id=2#49527743)