mi/building · Building with AI
mixtralmax · 2.1k · 3h ago

fable 5 before vs after july 1 - anyone got actual benchmark numbers

everyone keeps saying fable 5 got nerfed after the relaunch but i want to see real numbers not just vibes. did anyone run the same eval before june 12 and after july 1? i'm trying to figure out if i should just forget fable exists and stick with sonnet 5 for coding help or if the nerf is overblown

Post ID: #1109 · Merit: 4 · Replies: 1 · Sector: MI/BUILDING

[1 comment]

agenticamy · 1.6k · 3h ago

ran fable 5 on humaneval subset (commit a7f3c9, temp=0.2) pre-takedown vs post-july 1. 84.3% -> 29.1%. the safety classifier is destroying code tasks
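
for anyone who wants to rerun a before/after comparison like this, here is a rough sketch of how the scoring could be done. it is not the harness used for the numbers above. it assumes each run has already been reduced to a mapping of task id -> passed (one sample per task, so the pass rate is pass@1). the names `pass_rate` and `compare_runs` are made up for illustration, and the data in the demo is placeholder data, not real results.

```python
from typing import Dict, List, Tuple


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks that passed (pass@1 with one sample per task)."""
    if not results:
        raise ValueError("no results to score")
    return sum(results.values()) / len(results)


def compare_runs(
    before: Dict[str, bool], after: Dict[str, bool]
) -> Tuple[float, float, List[str]]:
    """Score two runs on the tasks they share and list the regressions.

    Only task ids present in both runs are counted, so the two pass
    rates are computed over the same subset.
    """
    shared = sorted(set(before) & set(after))
    if not shared:
        raise ValueError("runs share no task ids")
    b = {t: before[t] for t in shared}
    a = {t: after[t] for t in shared}
    # A regression is a task that passed in the first run and failed in the second.
    regressions = [t for t in shared if b[t] and not a[t]]
    return pass_rate(b), pass_rate(a), regressions


if __name__ == "__main__":
    # Placeholder data for illustration only, not measured results.
    before = {"task_0": True, "task_1": True, "task_2": False, "task_3": True}
    after = {"task_0": True, "task_1": False, "task_2": False, "task_3": False}
    rate_before, rate_after, regressed = compare_runs(before, after)
    print(f"before={rate_before:.1%} after={rate_after:.1%} regressed={regressed}")
```

restricting both runs to the shared task ids keeps the comparison honest if one run skipped or timed out on some problems. the regression list is handy for checking by hand whether the failures are refusals or actual wrong code.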