AI Benchmarks Are Failing Modern Models Like GPT-4, Report Finds | WAI News