From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf]

  • Posted 1 day ago by fertrevino
  • 125 points
https://www.fertrevino.com/docs/gpt5_medhelm.pdf
I recently ran a thorough healthcare eval on GPT-5 using MedHELM. The results show a slight regression in GPT-5 performance compared to GPT-4-era models.

I found this interesting. The detailed results are here: https://www.fertrevino.com/docs/gpt5_medhelm.pdf

16 comments
