Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation
Posted 3 hours ago by berlianta, 1 point
https://arxiv.org/abs/2606.01629
0 comments