Hackathon: Implementing LLMs-as-Judges

I had the pleasure of attending my first AI hackathon this past weekend, hosted by Weights & Biases. The goal was to implement a paper on using LLMs as judges with reference-guided verdicts. The approach works by having a candidate LLM answer trivia questions. Each question, the candidate's answer, and the reference answer are then passed to multiple judge LLMs, which decide whether the candidate's answer is correct. We then assess the judges by combining their verdicts with a majority vote and measuring agreement with Kappa statistics. You can find my project on GitHub.
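The scoring step can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the project's actual code. It assumes each judge returns a binary verdict per question (1 = correct, 0 = incorrect) and that human labels exist for comparison. The data and helper names are hypothetical. Using Cohen's kappa against human labels and Fleiss' kappa across judges is my assumption, not necessarily the paper's exact setup.

```python
from collections import Counter

# Hypothetical example data (not results from the project): verdicts from three
# judge LLMs on four trivia questions, where 1 = "candidate answer is correct"
# and 0 = "incorrect". Assumes every judge rates every question.
judge_verdicts = [
    [1, 1, 1],  # question 1
    [1, 0, 1],  # question 2
    [0, 0, 0],  # question 3
    [1, 1, 0],  # question 4
]
# Hypothetical human labels for the same questions, treated as ground truth.
human_labels = [1, 1, 0, 0]


def majority_vote(verdicts):
    """Most common verdict for one question; ties go to the first-seen label."""
    return Counter(verdicts).most_common(1)[0][0]


def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa: chance-corrected agreement among a fixed number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = {label for item in ratings for label in item}
    counts = [Counter(item) for item in ratings]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    mean_agreement = sum(per_item) / n_items
    # Chance agreement from the overall proportion of each category.
    proportions = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    chance = sum(p * p for p in proportions)
    if chance == 1:
        return 1.0
    return (mean_agreement - chance) / (1 - chance)


majority = [majority_vote(v) for v in judge_verdicts]
print(majority)                                # [1, 1, 0, 1]
print(cohens_kappa(majority, human_labels))    # 0.5
print(round(fleiss_kappa(judge_verdicts), 4))  # 0.3143
```

Cohen's kappa compares two raters, so it fits the majority verdict versus human labels. Fleiss' kappa generalizes to more than two raters, which makes it a natural choice for agreement among the judges themselves.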
