Upload and evaluate existing logs from external sources like spreadsheets or databases.
Our example dataset contains two columns (a loading sketch follows the list):

- `article`: Contains the full text of news articles, which serves as our input
- `highlights`: Contains human-written bullet-point summaries of each article, which we’ll use to simulate the expected output from our LLM summarization task
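For instance, if these external logs were exported to a spreadsheet (CSV) with those two columns, they could be loaded like this; the file name is hypothetical:

```python
import pandas as pd

# Hypothetical CSV export of the external logs, one row per article,
# with `article` and `highlights` columns.
df = pd.read_csv("news_summaries.csv")
print(df[["article", "highlights"]].head())
```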
## Sample eval script for external logs

This example uses the `scikit-learn` library for keyword extraction. Install it using `pip install scikit-learn`.
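As an illustration, a keyword-coverage evaluator built on scikit-learn might look like the sketch below. The `(outputs, inputs, ground_truths)` argument order is an assumption about how client-side evaluators receive data, and the top-10-keyword heuristic is only an example metric.

```python
from sklearn.feature_extraction.text import CountVectorizer


def keyword_coverage(outputs, inputs, ground_truths):
    """Fraction of the article's top-10 most frequent keywords found in the summary."""
    article = inputs["article"]
    summary = str(outputs).lower()

    if not article.strip():
        return 0.0

    # Pull the 10 most frequent non-stopword terms from the article.
    vectorizer = CountVectorizer(stop_words="english", max_features=10)
    vectorizer.fit([article])
    keywords = vectorizer.get_feature_names_out()

    hits = sum(1 for kw in keywords if kw in summary)
    return hits / len(keywords) if len(keywords) else 0.0
```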
Call the `evaluate` function with your dataset and defined evaluators.

For the purposes of our example, we’ll assume our data has already been transformed into this required format:
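A sketch of that transformation, assuming the records are keyed by `inputs` and `ground_truths` (confirm the exact field names against the SDK reference) and reusing the `df` DataFrame loaded above:

```python
# Assumed target shape: one record per external log, with the model inputs
# under `inputs` and the reference summary under `ground_truths`.
dataset = [
    {
        "inputs": {"article": row["article"]},
        "ground_truths": {"highlights": row["highlights"]},
    }
    for _, row in df.iterrows()
]
```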
Alternatively, you can upload your dataset to HoneyHive and reference its `dataset_id` when running the experiment.
For instructions on uploading and managing datasets within HoneyHive, please refer to the Upload Dataset page.

Since we’re evaluating existing logs rather than calling an LLM, we’ll use the `highlights` column as our output:
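A sketch of how the pieces could come together is below. The `evaluate` parameter names (`hh_api_key`, `hh_project`, `name`, `dataset`, `evaluators`) and the `(inputs, ground_truths)` signature of the evaluated function are assumptions about the SDK and should be checked against its reference; `dataset` and `keyword_coverage` are the example objects defined earlier.

```python
from honeyhive import evaluate  # assumed import path for the SDK's evaluate helper


def summarization_flow(inputs, ground_truths):
    # No LLM call here: replay the stored summary from the `highlights`
    # column as this run's output.
    return ground_truths["highlights"]


if __name__ == "__main__":
    evaluate(
        function=summarization_flow,     # run once per record in the dataset
        dataset=dataset,                 # transformed records shown above
        # dataset_id="<HONEYHIVE_DATASET_ID>",  # alternative: reference an uploaded dataset
        evaluators=[keyword_coverage],   # client-side evaluator defined earlier
        hh_api_key="<HONEYHIVE_API_KEY>",  # assumed config parameter names
        hh_project="<YOUR_PROJECT_NAME>",
        name="external-logs-summarization-eval",
    )
```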
Once your external logs are transformed into the `evaluate` function’s expected format, you can apply powerful client-side and server-side evaluations without rerunning the original AI/LLM calls. This provides a flexible way to assess performance, track quality over time, and gain insights from historical data.