TypeScript Quickstart
Get started with running evaluations with HoneyHive
Running evaluations is a natural extension of the tracing capabilities of HoneyHive. We recommend going through the tracing quickstart before proceeding with this guide.
What is an Evaluation Run?
An evaluation run in HoneyHive is a group of related sessions that share a common metadata.run_id field.
When an evaluation run is linked to a dataset, it can be compared against other runs on the same dataset.
This flexibility helps you compare the performance of your application across any dimension of configuration you want to test - models, chunking strategies, vector databases, prompts, and so on.
Running an evaluation
Prerequisites
- You have already created a project in HoneyHive, as explained here.
- You have an API key for your project, as explained here.
- You have instrumented your application with the HoneyHive SDK, as explained here.
Expected Time: 5 minutes
Steps
Installation
To install our SDK, run the following command in your shell.
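For the TypeScript SDK, this is a single npm command (assuming the package is published on npm as `honeyhive`):

```shell
npm install honeyhive
```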
Initialize the SDK
To initialize the API client, we simply pass the API key to the SDK constructor.
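As a sketch, client construction looks like the following. The class name and `apiKey` option below are local stand-ins, not the SDK's real exports - check the SDK reference for the exact constructor signature:

```typescript
// Sketch of client construction. The class and option names are local
// stand-ins for the real SDK exports (assumptions), kept runnable so the
// pattern is clear: construct one client with your project's API key.
class HoneyHiveClient {
  constructor(private readonly opts: { apiKey: string }) {}

  // True once an API key has been supplied.
  get configured(): boolean {
    return this.opts.apiKey.length > 0;
  }
}

// Prefer reading the key from the environment over hard-coding it.
const hh = new HoneyHiveClient({
  apiKey: process.env.HH_API_KEY ?? "my-api-key",
});
```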
Initialize the evaluation run
To initialize the evaluation run, we need to provide a name for it. Optionally, you can link a dataset_id to the evaluation run; the docs for creating datasets can be found here. Store the returned run_id for future reference.
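Concretely, initializing a run amounts to one call that names the run and optionally links a dataset. The request/response shapes below are hypothetical stand-ins (not the SDK's real types), shown to illustrate the inputs and the returned run_id:

```typescript
// Hypothetical request/response shapes for creating an evaluation run.
interface CreateRunRequest {
  project: string;    // your HoneyHive project name
  name: string;       // a human-readable name for this run
  datasetId?: string; // optional: link the run to a dataset for comparisons
}
interface CreateRunResponse {
  runId: string;      // store this to attach sessions to the run later
}

// Stand-in for the SDK's run-creation call (assumption, not the real API).
async function createRun(req: CreateRunRequest): Promise<CreateRunResponse> {
  // The real SDK returns a server-generated id; this placeholder just
  // embeds the run name so the sketch is runnable.
  return { runId: `run-${req.name}` };
}
```

Keep the returned `runId` around - the final push step associates your traced sessions with it.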
Set up your application tracing
Create a function which traces your application code.
At the end of the function, return the session_id
for future reference.
To initialize the tracer, we need three key details:
- Your HoneyHive API Key
- Name of the project to log the trace to
- Name for this session - like “Chatbot Session” or “Customer RAG Session”.
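Putting the three details together, a traced function might look like the sketch below. `initTracer` and its option names are hypothetical stand-ins for the SDK's tracer API, not the real signatures:

```typescript
// Hypothetical tracer interface mirroring the three details above.
interface Tracer {
  sessionId: string;
  setMetadata(meta: Record<string, string>): void;
}

// Stand-in for the SDK's tracer initializer (assumption, not the real API).
function initTracer(opts: {
  apiKey: string;
  project: string;
  sessionName: string;
}): Tracer {
  const meta: Record<string, string> = {};
  return {
    sessionId: `session-${opts.sessionName}`, // real ids come from the server
    setMetadata: (m) => Object.assign(meta, m),
  };
}

// A traced application function that returns the session_id for later use.
function runPipeline(query: string): { answer: string; sessionId: string } {
  const tracer = initTracer({
    apiKey: process.env.HH_API_KEY ?? "my-api-key",
    project: "My Project",
    sessionName: "Customer RAG Session",
  });
  const answer = `echo: ${query}`; // your application logic goes here
  return { answer, sessionId: tracer.sessionId };
}
```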
If you are evaluating over a HoneyHive dataset, set the datapoint_id on the tracer metadata.

Set up your harness
Now you can loop over any dataset you have and run your application code.
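The harness is just a loop that calls your traced function once per datapoint and collects the session ids. The dataset row shape and the `runPipeline` stub below are illustrative assumptions, not the SDK's real types:

```typescript
// Illustrative dataset rows; in practice these come from your dataset.
interface Datapoint {
  id: string;
  inputs: { query: string };
}

// Stand-in for your traced application function (see the tracing step).
function runPipeline(query: string): { answer: string; sessionId: string } {
  return { answer: `echo: ${query}`, sessionId: `session-${query}` };
}

// The harness: run the app once per datapoint, keeping the session ids
// so they can be attached to the evaluation run afterwards.
function runHarness(dataset: Datapoint[]): string[] {
  const sessionIds: string[] = [];
  for (const dp of dataset) {
    const { sessionId } = runPipeline(dp.inputs.query);
    sessionIds.push(sessionId);
  }
  return sessionIds;
}
```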
Push your evaluation to HoneyHive
Finally, push the evaluation to HoneyHive.
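The final push associates the collected session ids with the run you initialized earlier. Again, the call shape below is a hypothetical stand-in for the SDK's update call:

```typescript
// Hypothetical shape for finalizing an evaluation run (assumption).
interface UpdateRunRequest {
  runId: string;        // from the run-initialization step
  sessionIds: string[]; // sessions produced by the harness
  status: "completed";
}

// Stand-in for the SDK's update call; the real SDK sends this to HoneyHive.
async function updateRun(req: UpdateRunRequest): Promise<{ ok: boolean }> {
  return { ok: req.sessionIds.length > 0 };
}
```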
Next Steps
Congratulations! You have successfully run an evaluation in HoneyHive.
Now, to view the evaluation results, head over to the Evaluations page in HoneyHive.