Realign is an evaluation and experimentation framework built by HoneyHive. This guide walks you through setting up and running evaluations with Realign and HoneyHive.

Prerequisites

Before you begin, make sure you have:

  1. Realign and HoneyHive installed in your Python environment:
     pip install honeyhive realign
  2. A HoneyHive account with an API key and a Project set up.
  3. A Realign configuration file (YAML).

Configuration File

Your Realign configuration file (sample-config.yaml) should define the LLM agents used in your evaluation. Here’s an example:

llm_agents:
  summary_writer:
    model: openai/gpt-4o-mini
    template: |
      Provide a summary of the latest {{ product_type }} sales performance 
      in the {{ region }} market for {{ time_period }}. 
      Include key metrics such as {{ metric_1 }} and {{ metric_2 }}. 
      Also, highlight any significant trends or factors affecting sales 
      in this region.

This configuration defines an agent named ‘summary_writer’ that uses OpenAI’s GPT-4o mini model and a specific prompt template. Each {{ variable_name }} placeholder in the template is replaced by the corresponding field from the evaluation input.
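
For example, assuming an input datapoint with the illustrative values below, the rendered prompt would begin with “Provide a summary of the latest laptops sales performance in the EMEA market for Q2 2024”:

# Illustrative datapoint; the keys must match the template variables above.
inputs = {
    'product_type': 'laptops',
    'region': 'EMEA',
    'time_period': 'Q2 2024',
    'metric_1': 'revenue',
    'metric_2': 'units sold',
}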

Setup

First, let’s set up the necessary configurations:

import realign
realign.config.path = 'path/to/sample-config.yaml'
realign.tracing.honeyhive_key = '<YOUR_HONEYHIVE_API_KEY>'
realign.tracing.honeyhive_project = '<YOUR_HONEYHIVE_PROJECT_NAME>'

Make sure to replace '<YOUR_HONEYHIVE_API_KEY>' and '<YOUR_HONEYHIVE_PROJECT_NAME>' with your actual HoneyHive API key and project name.
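
If you prefer not to hardcode credentials, you can read these values from environment variables instead. The variable names below are a convention chosen for this example, not something Realign requires:

import os
import realign

realign.config.path = 'path/to/sample-config.yaml'
# HONEYHIVE_API_KEY and HONEYHIVE_PROJECT_NAME are arbitrary names used here;
# export them in your shell before running the evaluation.
realign.tracing.honeyhive_key = os.environ['HONEYHIVE_API_KEY']
realign.tracing.honeyhive_project = os.environ['HONEYHIVE_PROJECT_NAME']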

Creating an Evaluation Class

Next, create a class that inherits from realign.evaluation.Evaluation:

from realign.evaluation import Evaluation
from realign.llm_utils import llm_messages_call

class SampleEvaluation(Evaluation):
    async def main(self, run_context):
        # run_context.inputs holds the current datapoint as a dict
        inputs = run_context.inputs

        ### Evaluation body starts here ###

        message = llm_messages_call(
            agent_name='summary_writer', 
            template_params=inputs,
        )
        
        ### Evaluation body ends here ###

        # Return output to stitch to the session
        return message.content

This class defines the main logic for your evaluation. In this example, it uses an LLM agent named ‘summary_writer’ to generate content based on the input parameters.

Preparing Input Data
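
Each input datapoint is a dict whose keys match the template variables in your configuration file; Realign hands it to main() as run_context.inputs. Collect your datapoints in a plain Python list. The values below are made up purely for illustration:

# Illustrative dataset; each entry supplies values for
# product_type, region, time_period, metric_1, and metric_2.
dataset = [
    {
        'product_type': 'smartphones',
        'region': 'North America',
        'time_period': 'Q2 2024',
        'metric_1': 'units sold',
        'metric_2': 'average selling price',
    },
    {
        'product_type': 'tablets',
        'region': 'APAC',
        'time_period': 'Q2 2024',
        'metric_1': 'revenue',
        'metric_2': 'return rate',
    },
]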

Running the Evaluation

Finally, create an instance of your evaluation class and run it:
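
The exact constructor signature and run method can vary between Realign versions, so treat the snippet below as a sketch rather than the definitive API; passing the dataset to the constructor and calling .run() are assumptions to verify against the Realign documentation.

# NOTE: the constructor arguments and the run() call are assumptions;
# adjust them to match the Realign version you have installed.
evaluation = SampleEvaluation(dataset)
evaluation.run()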

Dashboard View

Review the results in your HoneyHive dashboard to gain insight into your model’s performance across different inputs. The dashboard consolidates evaluation results and lets you compare performance across multiple runs.

Conclusion

By following these steps, you can set up and run evaluations using Realign and HoneyHive. This allows you to systematically test your LLM-based systems across various scenarios and collect performance data for analysis.