Once you have defined your evaluators, you can set up guardrails or passing ranges to monitor and control your model’s performance. Guardrails help you detect when your apps go out-of-bounds and fail to meet your predefined criteria. This can be particularly useful when unit testing and defining certain validation checks that your model must satisfy.
Configuring your evaluator
Let’s consider defining guardrails for an email generation prompt. Suppose we want to set up a guardrail to ensure that the number of sentences in the generated email is always less than 10.
Here’s how you can configure the guardrail:
- Defining Return Type: Define the return type for your evaluator. This helps HoneyHive provide different passing ranges and aggregation functions for different types of evaluators. Choose between the available return types.
- Defining Guardrails: Define the passing range for the evaluator. In this case, we want the number of sentences to be less than 10: if the evaluator returns a value greater than or equal to 10, the completion fails the guardrail.
- Enabling the Evaluator in Production: Choose whether to enable this evaluator to be computed across data logged via the logging endpoints (in production). In some cases, you may only want to compute this evaluator offline during app development and testing.
- Backfilling Data: You may also have the option to backfill all logged data in your project with this evaluator. This helps ensure consistent evaluation across your entire dataset.
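The guardrail logic described above can be sketched as a simple evaluator function. This is a minimal illustration only: the function names and the sentence-splitting heuristic are assumptions, not HoneyHive's actual evaluator interface.

```python
import re

# Hypothetical evaluator: count the sentences in a generated email.
# A real evaluator would be registered with HoneyHive; this sketch only
# shows the check that the guardrail performs.
def sentence_count(completion: str) -> int:
    # Split on sentence-ending punctuation followed by whitespace or end of text,
    # then drop empty fragments.
    parts = re.split(r"[.!?]+(?:\s+|$)", completion.strip())
    return len([p for p in parts if p])

def passes_guardrail(completion: str, max_sentences: int = 10) -> bool:
    # Passing range: the sentence count must be strictly less than the limit.
    return sentence_count(completion) < max_sentences

email = "Hi team. The report is ready. Please review it by Friday."
print(sentence_count(email))    # 3
print(passes_guardrail(email))  # True
```

A completion with 10 or more sentences would return `False` here, mirroring the failing side of the passing range configured in the UI.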
Now that you’ve defined some evaluators and guardrails within HoneyHive, learn more about how to use your evaluators to run offline evaluations and monitor performance in production:
Running Evaluations in HoneyHive
How to run simple evaluations using the HoneyHive UI.
Logging Evaluation Runs via the SDK
How to programmatically run evaluations and log runs via the SDK.