tfg.evaluation package
Submodules
tfg.evaluation.evaluator module
Evaluator module for analyzing system responses per scenario.
This module is used during batch evaluation of LLM agent outputs. It provides simple heuristic metrics like word count, response length, and reference detection from the assistant’s last message.
- tfg.evaluation.evaluator.evaluate_response(result, original_prompt)[source]
Evaluates the quality of the system’s response with detailed metrics.
- Parameters:
result (dict) – The full result returned by the LangGraph.
original_prompt (str) – The original scenario prompt.
- Returns:
Dictionary containing multiple evaluation metrics.
- Return type:
dict