tfg.evaluation package

Submodules

tfg.evaluation.evaluator module

Evaluator module for analyzing system responses per scenario.

This module is used during batch evaluation of LLM agent outputs. It provides simple heuristic metrics like word count, response length, and reference detection from the assistant’s last message.

tfg.evaluation.evaluator.evaluate_response(result, original_prompt)[source]

Evaluates the quality of the system’s response with detailed metrics.

Parameters:

result (dict) – The full result returned by the LangGraph.
original_prompt (str) – The original scenario prompt.

Returns:

Dictionary containing multiple evaluation metrics.

Return type:

dict

tfg.evaluation package

Submodules

tfg.evaluation.evaluator module

tfg.evaluation.scenario_runner module

Module contents