tfg.evaluation package

Submodules

tfg.evaluation.evaluator module

Evaluator module for analyzing system responses per scenario.

This module is used during batch evaluation of LLM agent outputs. It provides simple heuristic metrics like word count, response length, and reference detection from the assistant’s last message.

tfg.evaluation.evaluator.evaluate_response(result, original_prompt)[source]

Evaluates the quality of the system’s response with detailed metrics.

Parameters:
  • result (dict) – The full result returned by the LangGraph.

  • original_prompt (str) – The original scenario prompt.

Returns:

Dictionary containing multiple evaluation metrics.

Return type:

dict

tfg.evaluation.scenario_runner module

tfg.evaluation.scenario_runner.parse_scenarios(file_path)[source]

Parses the scenarios file and returns a list of dicts with metadata and prompt

tfg.evaluation.scenario_runner.run_all()[source]

Main loop: parse scenarios and run each

tfg.evaluation.scenario_runner.run_and_log(scenario)[source]

Runs the conversation and logs the output and metrics

Module contents