This chapter covers the core features and capabilities of Model Evaluation Tool.
# Run demo with sample sentiment analysis data
python src/model_evaluation_tool.py --demo
# Evaluate predictions vs. labels
python src/model_evaluation_tool.py --predictions preds.txt --labels true.txt
# Export report as JSON
python src/model_evaluation_tool.py --predictions preds.txt --labels true.txt --export report.json
# Print confusion matrix
python src/model_evaluation_tool.py --confusion-matrix preds.txt true.txt
# Run benchmark across splits
python src/model_evaluation_tool.py --benchmark preds.txt true.txt --export bench.json

Follow this guide to get Model Evaluation Tool up and running in your environment.
model-evaluation-tool/
├── README.md
├── LICENSE
├── src/
│   └── model_evaluation_tool.py   # Core engine (~350 lines)
└── examples/
    ├── basic_usage.py             # Programmatic usage example
    └── sample_predictions.jsonl   # Sample prediction data
| Flag | Description |
|---|---|
| `--demo` | Run demo with built-in sample data |
| `--predictions FILE` | Predictions file (JSONL or plain text) |
| `--labels FILE` | True labels file (JSONL or plain text) |
| `--confusion-matrix PREDS LABELS` | Print confusion matrix |
| `--benchmark PREDS LABELS` | Run benchmark evaluation |
| `--export FILE` | Export report to JSON |
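
To make the flags above concrete, here is a minimal sketch of the kind of computation an evaluation of predictions against labels typically involves: a confusion matrix, accuracy, per-class precision, recall, and F1, and a JSON export. This is not the tool's actual implementation. The function names (`read_labels`, `confusion_matrix`, `evaluate`, `export_report`), the report layout, and the one-label-per-line plain-text format are all assumptions made for illustration.

```python
import json
from collections import Counter


def read_labels(path):
    """Read one label per line from a plain-text file, skipping blank lines.

    Assumption: the plain-text format is one label per line.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def confusion_matrix(predictions, labels):
    """Count occurrences of each (true_label, predicted_label) pair."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return Counter(zip(labels, predictions))


def evaluate(predictions, labels):
    """Return accuracy plus per-class precision, recall, and F1."""
    matrix = confusion_matrix(predictions, labels)
    classes = sorted(set(labels) | set(predictions))
    total = len(labels)
    correct = sum(matrix[(c, c)] for c in classes)
    report = {"accuracy": correct / total if total else 0.0, "per_class": {}}
    for c in classes:
        tp = matrix[(c, c)]
        predicted = sum(matrix[(other, c)] for other in classes)  # column sum
        actual = sum(matrix[(c, other)] for other in classes)     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report["per_class"][c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return report


def export_report(report, path):
    """Write the report dictionary to a JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)


if __name__ == "__main__":
    # Small in-memory example; real runs would call read_labels() on two files.
    preds = ["pos", "neg", "pos", "pos"]
    truth = ["pos", "neg", "neg", "pos"]
    print(json.dumps(evaluate(preds, truth), indent=2))
```

Keying the confusion matrix by `(true_label, predicted_label)` pairs keeps the sketch independent of the number of classes, so the same code handles binary sentiment labels and multi-class data alike.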