MLOps

Evaluation

Frames model, prompt, and system evaluation as a reproducible experiment with baselines, datasets, and explicit metrics.

Enabled by default · Built In
CLI install command: `elephant skills install evaluation`
Overview

Bundled with the packaged Elephant Agent CLI as a built-in procedural skill, so no separate installation is needed.

Run `elephant skills install evaluation` only if you want an explicit local materialization record.
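The summary at the top frames evaluation as an experiment with a baseline, a fixed dataset, and an explicit metric. The following is a minimal Python sketch of that framing under stated assumptions, not the skill's own implementation: `Example`, `exact_match`, `evaluate`, and `run_experiment` are hypothetical names, and the toy `baseline` and `candidate` functions stand in for real models, prompts, or systems.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: a "system" maps a prompt to a text output.
System = Callable[[str], str]
Metric = Callable[[str, str], float]


@dataclass(frozen=True)
class Example:
    prompt: str
    expected: str


def exact_match(prediction: str, expected: str) -> float:
    """Explicit metric: 1.0 if the case- and whitespace-normalized strings match."""
    return float(prediction.strip().lower() == expected.strip().lower())


def evaluate(system: System, dataset: Sequence[Example], metric: Metric) -> float:
    """Mean metric score of one system over a fixed dataset."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    return sum(metric(system(ex.prompt), ex.expected) for ex in dataset) / len(dataset)


def run_experiment(baseline: System, candidate: System, dataset: Sequence[Example],
                   metric: Metric, seed: int = 0,
                   sample_size: Optional[int] = None) -> Dict[str, float]:
    """Score baseline and candidate on the same seeded sample and report the delta."""
    rng = random.Random(seed)  # seeded RNG: the same seed selects the same sample
    items = list(dataset)
    if sample_size is not None:
        items = rng.sample(items, sample_size)
    base = evaluate(baseline, items, metric)
    cand = evaluate(candidate, items, metric)
    return {"seed": seed, "n": len(items), "baseline": base,
            "candidate": cand, "delta": cand - base}


# Toy data and systems, for illustration only.
DATASET = [
    Example("2+2", "4"),
    Example("capital of France", "Paris"),
    Example("3*3", "9"),
    Example("color of the sky", "blue"),
]
ANSWERS = {"2+2": "4", "capital of France": "paris", "3*3": "9"}


def baseline(prompt: str) -> str:
    return "4"  # trivial constant baseline


def candidate(prompt: str) -> str:
    return ANSWERS.get(prompt, "")


if __name__ == "__main__":
    print(run_experiment(baseline, candidate, DATASET, exact_match))
    # {'seed': 0, 'n': 4, 'baseline': 0.25, 'candidate': 0.75, 'delta': 0.5}
```

Both systems are scored on the identical sample with the identical metric, and the seed is returned with the scores, so rerunning with the same configuration reproduces the same numbers and the reported delta is attributable to the candidate rather than to a different dataset.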

Aliases

eval, evaluation, benchmark models

Trigger phrases

evaluate this model, benchmark these prompts, set up an eval harness

Keywords

evaluation, benchmark, dataset, metric, baseline, reproducibility