
TRL Fine Tuning

MLOps


Fine-tune and align LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reinforcement-learning-based reward optimization, and reward model training. Use it when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.
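To show what two of these objectives optimize, here is a minimal plain-Python sketch of the standard DPO per-pair loss and the GRPO group-relative advantage. It is not TRL's API. TRL computes these on batched tensors inside its trainers. The function names, the default `beta=0.1`, and the `eps` value are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, not TRL's API).

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the trainable policy and the frozen reference model.
    loss = -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == softplus(-m)
    return _softplus(-margin)


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages for a group of completions sampled for one prompt.

    Each reward is centered on the group mean and scaled by the group's
    standard deviation (population std here; implementations differ).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Policy prefers the chosen answer more than the reference does: loss < log(2)
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
    # Two correct (1.0) and two incorrect (0.0) completions for the same prompt
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When the policy and reference agree exactly, the DPO margin is zero and the loss is log(2). The loss falls as the policy raises the chosen response's likelihood relative to the rejected one. GRPO advantages within a group sum to zero, so a completion is rewarded only for being better than its siblings.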

Enabled by default · Built in
CLI install command: `elephant skills install trl-fine-tuning`
Overview

Bundled with the packaged Elephant Agent CLI as a built-in procedural skill.

Already ships inside the packaged Elephant Agent bundle. Use `elephant skills install trl-fine-tuning` only when you want an explicit local materialization record.