LLM Preference Tuning
Similar Flow to Instruction Tuning
The process remains the same as LLM Instruction Tuning, with a few key differences:
- Select “Train Model+” and create a repository.
- Set Task Type to Preference Tuning instead of Instruction Tuning.
- Choose a Pre-Uploaded Preference Tuning Dataset
  - Must be formatted as: prompt, chosen, rejected (see the example record after this list).
- Select a Model for Preference Tuning
  - Llama-1B
  - Llama-3B
  - Qwen-1.5B
  - Qwen-3B-Coder, etc.
- Configure settings and initiate training.
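For reference, a preference-tuning record pairs one prompt with a preferred (chosen) and a dispreferred (rejected) completion. The snippet below is a minimal sketch of what such a dataset might look like, assuming a JSONL file with exactly the field names prompt, chosen, and rejected; the file format, field casing, and example contents are illustrative assumptions, not confirmed platform requirements.

```python
import json

# Minimal sketch of a preference-tuning dataset (assumed JSONL layout).
# Each record pairs one prompt with a preferred ("chosen") and a
# dispreferred ("rejected") completion.
records = [
    {
        "prompt": "Explain what a binary search does.",
        "chosen": "Binary search repeatedly halves a sorted list to locate a target in O(log n) comparisons.",
        "rejected": "It just looks at every element until it finds the target.",
    },
    {
        "prompt": "Write a one-line Python expression to reverse a string s.",
        "chosen": "s[::-1]",
        "rejected": "reversed(s)  # returns an iterator, not a string",
    },
]

# Write one JSON object per line, the usual JSONL convention.
with open("preference_dataset.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```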
Similar Flow for Reasoning Models
The process remains the same as LLM Tuning, with a few key differences:
- Select “Train Model+” and create a repository.
- Set Task Type to LLM-RL-Tuning instead.
- Select a Model
  - Llama-1B
  - Llama-3B
  - Qwen-3B-Coder
  - Qwen-1.5B, etc.
- Training Status Indicators:
  - 🟡 Yellow – Training in progress
  - 🟢 Green – Training completed successfully
  - 🔴 Red – Training failed (retry or report issue)
💡 Once training is complete, your model will be fine-tuned to prioritize better responses!
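Under the hood, preference tuning optimizes the model to rank the chosen completion above the rejected one for each prompt. Direct Preference Optimization (DPO) is one common objective for this, though the platform does not state which method it uses; the sketch below shows a per-example DPO loss given summed log-probabilities of the chosen and rejected completions under the policy being tuned and a frozen reference model. The function name, default beta, and toy numbers are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_policy(chosen | prompt), summed over tokens
    policy_rejected_logps: torch.Tensor,  # log p_policy(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # log p_ref(chosen | prompt), reference model is frozen
    ref_rejected_logps: torch.Tensor,     # log p_ref(rejected | prompt)
    beta: float = 0.1,                    # regularization strength (assumed default)
) -> torch.Tensor:
    """Per-example DPO loss: push the policy to prefer 'chosen' over 'rejected'
    relative to the reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(margin): small when the chosen completion out-scores the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two examples.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -8.5]),
    policy_rejected_logps=torch.tensor([-15.0, -9.0]),
    ref_chosen_logps=torch.tensor([-13.0, -8.8]),
    ref_rejected_logps=torch.tensor([-14.5, -9.1]),
)
print(float(loss))
```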