Quick Start
Installation
To use OpenRLHF, first launch the Docker container (recommended), then pip install openrlhf inside the container:
# Launch the docker container
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/openrlhf nvcr.io/nvidia/pytorch:24.07-py3 bash
pip uninstall xgboost transformer_engine flash_attn pynvml opencv-python-headless -y
# pip install
pip install openrlhf
# If you want to use vLLM acceleration (installs vLLM 0.8.5.post1)
pip install openrlhf[vllm]
# latest vLLM is also supported
pip install openrlhf[vllm_latest]
# Install vLLM, ring-flash-attention and Liger-Kernel
pip install openrlhf[vllm,ring,liger]
# pip install the latest version
pip install git+https://github.com/OpenRLHF/OpenRLHF.git
# Or git clone
git clone https://github.com/OpenRLHF/OpenRLHF.git
cd OpenRLHF
pip install -e .
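A quick sanity check inside the container (a minimal sketch; it only assumes the packages installed above and a visible GPU):

# Importing openrlhf without error and seeing CUDA confirms the setup.
import torch
import openrlhf  # noqa: F401

print("CUDA available:", torch.cuda.is_available())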
Note
We recommend using vLLM 0.8.5.post1 or later. We also provide Dockerfiles for vLLM and a One-Click Installation Script for Nvidia-Docker.
Prepare Datasets
OpenRLHF provides multiple data processing methods in its dataset classes, such as this one from the Prompt Dataset:
def preprocess_data(data, input_template=None, input_key="input", apply_chat_template=None) -> str:
    if apply_chat_template:
        chat = data[input_key]
        if isinstance(chat, str):
            chat = [{"role": "user", "content": chat}]
        prompt = apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    else:
        prompt = data[input_key]
        if input_template:
            prompt = input_template.format(prompt)
    return prompt
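For example, when no chat template is used, the function simply formats the raw string (the data row and template below are hypothetical):

row = {"input": "Hello, how are you?"}
print(preprocess_data(row, input_template="Human: {}\nAssistant: "))
# Human: Hello, how are you?
# Assistant: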
We can use --input_key to specify the JSON key name of the input datasets specified by --prompt_data {name or path} (PPO) or --dataset {name or path}, and use --apply_chat_template to utilize the chat_template from the HuggingFace tokenizer.

If you don't want to use --apply_chat_template, you can use --input_template instead, or preprocess the datasets offline in advance.

OpenRLHF also supports mixing multiple datasets using --prompt_data_probs 0.1,0.4,0.5 (PPO) or --dataset_probs 0.1,0.4,0.5.
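As a rough illustration of what the mixing probabilities mean (a toy sketch, not OpenRLHF's internal implementation), prompts are effectively drawn from the datasets with the given weights:

import random

datasets = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]  # three toy prompt datasets
probs = [0.1, 0.4, 0.5]  # --prompt_data_probs 0.1,0.4,0.5

def sample_prompt():
    dataset = random.choices(datasets, weights=probs, k=1)[0]
    return random.choice(dataset)

print(sample_prompt())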
How Chat Templating Works:
dataset = [{"input_key": [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]}]

tokenizer.apply_chat_template(dataset[0]["input_key"], tokenize=False)
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
How to specify test datasets?
Please set the test dataset path using --eval_dataset {name or path}.
Note
The JSON key options depend on the specific dataset. See Reward Dataset and SFT Dataset.
Pretrained Models
OpenRLHF’s model checkpoints are fully compatible with HuggingFace models. You can specify the model name or path using --pretrain, --reward_pretrain, and --critic_pretrain.
We have provided some pre-trained checkpoints and datasets on HuggingFace OpenRLHF.
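Since the checkpoints are standard HuggingFace models, they can also be loaded directly with transformers (a minimal sketch using one of the checkpoints from the example below):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OpenRLHF/Llama-3-8b-sft-mixture", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("OpenRLHF/Llama-3-8b-sft-mixture")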
RLHF with Ray and vLLM
Then you can use the startup scripts we provide in the examples directory, or start training with the Hybrid Engine using the following command:
python3 -m openrlhf.cli.train_ppo_ray \
--ref_num_nodes 1 \
--ref_num_gpus_per_node 8 \
--reward_num_nodes 1 \
--reward_num_gpus_per_node 8 \
--actor_num_nodes 1 \
--actor_num_gpus_per_node 8 \
--vllm_num_engines 8 \
--vllm_tensor_parallel_size 1 \
--colocate_all_models \
--vllm_gpu_memory_utilization 0.6 \
--advantage_estimator reinforce \
--pretrain OpenRLHF/Llama-3-8b-sft-mixture \
--reward_pretrain OpenRLHF/Llama-3-8b-rm-700k \
--save_path /openrlhf/examples/test_scripts/final/llama3-8b-rlhf \
--ckpt_path /openrlhf/examples/test_scripts/ckpt/llama3-8b-rlhf \
--save_hf_ckpt \
--micro_train_batch_size 4 \
--train_batch_size 128 \
--micro_rollout_batch_size 8 \
--rollout_batch_size 1024 \
--n_samples_per_prompt 1 \
--max_epochs 1 \
--prompt_max_len 1024 \
--max_samples 100000 \
--generate_max_len 1024 \
--zero_stage 3 \
--bf16 \
--actor_learning_rate 5e-7 \
--critic_learning_rate 9e-6 \
--init_kl_coef 1e-3 \
--prompt_data OpenRLHF/prompt-collection-v0.1 \
--input_key context_messages \
--apply_chat_template \
--normalize_reward \
--gradient_checkpointing \
--packing_samples \
--vllm_sync_backend nccl \
--enforce_eager \
--vllm_enable_sleep \
--deepspeed_enable_sleep
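As a back-of-envelope reading of the batch-size flags above (our interpretation of the flags shown, not an official definition), each iteration collects rollout_batch_size prompts, generates n_samples_per_prompt responses for each, and consumes them in global optimizer steps of train_batch_size samples:

# Hypothetical arithmetic for the flags in the command above.
rollout_batch_size = 1024
n_samples_per_prompt = 1
train_batch_size = 128

samples_per_iteration = rollout_batch_size * n_samples_per_prompt   # 1024
optimizer_steps_per_iteration = samples_per_iteration // train_batch_size  # 8
print(samples_per_iteration, optimizer_steps_per_iteration)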
For more details about the PPO and other online RLHF algorithms with Ray and vLLM, please refer to PPO with Ray (vLLM).
OpenRLHF provides usage scripts and documentation for the supported algorithms in examples/scripts and the Usage docs.
One-Click Installation Script of Nvidia-Docker
# remove old docker
sudo apt-get autoremove docker docker-ce docker-engine docker.io containerd runc
dpkg -l | grep ^rc | awk '{print $2}' | sudo xargs dpkg -P
sudo apt-get autoremove docker-ce-*
sudo rm -rf /etc/systemd/system/docker.service.d
sudo rm -rf /var/lib/docker
# install docker
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
# install nvidia-docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker ps