Assuming your dataset is a list of list of dictionaries like the below:
[
[{'from': 'human', 'value': 'Hi there!'},
{'from': 'gpt', 'value': 'Hi how can I help?'},
{'from': 'human', 'value': 'What is 2+2?'}],
[{'from': 'human', 'value': 'What's your name?'},
{'from': 'gpt', 'value': 'I'm Daniel!'},
{'from': 'human', 'value': 'Ok! Nice!'},
{'from': 'gpt', 'value': 'What can I do for you?'},
{'from': 'human', 'value': 'Oh nothing :)'},],
]
You can use our get_chat_template to format it. Select chat_template to be any of zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth, and use mapping to map the dictionary values from, value etc. map_eos_token allows you to map <|im_end|> to EOS without any training.
You can also make your own custom chat templates! For example our internal chat template we use is below. You must pass in a tuple of (custom_template, eos_token) where the eos_token must be used inside the template.