💬Chat Templates

List of Colab chat template notebooks:

More Info

Assuming your dataset is a list of list of dictionaries like the below:

[
    [{'from': 'human', 'value': 'Hi there!'},
     {'from': 'gpt', 'value': 'Hi how can I help?'},
     {'from': 'human', 'value': 'What is 2+2?'}],
    [{'from': 'human', 'value': 'What's your name?'},
     {'from': 'gpt', 'value': 'I'm Daniel!'},
     {'from': 'human', 'value': 'Ok! Nice!'},
     {'from': 'gpt', 'value': 'What can I do for you?'},
     {'from': 'human', 'value': 'Oh nothing :)'},],
]

You can use our get_chat_template to format it. Select chat_template to be any of zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth, and use mapping to map the dictionary values from, value etc. map_eos_token allows you to map <|im_end|> to EOS without any training.

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("philschmid/guanaco-sharegpt-style", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

You can also make your own custom chat templates! For example our internal chat template we use is below. You must pass in a tuple of (custom_template, eos_token) where the eos_token must be used inside the template.

unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a helpful assistant to the user\n' }}"\
    "</div>"\
    "<div data-gb-custom-block data-tag="for">"\
        "<div data-gb-custom-block data-tag="if" data-0='role' data-1='role' data-2='] == ' data-3='user'>"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "<div data-gb-custom-block data-tag="elif" data-0='role' data-1='role' data-2='] == ' data-3='assistant'></div>"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "</div>"\
    "</div>"\
    "<div data-gb-custom-block data-tag="if">"\
        "{{ '>>> Assistant: ' }}"\
    "</div>"
unsloth_eos_token = "eos_token"

tokenizer = get_chat_template(
    tokenizer,
    chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

Last updated