Unsloth Benchmarks
Want to know how fast Unsloth is?
For our most detailed benchmarks, read our Llama 3.3 Blog.
Benchmarking of Unsloth was also conducted by 🤗 Hugging Face.
We tested using the Alpaca Dataset with a batch size of 2, 4 gradient accumulation steps, rank = 32, and QLoRA applied to all linear layers (q, k, v, o, gate, up, down); a configuration sketch follows the table:
| Model | VRAM | Unsloth speed | Unsloth VRAM reduction | Unsloth longer context | Hugging Face + FA2 speed |
|---|---|---|---|---|---|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
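To make the setup above concrete, here is a minimal sketch of such a run using Unsloth's `FastLanguageModel` API together with trl's `SFTTrainer`. The model name, dataset identifier, prompt format, sequence length, and step count are illustrative assumptions (and `SFTTrainer` argument names vary between trl versions); only the rank, target modules, batch size, and gradient accumulation steps follow the benchmark settings:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

# 4-bit base model for QLoRA. Model name and max_seq_length are illustrative.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# LoRA adapters on all linear projections (q, k, v, o, gate, up, down), rank 32.
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Alpaca-style dataset, flattened into a single "text" field for SFTTrainer.
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Input:\n{ex['input']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,   # batch size 2
        gradient_accumulation_steps = 4,   # gradient accumulation steps 4
        max_steps = 60,                    # illustrative step count
        output_dir = "outputs",
    ),
)
trainer.train()
```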
Context length benchmarks
The longer your context, the more VRAM Unsloth saves relative to standard implementations, thanks to our gradient checkpointing algorithm combined with Apple's Cut Cross Entropy (CCE) algorithm!
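In current Unsloth releases, the gradient checkpointing algorithm is selected via a flag on `get_peft_model`; a minimal sketch, reusing the `model` object from the configuration sketch above:

```python
from unsloth import FastLanguageModel

# Unsloth's offloaded gradient checkpointing trades a small amount of compute
# for large activation-memory savings on long sequences.
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)
```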
Llama 3.1 (8B) max. context length
We tested Llama 3.1 (8B) Instruct with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down), rank = 32, and a batch size of 1. All sequences were padded to a fixed maximum length to mimic long context finetuning workloads (see the padding sketch after the table).
| GPU VRAM | Unsloth context length | Hugging Face + FA2 context length |
|---|---|---|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
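The padding step described above can be mimicked with a standard Hugging Face tokenizer call. A minimal sketch; the model identifier, the example strings, and the 8,192-token target length are illustrative assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fall back if no pad token is set

# Pad (and truncate) every sequence to a fixed length so each batch exercises
# the full target context length, e.g. 8,192 tokens.
batch = tokenizer(
    ["Example instruction and response ...", "Another training example ..."],
    padding = "max_length",
    truncation = True,
    max_length = 8192,
    return_tensors = "pt",
)
print(batch["input_ids"].shape)  # (2, 8192): all sequences padded to the target length
```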
Llama 3.3 (70B) max. context length
We tested Llama 3.3 (70B) Instruct on an 80GB A100 with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down), rank = 32, and a batch size of 1. All sequences were padded to a fixed maximum length to mimic long context finetuning workloads.
| GPU VRAM | Unsloth context length | Hugging Face + FA2 context length |
|---|---|---|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |