πŸ’‘What Model Should I Use?

Llama, Qwen, Mistral, Phi, or something else?

When preparing for fine-tuning, one of the first decisions you'll face is selecting the right model. Here's a step-by-step guide to help you choose:

1. Choose a model that aligns with your use case

  • E.g. for image-based training, select a vision model such as Llama 3.2 Vision; for code datasets, opt for a specialized model like Qwen2.5 Coder (see the sketch after this list).

  • Licensing and Requirements: Different models may have specific licensing terms and system requirements. Be sure to review these carefully to avoid compatibility issues.
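
To make step 1 concrete, here is a minimal sketch that maps a few use cases to example Hugging Face checkpoints and loads one with the transformers library. The model IDs are illustrative placeholders rather than specific recommendations from this guide, and the gated meta-llama checkpoints require accepting their license first, which ties back to the licensing point above.

```python
# Minimal sketch: map a use case to a candidate checkpoint, then load it.
# The Hub IDs are examples only; substitute whatever your catalog recommends.
from transformers import AutoModelForCausalLM, AutoTokenizer

CANDIDATES = {
    "general chat": "meta-llama/Llama-3.3-70B-Instruct",         # gated: requires license acceptance
    "code":         "Qwen/Qwen2.5-Coder-7B-Instruct",
    "vision":       "meta-llama/Llama-3.2-11B-Vision-Instruct",  # needs a vision loader, not AutoModelForCausalLM
}

model_name = CANDIDATES["code"]
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
```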

2. Assess your storage, compute capacity and dataset

  • Use our VRAM guideline to determine the VRAM requirements for the model you’re considering (a rough back-of-envelope check is sketched after this list).

  • Your dataset also determines the type of model you should use and how long training will take.
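
As a rough sanity check alongside the VRAM guideline, you can estimate how much memory the model weights alone will occupy. This is only a back-of-envelope sketch: it ignores optimizer state, gradients, and activations, which vary with the fine-tuning method (full fine-tuning vs. LoRA/QLoRA), batch size, and sequence length.

```python
# Back-of-envelope estimate of the VRAM needed just to hold the weights.
# Real requirements are higher and depend on the fine-tuning method and batch size.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate gigabytes required to store the model weights."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1024**3

print(f"7B  @ 4-bit : ~{weight_memory_gb(7, '4-bit'):.1f} GB")      # ~3.3 GB
print(f"7B  @ 16-bit: ~{weight_memory_gb(7, 'fp16/bf16'):.1f} GB")  # ~13.0 GB
print(f"70B @ 4-bit : ~{weight_memory_gb(70, '4-bit'):.1f} GB")     # ~32.6 GB
```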

3. Select a Model and Parameters

  • We recommend using the latest model for the best performance and capabilities. For instance, as of January 2025, the leading 70B model is Llama 3.3.

  • You can stay up to date by exploring our catalog of model uploads to find the most recent and relevant options (see the sketch below for checking this programmatically).
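
If the catalog is published as an organization on the Hugging Face Hub (the organization name "unsloth" below is an assumed placeholder), a few lines with the huggingface_hub client can list its most recent uploads:

```python
# Sketch: list recent model uploads from a Hub organization.
# The author name is a placeholder assumption; point it at the catalog you follow.
from huggingface_hub import HfApi

api = HfApi()
for m in api.list_models(author="unsloth", sort="lastModified", direction=-1, limit=10):
    print(m.id)
```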

4. Choose Between Base and Instruct Models

Further details below:

Instruct or Base Model?

When preparing for fine-tuning, one of the first decisions you'll face is whether to use an instruct model or a base model.

Instruct Models

Instruct models are pre-trained models that have additionally been fine-tuned to follow instructions, making them ready to use without any further fine-tuning. These models, including most GGUF uploads and other commonly available checkpoints, are optimized for direct use and respond effectively to prompts right out of the box.

Base Models

Base models, on the other hand, are the original pre-trained versions without instruction fine-tuning. These are specifically designed for customization through fine-tuning, allowing you to adapt them to your unique needs.
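
To make the difference concrete, the sketch below (checkpoint names are just examples) shows how an instruct model is prompted through the chat template its tokenizer ships with, while a base model is simply given raw text to continue:

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Summarize LoRA in one sentence."}]

# Instruct model: the tokenizer ships a chat template that wraps the message in the
# special tokens the model was instruction-tuned on.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # example checkpoint
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# Base model: there is no instruction format to rely on; you prompt with raw text and
# the model continues it, so any chat structure has to be taught by your fine-tuning data.
base_prompt = "Summarize LoRA in one sentence.\nAnswer:"
print(base_prompt)
```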

Should I Choose Instruct or Base?

The decision often depends on the quantity, quality, and type of your data (a small helper encoding this rule of thumb follows the list):

  • 1,000+ Rows of Data: If you have a large dataset with over 1,000 rows, it's generally best to fine-tune the base model.

  • 300–1,000 Rows of High-Quality Data: With a medium-sized, high-quality dataset, fine-tuning either the base or the instruct model is a viable option.

  • Less than 300 Rows: For smaller datasets, the instruct model is typically the better choice. Fine-tuning the instruct model aligns it with your specific needs while preserving its built-in instruction-following ability, so it can still handle general instructions without extra training data unless you intend to significantly alter its behavior.
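
A tiny helper makes the rule of thumb above explicit. Treat it as a starting point only: data quality and how far you want to move the model's behavior matter at least as much as row count.

```python
# Rule of thumb from the list above, expressed as a helper function.
def recommend_variant(num_rows: int, high_quality: bool = True) -> str:
    if num_rows >= 1000:
        return "base"
    if num_rows >= 300 and high_quality:
        return "base or instruct (either is viable)"
    return "instruct"

print(recommend_variant(5000))  # base
print(recommend_variant(500))   # base or instruct (either is viable)
print(recommend_variant(120))   # instruct
```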

Experimentation is Key

We recommend experimenting with both models when possible. Fine-tune each one and evaluate the outputs to see which aligns better with your goals.
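
One lightweight way to compare is to generate from both fine-tuned checkpoints on the same prompts and read the outputs side by side. The checkpoint paths and prompts below are placeholders for your own training outputs and evaluation set:

```python
# Sketch: side-by-side generations from two fine-tuned checkpoints.
# Paths are placeholders for wherever your training runs saved their results.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = {"from-base": "./outputs/base-ft", "from-instruct": "./outputs/instruct-ft"}
PROMPTS = ["Explain LoRA in one sentence.", "Write a haiku about GPUs."]

for name, path in CHECKPOINTS.items():
    tok = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
    for prompt in PROMPTS:
        inputs = tok(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=64)
        print(f"[{name}] {prompt}\n{tok.decode(output[0], skip_special_tokens=True)}\n")
```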
