Selecting the right LLM for your use case
3 min read · Aug 8, 2023
I have had numerous conversations with clients about LLMs, and one of the overarching questions has been, “Which model should I use?” This post aims to answer the following questions:
- Which model should I use?
- How do I evaluate LLMs?
TL;DR: It depends :) .. read on!
Start with your use case
The first step is to understand your use case. Think of the business problem that you want to solve by utilizing a large language model.
Other things to consider:
- The tasks you expect the model to perform, such as text generation, sentiment analysis, or translation.
- The projected scale of your operation.
- Available computational capacities.
- The source and state of the enterprise data the LLM will need to integrate with and ultimately operate on.
Criteria for Model Evaluation
Several factors must be considered when assessing language models:
- Size and Capabilities: Language models differ in scale and capabilities. While larger models often deliver better results, they demand greater computational power. Weigh the benefits and drawbacks in relation to your specific needs, and know that bigger is not always better (more on that later).
- Training Data and Knowledge Limitations: The dataset a model is trained on defines its knowledge boundaries. It’s pivotal to recognize that models have a training cutoff, after which they are unaware of newer information. If staying current is a priority, opt for models updated recently.
- Adaptability: Some models facilitate custom training for niche tasks or sectors. If tailoring the model to your needs is paramount, choose models that allow this level of customization.
- Efficiency and Speed: Review essential metrics such as accuracy and response speed, especially if your application demands swift responses.
- Availability and Cost: Language models vary in terms of access and cost. Some are freely accessible, while others entail a fee. Match your selection to your budgetary confines, and consider the total cost of ownership for your application, especially if a hosted model charges per API call or per token (a rough estimation sketch follows this list). Availability also covers geopolitical and enterprise boundaries: is the model offered in your region, and does it comply with the legislative requirements your organization must adhere to?
- Ethical Implications: Large models can sometimes produce biased content. Hence, opting for models that are designed with ethical guidelines and focus on reducing bias is crucial.
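To make the cost point concrete, here is a minimal back-of-the-envelope sketch for a hosted, per-token-priced model. All rates and traffic figures below are hypothetical placeholders rather than any vendor’s actual pricing; substitute your provider’s real numbers.

```python
# Rough monthly-cost estimate for a hosted LLM billed per 1,000 tokens.
# All prices and volumes are illustrative placeholders, not vendor quotes.

def monthly_token_cost(requests_per_day: int,
                       avg_input_tokens: int,
                       avg_output_tokens: int,
                       price_per_1k_input: float,
                       price_per_1k_output: float) -> float:
    """Estimate monthly spend for a model priced per 1,000 input/output tokens."""
    daily_cost = (requests_per_day * avg_input_tokens / 1000) * price_per_1k_input \
               + (requests_per_day * avg_output_tokens / 1000) * price_per_1k_output
    return daily_cost * 30

# Example: 10,000 requests/day, ~500 input and ~200 output tokens per request,
# at hypothetical rates of $0.003 per 1K input and $0.004 per 1K output tokens.
print(f"${monthly_token_cost(10_000, 500, 200, 0.003, 0.004):,.2f} per month")  # $690.00 per month
```

Comparing that figure with the cost of self-hosting an open-source model (GPU instances, operations, fine-tuning) gives a more realistic view of total cost of ownership.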
Exploring Beyond GPT-4: Other Language Models to Consider
Despite the prowess of GPT-4 and similar models, it’s worthwhile to assess other potential fits. Noteworthy alternatives include:
- BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, BERT shines in understanding language nuances and is adept at tasks like sentiment analysis. BERT is available through the Hugging Face Transformers library and can be fine-tuned for specific use cases (a short usage sketch follows this list). BERT was trained on Wikipedia (~2.5B words) and Google’s BooksCorpus (~800M words); these large datasets contributed to BERT’s deep knowledge not only of the English language but also of our world!
- XLNet: XLNet is a highly effective LLM that overcomes the limitations of traditional left-to-right and masked language models: by training over permutations of the word order, it produces more refined language predictions.
- T5: Google’s T5 is adaptable and addresses a vast array of language tasks. It employs a “text-to-text” framework in which every task is cast as a text generation problem, simplifying the process of adapting the model to different tasks and domains (a second sketch after this list illustrates the idea).
- RoBERTa (Robustly Optimized BERT Pretraining Approach): RoBERTa builds on BERT’s language masking strategy, modifies key hyperparameters (including removing BERT’s next-sentence pretraining objective), and trains with much larger mini-batches and learning rates. This allows RoBERTa to improve on the masked language modeling objective compared with BERT and leads to better downstream task performance.
- Llama 2: The Llama 2 family of models is trained on 2 trillion tokens and has a context length of 4,096 tokens. Llama 2 outperforms other open-source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.
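To make the BERT bullet concrete, here is a minimal sketch of running sentiment analysis with the Hugging Face Transformers library. The checkpoint named below is the library’s default sentiment model (a DistilBERT variant fine-tuned on SST-2); any BERT-family checkpoint from the Hub can be swapped in, and scores will vary by model and input.

```python
from transformers import pipeline

# Load a BERT-family checkpoint fine-tuned for sentiment analysis.
# distilbert-base-uncased-finetuned-sst-2-english is the pipeline's default
# sentiment model; swap in any Hub checkpoint that better fits your domain.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("The onboarding flow was confusing, but support fixed it quickly.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.98}] -- varies by model and input
```

Fine-tuning the same checkpoint on your own labeled data follows the standard Transformers training workflow, and that customization is usually where the value for enterprise use cases comes from.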
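The T5 “text-to-text” framing is easiest to see in code. The sketch below uses the publicly available t5-small checkpoint; the task is named in the input prefix, so translation, summarization, and other tasks all flow through the same generate call.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# t5-small is the smallest public checkpoint; t5-base / t5-large trade speed for quality.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is phrased as text in, text out; the prefix tells the model what to do.
inputs = tokenizer("translate English to German: The meeting is at noon.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Changing the prefix to "summarize: <your text>" reuses the same model and the
# same generate() call for summarization -- no task-specific head required.
```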
It’s essential to align a model’s features with your needs, considering aspects like adaptability, technical compatibility, cost, and legal and ethical implications.