LLM Selection Criteria
Oct 27, 2023
This blog discusses how to select the right LLM before diving into the plethora of available models. To select the right large language model, one must first understand the unique use case that needs to be solved. Start by asking key questions:
- What is the primary goal of using an LLM?
- What business use case has to be solved (e.g., text generation, sentiment analysis, translation)?
- What is the application’s scale (low traffic or enterprise-level)?
- What are the computational resource constraints?
Answering these questions will help in selecting the most suitable LLM.
Below are three broad selection criteria that should be considered before choosing an LLM.
Task Relevance & Functionality
Here’s how task relevance can influence LLM selection:
- Task Type: The model architecture should match the business use case. For instance, encoder-only models (BERT, RoBERTa, etc.) are well suited for classification, whereas encoder-decoder models (T5, Flan-T5, etc.) can be utilized for text summarization (a minimal sketch appears below the figure).
Figure: LLM strengths across various use cases (*based on personal experiments)
To understand more about the different types of Transformer architectures and their associated business use cases, please read my blog.
- Performance: Although there are recommended architectures for a given use case, these rules do not always hold up in practice. For instance, encoder-only LLMs are generally advised for classification problems, yet parameter-heavy decoder-only models such as GPT or Llama sometimes outperform the suggested ones. In such situations, one should weigh performance against model size (and infrastructure limitations) to decide which model strikes the best balance between the two.
- Pretraining Data: LLMs are pretrained on vast amounts of text from different sources, and the relevance of that data to your task is crucial. If your task involves specialized or domain-specific language, choose a model pretrained on similar or relevant data. For instance, StarCoder is trained on permissively licensed data from GitHub spanning 80+ programming languages, which makes it useful for code generation and code interpretation. Likewise, FinBERT is BERT tailored to the financial domain, trained on financial news, reports, and documents; it can be used for sentiment analysis of financial news, stock market prediction, and other finance-related tasks.
- Knowledge Cutoff: The knowledge cutoff is the date at which the pretraining data ends. Newer models may have more up-to-date knowledge, and for domain-specific tasks a more recent cutoff can be beneficial, as it ensures the model is aware of the latest developments and trends in that domain. For instance, the knowledge cutoff for GPT-3.5 is January 2022, so it cannot provide correct context for any event after that date.
- Fine-Tuning: Fine-tuning allows the model to adapt to your specific requirements. If your task is domain-specific, such as legal documents, healthcare, or financial analysis, choose an LLM that is amenable to fine-tuning. Many LLMs, such as GPT-2, GPT-3, and XLNet, are hard to fine-tune, which makes their usage somewhat constrained for custom tasks.
- Latency: The time it takes for an LLM to process and respond to a request is an important criterion when building real-time, interactive, or scalable applications. However, there may be trade-offs between latency and model size or complexity: smaller models or quantization techniques can reduce latency but might sacrifice some aspects of model performance. The choice of an LLM should therefore strike a balance between low latency and other factors; a simple latency-measurement sketch follows this list.
Data Privacy
Data privacy is a critical criterion when selecting a Large Language Model (LLM), especially in applications that involve handling sensitive or private information. It’s essential to prioritize the protection of user and organizational data and adhere to relevant privacy regulations and best practices.
For instance, financial institutions and banks deal with confidential PII that has to be protected at all costs. In such scenarios, open-source LLMs deployed in-house (Llama, BERT, Falcon, etc.) are advisable over proprietary, API-based LLMs such as GPT or Davinci. Although OpenAI has recently introduced enterprise offerings aimed at securing data privacy, these efforts are still at a nascent stage; a minimal in-house loading sketch follows.
Resource / Infrastructure Limitations
Resource and infrastructure limitations can significantly impact the practicality and cost-effectiveness of deploying LLMs in your application. Selecting an LLM that aligns with your available resources and infrastructure is essential to ensure the successful and efficient operation of your NLP-based application.
Here’s how resource and infrastructure limitations can influence LLM selection:
- Compute Resources: The computational power required to train and deploy LLMs varies significantly. Based on the available computing resources (GPUs or TPUs), choose an LLM that can run efficiently within your hardware constraints.
- Memory and Storage: LLMs often have substantial memory and storage requirements, especially the larger ones. Ensuring that your infrastructure can handle the memory and storage demands of the chosen LLM is a must; a rough footprint estimate is sketched after this list.
- Latency & Throughput: Resource limitations directly affect the latency and throughput of LLMs; for instance, running a large LLM on limited compute will increase latency. Choose an LLM that balances model size and computational requirements against your desired latency and throughput targets.
- Resource Efficiency: Many LLM families come in a range of sizes, allowing you to choose a smaller, resource-efficient variant that consumes fewer resources while still delivering acceptable performance for your application.
The selection of an LLM should be guided by the specific requirements of your task, the available resources, data privacy constraints, and the performance characteristics of the model. Additionally, it’s essential to stay informed about new developments in the field: LLM research is rapidly evolving, and newer models may offer improved performance and capabilities.