As the adoption of large language models (LLMs) continues to grow across various industries, the associated costs of using these powerful tools have become a critical consideration for businesses and individuals alike.
Optimizing token costs is essential for maximizing the return on investment. This article covers practical ways to optimize token costs for ChatGPT and LLM APIs. By understanding and implementing the recommendations below, you can keep your costs manageable while leveraging LLMs' full potential.
Understanding Tokens and Costs
Tokens are the basic units used to measure input or output text. When you enter a prompt into an LLM API, it divides the text into tokens to understand and respond to your request. The LLM API cost is usually based on the number of tokens used in each prompt.
So, how is the cost calculated from tokens?
LLM providers calculate the cost based on the total number of tokens processed during an interaction, which includes both the input tokens (from the user’s query) and the output tokens (from the model’s response).
For example, GPT-4 Turbo is priced at:
- $0.01 per 1,000 prompt tokens
- $0.03 per 1,000 completion tokens
Each token has a predefined cost set by the specific LLM provider. To determine the total cost, the provider tokenizes the input and output text, sums the token counts, and multiplies the total by the cost per token.
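To make the arithmetic concrete, here is a minimal Python sketch using the GPT-4 Turbo rates above. The token counts in the example are hypothetical; in practice they come from the provider's tokenizer or from the API's usage report.

```python
# Per-token rates derived from GPT-4 Turbo's published pricing above.
PROMPT_RATE = 0.01 / 1000      # $ per prompt (input) token
COMPLETION_RATE = 0.03 / 1000  # $ per completion (output) token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Total cost in dollars for a single API call."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# e.g. a 1,200-token prompt that yields an 800-token response:
print(f"${request_cost(1200, 800):.4f}")  # $0.0360
```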
Not sure how to calculate your team's LLM budget? Check out our TypingMind pricing calculator for a smart and appropriate spending plan!
Why Do We Need To Optimize LLM Token Costs?
LLM token costs can significantly impact both the operational and financial aspects of deploying these advanced AI systems. Here are the key benefits of optimizing these costs:
- Better cost management: Businesses often operate within budget constraints. By optimizing token costs, companies can better manage their budgets and allocate resources more efficiently. Cost savings in token processing can lead to increased profitability.
- Easier to scale: As businesses grow, so does the volume of data and the number of tasks that need to be processed. By reducing token costs, businesses can handle a higher workload without requiring proportionally more resources, enabling seamless scaling potential.
- More competitive advantages: Lower costs enable more frequent and sophisticated use of LLMs across tasks such as customer support, improving efficiency and strengthening your competitive position.
- Boost R&D investment: Savings from optimized token costs can be reallocated to research and development, fostering innovation and allowing businesses to explore new applications of LLMs.
Top 6 Effective Ways To Optimize Token Costs For ChatGPT and LLM APIs
#1. Use Cheaper AI Models
There are plenty of AI models on the market, such as Claude 3.5 Sonnet, GPT-4o, and Google Gemini 1.5 Pro. The easiest way to optimize your API cost is to choose an AI model that has good API pricing but still provides quality responses that meet your needs.
For complex tasks like math or coding, you may need the smartest AI model to ensure accuracy, and these models are often costly. Once the tough questions are done, though, you can switch to a more budget-friendly model for the easier tasks, even in the middle of a conversation.
For example, with the same or even better performance, GPT-4o is 4-6x cheaper than GPT-4:
Input tokens:
- GPT-4o: $5.00/1M tokens
- GPT-4: $30.00/1M tokens
Output tokens:
- GPT-4o: $15.00/1M tokens
- GPT-4: $60.00/1M tokens
Let's take another example. The new Claude 3.5 Sonnet outperforms GPT-4o in programming and many other areas, and its input token price is even cheaper ($3.00/1M tokens vs. GPT-4o's $5.00/1M).
Here is the API pricing for some common AI models:
| Model | Context | Input/1M Tokens | Output/1M Tokens |
| --- | --- | --- | --- |
| GPT-4o | 128k | $5.00 | $15.00 |
| GPT-4 Turbo | 128k | $10.00 | $30.00 |
| GPT-3.5 Turbo | 16k | $0.50 | $1.50 |
| Gemini 1.5 Flash | 1M | $0.35 | $1.05 |
| Gemini 1.5 Pro | 2M | $3.50 | $10.50 |
| Claude 3.5 Sonnet | 200k | $3.00 | $15.00 |
| Claude 3 Opus | 200k | $15.00 | $75.00 |
| Claude 3 Sonnet | 200k | $3.00 | $15.00 |
| Claude 3 Haiku | 200k | $0.25 | $1.25 |
API Pricing Of Some Common LLMs
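If your workload mixes hard and easy requests, the model switch described above can even be automated. Below is a minimal sketch of such routing; the model names, keywords, and heuristic are purely illustrative assumptions, and a real router would use a more robust task classifier.

```python
# Route hard tasks to a stronger, pricier model; everything else
# goes to a budget model. Keyword matching is a toy heuristic.
EXPENSIVE_MODEL = "gpt-4o"
CHEAP_MODEL = "gpt-3.5-turbo"

HARD_KEYWORDS = ("prove", "debug", "refactor", "derive")

def pick_model(prompt: str) -> str:
    """Pick the stronger model only when the prompt looks hard."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in HARD_KEYWORDS):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL

print(pick_model("Debug this Python stack trace"))  # gpt-4o
print(pick_model("Draft a short thank-you email"))  # gpt-3.5-turbo
```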
#2. Limit the context provided to the AI model
This option allows you to determine how many messages should be included in the AI assistant’s context.
If your conversation doesn't need the earlier history as context, you can delete old messages to save tokens, which reduces API costs.
If you don't want to do that manually, the TypingMind app makes the setup simple: select the number of messages (x) to include in the AI assistant's context.
For example, when set to 1, the AI assistant will only see and remember the most recent message.
With this option, the system can allocate resources more effectively, leading to quicker response times and smoother operation.
Set Context Limit In TypingMind
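Under the hood, a context limit simply means sending only the most recent messages with each API call. Here is a minimal sketch, assuming an OpenAI-style chat history of role/content dicts; the limit value, helper name, and sample messages are illustrative.

```python
CONTEXT_LIMIT = 2  # illustrative; 1 would keep only the latest message

def trim_context(history: list[dict], system_prompt: dict | None = None) -> list[dict]:
    """Return the system prompt (if any) plus the last CONTEXT_LIMIT messages."""
    recent = history[-CONTEXT_LIMIT:]
    return [system_prompt, *recent] if system_prompt else recent

history = [
    {"role": "user", "content": "What does VLOOKUP do?"},
    {"role": "assistant", "content": "It looks up a key in the first column..."},
    {"role": "user", "content": "Show me an example formula."},
]
# Only the last two messages (plus the system prompt) are sent to the API,
# so older messages no longer count toward input tokens.
print(trim_context(history, {"role": "system", "content": "You are an Excel tutor."}))
```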
#3. Optimize Prompts To Limit Message Length
One of the most efficient methods to reduce LLM cost is to optimize your prompts. Each time you enter a prompt into an LLM, you are charged according to the number of tokens processed.
Instead of sending vague, rambling text to the LLM, ask a concise, focused question: remove any extraneous words or phrases and get straight to the point.
To optimize the response, you can add the desired message length to the prompt. This will help you get answers of the right length while avoiding wasted costs.
Example:
- Don’t: Write an introduction about the product TPM, which is a body lotion.
- Do: Write a 200-word product introduction about a body lotion, named TPM. The main feature of this body lotion is to hydrate and whiten the skin.
Add the desired message length to the prompt
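In an API workflow, the same idea can be baked into the request itself. Here is a minimal sketch using the OpenAI Python SDK; the model choice and prompt wording are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You are a copywriter. Answer in at most 200 words."},
        {"role": "user",
         "content": "Write a 200-word product introduction for TPM, a body "
                    "lotion whose main feature is hydrating and whitening skin."},
    ],
)
print(response.choices[0].message.content)
```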
Another way to limit message length is to create an AI agent with specific instructions. For example, in TypingMind we have an AI agent called "Pro Coder" that comes with the prompt "Help you write code without overexplain things too much using only its internal knowledge and treat like a professional developer". This way, you get code without long explanations padding the output.
#4. Limit Max_tokens
Max_tokens is the maximum number of tokens the model may generate before stopping. For example, when you set max_tokens to 1000, the LLM will generate a response of at most 1,000 tokens (roughly 750 words).
Setting a maximum token limit (max_tokens) in your API requests can help reduce costs when using ChatGPT. This prevents unexpectedly long responses that could significantly increase costs.
A token limit gives you a predictable upper bound on how much each request might cost. This makes it easier to manage and forecast your budget.
Set Max Tokens To 1000
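Here is a minimal sketch of setting that cap with the OpenAI Python SDK; the model and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1000,  # hard upper bound on the completion length
    messages=[{"role": "user", "content": "Summarize how LLM pricing works."}],
)
print(response.choices[0].message.content)
```

At GPT-4o's $15.00/1M output rate, this caps the completion cost of any single call at $0.015, which makes per-request spending easy to forecast.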
#5. Organize Chats Into Folders, Add Tags, Star Important Chats
Sometimes you may ask your AI chatbot the same question several times. If you organize your important chats into folders or add tags, you can easily find them when needed and continue the conversation instead.
For example, you could put all chats related to Excel in an "Excel" folder. Every time you have a question about Excel, you can open the folder instead of starting a new chat.
Even simpler options are adding tags or starring important chats, which makes them more visible and easier to find!
#6. Do Research Before Chatting With AI
Researching provides you with basic knowledge of the topic you are inquiring about. This helps you formulate more precise and informed questions.
Instead of beginning with broad or basic questions, you can dive straight into the topic's more complex or specific aspects, maximizing the value of each interaction and significantly reducing the cost of using an AI chatbot.
Moreover, having a grasp of the subject allows you to evaluate the AI’s responses, discerning useful and accurate information from less relevant or incorrect data.
Conclusion
Optimizing LLM token costs is crucial for maintaining cost-efficiency while still achieving high-quality AI responses, whether for personal or business use.
By implementing strategies such as choosing a cheaper model, setting context limits, optimizing prompts, limiting max_tokens, and conducting research before interacting with AI, you can make your use of AI models more feasible and sustainable.