In the fast-moving field of artificial intelligence, large language models (LLMs) have proven to be a transformative technology with immense potential. Over the last year, these models have become increasingly capable while simultaneously growing more affordable, ushering in a new era of accessibility and opportunity.
Understanding the Basics
Before diving in, I want to define some key terms and concepts that will help you understand the discussion on AI price trends.
- Input tokens: These are the tokens the model consumes from your request. In simpler terms, the length of your query or task description determines how many input tokens are used.
- Output tokens: These are the tokens generated by the model in response to your query. The more detailed or complex the response, the higher the number of output tokens used.
- Model price: Model prices are most commonly expressed as cost per million input/output tokens ($xx.xx/$xx.xx per million). For example, GPT-4o currently costs $5.00/$15.00 per million input/output tokens.
- The difference in pricing between input and output tokens stems from the computational effort required. When an LLM generates an output, it must perform complex reasoning, maintain consistency, and create original content – all of which demand more resources than simply processing input text.
- Pricing strategy: LLM pricing strategy refers to how a company or organization determines the cost associated with using or accessing their large language model.
- Pay-as-You-Go/Pay-by-Token model: Users are charged based on the amount of data (tokens) processed during the input and output operations.
- Subscription model: Users pay a fixed or monthly fee to access the model with certain usage limits.
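To make the per-million-token pricing concrete, here is a minimal sketch of how a pay-by-token bill is computed. The function name and the request sizes are illustrative; the prices are the GPT-4o figures quoted above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in dollars for one request, given $-per-million-token prices."""
    return (input_tokens / 1_000_000) * price_in + \
           (output_tokens / 1_000_000) * price_out

# GPT-4o pricing from above: $5.00 input / $15.00 output per million tokens.
# A 2,000-token prompt with a 500-token response:
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    price_in=5.00, price_out=15.00)
print(f"${cost:.4f}")  # prints $0.0175
```

Note how the asymmetric output price dominates even modest responses: here the 500 output tokens cost $0.0075, nearly as much as the 2,000 input tokens.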
Models are Improving, but Costs are Declining
Source: a16z
The AI model industry is experiencing a phenomenon which some call “LLMflation” – where the cost of inference is declining rapidly even though model performance continues to improve. Similar to Moore’s Law, we’re seeing a massive drop in the cost of the underlying commodity that’s driving the current AI technology cycle.
The current generation of LLMs shows remarkable improvements across several dimensions, setting a new baseline for performance:
- Multi-modal capabilities: Many LLMs can process not just text, but images and audio, offering more versatility than ever before.
- Expanded context windows: Models can now process much longer inputs, enabling more complex applications and use cases.
- Improved speed and accuracy: Models continue to achieve higher benchmark scores and better real-world performance with shorter inference times.
Only months ago, then-cutting-edge large language models like GPT-4, Claude 2, Gemini 1.0, and Llama 2 were both less capable and more expensive than the models available today.
As a point of comparison, when GPT-3 debuted in 2021, it was the only model to achieve an MMLU benchmark score of 42, at a cost of $60 per million tokens. As of early 2025, an equivalent model is available for just $0.06, a 1,000x decrease. This rate of decline actually outpaces the famous exponential progress of Moore's Law.
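To see why this outpaces Moore's Law, we can convert the price decline into an equivalent "halving time." The rough three-year window (2021 to early 2025) is an assumption based on the figures above.

```python
import math

# Assumption: $60 -> $0.06 per million tokens over roughly 3 years.
factor = 60 / 0.06                       # ~1000x price decline
years = 3.0
halvings = math.log2(factor)             # number of price halvings (~10)
halving_time_months = years * 12 / halvings
print(f"{halvings:.1f} halvings, one every {halving_time_months:.1f} months")
# prints: 10.0 halvings, one every 3.6 months
```

Moore's Law halves cost per transistor roughly every two years; under these assumptions, LLM inference prices have been halving roughly every 3.6 months.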
Across the board, we’re seeing top model makers like Google, OpenAI, and Amazon slash their prices, and lower-cost model makers in China are dropping theirs as well. Model providers are charging less even as model performance continues to rise.
Why are prices falling?
Several factors have contributed to the ongoing decline in model prices.
- Open source: The increasing availability of open source models has fundamentally changed the market dynamics. These models enable cloud providers to offer high-performance AI without massive R&D and infrastructure investments, creating downward pressure on prices.
- Compute efficiency: Specialized hardware is enabling startups like Cerebras, Groq, and SambaNova to serve open-weight models faster and at lower cost. As compute becomes increasingly commoditized, it’s becoming cheaper and faster to train and serve models.
- Competition between providers: The top players (OpenAI, Anthropic, Google, etc.) are all competing for market share and are utilizing pricing as a competitive differentiator. New entrants to the model space, particularly from China and startups (see above) using open source models, have also intensified competition.
- Economies of scale: As more applications adopt agentic workflows that consume more tokens at inference, providers can spread their costs across a larger volume of transactions, enabling lower per-token prices.
Although this may seem like a race to the bottom, premium segments of the market continue to command higher prices. OpenAI’s o1 costs $15/$60 per million input/output tokens and Anthropic’s Claude 3 Opus costs $15/$75 per million input/output tokens.
These costly, high-performance models are typically reserved for applications like financial research and drug discovery where AI can help unlock extreme asymmetric returns. Another notable application is software development, where engineering resource costs remain quite high, allowing developers to maximize their productivity.
The Business Implications
Source: TensorOps
For companies looking to harness LLMs, this new environment presents both opportunities and challenges.
The drastic reduction in input/output costs is opening up new applications that previously wouldn't have been economically viable. Experiments are cheaper, and the bar to building proofs of concept (POCs) is lower. For example, agentic AI applications like AI-powered sales assistants or customer support chatbots are becoming more feasible due to the decreased costs of LLMs.
However, the full costs of an LLM product extend beyond raw tokens to include prompt design, agent orchestration, data management, testing, and more. As models get cheaper, these other components make up a greater share of total spend. In the case of agentic AI, the costs associated with API usage for agent interactions can add up quickly, even with lower per-token prices.
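A rough sketch illustrates how agent interactions multiply token spend: each tool-call round trip typically re-sends the accumulated context as input, so input costs grow with every round. All numbers here are hypothetical, chosen only for illustration (prices match the GPT-4o figures quoted earlier).

```python
def agent_run_cost(rounds: int, system_tokens: int, tokens_per_round: int,
                   price_in: float, price_out: float) -> float:
    """Estimate the cost of one agent run in which each round re-sends
    the full accumulated context (prices in $ per million tokens)."""
    total = 0.0
    context = system_tokens
    for _ in range(rounds):
        total += (context / 1_000_000) * price_in            # re-send context
        total += (tokens_per_round / 1_000_000) * price_out  # new output
        context += tokens_per_round                          # context grows
    return total

# Hypothetical workload: 2,000-token system prompt, 500 tokens per round.
single = agent_run_cost(1, 2_000, 500, 5.00, 15.00)
agent = agent_run_cost(10, 2_000, 500, 5.00, 15.00)
print(f"single call ${single:.4f} vs 10-round agent run ${agent:.4f}")
# prints: single call $0.0175 vs 10-round agent run $0.2875
```

Ten rounds cost roughly 16x one round rather than 10x, because re-sent context compounds, which is why agentic workloads feel the per-token price far more than single-shot chat does.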
Pricing for end-user applications is still a complex question. Outcome-based pricing may make more sense than subscription-based pricing in many domains.
Software-as-a-Service vendors typically price based on value, while Service-as-a-Software vendors often price per transaction or resource consumption. Startups may be better positioned to offer agents in an outcome-based model given their narrow scope and dedicated focus on one problem or industry.
The competitive landscape has intensified as the technology has been democratized. Differentiation through unique data, workflows, and domain expertise is key.
While the pace of innovation is thrilling, businesses must also account for considerable uncertainty in such a dynamic space. Interoperability and flexibility in architecture are essential.
Closing thoughts
The dizzying progress in LLM capabilities and costs is a testament to the transformative potential of the technology. As barriers fall and new opportunities emerge, businesses big and small are racing to bring increasingly powerful AI to every domain.
Though challenges remain, one thing is clear: the age of accessible, capable, and affordable AI is upon us, and the implications will be profound.
As costs continue to plummet and capabilities soar, the question is no longer if or when companies will adopt LLMs, but how they will harness them to innovate, compete, and create value in a new era of intelligent machines.