Generative AI solutions driven by foundation models will usher in a new era in computing and business. This Impact Radar identifies the key components that product leaders must understand to exploit GenAI opportunities and deliver value to clients.
Overview
Key Findings
- Generative AI (GenAI) driven by foundation models is rapidly moving from hype to grounded reality as these models have reached a tipping point in effectiveness and accuracy. In the very near term, the race is on for embedded GenAI applications and the scaling of large language models (LLMs).
- GenAI will impact nearly all technology companies and their products, as well as their customer experiences.
- Domain-specific models as a service (MaaS) will be one of several significant near-term growth opportunities and a main battleground among AI specialists, GenAI-native providers and tech giants.
- Increasing regulatory pressure, rapidly evolving competitive forces and shifting players are the main challenges technology providers must navigate in this fast-moving market.
- In the longer term, intelligent automation and productivity increases will give way to multiagent generative systems that will initiate a new era of how people will work and accomplish complex tasks.
Recommendations
Product leaders developing GenAI-enabled products and services should do the following:
- Develop a “client zero” approach by deploying and testing your GenAI-enabled products and services internally with clearly defined business outcomes.
- Prioritize the most prevalent use cases, such as enterprise search/knowledge mining and virtual agents, as these are already demonstrably delivering real value to users.
- Draw an investment roadmap that prioritizes opportunities, such as LLMs, MaaS, GenAI-augmented applications and GenAI virtual assistants, because their impact (mass) will be very strong.
- Create a competitive edge (and trust with your customers) by centering your GenAI offerings on solid guardrails and hallucination management as part of comprehensive AI governance and responsible AI strategy.
- Hold off on long-range GenAI technology investments, such as multiagent generative systems, until you have mastered the near-term technologies.
Strategic Planning Assumptions
By 2026, single-modality AI models will lose out to multimodal AI models (text, image, audio and video) in over 60% of GenAI solutions, up from less than 1% in 2023.
By 2026, workflow tools and agents will drive efficiencies for 20% of knowledge workers, up from less than 1% today.
By 2027, foundation models will underpin 70% of natural language processing (NLP) use cases, up from less than 5% in 2022.
By 2027, 30% of environmental, social and governance (ESG) software offerings will leverage GenAI for automatic sustainability report generation.
By 2028, 20% of repetitive processes will be automated by domain-specific GenAI implementations in every industry.
Analysis
An Emerging Tech Impact Radar is an analysis of the maturity, market momentum and influence of emerging technologies and trends.
Product leaders should use the Radar Profile range to plan investment timing in the related emerging technology or trend (ETT). “Range” represents Gartner’s estimate of time to reach early majority (more than 16% target market adoption), not when product leaders should act on investment. Considering time to plan, develop and launch, a starter guide to product leader investment timing, based on product strategy, is as follows:
- First movers should be acting now on items in the 6-to-8-years ring (or beyond).
- Fast followers should be acting now on ETTs in the 3-to-6-years ring.
- Majority followers should be acting on ETTs in the Now and 1-to-3-years rings.
- Laggard followers can wait until the ETT has passed through to early, or even late, majority.
The broad availability of large datasets and tools has made GenAI accessible to the public. This is driving enterprise application development and citizen developer growth like no other technology before. The approachability of GenAI is reflective of the use cases it can address, because it is as much about automation and efficiencies (generative agents) as it is about more intuitive experiences (conversational AI) and convenience (e.g., summarization). This unprecedented technology diffusion is impacting virtually all markets, their products and customers. As a consequence, GenAI will drive significant shifts in competitive forces and unlock new business opportunities for technology and service providers.
To better understand the underlying technologies and trends of GenAI, we will discuss 25 of the most important GenAI technologies along four main themes.
Model-Related Innovations Are at the Core of GenAI Offerings
This theme highlights elementary components, such as LLMs, as well as innovative new approaches to business models (models as a service). Knowledge graphs, as a relatively mature technology, are expected to play an important role in the GenAI domain because they can be used to improve the performance of GenAI-enabled applications. LLMs have taken center stage in this GenAI era, with vendors publishing new iterations of their models on a weekly basis over the last 10 months. LLMs have reached a tipping point in terms of accuracy and effectiveness, attracting large capital investments and R&D development on a broad scale. At the same time, this rapid development has exposed the biggest area of concern: hallucinations, which lead to undesired outcomes such as bias, misinformation and, ultimately, distrust. While post hoc methods to reduce hallucinations will play an increasingly important role, they have limitations that cannot be overcome unless the foundation model provider takes responsibility and action.
Model Performance and AI Safety
The user plays a critical role in reducing GenAI risk and setting guidelines for vendors to manage their GenAI responsibly. Hallucination management is a key focus area for the industry. The market is slowly migrating from human-in-the-loop to user-in-the-loop concepts, an effective way to improve model performance. The challenge is that, with new use cases emerging almost daily, regulatory frameworks will always be a step behind the latest technological developments. Organizations will have to adopt responsible-AI-by-design approaches and invest in hallucination management capabilities to improve model performance.
Model Build- and Data-Related Advancements
This theme discusses some of the critical steps involved in building a GenAI model and the decisions that have to be made at each of these steps and building blocks. We expect multimodal GenAI models to become mainstream in a short time frame, standing out as one of the largest-impact technologies in this GenAI Radar. They remove conventional data barriers and allow AI to support humans in performing more tasks, regardless of the environment and the type of data available.
An emerging class of data — synthetic data — can augment the datasets available from the real world. At the current rate of adoption, we expect synthetic data to play a critical role in the near term, to the point where it becomes indispensable as real-world data fails to keep up with the demand for training data. This represents a great opportunity for vendors to offer value, especially in situations where high-quality data is needed or real-world data cannot be used or sourced.
The Next Generation of AI-Enabled Applications
We expect a myriad of new applications to emerge over the next three years; some will enable new use cases, while others will enhance existing experiences, for example, GenAI-enabled virtual assistants. We expect new applications such as workflow tools and agents to have a fundamental impact on how people work and complete their tasks. Advanced simulation techniques such as simulation twins, once technology costs come down further, will enable test environments at a fraction of the cost and time required for testing in the real world. The disquieting side of these phenomenal innovations is applications that enable the creation of deepfakes, some of which are entertaining while others mostly generate disinformation, with negative consequences for society. See Figure 1 and Table 1.
The Impact Radar
Figure 1: Impact Radar for Generative AI
A bullseye-style image that plots 25 emerging technologies and trends in the generative AI space. The further a dot is from the center, the longer until its impact on the space. Dot size represents the potential impact of the technology or trend in this space.
Emerging Technologies or Trend Profiles
Table 1: Emerging Technologies in GenAI Based on Time to Adoption
Now | 1 to 3 Years | 3 to 6 Years | 6 to 8 Years
Now
Knowledge Graphs
Analysis by Danielle Casey, Afraz Jaffri
Description: A knowledge graph (KG) is a machine-readable data structure describing the relationship between heterogeneous data via a network of nodes (vertices) and links (edges/arcs). KGs represent knowledge of the physical and digital worlds, including entities (people, companies, digital assets) and their relationships, which adhere to a graph data model. A KG consists of one or more of the following components — ontology, taxonomy, vocabulary, graph databases, semantic-mapping tools, a data-mapping framework and the ability to perform data extraction and integration from heterogeneous sources. Applying inferencing to a KG allows organizations to discover new relations between existing nodes.
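The node-and-link structure and inferencing described above can be sketched in a few lines of Python. This is a toy illustration, not a production graph database; the entities, relations and the transitive-closure rule are hypothetical examples:

```python
# Minimal knowledge graph as a set of (subject, predicate, object) triples.
# All entities and relations below are invented for illustration.
triples = {
    ("Acme Corp", "is_a", "Company"),
    ("Jane Doe", "works_for", "Acme Corp"),
    ("Acme Corp", "subsidiary_of", "Globex"),
    ("Globex", "subsidiary_of", "Initech"),
}

def objects(kg, subject, predicate):
    """Follow links (edges) labeled `predicate` out of node `subject`."""
    return {o for s, p, o in kg if s == subject and p == predicate}

def infer_transitive(kg, predicate):
    """Simple inferencing: materialize the transitive closure of one
    relation, discovering new links between existing nodes."""
    inferred = set(kg)
    changed = True
    while changed:
        changed = False
        new = {(s1, predicate, o2)
               for s1, p1, o1 in inferred if p1 == predicate
               for s2, p2, o2 in inferred if p2 == predicate and o1 == s2}
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

kg = infer_transitive(triples, "subsidiary_of")
# Inferencing discovers a relation not explicitly stated in the data:
print(("Acme Corp", "subsidiary_of", "Initech") in kg)  # prints True
```

Real KG stacks add the components listed above (ontologies, taxonomies, graph DBMSs, semantic mapping), but the core idea is the same: explicit, machine-readable relationships over which rules can run.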
Sample Providers: AllegroGraph, Amazon, Attivio, Cambridge Semantics, Neo4j, Ontotext, Oxford Semantic Technologies, Stardog, TigerGraph, Timbr.ai, Tresata
Range: 0 to 1 Year
The range for knowledge graphs is Now (down from one to three years in 2022), as KG adoption has rapidly accelerated in conjunction with the growing use of AI, generally, and large language models (LLMs), specifically. GenAI models are being used in conjunction with KGs to deliver trusted and verified facts to their outputs, as well as provide rules to contain the model.
KGs provide the explicit knowledge, rules and semantics needed in conjunction with AI/ML methods for pattern recognition. KGs are being adopted in conjunction with AI because they render data actionable by:
- Improving use of unstructured data held in documents, correspondence, images and videos, using standardized metadata that can be related and managed.
- Improving management of numerous data silos where data is often duplicated, and where meaning, usage and consumption patterns are not well-defined.
- Improving business insights by identifying influencers, customer segments, fraudulent activity and critical bottlenecks in complex networks.
KGs may be present in consumer products and services, such as smart devices and voice assistants, chatbots, search engines, recommendation engines, and route planning.
Despite these drivers, several inhibitors are limiting KG adoption:
- KGs are an underlying technology in many applications, meaning customer awareness is low and business value difficult to capture.
- Methods to maintain KGs as they scale from prototype to production — to ensure reliable performance, handle duplication and preserve data quality — remain immature.
- The graph DBMS market is fragmented, causing confusion and hesitation among adopters. However, graph DBMS technology and proprietary graph query languages are improving to handle the storage and manipulation of graph data structures at scale.
- Enabling internal data to be interoperable with external knowledge graphs can be a challenge.
- In-house expertise, especially among small and midsize businesses, is scarce, and identifying third-party providers is difficult.
Despite these challenges, the inherent benefits and frequent necessity of KGs for AI/GenAI performance will drive the technology to early majority adoption over the next year.
Mass: High
The mass for KGs is high because KGs capture and represent complex relationships, improving the performance of GenAI-enabled applications. KGs are a key underlying technology and act as the backbone of several products that will be impacted by GenAI, including:
- Search and recommendation engines for data exploration and discovery
- Data and analytics engines for data insights
- Enterprise decision and knowledge management solutions
- Virtual assistants for answering questions and task support
The impact of KGs prevails across business function and industry, with KGs driving business impact in a variety of different settings, including:
- Digital workplace (e.g., collaboration, sharing and search)
- Automation (e.g., ingestion of data from content to robotic process automation)
- Machine learning (e.g., augmenting training data)
- Investigative analysis (e.g., law enforcement, cybersecurity or financial transactions)
- Digital commerce (e.g., product information management and recommendations)
- Data management (e.g., metadata management, data cataloging and data fabric)
Recommended Actions:
- Target a set of AI use cases where knowledge graphs can be utilized by examining the business needs in up to three functional areas, with focus on discovery, search and recommendation.
- Increase adoption of KGs by highlighting their ability to reduce data silos and increase actionable data use.
- Reduce the KG implementation barrier by creating a set of KG-based services and integrations.
- Decrease time to value for KG development by taking an agile approach, reusing industry standard ontologies, and adapting with minimum viable ontologies and minimum viable graphs.
Recommended Reading:
- Emerging Tech: Venture Capital Growth Insights for Graph Technologies
- How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications
- AI Design Patterns for Knowledge Graphs and Generative AI
- How Large Language Models and Knowledge Graphs Can Transform Enterprise Search
1 to 3 Years
AI Code Generation
Analysis by Radu Miclaus
Description: AI code generation is an application of GenAI that uses large language models (LLMs) to generate code based on prompt instructions submitted by the user. Code generation capabilities are delivered mainly through AI code assistants, which are configured as productivity tools embedded directly in developers' code development environments. The LLMs on which AI code assistants are built support a wide variety of programming languages and can be used for development work within a single language or across languages.
Sample Providers: Amazon, GitHub, Google, OpenAI, Replit, Tabnine
Range: 1 to 3 Years
The factors influencing the adoption timeline of one to three years are the risk areas that can affect the security, quality and accuracy of generated code. Code generation has been one of the most popular GenAI modalities for adoption across the developer population. A recent study from GitHub claims that more than 90% of developers have experimented with GitHub Copilot, either in the context of their work or privately. The opportunities for developer productivity are substantial, which is pushing experimentation at a fast pace in the immediate time horizon.
Key risks include privacy and IP protection, inconsistent language coverage, cybersecurity exposure through data leakage, an evolving vendor ecosystem, and the cost structures of including code generation in development workflows.
Mass: High
The mass for code generation is high because it will affect the entire technology stack, from infrastructure as code to front-end app development, with cross-industry exposure as digital maturity curves progress in most verticals. Code generation use cases vary by task and span a wide spectrum, including:
- Generate boilerplate code
- Generate regular expression (regex)
- Rewrite code using correct style/optimal code
- Rewrite code in different languages (code translation)
- Refactor legacy code
- Write code to use an existing third-party API
- Generate service implementation code based on an API specification
- Explain/comment/document the code
- Write test cases
- Check vulnerabilities
- Automate packaging features with security layers for testing and delivery
- Monitor and provide recommendations for improvements as applications mature in usage
- Implement algorithmic logic
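To make the task list above concrete, the sketch below shows how such tasks might be turned into prompt instructions for an LLM. This is a hypothetical illustration: the template names and wording are invented, and real code assistants use proprietary prompting plus editor context rather than simple templates.

```python
# Hypothetical prompt templates for a few of the code generation tasks
# listed above. Illustrative only; not any vendor's actual prompting.
TASK_TEMPLATES = {
    "translate": "Rewrite the following {source} code in {target}:\n{code}",
    "document":  "Add explanatory docstrings and comments to this code:\n{code}",
    "test":      "Write unit tests covering the behavior of this function:\n{code}",
}

def build_prompt(task: str, code: str, **params) -> str:
    """Fill a task template with the user's code and any task parameters.
    The resulting string is what would be submitted to the LLM."""
    return TASK_TEMPLATES[task].format(code=code, **params)

prompt = build_prompt("translate", "print('hi')", source="Python", target="Go")
```

The same user code can thus be routed to very different generation tasks (translation, documentation, test writing) simply by changing the instruction wrapped around it.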
As enterprise engineering teams onboard in higher volumes, the quality of the models will likely improve due to exposure to higher-quality, tested codebases that run applications in production. As the cost of refining models on private code repositories goes down, enterprises' ability to have their own customized LLMs, optimized for their programming languages, code structures and standards, will increase confidence in code generation and, with that, adoption.
Recommended Actions:
Product leaders looking to accelerate the entire software development life cycle for their own developers and developer personas should take these steps:
- Work with your engineering leader to explore ways of accelerating product development and testing cycles by adopting integrated development environments (IDEs) enabled with AI code assistants and adding support for multiple development frameworks and languages. Accelerate the developer experience by augmenting the IDE services within the workflows your platforms and coding environments provide.
- Expand your target user base by adding the capability to build low-code and no-code workflows within your offerings, enabling citizen technologists to prompt the platforms and accelerate their development of components.
- Build awareness of the risks that come with the use of AI code assistants, and plan to mitigate them by selecting vendors that address these risks specifically.
Recommended Reading:
- Emerging Tech: Generative AI Code Assistants Are Becoming Essential to Developer Experience
- Assessing How Generative AI Can Improve Developer Experience
AI-Generated Synthetic Data
Analysis by Vibha Chitkara
Description: Synthetic data is a class of data that is artificially generated rather than collected from real-world events. However, it is often derived and extrapolated from a set of real-world data. The fastest-emerging aspect of synthetic data is the generation of tabular synthetic data and text-based artifacts (as opposed to image-based artifacts).
Recent advancements in LLM and variational autoencoder (VAE) technology are significantly expanding synthetic text data generation, which, in turn, is used to fine-tune LLMs for specific domains or tasks, such as intent detection. By supplementing real data, synthetic data generation capabilities can address many specific AI and analytics concerns, such as data scarcity, accessibility, bias, privacy and regulatory compliance.
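As a toy illustration of deriving synthetic data from real-world data, the sketch below fits a multivariate Gaussian to a (simulated) real tabular dataset and samples artificial rows from it. Production tools use richer generators such as VAEs, GANs or LLMs; the columns and values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a small real-world tabular dataset (rows x columns),
# e.g., [age, income]. Values are invented for this sketch.
real = rng.multivariate_normal(
    [40, 55000], [[100, 30000], [30000, 4e7]], size=500
)

# Derive synthetic data: fit a multivariate Gaussian to the real data,
# then sample new, artificial rows from the fitted distribution.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=500)

# The synthetic rows are new records (no one-to-one link to real rows),
# but they approximately preserve the real data's aggregate statistics
# (column means and correlations), which is what makes them useful for
# ML training, software testing and data sharing.
```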
Sample Providers: Anyverse (formerly Next Limit), Bitext, Datagen, Gretel, Mindtech, MOSTLY AI, Parallel Domain, Sarus, Statice by Anonos, Synthesis AI
Range: 1 to 3 Years
The range for synthetic data is one to three years to early majority adoption, as we expect that, by 2025, 60% of data for AI will be synthetic, used to simulate reality and future scenarios and to derisk AI. The popularity of GenAI conversation tools like OpenAI’s ChatGPT has sparked interest in synthetic data, which is helping the vendor community.
Tabular synthetic data is farther from mainstream adoption than image and video synthetic data. It is most commonly used for sharing privacy-preserved data for use cases such as training ML models, software testing, and research and collaboration. To ensure privacy, simple generative techniques are insufficient; additional privacy measures, such as differential privacy, which adds noise to the dataset, are required. Generating tabular synthetic data usually requires multiple iterations to balance accuracy and privacy. A perfectly accurate synthetic dataset, with every relationship between attributes retained, will probably not be completely private. This is because attributes like location and age, when combined, can often provide a “key” between the original source data and the synthetic dataset. Which attributes to retain for accuracy, and which to randomize or mask for privacy, depends on how the dataset will be used, so use-case identification is critical for the adoption of tabular synthetic data.
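The noise-adding step mentioned above can be illustrated with the standard Laplace mechanism from differential privacy. This is a minimal sketch; the sensitivity and epsilon values are illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Differentially private release of a numeric query result:
    add Laplace noise with scale = sensitivity / epsilon.
    Smaller epsilon means more noise: stronger privacy, lower accuracy."""
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release a count over 1,000 records. Adding or removing
# one individual changes a count by at most 1, so sensitivity = 1.
true_count = 1000
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

This is the accuracy/privacy trade-off the paragraph above describes: the released value is close to, but deliberately not equal to, the true statistic.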
Although there has been significant investment in synthetic data, user skepticism, reliance on real data, and a lack of standards, trust and awareness can impede acceptance.
Mass: Medium
The mass of this technology is medium because the adoption of synthetic data is still restricted to supplementing real data, not replacing it. However, with the enormous growth in demand for robust training data in the LLM arena, we expect accelerated interest in synthetic data. We expect that, within the next three years, the demand for data will exceed the supply of real-world data, and organizations will therefore become more reliant on synthetic data.
Adoption is increasing across various industries. Tabular synthetic data is delivering significant advantages in regulated industries such as healthcare and financial services by enabling the use of privacy-compliant data for research and ML model development for high-risk use cases such as fraud detection. Image and video synthetic data is used for training and improving computer vision models across most industries, especially in manufacturing, retail, aerospace and defense, and automotive for training autonomous vehicles.
Ongoing digital transformation will surface many unique use cases for which the ability to deliver value is constrained by the availability of data. Synthetic data will be key to differentiation for the “long tail” of use cases. Some companies are already pioneering unique AI solutions based on synthetic data, for purposes such as developing personalized treatments and identifying rare manufacturing anomalies.
Recommended Actions:
- Deliver successful and differentiated offerings by understanding the customer’s specific needs and business use cases to deliver the right synthetic data that is balanced in the appropriate way.
- Educate customers by presenting real-world use-case examples and customer references for the successful application of synthetic data and its impact on the business.
- Demonstrate cost-benefit analysis by comparing the use of synthetic data plus real data versus use of only real data to train AI models.
- Incorporate your synthetic data within your data management principles by setting policies on storage, editing and retrieval once synthetic datasets become important to business outcomes like ML training results, software testing, etc.
Recommended Reading:
- Emerging Tech: Top Use Cases for Image and Video Synthetic Data
- Emerging Tech: Tech Innovators in Synthetic Data for Image and Video Data — Domain-Focused
- Emerging Tech: Tech Innovators in Tabular Synthetic Data — Domain-Focused
AI Model as a Service
Analysis by Whit Andrews
Description: AI model as a service (AIMaaS) provides AI model inference and fine-tuning as a consumable service offered by cloud providers. The underlying AI models come from the cloud providers themselves, other tech companies or open-source initiatives. As foundation models and other AI models proliferate, cloud providers seek to offer a variety of such models to their clients. This includes both direct access to pretrained models and the ability to fine-tune such models for custom use.
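From the client side, consuming such a service typically means posting a JSON request to a hosted inference endpoint. The sketch below builds such a request for a hypothetical API: the model name, field names and endpoint shape are invented, since every provider defines its own API.

```python
import json

def build_inference_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body a client would POST to a hypothetical
    AIMaaS inference endpoint. Field names are illustrative only."""
    return {
        "model": model,              # pretrained or fine-tuned model identifier
        "input": prompt,             # the inference input
        "parameters": {"max_tokens": max_tokens},
    }

body = build_inference_request(
    "example-domain-llm-v1",         # hypothetical domain-specific model
    "Summarize this contract: ...",
)
payload = json.dumps(body)           # serialized body, ready to send
```

The value proposition described above lies precisely in this shape: the client supplies only a model identifier and input, while the provider hosts, scales and maintains the model.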
Sample Providers: Alibaba Cloud, Amazon Web Services, Google, Hugging Face, IBM, Microsoft, OpenAI, Oracle
Range: 1 to 3 Years
The range for AI models as a service is one to three years because there will be heavy use of cloud GenAI services for small-scale and low-value needs. However, organizations today are seeking ways to invest in AI that allow them to experiment with and evaluate new GenAI capabilities. The proliferation of AIMaaS choices increases the complexity of choice and will force new approaches to integration and interoperation across cloud services. However, the most generally used services will be selected based more on the hyperscaler that provides them than on price or performance.
Most organizations are adopting AIMaaSs as a means of gaining access to the specific functionality they need in order to experiment and to consolidate the vendors they employ. AIMaaSs allow organizations to experiment swiftly with different models and deliver value into their organizations more quickly. However, AIMaaSs also increase the complexity of application and AI architectures.
Mass: High
The mass for AI models as a service is high because they will impact most aspects of enterprise business in all verticals. As GenAI evolves, so will multitudes of models that address challenges aligned with specific industry and business-function domains. Models as a service allow organizations to build solutions on top of models developed and maintained by external providers. We expect domain-specific models as a service to evolve into a vibrant market.
Additionally, as ever more massive large language models evolve, host organizations will apply the economics behind models as a service to meet their enormous expense. This presents a new business opportunity for the cloud hyperscalers.
About 80% of organizations are running at least half of their AI use cases as aspects of other business applications. In many cases, that AI functionality will be acquired under the auspices of the larger application vendor, but it will often come from independent partner vendors. AIMaaSs allow vendors to remain agile in their model delivery, whether they are the dominant vendor or offer functionality from smaller independent vendors.
Discussions in Gartner’s Peer Community through 2023 showed that clients’ interest in GenAI API services was related to their reliability and integrity. The clients also expressed concerns about proprietary data in free services, leading them to express interest in services where data security is a feature. AIMaaSs from reputable vendors allow for agile adoption of new functionality without increased data risk.
Recommended Actions:
- Explore ways to capitalize on the growing AIMaaS business opportunity by becoming either a provider of models or a user building solutions on top of them.
- Partner with trusted vendors of AIMaaSs to leverage their credibility, security and integrity, rather than chasing lesser-known but trendier vendors.
- Document how end-user customers’ data is protected, including any steps you take to anonymize its key aspects and make the process safer.
AI Molecular Modeling
Analysis by Bill Ray
Description: AI molecular modeling uses simulation techniques to rapidly test a wide range of potential treatments by modeling how different compounds will bind and interact with target molecules. This process reduces the cost and time of development and allows the testing of novel compounds that can create new approaches in treating diseases and conditions.
Sample Providers: Absci, Atomwise, DeepMind, Exscientia, Insilico Medicine, Nimbus Therapeutics, Relay Therapeutics, Schrödinger, XtalPi
Range: 1 to 3 Years
AI molecular modeling will enter mainstream adoption in one to three years as the commercial benefits become impossible to resist. Early adopters are already benefiting, with several AI-native drug discovery companies progressing into clinical trials. Training AI for molecular modeling is still expensive and time consuming, but once trained, the AI can greatly accelerate development timelines and reduce costs, raising high expectations in the R&D community. In addition, many established pharmaceutical companies have formed billion-dollar discovery partnerships with AI companies to explore the technology.
In 2022, Nature published a study that found the top 20 AI-based drug discovery companies already have around 160 disclosed discovery programs in preclinical testing (AI in Small-Molecule Drug Discovery: A Coming Wave?, Nature). This compares to the 20 largest pharmaceutical companies globally, which have around 330 such programs using traditional methods. While AI companies are focused on established targets, rather than more speculative research, this comparison demonstrates how AI molecular modeling is accelerating the rate of drug discovery to the point where it will rapidly become mandatory for companies wishing to remain competitive.
Mass: Medium
The mass (impact) of AI molecular modeling is medium, as the large impact is immediately limited to the pharmaceutical industry, spreading to biotech, cosmetics and a range of other fields over time. The indirect benefits of cheaper treatments made available more quickly will have a broader impact on society, though that is harder to quantify. In an industry where 90% of trials fail, any technique that can reduce the cost of those trials will have a significant impact. In addition, AI molecular modeling enables the creation of novel compounds by reducing the cost of testing such compounds to the point where wide-ranging experimentation is delivering unexpected results.
Existing projects have a strong emphasis on well-established targets as appropriate testing grounds, which is driven by a desire to de-risk internal pipelines by focusing on targets with validated biology. The purpose is to prove the viability of the technique, as much as to develop new treatments, but this will rapidly change as the first wholly AI-designed-and-developed drugs pass through clinical trials and become broadly available, which we expect to happen within the next three years.
Recommended Actions:
- Identify parts of the drug development process where the implementation of broader AI-enabled techniques will have the maximum business value by partnering with stakeholders to understand current inefficiencies and future opportunities.
- Help drug development companies differentiate by assisting them in customizing their models and integrating proprietary data into the training process.
- Extend product offerings by integrating other GenAI capabilities, such as the identification of novel biological targets, onto a drug-development platform.
Recommended Reading:
- Quick Answer: How Is AI Being Used in Preclinical Drug Development?
- Emerging Tech: Top Use Cases for Generative AI
- Use-Case Prism: Generative AI for Life Sciences
Diffusion AI Models
Analysis by Radu Miclaus, Annette Jump
Description: Diffusion models are generative models that use probabilistic variation to add noise to data (for example, blurring an image) and then reverse the process (clearing the image) to generate new samples of data. Diffusion models have proven more effective than generative adversarial networks (GANs), especially in applications related to image processing, synthesis and summarization, and computer vision.
The main diffusion model architecture is inspired by thermodynamics. Instead of learning the data distribution directly, it works with a series of distributions of data and noise, modeled as a Markov chain of stochastic steps, and learns to remove noise from the data gradually to produce generated samples (images or other modalities).
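The forward (noising) half of that Markov chain can be sketched in a few lines. This toy example uses a common linear noise schedule on random stand-in data; a real diffusion model additionally trains a neural network to reverse the process, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: beta_t controls how much Gaussian noise is mixed
# into the data at each step of the forward Markov chain.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # cumulative signal-retention factor

def forward_diffuse(x0: np.ndarray, t: int) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0): progressively 'blur' clean data x0 with
    Gaussian noise. A trained network learns to reverse these steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal((8, 8))        # stand-in for image pixel data
x_mid = forward_diffuse(x0, t=100)      # partially noised
x_final = forward_diffuse(x0, t=T - 1)  # nearly pure noise (alpha_bar ~ 0)
```

Generation then runs the learned reverse process from pure noise back toward the data distribution, which is what produces new images or other modalities.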
Sample Providers: Adobe, Google, Hugging Face, Midjourney, OpenAI, Shutterstock, Stability AI
Range: 1 to 3 Years
The range for diffusion models to reach early majority adoption is one to three years because applications targeted at creative content creation are already available and evolving. DALL-E, a diffusion model from OpenAI, was one of the first generative applications to capture the imagination of the creative user community as well as everyday technology users. Since then, more and more applications for graphic design, including multimodal outputs (avatars, video and others), have been penetrating both the consumer and enterprise markets.
While experimentation with diffusion-model-based applications in the creative community has been growing, so has scrutiny of IP protection and concern that generative output is produced by models trained on protected content. This can expose companies to IP infringement litigation. Regulation around IP protection, as well as self-regulation by vendors of models and applications, will continue to evolve and adjust to the pace of change.
Mass: High
The mass for this technology is high because it can be applied across multiple areas and industries. While image generation and editing have been among the front-running applications of diffusion models, the use cases also include:
- Image processing: generation, resolution, restoration, editing, anomaly detection
- Computer vision: video generation, semantic segmentation, resolution
- Multimodal generation: text-to-image, text-to-3D, text-to-motion, text-to-video, text-to-audio/sound
- Natural language processing (various use cases)
- Temporal data processing: time series data imputation, forecasting, anomaly and signal detection
- Applications based on sequences and connections: material design, medical image processing, biochemistry applications (industrial and life science applications)
The creative space for augmenting consumer brand interaction (especially as augmented reality, virtual reality and metaverse applications advance), as well as the gaming industry, will continue to be big drivers of the mass for applications using diffusion models.
Recommended Actions:
- Advance content creation capabilities for enterprise software with human augmentation by exploring both open-source and proprietary diffusion models to benefit from GenAI-enabled productivity experimentation.
- Build or integrate with models developed in an ethical manner, without exposure to IP-protected content, to develop competitive advantage in enterprise use cases focused on mitigating these types of risks.
- Invest in superior prompt engineering experience and a support community for sharing best practices for refining the completion from diffusion model-based applications.
Recommended Reading:
- Innovation Insight for Artificial Intelligence Foundation Models
- Emerging Tech: The Key Technology Approaches That Define Generative AI
Embedded-GenAI Applications
Analysis by Annette Zimmermann
Description: Embedded-GenAI applications are existing software applications that have been enhanced by embedding GenAI capabilities to improve on existing use cases or deliver new ones. As opposed to native GenAI applications, embedded applications were not initially designed to leverage GenAI technology. They are legacy systems, such as customer relationship management systems (CRMs), that receive net new capabilities via the addition of GenAI.
Sample Providers: Capgemini, Cognizant, Google, Hexaware, kama.ai, Microsoft, Quantiphi, TelemetryDeck, Teleperformance
Range: 1 to 3 Years
The range is one to three years because the market is exploding with a wide range of vendors, from startups to hyperscalers and global AI service providers, pursuing GenAI projects (estimated 200% growth year over year) to embed GenAI into existing applications. Current areas of focus are powering enterprise content search engines or creating an effective knowledge management system. The majority of vendors significantly increased R&D spend on GenAI while augmenting their expertise via hiring and dedicated training. We currently view the barrier to adoption of embedded-GenAI applications to be extremely low, and we expect this to fuel rapid adoption over the next three years. Not only is there an enormous number of end users inquiring about how to embed GenAI into their systems and applications, but many of these early “let’s do something with GenAI” projects generate quick wins and tangible results, such as increased productivity or an improved customer experience. In addition, some of the largest global AI service providers reported an active opportunity-and-leads pipeline of, on average, 200 to 400 projects as of July 2023.
Despite these fast developments, some challenges remain. The main challenge with embedded-GenAI applications lies in the nature of “upgrading” a legacy application with next-generation technology: The user organization is often not prepared in terms of cultural readiness or data readiness. The former relates to employees’ concerns about whether the new GenAI application could replace them, which may delay adoption. The latter arises from the fact that many organizations are not “data ready” for GenAI because a common data structure simply does not exist. Addressing this, and eliminating data silos, is needed to drive adoption of the low-hanging fruit in GenAI, such as enterprise search, virtual assistants and the automation of manual tasks (e.g., summaries and reports).
Mass: High
The mass for this technology is high because we are witnessing a multitude of industry efforts to embed GenAI into existing applications, including banking and financial services; life science; telecom and media; transportation; airlines; and consumer packaged goods (CPG) companies. In addition, cross-function applications are also major targets for embedded GenAI, including customer relationship management (CRM), marketing and sales applications, and contact center software that are enhanced with GenAI to serve many different industries.
Technology advancements are made continuously by augmenting existing applications with LLMs in combination with other capabilities and techniques, such as knowledge graphs, reasoning, plug-ins and retrieval-augmented generation (RAG). For example, in the development of an advanced semantic search solution leveraging GenAI for a large pharma company, a GPT-3-based model extracted information from medical research articles. The RAG architecture pattern was used on a library of 2,000 historical papers to ensure sources were delivered along with the result output. Extracting the most relevant information from medical research articles was previously a manual process. With the LLM-powered search solution, process efficiency increased tenfold, reducing manual effort significantly. Accuracy and lack of short-term memory in extended chat conversations, as well as effective guardrails, remain challenges that could potentially lower the overall impact of embedded-GenAI applications. But as this example shows, tools and techniques are available that contribute to organizations’ transparency and reliability efforts.
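The RAG pattern described above can be sketched with a toy keyword index standing in for a real embedding store. The document IDs, corpus and scoring below are illustrative assumptions, not the pharma deployment itself.

```python
def build_index(docs):
    """docs: {doc_id: text}. A production system would store embeddings in a vector database."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

def retrieve(index, question, k=2):
    """Rank documents by crude term overlap with the question."""
    terms = set(question.lower().split())
    ranked = sorted(index, key=lambda d: len(index[d] & terms), reverse=True)
    return ranked[:k]

def build_prompt(docs, doc_ids, question):
    """Assemble a grounded prompt so the model can cite its sources."""
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in doc_ids)
    return ("Answer using ONLY the sources below and cite their IDs.\n"
            f"{context}\nQuestion: {question}")

# Hypothetical two-paper corpus to exercise the pipeline.
papers = {
    "paper-1": "compound x reduced tumor growth in phase 2 trials",
    "paper-2": "compound y showed no significant effect on blood pressure",
}
index = build_index(papers)
hits = retrieve(index, "which compound reduced tumor growth", k=1)
prompt = build_prompt(papers, hits, "Which compound reduced tumor growth?")
```

Because retrieved passages are injected into the prompt with their IDs, the model's answer can be traced back to sources, which is exactly the transparency benefit RAG provided in the example.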
Recommended Actions:
- Make your clients “GenAI ready” by offering data advisory services, such as data integration management and data governance, or provide support to obtain such services from a third party. A strong data strategy can be critical to the success of any GenAI application project.
- Help your clients overcome cultural resistance to the adoption of embedded-GenAI applications by getting actively involved and directly supporting change management in those organizations.
Recommended Reading:
- Emerging Tech: Top Use Cases in Generative AI
- Emerging Tech: Top Emerging GenAI Use Cases for AI Services
- Emerging Tech: Tech Innovators for Intelligent CRM Applications
GenAI-Enabled Virtual Assistants
Analysis by Danielle Casey
Description: GenAI-enabled virtual assistants (VAs) represent a new generation of VAs that leverage large language models (LLMs) to deliver superior functionality well beyond previous VA methods. GenAI is being used to improve VA performance, add new functionality, extend task automation and support new value outcomes.
Sample Providers: Amelia, Anthropic (Claude), Avaamo, Baidu Research (ERNIE), Google (Bard), Microsoft (Bing Chat), Moveworks, OpenAI (ChatGPT), Openstream.ai
Range: 1 to 3 Years
GenAI-enabled VAs are one to three years from early majority adoption due to significant proof-of-concept activity (largely driven by GPT) over the past eight months; some vendors have been using LLMs (such as GPT-2) for several years. Providers are either currently piloting, already using, or planning to add GenAI capabilities to their R&D roadmaps, as LLMs will materially augment VAs and there is significant end-user demand. Many VA providers are able to pivot rapidly to GenAI due to preexisting knowledge graphs, indexed vector databases and in-house technical expertise. Coupled with existing, high-performing out-of-the-box models, repositioning VAs as GenAI-enabled is relatively achievable for most vendors. GenAI-enabled VAs are closely aligned with “expert level 4” virtual assistants as described in Emerging Tech: Use Generative AI to Transform Conversational AI Solutions.
By 2025, GenAI will be embedded in 80% of conversational AI offerings, up from 20% in 2023. This means that the distinction between GenAI VAs and VAs will rapidly diminish, as GenAI becomes an expected feature of most VAs.
Despite this activity, there are obstacles inhibiting the adoption for GenAI-enabled VAs, including:
- Accuracy Issues — LLMs are not entirely accurate and have the unique problem of hallucinating (i.e., making up facts). Techniques to improve accuracy and reduce hallucination include prompt engineering and policy injection, knowledge graphs and indexed vector databases, and model fine-tuning.
- Lack of Explainability — LLMs differ from traditional, nongenerative conversational AI models in that they lack transparency to provide explainable outcomes.
- Regulatory Uncertainty — LLMs have received particular scrutiny from regulators, across geographies. Moreover, advancements in responsible AI trail GenAI adoption. Adopters must apply ongoing risk mitigation measures to avoid compliance issues.
- Cost of Customization — Creating industry-specific LLMs by retraining open-source models on an industry dataset is costly and time-consuming. Yet, custom LLMs will augment domain-specific VAs (such as VA for pharma, VA for transportation and VA for retail).
- Data Privacy and Security Concerns — There are outstanding data privacy and security concerns around LLM data usage.
As the market gains more certainty in mitigating and/or managing these challenges, GenAI-enabled VAs will advance into early majority customer adoption.
Mass: Very High
GenAI-enabled VAs have a very high mass due to the impact across industries, use cases and business functions, as well as the addition of new features and functionality. Additionally, these advanced virtual assistants hold the potential to transform how people interact with technology, moving the primary interface, for many use cases, from screen and keyboard to voice commands. It also represents a shift from computers helping humans to accomplish work to humans helping the advanced virtual assistants do the work.
Examples of GenAI-enabled capability advancement include:
- Improving customer support by empowering advanced virtual assistants to autonomously complete complex customer requests through natural language dialogue
- Improving employee support by providing text summarization, content generation, and content visualization features
- Improving usability through multimodality, due to LLM flexibility around data ingestion and supported outputs
Some example use cases include:
- Personalized marketing/sales engagements with customers and prospects and improved lead generation
- Virtual tutoring assistants to support students doing homework, including solving math problems, writing essays, etc.
- Virtual clinician assistants for real-time summarization and translation of patient conversations, offering disease diagnosis assessments, predicting future health issues, recommending prescriptions, conducting medical record searches, and scheduling follow-up appointments with relevant doctors
- Virtual legal assistants to assist with case management by helping draft contracts, answer or retrieve legal information, and provide recommendations on legal arguments based on court case history
- Virtual employee assistants to help with performing daily administrative tasks
Some industries are experiencing faster adoption due to near-term benefits and lower risk (such as travel and retail) compared to more regulated industries (such as healthcare and government). Also, some use cases are considered lower risk if they are internal rather than external facing and if a human is in the loop.
GenAI VAs will be able to perform additional tasks, support more complex use cases, deliver higher levels of operational efficiency, reduce costs and enable new service offerings. An eager customer base is willing to pay for these performance improvements and associated value outcomes. To not use LLMs is to miss out on significant and immediate revenue opportunities in the next two years.
Recommended Actions:
- Remain competitive in a rapidly evolving market by adding GenAI to your VAs now, or risk losing market relevance.
- Prepare for a future in which virtual assistants do more, and more meaningful, work via voice commands on behalf of the user; this applies to all end-user-facing software providers.
- Prepare for future regulations and differentiate your GenAI-enabled VAs by investing in responsible AI features and functionality.
- Create an optimal GenAI-enabled VA by evaluating whether to chain multiple LLM API integrations, embed an out-of-the-box model into your offering, or retrain an LLM to create a customized model. Each option must be evaluated by weighing performance requirements against time and cost.
Recommended Reading:
- Emerging Tech: Use Generative AI to Transform Conversational AI Solutions
- Emerging Tech Roundup: ChatGPT Hype Fuels Urgency for Advancing Conversational AI and Generative AI
- Emerging Tech: Top Use Cases for Generative AI
- Emerging Tech: Primary Impact of Generative AI on Business Use Cases
GenAI Extensions
Analysis by John Santoro, Jim Hare
Description: GenAI extensions, such as “plugins” or “agents,” are tools that augment the capabilities of GenAI models by giving the models the ability to retrieve real-time information, incorporate company and other business data, perform new types of computations, and safely take action on the user’s behalf. GenAI extensions indicate, via their metadata, the types of prompts that they support, and they map keywords in those prompts to API calls that can access information (e.g., the costs from a WordPress website) or take actions (e.g., create an integration with Zapier) that otherwise could not be performed by the GenAI model.
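Mechanically, the keyword-to-API mapping described above can be sketched as a small dispatcher. The extension name, keywords and handler below are hypothetical, and real extension frameworks use richer metadata (manifests, parameter schemas) than plain keyword matching.

```python
import re

class ExtensionRegistry:
    """Map keywords declared in extension metadata to API-call handlers."""

    def __init__(self):
        self._extensions = []  # list of (keyword set, name, handler)

    def register(self, name, keywords, handler):
        self._extensions.append(({k.lower() for k in keywords}, name, handler))

    def dispatch(self, prompt):
        """Return (extension name, result) when a declared keyword appears in the prompt."""
        words = set(re.findall(r"\w+", prompt.lower()))
        for keywords, name, handler in self._extensions:
            if keywords & words:
                return name, handler(prompt)
        return None, None  # fall through to the bare GenAI model

registry = ExtensionRegistry()
# Hypothetical extension: fetches live cost data the model cannot know itself.
registry.register("site-costs", {"costs", "pricing"},
                  lambda prompt: {"total_usd": 125.0})  # would call a real API
name, result = registry.dispatch("Show me the costs for my website")
```

The handler is where a production extension would perform the authenticated API call; the model only sees the structured result, which is what lets it answer with fresh, product-specific data.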
Sample Providers: Amazon, Google, Microsoft, OpenAI
Range: 1 to 3 Years
The range for GenAI extensions is one to three years because this approach is essential to businesses connecting the LLM with the capabilities from which they make money. Currently, providers are creating extensions to connect their content to the user interface tool associated with the LLM (e.g., ChatGPT), where users select an extension from a marketplace and load it into their environment. However, the majority use case will quickly become providers embedding LLMs into their products via API, so that users enter a prompt within a product without even being aware that the prompt is executed by an LLM whose GenAI extension gives it product-specific capabilities.
GenAI models will need an extension architecture to access data that changes more frequently than the model can be updated or to take actions. Therefore, the speed with which GenAI models roll out their extension architecture and support for extension development will affect the speed of adoption by product providers and independent software vendors. A lack of a standard extension architecture across AI models will inhibit adoption, as providers and developers will have to reproduce some portion of their effort developing, testing, supporting and maintaining plugins for multiple GenAI models.
Mass: Medium
The mass for this technology is medium, because not all providers and organizations will use this approach. Extensions allow LLM users to access real-time data and processing, but some businesses will use custom models or chatbots to achieve the same result without extensions. The impact of this technology will also depend on business factors such as the availability of support for creating extensions, the availability of an extension marketplace where customers can discover extensions they wish to buy, the cost of developing extensions, and the extent to which providers can profit from them.
Recommended Actions:
- Create a GenAI extension when you need to enhance a product that requires dynamic information or needs to take actions based on GenAI model prompts.
- With any plugin, verify the accuracy of the information it provides before attempting to use it. AI accuracy and hallucinations remain common issues with GenAI models.
- Understand the user interfaces and programming interfaces to build extensions, portability across GenAI models, and licensing costs in order to determine the feasibility and profitability of building GenAI extensions.
Recommended Reading:
- How to Package and Price Your Generative AI-Enhanced Product
- Quick Answer: How Will the Generative AI Plug-In Market Evolve?
- Effective API Pricing Protects Your Product From GenAI Misuse
Hallucination Management
Analysis by Annette Zimmermann, Annette Jump and Ray Valdes
Description: Hallucinations are incidents in which content generated by an LLM is nonsensical or blatantly factually incorrect. Hallucinations can emerge from several areas, including: (1) training data quality, suitability, imbalance or mislabeling; (2) an overly complex or simplified model; (3) an insufficient amount of training; (4) underlying ambiguity in model input; (5) inadequate prompts; and several others. There are two main approaches to managing hallucinations: (1) Hallucinations can be reduced to a certain degree by “after-the-fact mechanisms,” such as prompt engineering and human-in-the-loop workflows. (2) The root cause stemming from the model (or training data) can be addressed, but only by the vendor that built the model.
Managing hallucinations in Approach 1 involves mitigation techniques that address the problem from both a data angle and a model angle. It is a two-step process: First, the level of hallucination produced is assessed; then measures are taken to reduce it. The available evaluation methods (statistical metrics and model-based metrics) are both imperfect and not easy to apply, because hallucination can have subjective nuances. In other words, users may differ in their performance expectations of a model, as well as in whether they perceive content as biased or toxic. Gartner expects that over the next three years, advanced tools and techniques will emerge to directly address the causes of hallucinations.
Sample Providers: Amazon, Anthropic, Google, Hugging Face, Midjourney, NeMo Guardrails, OpenAI, Spark NLP, Stable Diffusion
Range: 1 to 3 Years
The range for hallucination management is rather short — one to three years — because hallucinations represent one of the greatest risks to GenAI technology adoption, and thus, managing this risk is a key component of adopter organizations’ GenAI strategy. Data bias, toxic and exclusive content, and misinformation will lead to undesirable outcomes, including low productivity and lack of trust. The increasing availability of post hoc tools to reduce model-based hallucinations and the growing awareness of the risks are accelerating hallucination management adoption. Leading AI service providers have implemented guardrails via different modalities, such as retrieval-augmented generation (RAG). Moreover, there is a renewed focus on data quality, and the growing availability of high-quality datasets helps reduce data-driven hallucinations.
The main obstacle to effective hallucination management is organizations’ lack of the right skills and knowledge to address these problems. This is primarily an issue on the adopter side, but technology providers are also currently struggling to attract the best talent in this area. Product leaders therefore need to invest in the right skills or acquire them externally to ensure robust performance of their GenAI-enabled products. Few enterprise-grade products are currently available on the market, which further compounds the issue.
Mass: Very High
The mass of hallucination management is very high because it will improve GenAI as a whole. It will impact any industry, process and business unit where LLM-based applications are deployed. LLM-based enterprise search engines and knowledge mining are making inroads into organizations as they deliver good results in terms of productivity gains, improved customer experience, faster decision making and cost savings. Additional use cases are observed in customer experience settings, for example, customer-facing virtual assistants and employee-facing virtual agents. Here especially, it will be critical that the model produces accurate outputs in order to maintain a high level of service experience. Certainly, all of these outcomes can only be achieved with good AI model performance. Based on statistical performance evaluation methods such as BLEU, ROUGE and METEOR, around 25% of summaries contain hallucinations. This is roughly in line with Gartner’s recent findings, in which vendors reported achieving around 80% summarization accuracy initially in a project; over weeks of hallucination management, accuracy then improves significantly (usually to above 90%).
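A crude overlap check in the spirit of those statistical metrics can be sketched as follows. This is a deliberately simplified unigram measure, not BLEU, ROUGE or METEOR themselves, which are considerably more elaborate.

```python
from collections import Counter

def support_score(summary, source):
    """Fraction of summary unigrams that also occur in the source text.

    A low score is a cheap signal that the summary may contain
    unsupported (hallucinated) content; it is not proof either way,
    since paraphrases legitimately use words absent from the source.
    """
    summ = Counter(summary.lower().split())
    src = Counter(source.lower().split())
    overlap = sum(min(count, src[word]) for word, count in summ.items())
    return overlap / max(sum(summ.values()), 1)
```

In practice such scores are used to triage outputs: summaries below a chosen threshold are routed to stronger (model-based or human) evaluation rather than rejected outright.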
To assist with hallucination management, a combination of different mitigation strategies can be applied. Prompt engineering, retrieval-augmented generation and human-in-the-loop workflows will continue to be the key approaches to improve model performance, of which the latter has the advantage that it can be performed by subject matter experts (while the former two require engineering skills). The selection of mitigation strategy will depend on the use case and the root cause of the hallucination, because different use cases require different measures. The main limitation to hallucination management impact is inherent to all “after-the-fact mechanisms” described here, namely that they can reduce, but usually not eliminate, hallucinations when the root cause lies in the model itself.
In other instances, where the problem stems from the data used for training or fine-tuning, it could be feasible to update the datasets. Yet this may lead to unintended consequences: Updating the data to eliminate one bias may introduce another.
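The human-in-the-loop workflow described above can be sketched as a confidence gate. The threshold and confidence scores below are placeholders for whatever quality signal a product actually exposes (a support score, a model-based judge, or similar).

```python
def route_outputs(outputs, threshold=0.8):
    """Split generations into auto-publish and human-review queues.

    outputs: list of (text, confidence) pairs. Anything below the
    threshold is held for a subject matter expert before release,
    which is what makes this an after-the-fact mechanism.
    """
    auto, review = [], []
    for text, confidence in outputs:
        (auto if confidence >= threshold else review).append(text)
    return auto, review

auto, review = route_outputs([("summary A", 0.95), ("summary B", 0.40)])
```

The appeal of this pattern is that the expensive resource, expert attention, is spent only on the uncertain fraction of outputs.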
Recommended Actions:
- Reduce hallucination issues in your GenAI-enabled solutions by using better fit-for-purpose data; this is the most effective approach.
- Improve model performance by implementing human-in-the-loop workflows, as this mechanism is effective as an after-the-fact tool and can be performed by subject matter experts.
- Address data-driven hallucinations in your solutions by analyzing data slices and making a habit of updating training data (for fine-tuning) to minimize the risk of discriminatory and biased content.
- Test some of the post hoc tools and/or guardrails that are becoming increasingly available in the market, keeping in mind that this market is still nascent and may therefore not always generate the best outcome.
Recommended Reading:
- Hype Cycle for Analytics and Business Intelligence, 2023
- Introduce AI Observability to Supervise Generative AI
Light LLMs
Analysis by Ray Valdes and Arun Batchu
Description: A light LLM is a large language model (LLM) that can support use cases where a massive (or heavy) LLM is infeasible. Light LLMs represent a trade-off between the generalized power of heavy LLMs and the narrower requirements of resource-constrained environments, such as on-premises deployments or edge network nodes, as well as latency-constrained scenarios such as code completion.
Light LLMs can be viewed as a countertrend to the primary market trend, which is the emergence of massive or heavy LLMs. By contrast, light LLMs are much smaller, such as Microsoft Orca (7 billion parameters), MPT (7 billion) and a Stability AI model (7 billion).
All of these sizes can be considered light in comparison to GPT-4 and other heavy models mentioned earlier. The number of parameters that qualifies an LLM as light varies in relation to mainstream LLMs. GPT-2 had 1.5 billion parameters when it was released in 2019 and was not considered light at the time, but it would be today. GPT-3 had 175 billion parameters when released in 2020, which was heavy for the time. By year-end 2023, such a model size could be considered light, given that GPT-4 is 1.7 trillion parameters.
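Parameter counts translate directly into memory footprint, which is what makes a 7-billion-parameter model "light" in practice. A back-of-the-envelope sketch (weights only, ignoring activations and caches; the byte-per-parameter figures are the usual rules of thumb, not measurements):

```python
def model_memory_gb(params, bytes_per_param=2.0):
    """Approximate weights-only memory: fp16 uses 2 bytes/param, 4-bit quantization about 0.5."""
    return params * bytes_per_param / 1e9

light_fp16 = model_memory_gb(7e9)        # ~14 GB: workstation-class hardware
light_4bit = model_memory_gb(7e9, 0.5)   # ~3.5 GB: within reach of a laptop or edge device
heavy_fp16 = model_memory_gb(175e9)      # ~350 GB: multiple data center GPUs
```

This arithmetic is why quantized light LLMs can run on-premises or at the edge while GPT-3-class and larger models remain cloud-bound.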
Most of the light LLMs mentioned are open source, unlike massive LLMs, which are not (even when their lineage began as open source, as with GPT-2).
Sample Providers: Adept, Alpaca, Cohere, Falcon, Meta, Microsoft, Stability AI, StableCode, Vicuna
Range: 1 to 3 Years
The range for light LLMs is one to three years. Strong market demand exists for LLMs that can run in an enterprise data center or in disconnected mode (no cloud connection), leverage company data while keeping it private, be tuned to the needs of a particular vertical industry (such as healthcare or the legal sector), or run on small devices at the edge of a network. Open-source light LLMs provide a good starting point for meeting these requirements. Completing the task requires customization and solution building, done by vertical-focused vendors, external service providers or in-house technical teams (for the small number of enterprises that have these skill sets).
Light LLMs cannot compare in performance and scope (in terms of generalization, language processing, completeness and reasoning) to massive LLMs. However, light LLMs can do things that heavy LLMs cannot easily do:
- Run on smaller devices, such as non-GPU servers, laptops, smartphones and IoT devices such as the Raspberry Pi
- Be trained or fine-tuned at low cost on task-specific user data, such as code and document repositories
- Be customized in various ways, including embedding in applications or in orchestrated agent frameworks, such as LangChain
Therefore, light LLMs can support on-premises LLMs, edge LLMs and autonomous agents. Achieving these results often requires customization, which in turn calls for some kind of open-source, business-friendly or permissive license.
Light LLMs benefit from ongoing innovations in more efficient training, inference, tooling and frameworks. There will continue to be a steady stream of releases from both existing players and new entrants. In addition, technical skill sets in implementing LLM-based solutions are becoming more widespread, both among systems integrators and in-house teams in Type A organizations.
The result is a virtuous cycle where rapid adoption of light LLMs will occur over one to three years, for those use cases where they are well-suited.
Mass: Medium
The mass for this technology is medium because it only has high impact in certain use cases and industries. Systems based on light LLMs could process natural language inputs in real time and generate responses on devices such as smart speakers, cars, smartphones, factory machinery, building thermostats, field robotics and construction equipment. This can stimulate market growth and deliver productivity improvements in various industries. In addition, systems based on light LLMs have the potential to reduce costs, offer a competitive edge to early adopters, provide an enhanced user experience, ensure data privacy, and enable new types of applications.
Code generation for software development is another use case where smaller models can be more effective when focused on a specific language or code repository (such as an enterprise’s legacy COBOL code), or in real-time code completion where low latency is a key aspect of the developer experience.
Light LLMs will deliver an incremental change in capabilities in the short term, but in the long term they could bring revolutionary change. They could enable GenAI to become more pervasive, as it will be possible to run a light LLM on a phone, an electric scooter or any of a wide range of IoT devices.
Recommended Actions:
Product leaders should:
- Evaluate use cases and solution requirements in light of capabilities provided by light LLMs, in order to gain benefits such as expanded deployment scope, lower cost, and improved data privacy in return for trade-offs in performance and completeness.
- Ensure an adequate degree of loose coupling and modularity in LLM-based solution architecture, to allow for replacement of one LLM with another, due to the rapidly evolving nature of the field.
Recommended Reading:
Model Hubs
Analysis by Whit Andrews
Description: Model hubs are repositories that host pretrained, readily available machine learning (ML) models, including generative models. Model hubs can serve as for-fee marketplaces for models available for commercial use, or as repositories of open-source generative models bundled as part of a service, such as a machine learning operationalization (MLOps) platform. Model hubs increasingly offer automation and governance tools, curated datasets, model APIs and GenAI-enabled applications targeting specific enterprise needs.
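The selection problem a model hub solves can be sketched as a catalog query over model metadata. The catalog entries, field names and license strings below are hypothetical, standing in for the richer metadata real hubs expose.

```python
def filter_models(catalog, max_params_b=None, licenses=None, task=None):
    """Select candidate models from a hub catalog by size, license and supported task."""
    selected = []
    for model in catalog:
        if max_params_b is not None and model["params_b"] > max_params_b:
            continue
        if licenses is not None and model["license"] not in licenses:
            continue
        if task is not None and task not in model["tasks"]:
            continue
        selected.append(model["name"])
    return selected

# Hypothetical two-entry catalog to exercise the filter.
catalog = [
    {"name": "small-chat", "params_b": 7, "license": "apache-2.0",
     "tasks": ["chat", "summarization"]},
    {"name": "big-chat", "params_b": 180, "license": "proprietary",
     "tasks": ["chat"]},
]
picks = filter_models(catalog, max_params_b=10, licenses={"apache-2.0"}, task="chat")
```

Governance teams typically apply exactly this kind of filter (size, license, provenance) before a model is approved for enterprise use, which is why curated metadata is becoming a hub differentiator.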
Sample Providers: Aleph Alpha, Amazon Web Services, Databricks, Google, Hugging Face, IBM, Microsoft, Replicate, Snowflake
Range: 1 to 3 Years
The range for model hubs is one to three years, because while organizations are intensely focused on experimentation today, the standards for inclusion in hubs and the idea of collecting services (or switching between them) are still in their earliest stages. Even the idea of “data hubs” appears comparatively rarely in Gartner inquiry, and those discussions are mostly separate from AI conversations. That said, a fleet of new models and new model iterations found their way into vendors’ model hubs in 2023, strengthening their offering. Vendors are only just starting to try to differentiate these via different model sizes and transparent data curation. Hubs will expose organizations to models that they might not otherwise discover and will allow them to establish relationships with new suppliers, but for now organizations are focused on a handful of vendors and the abilities that those vendors offer.
Mass: Medium
The mass for model hubs is medium because they will mostly impact enterprises that are willing to invest extra resources in optimizing model selection and deployment based on performance or particular value. Tech giants in particular are driving investment in this space by making domain-specific models of different sizes available to specific verticals.
Organizations are increasing their use-case exploration. In the 2022 Gartner AI Use-Case ROI Survey, the average number of deployed use cases was 41. GenAI is likely to accelerate this growth, which will demand scalable ways to locate effective models that have proven helpful in particular industries, tasks or even specific organizations. Organizations value the relationships they have with their existing vendors and the assurances that come with them.
Software and service vendors will benefit from maintaining sets of secure and trustworthy models that organizations can employ. The sudden proliferation of models entices organizations but disturbs risk managers, who are concerned about novel solutions with unclear provenance that present potential supply chain risk.
Recommended Actions:
- Offer a model hub of your own services that highlights their performance, reliability and unique aspects.
- Inspect other vendors’ model hubs to discover competitive opportunities and explore usage.
Recommended Reading:
Multimodal GenAI Models
Analysis by Danielle Casey and Roberta Cozza
Description: Multimodal GenAI is the ability to have multiple types of data inputs and outputs within a single generative model, such as images, videos, audio (speech), text and numerical data. Multimodality augments the usability of GenAI by allowing models to interact with and create outputs across various modalities. Today, many multimodal models offer processing across two or three modalities (e.g., text-to-code or speech-to-image). This will increase over the next few years to include more modalities.
Sample Providers: Google, Meta, Microsoft, Midjourney, NVIDIA, OpenAI, Stability AI, Twelve Labs
Range: 1 to 3 Years
Multimodal GenAI is one to three years from early majority adoption, driven by the accelerated use of foundation models and by 2023 experimentation aimed at enabling LLMs to acquire commonsense understanding beyond text or a single modality.
Multimodal GenAI is important because robust information about the real world is typically multimodal. Multimodality helps capture the relationships between different data streams and scales the benefits of GenAI across potentially all data types and applications. It removes traditional data barriers by allowing users to interact with, manipulate, and create outcomes from numerous data types. This allows AI to support humans in performing more tasks, regardless of the environment, by meeting users where they sit and with the data that is available. In addition, the presence of large datasets that combine text, video and audio will be a driver of multimodal LLM training. Multimodality is a new capability that has to be learned: Whereas some modalities are well-understood (such as text-to-speech), others are more emerging (such as voice-to-video). Multimodal functionality will increasingly be present in tech offerings, as users demand associated performance.
Multimodal GenAI adoption is currently inhibited by the following challenges:
- Training Challenges — Multimodal GenAI models use deep learning, data alignment and fusion techniques to train, integrate and analyze data sourced from the multiple modalities. Multimodal data has varying degrees of quality and formats versus unimodal data.
- Lack of Data — Data availability may be limited in some modalities. For example, availability of large-scale audio datasets is more limited compared to other modalities like images and text. This impacts model training and accuracy.
- Data Exposure — Multimodal GenAI increases the exposure to a wider range of sensitive data. Examples of particularly sensitive data types include maps or geolocation data, biometric data, or health data.
- Data Management — Keeping multimodal data clean, up to date and accurate is difficult. Managing the context and interrelationships between modalities that often have different drift rates is challenging.
- Other Risks — Bias and inaccurate or fabricated outputs are amplified by multimodal data sources. Also, regulations and standards are a work in progress and are lagging GenAI’s capability advancement.
Mass: Very High
The mass for this technology is very high because it supports the creation and acceleration of new tasks, workflows and applications, such as data extraction, conversion of one data type to another, and creation of new data outcomes. Applications that support multimodality will have higher automation potential.
Multimodal GenAI will have a transformational impact on enterprise applications by enabling the addition of new features and functionality otherwise unachievable. The impact of multimodality is not limited by industry or use case, and can be applied at any AI-human touchpoint.
Multimodal GenAI is being applied to improve data and analytics (D&A) by supporting the manipulation of text, numerical, voice and visual data to glean insights. Early use cases are in:
- Customer support (such as the call center and virtual assistants operating in text- and image-based environments, like retail and insurance)
- Application performance monitoring (understanding multiple data inputs)
- Website creation (including website design, product descriptions and images)
- Visualization of text and numerical information in data-heavy industries (such as financial services) for D&A support
- AI avatars (which can use visual, auditory and textual inputs)
Over the next one to three years, multimodal GenAI will appear in more and more applications, as the future of AI is multimodal.
Recommended Actions:
- Choose which GenAI model to incorporate into your product offerings by identifying the modalities that are most important for your customers and associated use-case support.
- Determine the immediacy of the need and opportunity for embedding multimodality into your applications by assessing your customers’ primary data types and their data silos.
- Build or acquire expertise to cover the technical complexities of processing and integrating data inputs and outputs from diverse multimodal sources, and validate early how these can best be integrated with key legacy or more current workflows.
Recommended Reading:
- Critical Capabilities for Enterprise Conversational AI Platforms
- Emerging Tech: The Key Technology Approaches That Define Generative AI
Multistage LLM Chains
Analysis by Annette Zimmermann
Description: Multistage LLM chain libraries connect different LLMs to solve multiple tasks. For example, they can create a workflow that consists of two chains, i.e., two different models, where the first model performs summarization of a text and the second one performs sentiment analysis. This sequence can be extended further and even connected to other systems such as mathematical suites that perform additional tasks in this chain. Multistage LLM chains represent the basic architecture for building agents.
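The two-stage workflow described above can be sketched as a minimal chain, with stub functions standing in for real LLM calls (the summarization and sentiment logic here are hypothetical placeholders, not actual models):

```python
# Minimal sketch of a two-stage LLM chain: a summarization "model"
# feeds its output into a sentiment "model". Both functions are
# hypothetical stand-ins for real LLM API calls.

def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call: keep the first sentence.
    return text.split(".")[0].strip() + "."

def sentiment(text: str) -> str:
    # Stand-in for an LLM sentiment call: naive keyword check.
    negative = {"delay", "churn", "complaint", "risk"}
    return "negative" if any(w in text.lower() for w in negative) else "positive"

def run_chain(text: str, stages):
    # Each stage consumes the previous stage's output.
    result = text
    for stage in stages:
        result = stage(result)
    return result

report = "Customer churn rose sharply this quarter. Support tickets doubled."
print(run_chain(report, [summarize, sentiment]))  # "negative"
```

In a real deployment, each stage would be a call to a different hosted model, and additional stages (for example, a code generator or a mathematical suite) could be appended to the same sequence.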
Sample Providers: Forethought, LangChain, Microsoft
Range: 1 to 3 Years
We expect multistage LLM chains to reach early majority adoption within the next three years because organizations are moving quickly to deploy this architecture. As evidenced by many provider and end-user discussions over the past months, a portion of deployments had already started in 1H23, while the majority are currently in the implementation and upscaling phase. The architecture is being implemented so rapidly that global adoption will likely reach 15% within the next three years.
As with any LLM-based technology, several factors can slow down the adoption curve. Inaccuracy, security and lack of transparency present significant risks. Moreover, users are exposed to certain risks when implementing LLMs that are under legal scrutiny for potentially infringing copyrights. Providers are working to overcome these challenges by implementing guardrails that apply important policies. For example, they impose access controls on queries, remove unsafe responses and increase transparency by delivering the source/origin along with the specific response (retrieval-augmented generation). The biggest challenge at this time is the lack of skills and know-how within adopter organizations, due to the complexity of creating multistage LLM chains. Even within technology providers, capabilities are still evolving to create more sophisticated chains that connect different models and other systems, such as code generators, that can perform the required tasks with acceptable accuracy and speed. Several orchestration tools available on the market today should make the process easier. These tools help manage LLM chains and create templates, yet in most cases they, too, are still maturing.
Mass: Very High
The mass for this technology is very high because the possibilities seem endless at this stage: extensively trained LLMs can be deployed as a central reasoning tool connected to a variety of other systems, such as search engines, mathematical suites and code generators. The number of use cases is growing rapidly, with certain industries, including financial services and insurance; telecom and media; healthcare; and retail, at the forefront of discovering the value. The focus is currently on content creation, improving customer experience and supercharging enterprise search engines. We are observing users across product teams, marketing and sales, human resources, and audit and finance experimenting with applications based on multistage LLM chains. For example, a global pharmaceutical company leverages one model in the first chain for search (for specific articles) and then another model for entity extraction and summarization supporting the decision-making process.
Recommended Actions:
- Articulate and define the specific business outcome and ROI your multistage LLM chain-based offering will deliver for your client to ensure client expectations and targets are met.
- Overcome adoption challenges by educating customers and prospects on how your offering has guardrails and protections to address LLM risks in the enterprise.
Recommended Reading:
- Emerging Tech: The Key Technology Approaches That Define Generative AI
- Emerging Tech: Top Use Cases for Generative AI
- Emerging Tech: Emergence Cycle for Generative AI
- Quick Answer: What Technology Companies Should Know and Do About ChatGPT
Open-Source LLMs
Analysis by Eric Goodness
Description: Open-source large language models (LLMs) are deep-learning foundation models distinguished by their terms of use, the distribution rights granted to developers, and the developers’ access to source code and the model architecture. Open-source LLMs are made available to the public through a license that enables anyone to access, use, modify and distribute the model source code.
Sample Providers: BigCode (StarCoder); BigScience (BLOOM); Cerebras (Cerebras-GPT); EleutherAI (GPT-J, GPT-NeoX, Polyglot, Pythia); Google (Flamingo, FLAN, PaLM); H2O.ai (h2oGPT); Meta (GALACTICA, LLaMA, Llama 2, XGLM); NVIDIA (NeMo); Replit (Code); Stability AI (StableLM)
Range: 1 to 3 Years
The range for open-source LLMs is one to three years because open-source LLMs, frameworks, libraries, tools and datasets may accelerate the value recognized from the implementation of GenAI. The key drivers of value are democratizing access, reducing complexity and removing impediments for developers. In addition, open-source LLMs provide access to developer communities in enterprises, academia and other research roles that are working toward common goals to improve the models and make them more valuable, which is also viewed as accelerating implementation and time to value.
The maturation of open-source communities is also driving the adoption of open-source LLMs. Open-source LLMs are generally supported and enriched by the collaborative power of development communities that continuously refine the models. This driver is predicated on the vibrancy of the community. Some developer communities build on top of these models and fine-tune them for specific use cases and runtime scenarios.
Adoption challenges also inhibit the range of open-source LLMs. These include:
- Investments in data engineering, tooling integration, and infrastructure to train and run open-source LLMs can be high. These represent a significant fixed cost compared with proprietary alternatives, lengthening time to value and impeding implementation.
- The complexity of the variety of licensing models in open-source impedes adoption. Open-source can impose restrictions on the consumer and require rigorous review from legal teams before adoption. For example, not all open-source models are certified for commercial use.
- As measured by various benchmarks (e.g., Abstraction and Reasoning Corpus [ARC], BIG-bench, Helm, HellaSwag and TruthfulQA), the accuracy gap between proprietary and open-source LLMs reduces demand. This gap might not matter, depending on the accuracy required for your use case.
Mass: Very High
The mass for open-source LLMs is very high because they will improve customization, privacy and security controls, provide collaborative development and model transparency, and reduce vendor lock-in. Ultimately, open-source LLMs offer enterprises smaller models that are easier and less costly to train, and they enable business applications and core business processes.
Increased interest in customization is driving open-source LLM adoption. Open-source LLMs are often more flexible to customize than proprietary LLMs because engineers can access open-source LLMs’ model parameters and source code. Such access enables developers to customize these models and have more control over costs, output and alignment for their use cases. Product ownership based on open-source LLMs enables enterprises to continuously develop them, based on internal and customer demands, and makes their applications harder to imitate by competitors.
Open-source LLMs also provide measures of transparency, whereas proprietary LLMs are notoriously opaque. For example, closed-source LLMs do not provide transparency into LLM architectures, training methodologies or datasets, nor access to information on model weights and checkpoints. Open-source LLMs offer more transparency, enabling developer inspection and analysis.
Recommended Actions:
- Perform due diligence to understand the legal exposure related to training data and the potential biases in the models, such as verifying the open-source license and checking for restrictions on its commercial use.
- Investigate to ensure that data privacy and security measures are in place when using open-source LLMs to process sensitive information.
- Proactively engage with open-source LLM communities as part of the due-diligence process to discern the positives and drawbacks associated with various models.
- Evaluate different open-source LLMs, based on such factors as performance, resource requirements, compatibility and documentation. Test the models on sample data, and compare their outputs against your defined objectives.
- Consider experimenting with fine-tuning on domain-specific data to assess the LLM’s adaptability to your specific use cases.
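The evaluation actions above can be sketched as a minimal comparison harness. The models, test prompts and keyword scoring below are hypothetical stand-ins; in practice, each stub would be a call to a candidate open-source LLM and the test set would reflect your defined objectives:

```python
# Hypothetical sketch of comparing candidate LLMs on sample data:
# run each model over test prompts and score the outputs against
# expected keywords. The "models" here are stubs for illustration.

def model_a(prompt: str) -> str:
    return "Paris is the capital of France."   # stub output

def model_b(prompt: str) -> str:
    return "I am not sure."                    # stub output

# Each entry: (prompt, keywords the answer must contain).
TEST_SET = [("What is the capital of France?", ["paris"])]

def score(model) -> float:
    hits = 0
    for prompt, keywords in TEST_SET:
        out = model(prompt).lower()
        hits += all(k in out for k in keywords)
    return hits / len(TEST_SET)

ranked = sorted([("model_a", score(model_a)), ("model_b", score(model_b))],
                key=lambda kv: kv[1], reverse=True)
print(ranked)  # model_a ranks first
```

Real evaluations would add the benchmark suites mentioned earlier (e.g., HellaSwag, TruthfulQA) alongside such use-case-specific checks.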
Recommended Reading:
- Quick Answer: What Are the Pros and Cons of Open-Source Generative AI Models?
- Quick Answer: How Do I Compare LLMs?
- A CTO’s Guide to Open-Source Software: Answering the Top 10 FAQs
- Tool: OSS Governance Policy Template
- Hype Cycle for Open-Source Software, 2023
Retrieval-Augmented Generation
Analysis by Radu Miclaus
Description: Retrieval-augmented generation (RAG) is an architecture pattern (see AI Design Patterns for Large Language Models) that combines a search function with a generative capability for grounding the output from generative completions. For content consumption, users need both retrieval and synthesis of information. In order for generative synthesis to be based on factual information and reduce hallucinations, retrieval is used to inform and augment the content of the prompts submitted to the LLM generative process with search results from a curated knowledge base. This allows for two things: the generative output provides sources and citations to defend the generated output, and the model gets informed by the most recent information and is not subject to limitations in the data it was trained on.
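A minimal sketch of the pattern described above, with a toy word-overlap retriever standing in for a real insight engine or vector index (the knowledge base and document IDs are invented for illustration):

```python
# Minimal sketch of the RAG pattern: retrieve the best-matching passage
# from a small knowledge base, then ground the prompt with it and carry
# a citation. Real systems use embeddings and a vector index instead of
# word overlap.

KNOWLEDGE_BASE = {
    "doc-001": "The 2023 travel policy caps hotel rates at 200 USD per night.",
    "doc-002": "Expense reports must be filed within 30 days of travel.",
}

def retrieve(query: str):
    # Score each passage by word overlap with the query (toy retriever).
    q = set(query.lower().split())
    return max(KNOWLEDGE_BASE.items(),
               key=lambda kv: len(q & set(kv[1].lower().split())))

def build_prompt(query: str) -> str:
    doc_id, passage = retrieve(query)
    # The retrieved passage grounds the generation; the doc ID lets the
    # generated answer cite its source.
    return (f"Answer using only this source [{doc_id}]:\n"
            f"{passage}\n\nQuestion: {query}")

prompt = build_prompt("What is the hotel rate cap in the travel policy?")
print(prompt)
```

The grounded prompt is then submitted to the LLM, which is why the output can cite sources and reflect information newer than the model’s training data.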
Sample Providers: Amazon, Charli, Google, Microsoft, Perplexity
Range: 1 to 3 Years
The range for RAG is one to three years because this technique sees strong adoption with an 80%-to-90% growth rate year over year. Applying GenAI for content consumption in the enterprise and for customer applications (like self-service customer support) requires a high level of accuracy and fact-based output of the generative process. Retrieval of information, as part of traditional enterprise search, has been fulfilled by insight engines that need to be configured against knowledge bases in the enterprise. The activities needed for knowledge base activation (indexing, embedding preprocessing and/or building knowledge graphs) and search engineering (building and integrating search pipelines into applications) have traditionally been challenging for enterprises due to skill set gaps, data sprawl and ownership, and technical limitations.
As vendors start offering tools and workflows for data onboarding, knowledge base activation and components for RAG application design, enterprises will engage more actively in the learning curves around robust retrieval in support of grounding generative applications for content consumption. These learning curves will take time. They will start with pilots, progress to understanding and perfecting the hybrid RAG pattern, and ultimately scale existing use cases while discovering new ones. One of the major friction points for enterprises building RAG architectures will be activating unstructured and semistructured data for retrieval in order to ground the generative output. This area has traditionally been underinvested in by enterprises, and it is now becoming critical.
Mass: Medium
Mass for this technology is medium because of the learning curves of the RAG architecture and enterprises’ varying ability to fund efforts to improve knowledge activation and retrieval in support of generative output. In the enterprise, content gets surfaced either through productivity tools or by custom-built applications. For content consumption within productivity tools, technology vendors will perfect any elements of the RAG architecture pattern that are relevant to users and the tasks at hand within the applications themselves, so enterprise users can simply adopt the new features. For content consumption that requires aggregating multiple knowledge bases across multiple functions, enterprises need to build custom RAG applications. Enterprise adoption will vary based on this ratio of out-of-the-box functionality to custom-built RAG applications.
Productivity gains measurement in support of investment is still a moving target for enterprises. The ability to put a value on the productivity gains from efficient enterprise search has been challenging in the past, even though it’s a common estimation that knowledge workers spend 20% to 30% of their time looking for information.
Recommended Actions:
Product leaders responsible for building GenAI features for content consumption in the enterprise should:
- Focus on features that make knowledge base onboarding, enrichment and activation as streamlined as possible in order to accelerate the learning curves of adopting organizations.
- Work with service teams or partners to build service offerings that can fill in the skills gap in the enterprise to build this hybrid architecture.
- Build in integration and observability features that allow operations teams to build and scale RAG architectures for responsive applications while balancing infrastructure costs.
Recommended Reading:
- AI Design Patterns for Large Language Models
- AI Design Patterns for Knowledge Graphs and Generative AI
- Prompt Engineering With Enterprise Information for LLMs and GenAI
3 to 6 Years
GenAI Engineering Tools
Analysis by Eric Goodness
Description: GenAI engineering tools enable enterprises to operationalize models faster, balancing governance and time to market. AI engineering tools can be subdivided into model-centric and data-centric tools. Numerous frequently used terms, such as DataOps, LLMOps, LangOps, FMOps and the broader ModelOps or MLOps, are in our view subsets of AI engineering. Some of the prominent market categories for tools specific to GenAI engineering include prompt engineering, vector databases, model fine-tuning, model deployment, application frameworks, and AI trust, risk and security management (TRiSM).
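To make one of these categories concrete, the core operation of a vector database is nearest-neighbor lookup over embeddings. The sketch below uses tiny, invented three-dimensional vectors; real stores index thousands of high-dimensional embeddings produced by an embedding model:

```python
import math

# Toy sketch of a vector database's core operation: store embeddings
# and return the nearest entry by cosine similarity. The 3-dimensional
# vectors are invented for illustration only.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

store = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
}

def nearest(query_vec):
    # Return the stored key whose embedding is most similar to the query.
    return max(store, key=lambda k: cosine(query_vec, store[k]))

print(nearest([0.85, 0.2, 0.05]))  # "refund policy"
```

Production systems replace the linear scan with approximate nearest-neighbor indexes to keep lookups fast at scale.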
Sample Providers: Credo AI, Gradient, Humanloop, LangChain, OctoML, Pinecone, Promptable, Qdrant, Scale AI, Snorkel AI
Range: 3 to 6 Years
The market for GenAI engineering tools will reach early majority adoption in three to six years. GenAI engineering tools are an emerging, critical set of middleware necessary to develop and manage GenAI technologies to achieve business goals successfully. Despite the utility of these tools, the marketplace is a highly fractured mix of legacy data science and machine learning and MLOps software, plus specialty tools brought to market by a growing collection of startup software companies. Very few providers offer end-to-end platforms to manage the life cycle of LLMs.
This presents users with a decision: accept “good enough” solutions from legacy providers, or engage in extended due diligence across a marketscape of emerging and innovative providers where viability and a deep pool of references are concerns.
Mass: High
The impact of GenAI engineering tools is high. The tools are an emerging, critical set of middleware necessary to develop and manage GenAI technologies to achieve business goals successfully. In short, GenAI engineering tools are the picks and shovels of the GenAI gold rush. Some of the value offered by GenAI engineering tools includes the ability to:
- Steer models without incurring significant retraining costs.
- Enable applications to respond with low latency to high concurrency requests (prompts).
- Fine-tune base models for task specificity, higher model performance and fewer hallucinations.
- Orchestrate workflows by chaining prompts or models together to achieve intended outcomes.
- Protect against loss of intellectual property, hallucinations, lack of model explainability, bias and toxicity in model output, and misinformation.
The tools will affect all sectors and businesses operationalizing GenAI directly or indirectly. GenAI engineering tools that embody end-to-end foundation model operations (FMOps) capabilities would help organizations pave a clear path from experimentation to production.
Recommended Actions:
- Accelerate sales cycles by engaging in GenAI ecosystem partnerships with providers that supplement and complement your capabilities when offering single-purpose tools, such as fine-tuning and prompt engineering.
- Create demand for your GenAI engineering tool by demonstrating capabilities to add value with both closed and open-source models and model hubs.
Recommended Reading:
- Emerging Tech: Techscape for AI Seed Stage Startups
- How to Choose an Approach for Deploying Generative AI
- Competitive Landscape: Cloud Providers Artificial Intelligence Services
GenAI Native Applications
Analysis by Roberta Cozza
Description: GenAI native applications consist of software designed with GenAI technology and capabilities at its core. These are purpose-built applications supported by domain-specific LLMs and offered primarily as a SaaS proposition, though some may offer API extensions for accessing the domain model as a service.
These are complete AI solutions covering the entire process, from understanding the customer need to managing the offering’s performance. These native, stand-alone applications are highly tailored to specific use-case business capabilities or are very industry-focused, as opposed to embedded GenAI applications offered by all types of enterprise software, which can more easily span industries and customer needs.
Sample Providers: Cogram, Harvey AI, inFeedo, Insilico Medicine, Jasper, Midjourney, Tabnine, Tome, Writesonic
Range: 3 to 6 Years
The range for GenAI native applications is three to six years because this segment is made up of purpose-built solutions targeting specific use cases, which limits their penetration mainly to those use cases. The providers behind these native applications are generally very specialized AI startups. These providers can be quick to introduce new solutions, are highly innovative, and have focused domain expertise and local market knowledge. However, they still offer relatively new products that can struggle to scale beyond their specialized domains or expand to adjacent use cases and industries to fulfill growing demand and broaden product adoption. These applications might also require expertise in prompt engineering to refine outputs beyond any prebuilt prompts provided. In addition, these startups could come under differentiation pressure from larger AI players and hyperscalers as the latter develop their own native or embedded solutions, even if less specialized.
Mass: High
The mass for GenAI native applications is high because these are new applications with innovative GenAI-driven capabilities that can be transformative across many business domains and targeted industries. A clear example is the pharma industry, where some native GenAI applications have been shown to drastically reduce the time and cost of drug discovery. Other specialized offerings are quickly emerging in areas such as enterprise-specific marketing and sales content creation, workforce productivity optimization, automation of customer support, code generation, domain knowledge management in legal and finance, and content generation for media.
Recommended Actions:
- Consider if you want to develop an entirely new offering/product based on GenAI services as an integral part of your offering, and assess whether this specialized solution better fits with your customer and business outcomes versus broader horizontal GenAI offerings.
- Understand the requirements of the domain where you are building applications and the user expectations, above all within regulated industry verticals.
- Carefully evaluate AI startups as partners, assessing their ability to support your GenAI offering roadmap with their domain expertise. Other considerations include their intellectual property and the robustness of their proprietary algorithms or datasets.
Prompt Engineering Tools
Analysis by Jim Hare, Radu Miclaus
Description: Prompt engineering is the discipline of providing inputs, in the form of text or images, to GenAI models to specify and confine the set of responses the model can produce. The input prompt produces a desired outcome without updating the actual weights of the model (as done with fine-tuning). Prompt engineering is also referred to as “in-context learning,” where examples are provided to further guide the model.
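The in-context learning idea can be sketched as a prompt template that packs a few labeled examples ahead of the new input, steering the model without updating its weights. The examples and format below are illustrative, not a standard:

```python
# Minimal sketch of "in-context learning": labeled examples are placed
# before the new input so the model infers the task from the pattern.
# No model weights are updated; only the prompt changes.

FEW_SHOT = [
    ("The delivery was fast and the packaging was great.", "positive"),
    ("The app crashes every time I open it.", "negative"),
]

def build_prompt(new_input: str) -> str:
    lines = ["Classify the sentiment of each review."]
    for text, label in FEW_SHOT:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The trailing "Sentiment:" cue invites the model to complete the label.
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt("Support never answered my ticket.")
print(prompt)
```

Prompt engineering tools mostly automate variations of this step: templating, testing and versioning such prompts across models.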
Sample Providers: FlowGPT, HoneyHive, LangChain, PromptBase, Prompt Flow, PromptLayer, Vellum
Range: 3 to 6 Years
Prompt engineering is three to six years from early mainstream adoption. Currently, prompt engineering processes and the need for dedicated roles are unknown to enterprises, or enterprises have a low level of understanding and experience. There is often a lack of consensus on the business approach to prompt engineering, as well as on agreed-upon standards, methodologies and approaches. This has led to fierce debates on the value of prompt engineering and how to establish governance. In addition, prompt engineering can be an iterative and challenging task. It involves several processes, including data preparation, the crafting of custom prompts, the execution of these prompts through the LLM API, and the refinement of the generated content. Despite these challenges, prompt engineering is likely to be a skill picked up by many areas of the workforce as natural language interfaces to LLMs become common across business apps. We also expect the emergence of new startups building prompt engineering tools.
Mass: High
The mass is high because prompt engineering tools will be used across multiple industries in the coming years, but they will be just one of several approaches to addressing prompt challenges. Alternative options, such as building a model from scratch or fine-tuning, can also be used, but these are much more complex and expensive and still require a certain level of prompt engineering to get good results. Domain-specific models used for specific industry verticals or business-function use cases could mitigate some of the need for prompt engineering to improve results. However, use cases that require chaining together inputs and outputs from many smaller models could offset the benefit of using domain models to reduce prompt engineering.
Recommended Actions:
- Abstract away as much prompt engineering as possible to make products easier to use.
- Where prompt engineering is required, provide tools that are simple to use by less technical users and increase the accuracy and relevance of prompts.
- Offer capabilities in adjacent areas such as prompt catalogs that allow organizations to find, store, and manage prompts for improved governance and reuse.
- Develop custom prompts inside your software that unlock LLM functionality, work with multiple models and can be treated as your own IP.
- Create a marketplace of prebuilt prompts for key use cases/users and build a community for third-party engineers who can be incentivized to develop prompts.
Recommended Reading:
- Prompt Engineering With Enterprise Information for LLMs and GenAI
- Quick Answer: How Will Prompt Engineering Impact the Work of Data Scientists?
Provenance Detectors
Analysis by Ray Valdes and Anthony Mullen
Description: GenAI provenance detection is the ability to detect whether text, audio or video content was produced using GenAI. The primary mechanism for identifying GenAI-enabled audio and video content will likely be the detection of signatures, metadata or digital watermarks left behind by LLMs, as well as analysis of indicative patterns or artifacts related to “problem areas” for GenAI-created content, such as fuzziness, errors around smaller features (such as hands and ears) and cadence for voice.
Provenance detectors are tools that detect the provenance of AI-generated content. There is a strong market demand for organizations to know whether a given piece of content was crafted by a human or automatically generated by an AI system. In part this is due to the newness and lack of familiarity with this disruptive technology.
Sample Providers: AI Content Detector, GPT Minus One, GPT Radar, GPTZero, Intel (FakeCatcher), Originality.AI, StealthGPT, Turnitin, Writer (AI Content Detector)
Range: 3 to 6 Years
The range for content detectors is three to six years because, even though there is strong market demand at present, the current generation of products does not fulfill the perceived requirements. In response to the demand, many tools claim to detect provenance. However, these tools are imperfect and produce inconsistent results, and the challenge of identifying content will only get worse due to ongoing improvements in GenAI systems.
The content detection tools try to discern patterns in content that are consistent indicators of AI provenance. In the case of text, there are attributes such as vocabulary, sentence structure, tone, and technical attributes such as “perplexity” (a measure of the probability distribution of content, where a lower score is indicative of machine-generated text). In many business workflows, output from GenAI systems is not used verbatim, but typically involves additional postprocessing by a professional content creator or author. This is not just for AI-generated content, but for human-generated content as well. This mixed-provenance content poses a challenge for automated detection tools.
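The perplexity signal mentioned above can be sketched directly from its definition: the exponentiated average negative log probability a model assigns to each token. The per-token probabilities below are invented for illustration; a real detector would obtain them from a scoring language model:

```python
import math

# Sketch of the "perplexity" measure: given the probability a language
# model assigns to each token of a text, perplexity is the exponentiated
# average negative log probability. Detectors treat low perplexity
# (unsurprising text) as a hint of machine generation.

def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

predictable = [0.9, 0.8, 0.85, 0.9]   # model finds this text unsurprising
surprising  = [0.2, 0.1, 0.05, 0.3]   # model finds this text unusual

print(perplexity(predictable) < perplexity(surprising))  # True
```

The weakness the section describes follows directly: once a human edits or intermingles the text, the probability profile shifts and the signal degrades.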
If reliable, accurate and inexpensive tools existed, they would likely proliferate in less than three years. This does not seem to be a likely scenario. Instead, there will be ongoing attempts to catch up to rapidly evolving content generation capabilities.
The tools for satisfying the need for content detection face significant challenges:
- There is a proliferation of GenAI systems from a growing number of LLM vendors. These systems are built on different foundation models, were trained on different content datasets (although there is of course some overlap), and will produce output that differs from the others. For example, one vendor’s output is characterized by having subtle emotional tones that are absent in others.
- There are user-controlled parameters that can affect the nature of the output, such as the “temperature” setting (degree of randomness) or stylistic directives (“please give the answer in the manner of a Brooklyn taxi driver”).
- Image generators have some flaws that can be easily detected by visual inspection, with no tool needed. Video generation tools have an even more difficult time, because of lack of consistency in objects from frame to frame, resulting in flickering or morphing effects. These shortcomings are well-known and are being addressed by vendors.
- Text is easy to edit and intermingle with human-written text; in fact, this is often part of the content production process (e.g., a student merges AI-written text with content from Wikipedia plus content that they wrote).
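To make the “temperature” setting mentioned above concrete, here is a minimal softmax-with-temperature sketch. The logits are hypothetical next-token scores; real systems apply this scaling inside the model's sampling step.

```python
import math

def sample_distribution(logits, temperature=1.0):
    """Softmax with temperature: lower values sharpen the distribution
    (more deterministic output), higher values flatten it (more randomness)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores from a model
cold = sample_distribution(logits, temperature=0.2)
hot = sample_distribution(logits, temperature=2.0)

# Low temperature concentrates probability on the top token.
print(max(cold) > max(hot))  # True
```

Because users can freely vary this setting, the statistical "fingerprint" of a given model's output is not fixed, which is one of the reasons detection is hard.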
The task of content detection is easier for other data types such as images. Although image generation tools like Midjourney are being rapidly adopted because of the highly detailed, visually striking images they produce, they are much less mature than text generators like GPT-4 or PaLM.
One scenario that has been discussed is for vendors to embed digital watermarks or metadata into generated content. These would be indicators that could not easily be removed through editing or automatic word substitution by obfuscators. A vendor that acted alone on this would be unlikely to gain competitive advantage, and in fact would suffer, because users (the content creators, not the content consumers) would gravitate to tools that chose not to include this capability.
Adoption of watermarking might be mandated by legal mechanisms or possibly by social/cultural incentives. These are unlikely to happen within the next three years.
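One widely discussed style of text watermarking biases generation toward a pseudorandom “green list” of tokens seeded by the preceding token; a detector then measures how often that bias appears. The sketch below is a toy under those assumptions (tiny vocabulary, deterministic token choice), not any vendor's actual scheme.

```python
import hashlib

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slow"]

def green_list(prev_token, fraction=0.5):
    """Deterministically select a 'green' subset of the vocabulary,
    seeded by a hash of the previous token (simplified watermark scheme)."""
    ranked = sorted(VOCAB, key=lambda w: hashlib.sha256((prev_token + w).encode()).hexdigest())
    return set(ranked[: int(len(VOCAB) * fraction)])

def green_fraction(tokens):
    """Detector side: the share of tokens drawn from each step's green list.
    Watermarked text scores well above the ~0.5 expected by chance."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_list(prev))
    return hits / (len(tokens) - 1)

# A watermarking generator would bias token choice toward the green list;
# here we fake it by always picking the first green token at each step.
tokens = ["the"]
for _ in range(20):
    tokens.append(sorted(green_list(tokens[-1]))[0])

print(green_fraction(tokens))  # 1.0 for fully watermarked text
```

Note how editing or paraphrasing the output would replace green tokens with arbitrary ones and dilute the signal, which is why the text above argues that coordinated adoption, not any single vendor's implementation, is the harder problem.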
Mass: Medium
The mass for this technology is medium for the reasons stated above: Accurate content detectors are unlikely to appear in the market due to technology limitations, and supplementary mechanisms such as legal mechanisms or cultural norms are unlikely to have a major impact in the short term. There remains strong market demand for solutions across many industries and business functions. However, the market demand is for hypothetical tools that do not exist and are unlikely to exist within the next three years.

GenAI systems are expected to have the greatest impact on industry sectors that are knowledge-work-intensive, like the legal sector, medical care, financial analysis, software development, marketing, education, entertainment and government. To varying degrees, each of these sectors has a market demand for authenticated content provenance. The market demand relates to accountability, compensation and monetization of knowledge work. There are managers, corporate counsel, university professors, law firm partners and others who would like to know the provenance of output from the teams that they supervise — be they employees, job applicants, university students or other types of knowledge workers. Within the next three years, the need for accountability of AI-generated content will not be satisfied by technology mechanisms, but instead by nontechnical means like regulations, governance mechanisms and social norms.
Recommended Actions:
- Track the ongoing evolution of tools for content generation, as well as for content detection, and understand that the adoption of content detectors will be limited to specific vertical sectors, regions or market segments.
- Enterprises should consider nontechnical approaches for ensuring accountability among staff who create content, including regulations, policies, organizational culture, education and other incentives.
Recommended Reading:
Simulation Twins
Analysis by Annette Zimmermann, Leinar Ramos and Alfonso Velosa
Description: Simulation twins combine the strengths of digital twins with what-if analysis or AI technologies. The digital twin provides the mirror of reality, so the enterprise has real-time data on its processes, customers or operations across data silos. This real-time view is then fed into simulation software to model the system of interest. This simulation twin may be
- A basic threshold rules-based, what-if analysis in the digital twin’s core enabling technology
- A data-driven AI-based system model of customer behavior
- A model-based systems engineering software view to simulate the product’s performance
- A physics-based software simulation of corrosion of an asset
- An operational process model in analytics software to optimize back-office processes
In some cases, the simulation twin starts with AI simulation, where data from assets or supply chains or customers or patients is ingested and the AI technologies plus applications jointly develop the digital twin model. It creates the simulated environments in which the AI agents can be trained, tested and deployed, plus produce recommendations and next best actions. In other cases, the simulation twin starts by ingesting the digital twin models and data and simulating the environment to either produce synthetic data or recommendations on the next best action. Finally, a simulation twin may be created using GenAI to optimize the next best action, for example, by taking the time series data or the image or audio file to generate simulations that help the team drive future action.
Sample Providers: Altair, Ansys, Braincube, Cosmo Tech, Falkonry, TADA, TwinThread, Worlds
Range: 3 to 6 Years
We expect three to six years until AI simulation and digital twins reach early majority adoption. While in certain industries, such as oil and gas, AI simulation in combination with digital twins is starting to approach mainstream adoption, in others, such as healthcare, it is still emerging. While many enterprises could use simulation twins to improve their business processes, doing so requires business leaders to invest political capital to drive business change using this technology. Cultural change represents one of the biggest hurdles to adoption. Therefore, it will take at least three years before enough leaders and organizations see examples and invest appropriately in both technology and culture change efforts. The different technology building blocks, such as data fabric, vector graphs and GenAI, will move much faster than the business process change needed to adopt simulation twins.
Mass: High
The mass for this technology is high because organizations across multiple industries and business functions will benefit from it. Once an enterprise builds a digital twin with the product leader to answer the question, “What is the status of my thing (patient, equipment, procurement system, etc.)?” it will want the product leader to help answer the question, “What is the next best action?” Metropolitan train organizations want to predict the evening rush hour based on the morning rush hour. Hotels want to manage their energy costs based on real guest and weather data. Manufacturers want to optimize their sales and operations planning efforts. Industrials want to train new employees based on their institutional knowledge and best practices. The tech provider may create new views of a truck and its identity (from nighttime to sunrise lighting to changes in its color) to shorten its processing time at the entry gate, or take in passenger data during the morning rush hour to predict where the enterprise will need to put resources to manage the evening rush hour. This may be further enriched by synthetic data that uses historical data to improve prediction scenarios.
Leading-edge product leaders understand that once the enterprise customer builds the digital twin and then the simulation twin, they have effectively helped the enterprise codify their institutional knowledge in their technology practice. The enterprise will not only be a long-term annuity stream, but as long as the product leader shows them value every year from the simulation twin, there will also be add-on solutions to sell.
Recommended Actions:
- Drive relevance to the market for your simulation twin product portfolio by investing in a limited set of vertical and subvertical domains. Enterprises rarely buy generic simulation twins; they want business solutions for their problems.
- Address gaps in your technology and vertical domain capabilities by partnering for best-of-breed capabilities in complementary technologies, such as domain-specific foundation models, or technologies spanning IoT integration, 3D models and finite element analysis.
Recommended Reading:
- Innovation Insight: AI Simulation
- Emerging Tech: Tech Innovators for Digital Twins — IT Providers
- CRM Strategists: Use a Digital Twin to Model Customer Behaviors and Evolve From Simple ML Modeling
- Emerging Technologies Tool: Model Customer Revenue to Optimize Digital Twin Product and Sales Decisions
- 2023 Oil & Gas Trend: Digital Twins Expansion
- A Digital Twin of a Customer Predicts the Best Customer Experience
Workflow Tools and Agents
Analysis by Aakanksha Bansal
Description: Agents are AI programs/algorithms that leverage the power of LLMs to solve problems based on natural language. The agent interprets user natural language input and creates a chain-of-thought sequence on the fly by decomposing the user request. Agents can identify patterns in their environment, make decisions, and decide which actions to take and in what order.
Tools are functions that agents can use to interact with the world. These tools can be generic utilities (e.g., search), other chains, or even other agents. They allow users to explore the power of generative agents in developing applications by acting as the integration layer between the LLMs and other sources of data, as well as enable prompt chaining, model chaining, interfacing with external APIs, and retrieving contextual data from data sources.
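The tool-dispatch pattern described above can be sketched in a few lines. The planner here is a hard-coded stub; in real frameworks such as LangChain, an LLM interprets the request, decomposes it and selects tools on the fly. All names and tools below are hypothetical.

```python
from typing import Callable, Dict

class Agent:
    """A minimal tool-using agent: it holds a registry of named tools and
    dispatches to one based on a plan. Real frameworks let an LLM choose
    the tool and its arguments; here the planner is a keyword stub."""

    def __init__(self):
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def plan(self, request: str) -> str:
        # Stub planner: an LLM would decompose the request and pick a tool.
        return "search" if "find" in request.lower() else "calculator"

    def run(self, request: str) -> str:
        return self.tools[self.plan(request)](request)

agent = Agent()
agent.register("search", lambda q: f"search results for: {q}")
agent.register("calculator", lambda q: str(eval(q.split("compute")[-1])))  # toy only, never eval untrusted input

print(agent.run("Find the latest LLM benchmarks"))
```

The tools act as the integration layer the text describes: generic utilities, other chains or even other agents can be registered behind the same interface.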
Sample Providers and Products: AgentGPT, AutoGPT, BabyAGI, Duet AI, LlamaIndex, Fixie, Godmode, Google, Hugging Face Transformers Agents, LangChain, Microsoft Semantic Kernel.
Range: 3 to 6 Years
The range for this technology is three to six years because these tools are nascent and represent a significant shift in AI capabilities. They make handling challenging tasks and workflows easier by reducing operational expenses, eliminating redundancies, boosting efficiency and saving time, allowing for high scalability and easy data accessibility, and granting a competitive edge. Further, they provide functionalities that expand AI foundation models’ capabilities, making them more adaptable, interactive, context-aware and efficient in various applications. They also provide templates for developing new GenAI applications.
Specialists are among the first adopters. Enterprise software developers and knowledge workers can use these tools and agents for developing composable and extensible AI applications. IT leaders may reduce tool fragmentation by developing a single ML platform to assist enterprises in integrating data sources with models and chain prompts.
Since agents can make complex decisions, they can be used for decision-making tasks that require controlling parameters. However, most are not robust enough to control the underlying issues of foundation models (e.g., hallucinations and inaccurate input); they can instead amplify such issues. For example, an automatic agent that uses foundation models to achieve goals recursively by breaking them into tasks can lead to completely wrong outcomes and irrevocable, harmful business decisions.
The fear of replacement — potential job loss — may result in massive resistance to adoption within the workforce. However, it is important to understand that the value of agents lies in the interaction between human and agent, not in completely replacing the human. Regulatory frameworks will be stringent given the involvement of sensitive information and critical infrastructure.
Mass: High
The mass for this technology is high because agents and workflow tools will have the most significant impact on how people will work and complete tasks. We expect that by 2026, workflow tools and agents will drive efficiencies for 20% of knowledge workers, up from less than 1% today. Some of the primary use cases are conversation interaction and creation, code generation, question answering, reasoning over structured and unstructured data, search, and summarization.
These agents can, given a high-level task, self-prompt and execute a series of subtasks to achieve the overall high-level task. The capability will have a disruptive impact in enabling further automation for enterprise business processes. More complex agents can be built to provide a variety of capabilities, including:
- Working toward a goal through tasks and subtasks
- Enabling business workflows for complex work
- Retrieving and using information from other sources, such as databases, file systems and the web
- Effectively using prompts and the outputs to achieve the objectives
- Automation of all of the above, including error handling and fault tolerance
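The self-prompting loop described above (decompose a goal, execute subtasks, handle errors) can be sketched as follows. Both `decompose` and `execute` are stubs for what would be LLM and tool calls, and the `max_steps` bound illustrates one guard against the runaway recursion risk noted earlier.

```python
def decompose(goal):
    """Stub for the LLM call that would break a high-level goal into subtasks."""
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute(task):
    """Stub for a tool call; a real agent would hit APIs, databases or the web."""
    return f"done({task})"

def run_agent(goal, max_steps=10):
    """Self-prompting loop: pull a task off the queue, execute it, and stop
    at max_steps to bound runaway recursion, a key agent failure mode."""
    queue, results = decompose(goal), []
    while queue and len(results) < max_steps:
        task = queue.pop(0)
        try:
            results.append(execute(task))
        except Exception as err:  # fault tolerance: record the failure and continue
            results.append(f"failed({task}): {err}")
    return results

print(len(run_agent("write a market report")))  # 3
```

A production agent would additionally re-plan after each result and validate outputs before acting, which is where the robustness concerns raised earlier come in.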
Agents that address some combination of the above are emerging, both commercially and in open source. We expect more to emerge, because advances in this field will help maximize the potential of GenAI. For developers working with frameworks such as LangChain (GitHub) or Transformers Agents (Hugging Face), the learning curve will entail a shift in mentality toward designing agentlike behavior that adapts to users’ prompt inputs to complete more complex tasks.
Recommended Actions:
- Prioritize use cases where AI model accuracy is high and where human operator skills are mature to ensure the highest chance of adoption success.
- Identify real-world, quantified use cases or success stories for pilot projects where adding workflow tools can add business value and reduce cost.
- Encourage experimentation in industries where AI maturity and the rate of adoption are high by targeting complex and changing environments that require a variety of AI tools to improve performance levels in the organization.
Recommended Reading:
- How to Choose an Approach for Deploying Generative AI
- Assessing How Generative AI Can Improve Developer Experience
- Video: Choosing the Best Deployment Approach for Large Language Models (LLMs)
6 to 8 Years
Multiagent Generative Systems
Analysis by Eric Goodness and Anthony Bradley
Description: Multiagent generative (MAG) systems fuse computational software agents and large language models to simulate an environment of complex multiagent system behaviors and interactions. As a result, agents generate emergent individual, group and even social dynamics.
Sample Providers: AI21 Labs, Anthropic, Cohere, Fable Studios (SHOW-1), Google, LangChain, OpenAI, Replit
Range: 6 to 8 Years
Multiagent generative systems are six to eight years away from early majority adoption in the market. The range is extended because the design patterns and emergent architectures of MAG systems are very new; the emerging systems are very task- and use-case-specific with no standardization, ontologies or heuristics to guide development; and adoption outside of research and development is limited.
However, more basic multiagent generative technology, in some form, has been successfully integrated into various applications, such as agent-based modeling, video game and video content generation, and robotic logistics and planning.
Although the framework is still emerging, ongoing R&D in multiagent generative technologies will create hype similar to that around the emergence of ChatGPT and generative models. Innovation leaders need high levels of creativity and practicality to deliver the MAG system capabilities desired by corporate boards. It is necessary to anticipate this desire and begin due diligence and R&D relating to the impact of multiagent generative systems. In time, MAG systems will comprise numerous autonomous agents that perceive, remember, reflect, plan and execute collaboratively.
Such an architecture is natively complex and presents system integration, interoperability and coordination challenges. Product leaders must target key use cases and focus on developing robust and scalable solutions that interact with other agents and systems.
Multiagent generative systems introduce new security and privacy challenges as agents may interact with sensitive data. Product leaders must prioritize security efforts and implement authentication, authorization and encryption protocols to protect customers’ data and their own systems.
Additionally, complex multiagent generative simulations are highly compute-intensive.
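The perceive/remember/reflect/plan/execute cycle can be illustrated with a deliberately simple sketch. In a real MAG system, each step would be backed by an LLM call and agents would coordinate over a shared simulated environment; here everything, including the agent names, is a stand-in.

```python
class GenerativeAgent:
    """Toy agent with a perceive/remember/reflect/plan/execute cycle.
    A real MAG system would back each step with a generative model call."""

    def __init__(self, name):
        self.name, self.memory = name, []

    def perceive(self, observation):
        self.memory.append(observation)  # remember: append to a memory stream

    def reflect(self):
        return self.memory[-3:]          # stub: summarize only recent memory

    def plan(self):
        return f"{self.name} responds to {len(self.reflect())} recent events"

    def execute(self):
        return self.plan()

# Two agents observing each other's actions across simulation ticks;
# group dynamics emerge from the interaction, not from any central script.
agents = [GenerativeAgent("alice"), GenerativeAgent("bob")]
for tick in range(3):
    actions = [a.execute() for a in agents]
    for a in agents:
        for act in actions:
            a.perceive(act)

print(agents[0].memory[-1])
```

Even at this scale, each agent's behavior depends on every other agent's history, which hints at why full MAG systems are compute-intensive and hard to predict or interpret.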
Mass: Very High
The mass for this technology is very high, because the “art of the possible” implications are tremendous across a multitude of applications. MAG systems are a new technology design pattern with numerous potential applications, including creating immersive environments, cross-agent and interpersonal communication planning, and prototyping. Example tasks and use cases include:
- Simulated virtual world scenarios for business applications, such as strategy formulation, prototyping, training or operational improvements. An example of such is simulating how to sell a particular product to a new market or target buyer.
- Simulating and exploring potential human behavior responses to business actions.
- Multiagent networks of smart things, smart spaces, smart robots, autonomous vehicles and so forth, all coordinating for operations in next-generation, real-world Internet of Things environments.
- Designing, simulating and operationalizing multiagent interactions where human and machine agents work together and optimized collaboration emerges to address evolving situations.
- A new approach to development where human employee simulacra, operating in a real-world simulation, autonomously design, prototype and test new products, buildings, cars, parks and so forth.
While the potential to impact all industries and business segment sizes is high, there are impediments to adoption in highly regulated industries based on issues of trust and security. Emergent behavior is a valuable capability of multiagent generative systems. However, MAG systems may also generate undesirable behaviors that are difficult to predict and control. Interactions among agents may lead to unintended results and make it more difficult to interpret the agent’s behavior.
Multiagent generative systems are novel and complex. Designing, developing and coordinating numerous autonomous generative agents with different capabilities and behaviors can increase the overall system complexity. Different generative models and techniques perform differently based on the types of data and tasks for which an agent is created. The selection of the most appropriate models across the system may require extensive experimentation and expertise.
Recommended Actions:
- Capitalize on the early-stage emergence of MAG systems by engaging leading market players (as they emerge) in collaborative R&D and product development to create and operationalize the technology.
- Increase your success in productizing MAG systems by identifying high-value use cases where generative agents offer significant advantages as a replacement product — for example, for industrial robots.
Scalable Vector Databases
Analysis by Afraz Jaffri
Description: Vector databases provide vector search (also referred to as semantic search) capability and are used in conjunction with large language models (LLMs) to apply the model’s ability to respond to natural language with information that is custom or specific to an enterprise or domain.
Vector databases store embeddings, generated from trained LLMs, to encode words, sentences, paragraphs or entire documents in a numerical format. This format allows calculations to be performed to look for the most similar match to an input query. When a match is found, it can then be served back to an LLM via a prompt to provide natural language responses and answers. This pattern of utilizing an external store to allow an LLM to be used with custom data is known as retrieval-augmented generation (RAG). Vector databases are one of the stores that can be used in a RAG architecture and can be implemented as pure vector databases or through existing databases that enable vector processing. They can also be used independently of LLMs for information retrieval and ranking use cases.
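The RAG retrieval step can be sketched with a toy in-memory store. The embeddings below are invented three-dimensional vectors; a real system would obtain high-dimensional embeddings from an LLM embedding endpoint and use approximate-nearest-neighbor indexes rather than a linear scan.

```python
import math

class ToyVectorStore:
    """Minimal vector store: keeps (embedding, text) pairs and returns the
    text whose embedding is most similar to the query by cosine similarity."""

    def __init__(self):
        self.items = []

    def add(self, embedding, text):
        self.items.append((embedding, text))

    def search(self, query):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm
        return max(self.items, key=lambda item: cosine(item[0], query))[1]

# Hypothetical document embeddings for two enterprise policy snippets.
store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "Refund policy: 30 days with receipt.")
store.add([0.0, 0.2, 0.9], "Shipping: 3-5 business days.")

# A query embedding (here hand-made) retrieves the best-matching context,
# which RAG then injects into the LLM prompt.
context = store.search([0.8, 0.2, 0.1])
prompt = f"Answer using this context:\n{context}\nQuestion: What is the refund window?"
print(context)
```

The retrieved snippet is what grounds the LLM's answer in enterprise-specific data, which is the core of the RAG pattern described above.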
Sample Providers: AWS, Couchbase, Elastic, Google, Microsoft, Milvus, Pinecone Systems, Qdrant, Redis, SingleStore, Weaviate, Zilliz
Range: 6 to 8 Years
The range for scalable vector databases is six to eight years because of the complexity of managing and integrating a specialized database into existing data management architectures. Though the popularity of vector databases is growing rapidly in terms of available products and adoption, we estimate them to be still quite far from early majority adoption (currently just under 5% of the way). Even with managed cloud platforms available, the many dimensions of setting up a vectorized LLM integration (choosing the embedding model, indexing structures, filters, security, updates, scalability and performance evaluation) require specialist knowledge and skills that are currently out of reach for most enterprises. In addition, existing databases are adding vector-based searching to their products, making the case for stand-alone solutions harder to justify.
Mass: Medium
The mass for this technology is medium because adding domain-specific data for usage with LLMs has become a key use case across several different business domains. Analyzing and extracting information from documents is a knowledge-based task that takes many hours across multiple business processes and has the potential to be significantly accelerated or automated using LLMs and RAG-based architectures. Vector-based searching will enable conversational question-answering interfaces to be provided across a wide range of business applications and platforms, in addition to conventional drag-and-drop and command-line interfaces.
Recommended Actions:
- Build vector database integration into existing products where specialized domain knowledge is a key driver for LLM integration.
- Evaluate the differences between fine-tuning LLMs and using vectorized search for adding this domain knowledge into a product.
- Identify the natural language interface functionality that will be built into a product and assess the feasibility of vector search to maintain performance and accuracy requirements.
Recommended Reading:
User-in-the-Loop AI
Analysis by Anushree Verma and Annette Jump
Description: User-in-the-loop (UITL) AI is a workflow that requires users to be looped into any stage of the AI system development pipeline — from concept design and initial training conditions through to live and even in-live training. The user can shape the behavior of the AI at varying fidelities, ranging from simple mechanics such as like/share, rating or thumbs-up, to richer feedback such as correcting output and curating training data, to more detailed manipulation of model weights.
Traffic flows both ways in user-in-the-loop AI systems — user to AI and AI to user. Finding a lingua franca (or workable translation/UI) between users and AI varies by use case and will take some time. The latency between user-in-the-loop feedback and changes in AI system behavior will increasingly approach zero as better interfaces and techniques are used for feedback. Having UITL solutions woven into AI systems ensures sustained model effectiveness and helps in the responsible use of AI.
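A simple thumbs-up/down mechanic, the lowest-fidelity feedback described above, can be sketched as a score-adjusting reranker. This is illustrative only; production systems typically feed such signals into retraining or fine-tuning pipelines (e.g., RLHF) rather than a running tally.

```python
class FeedbackRanker:
    """User-in-the-loop sketch: candidate responses start with a neutral
    score, and each thumbs-up/down from users nudges the score so that
    future rankings reflect accumulated human feedback."""

    def __init__(self, candidates):
        self.scores = {c: 0.0 for c in candidates}

    def feedback(self, candidate, thumbs_up):
        # User-to-AI direction of the loop: a single signal updates the system.
        self.scores[candidate] += 1.0 if thumbs_up else -1.0

    def best(self):
        # AI-to-user direction: the system serves the highest-scored response.
        return max(self.scores, key=self.scores.get)

ranker = FeedbackRanker(["answer A", "answer B"])
ranker.feedback("answer B", thumbs_up=True)
ranker.feedback("answer A", thumbs_up=False)
print(ranker.best())  # answer B
```

Because the update is applied immediately, this kind of mechanism is one way the feedback latency discussed above approaches zero.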
Sample Providers: Converseon, DARPA, Levity, Intel, Netomi, Uniphore
Range: 6 to 8 Years
The range for user-in-the-loop AI is six to eight years because we are still at the beginning of learning how to communicate with AI systems. Today, no single system provides organizations with a broad set of feedback loops across AI design and development. Some vendors have started incorporating UITL solutions, as mentioned above; however, this is not yet an explicit design consideration in many AI projects, and many teams are not even aware of how readily such a workflow can improve their AI models. Therefore, growth is expected to be slow, and early majority adoption remains six to eight years away.
Incorporating UITL solutions brings multiple benefits, such as reducing errors and dealing with hallucinations, and it is fairly easy to implement. However, several challenges still need to be overcome, including:
- Designing the UIs and visualizations that will elicit the most useful user actions in response to a request
- Determining how constrained the language of actions should be
- Guiding users on when and where they need to take action
As we see more user interfaces plugging the gap and having UITL integrated into AI solutions, there will be development of new interfaces to embed user-in-the-loop AI.
Mass: High
The mass for this technology is high because it has an impact across markets and industries, and it is critical to limiting failure when implementing AI solutions. User-in-the-loop has already seen adoption in training virtual customer assistants. These algorithms can be trained directly from responses in customer emails, or by improving training datasets and data classification to bridge unsupervised and semisupervised learning, creating nuanced text analytics and social listening capabilities in social intelligence platforms. It also has a high impact in shaping chatbot behavior, where user input is used to classify outliers and exceptions and this data is fed back into the training data.
The potential is not limited to improving the accuracy of datasets. Feedback can also come via physical demonstration or manipulation, where users demonstrate a task to a robot (observed through computer vision). Many pilots and academic studies have recently been carried out to establish where to include users in the loop to boost performance.
Recommended Actions:
- Prioritize UITL for any conversational AI opportunity as it can significantly improve the performance of your technology over time.
- Develop partnerships with software engineering teams to support the development of interfaces and UIs, and also with business units that act as both a source of domain knowledge and the workforce to engage with UITL at scale.
Recommended Reading:
How to Use the Impact Radar
This Emerging Technologies and Trends Impact Radar content analyzes and illustrates two significant aspects of impact — when we expect it to have a significant impact on the market (namely, the range); and how big of an impact it has on relevant markets (specifically, mass). Each emerging technology or trend profile analysis is composed of these two aspects. See Note 1 for a complete description of our approach to this research.
In this document, profiles are organized by range and mass. Impact Radar range starts with the center and moves to the outer rings of the radar. The center of the impact radar represents when the emerging technology will cross the chasm from early adopter to early majority. The rings represent one to three years, three to six years and six to eight years from crossing the chasm.
Mass is rated from very high to very low, represented by the size of the bubble on the Impact Radar Graphic. The higher the mass score, the more broadly the Emerging Technology or Trend is predicted to be adopted, and the more revolutionary the innovation is expected to be.
The objective of this research is to guide product leaders on how emerging technologies and trends are evolving and impacting areas of interest. Providers can leverage this knowledge to determine which technologies or trends are most important to the success of their business and when it makes sense to advance their products and services by investing in them. Technology vendors should use this Emerging Technologies and Trends Impact Radar to:
1. Identify emerging technologies and trends that are important to the success of their business
2. Determine when to act upon those trends and technologies based on business strategy
3. Begin formulating a response to the technology or trend’s evolution
Evidence
2022 Gartner AI Use-Case ROI Survey: This survey sought to understand where organizations have been most successful in deploying AI use cases and figure out the most efficient indicators that they have established to measure those successes. The research was conducted online from 31 October through 19 December 2022 among 622 respondents from organizations in the U.S. (n = 304), France (n = 113), the U.K. (n = 106) and Germany (n = 99). Quotas were established for company sizes and for industries to ensure a good representation across the sample. Organizations were required to have developed AI to participate. Respondents were required to be in a manager role or above and have a high level of involvement with the measuring stage and at least one stage of the life cycle from ideating to testing AI use cases. Disclaimer: The results of this survey do not represent global findings or the market as a whole, but reflect the sentiments of the respondents and companies surveyed.
Note 1:
The Emerging Technologies and Trends Impact Radar content analyzes and illustrates two significant aspects of impact:
1. When we expect it to have a significant impact on the market (specifically, range)
2. How big an impact it will have on relevant markets (namely, mass)
Analysts evaluate range and mass independently and score them each on a 1 to 5 Likert-type scale:
- For range, this scoring determines in which radar ring the Emerging Technologies and Trends will appear.
- For mass, the score determines the size of the radar point.
In the Emerging Technologies and Trends Impact Radar, the range estimates the distance (in years) that the technology, technique or trend is from crossing over from early adopter status to early majority adoption. This indicates that the technology is prepared for and progressing toward mass adoption. So at its core, range is an estimation of the rate at which successful customer implementations will accelerate. That acceleration is scored on a five-point scale with one being very distant (beyond eight years) and five being very near (within a year). Each of the five scoring points corresponds to a ring of the Emerging Technologies and Trends Impact Radar graphic (see Figure 1). Those Emerging Technologies and Trends with a score of one (beyond eight years) do not qualify for inclusion on the radar. When formulating scores for range, Gartner analysts consider many factors, including:
- The volume of current successful implementations
- The rate of new successful implementations
- The number of implementations required to move from early adopter to early majority
- The growth of the vendor community
- The growth in venture investment
Mass in the Emerging Technologies and Trends Impact Radar estimates how substantial an impact the technology or trend will have on existing products and markets. Mass is also scored on a five-point scale — with one being very low impact and five being very high impact. Emerging Technologies and Trends with a score of one are not included in the radar. When evaluating mass, Gartner analysts examine the breadth of impact across existing products (specifically, sectors affected) and the extent of the disruption to existing product capabilities. It should be noted that an emerging technology or trend may be expressed in different positions on different Emerging Technologies and Trends Impact Radars. This occurs when the maturity of Emerging Technologies and Trends varies based on the scope of radar coverage.
© 2024 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. It consists of the opinions of Gartner's research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see "Guiding Principles on Independence and Objectivity." Gartner research may not be used as input into or for the training or development of generative artificial intelligence, machine learning, algorithms, software, or related technologies.