OpenAI’s most impressive natural language processing (NLP) model to date is GPT-3, notable above all for its massive scale. The transformer-based model has more than 175 billion weighted connections between words, called parameters, blowing its 1.5-billion-parameter predecessor, GPT-2, out of the water. At this scale, the model can generate surprisingly human-like text after being shown only a few examples of the task you want it to perform.
Its release in 2020 dominated headlines, with people scrambling to join the waiting list for access to its API, hosted by OpenAI.
Despite the hype, questions remain about whether GPT-3 will become the cornerstone of an NLP application ecosystem or whether newer, more powerful NLP models will eclipse it. As companies begin to imagine and design NLP applications, here is what they should know about GPT-3 and its potential ecosystem.
GPT-3 and the NLP arms race
As I have described in the past, there are two broad ways to pre-train an NLP model: generalized and non-generalized.
Non-generalized approaches have specific pre-training objectives aligned with a known use case. Essentially, these models go deep on smaller, more focused data sets rather than broad on massive ones. An example is Google’s PEGASUS model, which was built specifically for text summarization. PEGASUS was pre-trained on a data set closely resembling its end goal and then fine-tuned on text summarization data sets to deliver state-of-the-art results. The benefit of the non-generalized approach is that it can dramatically improve accuracy on a specific task. The downside is that it is far less flexible than a generalized model and still requires a large number of training examples before it starts to achieve that accuracy.
In contrast, the generalized approach goes broad. This is GPT-3’s 175 billion parameters at work: the model was pre-trained on essentially the entire internet. That allows GPT-3 to take on basically any NLP task with just a few examples, though its accuracy is not always ideal. In fact, the OpenAI team itself emphasized the limitations of generalized pre-training, conceding that GPT-3 has “notable weaknesses in text synthesis.”
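The “few examples” workflow above is usually just prompt construction: you concatenate a handful of solved examples with the new input and let the model continue the pattern. Here is a minimal sketch of that pattern; the task, example pairs, and `Input:`/`Output:` formatting are all illustrative choices, not anything prescribed by OpenAI, and a real application would send the resulting string to a hosted model endpoint.

```python
# Sketch of few-shot prompting, the usage pattern GPT-3 popularized.
# All task text and formatting conventions here are hypothetical.

def build_few_shot_prompt(task_description, examples, query):
    """Concatenate a task description, a handful of solved examples,
    and the new input into a single text prompt."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)

examples = [
    ("The meeting is at noon.", "La reunion est a midi."),
    ("Thank you very much.", "Merci beaucoup."),
]
prompt = build_few_shot_prompt(
    "Translate English to French.", examples, "Good morning."
)
print(prompt)
```

The key design point is that no weights change: the two examples alone prime the pre-trained model to treat the final, open-ended `Output:` as a translation task.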
OpenAI has decided that, when it comes to accuracy, bigger is better: each version of the model has increased the parameter count by orders of magnitude. Competitors have taken notice. Google researchers recently published a paper on Switch Transformer, an NLP model with 1.6 trillion parameters. That is a staggering number, and it likely signals an arms race in generalized models. While those are the two largest generalized models, Microsoft has its own 17-billion-parameter Turing-NLG and may well join the race. Considering that training GPT-3 reportedly cost OpenAI nearly $12 million, such an arms race could get expensive.
Promising GPT-3 applications
From an application-ecosystem perspective, GPT-3’s appeal is its flexibility: you can use it for almost anything you can imagine doing with language. Predictably, startups have begun exploring how to use GPT-3 to power the next generation of NLP applications. Alex Schmitt of Cherry Ventures has compiled an interesting list of GPT-3 products.
Many of these applications are consumer-facing, such as a “love letter generator,” but others are more technical, such as an “HTML generator.” As companies consider how and where to incorporate GPT-3 into their business processes, two of the most promising early use cases are in research-heavy fields such as healthcare and finance, and in video conferencing.
For companies in the healthcare, financial services, and insurance industries, streamlining research is a huge need. Data in these fields is growing exponentially, making it impossible to stay on top of the field. GPT-3-based NLP applications can ingest the latest reports, papers, and results, and summarize the key findings in context, saving researchers time.
With video conferencing and telemedicine growing in importance during the pandemic, we have seen rising demand for NLP tools that work with video meetings. What GPT-3 offers is the ability not only to transcribe and take notes on an individual meeting but also to generate “too long; didn’t read” (TL;DR) summaries.
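The TL;DR trick itself is simple: append a “TL;DR:” cue to the transcript so the model continues with a summary. Below is a minimal sketch of the prompt-preparation step, assuming a character-based truncation to fit a context window; the transcript text and the `4000`-character limit are invented for illustration.

```python
# Sketch: preparing a meeting transcript for TL;DR-style summarization.
# The truncation limit and transcript are illustrative assumptions.

def tldr_prompt(transcript, max_chars=4000):
    """Crudely truncate a transcript to fit a model's context window
    (by characters) and append the TL;DR cue the model will complete."""
    return transcript[:max_chars].rstrip() + "\n\nTL;DR:"

transcript = (
    "Alice: Let's review the Q3 roadmap.\n"
    "Bob: Shipping the search feature is the top priority.\n"
    "Alice: Agreed. We'll revisit pricing next month."
)
print(tldr_prompt(transcript))
```

A production tool would truncate by tokens rather than characters, but the cue-based prompt is the core of the technique.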
How companies and start-ups build a moat
Despite these encouraging use cases, the main obstacle to the GPT-3 application ecosystem is that imitators can easily replicate the performance of any application developed using GPT-3’s API.
Everyone using GPT-3’s API gets the same NLP model pre-trained on the same data, so the only differentiator is the data an organization uses to fine-tune the model for its specialized use case. The more fine-tuning data you have, the more differentiated and defensible the output.
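In practice, that differentiating asset is just a curated data set of example pairs. A minimal sketch of packaging proprietary examples as JSON Lines follows; the `prompt`/`completion` field names follow a common convention from the GPT-3 era, but the records and schema here are assumptions for illustration, not a guaranteed API contract.

```python
import json

# Sketch: packaging an organization's proprietary examples as JSON Lines,
# the kind of fine-tuning data set that differentiates one GPT-3 user from
# another. Records and field names are illustrative assumptions.

records = [
    {"prompt": "Summarize: The claimant reported water damage on 3/2.",
     "completion": " Water damage claim filed March 2."},
    {"prompt": "Summarize: Policy renewal was declined due to lapsed payment.",
     "completion": " Renewal declined; payment lapsed."},
]

jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)

# Each line is a standalone JSON object, so the file can be streamed
# record by record during training.
for line in jsonl.splitlines():
    assert "prompt" in json.loads(line)
```

The moat, per the argument above, lives entirely in `records`: two competitors calling the same API differ only in the quality and volume of pairs like these.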
What does that mean? Organizations with more users or more data than their competitors will be best positioned to exploit GPT-3’s promise. GPT-3 will not usher in a wave of disruptive startups; instead, it will let companies and large organizations with existing advantages optimize their offerings.
What does this mean for companies and startups moving forward?
Applications built on GPT-3’s API have only begun to scratch the surface of possible use cases, so we have yet to see the ecosystem develop beyond interesting proofs of concept. How such an ecosystem will monetize and mature remains an open question.
Because differentiation in this scenario requires fine-tuning, I expect companies to embrace GPT-3’s generality for some NLP tasks while sticking with non-generalized models (such as PEGASUS) for more specialized ones.
Furthermore, as the major NLP players leapfrog one another in parameter count, we may see users migrate between ecosystems depending on who currently leads.
Whether the GPT-3 application ecosystem matures or is displaced by other NLP models, companies should be excited about how relatively easy it has become to create highly articulate NLP applications. They should explore use cases and consider how to use their market position to quickly build value for their customers and their own business processes.
Dattaraj Rao is innovation and R&D architect at Persistent Systems and author of the book Keras to Kubernetes: The Journey of a Machine Learning Model to Production. At Persistent Systems, he leads the AI research lab. He holds 11 patents in machine learning and computer vision.
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.