This month, Google forced out a well-known AI ethics researcher who had expressed dissatisfaction with the company over the withdrawal of a research paper. The paper points out the risks of language-processing artificial intelligence, the type used in Google Search and other text analysis products.
Among these risks is the large carbon footprint of developing this kind of AI technology. By some estimates, training an AI model generates as much carbon emissions as building and driving five cars over their lifetimes.
I am a researcher who studies and develops AI models, and I am all too familiar with the rapidly growing energy and financial costs of AI research. Why have AI models become so power-hungry, and how are they different from traditional data center computation?
Today’s training is inefficient
Traditional data processing jobs done in data centers include video streaming, email and social media. AI is more computationally intensive because it needs to read through a lot of data until it learns to understand it, that is, until it has been trained.
Compared with the way people learn, this kind of training is very inefficient. Modern AI uses artificial neural networks, which are mathematical computations that mimic neurons in the human brain. The strength of the connection between each neuron and its neighbor is a parameter of the network, called a weight. To learn how to understand language, the network starts with random weights and adjusts them until its output matches the correct answer.
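As a rough illustration of starting from random weights and adjusting them until the output matches the correct answer, the sketch below trains a single artificial neuron by gradient descent. The toy data, the learning rate and the number of steps are assumptions chosen for illustration; real language models adjust billions of weights in the same spirit.

```python
import random


def train_neuron(examples, learning_rate=0.1, steps=200, seed=0):
    """Fit output = w * x + b to (x, answer) pairs by gradient descent."""
    rng = random.Random(seed)
    # Start from random weights.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        for x, answer in examples:
            error = (w * x + b) - answer     # distance from the correct answer
            w -= learning_rate * error * x   # nudge each weight to shrink the error
            b -= learning_rate * error
    return w, b


# Hypothetical toy data that follows the rule answer = 2 * x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_neuron(examples)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Even this two-weight example reads its three data points 200 times before settling, which hints at why training on billions of words is so costly.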
A common way of training language networks is to feed them large amounts of text from websites such as Wikipedia and news outlets, with some of the words masked out, and ask them to guess the masked words. An example is "My dog is cute," with one word, such as "cute," masked out.
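The masking task described above can be sketched in a few lines. The function name `mask_words`, the `[MASK]` placeholder and the 15% masking rate are assumptions made for this illustration, not details given in the article.

```python
import random


def mask_words(sentence, mask_fraction=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of words and return the masked text plus the answers."""
    rng = random.Random(seed)
    words = sentence.split()
    # Always hide at least one word so there is something to guess.
    n_to_mask = max(1, round(len(words) * mask_fraction))
    positions = sorted(rng.sample(range(len(words)), n_to_mask))
    answers = {i: words[i] for i in positions}
    masked = [mask_token if i in answers else word for i, word in enumerate(words)]
    return " ".join(masked), answers


masked_sentence, answers = mask_words("My dog is cute")
print(masked_sentence)  # the sentence with one word replaced by [MASK]
print(answers)          # position and text of the hidden word
```

During training, the network sees only the masked sentence and is scored on whether its guess matches the hidden word.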
A recent model called Bidirectional Encoder Representations from Transformers (BERT) was trained on 3.3 billion words from English books and Wikipedia articles. Moreover, during training BERT read this data set not once but 40 times. By comparison, an average child learning to talk may hear 45 million words by age five, about 3,000 times fewer than BERT.
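The "3,000 times" comparison follows from the figures in the paragraph above, assuming the 40 passes over the data are counted as words read. The variable names are just labels for this check.

```python
bert_corpus_words = 3_300_000_000   # words in BERT's training text
passes_over_data = 40               # times BERT read the data set
child_words_by_five = 45_000_000    # words an average child may hear by age five

words_read_by_bert = bert_corpus_words * passes_over_data
ratio = words_read_by_bert / child_words_by_five

print(f"BERT read {words_read_by_bert:,} words")
print(f"That is about {ratio:,.0f} times what the child hears")
```

The ratio works out to roughly 2,900, which the article rounds to 3,000.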
Find the right structure
What makes language models even more costly to build is that this training process happens many times over the course of development. Researchers want to find the best structure for the network: how many neurons, how many connections between neurons, how fast the parameters should change during learning, and so on. The more combinations they try, the better the chance that the network achieves high accuracy. Human brains, in contrast, do not need to find an optimal structure. They come with a prebuilt structure that has been refined by evolution.
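To see how quickly the number of training runs grows when searching over structures, consider the sketch below. The option names and values in `search_space` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical choices a researcher might want to compare.
search_space = {
    "neurons_per_layer": [256, 512, 1024],
    "num_layers": [6, 12, 24],
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4],
}

# Every combination of options is a separate, full training run.
configurations = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]
num_training_runs = len(configurations)

print(f"{num_training_runs} full training runs")  # 3 * 3 * 4 = 36
```

Adding one more option with five values would multiply the count by five, so the cost of a thorough search climbs very fast.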
As companies and academics compete in the AI field, the pressure to keep improving on the state of the art is growing. Even on difficult tasks such as machine translation, a 1% improvement in accuracy is considered significant and can lead to good publicity and better products. But to obtain that 1% improvement, a researcher may train the model thousands of times, each time with a different structure, until the best one is found.
Researchers at the University of Massachusetts Amherst estimated the energy cost of developing AI language models by measuring the power consumption of common hardware used during training. They found that training BERT once has the carbon footprint of a passenger flying between New York and San Francisco. However, when searching over different structures (that is, training the algorithm multiple times on the data with slightly different numbers of neurons, connections and other parameters), the cost became the equivalent of 315 passengers, or an entire 747 aircraft.
Bigger and hotter
AI models are also much bigger than they need to be, and they are growing larger every year. A more recent language model similar to BERT, called GPT-2, has 1.5 billion weights in its network. GPT-3, which caused a stir this year because of its high accuracy, has 175 billion weights.
Researchers have found that a larger network leads to better accuracy, even if only a small part of the network ends up being useful. Something similar happens in children's brains, where neuronal connections are first added and then reduced, but the biological brain is much more energy efficient than computers.
AI models are trained on specialized hardware such as graphics processing units, which draw more power than traditional CPUs. If you own a gaming laptop, it probably has one of these graphics processing units to create advanced graphics for, say, playing Minecraft RTX. You may also notice that gaming laptops generate much more heat than regular laptops.
All this means that developing advanced AI models adds up to a large carbon footprint. Unless we switch to 100% renewable energy, progress in AI may conflict with the goals of cutting greenhouse gas emissions and slowing climate change. The financial cost of development is also becoming so high that only a few select labs can afford it, and those labs will be the ones to set the agenda for which kinds of AI models get developed.
Do more with less
What does this mean for the future of artificial intelligence research? Things may not be as bleak as they seem. The cost of training may come down as more efficient training methods are invented. Similarly, although data center energy use was predicted to rise sharply in recent years, this has not happened, thanks to improvements in data center efficiency and to more efficient hardware and cooling.
There is also a trade-off between the cost of training a model and the cost of using it, so spending more energy at training time to come up with a smaller model may actually make it cheaper to use. Because a model will be used many times over its lifetime, that can add up to large energy savings.
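The trade-off can be made concrete with a simple break-even calculation. All numbers below are hypothetical, in arbitrary energy units, and the function names are made up for this sketch.

```python
def total_energy(training_cost, cost_per_use, uses):
    """Lifetime energy: one-time training plus a cost for every use."""
    return training_cost + cost_per_use * uses


def break_even_uses(large, small):
    """Number of uses after which the smaller model comes out ahead."""
    extra_training = small["training"] - large["training"]
    savings_per_use = large["per_use"] - small["per_use"]
    return extra_training / savings_per_use


# Hypothetical models: the small one costs more to train but less to run.
large_model = {"training": 1000, "per_use": 2}
small_model = {"training": 1500, "per_use": 1}

n = break_even_uses(large_model, small_model)
print(f"The smaller model pays off after {n:.0f} uses")
```

With these made-up figures the extra training energy is recovered after 500 uses, and every use beyond that is a net saving.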
In my lab's research, we have been looking at ways to make AI models smaller by sharing weights, that is, by using the same weights in multiple parts of the network. We call these shapeshifter networks because a small set of weights can be reconfigured into a larger network of any shape or structure. Other researchers have shown that weight sharing gives better performance in the same amount of training time.
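The general idea of weight sharing can be illustrated with a toy sketch. This is not the shapeshifter method itself: the cyclic reuse scheme and the name `build_layer` are assumptions made for illustration, showing only how a small stored set of weights can fill layers of different shapes.

```python
def build_layer(shared_weights, rows, cols, offset=0):
    """Fill a rows x cols weight matrix by cycling through a shared bank."""
    n = len(shared_weights)
    return [
        [shared_weights[(offset + r * cols + c) % n] for c in range(cols)]
        for r in range(rows)
    ]


# Only these six values would need to be stored and trained (hypothetical).
shared_weights = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6]

layer_a = build_layer(shared_weights, rows=4, cols=3)            # 12 connections
layer_b = build_layer(shared_weights, rows=2, cols=5, offset=2)  # 10 connections

connections = 4 * 3 + 2 * 5
print(f"{connections} connections built from {len(shared_weights)} stored weights")
```

Here 22 connections are served by six stored weights, so the model that has to be kept in memory is much smaller than the network it describes.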
Looking ahead, the AI community should invest more in developing energy-efficient training methods. Otherwise, AI risks becoming dominated by a select few who can afford to set the agenda, including which kinds of models are developed, which data are used to train them and what the models are used for.
This story originally appeared in The Conversation.