Reading as a god

Chapter 253 AI Revolution

The idea of using generative adversarial networks to solve this problem is different from previous methods. A generative adversarial network trains two neural networks at the same time: one network generates images, while the other classifies images, distinguishing real images from generated ones.

In a generative adversarial network, the first network is the generator. Its goal is to produce images so close in nature to real images that the second network, the classifier, cannot tell real images from generated ones. The goal of the second network, the classifier, is the opposite: to correctly distinguish the generated, fake images from real natural images.

The two networks thus pursue opposing objectives, yet training them together yields a good generative network.
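As a rough illustration of this adversarial training loop, here is a minimal sketch in PyTorch; the tiny fully connected networks, the random stand-in "real" data, and all dimensions are illustrative assumptions, not anything from the paper.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 16, 64  # illustrative sizes

# Generator maps noise to a fake "image"; discriminator scores real vs fake.
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                          nn.Linear(128, image_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
                              nn.Linear(128, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, image_dim)      # stand-in for a batch of real images
    noise = torch.randn(32, latent_dim)
    fake = generator(noise)

    # Discriminator step: label real images 1 and generated images 0.
    d_loss = (bce(discriminator(real), torch.ones(32, 1))
              + bce(discriminator(fake.detach()), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator call the fakes real.
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The opposing losses are the whole point: the generator improves only because the discriminator keeps getting harder to fool.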

When the generative adversarial network was first proposed, it was mainly used to generate images.

What Zhang Shan's paper proposed was clearly to apply this method to a much wider range of problems.

However, the focus of Zhang Shan's paper was still on how to learn from unlabeled data!
In it, he proposed a new idea called dual learning.

The idea of dual learning is very different from the earlier generative adversarial learning.

Zhang Shan found that many artificial intelligence tasks are dual to each other in structure.

In machine translation, translating Chinese into English is one task, and translating English into Chinese is its dual task.

Between the original task and the dual task, the input and output are simply swapped.

Originally, Zhang Shan had felt a little guilty because the paper had been produced by the system, but now it seemed the paper had been written entirely according to his own ideas.

Because of his earlier mastery of multiple languages, if Zhang Shan had written this paper himself, the most likely application of the dual property would have been translation.

But duality does not stop at translation.

In speech processing, speech recognition converts speech into text and speech synthesis converts text into speech; these two tasks are also dual to each other.

In image understanding, image captioning generates a descriptive sentence for a picture, and its dual task is to generate a picture from a sentence; one task goes from image to text, the other from text to image.

In dialogue systems, answering questions and generating questions are likewise dual problems: the former generates answers for given questions, and the latter generates questions for given answers.

In search engines, returning relevant documents for a given query and returning keywords for a given document or advertisement are also dual problems: the main task of a search engine is to match documents against the search terms a user submits and return the most relevant ones, while when an advertiser submits an advertisement, the advertising platform needs to recommend keywords so that the advertisement is displayed and clicked when users search for those words.

Dual learning attempts to exploit this structural duality in machine learning.

The basic idea is relatively simple; Zhang Shan used machine translation as an example to illustrate it.

To translate a Chinese sentence into English, we can first use a Chinese-to-English translation model to produce an English sentence. Because there is no English label, we cannot tell how good or bad that English translation is. Zhang Shan then used an English-to-Chinese translation model to translate the English sentence back into Chinese, obtaining a new Chinese sentence.

The whole process consists of two steps, forward translation and backward translation, which are dual to each other.

Zhang Shan then compared the original Chinese sentence with the reconstructed one. If both translation models are good, the two Chinese sentences should be similar; if either model is poor, they will not be. Through this dual process, feedback can therefore be obtained from unlabeled data: it tells whether the models are working well, and the forward and backward models can be trained and updated from that feedback, achieving the goal of learning from unlabeled data.
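Here is a toy sketch of that round-trip feedback loop, under heavy simplifying assumptions: real dual learning works on sentences with sequence models and typically adds a language-model reward, but here each "sentence" is just a vector and each translator a small network, purely to show how a reconstruction signal can train both directions without labels.

```python
import torch
import torch.nn as nn

dim = 32  # toy "sentence" dimensionality
zh_to_en = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
en_to_zh = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(
    list(zh_to_en.parameters()) + list(en_to_zh.parameters()), lr=1e-3)

for step in range(500):
    zh = torch.randn(16, dim)   # a batch of unlabeled "Chinese" sentences
    en = zh_to_en(zh)           # forward translation (no English labels exist)
    zh_back = en_to_zh(en)      # backward translation closes the loop

    # Feedback from unlabeled data: the round trip should come back
    # close to the original sentence; this one signal updates both models.
    loss = nn.functional.mse_loss(zh_back, zh)
    opt.zero_grad()
    loss.backward()
    opt.step()
```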

Zhang Shan ran some machine translation experiments and found that with dual learning, only 10% of the labeled data (about 1 million English-French bilingual sentence pairs), plus a large amount of unlabeled data, was needed to reach the accuracy of a model trained on all of the labeled data (12 million English-French bilingual sentence pairs).

Labeling 10 million training sentence pairs costs roughly 22 million US dollars. If the labeling cost could be cut from around 22 million to around 2 million US dollars, that would be an excellent result, greatly reducing a company's operating costs and improving its efficiency.
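To make the saving concrete, a back-of-the-envelope calculation using only the figures cited above (the per-pair rate is inferred from them and is purely illustrative):

```python
# Rough labeling-cost arithmetic based on the figures cited above.
full_pairs = 12_000_000                  # full labeled corpus (sentence pairs)
cost_per_pair = 22_000_000 / 10_000_000  # ~$22M per 10M pairs -> ~$2.2 per pair

labeled_fraction = 0.10                  # dual learning only needs ~10% labeled
needed_pairs = full_pairs * labeled_fraction
print(f"pairs to label: {needed_pairs:,.0f}")                   # 1,200,000
print(f"labeling cost:  ${needed_pairs * cost_per_pair:,.0f}")  # ~$2,640,000
print(f"full-corpus cost: ${full_pairs * cost_per_pair:,.0f}")  # ~$26,400,000
```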

In the past, many problems could not use deep learning technology because labeled data was too limited.

Now that Zhang Shan's method can learn from unlabeled data, deep learning technology can be applied to many more applications and problems.

Up to this point, the paper was already very good!

But was that all~
Zhang Shan read on, and was soon shocked!
Because the paper seemed to propose yet another new concept.

Although deep learning is very popular now, in the final analysis it mainly learns from big data: models are learned from large amounts of labeled data using deep learning algorithms.

It may go by the name of artificial intelligence.

But this way of learning is very different from human intelligence.

Humans learn from small samples.

When people classify images, only a few samples are needed to achieve accurate classification.

When a two- or three-year-old child begins to learn about the world and wants to know what kind of animal a dog is, we show him a few pictures of dogs and tell him the characteristics of a dog and how it differs from other animals such as cats or sheep; the child can then quickly and accurately recognize dogs.

But a deep residual neural network, generally speaking, needs thousands of pictures per category to be fully trained and reach similarly accurate results.

Another example is driving. Generally speaking, most people can drive on the road after driving-school training, that is, dozens of hours of study and a few hundred kilometers of practice.

Today's driverless vehicles, however, may have driven millions of kilometers and still cannot reach the level of fully autonomous driving.

The reason is that after limited training, humans can combine rules and knowledge to cope with all kinds of complex road conditions, whereas current AI lacks the ability to think logically, associate, and reason, and must rely on big data to cover every possible road condition; yet the possible road conditions are nearly endless.

With each of his abilities improved, Zhang Shan now had a deep understanding of human intelligence.

Human intelligence has many aspects. The most basic level is cognitive intelligence, the ability to perceive and recognize the world.

Although AI has now almost reached human level in image recognition and speech recognition, it does so only under certain specific constraints.

But in fact, such cognitive tasks are very simple for human beings; what AI can now do, and the level it reaches, are things humans find very easy.

AI is merely faster, cheaper once scaled up, and able to work around the clock without rest.

A more challenging question is whether artificial intelligence can do well the things that humans cannot do, or find difficult to do.

The reason AI does well in cognitive tasks such as image recognition and speech recognition is that these tasks are static: for a given input, the prediction does not change over time.

Decision-making problems, however, often involve complex interaction with the environment; in such scenarios the optimal decisions are dynamic and change over time.

Some people are now trying to apply AI to financial markets, for example using AI to analyze stocks, predict their rises and falls, give trading advice, and even trade in place of people. This type of problem is a dynamic decision-making problem.

The second difficulty of decision-making problems lies in the mutual influence among factors: a change in one part can affect the whole.

The ups and downs of one stock will affect other stocks, and one person's investment decision, especially the investment decision of a large institution, may have an impact on the entire market, which is different from static cognitive tasks.

In static cognitive tasks, our predictions have no influence on the input (for example, on other images or speech samples).

But in the stock market, any decision, especially the investment strategy of a large institution, will have an impact on the entire market, other investors, and the future.

At present, deep learning has achieved great success on static tasks. How to extend that success to such complex, dynamic decision-making problems is one of the current challenges for deep learning.

Zhang Shan believes that one possible idea is game machine learning.

In game machine learning, by observing the environment and the behavior of other individuals and building a personalized behavior model for each of them, AI can think before it acts, choosing an optimal policy that adapts to changes in the environment and in the behavior of other individuals.
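As a very rough illustration of that "observe, model each individual, then respond" loop, here is a toy sketch using repeated rock-paper-scissors; the frequency-count opponent model and best-response rule are illustrative assumptions, far simpler than any real game machine learning system.

```python
import random
from collections import Counter

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class OpponentModel:
    """Per-individual behavior model: just an empirical move-frequency count."""
    def __init__(self):
        self.counts = Counter()

    def observe(self, move):
        self.counts[move] += 1          # update the model as behavior changes

    def best_response(self):
        if not self.counts:
            return random.choice(MOVES)  # no observations yet: play randomly
        predicted = self.counts.most_common(1)[0][0]  # opponent's likely move
        return BEATS[predicted]                        # the move that beats it

# Usage: keep one model per opponent and keep adapting as they change.
models = {"opponent_A": OpponentModel()}
for observed in ["rock", "rock", "paper", "rock"]:
    models["opponent_A"].observe(observed)
print(models["opponent_A"].best_response())  # likely "paper"
```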

……

In this paper, Zhang Shan proposed a kind of machine learning almost completely opposed to deep learning: shallow learning.

It emphasized the importance of game machine learning, stressed the logical, reasoning side of AI, and greatly reduced the sheer amount of "machine learning" work required.

There is no doubt that this was a whole new approach to machine learning!

At the very least, the performance of this new model in processing dynamic information will be revolutionary.

"Shallow learning" sounded like a slightly odd name!

It could not be understood in the straightforward, literal sense, because shallow learning in that sense had already appeared on the stage of history!
The invention of the backpropagation (BP) algorithm for artificial neural networks brought new hope to machine learning and set off a boom in machine learning based on statistical models, a boom that continues to this day. It was found that, with the BP algorithm, an artificial neural network can learn statistical regularities from a large number of training samples and use them to predict unknown events. Compared with earlier systems based on hand-crafted rules, this statistics-based approach showed advantages in many respects. The artificial neural network of that era, although called a multi-layer perceptron (Multi-layer Perceptron), was in fact a shallow model with only a single hidden layer.

In the 1990s, a variety of shallow machine learning models were proposed, such as Support Vector Machines (SVM), Boosting, and maximum entropy methods (e.g. Logistic Regression, LR). The structure of these models can basically be viewed as having one layer of hidden nodes (SVM, Boosting) or no hidden nodes at all (LR). These models achieved great success both in theoretical analysis and in applications. By contrast, because their theoretical analysis was difficult and their training required a great deal of experience and skill, shallow artificial neural networks were relatively quiet during this period.

Even so, it seemed somewhat inappropriate to call the new concept shallow learning, since earlier shallow learning usually refers to shallow supervised learning~
Shallow supervised neural networks with a single hidden layer have some desirable properties that make them easier to interpret, analyze, and optimize than deep networks, but their representational power is weaker than that of deep networks.

One-hidden-layer learning problems can be solved in sequence to build a deep network layer by layer, so that the deep network inherits some of the properties of the shallow ones.

Zhang Shan also mentioned these in the paper~
Shallow supervised learning.

Deep convolutional neural networks trained on large-scale supervised data via the backpropagation algorithm have become the dominant method in most computer vision tasks.

This has also led to successful applications of deep learning in other fields, such as speech recognition, natural language processing, and reinforcement learning. However, it is still difficult to understand how deep networks behave and why they perform so well. A big reason for this difficulty is the end-to-end way all of the network's layers are trained together.

Supervised end-to-end learning is a standard approach to neural network optimization.

But it also has some potential problems worth considering.

First, the use of global objectives means that the final functional behavior of a single intermediate layer of a deep network can only be determined indirectly: how these layers work together to obtain high-accuracy predictions is completely unclear.

Some researchers have argued, and shown experimentally, that CNNs can learn to implement mechanisms that progressively induce invariance to complex but irrelevant variability while increasing the linear separability of the data.

Sequential learning of CNN layers by solving shallow supervised learning problems is an alternative to end-to-end backpropagation.

This strategy can directly specify the objective of each layer, for example by encouraging the refinement of specific properties of the representation, such as progressive linear separability. Theoretical tools for deep greedy methods can then be developed from the theoretical understanding of the shallow subproblems.
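A minimal sketch of that sequential, layer-by-layer strategy, assuming PyTorch and purely synthetic data: each new layer is trained on its own shallow problem through a small auxiliary classifier while the already-trained layers stay frozen. Every name and dimension here is illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn

def train_layer(frozen, layer, head, x, y, steps=200):
    """Solve one shallow supervised problem: new layer + auxiliary linear head."""
    opt = torch.optim.Adam(list(layer.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        with torch.no_grad():
            h = frozen(x)                 # features from already-trained layers
        logits = head(layer(h))           # only this layer and its head learn
        loss = nn.functional.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Synthetic 10-class problem, purely illustrative.
x = torch.randn(512, 32)
y = torch.randint(0, 10, (512,))

layers, frozen, in_dim = [], nn.Identity(), 32
for width in (64, 64, 64):                # build the deep network layer by layer
    layer = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())
    head = nn.Linear(width, 10)            # auxiliary classifier for this stage
    train_layer(frozen, layer, head, x, y)
    layers.append(layer)
    frozen = nn.Sequential(*layers)        # freeze everything trained so far
    in_dim = width
```

Each stage's objective is explicit, which is exactly the property the passage above contrasts with opaque end-to-end training.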

However, various shortcomings of traditional shallow supervised learning are still relatively obvious.


