This interview has been lightly edited for length and clarity.
MIT Technology Review: I’m sure people often ask you, “How do I build an AI-first business?” What do you usually say?
Andrew Ng: I usually say, “Don’t do that.” If I go to a team and say, “Hey, everyone, please be AI-first,” that tends to focus the team on technology, which might be great for a research lab. But in terms of how I run the business, I tend to be customer-led or mission-led, almost never technology-led.
You now have this new initiative called Landing AI. Can you tell us a bit about what it is and why you chose to work on it?
After leading AI teams at Google and Baidu, I realized that AI has transformed consumer internet software such as web search and online advertising. But I wanted to take AI to all of the other industries, which are an even bigger part of the economy. So after looking at a lot of different industries, I decided to focus on manufacturing. I think many industries are AI-ready, but one sign that an industry is more AI-ready is that it has undergone some digital transformation, so there is some data. That creates opportunities for AI teams to come in and use the data to create value.
So one of the projects I’ve been interested in lately is visual inspection in manufacturing. Can you look at a picture of a smartphone coming off the production line and see if there’s a defect in it? Or look at an auto component and see if there’s a dent in it? One huge difference is that in consumer internet software, you might have a billion users and a huge amount of data. In manufacturing, however, no factory has produced a billion or even a million scratched smartphones. Thank goodness for that. So the challenge is, can you get an AI to work with a hundred images? It turns out you often can. I have actually been surprised many times by how much you can do with even a modest amount of data. So even though all the hype, excitement, and PR around AI centers on giant data sets, I think we still have a lot of room to grow in order to open up these other applications, where the challenges are quite different.
How do you do that?
Too often I see CEOs and CIOs make a mistake: they tell me something like, “Hey, Andrew, we don’t have that much data; my data’s a mess. So give me two years to build a great IT infrastructure. Then we’ll have all this great data on which to build AI.” I always say, “That’s a mistake. Don’t do it.” First, I don’t think any company on the planet today, not even the tech companies, thinks its data is completely clean and perfect. It’s a journey. And if you spend years building the infrastructure first, you miss out on feedback from the AI team that would help you prioritize what to build.
For example, if you have a lot of users, should you ask them survey questions to get a little more data? Or in a factory, should you upgrade a sensor to one that records readings 10 times per second? Often it is starting an AI project with the data you already have that enables the AI team to give you the feedback you need to prioritize what additional data to collect.
In industries where we just don’t have the scale of consumer internet software, I think we need to shift our mindset from big data to good data. If you have millions of images, go ahead and use them; that’s great. But there are many problems that can use much smaller data sets that are cleanly labeled and carefully curated.
Can you give an example? What do you mean by good data?
Let me first give an example from speech recognition. When I was working on voice search, you would get audio clips where you heard someone say, “Um, today’s weather.” The question is, what is the right transcription of that audio clip? Is it “Um (comma) today’s weather,” or is it “Um (dot, dot, dot) today’s weather,” or is “Um” something we just don’t transcribe? It turns out any one of these is fine, but what is not fine is for different transcriptionists to use each of the three labeling conventions. Then your data is noisy, and that hurts the speech recognition system. Now, when you have millions or a billion users, you can have that noisy data and just average it out; the learning algorithm will do fine. But if you are in a setting where you have a small data set, say a hundred examples, then this kind of noisy data has a huge impact on performance.
Another example from manufacturing: we’ve done a lot of work on steel inspection. If you drive a car, the side of your car was once a sheet of steel. Sometimes there are little wrinkles in the steel, or little dents or specks on it. So you can use a camera and computer vision to see whether there are defects. However, different labelers will label the data differently. Some will put a giant bounding box around the whole region. Others will put little bounding boxes around the individual small particles. When you have a modest data set, making sure that the different quality inspectors label the data consistently turns out to be one of the most important things.
For many AI projects, the open-source model you can download from GitHub, the neural network you can get from the literature, is good enough. Not for all problems, but for the majority of them. So I’ve gone to a lot of my teams and said, “Hey, everyone, the neural network is good enough. Let’s not mess with the code anymore. All you need to do now is build processes to improve the quality of the data.” And it turns out that this often improves the algorithm’s performance quickly.
What size of data set do you have in mind when you say small? Are you talking about a hundred examples? Ten examples?
Machine learning is so diverse that it’s really hard to give a one-size-fits-all answer. I’ve worked on problems with about 200 to 300 million images. I’ve also worked on problems with 10 images, and everything in between. When I look at manufacturing applications, I think it’s not uncommon to have something like tens or perhaps a hundred images of a defect class, but there is also wide variation even within a single factory.
I’ve found that AI practices change once the training set size drops below, let’s say, 10,000 examples, because that is roughly the threshold where engineers can basically look at every example themselves and then make a decision.
Recently I was chatting with a very good engineer at a big tech company. I asked, “Hey, what do you do if the labels are inconsistent?” And he said, “Well, we have a team of a few hundred people overseas who do the labeling. So I’ll write the labeling instructions, get three people to label each image, and then I’ll take an average.” And I said, “Yes, that’s the right thing to do when you have a giant data set.” But when I work with a small team and the labels are inconsistent, I just track down the two people who disagree with each other, get both of them on a Zoom call, and have them talk to each other to try to reach a resolution.
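Since any one of the three conventions is fine as long as everyone uses it, one way to picture the fix is a normalization pass that rewrites every transcript into a single chosen convention. The sketch below is illustrative only: it assumes, arbitrarily, the “don’t transcribe the filler” convention, and the function name `normalize_transcript` is hypothetical, not part of any speech toolkit.

```python
import re

# Assumed convention (chosen only for illustration): drop the filler word "um"
# entirely, together with any comma, period, ellipsis, or whitespace after it.
# \b on both sides keeps words like "umbrella" or "album" untouched.
_FILLER_UM = re.compile(r"\bum\b[\s,.\u2026]*", re.IGNORECASE)


def normalize_transcript(text: str) -> str:
    """Rewrite a transcript so the filler 'um' follows one labeling convention."""
    without_filler = _FILLER_UM.sub("", text)
    # Collapse any leftover runs of whitespace into single spaces.
    return " ".join(without_filler.split())


# Three transcriptionists, three conventions, one normalized label.
for raw in ["Um, today's weather", "um... today's weather", "um today's weather"]:
    print(normalize_transcript(raw))
```

All three inputs print `today's weather`. Picking a different convention would work equally well; the point is that the choice is applied to every example before training.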
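One way to find the inconsistency described here, one giant box versus several small ones, is to compare two inspectors’ boxes image by image and flag any image where a box from one inspector has no well-overlapping counterpart from the other. This is a minimal sketch under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, overlap is measured by intersection-over-union (IoU), and the 0.5 threshold and the names `iou` and `flag_inconsistent` are hypothetical choices, not a method described in the interview.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_unmatched(src: List[Box], other: List[Box], threshold: float) -> int:
    """Count boxes in src that overlap no box in other at IoU >= threshold."""
    return sum(1 for s in src if not any(iou(s, o) >= threshold for o in other))


def flag_inconsistent(
    labels_a: Dict[str, List[Box]],
    labels_b: Dict[str, List[Box]],
    threshold: float = 0.5,  # hypothetical cutoff; tune per project
) -> List[str]:
    """Return ids of images where either inspector has a box the other lacks."""
    flagged = []
    for image_id in sorted(set(labels_a) | set(labels_b)):
        a = labels_a.get(image_id, [])
        b = labels_b.get(image_id, [])
        if count_unmatched(a, b, threshold) or count_unmatched(b, a, threshold):
            flagged.append(image_id)
    return flagged


# Inspector A draws one giant box on img1; inspector B draws two small ones.
inspector_a = {"img1": [(0, 0, 100, 100)], "img2": [(0, 0, 10, 10)]}
inspector_b = {"img1": [(10, 10, 20, 20), (60, 60, 70, 70)], "img2": [(1, 1, 10, 10)]}
print(flag_inconsistent(inspector_a, inspector_b))
```

Here only `img1` is flagged: the giant box overlaps each small box with an IoU of about 0.01, while the two `img2` boxes agree at an IoU of 0.81. The flagged images are the ones worth bringing back to the inspectors to agree on a single labeling convention.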
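The two approaches in this anecdote can be contrasted in a few lines. The sketch below is illustrative and assumes each item carries a list of labels, one per annotator; the names `majority_label` and `split_by_agreement` are hypothetical. The first function is the big-data approach (take the most common label); the second routes every item without unanimous agreement to a list for the disagreeing labelers to discuss, as in the small-team approach.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_label(labels: List[str]) -> Optional[str]:
    """Big-data approach: most common label, or None on an empty list or a tie."""
    ranked = Counter(labels).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie for first place: no clear majority
    return ranked[0][0]


def split_by_agreement(
    annotations: Dict[str, List[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Small-data approach: keep unanimous labels, queue the rest for discussion."""
    agreed: Dict[str, str] = {}
    to_discuss: List[str] = []
    for item_id, labels in sorted(annotations.items()):
        if len(set(labels)) == 1:
            agreed[item_id] = labels[0]
        else:
            to_discuss.append(item_id)  # includes items with no labels at all
    return agreed, to_discuss


annotations = {
    "a": ["scratch", "scratch", "scratch"],
    "b": ["dent", "scratch", "dent"],
    "c": ["dent", "scratch", "ok"],
}
print(majority_label(annotations["b"]))
print(split_by_agreement(annotations))
```

Majority voting would silently label item `b` as `dent` and leave `c` unresolved, whereas the second function sends both `b` and `c` back to the labelers, which is only practical when the data set is small enough for people to review every disagreement.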
I’d like to turn now to your thoughts on the AI industry in general. The Algorithm is our AI newsletter, and I gave our readers the opportunity to submit some questions to you in advance. One reader asks: AI development seems to have split largely between academic research and large, resource-intensive programs at big organizations such as OpenAI and DeepMind. That doesn’t really leave much space for small startups to contribute. What practical problems do you think smaller companies can really focus on to help drive commercial adoption of AI?
I think a lot of the media attention tends to be on large corporations, and sometimes on large academic institutions. But if you go to academic conferences, there is plenty of work done by small research groups and research labs. And when I talk to different people in different companies and industries, I feel there are so many business applications that they could use AI to tackle. I usually go to business leaders and ask, “What are your biggest business problems? What worries you the most?” That way I can better understand the business goals and then brainstorm whether there is an AI solution. Sometimes there isn’t, and that’s okay.
Maybe I’ll mention a few gaps that I find exciting. I think building AI systems today is still very manual. You have a few brilliant machine-learning engineers and data scientists doing things on a computer and then pushing them into production. There are a lot of manual steps in the process. So I’m excited about MLOps [machine learning operations] as an emerging discipline to help streamline the process of building and deploying AI systems.
Also, if you look at a lot of typical business problems, across functions from marketing to talent, there’s plenty of room for automation and efficiency improvements.
I also hope the AI community can look at the biggest social problems and see what we can do about climate change, homelessness, or poverty. In addition to the sometimes very valuable business problems, we should also be working on the biggest social problems.
How do you go about identifying whether there is an opportunity for your business to do something with machine learning?
I’ll try to learn something about the business myself and help the business leaders learn something about AI. Then we usually brainstorm a set of projects, and for each idea, I’ll do both technical diligence and business diligence. We’ll look at: Do you have enough data? What’s the accuracy? Is there a long tail when you deploy to production? How do you feed the data back and close the loop for continuous learning? That makes sure the problem is technically feasible. Then comes business diligence: we make sure it will achieve the ROI we are hoping for. After this process, you have the usual steps, such as allocating resources and setting milestones, and then, hopefully, execution.
Another suggestion: getting started quickly is more important, and it’s okay to start small. At Google, helping the speech team make speech recognition more accurate gave the Brain team the credibility to go after bigger partnerships. Google Maps was the second big partnership, where we used computer vision to read house numbers and geolocate homes on Google Maps. Only after these first two successful projects did I have more serious conversations with the advertising team. So I think I see more companies fail by starting too big than by starting too small. It’s fine for an organization to start with a small project to learn how to use AI, and then go on to build bigger successes.
What should our readers start doing tomorrow to implement AI in their companies?
Jump in. AI is changing the landscape of many industries. So if your company isn’t already investing quite aggressively and smartly, now is a good time.