Internet giants like Google and Facebook have taken a leading role in the development of artificial intelligence (AI), and that is down to one thing: data. Owing to their billions of daily active users and aggressive information-gathering methods, these companies have access to enormous troves of data.
But AI cannot always rely on big data sets. Accumulating even a tiny fraction of the data in the hands of major consumer internet companies would require a massive investment of time and money. The future for most companies lies in small data – information sets that are accessible to human comprehension.
“Something that we have found, particularly in manufacturing, is that people usually don’t have [significant] amounts of data, or they end up with very imbalanced datasets,” said Alejandro Betancourt, the general manager and senior machine learning tech lead of Landing AI Latin America, an AI innovation company with offices in Medellin, Colombia. “Usually companies have around 1,000 images of [well-produced] elements and five or 10 examples of defective products. It is very challenging for them to start training the system… and produce the best performance they can with that little information.”
Landing AI is one of a group of companies founded and run by Andrew Ng, a tech entrepreneur and leading AI expert. His other ventures include Deeplearning.ai, a company focused on developing AI education initiatives, and AI Fund, an AI startup accelerator. Last year, Ng opened a new Latin American headquarters for his three companies in Medellin. The destination caught his eye because of its strong educational system, thriving startup culture and supportive government.
While AI initiatives in Beijing and Silicon Valley have grown up alongside internet companies with access to big data, the Medellin hub is focusing on the problem of data scarcity.
“I think [every] region has different challenges,” Betancourt said. “Little by little, people specialize in different areas… A large part of the work that is being developed in Landing AI around smaller data sets is being developed by people that work in the Medellin office. It’s nice to see that the teams located in Latin America can contribute to technology that can be used later by a company in Asia or the US.”
A recent IBM/Forbes Insights survey of more than 200 technology leaders around the world singled out a lack of data as the most significant barrier to AI adoption in their companies. Industries such as manufacturing and healthcare rarely have access to big data sets. A car manufacturer will not have a million images of faulty parts. A healthcare provider will not have a million CT scans of new diseases such as Covid-19.
But recent advances in small data techniques have the potential to open AI to a broad range of industries that work with much smaller sets of data.
Small Data Techniques
One of the most popular techniques for training AI on a small amount of data is few-shot learning. As its name suggests, the technique requires just a few data points.
“Traditionally, you tried to train models with millions of observations,” Betancourt said. “In few-shot learning what you do is change the question. You say: ‘I have three samples. One for car, one for dog and one for cat.’ The challenge for the model is not to say which it is every time you get a new image… Instead, they try to find which is the most similar one… You reformulate the question. Now, [the model] is not saying whether it’s a dog or a cat. Instead, it is saying: ‘from the samples I have, this is most similar to a dog.’”
The technique is mainly used to train computer vision systems on manufacturing lines, where it allows manufacturers to categorize objects cheaply even when data is scarce. One-shot learning is an extreme variation of few-shot learning that requires just a single training image per category.
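The similarity-based reformulation Betancourt describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Landing AI’s method: it assumes each image has already been turned into a numeric feature vector (an “embedding”) by some pretrained model, and the three-number vectors below are invented for the example. The classifier averages each label’s examples into a prototype and returns the label whose prototype is most similar, by cosine similarity, to the new image. With one example per label, as here, it behaves as one-shot learning.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def prototype(examples):
    """Average a label's few example embeddings into a single prototype."""
    n = len(examples)
    return [sum(values) / n for values in zip(*examples)]

def classify_by_similarity(query, support_set):
    """Return (label, score) for the support label most similar to the query.

    support_set maps each label to a short list of example embeddings.
    """
    best_label, best_score = None, -math.inf
    for label, examples in support_set.items():
        score = cosine_similarity(query, prototype(examples))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical embeddings: one sample each for car, dog and cat (one-shot).
support = {
    "car": [[0.9, 0.1, 0.0]],
    "dog": [[0.1, 0.8, 0.3]],
    "cat": [[0.1, 0.3, 0.8]],
}
new_image = [0.2, 0.7, 0.4]
label, score = classify_by_similarity(new_image, support)
print(label, round(score, 3))
```

Here the new vector lands closest to the “dog” sample. Adding a few more vectors to a label’s list turns the one-shot setup into few-shot; comparing against averaged prototypes keeps the cost of each query proportional to the number of labels rather than the number of stored examples.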
Transfer learning is another breakthrough AI training technique.
“Maybe you don’t have enough data but there is a dataset published that you can use to train a first version of your model,” Betancourt said. “That data might come from a competitor or a [research] paper.”
Through transfer learning, a model first trained on 1,000 images of scratches on a variety of electronic products could then be adapted to detect scratches on cell phones, using far fewer cell phone images than it would need if trained from scratch.
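One common form of transfer learning is warm-start fine-tuning, sketched below in Python. The example is purely illustrative: each “image” is reduced to two invented numeric features, the data is randomly generated, and a simple logistic-regression classifier stands in for the deep neural networks used in practice. The model is first trained on a large source dataset (scratches on many products), and its learned weights then become the starting point for a short round of training on just 10 examples from a slightly different target task (cell phones).

```python
import math
import random

def sigmoid(z):
    """Logistic function, arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Model's probability that feature vector x shows a scratch."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def make_samples(rng, n_per_class, clean_center, scratched_center, spread=0.5):
    """Hypothetical 2-feature 'images'; label 1 = scratched, 0 = clean."""
    data = []
    for label, (cx, cy) in ((0, clean_center), (1, scratched_center)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, spread), rng.gauss(cy, spread)), label))
    rng.shuffle(data)  # avoid presenting all of one class first during SGD
    return data

def train_logistic(data, weights=None, bias=0.0, lr=0.1, epochs=20):
    """Logistic regression trained by stochastic gradient descent.

    Passing in weights and bias from an earlier model warm-starts training;
    that warm start is what makes this fine-tuning rather than training
    from scratch.
    """
    w = list(weights) if weights is not None else [0.0] * len(data[0][0])
    b = bias
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, b, x) - y  # log-loss gradient w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def accuracy(w, b, data):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

rng = random.Random(42)
# Source task: 1,000 labeled examples of scratches on assorted products.
source = make_samples(rng, 500, (0.0, 0.0), (2.0, 2.0))
# Target task: only 10 labeled cell phone examples, slightly shifted.
target_train = make_samples(rng, 5, (0.5, 0.0), (2.5, 2.0))
target_test = make_samples(rng, 100, (0.5, 0.0), (2.5, 2.0))

w_src, b_src = train_logistic(source)  # step 1: pretrain on the source task
w_ft, b_ft = train_logistic(target_train, w_src, b_src, lr=0.05, epochs=50)  # step 2: fine-tune

print("pretrained model on cell phones:", accuracy(w_src, b_src, target_test))
print("fine-tuned model on cell phones:", accuracy(w_ft, b_ft, target_test))
```

Because this toy data is easy to separate, the sketch shows the mechanics rather than the size of the benefit. With real images, it is the features a pretrained network has already learned that can make a small target dataset usable.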
Synthetic data generation is another revolutionary technology. An AI model can be trained on synthetic data (for example, images rendered from a 3D model of a scratched cell phone) before that learning is transferred to real data (images from a cell phone manufacturing line).
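A toy version of the synthetic-data idea can be sketched in Python. Instead of a 3D model, it procedurally draws hypothetical 8x8 grayscale “surfaces” and paints a bright line on half of them as a scratch. Because the program decides where the scratches go, every image comes labeled for free. A single hand-picked feature (the brightness of the brightest row or column) and a fitted threshold stand in for a real model. All names and numbers are assumptions for illustration, and the step of transferring to real production-line images is not shown.

```python
import random

SIZE = 8  # hypothetical 8x8 grayscale image of a phone surface

def render_surface(rng, scratched):
    """Draw one synthetic image: a dim, noisy surface, optionally scratched."""
    img = [[rng.uniform(0.0, 0.3) for _ in range(SIZE)] for _ in range(SIZE)]
    if scratched:
        # Paint a bright horizontal or vertical line at a random position.
        pos = rng.randrange(SIZE)
        horizontal = rng.random() < 0.5
        for i in range(SIZE):
            r, c = (pos, i) if horizontal else (i, pos)
            img[r][c] = rng.uniform(0.8, 1.0)
    return img

def make_synthetic_dataset(n, seed):
    """Labels come for free: the generator knows which images it scratched."""
    rng = random.Random(seed)
    return [(render_surface(rng, i % 2 == 1), i % 2) for i in range(n)]

def brightest_line_score(img):
    """Hand-picked feature: mean brightness of the brightest row or column."""
    rows = [sum(row) / SIZE for row in img]
    cols = [sum(img[r][c] for r in range(SIZE)) / SIZE for c in range(SIZE)]
    return max(rows + cols)

def score_dataset(data):
    """Replace each image with its feature score, keeping the label."""
    return [(brightest_line_score(img), label) for img, label in data]

def accuracy(threshold, scored):
    """Fraction of (score, label) pairs where score >= threshold matches label."""
    return sum((s >= threshold) == bool(y) for s, y in scored) / len(scored)

def fit_threshold(scored):
    """Try a cut midway between each pair of neighboring scores; keep the best."""
    ordered = sorted(scored)
    best_t, best_acc = None, -1.0
    for (s1, _), (s2, _) in zip(ordered, ordered[1:]):
        t = (s1 + s2) / 2
        acc = accuracy(t, scored)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

train = score_dataset(make_synthetic_dataset(200, seed=0))
held_out = score_dataset(make_synthetic_dataset(100, seed=1))
threshold, train_acc = fit_threshold(train)
print(f"threshold={threshold:.2f}, train accuracy={train_acc:.2f}, "
      f"held-out accuracy={accuracy(threshold, held_out):.2f}")
```

The perfect scores this toy task produces say nothing about production performance; the gap between clean synthetic renders and messy real photos is the reason a model trained this way is then adapted to real images.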
Betancourt stresses that small data and big data are complementary rather than opposing concepts. “In the end, the more quality data you have the better systems you will end up with,” he said. But because data scarcity and cost are significant deterrents to AI adoption, small data techniques help lower these barriers to entry.
“If you managed to deploy your system with a small sample, what ends up happening is you start collecting more and more data,” Betancourt said. “[Small data techniques] help different industries that have not traditionally been involved in machine learning or in AI learning development to jump in and start creating value.”
Initiating AI programs will be the most significant task most business leaders – regardless of their sector – face in the coming years. IBM/Forbes Insights research indicates that only 4% of AI applications are currently critical to business. In contrast, 42% of those applications are pilot efforts, 26% are experimental, while another 4% are still in the planning phase.
It may still be early days for the use of AI in business, but analysts agree the technology is set to transform productivity and generate enormous wealth. The consulting firm McKinsey estimates that AI could deliver the equivalent of an additional $13 trillion to the global economy by 2030. Tech entrepreneur Tej Kohli offers a far more bullish forecast, with AI adding $150 trillion by 2025.
Regardless of the exact dimensions of the transformation, AI will undoubtedly play a key role in successful business strategies for the future. The technology is set to be a competitive differentiator in practically every industry. Companies testing small data solutions today could soon become powerhouses, while those sticking to traditional methods will most likely disappear.
What does it take to achieve great outcomes in Nearshore services? If you would like to share an exciting case study or news story, drop me a note. – Steve Woodman, Managing Editor