In 2006, Netflix took a long, hard look at its world-class Cinematch technology. Cinematch was as straightforward as it sounds: using ratings submitted by users, it predicted which movies a given user might or might not enjoy. Netflix used those predictions to build personalized movie recommendation lists for each individual user.
Cinematch worked, but behind the scenes, Netflix worried it was not operating at its full potential. So, in a move unheard of at the time, Netflix released a huge set of anonymized rating data and issued a global challenge: develop an algorithm that could beat Cinematch.
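To make "beat it" concrete: the Prize scored submissions by root mean squared error (RMSE) on held-out ratings, and the grand prize required a 10 percent improvement over Cinematch. The sketch below computes both quantities on a handful of made-up ratings. The function names, the ratings and both sets of predictions are illustrative assumptions, not Netflix's scoring code or data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, candidate_rmse):
    """Percent reduction in RMSE relative to the baseline system."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Toy held-out ratings (1 to 5 stars) and two sets of predictions; all made up.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]   # hypothetical incumbent system
candidate = [3.9, 3.2, 4.6, 2.4, 3.8]  # hypothetical challenger

b = rmse(baseline, actual)
c = rmse(candidate, actual)
pct = improvement(b, c)
print(f"baseline RMSE {b:.4f}, candidate RMSE {c:.4f}")
print(f"improvement {pct:.1f}%, clears a 10% bar: {pct >= 10.0}")
```

Because errors are squared before averaging, RMSE penalizes a few badly wrong predictions more heavily than many slightly wrong ones.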
The Netflix Prize, as it came to be known, was the first large-scale public crowdsourced competition of its kind. It drew worldwide attention to the value of recommendation engines. Soon, e-commerce companies were touting similar technology, and as a result, the internet became a better place for consumers and brands alike. The Netflix Prize opened new doors for data science and pushed the field to new heights.
Now, nearly 10 years after I handed out the million-dollar Netflix Prize, I think it’s time for five more industries to issue a “Netflix Prize” of their own. By crowdsourcing machine learning solutions that tap the abundant data these five industries already hold, we can find relevant signals and patterns amid the noise and make these industries not only more efficient, but also more integral to improving our lives.
Security
The security industry, and specifically risk and fraud detection, is a very rules-driven market. When consumers make purchases online or in stores with a credit card, the card issuer runs a quick approval process that involves questions such as: Is the consumer’s account in good standing? Is the card being used at a merchant that’s relevant to the consumer? Is the location of the store in line with the locations of recent purchases? Based on these (and several more) data points, the transaction is either approved or denied.
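The three questions above can be written as fixed rules. This is a minimal sketch, not any issuer's real logic: the `Transaction` fields, the "usual merchant categories" test and the 100 km distance threshold are all hypothetical stand-ins for the checks the paragraph describes.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Transaction:
    account_in_good_standing: bool
    merchant_category: str
    location: tuple[float, float]  # (latitude, longitude) of the store
    usual_categories: set[str] = field(default_factory=set)          # hypothetical profile
    recent_locations: list[tuple[float, float]] = field(default_factory=list)

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def approve(tx: Transaction, max_km: float = 100.0) -> bool:
    """Apply fixed rules mirroring the three questions; thresholds are hypothetical."""
    # Rule 1: is the account in good standing?
    if not tx.account_in_good_standing:
        return False
    # Rule 2: is the merchant relevant to this consumer?
    if tx.usual_categories and tx.merchant_category not in tx.usual_categories:
        return False
    # Rule 3: is the store near at least one recent purchase?
    if tx.recent_locations and min(
        haversine_km(tx.location, loc) for loc in tx.recent_locations
    ) > max_km:
        return False
    return True

home = (40.71, -74.01)  # hypothetical recent purchase in New York
near_tx = Transaction(True, "grocery", (40.73, -73.99), {"grocery", "fuel"}, [home])
far_tx = Transaction(True, "grocery", (51.51, -0.13), {"grocery", "fuel"}, [home])
print(approve(near_tx))  # True: same city, usual category
print(approve(far_tx))   # False: thousands of km from recent purchases
```

Every threshold is hard-coded, so a traveler abroad is declined just like a thief, and a fraudster who stays local in a familiar category passes.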
These rules-based approval systems are woefully incomplete and often unable to adapt to the changing nature of the data. There is much more data about consumers and their devices, both online and offline, that could be taken into consideration.
A crowdsourced competition like the Netflix Prize could expose patterns in user behavior across devices, time and location, giving a more complete view of the purchaser. Ultimately, applying machine learning to security could let teams build adaptive learning strategies that not only make fraudulent transactions less likely, but also keep legitimate transactions from being flagged.
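One simple form of "adaptive learning" is a model that updates every time the true outcome of a transaction becomes known. The sketch below is online logistic regression trained by stochastic gradient descent; it is an illustrative choice, not a method the article prescribes. The two features and the toy stream of labeled transactions are hypothetical.

```python
import math

def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class OnlineFraudScorer:
    """Logistic regression updated one labeled transaction at a time (online SGD).

    Features are hypothetical numeric signals (for example, a new-device flag or
    distance from usual locations); labels are 1 for confirmed fraud, 0 for legitimate.
    """

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = learning_rate

    def score(self, x: list[float]) -> float:
        """Estimated probability that the transaction is fraudulent."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return _sigmoid(z)

    def update(self, x: list[float], label: int) -> None:
        """Take one gradient step on log-loss once the true outcome is known."""
        error = self.score(x) - label
        self.weights = [w - self.lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error

scorer = OnlineFraudScorer(n_features=2)
# Toy, made-up stream: fraud shows both signals, legitimate purchases show neither.
stream = [([1.0, 1.0], 1), ([0.0, 0.0], 0)] * 200
for features, label in stream:
    scorer.update(features, label)
print(round(scorer.score([1.0, 1.0]), 3), round(scorer.score([0.0, 0.0]), 3))
```

Unlike fixed rules, the weights keep shifting as new chargebacks and confirmed-legitimate purchases arrive, so both false declines and missed fraud can feed back into the model.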
Health and pharmacology
Data is the key to saving lives, but right now, pharmaceutical companies and healthcare providers largely operate in their own silos. This is a poorly designed system for solving critical health issues. A crowdsourced Netflix Prize for medicine and healthcare could produce radical results.
In fact, there is already evidence to support it. In 2012, the pharmaceutical company Merck hosted a contest in which it shared data on the chemical structures of thousands of different molecules and challenged the scientific community to identify which might lead to new and better drugs. The winning result demonstrated a 17 percent improvement over the industry-standard benchmark and opened new avenues for pharmaceutical research aided by machine learning.
Beyond chemical data, we patients willingly give our doctors generations’ worth of family data. Between heart-rate records, urine samples, family history, blood pressure readings and pages upon pages of doctors’ notes, the body is one large data science dream. If we apply machine learning techniques to all of that data, while complying with HIPAA’s patient-confidentiality rules, could medical professionals detect patterns and susceptibilities in families and individuals before they become a problem? Likewise, if we could pool the data from every drug study, we might make better predictions about drugs, going beyond even the Merck example.
Advertising and marketing
In the world of advertising and marketing technology, a big gap that every brand, agency and enterprise faces is digital identity. The issue stems from the global phenomenon of device proliferation. Between our smartphones, tablets, laptops, connected TVs, smartwatches and even connected cars, our digital lives are extremely fragmented, and customer experiences on the internet are largely broken.
Facebook, Google, Amazon, Netflix and others have “solved” this by requiring a login. For example, my Facebook News Feed is identical on mobile and desktop, and Amazon’s product recommendations are consistent across devices and unique to me. But what about the rest of the internet? What about the time I spend online and in apps where I’m not logged in?
The good news is that the internet is a boundless sea of data. Browser data, device data, location data, usage data, network data: there is enough to keep an army of data scientists busy resolving identity from these signals. Several companies are already tackling digital identity, but there are few, if any, open standards and very limited collaboration.
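A toy way to see identity resolution is to treat each device as a node and link any two devices that share a signal value, then read off the connected groups. The sketch below does this with a union-find (disjoint-set) structure. The signal names, device IDs and the rule "one shared value is enough to link" are hypothetical simplifications; that rule is exactly the kind of naive matching a competition would aim to improve on.

```python
from collections import defaultdict

class DisjointSet:
    """Union-find over device IDs; each resulting set is one inferred person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def resolve_identities(observations):
    """Group devices that share any signal value.

    observations: iterable of (device_id, signal_name, signal_value) tuples.
    Linking on a single shared value is a deliberately naive, hypothetical rule:
    a shared public Wi-Fi network, for instance, would wrongly merge strangers.
    """
    ds = DisjointSet()
    first_seen = {}  # (signal_name, signal_value) -> first device observed with it
    for device, name, value in observations:
        ds.find(device)  # register the device even if it never links
        key = (name, value)
        if key in first_seen:
            ds.union(first_seen[key], device)
        else:
            first_seen[key] = device
    groups = defaultdict(set)
    for device in list(ds.parent):
        groups[ds.find(device)].add(device)
    return sorted(sorted(g) for g in groups.values())

obs = [
    ("phone-1", "home_network", "net-A"),
    ("laptop-1", "home_network", "net-A"),
    ("laptop-1", "login_hash", "h-42"),
    ("tv-1", "login_hash", "h-42"),
    ("phone-2", "home_network", "net-B"),
]
print(resolve_identities(obs))  # [['laptop-1', 'phone-1', 'tv-1'], ['phone-2']]
```

Note that phone-1 and tv-1 never share a signal directly; they end up in the same group only through laptop-1, which is how transitive links build a cross-device view.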
If a crowdsourced data science competition solved digital identity, it could revolutionize online experiences for both brands and consumers. Recommendations and content could be personalized, and marketing could be automated. Cross-device attribution would become simple, and the marketer’s view of consumers would become holistic.
Traffic and transit
Think of all the data we generate every morning during our daily commutes. We feed data to Waze, share our location and speed through GPS-enabled phones and cars, drive through speed-monitoring zones and even reveal license plates and traffic patterns to intersection cameras and toll booths. Data comes from millions of drivers, and even more from buses and trains.
What if there were a crowdsourced “Netflix Prize” to produce an open-source program that could tell us exactly when to leave the house to minimize our commute time or arrive precisely when we want, beyond what Google Maps can do today? If all of this data were more widely available to data scientists, a data science challenge could help determine how many lanes to open at a given time of day, how to adjust tolls dynamically based on traffic, how to monitor and evaluate traffic-signal timing and much more. Machine learning in this realm could vastly improve traffic flow and create a more efficient transportation environment and experience.
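Once a model can forecast travel time for each possible departure time, "when should I leave?" reduces to a simple search. This sketch assumes such a forecast already exists; the forecast values and the date are hypothetical, and the hard part a competition would tackle is producing the forecast itself from the pooled traffic data.

```python
from datetime import datetime, timedelta

def latest_departure(predicted_minutes, arrive_by):
    """Latest departure slot whose predicted arrival is no later than arrive_by.

    predicted_minutes: dict mapping candidate departure datetimes to predicted
    travel time in minutes (output of a hypothetical traffic model).
    Returns None if no slot arrives in time.
    """
    feasible = [
        depart
        for depart, minutes in predicted_minutes.items()
        if depart + timedelta(minutes=minutes) <= arrive_by
    ]
    return max(feasible, default=None)

day = datetime(2024, 1, 8)  # arbitrary weekday
forecast = {  # hypothetical predictions
    day.replace(hour=7, minute=30): 35,
    day.replace(hour=7, minute=45): 40,
    day.replace(hour=8, minute=0): 55,   # rush-hour spike
    day.replace(hour=8, minute=15): 50,
}
print(latest_departure(forecast, arrive_by=day.replace(hour=8, minute=30)))
# 2024-01-08 07:45:00
```

Checking every slot, rather than stopping at the first late one, matters because leaving later does not always mean arriving later when congestion shifts.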
Agriculture
By 2050, Earth will be home to nine billion people. That’s roughly a 35 percent increase from today, or about two billion additional mouths to feed. Can our current agricultural practices keep up with that demand for food without wreaking havoc on the planet? Agriculture has become one of the most contested practices when it comes to the environment. Crops consume water, farming emits carbon dioxide and requires pesticides, and fields send nitrogen runoff and other waste into our precious waterways. It’s time for a more data-driven approach to farming.
From weather patterns to soil nutrient levels, insect life and plant growth records, agricultural data can be used to determine not only which crops to plant, but also when and where to plant them, how to harvest them and even how much to irrigate. The Farmer’s Almanac and human hunches have been trusted sources for centuries, but at this crucial turning point, is it possible to make agriculture a more precise science, and potentially save the human race? By crowdsourcing approaches that fuse machine learning with innovative engineering, we could build sustainable farming practices to support generations to come.
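As one small example of turning data into a "how much to irrigate" decision, here is a simplified daily soil-water balance, sometimes called the checkbook method. It is a sketch under loudly stated assumptions: it ignores runoff, deep drainage and rainfall efficiency, and the evapotranspiration (ET), rainfall and threshold values are made up. A machine learning entry might instead forecast the ET and rain inputs from weather and sensor data.

```python
def irrigation_schedule(daily_et_mm, daily_rain_mm, allowed_depletion_mm,
                        start_depletion_mm=0.0):
    """Simplified daily soil-water balance for one field.

    Depletion is how far the root zone sits below field capacity, in mm.
    Each day, crop water use (ET) raises depletion and rain lowers it; when
    depletion exceeds the allowed threshold, irrigate enough to refill.
    Runoff, deep drainage and rainfall efficiency are ignored (hypothetical
    simplification). Returns irrigation amounts in mm, one per day.
    """
    depletion = start_depletion_mm
    schedule = []
    for et, rain in zip(daily_et_mm, daily_rain_mm):
        depletion = max(0.0, depletion + et - rain)  # surplus rain drains away
        if depletion > allowed_depletion_mm:
            schedule.append(depletion)  # refill the root zone to field capacity
            depletion = 0.0
        else:
            schedule.append(0.0)
    return schedule

# Made-up week: six days of ET and one rain event, 15 mm allowed depletion.
et = [5, 6, 6, 5, 7, 6]
rain = [0, 0, 10, 0, 0, 0]
print(irrigation_schedule(et, rain, allowed_depletion_mm=15.0))
# [0.0, 0.0, 0.0, 0.0, 19.0, 0.0]
```

The rain on day three postpones irrigation; without it, the threshold would have been crossed a day or two earlier.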
Ten years ago, the Netflix Prize used data and science to change recommendation engines for the better. Algorithmic intelligence changed the things we watched on our screens. What would happen if we applied those same concepts to other industries? These five are just the beginning.