Fueling innovation with data
McCowan is careful not to restrict the use of external tools, particularly cloud-native ones, that help scientists dig for discoveries. At the infrastructure level, Regeneron scientists use AWS EMR and Cloudera. At the data pipeline level, they use Apigee, Airflow, NiFi, and Kafka. At the data warehouse level, they use Redshift. Further up the stack, data analytics tools such as Dataiku come into play. For languages and development environments, scientists use Python and Jupyter Notebooks.
For McCowan, the key is to give scientists any and all tools that allow them to explore their hypotheses and test theories. “One of the fantastic things about Regeneron is that we’re driven by curiosity,” the CIO says. “We’re driven by science, and by innovation, and we try to avoid putting hard boundaries around what we do because it tends to stifle innovation.”
Although Regeneron scientists have AI and ML tools at their disposal, data remains the key, McCowan says, and it’s the power of the cloud and analytics alone that may reveal the next big breakthrough from data that is 10 years old.
“I can’t tell you how many times I’ve read about these fantastic projects using AI and ML, but you never see the output because they fail,” McCowan says. “And the reason they are failing is that people are not putting enough thought into where the data is coming from. That is why we built our data infrastructure. So, by the time that data lands in the data lakes, and we start applying AI and ML, we know we are using it against high-quality data.”
As the company’s chief technologist, McCowan’s job is to digitize everything and help scientists make the best use of the data and metadata regardless of how it is generated.
“It always comes back to the data and the insights that we can provide using different technologies and increasing the speed of decision-making,” McCowan says, adding that giving scientists the ability to run experiments computationally, through engines that use AI and ML models, speeds up discovery but will never replace the wet lab.
The combination of enhanced IT and science is what will drive maximum innovation at Regeneron, McCowan says. And here, the MetaBio data platform will play a key role in facilitating breakthrough discoveries far faster than previously possible.
“The level of detail there with us digitizing everything, we’re able to apply technology and tools to help scientists make connections that they were just not able to make before,” McCowan says. “If you look at it from a pure data perspective, what we can do is find ways to [enable scientists] to connect the data better and faster and make those insights and bring drugs to market down to a five-year or four-year [process], when before it was a 10-year process.”