
Secure AI Collaboration Will Fine-Tune OpenFold3 with Proprietary Data

Hoca

“Public data is hitting its limit. The real breakthroughs can only happen through increased amounts of data and of course, tapping into industrial data,” says Robin Roehm, CEO and co-founder of Apheris, a Berlin-based start-up focused on enabling governed, private, and secure access to data for machine learning.

In a new initiative from the AI Structural Biology (AISB) Consortium, powered by Apheris, OpenFold3, a protein structure prediction model developed by the lab of Mohammed AlQuraishi, PhD, assistant professor of systems biology at Columbia University, will be fine-tuned on proprietary data from AbbVie and Johnson & Johnson in a confidentiality-preserving environment.

The collaboration will evaluate and refine OpenFold3 for predicting the 3D structures of molecular complexes, focusing on small molecule-protein and antibody-antigen interactions for drug discovery.

Roehm says the motivation for the initiative was to bring parties together to build a better predictive model that each can use commercially. Access to the fine-tuned OpenFold3 will be limited to participants who contribute proprietary data to the program, but Roehm notes that other forms of data access also benefit the industry.

“A key problem for any party that builds and innovates new model architectures is that they cannot benchmark on proprietary data. The validity for industrial grade research is something that you cannot assess,” Roehm told GEN. “Access to industry data for benchmarking is a huge value add for everyone who builds models.”

According to AlQuraishi, an open-source version of OpenFold3, trained only on publicly available data, will also be released “soon” to the wider scientific community under a highly permissive license for both commercial and non-commercial use.

“Through this consortium, we can share data with other pharma partners, exploring the hypothesis that each of our internal data sets will be highly complementary when training AI models,” said John Karanicolas, PhD, head of computational drug discovery at AbbVie. “The result could be transformative in how we advance AI-driven drug discovery to develop better medicines faster.”

Getting complex

According to Roehm, “Who will build the next protein data bank (PDB)?” has been one of the biggest questions in the AI space.

AlphaFold, the AI breakthrough that revolutionized biological research by providing predicted structures for virtually all of the 200 million proteins known to researchers, was powered by the PDB, the public repository housing over 200,000 entries of experimentally determined protein and nucleic acid structures collected by scientists over 50 years.

Because AlphaFold was released without the code and data required to train new models, researchers could neither investigate its sensitivity to changes in data composition and model architecture nor create variants of the model for tasks beyond protein structure prediction. Last year, AlQuraishi’s team published OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold, to address this gap.

Notably, the next wave of AI models, such as AlphaFold 3 and AlQuraishi’s OpenFold3, has moved toward modeling protein-ligand interactions, which are crucial for drug discovery yet sparsely represented in public databases.

“We expect that by training on proprietary data, the model will become more capable on hard problems that AlphaFold 3-based models struggle with, such as predicting protein-small molecule complexes. This is especially likely because the availability of such data is limited in the PDB, and often excludes small molecule drugs that are of most practical interest,” AlQuraishi told GEN.

Given the impact of public repositories such as the PDB, Roehm said the industry generally “wishes for more open access to models and data.” However, expecting large pharma companies to release their assets publicly may not be realistic.

To keep each party’s data confidential, the AISB Consortium initiative uses a federated architecture: the model is trained separately within each pharma company’s own environment, and the results are then aggregated into a global model in such a way that the underlying data cannot be reverse engineered. The program also includes data quality assessments to support a fair exchange of commercial value among participants.
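The article does not say which aggregation method Apheris uses. As a rough illustration of the general federated idea only, the sketch below assumes plain federated averaging (FedAvg): each hypothetical site fits a toy linear model on its own private data, and only the resulting parameters, weighted by each site’s sample count, are combined into the global model. All function names and data here are hypothetical. Averaging alone is not a privacy guarantee, and this sketch includes none of the safeguards that would be needed to keep data from being reverse engineered.

```python
from typing import List, Tuple

# Assumption: plain FedAvg on a toy model y ≈ w * x + b.
# The article does not specify Apheris's actual method; everything here is hypothetical.
Dataset = List[Tuple[float, float]]


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.01, epochs: int = 50) -> Tuple[float, float]:
    """Run gradient descent on one site's private data; only (w, b) leaves the site."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[float, float]],
                      sizes: List[int]) -> Tuple[float, float]:
    """Combine per-site parameters, weighting each site by its number of examples."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train_federated(sites: List[Dataset], rounds: int = 20) -> Tuple[float, float]:
    w, b = 0.0, 0.0  # shared global model
    for _ in range(rounds):
        # Each site trains locally, starting from the current global model.
        updates = [local_update(w, b, data) for data in sites]
        # Only parameters are aggregated; raw (x, y) pairs are never pooled.
        w, b = federated_average(updates, [len(d) for d in sites])
    return w, b


if __name__ == "__main__":
    # Two hypothetical sites whose private data both follow y = 2x + 1.
    site_a = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
    site_b = [(x, 2 * x + 1) for x in [4.0, 5.0]]
    w, b = train_federated([site_a, site_b])
    print(f"global model: w={w:.2f}, b={b:.2f}")
```

Weighting by sample count lets larger datasets pull the global model harder; a real system fine-tuning a model like OpenFold3 would exchange far larger parameter sets and add its own protections on top.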

Although industrial data has been called a “game changer,” most proprietary assets remain under lock and key. Time will tell whether evolving data collaboration efforts will meaningfully advance the next generation of AI breakthroughs.

The post Secure AI Collaboration Will Fine-Tune OpenFold3 with Proprietary Data appeared first on GEN - Genetic Engineering and Biotechnology News.