AI Unlocks Novel Protein Design by Decoding Bacterial Genomes
8
What is the Viqus Verdict?
We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.
AI Analysis:
While the immediate applications may seem niche, this research demonstrates a fundamental shift in how AI can approach biological design, fueling broader excitement and speculation about the future of synthetic biology and drug discovery.
2025. DOI: 10.1038/s41586-025-09749-7. John Timmer, Senior Science Editor. John is Ars Technica's science editor. He has a Bachelor of Arts in Biochemistry from Columbia University and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley.
Article Summary
A team at Stanford University has achieved a breakthrough in protein design by creating ‘Evo,’ a genomic language model trained on a massive collection of bacterial genomes. The approach rests on the observation that evolution produces new proteins through changes to nucleic acid sequences, so Evo generates DNA rather than generating proteins directly. It operates much like a large language model, predicting the next base in a DNA sequence, with one crucial difference in its training data: bacterial genomes frequently cluster functionally related genes together and transcribe them into a single mRNA. This allows Evo to ‘link nucleotide-level patterns to kilobase-scale genomic context,’ so that it learns the statistical rules governing genomic DNA.
Remarkably, when prompted with a novel bacterial toxin lacking a known antitoxin, Evo generated two completely new antitoxin proteins. They shared only 25% sequence identity with known antitoxins and appeared to be assembled from pieces of 15-20 individual proteins. The system also created RNA-based inhibitors of CRISPR and produced entirely new proteins, demonstrating that it can generate novel proteins without considering their 3D structure. The research highlights a shift in protein design towards harnessing the principles of evolutionary adaptation encoded within genomic data. It is not a replacement for directed enzyme design but a fundamentally different approach that leverages the raw, undirected power of evolution.
Key Points
- Evo, a genomic language model, is trained on bacterial genomes to predict and generate new proteins.
- The model exploits the clustering of functionally related genes in bacterial genomes and works at the DNA level, reflecting how new proteins arise from nucleic acid changes.
- Evo generated two entirely new antitoxin proteins with limited similarity to known antitoxins, showcasing its ability to generate novel sequences.
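To make the "predicting the next base" idea from the summary concrete, here is a minimal Python sketch. It is not Evo's method: Evo is a large neural language model, while this toy swaps in a simple k-mer frequency table. The function names, the toy sequences, and the context length `k` are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_next_base_model(sequences, k=3):
    """Count which base follows each k-base context in the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            context = seq[i:i + k]
            counts[context][seq[i + k]] += 1
    return counts


def predict_next_base(counts, context):
    """Return the most frequent next base for a context, or None if unseen."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def extend(counts, prompt, n_bases, k=3):
    """Greedily extend a prompt one base at a time (autoregressive generation)."""
    seq = prompt
    for _ in range(n_bases):
        next_base = predict_next_base(counts, seq[-k:])
        if next_base is None:
            break  # context never seen in training, so stop generating
        seq += next_base
    return seq


# Toy "genomes" (made up for illustration, not real bacterial sequences).
toy_genomes = ["ATGGCCATGGCC", "ATGGCCATGGCA"]
model = train_next_base_model(toy_genomes, k=3)
generated = extend(model, "ATG", 6, k=3)
print(generated)
```

The prompt plays the role the toxin gene plays in the summary: the model continues the sequence with whatever its training data makes most likely to come next. A real genomic language model conditions on far longer context than three bases, which is what lets it pick up the kilobase-scale gene clustering described above.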