La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Atomistic protein design, flow matching, latent diffusion, motif scaffolding
Abstract:

Recently, many generative models for de novo protein structure design have emerged. Yet, only a few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation benchmarks, including all-atom co-designability, diversity, and structural validity, as confirmed through detailed structural analyses and evaluations. Notably, La-Proteina also surpasses previous models in atomistic motif scaffolding performance, unlocking critical atomistic structure-conditioned protein design tasks. Moreover, La-Proteina is able to generate co-designable proteins of up to 800 residues, a regime where most baselines collapse and fail to produce valid samples, demonstrating La-Proteina's scalability and robustness.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Paper: 1

Research Landscape Overview

Core task: atomistic protein structure and sequence co-design. The field has evolved into a rich landscape organized around several complementary themes. Generative Model Architectures for Co-Design explores modern neural frameworks (ranging from diffusion models and flow matching to equivariant networks) that jointly generate backbone geometry and amino acid sequences. Functional and Context-Conditioned Protein Design addresses the challenge of designing proteins that bind specific ligands, recognize antigens, or satisfy functional constraints, often requiring careful modeling of binding pockets and interaction interfaces. Inverse Folding and Sequence Optimization focuses on methods that take a fixed or flexible backbone and optimize sequences to stabilize the fold, including classical energy-based approaches and newer learning-based techniques. Representation Learning and Pretraining for Protein Design examines how large-scale pretraining on structural databases can inform downstream design tasks, while Evaluation, Benchmarking, and Design Principles provides the experimental and computational standards needed to validate designed proteins. Finally, Reviews, Perspectives, and Historical Foundations situate recent advances within the broader trajectory of computational protein engineering.

Within Generative Model Architectures, a particularly active line of work centers on flow-based methods that model continuous transformations of structure and sequence. La-Proteina[0] exemplifies this direction by employing flow matching techniques for co-design, sitting alongside other flow-based approaches such as Discrete Generative Flow[3] and ProteinZen[40], which explore discrete and continuous normalizing flows respectively. These methods contrast with diffusion-based frameworks like RoseTTAFold Diffusion[5] and Equivariant Diffusion Codesign[15], which rely on iterative denoising rather than deterministic flow trajectories.

A key trade-off across these branches involves balancing the expressiveness of generative architectures with computational efficiency and the ability to incorporate physical constraints. La-Proteina[0] contributes to this conversation by demonstrating how flow matching can achieve competitive design quality while offering more direct control over the generative process, positioning it as a methodologically distinct yet complementary alternative to diffusion and energy-based paradigms.

Claimed Contributions

La-Proteina: Partially Latent Flow Matching Framework for Atomistic Protein Design

The authors propose La-Proteina, a generative model that uses a partially latent representation where the α-carbon backbone is modeled explicitly and sequence plus side-chain details are encoded in fixed-size per-residue latent variables. Flow matching in this hybrid space jointly models the distribution over sequences and full-atom structures.

10 retrieved papers
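
To make the partially latent representation described above more concrete, the following is a minimal sketch of how a flow matching training pair could be built when each residue is represented by explicit α-carbon coordinates plus a fixed-size latent vector. It is an illustration under stated assumptions, not the authors' implementation: the linear interpolation path, the single shared time variable, the standard Gaussian noise source, the latent dimension of 8, and all function names are hypothetical choices that this report does not specify.

```python
import numpy as np


def sample_flow_matching_pair(ca_coords, latents, rng):
    """Build one (x_t, t, target velocity) training triple.

    ca_coords: (L, 3) explicit alpha-carbon coordinates.
    latents:   (L, d) fixed-size per-residue latents (assumed to come
               from a separately trained encoder, not shown here).
    Assumption: a linear path x_t = (1 - t) * x0 + t * x1 with one
    shared time t for both parts of the state.
    """
    # Data sample: the partially latent state, one row per residue.
    x1 = np.concatenate([ca_coords, latents], axis=-1)  # (L, 3 + d)
    # Noise sample of the same shape.
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x1
    # For the linear path, the conditional velocity is constant in t.
    target_velocity = x1 - x0
    return x_t, t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


rng = np.random.default_rng(0)
num_residues, latent_dim = 5, 8  # hypothetical sizes
ca = rng.standard_normal((num_residues, 3))
z = rng.standard_normal((num_residues, latent_dim))
x_t, t, v = sample_flow_matching_pair(ca, z, rng)
```

The point of the sketch is that every residue contributes a row of the same width (3 + d) regardless of amino acid type, so the generative model never has to handle side chains whose atom count changes during generation. A velocity network (omitted) would be trained to predict `v` from `x_t` and `t`.
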
State-of-the-Art Performance on Unconditional Atomistic Protein Generation

The authors demonstrate that La-Proteina achieves state-of-the-art results on unconditional atomistic protein generation benchmarks, outperforming existing methods in all-atom co-designability, diversity, and structural validity metrics.

6 retrieved papers (Can Refute)
Atomistic Motif Scaffolding for Indexed and Unindexed Tasks

The authors successfully apply La-Proteina to atomistic motif scaffolding tasks, including both indexed (where motif residue positions are specified) and unindexed (where positions are unknown) setups, as well as all-atom and tip-atom scaffolding variants, outperforming existing baselines.

10 retrieved papers
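
The difference between the indexed and unindexed setups mentioned above can be illustrated with a small data-structure sketch. This is only a hedged illustration of the task inputs as the summary describes them: the `MotifSpec` class, its field names, and the mask helper are hypothetical and do not reflect how La-Proteina actually encodes its conditioning.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class MotifSpec:
    """Hypothetical container for a motif scaffolding task."""

    # (M, 3) coordinates, simplified to one atom per motif residue.
    atom_coords: np.ndarray
    # Scaffold residue indices of the motif, or None if unindexed.
    residue_positions: Optional[Sequence[int]] = None

    @property
    def is_indexed(self) -> bool:
        return self.residue_positions is not None


def motif_position_mask(spec: MotifSpec, scaffold_length: int) -> np.ndarray:
    """Boolean mask over scaffold residues marking known motif positions.

    Indexed task: the given positions are set to True.
    Unindexed task: the mask stays all False, because the placement of
    the motif within the sequence is left to the model.
    """
    mask = np.zeros(scaffold_length, dtype=bool)
    if spec.is_indexed:
        mask[list(spec.residue_positions)] = True
    return mask


motif_coords = np.zeros((3, 3))  # placeholder coordinates
indexed = MotifSpec(motif_coords, residue_positions=[4, 5, 20])
unindexed = MotifSpec(motif_coords)
indexed_mask = motif_position_mask(indexed, scaffold_length=30)
unindexed_mask = motif_position_mask(unindexed, scaffold_length=30)
```

In this simplified picture, the all-atom and tip-atom variants would differ only in which atoms of each motif residue appear in `atom_coords`; the sketch keeps one atom per residue for brevity.
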

