2 results for au:Vincoff_S in:q-bio

MeMDLM: De Novo Membrane Protein Design with Masked Discrete Diffusion Protein Language Models
Shrey Goel, Vishrut Thoutam, Edgar Mariano Marroquin, Aaron Gokaslan, Arash Firouzbakht, Sophia Vincoff, Volodymyr Kuleshov, Huong T. Kratochvil, Pranam Chatterjee
Oct 23 2024 q-bio.BM arXiv:2410.16735v1

@misc{2410.16735,
  author = {Shrey Goel and Vishrut Thoutam and Edgar Mariano Marroquin and Aaron Gokaslan and Arash Firouzbakht and Sophia Vincoff and Volodymyr Kuleshov and Huong T.~Kratochvil and Pranam Chatterjee},
  title = {{M}e{MDLM}: {D}e {N}ovo {M}embrane {P}rotein {D}esign with {M}asked {D}iscrete {D}iffusion {P}rotein {L}anguage {M}odels},
  year = {2024},
  eprint = {2410.16735},
  note = {arXiv:2410.16735v1}
}
Masked Diffusion Language Models (MDLMs) have recently emerged as a strong class of generative models, paralleling state-of-the-art (SOTA) autoregressive (AR) performance across natural language modeling domains. While there have been advances in AR as well as both latent and discrete diffusion-based approaches for protein sequence design, masked diffusion language modeling with protein language models (pLMs) is unexplored. In this work, we introduce MeMDLM, an MDLM tailored for membrane protein design, harnessing the SOTA pLM ESM-2 to generate realistic membrane proteins de novo for downstream experimental applications. Our evaluations demonstrate that MeMDLM outperforms AR-based methods by generating sequences with greater transmembrane (TM) character. We further apply our design framework to scaffold soluble and TM motifs in sequences, demonstrating that MeMDLM-reconstructed sequences achieve greater biological similarity to their original counterparts compared to SOTA inpainting methods. Finally, we show that MeMDLM captures physicochemical membrane protein properties with fidelity similar to that of SOTA pLMs, paving the way for experimental applications. In total, our pipeline motivates future exploration of MDLM-based pLMs for protein design.
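As a rough illustration of the masking step that masked diffusion language models are generally built on, the sketch below corrupts a protein sequence by replacing each residue with a mask token independently with probability `t`. It is only a minimal sketch: the function name `forward_mask`, the `<mask>` token string, and the choice to use `t` directly as the masking probability (a linear schedule) are assumptions for illustration and are not taken from the abstract above. A model would then be trained to recover the original residues at the masked positions.

```python
import random

MASK = "<mask>"  # assumed mask token string, for illustration only


def forward_mask(sequence, t, rng):
    """Replace each residue with MASK independently with probability t.

    t = 0 leaves the sequence untouched; t = 1 masks every position.
    Using t itself as the masking probability is an assumed linear schedule.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [MASK if rng.random() < t else residue for residue in sequence]


# Example: a partially corrupted copy of a short, made-up sequence.
rng = random.Random(0)
noisy = forward_mask("MKTAYIAKQR", 0.5, rng)
print(noisy)
```

Sampling `t` anew for every training example exposes the model to everything from lightly corrupted to fully masked sequences, which is what lets generation start from an all-mask sequence.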
PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling
Tianlai Chen, Madeleine Dumas, Rio Watson, Sophia Vincoff, Christina Peng, Lin Zhao, Lauren Hong, Sarah Pertsemlidis, Mayumi Shaepers-Cheu, Tian Zi Wang, Divya Srijay, Connor Monticello, Pranay Vure, Rishab Pulugurta, Kseniia Kholina, Shrey Goel, Matthew P. DeLisa, Ray Truant, Hector C. Aguilar, Pranam Chatterjee
Oct 24 2023 q-bio.BM arXiv:2310.03842v3

@misc{2310.03842,
  author = {Tianlai Chen and Madeleine Dumas and Rio Watson and Sophia Vincoff and Christina Peng and Lin Zhao and Lauren Hong and Sarah Pertsemlidis and Mayumi Shaepers-Cheu and Tian Zi Wang and Divya Srijay and Connor Monticello and Pranay Vure and Rishab Pulugurta and Kseniia Kholina and Shrey Goel and Matthew P.~DeLisa and Ray Truant and Hector C.~Aguilar and Pranam Chatterjee},
  title = {{P}ep{MLM}: {T}arget {S}equence-{C}onditioned {G}eneration of {T}herapeutic {P}eptide {B}inders via {S}pan {M}asked {L}anguage {M}odeling},
  year = {2023},
  eprint = {2310.03842},
  note = {arXiv:2310.03842v3}
}
Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small molecule docking at binding pockets for targeted protein degradation. The computational design of protein-based binders presents unique opportunities to access "undruggable" targets, but has often relied on stable 3D structures or structure-influenced latent spaces for effective binder generation. In this work, we introduce PepMLM, a target sequence-conditioned generator of de novo linear peptide binders. By employing a novel span masking strategy that uniquely positions cognate peptide sequences at the C-terminus of target protein sequences, PepMLM fine-tunes the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities that match or improve upon those of validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, outperforming RFDiffusion on structured targets, we experimentally verify PepMLM's efficacy via fusion of model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of emergent viral phosphoproteins and Huntington's disease-driving proteins. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream therapeutic applications.
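The span masking idea described in this abstract can be sketched roughly as follows: the peptide is appended to the C-terminus of the target sequence, and the entire peptide span is masked so that a model must reconstruct it from the target alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_span_masked_input`, the `<mask>` token string, the per-residue tokenization, and the example sequences are all hypothetical.

```python
MASK = "<mask>"  # assumed mask token string, for illustration only


def build_span_masked_input(target, peptide):
    """Append the peptide to the target's C-terminus and mask the whole peptide span.

    Returns (masked_tokens, labels). labels is None at target positions, which
    are not predicted, and holds the true residue at every peptide position.
    """
    masked_tokens = list(target) + [MASK] * len(peptide)
    labels = [None] * len(target) + list(peptide)
    return masked_tokens, labels


# Example with made-up sequences: an 8-residue target and a 4-residue peptide.
masked_tokens, labels = build_span_masked_input("MKTAYIAK", "GSWL")
print(masked_tokens)
print(labels)
```

Masking the full span, rather than scattered positions, matches how the model would be used at generation time, when only the target sequence is known and every binder position has to be filled in.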