PEERspectives: Reviewing the Enzyme Engineering Database (EnzEngDB)

Machine learning for protein engineering needs infrastructure for standardized sequence–function data. EnzEngDB aims to provide it. In this episode of PEERspectives, Le Yuan (postdoc, NSF Molecule Maker Lab Institute) explores EnzEngDB, a new database platform linking enzyme sequences, mutations, reactions, and experimental performance data. The platform enables storage, visualization, search, and sharing of standardized sequence–function data for protein engineering and machine learning. It also includes an LLM-based pipeline that extracts enzyme engineering data from scientific literature, expanding the database and supporting data-driven enzyme design.

PUBLICATION
Long Y, Abbasinejad F, Li FZ, et al. Enzyme Engineering Database (EnzEngDB): a platform for sharing and interpreting sequence-function relationships across protein engineering campaigns. Nucleic Acids Res. 2026;54(D1):D564-D571. doi:10.1093/nar/gkaf1142

ABSTRACT
The discovery and engineering of new enzymes is important across the bioeconomy, with diverse applications from foods to pharmaceuticals, sensors to agriculture. However, enzyme engineering, in particular machine learning-guided engineering, is hampered by a lack of data. Currently there exists no database designed to capture and interpret datasets created in this domain, nor are there easy analysis and visualisation tools. We developed the Enzyme Engineering Database to provide a centralized resource and an online analysis tool to consolidate sequence-function data from enzyme engineering campaigns, thereby making three contributions: (i) a database into which researchers can deposit public data, (ii) visualisation and analysis tools for protein engineers to analyse their own data or compare enzyme variants to other engineering campaigns, and (iii) a gold-standard dataset for benchmarking automated extraction along with the first large language model extraction pipeline specific for enzyme engineering campaigns.
The Enzyme Engineering Database is accessible at http://enzengdb.org/.

KEYWORDS
EnzEngDB, Enzyme Engineering Database, enzyme engineering, protein engineering, directed evolution, machine learning biology, AI in biology, bioinformatics, enzyme database, sequence-function relationships, sequence-function mapping, protein design, enzyme design, synthetic biology, biocatalysis, computational biology, biotechnology, protein variants, enzyme optimization, protein machine learning, data-driven protein engineering, protein fitness landscape, scientific databases, literature mining, large language models, LLM biology, enzyme evolution, bioengineering, molecular biology, Nucleic Acids Research, Frances Arnold, machine learning for proteins, AI-driven enzyme engineering, protein sequence analysis, biological data sharing, scientific data management, enzyme activity prediction, protein engineering datasets, bioinformatics tools, computational protein engineering, NSF Molecule Maker Lab Institute (NSF MMLI), MMLI, AI-driven molecular discovery, programmable biomolecules