AUC 2022

US-SOMO-AF: a database of hydrodynamic, circular dichroism, and SAXS-derived parameters for the AlphaFold-predicted protein structures
Submitter: Mattia Rocco
Authors: Emre Brookes, Mattia Rocco
Corresponding Author: Mattia Rocco
Title: US-SOMO-AF: a database of hydrodynamic, circular dichroism, and SAXS-derived parameters for the AlphaFold-predicted protein structures
Contribution Type: Full Talk
Selected for Presentation: Yes
Abstract: Following recent spectacular advances in AI-based 3D structure prediction from protein sequences, the AlphaFold (AF) consortium has made available a database covering the entire proteomes of humans and other organisms (https://alphafold.ebi.ac.uk). However, apart from simple cases of highly homologous sequences or clearly recognized folding classes, the question of how to rapidly ascertain a predicted structure's reliability needs to be addressed. Shape-sensitive hydrodynamic parameters such as the translational diffusion and sedimentation coefficients (Dt(20,w), s(20,w)) and the intrinsic viscosity ([η]) can be used to assess the overall plausibility of a conformation, and SAXS yields the pair-wise distance distribution function p(r) vs. r, providing a direct structural correspondence. On this basis, we have calculated the corresponding Dt(20,w), s(20,w), Stokes radius (Rs), [η], p(r) vs. r, and other parameters for the entire AF database, which contains >1,000,000 structures, using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite (http://somo.aucsolutions.com), and placed them in the novel public-domain US-SOMO-AF database (https://somo.genapp.rocks/somoaf). Circular dichroism (CD) spectra were also computed using the SESCA program. Some of AF's drawbacks were mitigated, for example by generating a protein's mature form whenever possible (resulting in ~110,000 curated entries, a sizeable number). Others, such as the current availability in AF of single-chain structures only, or the absence of prosthetic groups, limit the present direct applicability of the AF structures. Tests were conducted on a subset of ~42,000 AF structures to verify the discriminatory capability of the calculated parameters. For Rs and [η], the calculated values were grouped in 5 kDa MW bins, and the % difference between each pair of values in a bin was calculated. More than 70% of the Rs pairs and 90% of the [η] pairs differed by more than 9%, three times the average experimental error. For SAXS, eight p(r) vs. r datasets were chosen from the SASBDB database (https://www.sasbdb.org/) and compared with those calculated for the corresponding AF entries. Both good agreement and noticeable differences were observed. These results confirm the major role that hydrodynamics and SAXS could play in rapidly assessing the reliability of AF-predicted protein structures, supporting the usefulness of the novel US-SOMO-AF database.
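
The pairwise discrimination test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the input layout of (MW in kDa, parameter value) tuples, and the definition of the % difference (absolute difference relative to the mean of the pair) are all assumptions made for illustration, and the sample numbers are invented.

```python
from collections import defaultdict
from itertools import combinations


def fraction_of_distinct_pairs(entries, bin_width_kda=5.0, threshold_pct=9.0):
    """Return the fraction of same-bin pairs whose values differ by more
    than threshold_pct.

    entries: iterable of (mw_kda, value) tuples, where value is a calculated
    parameter such as Rs or [eta]. Hypothetical layout, for illustration only.
    """
    # Group the values into molecular-weight bins of width bin_width_kda.
    bins = defaultdict(list)
    for mw_kda, value in entries:
        bins[int(mw_kda // bin_width_kda)].append(value)

    total_pairs = 0
    distinct_pairs = 0
    for values in bins.values():
        # Compare every unordered pair of values inside the same bin.
        for a, b in combinations(values, 2):
            # ASSUMPTION: % difference is taken relative to the pair mean.
            pct_diff = 100.0 * abs(a - b) / ((a + b) / 2.0)
            total_pairs += 1
            if pct_diff > threshold_pct:
                distinct_pairs += 1

    # No pairs (empty input or one entry per bin) gives 0.0 by convention.
    return distinct_pairs / total_pairs if total_pairs else 0.0


# Invented sample data: (MW in kDa, Rs in nm).
sample = [(12.0, 2.0), (13.0, 2.1), (14.0, 2.5), (22.0, 3.0), (23.0, 3.05)]
print(fraction_of_distinct_pairs(sample))  # prints 0.5
```

The 9% default mirrors the threshold quoted in the abstract (three times the average experimental error); a different definition of the % difference would shift the resulting fractions.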

Rocco and Brookes. A database of calculated solution parameters for the AlphaFold predicted protein structures. Sci. Rep., in press, 2022. https://doi.org/10.1038/s41598-022-10607-z