Description
MaveDB obtains gnomAD minor allele frequencies for variants assayed at the DNA level, but not for protein variants. Its data model allows each mapped variant to be linked to several gnomAD variants.
Should we provide minor allele frequencies for protein variants? This would involve reverse translation, for which we could use our existing variant_translations table, since gnomAD linkage is already limited to variants with ClinGen IDs. In the UI and perhaps in API responses, we would presumably give the sum of the allele counts and frequencies across the linked gnomAD variants, and we would omit any ancestry-specific information, except perhaps when all of the linked variants agree on faf95_max_ancestry.
I have needed to do this to annotate a protein variant set, and I have code ready in case we want to integrate it into MaveDB. But it seems to warrant some discussion about (a) whether it's appropriate and (b) how to make our approach clear to users.
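To make the proposal concrete for discussion, here is a minimal sketch of the aggregation step. It is illustrative only and is not the code mentioned above. The `GnomadRecord` and `ProteinVariantFrequency` classes, their field names, and the function name are hypothetical and do not reflect MaveDB's actual data model. It assumes the reverse translation has already produced the list of gnomAD records linked to one protein variant.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GnomadRecord:
    """Hypothetical stand-in for one gnomAD variant linked to a DNA-level variant."""
    allele_count: int
    allele_frequency: float
    faf95_max_ancestry: Optional[str]


@dataclass(frozen=True)
class ProteinVariantFrequency:
    """Hypothetical aggregate reported for a protein variant."""
    allele_count: int
    allele_frequency: float
    faf95_max_ancestry: Optional[str]


def aggregate_protein_variant_frequency(
    records: Sequence[GnomadRecord],
) -> Optional[ProteinVariantFrequency]:
    """Sum counts and frequencies over the gnomAD records for one protein variant.

    Assumption: each record is a distinct DNA-level variant that encodes the
    same protein change, so their counts and frequencies can simply be added.
    """
    if not records:
        return None

    total_count = sum(r.allele_count for r in records)
    total_frequency = sum(r.allele_frequency for r in records)

    # Keep the ancestry label only when every record agrees on it;
    # otherwise omit ancestry-specific information entirely.
    ancestries = {r.faf95_max_ancestry for r in records}
    ancestry = next(iter(ancestries)) if len(ancestries) == 1 else None

    return ProteinVariantFrequency(total_count, total_frequency, ancestry)
```

One open question this sketch leaves aside is whether a plain sum of frequencies is appropriate when the linked gnomAD variants have different allele numbers; that may be part of what needs explaining to users.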