Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing them, were added using information from OMIM, PubMed, Wikipedia, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but our effort to make the data interoperable and linked now makes it possible to analyse them. Our analysis revealed the timeline of rare disease and causative gene discovery and links it to developments in methods.

Descriptions of unusual diseases date back to antiquity, but rare genetic diseases are a relatively new chapter in the history of medicine, because genes were only discovered to be the carriers of hereditary diseases in the middle of the last century. Identification of the disease-causing mutation among the plethora of genetic variation an individual human carries is a difficult task in the diagnosis of rare diseases. A typical human individual has about 4.1–5 million variants compared to the reference genome [1]. To identify the disease-causing variant, experts cross-check with variant databases and use variant pathogenicity prediction algorithms. Several bioinformatics workflows are available to go from the raw data to the detection of the causative mutation. Within this process, mapping of genetic data, identifiers and information is required in several ways.

Many genotype-phenotype databases link information about rare diseases, their causative genes, and gene variants [3]. Some well-known databases like OMIM, Orphanet, and DisGeNET [4] include provenance, e.g. in the form of manually curated or text-mining-derived literature lists. DisGeNET provides an extensive collection of linked data including a semantic model. Orphanet does as well but focuses more on patient-care-related information. OMIM is the online version of the genetic (Mendelian) disease encyclopaedia; it provides information in the form of a literature list for a disease or a gene and provides gene-disease mapping spreadsheets. The literature lists that support gene-disease associations are useful as accumulated provenance for an association. However, none of these databases provides a list or dataset that identifies the single publication that first described the link between the gene and the disease. This information is very useful for historic perspectives on genetic and rare disease research.

We produced a mapping dataset that links rare, monogenic diseases to their causative genes (and vice versa), backed up by the publication that first demonstrated the genetic cause of a disease, i.e. its historic provenance. The data are annotated with OMIM identifiers for the disease, gene identifiers (HGNC [6] and Ensembl [7]) for the gene, and PubMed identifiers (PMIDs) for the literature. To describe the content of this spreadsheet and provide it in a structured way, we developed a dedicated version of the DisGeNET semantic model to produce a Resource Description Framework (RDF) file and a set of nanopublications. Nanopublications are short information sequences built from identifiers and ontologies with embedded provenance, which especially allow data mining and automated read-out [8]. They are usually accompanied by a metadata header and, if correctly formed and made available, are automatically FAIR (Findable, Accessible, Interoperable and Re-usable [9]). There are not many tools yet that can process and use nanopublications, but owing to their structure they can be implemented in graph-based databases (e.g. Wikidata, as was done in this study) and used in graph-based applications.
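To make the shape of such a record concrete, here is a minimal Python sketch of one disease-gene link with its two provenance publications, serialized as N-Triples-style statements. Everything named in it is an assumption for illustration: the `http://example.org/rd/` namespace, the predicate names, the `GeneDiseaseLink` class and the placeholder identifier values are hypothetical and are not taken from the dataset or from the DisGeNET-derived semantic model.

```python
from dataclasses import dataclass

# Hypothetical namespace, used only for this illustration; the dataset
# itself uses a semantic model modified from DisGeNET.
EX = "http://example.org/rd/"


@dataclass(frozen=True)
class GeneDiseaseLink:
    """One disease-gene association with its two provenance publications."""
    omim_id: str       # OMIM identifier of the disease
    hgnc_symbol: str   # HGNC symbol of the causative gene
    ensembl_id: str    # Ensembl identifier of the causative gene
    disease_pmid: str  # PMID of the first description of the disease
    gene_pmid: str     # PMID of the first report of the causative gene


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a quoted RDF literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(link: GeneDiseaseLink) -> list[str]:
    """Serialize one association as five N-Triples-style lines."""
    assoc = f"<{EX}association/{link.omim_id}-{link.ensembl_id}>"
    triples = [
        (assoc, f"<{EX}disease>", f"<{EX}omim/{link.omim_id}>"),
        (assoc, f"<{EX}gene>", f"<{EX}ensembl/{link.ensembl_id}>"),
        (assoc, f"<{EX}geneSymbol>", f'"{escape_literal(link.hgnc_symbol)}"'),
        (assoc, f"<{EX}firstDiseaseDescription>", f"<{EX}pubmed/{link.disease_pmid}>"),
        (assoc, f"<{EX}firstGeneDiseaseReport>", f"<{EX}pubmed/{link.gene_pmid}>"),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Placeholder values only; these are not real identifiers.
    example = GeneDiseaseLink("000000", "GENE1", "ENSG00000000000", "1", "2")
    for line in to_ntriples(example):
        print(line)
```

Treating the association itself as a node, rather than linking disease and gene directly, leaves room to attach both publications to the same statement, which is the kind of embedded provenance the text describes.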