Scientists, including Biological Sciences alumnus Thijs van den Burg, have published a standard method for checking the quality of data in the global genetic database, GenBank. Using amphibians as a case study, they reviewed more than 39,000 records, of which over 2,300 were found to be incorrect. The article is published in Scientific Data.
Scientists in many biological disciplines use genetic information in their research. When they publish results in scientific journals, they are asked to make new genetic information freely accessible via a genetic database. However, the files they submit are rarely updated afterwards.
Within taxonomy, biologists describe new species and document global species richness. The field keeps moving, and new data can shift the boundaries between species. For example, one species can be split into several, several species can be merged into one, and a species can be moved to another genus.
Thijs van den Burg is the first author of the research article, which is based on work he conducted as part of his MSc Biological Sciences at the UvA Institute for Biodiversity and Ecosystem Dynamics: ‘As the uploaded files on GenBank are barely quality-checked and are rarely updated, the database contains errors, and until recently species could be found under multiple names due to taxonomic changes. In a recent update, GenBank now merges taxonomic synonyms, preventing unwary data users from artificially biasing estimates of species richness. However, there remains the issue of identifying and removing erroneous records that can lead to biased research conclusions.’
Using amphibians and a single gene (cytochrome b) as a case study, the researchers found that 13% of the more than 39,000 available records were taxonomically outdated. After updating these names, they screened the records for potential errors and found more than 2,300 records (6%) to be incorrect, mainly because the animals had been misidentified. The researchers have made their results and automated methodology (written in the R programming language) freely accessible, so that other researchers can use it to examine data from their own study species.
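To illustrate why outdated names matter, here is a minimal sketch in Python. It is not the authors' R pipeline. The accession codes, the record list and the one-entry synonym table are made up for illustration, and the helper names (`update_names`, `outdated_share`, `richness`) are hypothetical. It shows how mapping outdated names to their accepted names changes a simple species count.

```python
# Toy records: (accession, species name as uploaded). Accessions are invented.
records = [
    ("X0001", "Bufo marinus"),
    ("X0002", "Rhinella marina"),
    ("X0003", "Rana temporaria"),
    ("X0004", "Bufo marinus"),
]

# Illustrative synonym table: outdated name -> currently accepted name.
synonyms = {"Bufo marinus": "Rhinella marina"}


def update_names(records, synonyms):
    """Replace each outdated name with its accepted name; keep others as-is."""
    return [(acc, synonyms.get(name, name)) for acc, name in records]


def outdated_share(records, synonyms):
    """Fraction of records filed under a name listed as outdated."""
    outdated = sum(1 for _, name in records if name in synonyms)
    return outdated / len(records)


def richness(records):
    """Number of distinct species names among the records."""
    return len({name for _, name in records})


updated = update_names(records, synonyms)
print(richness(records))                  # 3 names before the update
print(richness(updated))                  # 2 names after the update
print(outdated_share(records, synonyms))  # 0.5
```

In this toy set, the same species appears under two names, so counting names before the update overstates species richness by one.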
Identifying incorrect records is important to better understand the global species richness of amphibians and to identify unknown species. ‘This is especially important because amphibians are the most endangered group of vertebrates, with many species close to extinction due to the spread of deadly fungal diseases, climate change and habitat destruction,’ emphasizes van den Burg.
Matthijs P. van den Burg, Salvador Herrando-Pérez & David R. Vieites: ‘ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records,’ in Scientific Data (2020). DOI: https://doi.org/10.1038/s41597-020-00598-9