Genome sequencing has enormous potential for personalized medicine, mitigating the next pandemic, and more. But it can’t live up to that potential if researchers don’t have a standardized way to communicate and share genome data analysis methods.
Today, there are dozens of platforms, scripts, and tools for analyzing genome data, used by tens of thousands of researchers worldwide—an abundance that reflects the burgeoning nature of the genomics field. But it also creates barriers to exchanging all of the key information that other researchers, as well as regulators such as the FDA, need to understand the results and replicate the tests.
This lack of “information interoperability” undermines their ability to quickly and effectively respond to emerging pandemics such as COVID-19. Even during normal times, it creates unnecessary delays and expenses every step of the way, from drug discovery to treatment delivery.
In a 2016 Nature article, a majority of the researchers surveyed said that experiments published in the scientific literature are often not reproducible. Part of the problem could be that published work does not include enough information about how the analysis was done.
Collecting and Communicating Key Information
To overcome these challenges, the IEEE created the P2791 BioCompute Working Group (BCOWG), which led to the May 2020 publication of “IEEE 2791™-2020 – IEEE Standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication.” The standard provides a framework for accurately and securely communicating bioinformatics protocols to facilitate bioinformatic data analysis exchanges between regulatory agencies, pharmaceutical companies, bioinformatics platform providers, and researchers. IEEE 2791-2020 also defines the assurance program for evaluating and certifying products against those requirements.
Wi-Fi might be a helpful analogy for understanding how IEEE 2791-2020 works and its benefits. Based on the IEEE 802.11™ standard, Wi-Fi creates a framework that enables a wide variety of devices from different vendors to communicate with one another. This standardized interoperability frees users to focus on what they’re communicating instead of the nuts and bolts of the communication process. Users also don’t have to worry about whether their favorite software, such as FaceTime or Skype, will work over Wi-Fi.
IEEE 2791-2020 provides clinicians, researchers, and others with the same freedom and flexibility to focus on their work. Just as Wi-Fi users keep their favorite apps, they can continue to use their preferred bioinformatics platforms, because IEEE 2791-2020 is independent of operating platforms and programming languages.
The standard also supports any security/privacy platforms and best practices that an organization has. So clinical trial data can use the protocols and policies necessary to maintain HIPAA compliance, while government-funded data sets can be completely open access to facilitate sharing between multiple organizations.
Less Clarification Enables More Effective Collaboration
Under the standard, an analysis is recorded as a BioCompute Object (BCO). Regulators and researchers use the BCO to get all of the information they need to repeat an entire bioinformatics pipeline, from raw input data to results, along with metadata that identifies provenance and usage, and they can ask very targeted questions along the way. As a result, IEEE 2791-2020 eliminates the traditional gauntlet of clarification questions and answers that results directly from the ad hoc recordkeeping common among researchers.
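To make that concrete, here is a minimal sketch in Python of the kind of record a BioCompute Object (BCO) bundles together. The domain names follow the BioCompute convention as commonly described, but treat the list as an assumption and consult the standard for the required fields. The tool name, file URIs, parameter, and the helper `missing_domains` are hypothetical and exist only for illustration.

```python
import json

# Domains we assume a complete record should carry. This is an illustrative
# list, not a substitute for the schema published with the standard.
EXPECTED_DOMAINS = (
    "provenance_domain",   # who created the record, and which version it is
    "usability_domain",    # what the analysis is for
    "description_domain",  # the pipeline steps
    "execution_domain",    # how to run the pipeline
    "parametric_domain",   # parameter settings used at each step
    "io_domain",           # input and output files
)


def missing_domains(record):
    """Return the expected domains that are absent from a BCO-like dict."""
    return [name for name in EXPECTED_DOMAINS if name not in record]


# Hypothetical example content: none of these values come from a real BCO.
example_bco = {
    "provenance_domain": {"name": "Example variant-calling run", "version": "1.0"},
    "usability_domain": ["Identify mutations relative to a reference genome."],
    "description_domain": {
        "pipeline_steps": [{"step_number": 1, "name": "hypothetical-aligner"}]
    },
    "execution_domain": {"script": ["run_pipeline.sh"]},
    "parametric_domain": [{"step": "1", "param": "min_quality", "value": "20"}],
    "io_domain": {
        "input_subdomain": [{"uri": "file:///data/sample_reads.fastq"}],
        "output_subdomain": [{"uri": "file:///data/variants.vcf"}],
    },
}

if __name__ == "__main__":
    print(json.dumps(example_bco, indent=2))
    print("missing domains:", missing_domains(example_bco))
```

A check like `missing_domains` is only a stand-in for real schema validation, but it shows the point of the format: a reviewer can see at a glance whether the inputs, parameters, and provenance needed to repeat the pipeline were actually recorded.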
For example, if you’re an academic researcher, think about all the times you’ve started to write a paper and realized that you didn’t write down every detail that your peers will need to analyze and replicate or reproduce your work. IEEE 2791-2020 eliminates those omissions and frustrations by providing a framework to automatically capture all of those details. That streamlines the process of getting new drugs out of the lab, through regulators, and on to patients.
In a recent email thread, researchers were discussing which reference SARS-CoV-2 genome to use for mapping mutations. Different reference genomes can give different results. Similarly, different software parameter settings will produce a different list of mutations even with the same reference genome. Therefore, it is critical that all results are tied to details about the analysis so that they can be compared. Using IEEE 2791-2020 can eliminate guesswork about how the results were generated, thereby allowing scientists to rapidly compare and contrast their findings.
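As a rough illustration of why recorded analysis details matter, the Python sketch below compares the settings behind two hypothetical mutation lists. The flat dictionaries, the setting names, and the function `analysis_differences` are assumptions made for this example. They are not the structure defined by IEEE 2791-2020, which records this information in a richer form.

```python
def analysis_differences(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical settings recorded by two labs analyzing the same samples.
lab_a = {
    "reference_genome": "reference-A",
    "aligner": "hypothetical-aligner 1.2",
    "min_quality": 20,
}
lab_b = {
    "reference_genome": "reference-B",
    "aligner": "hypothetical-aligner 1.2",
    "min_quality": 30,
}

diffs = analysis_differences(lab_a, lab_b)
for setting, (value_a, value_b) in diffs.items():
    print(f"{setting}: lab A used {value_a!r}, lab B used {value_b!r}")
```

With both analyses recorded, the comparison shows immediately that the two labs used different reference genomes and quality thresholds, so a mismatch between their mutation lists can be traced to the method rather than treated as a biological finding.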
It is very important that scientists at the forefront of fighting this virus at least take a look at the IEEE standard. Doing so is an opportunity to understand what needs to be captured during a bioinformatic analysis to make it useful to others.
How to Get Started
When the next pandemic emerges, IEEE 2791-2020 will be critical for enabling thousands of researchers worldwide to collaborate quickly and effectively. By recording and sharing their analyses as BCOs, they can streamline their work during normal times while gaining valuable hands-on experience with bioinformatics that will help them combat future pandemics.
The easiest way to get started is by visiting www.biocomputeobject.org, which includes publicly available resources developed by the BioCompute community. For example, the BCO Editor provides a form-based tool for generating BCOs, along with a database of created BCOs. The site also has software packages for validating and exporting BCOs, and links to BCO-capable platforms, such as the Cancer Genomics Cloud.