The purpose of this Notice of Special Interest (NOSI) is to inform the scientific community of the interest of NIGMS, NLM, and ODSS in supporting efficiency optimization and cost reduction for Sequence Read Archive (SRA) data storage and utilization.
The SRA, hosted by the National Center for Biotechnology Information (NCBI) at NLM, contains a broad collection of raw DNA and RNA sequence data and alignment information that continues to grow exponentially. The SRA supports several scientific use cases, such as replication of a published study, data analysis to identify genetic variation, metagenomic profiling, expression analysis, and pathogen identification. Committed to storing, preserving, and making the SRA available to the community, NIH recognizes the need to reduce the cost of SRA data storage and to identify solutions for efficient SRA data storage, retrieval, and analysis. This NOSI encourages grant applications focused on efficiency optimization and cost reduction for SRA data storage and utilization. NIH is particularly interested in identifying efficient solutions for SRA data compression and representation that still meet the needs of a range of use cases. Topics of interest include, but are not limited to:
- Assessment of current SRA data formats, proposed future refinements, and utilization costs for scientific use cases
- Development of novel SRA data compression formats, methods, and data representation strategies, and assessment of their utility for different use cases
- Development of tools that use current SRA compressed data formats and enable their use by, or integration with, existing bioinformatics tools
- Development of compressed and/or lower-cost SRA data formats to meet the needs of a range of use cases