Data Repositories

How to select a data repository for your research needs.

'Generalist' and 'Domain-specific' Repositories

'Generalist' and 'domain-specific' are two terms that are frequently used to categorize repositories.

Domain-specific repositories store data from a specific subject or field. They may accept a limited number of data types or file formats, use specialized metadata and vocabulary, or otherwise restrict the data that can be submitted and accessed. Examples of domain-specific repositories include the Data and Specimen Hub (DASH), Gene Expression Omnibus (GEO), and Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC).

Generalist repositories will store data regardless of subject matter. Examples of generalist repositories include Figshare, Dryad, and Zenodo.

Choosing a Repository

The NIH has provided general guidance on selecting a repository, whether for storing data or for finding data to reuse. The full recommendations appear in the NIH supplemental information cited at the bottom of this page; they are summarized below. Consider repositories with the following features:

  • Assigns unique persistent identifiers
  • Long-term sustainability
  • Curation and quality assurance services
  • Free and easy access
  • Allows broad and measured reuse
  • Provides clear use guidance
  • Security and integrity
  • Maintains confidentiality
  • Supports common file formats
  • Records data provenance (e.g., tracks data versions)
  • Documented retention policies

The NYU HSL Data Repository Finder was developed to help NYU Langone researchers identify suitable repositories to fulfill data sharing requirements. You may be prompted to log in with your Kerberos ID if you are working off-site or not connected to the institutional network. A public-facing option is the Network of the National Library of Medicine Data Repository Finder.

Human Subjects Research

Additional repository features to consider for data from human subjects include:

  • Fidelity to consent
    • Confirm that access to and use of the data are documented and consistent with participants' consent.
  • Restricted use compliance
    • Look for documentation on how the repository applies and enforces data use restrictions.
  • Privacy
    • Check the repository's privacy policies and practices for protecting human data from unauthorized access.
  • Plan for breach
    • Review documented security measures and the response plan for detected data breaches.
  • Download control
  • Procedures for violations
  • Request review
    • Look for an established and transparent process for reviewing data access requests.

Adapted from "Supplemental Information to the NIH Policy for Data Management and Sharing: Selecting a Repository for Data Resulting from NIH-Supported Research".