POSEIDON: Usage In Grants

Grant Boilerplate:

Precision Oncology Software Environment Interoperable Data Ontologies Network (POSEIDON), powered by the DNAnexus Platform, is a cloud-based (AWS) data management and analytics platform that enables integrated analysis of oncology phenotypes with multi-omics data using standardized analytical pipelines. POSEIDON also gives researchers the flexibility to explore new methods while preserving data provenance in an auditable platform within a HIPAA-compliant, ISO 27001-certified, FedRAMP Moderate security environment. POSEIDON is accessible to City of Hope (COH) researchers, clinicians, data scientists, and bioinformaticists, who use it to transform data into information and knowledge on a scalable storage and compute platform that supports democratizing heterogeneous data across COH while facilitating internal and external collaborations. POSEIDON has been designed, through a strong partnership between COH and DNAnexus, to be the data hub that allows COH to manage its molecular and associated phenotypic data, creating the data foundation to accelerate translational research and the discovery of novel precision-oncology therapies. Toward these goals, POSEIDON serves as the central data hub for multi-institutional funded projects led by COH researchers, promoting simplified data sharing and more meaningful collaborations. POSEIDON addresses technological hurdles such as controlled access to institutional resources, scalable storage and compute resources, and data management between sites by providing all collaborators with secure access to a controlled, centralized workspace that minimizes the risks to internal institutional resources. Additionally, POSEIDON tracks the provenance of data processing, providing documented pipelines using JupyterLab notebooks, workflow languages including WDL and CWL, and applets and apps that can be seamlessly shared with internal and external researchers.
POSEIDON is licensed by COH from DNAnexus, and no fees are charged for access to the platform. Costs to researchers or clinicians for using POSEIDON are limited to the costs of data storage and compute on the platform.

Project Cost Sizing Template:

Rates:

  • Storage per GByte-month
    • Standard $0.0242
    • Archival $0.005
      • Dearchival $0.013
  • Compute per Core-Hour (subset for reference)
    • mem1_hdd2 $0.069
    • mem1_ssd1 $0.056
    • mem2_hdd2 $0.095
    • mem2_ssd1 $0.071
    • mem3_hdd2 $0.130
    • mem3_ssd1 $0.090
    • mem4_ssd1 $0.312
    • … (please contact ri_help@coh.org for additional instance types)
  • Egress per GB: $0.13 (ingress is free)

Project Sample Cost

  • 5-year, 100-patient whole-exome study.
  • Assumptions:
    • 100 GB per sample: paired FASTQ 50 GB, BAM/BAI 40 GB, and VCF/metrics/misc 10 GB
    • 2 rounds of FASTQ-to-VCF processing ($10 per sample per run)
    • 2 rounds of VCF annotation ($2.50 per sample per run)
    • 10,000 hours of JupyterLab notebook time on a mem4_ssd1 instance ($0.312 per hour)
    • 2.2 egress charges for all data (1x grant-required egress, 1x long-term storage egress, 0.2x buffer for graphs, figures, and notebooks)

Formula:

Storage Cost

  • # of Samples x size GB x cost per GB per month x # of months
  • 100 x 100 GB x $0.0242 per GB-month x 60 months = $14,520

Compute Cost

  • FASTQ to VCF: # of samples x cost per sample x rounds
  • FASTQ to VCF: 100 x $10 per sample x 2 rounds = $2,000
  • VCF annotation: # of samples x cost per sample x rounds
  • VCF annotation: 100 x $2.50 per sample x 2 rounds = $500
  • JupyterLab notebook: hours x cost of instance per hour
  • JupyterLab notebook: 10,000 x $0.312 = $3,120

Egress Cost

  • # of Samples x size GB x egress cost per GB x 2.2 total egress
  • 100 x 100 GB x $0.13 per GB x 2.2 egress = $2,860

Total Cost:

  • Storage + Compute + Egress
  • $14,520 + $5,620 + $2,860 = $23,000
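
The sample calculation above can be reproduced with a short script, which may be convenient when adjusting sample counts or study duration for a specific grant. This is a minimal sketch, not part of POSEIDON or DNAnexus tooling: the function name study_cost and its parameter names are hypothetical, and the rates are simply copied from the Rates list above. Decimal is used so the dollar amounts do not pick up floating-point rounding noise.

```python
from decimal import Decimal

# Rates in USD, copied from the Rates list above.
STORAGE_PER_GB_MONTH = Decimal("0.0242")  # standard storage
EGRESS_PER_GB = Decimal("0.13")
NOTEBOOK_PER_HOUR = Decimal("0.312")      # mem4_ssd1, as used in the sample


def study_cost(samples, gb_per_sample, months, rounds,
               fastq_to_vcf_per_sample, annotation_per_sample,
               notebook_hours, egress_factor):
    """Return storage, compute, egress, and total cost in USD (hypothetical helper)."""
    # Storage: samples x GB per sample x rate per GB-month x months
    storage = samples * gb_per_sample * STORAGE_PER_GB_MONTH * months
    # Compute: per-sample pipeline costs for every round, plus notebook hours
    pipelines = samples * rounds * (fastq_to_vcf_per_sample + annotation_per_sample)
    notebooks = notebook_hours * NOTEBOOK_PER_HOUR
    compute = pipelines + notebooks
    # Egress: all data, multiplied by the number of full egress passes
    egress = samples * gb_per_sample * EGRESS_PER_GB * egress_factor
    return {
        "storage": storage,
        "compute": compute,
        "egress": egress,
        "total": storage + compute + egress,
    }


# Assumptions from the 5-year, 100-patient whole-exome example.
estimate = study_cost(
    samples=100, gb_per_sample=100, months=60, rounds=2,
    fastq_to_vcf_per_sample=Decimal("10"),
    annotation_per_sample=Decimal("2.5"),
    notebook_hours=10_000, egress_factor=Decimal("2.2"),
)
for item, usd in estimate.items():
    print(f"{item}: ${usd:,.2f}")
```

With the assumptions listed above, the script yields the same storage, compute, egress, and total figures as the worked example. Changing a single argument (for example, months or samples) gives a quick ballpark for a differently sized study.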

 

Project Sample for Data Repository

Assumptions

  • 5-year data repository of genomic raw data
  • 100 patients
  • 100 GB per sample: paired FASTQ 50 GB, BAM/BAI 40 GB, and VCF/metrics/misc 10 GB
  • Egress of small final data (VCF/CSV/TXT/notebooks; average size of 250 MB per patient) will be billed to the grant. This data is estimated to be downloaded 10,000 times.
  • Egress of large data files (BAM/FASTQ) will be billed to the person or organization that wants to use them outside the platform
  • No compute will be done on the data in the repository

Formula:

Storage Cost

  • # of Samples x size GB x cost per GB per month x # of months
    • 100 x 100 GB x $0.0242 per GB-month x 60 months = $14,520

Compute Cost

  • n/a

Egress Cost

  • # of Samples x size GB x egress cost per GB x 10,000 total egress (VCF/CSV/TXT/notebooks)
    • 100 x 0.25 GB x $0.13 per GB x 10,000 egress = $32,500
  • # of Samples x size GB x egress cost per GB x 2 total egress (2 egresses to migrate all data)
    • 100 x 100 GB x $0.13 per GB x 2 egress = $2,600

Total Cost:

  • Storage + Compute + Egress
  • $14,520 + $0 + $35,100 = $49,620
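
The repository estimate above follows the same pattern, with no compute and two separate egress terms. The sketch below reproduces it under the stated assumptions; the function name repository_cost and its parameters are hypothetical and are not part of POSEIDON or DNAnexus tooling, and the rates are copied from the Rates list.

```python
from decimal import Decimal

# Rates in USD, copied from the Rates list above.
STORAGE_PER_GB_MONTH = Decimal("0.0242")  # standard storage
EGRESS_PER_GB = Decimal("0.13")


def repository_cost(samples, gb_per_sample, months,
                    small_gb_per_sample, downloads, migrations):
    """Return storage, egress, and total cost in USD (hypothetical helper)."""
    # Storage: samples x GB per sample x rate per GB-month x months
    storage = samples * gb_per_sample * STORAGE_PER_GB_MONTH * months
    # Egress of small final data, downloaded many times and billed to the grant
    small_egress = samples * small_gb_per_sample * EGRESS_PER_GB * downloads
    # Egress of the full data set for each migration
    bulk_egress = samples * gb_per_sample * EGRESS_PER_GB * migrations
    egress = small_egress + bulk_egress
    return {"storage": storage, "egress": egress, "total": storage + egress}


# Assumptions from the 5-year, 100-patient repository example.
repo = repository_cost(
    samples=100, gb_per_sample=100, months=60,
    small_gb_per_sample=Decimal("0.25"),  # 250 MB per patient
    downloads=10_000, migrations=2,
)
for item, usd in repo.items():
    print(f"{item}: ${usd:,.2f}")
```

In this example the download count dominates the egress cost, so it is the assumption most worth revisiting when sizing a repository budget.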

 

FTE Cost for above projects: Initial setup and configuration of the repositories will be done as part of general support from Research Informatics. However, if informatics personnel will be required to help support or maintain these repositories for their duration, we request a budget of $20k per year to offset staff cost. Additional resources, such as pipeline development or application development, would be incurred at an additional expense; please contact us for an estimated staff cost.

 

Initial Setup: A PO is required for the total amount of the resources to be consumed; this amount is prepaid to a POSEIDON project dedicated to the grant. Monthly or quarterly burn-down usage rates can be provided.

 

These are all ballpark projections. Please reach out to Research Informatics (InformaticsHelp@coh.org) for an analysis of the project scope to accurately determine the funding required.