Welcome to FAIRlyz! You may register your study without data if you are starting a new study that has not generated data or has not yet identified secondary data for analysis.
If you have data for curation and quality control (QC), use the guide below to make sure you select FAIRlyz-supported data.
For a quick and easy start, we recommend beginning with the sample dataset available in our FAQs under “What kind of files and scientific data are supported?” and then moving on to similar datasets of your own.
You may choose data that you have already published; the data does not need to be private.
Clinical or Phenotype Study Data
FAIRlyz users may provide a phenotype dataset in dbGaP format. You can review the dbGaP format by looking at public data dictionaries in dbGaP.
- Required Fields: Include essential fields like variable names, descriptions, units, and encoded values. Use these column names: VARNAME, VARDESC, UNITS, VALUES.
- Optional Fields: TYPE, MAX, MIN
- Encoded Values: For TYPE=encoded value, provide a comma-separated list of codes in the VALUES column, with each code followed by an equals sign and its description. Every entry follows the VALUE=MEANING format (e.g., 0=Yes, 1=No).
Your files should be in a tabular format such as CSV or Excel. We also support dbGaP-style XML data dictionaries; other XML formats are not currently supported. Download sample data for review.
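Before uploading, it can help to check a CSV data dictionary against the rules above. The following is a minimal sketch, not a FAIRlyz tool: the sample rows, the function names parse_encoded_values and check_dictionary, and the error messages are all illustrative assumptions. It only checks that the required columns are present and that each VALUES entry follows the VALUE=MEANING format.

```python
import csv
import io

# Required column names as listed in this guide.
REQUIRED_COLUMNS = ["VARNAME", "VARDESC", "UNITS", "VALUES"]

# Hypothetical two-variable data dictionary, for illustration only.
SAMPLE_CSV = """VARNAME,VARDESC,UNITS,TYPE,VALUES
AGE,Age at enrollment,years,integer,
SMOKER,Current smoker,,encoded value,"0=Yes, 1=No"
"""


def parse_encoded_values(cell):
    """Split a VALUES cell such as '0=Yes, 1=No' into a code-to-meaning dict."""
    mapping = {}
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and trailing commas
        if "=" not in item:
            raise ValueError(f"entry is not in VALUE=MEANING format: {item!r}")
        code, meaning = item.split("=", 1)
        mapping[code.strip()] = meaning.strip()
    return mapping


def check_dictionary(text):
    """Return {VARNAME: parsed VALUES} or raise ValueError on a format problem."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return {row["VARNAME"]: parse_encoded_values(row["VALUES"] or "")
            for row in reader}


if __name__ == "__main__":
    for varname, codes in check_dictionary(SAMPLE_CSV).items():
        print(varname, codes)
```

Note that this sketch splits VALUES on every comma, so a description that itself contains a comma would be rejected; adjust the parsing if your code descriptions need commas.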
To review the dbGaP format, follow these steps:
- Go to the search page: https://www.ncbi.nlm.nih.gov/gap/advanced_search/
- Open a study, for example the first one listed, and click its “Study Page FTP” link: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003786/phs003786.v1.p1/
- Select “pheno_variable_summaries/” to review phenotype data
- Click on any XML data dictionary. Review the format: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003786/phs003786.v1.p1/pheno_variable_summaries/phs003786.v1.pht015182.v1.NHSII_Mind_Body_Subject_Phenotypes.data_dict.xml
- There are many other studies to choose from. For example, this study has more data: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs003071/phs003071.v1.p1/pheno_variable_summaries/phs003071.v1.pht012703.v1.POISED_Subject_Phenotypes.data_dict.xml
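To see how an XML data dictionary relates to the tabular column names listed earlier, the sketch below reads a small dictionary with Python's standard xml.etree.ElementTree module. The embedded sample is a hand-written placeholder, not a real dbGaP file, and the element names it assumes (variable, name, description, type, unit, value with a code attribute) should be checked against the file you actually download. The function name xml_to_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written placeholder in an assumed dbGaP-like layout; IDs are not real.
SAMPLE_XML = """<data_table id="pht000000.v1" study_id="phs000000.v1">
  <variable id="phv00000001.v1">
    <name>SMOKER</name>
    <description>Current smoker</description>
    <type>encoded value</type>
    <value code="0">Yes</value>
    <value code="1">No</value>
  </variable>
  <variable id="phv00000002.v1">
    <name>AGE</name>
    <description>Age at enrollment</description>
    <type>integer</type>
    <unit>years</unit>
  </variable>
</data_table>"""


def xml_to_rows(xml_text):
    """Convert each <variable> element into a dict keyed by tabular column names."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.iter("variable"):
        # Keep only <value> elements that carry a code, and join them
        # in the VALUE=MEANING format used by the VALUES column.
        codes = [
            f"{v.get('code')}={(v.text or '').strip()}"
            for v in var.findall("value")
            if v.get("code") is not None
        ]
        rows.append({
            "VARNAME": var.findtext("name", default="").strip(),
            "VARDESC": var.findtext("description", default="").strip(),
            "UNITS": var.findtext("unit", default="").strip(),
            "TYPE": var.findtext("type", default="").strip(),
            "VALUES": ", ".join(codes),
        })
    return rows


if __name__ == "__main__":
    for row in xml_to_rows(SAMPLE_XML):
        print(row)
```

To try this on a downloaded dictionary, read the file's contents into a string and pass it to xml_to_rows; fields the file does not provide come back as empty strings.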
Omics Data
FAIRlyz’s quality control (QC) capabilities are currently designed for omics datasets originating from DNA or RNA sequencing. This encompasses common types such as genomics, metagenomics, transcriptomics, and epigenomics. Proteomics, metabolomics, and omics datasets from non-DNA/RNA sequencing methods are not yet included in the QC workflow.
Omics data is often accompanied by the clinical or phenotype data described in the previous section, so a study may have both clinical/phenotype data and omics data.
We require a MultiQC.html report for sequence data. Raw omics data is not processed by QC. See this example MultiQC report. Other example reports can be downloaded and used for testing: https://seqera.io/multiqc/#reports.