The QC Tool add-on “Data Dictionary Designer” will generate a dbGaP-formatted data dictionary template that you can edit.

FAIRlyz Data Dictionary Designer

  • Focus: Generate or edit a data dictionary to comply with the dbGaP and dbGaPCheckup formats
  • Adds and supports these fields: VARNAME, VARDESC, UNITS, TYPE, VALUES
  • Encoded Values: For TYPE=encoded value, the VALUES column expects a list of codes, each followed by an equal sign and the description of the code. Every entry follows the VALUE=MEANING format (e.g., 0=Yes, 1=No).
  • Non-encoded Values: The VALUES field is left empty. NA indicates missing values but is not required.
  • Missing information: Missing information is marked with a question mark “?” and an orange background. 
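
The VALUE=MEANING convention above can be checked with a few lines of Python. This is a minimal sketch for illustration only, not part of the tool; the function name parse_values is hypothetical, and it assumes each code sits in its own entry.

```python
def parse_values(entries):
    """Turn ["0=Yes", "1=No"] into {"0": "Yes", "1": "No"}.

    Hypothetical helper: raises ValueError for any entry that is not
    in VALUE=MEANING form (for example a bare "?" placeholder).
    """
    codes = {}
    for entry in entries:
        # Split on the first "=" only, so a meaning may itself contain "=".
        code, sep, meaning = entry.partition("=")
        if not sep or not code.strip() or not meaning.strip():
            raise ValueError(f"not in VALUE=MEANING format: {entry!r}")
        codes[code.strip()] = meaning.strip()
    return codes


print(parse_values(["0=Yes", "1=No"]))
```

An entry that is still a “?” placeholder fails this check, which can serve as a quick way to spot cells that have not been edited yet.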

See handling of encoded values in this guide.

Steps

  1. Make sure the Data File that you chose earlier is the correct one. You can either use that file or choose a new one.
  2. Add a name for the data dictionary. Choose a name without spaces.
  3. Select the “OpenAI Guessing” checkbox if your codes contain text abbreviations that AI may be able to guess, e.g. “f” for female and “m” for male. Your subscription needs to have OpenAI prepaid funds available.
  4. Select the “Copy Text From Codes” checkbox if your data already contains the full meaning of the encoded values as text, e.g. “female” or “male”, and you want that text copied in place of the “?”.
  5. Click “Create Data Dictionary”.
  6. To see what each checkbox does, select or deselect it and click “Create Data Dictionary” again.
  7. You will see a table with the data dictionary information. Wherever a cell shows a “?” on an orange background, edit the encoded values or the column description.
  8. Inspect the rows which correspond to the columns in the data file.
  9. Correct the TYPE of a column if the program guessed incorrectly. Example: Age is a non-encoded numeric column, but Gender (male, female, other) is a column with encoded values. In this version of the tool, values coded with a terminology such as ICD10 are not counted as “encoded”, because they do not require manual editing of the data dictionary. Test with the test_data.csv data file found here and compare the result to the data dictionary file in that same location.
  10. Add UNITS for columns whose values are ambiguous, such as “age”, which can be in years or, for infants, in weeks.
  11. After you edit a data dictionary cell that contains “?”, its background turns from orange to white.
  12. Click the Save button to save the data dictionary file. It is saved in the directory where the QC Tool was installed.
  13. Click the back button to use the file in the QC-App.
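
To make the generated table in the steps above more concrete, here is a rough Python sketch of how such a template could be drafted from a data file. It is not the tool's actual logic: the function draft_dictionary, the MAX_CODES cutoff, and the type-guessing rules are all assumptions made for illustration.

```python
import csv
import io

# Assumed cutoff: a text column with this few distinct values is
# guessed to hold encoded values. The real tool may decide differently.
MAX_CODES = 5


def _all_parse(values, caster):
    """Return True if every value can be converted with caster."""
    try:
        for v in values:
            caster(v)
    except ValueError:
        return False
    return True


def draft_dictionary(rows):
    """Draft one dictionary entry per column (hypothetical helper).

    rows is a list of dicts, as produced by csv.DictReader.
    Descriptions and code meanings are left as "?" for manual editing.
    """
    columns = list(rows[0].keys()) if rows else []
    dictionary = []
    for name in columns:
        # Ignore empty cells and NA when guessing the type.
        present = [r[name] for r in rows if r[name] not in ("", "NA")]
        distinct = sorted(set(present))
        if _all_parse(present, int):
            var_type, values = "integer", []
        elif _all_parse(present, float):
            var_type, values = "decimal", []
        elif len(distinct) <= MAX_CODES:
            var_type = "encoded value"
            values = [f"{code}=?" for code in distinct]
        else:
            var_type, values = "string", []
        dictionary.append({
            "VARNAME": name,
            "VARDESC": "?",
            "UNITS": "",
            "TYPE": var_type,
            "VALUES": values,
        })
    return dictionary


# Small in-memory data file standing in for a real CSV.
sample = io.StringIO("age,gender\n34,m\n29,f\nNA,f\n")
rows = list(csv.DictReader(sample))
for entry in draft_dictionary(rows):
    print(entry)
```

In this sketch the age column is guessed as a numeric type with an empty VALUES list, while gender is guessed as encoded and receives one “code=?” entry per distinct code, mirroring the cells you would then edit by hand.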