The QC Tool add-on “Data Dictionary Designer” will generate a dbGaP-formatted data dictionary template that you can edit.
FAIRlyz Data Dictionary Designer
- Focus: Generate or edit a data dictionary to comply with the dbGaP and dbGaPCheckup formats
- Adds and supports these fields: VARNAME, VARDESC, UNITS, TYPE, VALUES
- Encoded values: When TYPE is “encoded value”, the tool expects a list of codes, each followed by an equals sign and a description of the code. Every entry in the VALUES column follows the VALUE=MEANING format (e.g., 0=Yes, 1=No).
- Non-encoded values: The VALUES field is left empty. NA indicates a missing value but is not required.
- Missing information: Missing information is marked with a question mark “?” and an orange background.
See handling of encoded values in this guide.
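The VALUE=MEANING convention described above can be checked with a few lines of Python. This is a minimal sketch and not part of the tool: the function names are hypothetical, and it assumes that a single VALUES cell holds comma-separated pairs, as in the “0=Yes, 1=No” example, so meanings that themselves contain commas are not handled.

```python
MISSING = "?"  # marker the tool shows for information still to be filled in


def parse_values_cell(cell):
    """Split a VALUES cell such as '0=Yes, 1=No' into a {code: meaning} dict.

    Assumes comma-separated VALUE=MEANING pairs (an assumption of this sketch).
    An empty cell, as used for non-encoded columns, yields an empty dict.
    """
    mapping = {}
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        code, sep, meaning = item.partition("=")
        code, meaning = code.strip(), meaning.strip()
        if not sep or not code or not meaning:
            raise ValueError(f"not in VALUE=MEANING format: {item!r}")
        mapping[code] = meaning
    return mapping


def codes_needing_edit(mapping):
    """Return the codes whose meaning is still the '?' placeholder."""
    return [code for code, meaning in mapping.items() if meaning == MISSING]


if __name__ == "__main__":
    print(parse_values_cell("0=Yes, 1=No"))
    print(codes_needing_edit(parse_values_cell("f=?, m=male")))
```

A check like this can be run over a saved data dictionary to find codes that still need a description before submission.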
Steps
- Make sure the Data File that you chose earlier is the correct one. You can either use that file or choose a new one.
- Add a name for the data dictionary. Choose a name without spaces.
- Select the “OpenAI Guessing” checkbox if your codes contain text abbreviations that AI may be able to guess, e.g. f: for female and m: for male. Your subscription needs to have OpenAI prepaid funds available.
- Select the “Copy Text From Codes” checkbox if your data already contains text with the full meaning of encoded values that you want copied to replace the “?”, e.g. female: for female, or male: for male.
- Click “Create Data Dictionary”.
- To see what each checkbox does, select or deselect it and click “Create Data Dictionary” again.
- A table with the data dictionary information appears. Edit the encoded values and the column descriptions wherever a cell shows a “?” symbol on an orange background.
- Inspect the rows. Each row corresponds to a column in the data file.
- Correct the Type of a column if the program guessed incorrectly. For example, Age is a non-encoded numeric column, whereas Gender (male, female, other) is a column with encoded values. In this version of the tool, values coded with a standard terminology such as ICD10 are not counted as “encoded” because they do not require manual editing of the data dictionary. To test, use the test_data.csv data file found here and compare the result to the data dictionary file in that same location.

- Add Units for columns whose values would otherwise be ambiguous, such as “age”, which can be in years or, for infants, in weeks.
- After you edit the data dictionary cells with “?”, the cells’ background will turn from orange to white.

- Click the Save button to save the data dictionary file. It is saved in the directory where the QC Tool is installed.
- Click the back button to use the file in the QC-App.
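
The steps above can be pictured with a short Python sketch that builds a template from a data file, guesses each column's type, and leaves “?” wherever manual editing is needed. This is an illustration only and not the tool's actual logic: the function names, the `max_codes` threshold, and the type labels other than “encoded value” are assumptions made for the example.

```python
import csv
import io

MISSING = "?"  # placeholder for information the user still has to supply


def guess_type(values, max_codes=10):
    """Guess a TYPE for one column (hypothetical heuristic, not the tool's rule)."""
    present = [v for v in values if v not in ("", "NA")]
    if not present:
        return MISSING
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # Text column: a small set of distinct values is treated as codes.
        return "encoded value" if len(set(present)) <= max_codes else "string"
    return "integer" if all(n.is_integer() for n in numbers) else "decimal"


def build_template(csv_text):
    """Return one dictionary row per column of the data file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    template = []
    for name in reader.fieldnames or []:
        values = [(row.get(name) or "").strip() for row in rows]
        var_type = guess_type(values)
        values_cell = ""
        if var_type == "encoded value":
            codes = sorted({v for v in values if v not in ("", "NA")})
            values_cell = ", ".join(f"{code}={MISSING}" for code in codes)
        template.append({
            "VARNAME": name,
            "VARDESC": MISSING,  # descriptions always need manual editing
            "UNITS": "",
            "TYPE": var_type,
            "VALUES": values_cell,
        })
    return template


if __name__ == "__main__":
    sample = "Age,Gender\n34,f\n27,m\n,f\n"
    for row in build_template(sample):
        print(row)
```

Note that a heuristic like this classifies numerically coded columns (for example 0/1) as numeric, which is the kind of guess the Type correction step is meant to catch.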
