The Practice of Licensing Biomedical Data

A researcher might release their data under a permissive license such as MIT or Apache 2.0, or under a weak-copyleft license such as the LGPL, for several reasons, chief among them increased accessibility and collaboration.

Benefits

Benefits for the Scientific Community:

  • Wider Use: Permissive licenses make data freely available for anyone to access, use, modify, and redistribute. This fosters collaboration and innovation and accelerates research progress. More researchers can use the data to build upon existing findings or explore new avenues.
  • Reproducibility: Open data allows other researchers to verify and reproduce the original research findings. This transparency strengthens the research process and builds trust in scientific results.

Benefits for the Researcher:

  • Citation and Recognition: By making data openly available, researchers can encourage others to use it, potentially leading to more citations for their work and increased visibility in their field.
  • Faster Advancement: Open data allows others to build upon their work and potentially solve research problems faster. This can indirectly benefit the original researcher by accelerating progress in their field of study.

Alignment with Funding Agencies:

  • Many funding agencies (e.g., the National Institutes of Health in the US) increasingly require that research data be shared. Choosing a permissive license can help meet these requirements, although the specific terms of each funder's policy still need to be checked.

Downsides to Consider:

  • Loss of Control: Permissive licenses grant users broad freedom, so researchers give up much of their control over how their data is used.
  • Potential for Misuse: There’s a slight risk that data could be misused, but the benefits of open access often outweigh this concern.

Overall, researchers often choose permissive licenses to promote collaboration and transparency and to accelerate scientific progress. While there are some potential downsides, the benefits of open data sharing are widely recognized in the research community.

The Creative Commons License Chooser aids in selecting a license for biomedical data sharing.

The list below shows some organizations, initiatives, and repositories that release data (including source code) under open licenses such as MIT, Apache 2.0, or LGPL:

Open-Source Communities and Foundations:

  • GitHub: A popular platform for hosting open-source projects. Many projects on GitHub license their code and data using permissive licenses. Users can filter search results based on license type.
  • SourceForge: Another platform for open-source projects, offering various licenses for code and data.
  • Apache Software Foundation (ASF): Hosts a large collection of open-source software projects, which are released under the Apache License 2.0. Users might also find datasets associated with these projects.
  • Linux Foundation: Promotes open-source software development, including projects with permissive data licensing.
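
As an illustration of filtering by license type on GitHub, the sketch below builds a repository-search URL that uses GitHub's `license:` search qualifier. The function name `github_license_search_url` and the small keyword set are assumptions made for this example; GitHub recognizes many more license keywords than the three listed here.

```python
from urllib.parse import urlencode

# A small, illustrative subset of the keywords accepted by GitHub's
# `license:` search qualifier (assumption: only these three are checked here).
GITHUB_LICENSE_KEYWORDS = {"mit", "apache-2.0", "lgpl-3.0"}


def github_license_search_url(topic: str, license_keyword: str) -> str:
    """Build a GitHub repository-search URL restricted to one license."""
    keyword = license_keyword.lower()
    if keyword not in GITHUB_LICENSE_KEYWORDS:
        raise ValueError(f"license keyword not in the example set: {license_keyword!r}")
    # The search box accepts free text plus qualifiers such as `license:mit`.
    query = f"{topic} license:{keyword}"
    return "https://github.com/search?" + urlencode({"q": query, "type": "repositories"})


if __name__ == "__main__":
    print(github_license_search_url("biomedical", "apache-2.0"))
```

The function only constructs the URL and makes no network request, so the result can be opened in a browser or passed to whatever HTTP client is preferred.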

Government and Public Data Initiatives:

  • Data.gov: The U.S. government’s open data portal provides access to various datasets, many of which are in the public domain or carry open licenses.
  • European Open Data Portal (data.europa.eu): Similar to Data.gov, this portal offers datasets from European Union institutions and member states, often with open licenses.
  • World Bank Open Data: The World Bank publishes various datasets related to development, poverty, and economics, frequently licensed under Creative Commons licenses (similar in spirit to MIT/Apache 2.0).

Scientific Data Repositories:

Additional Tips

Here are some additional tips for finding data with permissive licenses:

  • Look for the license information: Most reputable data repositories will clearly state the license associated with the data.
  • Search by license: Many repositories allow filtering by license type.
  • Look for keywords like “open data” or “open access”: These terms often indicate that the data might be available under a permissive license.
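
The tips above can be sketched as a small filter over dataset metadata. The record layout (`name` and `license` keys), the `ACCEPTED_LICENSES` set, and the function name are hypothetical; real repositories expose license information in their own formats, so the field names would need adapting.

```python
# Hypothetical set of license identifiers the researcher is willing to accept.
ACCEPTED_LICENSES = {"MIT", "Apache-2.0", "CC0-1.0", "CC-BY-4.0"}


def split_by_license(records):
    """Sort dataset records into accepted, rejected, and unlabeled names.

    Each record is assumed to be a dict with a "name" key and an optional
    "license" key holding a license identifier.
    """
    accepted, rejected, unlabeled = [], [], []
    for record in records:
        license_id = (record.get("license") or "").strip()
        if not license_id:
            # No stated license: treat as unknown, never as permissive.
            unlabeled.append(record["name"])
        elif license_id in ACCEPTED_LICENSES:
            accepted.append(record["name"])
        else:
            rejected.append(record["name"])
    return accepted, rejected, unlabeled


if __name__ == "__main__":
    example = [
        {"name": "cohort-a", "license": "CC-BY-4.0"},
        {"name": "cohort-b", "license": "LGPL-3.0"},
        {"name": "cohort-c"},
    ]
    print(split_by_license(example))
```

Keeping unlabeled datasets in their own bucket reflects the first tip: a missing license statement is a reason to look further, not evidence that the data is open.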

It is crucial to always check the specific license terms of any data downloaded from a data repository. Even permissive licenses like MIT or Apache 2.0 carry requirements: both, for example, require that the copyright and license notices be preserved when the material is redistributed.
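
One way to keep such requirements in view is a simple lookup from license identifier to a redistribution checklist. The sketch below is a simplified summary written for illustration, not legal advice and not a substitute for the license text; the `OBLIGATIONS` table and the function name are assumptions of this example.

```python
# Simplified, non-exhaustive summaries of redistribution requirements.
OBLIGATIONS = {
    "MIT": [
        "Include the copyright notice",
        "Include the permission (license) notice",
    ],
    "Apache-2.0": [
        "Give recipients a copy of the license",
        "Retain existing copyright and attribution notices",
        "Mark modified files as changed",
        "Include the contents of the NOTICE file, if one exists",
    ],
}


def redistribution_checklist(license_id: str) -> list[str]:
    """Return the summarized requirements for a license identifier."""
    if license_id in OBLIGATIONS:
        return OBLIGATIONS[license_id]
    # Unknown license: point back to the full text rather than guess.
    return [f"No summary available for {license_id!r}: read the full license text"]


if __name__ == "__main__":
    for item in redistribution_checklist("Apache-2.0"):
        print("-", item)
```

An unrecognized identifier returns a reminder to read the full license text, since guessing at obligations is riskier than having no summary at all.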