Semantic Data Dictionary project description goes here
Goals and Guidelines
The aim of the Semantic Data Dictionary approach is to annotate datasets such that it is machine readable, uses best practice ontologies, and follows FAIR Guiding Principals.
FAIR Guidelines
These protocols try to provide rules for data management using four main conditions:
- Findable - To begin, the knowledge representation the we are creating must be findable on the Web, and we accomplish this using unique persistant identifiers and web searchable metadata.
- Accessible - Accessibility is defined by the ease of access for a user of the Semantic Data Dictionary and is enabled by lack of restrictions on those who can use the software, documentation that allows users to understand how to most effectively implement the tool, and the persistance of the metadata that remains on the web after the dataset itself is gone.
- Interoperable - The data in our SDD should be usable in tandem with other technologies and applications. We achieve this by using formal vocabularies and best practice ontologies that are understood if not used by many others in the field.
- Reusable - The SDD approach best embodies this principal, as the fundemental goal of both is to make the reuse of data as seamless as possible. The SDD provides an organization for the data, and effectively stores it as well as its metadata in a well-documented knowledge graph format. Thus it facilitates the an easily understandable access of the data for future application.
Two other metrics that are related are Reproducibility and Transparency. Since our program is well documented, openly availible, and creates a knowledge representation that can be independently produced by outside parties, we argue that the SDD approach meets those two standards as well.
Getting Started
These instructions will let get you started on creating your own Semantic Data Dictionaries.
Prerequisites
There are several required prerequisites to using the sdd2rdf.py script detailed here.
Built With
- Python - The programming language used
- pandas - File reading, parsing, and writing
- configparser - Configuration handling
- rdflib - Semantics
Tutorial
A step by step series of instructions and examples for installing the required libraries, creating an appropriate directory structure can be found here.
Running the script
The sdd2rdf.py script can be run using python by specifying a configuration file associated with a project.
python sdd2rdf.py ExampleProject/config/config.ini.example
Querying and Testing
Applications
Software applications that can be used to interpret Semantic Data Dictionaries
Additional notes about how to use Semantic Data Dictionaries on a live system can be found here.
In Use
- NIH CHEAR through Mount Sinai School of Medicine
- Gates project through RPI CASE
- CBIS Experiment through RPI CASE
- United Nations’ ELM through Yale’s CEA
- Big Data Ceara through UNIFOR
- RPI TWC encoding of NHANES
- Brazil’s Global Burden of Disease through UFMG
- RPI HEALS encoding of SEER and CIVIC for Breast Cancer Staging
- RPI HEALS encoding of Medical Information Mart for Intensive Care (MIMIC) III
- RPI HEALS encoding of Synthea Synthetic Data
- RPI HEALS encoding of USDA Food Data
Contributing
When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.
Please note we have a code of conduct, please follow it in all your interactions with the project.
Pull Request Process
- Ensure any install or build dependencies are removed before the end of the layer when doing a build.
- Update the README.md with details of changes to the interface, this includes new environment variables, exposed ports, useful file locations and container parameters.
- Increase the version numbers in any examples files and the README.md to the new version that this Pull Request would represent.
- You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.
Code of Conduct
Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
Our Standards
Examples of behavior that contributes to creating a positive environment include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others’ private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at [tetherless@cs.rpi.edu]. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.
Attribution
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at http://contributor-covenant.org/version/1/4
This adapted version was retrieved from PurpleBooth
Versioning
We use … for versioning. For the versions available, see the sdd2rdf.
License
This project is licensed under the MIT License - see the LICENSE.md file for details
Authors
- Sabbir M. Rashid - Graduate Student - Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- James P. McCusker - PhD - Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Paulo Pinheiro - PhD - Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Marcello P. Bax - PhD - Universidade Federal de Minas Gerais, Belo Horizonte - MG, 31270-901, BR
- Henrique O. Santos - PhD - Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Jeanette A. Stingone - PhD - Columbia University Irving Medical Center, New York, NY, 10032, USA
- Deborah L. McGuinness - PhD - Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
See also the list of contributors who participated in this project below.
Acknowledgments
This work is supported by
- The National Institute of Environmental Health Sciences (NIEHS) Award 0255-0236-4609 / 1U2CES026555-01
- IBM Research AI through the AI Horizons Network
- The Gates Foundation through the Healthy Birth, Growth, and Development knowledge integration (HBGDki) project
- The CAPES Foundation Senior Internship Program Award 88881.120772 / 2016-01
We acknowledge the members of the Tetherless World Constellation (TWC) as well as the members of the Institute for Data Exploration and Applications (IDEA) at Rensellaer Polytechnic Institute (RPI) for their contributions, including
- John Erickson
- Kristin Bennett
- Jason Liang
- Yue (Robin) Liu
- Katherine Chastain
- Rebecca Cowan
- Oshani Seneviratne
- Ishita Padhiar