NASA Dataset
The NASA dataset consists of a single large JSON file, code_projects.json, detailing NASA’s various code projects. The inferred schema has an unusual structure because of the nature of the file: it contains a single document with various attributes, one of which is an array of subdocuments, each representing one NASA code project.
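As a rough illustration of this shape, the sketch below builds a tiny stand-in document and finds the top-level attribute that holds the array of subdocuments. The attribute names (`version`, `agency`, `projects`, `name`, `tags`) and the helper `arrays_of_subdocuments` are assumptions made for the example, not the actual contents of code_projects.json.

```python
import json

# Hypothetical miniature of the file; the real attribute names may differ.
SAMPLE = """
{
  "version": "1.0",
  "agency": "NASA",
  "projects": [
    {"name": "Project A", "tags": ["simulation", "python"]},
    {"name": "Project B", "tags": ["python"]}
  ]
}
"""


def arrays_of_subdocuments(document):
    """Return the top-level attribute names whose value is a non-empty list of objects."""
    return [
        key
        for key, value in document.items()
        if isinstance(value, list)
        and value
        and all(isinstance(item, dict) for item in value)
    ]


document = json.loads(SAMPLE)
nested = arrays_of_subdocuments(document)
print(nested)
```

In a file shaped like this, every other top-level attribute is a scalar, so the whole dataset effectively lives inside that one array.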
Entity | Data Link | Mapping |
---|---|---|
Code_projects |
Extract the relevant fields from the JSON and convert them into a relational schema, for example a project schema and a tags schema. The relational structure reduces data redundancy by storing tags separately from project details, which keeps the database more compact and maintainable. PostgreSQL’s indexing and query optimization allow efficient retrieval of projects by tag, or of tags by project, even as the dataset grows. Updates to project information or tags are straightforward and do not affect unrelated data. This approach makes the NASA dataset structured, easier to query, and better suited for integration with other relational data sources.
Entity | Output Mapping |
---|---|
Project | |
Tags |
Note: While it is technically possible to define a mapping that creates relational tables from an array within a JSON file, it is currently not possible to generate the transformed data. This limitation arises because the Transformation modules in MM-cat are not yet equipped to handle array objects effectively. Therefore, we do not provide the transformed dataset at this stage. Enhancements to the Transformation modules to address this limitation are planned for future development.
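Independently of MM-cat, the intended project/tags split can be illustrated with a few lines of plain Python. This is only a minimal sketch of the target structure, not the MM-cat transformation: the input layout (`projects`, `name`, `description`, `tags`), the generated `id` column, and the function `split_projects_and_tags` are all assumptions chosen for the example.

```python
def split_projects_and_tags(document, array_key="projects"):
    """Flatten the nested project array into two row sets.

    Returns (project_rows, tag_rows), where tag_rows holds
    (project_id, tag) pairs so that tags live outside the project rows.
    All field names are assumed, not taken from the real file.
    """
    project_rows = []
    tag_rows = []
    for project_id, item in enumerate(document.get(array_key, []), start=1):
        project_rows.append(
            {
                "id": project_id,  # surrogate key generated here
                "name": item.get("name"),
                "description": item.get("description"),
            }
        )
        seen = set()
        for tag in item.get("tags", []):
            if tag not in seen:  # skip duplicate tags within one project
                seen.add(tag)
                tag_rows.append((project_id, tag))
    return project_rows, tag_rows


# Hypothetical input with the assumed layout.
sample = {
    "projects": [
        {"name": "Project A", "description": "Demo", "tags": ["simulation", "python"]},
        {"name": "Project B", "description": "Demo", "tags": ["python", "python"]},
    ]
}

project_rows, tag_rows = split_projects_and_tags(sample)
print(project_rows)
print(tag_rows)
```

The resulting rows could then be loaded into two PostgreSQL tables, with the tag table referencing the project table through `project_id`.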