NASA Dataset

The NASA dataset consists of a single large JSON file, code_projects.json, that describes NASA’s code projects. The inferred schema has an unusual structure because of how the file is organized: it contains a single document with several attributes, one of which (releases) is an array of subdocuments, each representing one NASA code project.

Initial Dataset Specifications

Entity: Code_projects

Mapping:

_: {
    $schema: 0,
    agency: 1,
    version: 36,
    measurementType: 2 {
        method: 3
    }
    releases: -4 {
        _index: 37,
        _value: 6 {
            date_AI_tags: 13,
            description: 14,
            homepageURL: 15,
            laborHours: 16,
            local-id: 17,
            name: 18,
            organization: 19,
            repositoryURL: 28,
            service_version: 29,
            tags: -33 {
                _index: 38
                _value: 35
            }
            sti_keywords_passed_thresholds: -30 {
                _index: 39
                _value: 32
            }
            contact: 7 {
                email: 8,
                name: 9,
                phone: 10
            }
            date: 11 {
                metadataLastUpdated: 12
            }
            permissions: 20 {
                exemptionText: 21,
                usageType: 27,
                licenses: -22 {
                    _index: 40,
                    _value: 24 {
                        URL: 25,
                        name: 26
                    }
                }
            }
        }
    }
}
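To make the shape described by this mapping easier to see, the sketch below lists the nested key paths of a parsed JSON document and marks array levels with `[]`. It is an illustration only and not part of MM-cat: the helper name `key_paths` is hypothetical, and `sample` is a heavily shortened stand-in with placeholder values, not real content from code_projects.json.

```python
import json


def key_paths(node, prefix=""):
    """Return the set of key paths in a parsed JSON value.

    Object keys are joined with '.', and each array level adds '[]'.
    """
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, prefix + "[]")
    return paths


# Hypothetical, shortened stand-in for code_projects.json (placeholder values only).
sample = json.loads("""
{
  "agency": "NASA",
  "version": "x",
  "measurementType": {"method": "x"},
  "releases": [
    {"local-id": "id-1", "name": "Project One", "tags": ["a", "b"],
     "contact": {"email": "someone@example.org"}},
    {"local-id": "id-2", "name": "Project Two", "tags": []}
  ]
}
""")

for path in sorted(key_paths(sample)):
    print(path)
```

Every path under `releases[]` corresponds to a property nested inside `releases: -4 { _value: 6 { ... } }` in the mapping above, which is where the per-project attributes live.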

Generated Dataset Specifications

Case A: Transforming NASA Dataset into Relational Tables in PostgreSQL

Extract the relevant fields from the JSON and convert them into a relational schema, for example a Project table and a Tags table. Storing tags separately from project details reduces data redundancy and keeps the database compact and maintainable. PostgreSQL’s indexing and query optimization allow efficient retrieval of projects by tag, or of tags by project, even as the dataset grows. Updates to project information or tags stay local and do not affect unrelated data. This approach makes the NASA dataset structured, easier to query, and better suited for integration with other relational data sources.

Entity	Output Mapping
Project	`code_projects.json: { local-id: -4.6.17, name: -4.6.18, description: -4.6.14, repositoryURL: -4.6.28 }`
Tags	`code_projects.json: { local-id: -4.6.17, _value: -4.6.-33.35 }`
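As a rough picture of the rows these two output mappings are meant to produce, the sketch below flattens a parsed document into Project rows and Tags rows in plain Python. It is independent of MM-cat and does not use its Transformation modules. The function name `to_rows` and the sample document are hypothetical; the field names are the ones listed in the mappings above. Loading the rows into PostgreSQL is left out.

```python
def to_rows(document):
    """Split a code_projects-style document into Project and Tags rows.

    Project rows: (local-id, name, description, repositoryURL)
    Tags rows:    (local-id, tag), one row per entry of the tags array
    """
    projects, tags = [], []
    for release in document.get("releases", []):
        local_id = release.get("local-id")
        projects.append((
            local_id,
            release.get("name"),
            release.get("description"),
            release.get("repositoryURL"),
        ))
        # A release may have no tags at all; treat a missing or null value as empty.
        for tag in release.get("tags") or []:
            tags.append((local_id, tag))
    return projects, tags


# Hypothetical sample with placeholder values, not real NASA data.
document = {
    "releases": [
        {"local-id": "id-1", "name": "Project One",
         "description": "First placeholder project",
         "repositoryURL": "https://example.org/one",
         "tags": ["a", "b"]},
        {"local-id": "id-2", "name": "Project Two",
         "description": "Second placeholder project",
         "repositoryURL": "https://example.org/two"},
    ]
}

project_rows, tag_rows = to_rows(document)
print(project_rows)
print(tag_rows)
```

Repeating local-id in both row sets is what would let the Tags table reference the Project table, matching the `local-id: -4.6.17` entry that appears in both mappings.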

Note: Although it is technically possible to define a mapping that creates relational tables from an array within a JSON file, it is currently not possible to generate the transformed data, because the Transformation modules in MM-cat cannot yet handle array objects effectively. We therefore do not provide the transformed dataset at this stage. Enhancements to the Transformation modules that address this limitation are planned for future development.