MM-cat DaRe

Yelp Dataset

The Yelp dataset contains business information together with reviews, user data, check-ins, and business attributes, offering a view of consumer interactions and feedback.

It is distributed as a set of JSON files, each holding one type of data:

  • business.json with business details
  • user.json with user profiles
  • review.json with user reviews on businesses
  • tip.json with advice or comments left by users about businesses
  • checkin.json with check-in activity for businesses
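
As a minimal sketch of how such a file could be loaded, the helper below assumes each file stores one JSON object per line (line-delimited JSON), which is how the public Yelp release is commonly distributed. The function name `read_json_lines` is hypothetical and is not part of MM-cat.

```python
import json
from pathlib import Path
from typing import Iterator


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a line-delimited JSON file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Reading lazily, one line at a time, keeps memory use flat, which matters for the larger files such as review.json.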

[Figure: Yelp dataset]

Initial Dataset Specifications

| Entity   | Data Link | Mapping |
|----------|-----------|---------|
| Business |           | Mapping |
| User     |           | Mapping |
| Review   |           | Mapping |
| Tip      |           | Mapping |
| Checkin  |           | Mapping |

Generated Dataset Specifications

Case A: Transforming Yelp Datasets into PostgreSQL Tables

The JSON files of the Yelp dataset are transformed into structured PostgreSQL tables. Because the target relational schema consists only of flat tables, the new mapping contains no nested or complex structures. Notice, for example, the flat structure of Business, where some of the attributes were added with a composite signature, and the creation of a new kind, Friends.
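
The flattening described above can be illustrated with a short sketch. This is not MM-cat's own transformation algorithm: the helpers `flatten` and `split_friends` are hypothetical, and the field names (`attributes`, `friends`, `user_id`) are assumed from the public Yelp dataset. Nested keys are joined with underscores, loosely mirroring how a composite signature reaches a nested attribute, and the friend list of each user becomes rows of a separate Friends table.

```python
from __future__ import annotations

from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


def split_friends(user: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, str]]]:
    """Return the user without its friend list, plus one Friends row per friend."""
    friends = user.get("friends") or []
    if isinstance(friends, str):  # assumption: the list may be a comma-separated string
        friends = [f.strip() for f in friends.split(",") if f.strip()]
    flat_user = {k: v for k, v in user.items() if k != "friends"}
    rows = [{"user_id": user["user_id"], "friend_id": f} for f in friends]
    return flat_user, rows
```

For example, a business whose `attributes` object holds a `WiFi` entry yields a flat column named `attributes_WiFi`.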

PostgreSQL offers several advantages for structured datasets such as the Yelp dataset. In particular, it enforces data integrity through constraints such as primary keys and foreign keys. It also handles large datasets and executes complex queries efficiently.
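
To show what key constraints buy in practice, here is a small sketch. So that it runs without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the table and column names are assumptions, not the schema MM-cat generates. The `CREATE TABLE` statements use only standard SQL, while the `PRAGMA` line is SQLite-specific.

```python
import sqlite3

# Assumed, simplified schema: a tip must reference an existing business.
SCHEMA = """
CREATE TABLE business (
    business_id TEXT PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE tip (
    business_id TEXT NOT NULL REFERENCES business (business_id),
    user_id     TEXT NOT NULL,
    text        TEXT
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
    conn.executescript(SCHEMA)
    return conn


def try_insert_tip(conn: sqlite3.Connection, business_id: str, user_id: str, text: str) -> bool:
    """Insert a tip; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO tip VALUES (?, ?, ?)", (business_id, user_id, text))
        return True
    except sqlite3.IntegrityError:
        return False
```

A tip that points at a missing business is rejected by the foreign key, so orphaned rows cannot enter the table.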

By storing the transformed Yelp data in PostgreSQL, the dataset becomes easier to query and analyze, providing a solid framework for extracting meaningful insights and supporting further research or development.

| Entity   | Output Mapping |
|----------|----------------|
| Business | Output Mapping |
| User     | Output Mapping |
| Friends  | Output Mapping |
| Review   | Output Mapping |
| Tip      | Output Mapping |
| Checkin  | Output Mapping |

Generated Data Manipulation Language (DML) Commands:

Commands Link

Case B: Embedding Tips into Business Data in MongoDB

The Yelp Business data is enriched by embedding the related tips directly within each business. Each business document in the resulting dataset contains an array of tip objects, and the enriched documents are stored as a MongoDB collection.

With this structure, all the information about a business, including its tips, can be retrieved in a single query, with no join against a separate tip collection. MongoDB's support for nested data also makes it easy to access, filter, and manipulate embedded arrays, for example to search for businesses with specific tips.
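
The embedding step can be sketched in plain Python, independent of any database driver. The helper names `embed_tips` and `with_tip_containing` are hypothetical, and the field names (`business_id`, `text`) are assumed from the public Yelp dataset. The second helper mimics, in memory, what a MongoDB filter on the embedded field (for instance a regular-expression match on `tips.text`) would do on the stored collection.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable


def embed_tips(businesses: Iterable[dict[str, Any]], tips: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return business documents, each with a 'tips' array of its own tips."""
    tips_by_business: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for tip in tips:
        # The parent document already identifies the business, so drop the key.
        embedded = {k: v for k, v in tip.items() if k != "business_id"}
        tips_by_business[tip["business_id"]].append(embedded)
    return [
        {**business, "tips": tips_by_business.get(business["business_id"], [])}
        for business in businesses
    ]


def with_tip_containing(documents: Iterable[dict[str, Any]], needle: str) -> list[dict[str, Any]]:
    """Select documents where at least one embedded tip mentions the given text."""
    needle = needle.lower()
    return [
        doc for doc in documents
        if any(needle in tip.get("text", "").lower() for tip in doc["tips"])
    ]
```

Businesses without tips still receive an empty `tips` array, so every document has the same shape.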

This approach also reflects a more intuitive organization of the data, mirroring real-world relationships. A business inherently “owns” its tips, making embedding a logical and natural choice.

| Entity   | Output Mapping |
|----------|----------------|
| Business | Output Mapping |

Generated Data Manipulation Language (DML) Commands:

Commands Link

Case C: Transforming Tips into a CSV File

The Yelp Tip data is transformed into a flat CSV file. Each row in the CSV file represents an individual tip. This straightforward structure provides a compact and highly portable format that can be easily used across a variety of tools and workflows.
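
A minimal sketch of this transformation, using Python's standard `csv` module, is shown below. The column list is an assumption based on the public Yelp tip fields, and `tips_to_csv` is a hypothetical helper, not the code MM-cat uses to generate the file.

```python
import csv
import io
from typing import Any, Iterable

# Assumed tip fields; adjust to match the actual mapping.
TIP_COLUMNS = ["business_id", "user_id", "date", "compliment_count", "text"]


def tips_to_csv(tips: Iterable[dict[str, Any]]) -> str:
    """Render tips as CSV text with a header row and one row per tip."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=TIP_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for tip in tips:
        writer.writerow(tip)  # missing fields are written as empty cells
    return buffer.getvalue()
```

The `csv` module quotes values that contain commas or line breaks, so free-text tips stay in a single cell.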

| Entity   | Output Mapping |
|----------|----------------|
| Business | Output Mapping |

Generated File:

File Link