Designing MongoDB Schema for Developers Familiar with SQL

If you've been working extensively with SQL databases, transitioning to MongoDB, a NoSQL database, can feel like a paradigm shift. However, understanding the nuances and benefits of MongoDB's flexible schema design can significantly enhance your data management capabilities. This guide aims to help SQL developers adapt to MongoDB schema design by comparing relational database principles with MongoDB's document-oriented data model.


Introduction to MongoDB for SQL Developers

Before diving into schema design, let's review some basic terminology translations to ease the transition from SQL to MongoDB:


SQL MongoDB Database Database Table Collection Row Document Column Field Index Index Joins Embed Document/DBRef

While these mappings provide a basic understanding, it's crucial to realize that MongoDB, unlike SQL databases, allows a more flexible and dynamic schema. Each document in a collection can have a different structure, which means you need to rethink how you model your data.


Schema Design: Moving Beyond Relational Concepts

Unlike relational databases where schema design involves creating tables with predefined rows and foreign keys, MongoDB requires upfront planning of your document schema to enable efficient data access. Below, we'll explore several types of relationships commonly managed in relational databases and how to implement them efficiently in MongoDB using an online bookstore as an example.


One-to-One Relationships

Embedding vs. Linking

In a one-to-one relationship, such as a user having a single address, there are two main approaches: embedding and linking.


Embedding

Embedding is straightforward and often more efficient for retrieval since the related data is stored within the same document.



{
"name": "Peter Wilkinson",
"age": 27,
"address": {
"street": "100 some road",
"city": "Nevermore"
}
}

Linking

Linking uses separate documents and references to represent relationships, which can be more flexible for certain use cases.



{
"_id": 1,
"name": "Peter Wilkinson",
"age": 27
}

{
"_id": 1,
"user_id": 1,
"street": "100 some road",
"city": "Nevermore"
}

Though embedding provides quick access when querying by user, linking is useful when the relationship needs to be accessed independently.


One-to-Many Relationships

Embedding, Linking, and Bucketing

Consider a book with multiple comments, where each comment is linked to a single book. There are three primary strategies:


Embedding

This approach embeds comments directly within the book document for fast access.



{
"title": "Great Expectations",
"description": "This is an awesome book by Charles Dickens",
"comments": [{
"name": "Peter Critic",
"created_on": "2019-01-01T10:01:22Z",
"comment": "Not so awesome book"
}, {
"name": "John Lenient",
"created_on": "2018-12-01T11:01:22Z",
"comment": "Awesome book"
}]
}

Linking

Linking separates comments into their own documents, which helps manage large numbers of comments through pagination.



{
"_id": 1,
"title": "Great Expectations",
"description": "This is an awesome book by Charles Dickens"
}

{
"book_id": 1,
"name": "Peter Critic",
"created_on": "2019-01-01T10:01:22Z",
"comment": "Not so awesome book"
}

{
"book_id": 1,
"name": "John Lenient",
"created_on": "2018-12-01T11:01:22Z",
"comment": "Awesome book"
}

Bucketing

Combining the strengths of embedding and linking, bucketing stores comments in smaller groups within the book document.



{
"_id": 1,
"title": "Great Expectations",
"description": "This is an awesome book by Charles Dickens"
}

{
"book_id": 1,
"page": 1,
"count": 50,
"comments": [{
"name": "Peter Critic",
"created_on": "2019-01-01T10:01:22Z",
"comment": "Not so awesome book"
}]
}

{
"book_id": 1,
"page": 2,
"count": 1,
"comments": [{
"name": "John Lenient",
"created_on": "2018-12-01T11:01:22Z",
"comment": "Awesome book"
}]
}

Bucketing allows efficient pagination with reduced read operations, offering a balanced approach for large datasets.


Many-to-Many Relationships

Embedding Strategies

In many-to-many relationships, such as multiple authors writing many books, two main embedding strategies can be applied: Two-Way Embedding and One-Way Embedding.


Two-Way Embedding

This approach embeds references to related documents on both sides of the relationship.



{
"_id": 1,
"name": "Peter Stanford",
"books": [1, 2]
}

{
"_id": 2,
"name": "George Peterson",
"books": [2]
}

{
"_id": 1,
"title": "A Tale of Two People",
"categories": ["drama"],
"authors": [1, 2]
}

{
"_id": 2,
"title": "A Tale of Two Space Ships",
"categories": ["sci-fi"],
"authors": [1]
}

This method is ideal for scenarios where the number of relationships is relatively small and manageable.


One-Way Embedding

For uneven many-to-many relationships like books and categories, where a category has numerous books, embedding references only in the book document is more efficient.



{
"_id": 1,
"name": "drama"
}

{
"_id": 1,
"title": "A Tale of Two People",
"categories": [1],
"authors": [1, 2]
}

This strategy optimizes read performance by reducing the size of the frequently queried document.


Querying Many-to-Many Relationships

Using the previous examples, here's how you would fetch data from these relationships:


Fetch Books by a Specific Author


var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var author = authorsCollection.findOne({name: "Peter Stanford"});
var books = booksCollection.find({_id: {$in: author.books}}).toArray();

Fetch Authors by a Specific Book


var db = db.getSisterDB("library");
var booksCollection = db.books;
var authorsCollection = db.authors;

var book = booksCollection.findOne({title: "A Tale of Two Space Ships"});
var authors = authorsCollection.find({_id: {$in: book.authors}}).toArray();

These queries allow you to efficiently fetch related data in a many-to-many setup using MongoDB's querying capabilities.


Aggregation Framework: MongoDB’s Answer to SQL Queries

For processing data in MongoDB, the Aggregation Framework is used, offering functionalities akin to SQL operations:


SQL Aggregation Framework WHERE / HAVING $match GROUP BY $group SELECT $project ORDER BY $sort LIMIT $limit sum() / count() $sum join $lookup

MongoDB's Aggregation Framework allows extensive data manipulation and transformation. It includes stages such as $match for filtering, $group for grouping, and $lookup for performing left outer joins, thereby mimicking SQL queries in a NoSQL environment.


Aggregation Example

Here's an example of an aggregation pipeline to find the most popular books:



db.books.aggregate([
{ $unwind: "$comments" },
{ $group: { _id: "$_id", title: { $first: "$title" }, commentCount: { $sum: 1 } }},
{ $sort: { commentCount: -1 }},
{ $limit: 5 }
]);

This query expands each book document to multiple documents (one per comment), groups them by book ID, counts the comments, and sorts them in descending order to find the top 5 books with the most comments.


Conclusion

Migrating from SQL to MongoDB involves more than just understanding different terminology; it requires a shift in how you model and interact with your data. By leveraging MongoDB's flexible schema, you can design efficient and scalable applications that take full advantage of its document-based storage.


If you're interested in more details about transitioning from a relational database to a NoSQL database, the official MongoDB Documentation is a valuable resource.


Ready to dive deeper into MongoDB? Don't forget to check out our related articles on MongoDB Aggregation and Data Modeling Best Practices.


Happy coding in the world of MongoDB!