Indexing Hierarchical Data

Hierarchical data

One significant advantage of document databases is that they impose few constraints on how data is structured. Hierarchical data structures illustrate this well. Consider, for example, the familiar comment thread, implemented using objects such as:

public class BlogPost
{
    public string Author { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }

    // Blog post readers can leave comments
    public List<BlogPostComment> Comments { get; set; }
}

public class BlogPostComment
{
    public string Author { get; set; }
    public string Text { get; set; }

    // Allow nested comments, enabling replies to existing comments
    public List<BlogPostComment> Comments { get; set; }
}

Readers of a post created using the above BlogPost structure can add BlogPostComment entries to the post's Comments field, and readers of these comments can reply with comments of their own, creating a recursive hierarchical structure.

For example, the following document, BlogPosts/1-A, represents a blog post by John that contains multiple layers of comments from various authors.

BlogPosts/1-A:

{
    "Author": "John",
    "Title": "Post title..",
    "Text": "Post text..",
    "Comments": [
        {
            "Author": "Moon",
            "Text": "Comment text..",
            "Comments": [
                {
                    "Author": "Bob",
                    "Text": "Comment text.."
                },
                {
                    "Author": "Adel",
                    "Text": "Comment text..",
                    "Comments": [
                        {
                            "Author": "Moon",
                            "Text": "Comment text.."
                        }
                    ]
                }
            ]
        }
    ],
    "@metadata": {
        "@collection": "BlogPosts"
    }
}
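To make the mapping between the classes and the JSON concrete, here is a minimal sketch that builds the same thread as in-memory objects. The `SampleThread` helper is a hypothetical name introduced for illustration and is not part of RavenDB. As an assumption of this sketch, the `Comments` lists are initialized to empty lists, so a comment without replies carries an empty list rather than omitting the field as Bob's comment does in the JSON above.

```csharp
using System.Collections.Generic;

public class BlogPost
{
    public string Author { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }

    // Assumption for this sketch: default to an empty list instead of null
    public List<BlogPostComment> Comments { get; set; } = new List<BlogPostComment>();
}

public class BlogPostComment
{
    public string Author { get; set; }
    public string Text { get; set; }

    // Replies to this comment; each reply can hold replies of its own
    public List<BlogPostComment> Comments { get; set; } = new List<BlogPostComment>();
}

// Hypothetical helper, used only to illustrate the structure
public static class SampleThread
{
    // Builds the same hierarchy as the BlogPosts/1-A document
    public static BlogPost Create()
    {
        return new BlogPost
        {
            Author = "John",
            Title = "Post title..",
            Text = "Post text..",
            Comments = new List<BlogPostComment>
            {
                new BlogPostComment
                {
                    Author = "Moon",
                    Text = "Comment text..",
                    Comments = new List<BlogPostComment>
                    {
                        new BlogPostComment { Author = "Bob", Text = "Comment text.." },
                        new BlogPostComment
                        {
                            Author = "Adel",
                            Text = "Comment text..",
                            Comments = new List<BlogPostComment>
                            {
                                new BlogPostComment { Author = "Moon", Text = "Comment text.." }
                            }
                        }
                    }
                }
            }
        };
    }
}
```

With an open session, an object like this would typically be saved with `session.Store(post, "BlogPosts/1-A")` followed by `session.SaveChanges()`.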

Index hierarchical data

To index the elements of a hierarchical structure like the one above, use RavenDB's Recurse method.

The sample index below shows how to use Recurse to traverse the comments in the post thread and index them by their authors. We can then query the index for all blog posts that contain comments by specific authors.

public class BlogPosts_ByCommentAuthor :
    AbstractIndexCreationTask<BlogPost, BlogPosts_ByCommentAuthor.IndexEntry>
{
    public class IndexEntry
    {
        // The authors found while traversing the post's comment hierarchy
        public IEnumerable<string> Authors { get; set; }
    }

    public BlogPosts_ByCommentAuthor()
    {
        Map = blogposts =>
            from blogpost in blogposts
            // Recurse follows the Comments property through every nesting level
            let authors = Recurse(blogpost, x => x.Comments)
            select new IndexEntry
            {
                Authors = authors.Select(x => x.Author)
            };
    }
}
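To illustrate the kind of traversal involved, here is a plain in-memory sketch that flattens a comment thread depth-first and collects the comment authors. This is not RavenDB's implementation of Recurse: `CommentTraversal`, `Flatten`, and `CommentAuthors` are hypothetical names introduced for this example, and details such as ordering or how the root document is treated may differ in the actual index.

```csharp
using System.Collections.Generic;
using System.Linq;

public class BlogPost
{
    public string Author { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }
    public List<BlogPostComment> Comments { get; set; }
}

public class BlogPostComment
{
    public string Author { get; set; }
    public string Text { get; set; }
    public List<BlogPostComment> Comments { get; set; }
}

// Hypothetical helper, used only to illustrate recursive flattening
public static class CommentTraversal
{
    // Yields every comment in the hierarchy, parents before their replies
    public static IEnumerable<BlogPostComment> Flatten(IEnumerable<BlogPostComment> comments)
    {
        // A comment without replies may have no Comments list at all
        if (comments == null)
            yield break;

        foreach (var comment in comments)
        {
            yield return comment;

            // Descend into the replies of the current comment
            foreach (var reply in Flatten(comment.Comments))
                yield return reply;
        }
    }

    // Collects the authors of all comments under a post, at any depth
    public static List<string> CommentAuthors(BlogPost post)
    {
        return Flatten(post.Comments).Select(c => c.Author).ToList();
    }
}
```

For a post shaped like BlogPosts/1-A, `CommentAuthors` returns Moon, Bob, Adel, Moon, in that order. The nested reply by Moon is found even though it sits three levels below the post.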

Query the index

The index can be queried for all blog posts that contain comments made by specific authors.

Query the index using code:

List<BlogPost> results = session
    .Query<BlogPosts_ByCommentAuthor.IndexEntry, BlogPosts_ByCommentAuthor>()
    // Query for all blog posts that contain comments by 'Moon':
    .Where(x => x.Authors.Any(a => a == "Moon"))
    .OfType<BlogPost>()
    .ToList();

Query the index using Studio:

  • Query the index from the Studio's List of Indexes view:

    [Figure: List of Indexes view]

  • View the query results in the Query view:

    [Figure: Query View]

  • View the list of terms indexed by the Recurse method:

    [Figure: Click to View Index Terms]

    [Figure: Index Terms]