Indexing Hierarchical Data

One of the greatest advantages of a document database is that it places very few limits on how we structure our data. A very common scenario is the use of hierarchical data structures, the simplest of which is the comment thread:

public class BlogPost
{
    public string Author { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }

    // Blog post readers can leave comments
    public List<BlogPostComment> Comments { get; set; }
}

public class BlogPostComment
{
    public string Author { get; set; }
    public string Text { get; set; }

    // Comments can be left recursively
    public List<BlogPostComment> Comments { get; set; }
}

While such a structure is very easy to work with, it does raise an interesting question: how can we search for all blog posts that were commented on by a specific author?

The answer is that RavenDB has built-in support for indexing hierarchies. You can use the Recurse method to define an index with the following syntax:

public class BlogPosts_ByCommentAuthor : AbstractIndexCreationTask<BlogPost>
{
    public class Result
    {
        public string[] Authors { get; set; }
    }

    public BlogPosts_ByCommentAuthor()
    {
        Map = posts => from post in posts
                       select new
                       {
                           Authors = Recurse(post, x => x.Comments).Select(x => x.Author)
                       };
    }
}

This will index all the comments in the thread, regardless of their depth in the hierarchy. The index can then be queried by comment author:

IList<BlogPost> results = session
    .Query<BlogPosts_ByCommentAuthor.Result, BlogPosts_ByCommentAuthor>()
    .Where(x => x.Authors.Any(a => a == "Ayende Rahien"))
    .OfType<BlogPost>()
    .ToList();
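
To make "regardless of their location in the hierarchy" concrete, the following is a minimal plain C# sketch of collecting comment authors at every depth of a thread. It does not use RavenDB and is not how Recurse is implemented; the Flatten helper, the CommentThread class, and the sample authors and text are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BlogPostComment
{
    public string Author { get; set; }
    public string Text { get; set; }
    public List<BlogPostComment> Comments { get; set; } = new List<BlogPostComment>();
}

public class BlogPost
{
    public string Author { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }
    public List<BlogPostComment> Comments { get; set; } = new List<BlogPostComment>();
}

public static class CommentThread
{
    // Hypothetical helper (not a RavenDB API): yields every comment in the
    // given list and all of its nested replies, using an explicit stack so
    // that deep threads do not rely on recursion.
    public static IEnumerable<BlogPostComment> Flatten(IEnumerable<BlogPostComment> comments)
    {
        var pending = new Stack<BlogPostComment>(comments ?? Enumerable.Empty<BlogPostComment>());
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            yield return current;

            foreach (var reply in current.Comments ?? Enumerable.Empty<BlogPostComment>())
                pending.Push(reply);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data: the author we search for appears two levels deep.
        var post = new BlogPost
        {
            Author = "Alice",
            Title = "Sample post",
            Comments =
            {
                new BlogPostComment
                {
                    Author = "Bob",
                    Text = "First!",
                    Comments =
                    {
                        new BlogPostComment { Author = "Ayende Rahien", Text = "A nested reply" }
                    }
                },
                new BlogPostComment { Author = "Carol", Text = "Nice post" }
            }
        };

        // Authors of all comments, whatever their nesting level.
        var authors = CommentThread.Flatten(post.Comments)
            .Select(c => c.Author)
            .Distinct()
            .ToList();

        Console.WriteLine(authors.Contains("Ayende Rahien")); // prints True
    }
}
```

The Authors field of the index plays a similar role to the authors list here: it holds the author of each comment in the thread, so a nested reply is matched just like a top-level comment.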