Indexing Nested data

Sample data

  • The examples in this article are based on the following Classes and Sample Data:
public class OnlineShop
{
    public string ShopName { get; set; }
    public string Email { get; set; }
    public List<TShirt> TShirts { get; set; } // Nested data
}

public class TShirt
{
    public string Color { get; set; }
    public string Size { get; set; }
    public string Logo { get; set; }
    public decimal Price { get; set; }
    public int Sold { get; set; }
}

Simple index - Single index-entry per document

  • The index:
public class Shops_ByTShirt_Simple : AbstractIndexCreationTask<OnlineShop>
{
    public class IndexEntry
    {
        // The index-fields:
        public IEnumerable<string> Colors { get; set; }
        public IEnumerable<string> Sizes { get; set; }
        public IEnumerable<string> Logos { get; set; }
    }

    public Shops_ByTShirt_Simple()
    {
        Map = shops => from shop in shops
            // Creating a SINGLE index-entry per document:
            select new IndexEntry
            {
                // Each index-field will hold a collection of nested values from the document
                Colors = shop.TShirts.Select(x => x.Color),
                Sizes = shop.TShirts.Select(x => x.Size),
                Logos = shop.TShirts.Select(x => x.Logo)
            };
    }
}
  • The index-entries:

    Simple - index-entries

    1. The index-entries content is visible from the Studio Query view.

    2. Check option: Show raw index-entries instead of Matching documents.

    3. Each row represents an index-entry.
      The index has a single index-entry per document (3 entries in this example).

    4. The index-field contains a collection of ALL nested values from the document.
      e.g. The third index-entry has the following values in the Colors index-field:
      {"black", "blue", "red"}

  • Querying the index:

// Query for all shop documents that have a red TShirt
var shopsThatHaveRedShirts = session
    .Query<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
    // Filter query results by a nested value
    .Where(x => x.Colors.Contains("red"))
    .OfType<OnlineShop>()
    .ToList();

// Results will include the following shop documents:
// ==================================================
// * Shop1
// * Shop3
  • When to use:

    • This type of index structure is effective for retrieving documents when filtering the query by any of the inner nested values that were indexed.

    • However, because each index-field holds a flat collection of values, the index-entry no longer records which values belonged to the same sub-object. This index therefore cannot correctly answer a query that requires a single sub-object to satisfy several conditions at once (an AND condition).
      For example:

// You want to query for shops containing "Large Green TShirts",
// aiming to get only "Shop1" as a result since it has such a combination,
// so you attempt this query:
var greenAndLarge = session
    .Query<Shops_ByTShirt_Simple.IndexEntry, Shops_ByTShirt_Simple>()
    .Where(x => x.Colors.Contains("green") && x.Sizes.Contains("L"))
    .OfType<OnlineShop>()
    .ToList();

// But the results of this query will include BOTH "Shop1" & "Shop2",
// since the index-entries do not keep the original sub-objects structure.
  • To address this, you must use a Fanout index - as described below.
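
The false positive described above can be reproduced without a server. The following is a minimal in-memory sketch using LINQ to Objects, not RavenDB's indexing engine. The nested `Shirt` and `Shop` records and the two shops returned by `SampleShops` are hypothetical stand-ins, not the article's sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenedIndexDemo
{
    // Hypothetical, trimmed stand-ins for TShirt and OnlineShop.
    public record Shirt(string Color, string Size);
    public record Shop(string Name, List<Shirt> Shirts);

    // Hypothetical data: Shop1 sells a green L shirt;
    // Shop2 sells "green" and "L", but on two different shirts.
    public static List<Shop> SampleShops() => new List<Shop>
    {
        new Shop("Shop1", new List<Shirt> { new Shirt("red", "M"), new Shirt("green", "L") }),
        new Shop("Shop2", new List<Shirt> { new Shirt("green", "M"), new Shirt("black", "L") }),
    };

    // Mimics the simple index: one entry per shop, each field a flat collection.
    public static List<string> MatchFlattened(IEnumerable<Shop> shops, string color, string size) =>
        shops
            .Select(s => new
            {
                s.Name,
                Colors = s.Shirts.Select(t => t.Color).ToList(),
                Sizes = s.Shirts.Select(t => t.Size).ToList()
            })
            // The two conditions are checked against separate collections,
            // so they may be satisfied by two different shirts.
            .Where(e => e.Colors.Contains(color) && e.Sizes.Contains(size))
            .Select(e => e.Name)
            .ToList();
}
```

Calling `FlattenedIndexDemo.MatchFlattened(FlattenedIndexDemo.SampleShops(), "green", "L")` returns both shop names, even though only Shop1 holds a shirt that is green and large at the same time.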

Fanout index - Multiple index-entries per document

  • What is a Fanout index:

    • A fanout index is an index that outputs multiple index-entries per document.
      A separate index-entry is created for each nested sub-object from the document.

    • The fanout index is useful when a query must match individual sub-objects,
      i.e. when several conditions have to hold for the same nested item.

  • Fanout index - Map index example:

// A fanout map-index:
// ===================
public class Shops_ByTShirt_Fanout : AbstractIndexCreationTask<OnlineShop>
{
    public class IndexEntry
    {
        // The index-fields:
        public string Color { get; set; }
        public string Size { get; set; }
        public string Logo { get; set; }
    }

    public Shops_ByTShirt_Fanout()
    {
        Map = shops =>
            from shop in shops
            from shirt in shop.TShirts
            // Creating MULTIPLE index-entries per document,
            // an index-entry for each sub-object in the TShirts list
            select new IndexEntry
            {
                Color = shirt.Color,
                Size = shirt.Size,
                Logo = shirt.Logo
            };
    }
}

// Query the fanout index:
// =======================
var shopsThatHaveMediumRedShirts = session
    .Query<Shops_ByTShirt_Fanout.IndexEntry, Shops_ByTShirt_Fanout>()
    // Query for documents that have a "Medium Red TShirt"
    .Where(x => x.Color == "red" && x.Size == "M")
    .OfType<OnlineShop>()
    .ToList();

// Query results:
// ==============

// Only the 'Shop1' document will be returned,
// since it is the only document that has the requested combination within the TShirt list.
  • The index-entries:

    Fanout - index-entries

    1. The index-entries content is visible from the Studio Query view.

    2. Check option: Show raw index-entries instead of Matching documents.

    3. Each row represents an index-entry.
      Each index-entry corresponds to an inner item in the TShirt list.

    4. In this example, the total number of index-entries is 12,
      which is the total number of inner items in the TShirt list in all 3 documents in the collection.
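
The relationship between nested items and index-entries can be sketched in memory. This is an illustration with LINQ to Objects under stated assumptions, not RavenDB's implementation: the `Shirt`, `Shop`, and `Entry` records and the two shops are hypothetical, and each entry keeps the name of its source shop in place of the document reference the server maintains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FanoutEntriesDemo
{
    // Hypothetical, trimmed stand-ins for TShirt, OnlineShop and an index-entry.
    public record Shirt(string Color, string Size);
    public record Shop(string Name, List<Shirt> Shirts);
    public record Entry(string ShopName, string Color, string Size);

    // Hypothetical data: two shops with two shirts each.
    public static List<Shop> SampleShops() => new List<Shop>
    {
        new Shop("Shop1", new List<Shirt> { new Shirt("red", "M"), new Shirt("green", "L") }),
        new Shop("Shop2", new List<Shirt> { new Shirt("green", "M"), new Shirt("black", "L") }),
    };

    // Fanout: one entry per nested shirt, so the entry count equals
    // the total number of shirts across all shops.
    public static List<Entry> BuildEntries(IEnumerable<Shop> shops) =>
        (from shop in shops
         from shirt in shop.Shirts
         select new Entry(shop.Name, shirt.Color, shirt.Size)).ToList();

    // Both conditions are tested on the same entry, i.e. on the same shirt.
    public static List<string> Match(IEnumerable<Entry> entries, string color, string size) =>
        entries
            .Where(e => e.Color == color && e.Size == size)
            .Select(e => e.ShopName)
            .Distinct()
            .ToList();
}
```

With this data, `BuildEntries` yields 4 entries (2 + 2), and matching on "green" and "L" returns only Shop1, because each entry carries the values of exactly one shirt.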

  • Fanout index - Map-Reduce index example:

    • The fanout index concept applies to map-reduce indexes as well:
// A fanout map-reduce index:
// ==========================
public class Sales_ByTShirtColor_Fanout :
    AbstractIndexCreationTask<OnlineShop, Sales_ByTShirtColor_Fanout.IndexEntry>
{
    public class IndexEntry
    {
        // The index-fields:
        public string Color { get; set; }
        public int ItemsSold { get; set; }
        public decimal TotalSales { get; set; }
    }

    public Sales_ByTShirtColor_Fanout()
    {
        Map = shops =>
            from shop in shops
            from shirt in shop.TShirts
            // Creating MULTIPLE index-entries per document,
            // an index-entry for each sub-object in the TShirts list
            select new IndexEntry
            {
                Color = shirt.Color,
                ItemsSold = shirt.Sold,
                TotalSales = shirt.Price * shirt.Sold
            };

        Reduce = results => from result in results
            group result by result.Color
            into g
            select new
            {
                // Calculate sales per color
                Color = g.Key,
                ItemsSold = g.Sum(x => x.ItemsSold),
                TotalSales = g.Sum(x => x.TotalSales)
            };
    }
}

// Query the fanout index:
// =======================
var queryResult = session
    .Query<Sales_ByTShirtColor_Fanout.IndexEntry, Sales_ByTShirtColor_Fanout>()
    // Query for index-entries that contain "black"
    .Where(x => x.Color == "black")
    .FirstOrDefault();

// Get total sales for black TShirts
var blackShirtsSales = queryResult?.TotalSales ?? 0;

// Query results:
// ==============

// With the sample data used in this article,
// the total sales revenue from black TShirts sold (in all shops) is 490.0
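
The arithmetic behind the map and reduce steps can be followed in memory. The sketch below uses LINQ to Objects and is not how RavenDB executes the index; the `Shirt`, `Shop`, and `ColorSales` records and the data in `SampleShops` are hypothetical and deliberately differ from the article's sample data, so the totals differ as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SalesByColorDemo
{
    // Hypothetical, trimmed stand-ins for TShirt, OnlineShop and the reduce output.
    public record Shirt(string Color, decimal Price, int Sold);
    public record Shop(string Name, List<Shirt> Shirts);
    public record ColorSales(string Color, int ItemsSold, decimal TotalSales);

    // Hypothetical data, NOT the article's sample data.
    public static List<Shop> SampleShops() => new List<Shop>
    {
        new Shop("ShopA", new List<Shirt> { new Shirt("black", 10m, 2), new Shirt("red", 20m, 1) }),
        new Shop("ShopB", new List<Shirt> { new Shirt("black", 15m, 4) }),
    };

    public static List<ColorSales> Compute(IEnumerable<Shop> shops) =>
        shops
            // Map (fanout): one intermediate result per nested shirt.
            .SelectMany(shop => shop.Shirts)
            .Select(shirt => new ColorSales(shirt.Color, shirt.Sold, shirt.Price * shirt.Sold))
            // Reduce: group by color and sum both counters.
            .GroupBy(e => e.Color)
            .Select(g => new ColorSales(g.Key, g.Sum(x => x.ItemsSold), g.Sum(x => x.TotalSales)))
            .OrderBy(r => r.Color, StringComparer.Ordinal)
            .ToList();
}
```

For the hypothetical data, black shirts total 6 items sold and 80 in sales (10 × 2 + 15 × 4), and red shirts total 1 item and 20.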
  • Fanout index - Performance hints:

    • Fanout indexes are typically more resource-intensive than other indexes as RavenDB has to index a large number of index-entries. This increased workload can lead to higher CPU and memory utilization, potentially causing a decline in the overall performance of the index.

    • When the number of index-entries generated from a single document exceeds a configurable limit,
      RavenDB will issue a High indexing fanout ratio alert in the Studio notification center.

    • You can control when this performance hint is created by setting the PerformanceHints.Indexing.MaxIndexOutputsPerDocument configuration key (default is 1024).

    • So, for example, adding another OnlineShop document whose TShirts list contains 1025 items
      will trigger the following alert:

      Figure 1. High indexing fanout ratio notification

    • Clicking the 'Details' button will show the following info:

      Figure 2. Fanout index, performance hint details
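
The threshold rule described above amounts to a per-document comparison. The following sketch only illustrates that comparison and is not RavenDB code: `DocumentsOverLimit` and the document ids are hypothetical, and the constant simply mirrors the default value quoted for PerformanceHints.Indexing.MaxIndexOutputsPerDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FanoutHintDemo
{
    // Mirrors the default quoted for PerformanceHints.Indexing.MaxIndexOutputsPerDocument.
    public const int DefaultMaxOutputsPerDocument = 1024;

    // Hypothetical helper: returns the ids of documents whose number of
    // index-entries exceeds the limit (strictly greater than).
    public static List<string> DocumentsOverLimit(
        IReadOnlyDictionary<string, int> outputsPerDocument,
        int limit = DefaultMaxOutputsPerDocument) =>
        outputsPerDocument
            .Where(kv => kv.Value > limit)
            .Select(kv => kv.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
}
```

A document that produces exactly 1024 index-entries stays under the default threshold in this sketch, while one that produces 1025 is reported, matching the example in the list above.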

  • Fanout index - Paging:

    • A fanout index typically has more index-entries than there are documents in the indexed collection.
      Multiple index-entries "point" to the same document from which they originated,
      as can be seen in the above index-entries example.

    • When a fanout index query returns full documents (that is, without projecting results),
      the TotalResults property (available via the QueryStatistics object) contains
      the total number of index-entries, not the total number of resulting documents.

    • To overcome this when paging results, you must take into account the number of "duplicate"
      index-entries that are skipped internally by the server when serving the resulting documents.

    • Please refer to paging through tampered results for further explanation and examples.
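
The paging adjustment can be sketched with an in-memory stand-in for the server. Everything here is an assumption made for illustration: `QueryPage` is a hypothetical simulation that skips entries pointing to an already-served document and reports how many it skipped, which is the role the skipped-results counter on QueryStatistics is assumed to play in a real query. The client loop then adds the accumulated skip count to its start position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FanoutPagingDemo
{
    // Hypothetical server stand-in. Each string is the id of the document that
    // produced one index-entry, so ids repeat for fanned-out documents.
    public static (List<string> Docs, int Skipped) QueryPage(
        IReadOnlyList<string> entries, int start, int pageSize)
    {
        // Documents reachable from entries before 'start' count as already served.
        var seen = new HashSet<string>(entries.Take(start));
        var docs = new List<string>();
        int skipped = 0;

        for (int i = start; i < entries.Count && docs.Count < pageSize; i++)
        {
            if (seen.Add(entries[i]))
                docs.Add(entries[i]);   // first entry seen for this document
            else
                skipped++;              // duplicate entry, skipped internally
        }
        return (docs, skipped);
    }

    // Client side: offset each page by the number of entries skipped so far.
    public static List<string> ReadAllPages(IReadOnlyList<string> entries, int pageSize)
    {
        var all = new List<string>();
        int pageNumber = 0, skippedTotal = 0;
        List<string> page;
        do
        {
            int start = pageNumber * pageSize + skippedTotal;
            int skipped;
            (page, skipped) = QueryPage(entries, start, pageSize);
            all.AddRange(page);
            skippedTotal += skipped;
            pageNumber++;
        } while (page.Count > 0);
        return all;
    }
}
```

For the entry sequence A, A, B, A, C, B, D and a page size of 2, the loop requests start positions 0, 3, and 7 and collects each of the four documents exactly once. Paging by `pageNumber * pageSize` alone would restart at position 2 and scan entries that the first page had already consumed.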