Skip to main content

Sharding: Indexing

Indexing in a sharded database

  • The same index definition is deployed across the database to all shards.
    However, each shard indexes only its own local data - there is no cross-shard indexing process.
    Each shard executes the index definition independently on the documents it stores locally.

  • As a result, each shard maintains its own local index entries for the data stored on that shard.
    There is no indexing stage that reads documents from multiple shards and builds a single shared index.

  • Querying a sharded index is coordinated by the orchestrator, which combines results from all shards.
    The orchestrator is a RavenDB server that mediates all communication between the client and the database shards.
    Learn more in Client-server communication.

Map-Reduce indexes in a sharded database

Map-reduce indexes in a sharded database work in two stages:

  1. At indexing time:
    During indexing, each shard maps and reduces only the documents it stores locally,
    just as a non-sharded database reduces its local data.
  2. At query time:
    When a query uses a map-reduce index, the orchestrator distributes the query to the shards,
    gathers the partial reduce results returned from each shard, and reduces them to produce the final query result.
    The data retrieved from the shards depends on the query shape.
    See order by and limit in a Map-Reduce query for details.

Learn more about querying map-reduce indexes in a sharded database in Sharding: querying.

Unsupported indexing features

Unsupported or not-yet-implemented indexing features include:

  • Custom sorters:
    Custom sorters are not supported in a sharded database.

  • Rolling index deployment:
    Rolling index deployment is not supported in a sharded database.

  • Outputting Map-Reduce results to a collection:
    Outputting map-reduce index results to an artificial documents collection is not supported in a sharded database.

  • Loading a document from another shard:
    Loading a document during indexing is possible only if the document resides on the same shard where the index is running. If the requested document is stored on a different shard, LoadDocument will return null.

    For example, consider the following index, which attempts to load a related Category document.
    To ensure that all documents are properly indexed - including those whose related document resides on another shard - handle this null case explicitly in your index definition, as shown below:

    public class Products_ByCategoryName :
    AbstractIndexCreationTask<Product, Products_ByCategoryName.IndexEntry>
    {
    public class IndexEntry
    {
    public string CategoryName { get; set; }
    }

    public Products_ByCategoryName()
    {
    Map = products =>
    from product in products
    // In a sharded database, LoadDocument returns null
    // if the related document resides on a different shard.
    let category = LoadDocument<Category>(product.Category)
    select new IndexEntry
    {
    // Handle the null case explicitly:
    CategoryName = category != null ? category.Name : null
    };
    }
    }

    Why the explicit null check matters:

    Without the explicit null check (e.g., assigning category.Name directly to CategoryName),
    RavenDB treats the resulting null as an implicit null and omits the field entirely from the index entry.
    Products whose category resides on another shard would then be missing the CategoryName field in the index,
    making them invisible to queries that filter on this field (including where CategoryName == null).

    Using category != null ? category.Name : null stores an explicit null in the index entry,
    keeping those products queryable.

    Storing documents in the same shard:

    You can make sure related documents are stored in the same bucket, and therefore on the same shard,
    by using the $ syntax. Learn more in Anchoring documents to a bucket.

In this article