Auditing document compliance: API
Audit stored documents' compliance with a schema to identify and address any validation violations.
- You can run an audit operation that scans a document collection or a part of it and generates a validation violations report.
- You can also audit document compliance by index, validating documents during indexing and embedding any validation error messages into the indexes. The indexed documents can then be queried and managed based on their schema compliance.
Running an audit operation
Use this audit approach to examine the compliance of documents in a collection with a validation schema. You can limit the number of documents to be checked and the number of error messages returned.
This method is useful for one-time audits or periodic checks of data integrity.
To execute this audit, run StartSchemaValidationOperation. The operation will scan the documents in the specified collection, validate each document against the provided JSON schema, and produce a report of any validation errors.
StartSchemaValidationOperation
StartSchemaValidationOperation constructor:
public StartSchemaValidationOperation(Parameters parameters)
StartSchemaValidationOperation arguments:
Pass the operation a Parameters object with the following properties:
public class Parameters
{
// JSON schema definition as a string
public string SchemaDefinition { get; set; }
// Target collection to validate
public string Collection { get; set; }
// Optional: limit the number of error messages returned (default: 1024)
public int MaxErrorMessages { get; set; }
// Optional: limit the number of documents to validate (default: unlimited)
public int MaxDocumentsToValidate { get; set; }
}
StartSchemaValidationOperation results:
The operation will run in the background and return a result object containing:
- ValidatedCount: The number of documents checked
- ErrorCount: The number of documents with at least one validation failure
- Errors: A dictionary where each entry contains:
  - Key: The name of the document
  - Value: A newline-delimited string of error messages for the document
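To illustrate the shape of Errors described above, the following sketch splits each newline-delimited value into a list of individual messages. It uses a plain Dictionary<string, string> as a stand-in for result.Errors. The helper name SplitErrors and the sample message texts are hypothetical and are not part of the client API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for result.Errors: document name -> newline-delimited error messages.
// The message texts are invented for illustration only.
var errors = new Dictionary<string, string>
{
    ["orders/2-A"] = "Customer is required\nTotal must be >= 0\n",
    ["orders/3-A"] = "Total must be >= 0"
};

// Split every value into individual messages
Dictionary<string, List<string>> perDocument = SplitErrors(errors);

foreach (var entry in perDocument)
    Console.WriteLine($"{entry.Key}: {entry.Value.Count} error(s)");

// Hypothetical helper (not a RavenDB API): splits on '\n', trims stray
// whitespace such as '\r', and drops empty lines.
static Dictionary<string, List<string>> SplitErrors(IReadOnlyDictionary<string, string> raw)
{
    return raw.ToDictionary(
        entry => entry.Key,
        entry => entry.Value
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList());
}
```

Working with a list per document makes it easier to count or filter messages than repeatedly splitting the raw string.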
Example
In this example we audit the Orders collection, ensuring each order has a Customer string and a non-negative Total property. The validation results are then retrieved and summarized.
// Define a validation Schema as a string
// `Customer` must be a string
// `Total` must be a number >= 0
// Both fields must exist
string schemaDefinition = @"{
""properties"": {
""Customer"": { ""type"": ""string"" },
""Total"": { ""type"": ""number"", ""minimum"": 0 }
},
""required"": [""Customer"", ""Total""]
}";
// Store valid and invalid orders
var store = GetDocumentStore();
using (var session = store.OpenSession())
{
// A valid order
session.Store(new Order { Customer = "Alice", Total = 100 }, "orders/1-A");
// An invalid order with two errors (missing Customer, negative Total)
session.Store(new Order { Total = -50 }, "orders/2-A");
// An invalid order (negative Total)
session.Store(new Order { Customer = "Bob", Total = -10 }, "orders/3-A");
session.SaveChanges();
}
// Run the Schema Validation Operation
var parameters = new StartSchemaValidationOperation.Parameters
{
SchemaDefinition = schemaDefinition,
Collection = "Orders",
MaxErrorMessages = 10 // Optional: limit the number of error messages returned
};
var operation = await store.Maintenance.SendAsync(
new StartSchemaValidationOperation(parameters));
// Wait for the operation to complete and get the validation report
var result = await operation.WaitForCompletionAsync<ValidateSchemaResult>(TimeSpan.FromMinutes(1));
// Handle the results, e.g., print a validation summary
// The number of inspected documents, and the number of documents with errors
Console.WriteLine($"Validated: {result.ValidatedCount}, Errors: {result.ErrorCount}");
// For each document with errors, print its Id and the associated error messages
foreach (var error in result.Errors)
{
Console.WriteLine($"Document Id: {error.Key}");
foreach (var line in error.Value.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
Console.WriteLine($"Error: {line}");
}
}
// A simplified Order class
public class Order
{
public string Customer { get; set; }
public double Total { get; set; }
}
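The schema in the example above is written as a verbatim string with doubled quotes, which is easy to mistype. One possible alternative, sketched here on the assumption that System.Text.Json is available in your project (this is not something the article prescribes), is to build the same schema from objects and serialize it. The resulting string could then be assigned to SchemaDefinition.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Same rules as the example schema: Customer is a string,
// Total is a number >= 0, and both fields are required.
var schema = new Dictionary<string, object>
{
    ["properties"] = new Dictionary<string, object>
    {
        ["Customer"] = new Dictionary<string, object> { ["type"] = "string" },
        ["Total"] = new Dictionary<string, object> { ["type"] = "number", ["minimum"] = 0 }
    },
    ["required"] = new[] { "Customer", "Total" }
};

// Serialize to a JSON string; no manual quote escaping is needed
string schemaDefinition = JsonSerializer.Serialize(schema);
Console.WriteLine(schemaDefinition);
```

Building the schema this way trades a little verbosity for a guarantee that the string is well-formed JSON.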
Audit document compliance by index
Use this audit approach to continuously validate document compliance during indexing and store validation error messages in the index. The index can then be queried to filter documents by their compliance status.
This method is useful for checking data integrity in large or frequently changing datasets, where a single audit operation may be less efficient.
To validate documents by index:
Define a validation schema
Define a validation schema that the index will use to validate documents during indexing.
The index uses the schema defined in its index configuration, as shown in the example further down.
- If no schema is defined in the index configuration, the index will use the database-level schema.
- If no schema is defined either in the index configuration or at the database level, the index will not perform any validation.
Note: When the index uses a database-level schema, disabling validation at the database level will also disable auditing by the index.
However, when the index uses an index-level schema, disabling database-level validation will not disable auditing by the index.
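The precedence rules above can be summarized as a small decision function. This is only a model of the documented behavior, written to make the rules easy to check. ResolveSchema and its parameters are hypothetical names, and the server's actual implementation is not shown in this article.

```csharp
using System;

// Index-level schema is used even when database-level validation is disabled
Console.WriteLine(ResolveSchema("index schema", "database schema", false) ?? "(no validation)");

// Without an index-level schema, the database-level schema is used while enabled
Console.WriteLine(ResolveSchema(null, "database schema", true) ?? "(no validation)");

// Disabling database-level validation also disables auditing for such an index
Console.WriteLine(ResolveSchema(null, "database schema", false) ?? "(no validation)");

// Hypothetical helper: returns the schema the index would validate against,
// or null when the index performs no validation.
static string ResolveSchema(
    string indexLevelSchema,
    string databaseLevelSchema,
    bool databaseValidationEnabled)
{
    // 1. An index-level schema always wins
    if (!string.IsNullOrEmpty(indexLevelSchema))
        return indexLevelSchema;

    // 2. Otherwise fall back to the database-level schema, if validation is enabled
    if (databaseValidationEnabled && !string.IsNullOrEmpty(databaseLevelSchema))
        return databaseLevelSchema;

    // 3. No applicable schema: no validation
    return null;
}
```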
Defining an index-level validation schema:
// Define a validation Schema as a string
// `Customer` must be a string
// `Total` must be a number >= 0
// Both fields are required
string schemaDefinition = @"{
""properties"": {
""Customer"": { ""type"": ""string"" },
""Total"": { ""type"": ""number"", ""minimum"": 0 }
},
""required"": [""Customer"", ""Total""]
}";
// Associate the schema with the Orders collection
var schemaDefinitions = new IndexSchemaDefinitions
{
{ "Orders", schemaDefinition }
};
Define and execute an index
- Add to the index an Errors field to hold validation error messages.
// Create the index definition
var indexDefinition = new IndexDefinition
{
Name = "Orders_WithValidation",
Maps = { OrdersValidationMap },
SchemaDefinitions = schemaDefinitions,
Fields = new Dictionary<string, IndexFieldOptions>
{
{ "Errors", new IndexFieldOptions { Storage = FieldStorage.Yes } }
},
Type = IndexType.Map
};
- Add an index map with a call to a validation method (Schema.GetErrorsFor() for a C# static index, or schema.getErrorsFor() for a JavaScript index).
// Create a map for the index, including a call to Schema.GetErrorsFor()
const string OrdersValidationMap = @"
from doc in docs
let errors = Schema.GetErrorsFor(doc)
where errors != null && errors.Length > 0
select new
{
Id = doc.Id,
Errors = errors
}
";
- Execute the index to start validating each indexed document against the validation schema and storing any violation messages in the Errors field.
// Create the index
await store.Maintenance.SendAsync(new PutIndexesOperation(indexDefinition));
Query the index
You can now query the index to find documents by their validation status and specific errors.
// Query the index and print validation errors
using (var session = store.OpenAsyncSession())
{
// Retrieve results that contain validation errors from the index
var results = await session.Query<IndexResult>("Orders_WithValidation")
.Select(x => new
{
Id = RavenQuery.Metadata(x)["Id"] as string, // Also project the document Id
Errors = x.Errors
})
.ToListAsync();
foreach (var doc in results)
{
if (doc != null && doc.Errors is { Length: > 0 })
{
foreach (var error in doc.Errors)
Console.WriteLine($"{doc.Id} {error}");
}
else
{
Console.WriteLine("No errors or no document.");
}
}
}
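Once the query results are loaded, a typical follow-up is to see which violations are most common. The sketch below counts how many documents report each message. It runs on an in-memory list that stands in for the projected query results, and the message texts are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the projected query results: (document Id, error messages)
var results = new List<(string Id, string[] Errors)>
{
    ("orders/2-A", new[] { "Customer is required", "Total must be >= 0" }),
    ("orders/3-A", new[] { "Total must be >= 0" })
};

// Flatten to one row per (document, message), then count documents per message
var countsByMessage = results
    .SelectMany(r => r.Errors.Distinct().Select(message => (r.Id, message)))
    .GroupBy(row => row.message)
    .OrderByDescending(group => group.Count())
    .ToDictionary(group => group.Key, group => group.Count());

foreach (var entry in countsByMessage)
    Console.WriteLine($"{entry.Value} document(s): {entry.Key}");
```

The Distinct() call ensures that a document repeating the same message is counted once for that message.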
Full example
The static (C#) index version is listed first, followed by the JavaScript index version.
{
// Define a validation Schema as a string
// Customer must be a string
// Total must be a number >= 0
// These fields are required
string schemaDefinition = @"{
""properties"": {
""Customer"": { ""type"": ""string"" },
""Total"": { ""type"": ""number"", ""minimum"": 0 }
},
""required"": [""Customer"", ""Total""]
}";
// Associate the schema with the Orders collection
var schemaDefinitions = new IndexSchemaDefinitions
{
{ "Orders", schemaDefinition }
};
// Create a map for the index, including a call to Schema.GetErrorsFor()
const string OrdersValidationMap = @"
from doc in docs
let errors = Schema.GetErrorsFor(doc)
where errors != null && errors.Length > 0
select new
{
Id = doc.Id,
Errors = errors
}
";
// Create the index definition
var indexDefinition = new IndexDefinition
{
Name = "Orders_WithValidation",
Maps = { OrdersValidationMap },
SchemaDefinitions = schemaDefinitions,
Fields = new Dictionary<string, IndexFieldOptions>
{
{ "Errors", new IndexFieldOptions { Storage = FieldStorage.Yes } }
},
Type = IndexType.Map
};
var store = GetDocumentStore();
// Create valid and invalid orders
using (var session = store.OpenAsyncSession())
{
// Valid order
await session.StoreAsync(new Order { Customer = "Alice", Total = 100 }, "orders/1-A");
// Invalid order (2 errors: missing Customer, negative Total)
await session.StoreAsync(new Order { Total = -50 }, "orders/2-A");
// Invalid order (negative Total)
await session.StoreAsync(new Order { Customer = "Bob", Total = -10 }, "orders/3-A");
await session.SaveChangesAsync();
}
// Create the index
await store.Maintenance.SendAsync(new PutIndexesOperation(indexDefinition));
// Wait for the index to process all documents
await Indexes.WaitForIndexingAsync(store);
// Query the index and print validation errors
using (var session = store.OpenAsyncSession())
{
// Retrieve results that contain validation errors from the index
var results = await session.Query<IndexResult>("Orders_WithValidation")
.Select(x => new
{
Id = RavenQuery.Metadata(x)["Id"] as string, // Also project the document Id
Errors = x.Errors
})
.ToListAsync();
foreach (var doc in results)
{
if (doc != null && doc.Errors is { Length: > 0 })
{
foreach (var error in doc.Errors)
Console.WriteLine($"{doc.Id} {error}");
}
else
{
Console.WriteLine("No errors or no document.");
}
}
}
}
// A simplified Order class
public class Order
{
public string Customer { get; set; }
public double Total { get; set; }
}
public class IndexResult
{
public string[] Errors { get; set; }
}
{
// Define a validation Schema as a string
// Customer must be a string
// Total must be a number >= 0
// These fields are required
string schemaDefinition = @"{
""properties"": {
""Customer"": { ""type"": ""string"" },
""Total"": { ""type"": ""number"", ""minimum"": 0 }
},
""required"": [""Customer"", ""Total""]
}";
// Associate the schema with the Orders collection
var schemaDefinitions = new IndexSchemaDefinitions
{
{ "Orders", schemaDefinition }
};
// Create a map for the index, including a call to schema.getErrorsFor()
const string OrdersValidationMap = @"
map(""@all_docs"", (doc) => {
let errors = schema.getErrorsFor(doc);
if (errors != null && errors.length > 0)
return {
Id: id(doc),
Errors: errors
};
})";
// Create the index definition
var indexDefinition = new IndexDefinition
{
Name = "Orders_WithValidation_JS",
Maps = { OrdersValidationMap },
SchemaDefinitions = schemaDefinitions,
Fields = new Dictionary<string, IndexFieldOptions>
{
{ "Errors", new IndexFieldOptions { Storage = FieldStorage.Yes } }
},
Type = IndexType.JavaScriptMap
};
var store = GetDocumentStore();
// Create valid and invalid orders
using (var session = store.OpenAsyncSession())
{
// Valid order
await session.StoreAsync(new Order { Customer = "Alice", Total = 100 }, "orders/1-A");
// Invalid order (2 errors: missing Customer, negative Total)
await session.StoreAsync(new Order { Total = -50 }, "orders/2-A");
// Invalid order (negative Total)
await session.StoreAsync(new Order { Customer = "Bob", Total = -10 }, "orders/3-A");
await session.SaveChangesAsync();
}
// Create the index
await store.Maintenance.SendAsync(new PutIndexesOperation(indexDefinition));
// Wait for the index to process all documents
await Indexes.WaitForIndexingAsync(store);
// Query the index and print validation errors
using (var session = store.OpenAsyncSession())
{
// Retrieve results that contain validation errors from the index
var results = await session.Query<IndexResult>("Orders_WithValidation_JS")
.Select(x => new
{
Id = RavenQuery.Metadata(x)["Id"] as string, // Also project the document Id
Errors = x.Errors
})
.ToListAsync();
foreach (var doc in results)
{
if (doc != null && doc.Errors is { Length: > 0 })
{
foreach (var error in doc.Errors)
Console.WriteLine($"{doc.Id} {error}");
}
else
{
Console.WriteLine("No errors or no document.");
}
}
}
}
// A simplified Order class
public class Order
{
public string Customer { get; set; }
public double Total { get; set; }
}
public class IndexResult
{
public string[] Errors { get; set; }
}