Indexing Attachments
- Indexing attachments allows you to query for documents based on their attachments' details and content.
- Static indexes:
  - Local attachments - Both attachment details and content can be indexed.
  - Remote attachments - Only attachment details can be indexed.
    By design, remote attachment content is not indexed: downloading files from cloud storage during indexing would hurt performance.
- Auto-indexes:
  Auto-indexing attachments via dynamic queries is not available at this time.
Index attachment details
The index:
- To index attachment details, call AttachmentsFor() within the index definition.
- AttachmentsFor() provides access to the name, size, hash, and content-type of each attachment the document has. These details can be used to define the index-fields. Once the index is deployed, you can query the index to find Employee documents based on these attachment properties.
- Indexing attachment details is supported for both local and remote attachments.
  To index attachment content, see the examples below.
- LINQ_index
- JS_index
public class Employees_ByAttachmentDetails :
AbstractIndexCreationTask<Employee, Employees_ByAttachmentDetails.IndexEntry>
{
public class IndexEntry
{
// The index fields:
public string EmployeeName { get; set; }
public string[] AttachmentNames { get; set; }
public string[] AttachmentContentTypes { get; set; }
public long[] AttachmentSizes { get; set; }
}
public Employees_ByAttachmentDetails()
{
Map = employees => from employee in employees
// Call 'AttachmentsFor' to get the attachment details
let attachments = AttachmentsFor(employee)
select new IndexEntry()
{
// Can index info from document properties:
EmployeeName = employee.FirstName + " " + employee.LastName,
// Index DETAILS of attachments:
AttachmentNames = attachments.Select(x => x.Name).ToArray(),
AttachmentContentTypes = attachments.Select(x => x.ContentType).ToArray(),
AttachmentSizes = attachments.Select(x => x.Size).ToArray()
};
}
}
public class Employees_ByAttachmentDetails_JS : AbstractJavaScriptIndexCreationTask
{
public Employees_ByAttachmentDetails_JS()
{
Maps = new HashSet<string>
{
@"map('Employees', function (employee) {
var attachments = attachmentsFor(employee);
return {
EmployeeName: employee.FirstName + ' ' + employee.LastName,
AttachmentNames: attachments.map(function(attachment) {
return attachment.Name;
}),
AttachmentContentTypes: attachments.map(function(attachment) {
return attachment.ContentType;
}),
AttachmentSizes: attachments.map(function(attachment) {
return attachment.Size;
})
};
})"
};
}
}
Query the Index:
You can now query for Employee documents based on their attachment details.
- Query
- Query_async
- DocumentQuery
- RQL
List<Employee> employees = session
// Query the index for matching employees
.Query<Employees_ByAttachmentDetails.IndexEntry, Employees_ByAttachmentDetails>()
// Filter employee results by their attachment details
.Where(x => x.AttachmentNames.Contains("photo.jpg"))
.Where(x => x.AttachmentSizes.Any(size => size > 20_000))
// Return matching Employee docs
.OfType<Employee>()
.ToList();
// Results:
// ========
// Running this query on the Northwind sample data,
// results will include 'employees/4-A' and 'employees/5-A'.
// These 2 documents contain an attachment by name 'photo.jpg' with a matching size.
List<Employee> employees = await asyncSession
.Query<Employees_ByAttachmentDetails.IndexEntry, Employees_ByAttachmentDetails>()
.Where(x => x.AttachmentNames.Contains("photo.jpg"))
.Where(x => x.AttachmentSizes.Any(size => size > 20_000))
.OfType<Employee>()
.ToListAsync();
List<Employee> employees = session.Advanced
.DocumentQuery<Employees_ByAttachmentDetails.IndexEntry, Employees_ByAttachmentDetails>()
.WhereEquals("AttachmentNames", "photo.jpg")
.WhereGreaterThan("AttachmentSizes", 20_000)
.OfType<Employee>()
.ToList();
from index "Employees/ByAttachmentDetails"
where AttachmentNames == "photo.jpg" and AttachmentSizes > 20000
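The two conditions in the queries above are each checked against their own multi-value index field. The following is a minimal sketch of that behavior using plain in-memory LINQ rather than RavenDB itself; the document IDs, attachment names, and sizes are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical index entries (not RavenDB types): one entry per document,
// with the attachment names and sizes kept in separate arrays.
var entries = new[]
{
    (Id: "doc/1", Names: new[] { "photo.jpg" }, Sizes: new[] { 25_000L }),
    (Id: "doc/2", Names: new[] { "photo.jpg", "scan.pdf" }, Sizes: new[] { 5_000L, 90_000L }),
    (Id: "doc/3", Names: new[] { "photo.jpg" }, Sizes: new[] { 5_000L }),
};

// Same shape as the query above: one condition on names, one on sizes.
var matches = entries
    .Where(e => e.Names.Contains("photo.jpg"))
    .Where(e => e.Sizes.Any(size => size > 20_000))
    .Select(e => e.Id)
    .ToList();

// Prints "doc/1, doc/2"
Console.WriteLine(string.Join(", ", matches));
```

In this sketch 'doc/2' matches even though its 'photo.jpg' is small, because the size condition is satisfied by a different attachment. When the name and size must belong to the same attachment, an index that produces one entry per attachment, such as the fanout index shown later on this page, is one option.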
Index details & content - for specific attachment name
Indexing attachment content is supported only for local attachments; it is not supported for remote attachments.
Sample data:
- Each Employee document in RavenDB's sample data already includes a local attachment - photo.jpg.
- For the following examples, let's store a local textual attachment (file notes.txt) on 3 documents in the 'Employees' collection.
// Create some sample attachments:
for (var i = 1; i <= 3; i++)
{
    var id = $"employees/{i}-A";

    // Load an employee document:
    var employee = session.Load<Employee>(id);
    if (employee?.Notes == null || employee.Notes.Count == 0)
        continue;

    // Store the employee's notes as an attachment on the document:
    byte[] bytes = System.Text.Encoding.UTF8.GetBytes(employee.Notes[0]);
    using (var stream = new MemoryStream(bytes))
    {
        session.Advanced.Attachments.Store(id, "notes.txt", stream, "text/plain");

        // SaveChanges is called before the stream is disposed
        session.SaveChanges();
    }
}
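The sample above encodes the notes as UTF-8 before storing them, and GetContentAsString() has an overload that accepts an Encoding (see Syntax). As a minimal sketch, independent of RavenDB, the following shows why the encoding used to read text should match the one used to write it. The sample text and variable names are made up for illustration.

```csharp
using System;
using System.Text;

// Made-up sample text containing non-ASCII characters.
string note = "Café in Zürich";

// Encode the text the same way the sample above does.
byte[] stored = Encoding.UTF8.GetBytes(note);

// Decoding with the same encoding restores the original text.
string roundTripped = Encoding.UTF8.GetString(stored);

// Decoding with a different encoding garbles the non-ASCII characters.
string mismatched = Encoding.Latin1.GetString(stored);

Console.WriteLine(roundTripped == note); // True
Console.WriteLine(mismatched == note);   // False
```

Plain ASCII text decodes identically under both encodings, so a mismatch only becomes visible once the content includes characters outside that range.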
The index:
- Call LoadAttachment() within the index definition to access both the details and content of a specific attachment.
- Access to the content is available only for LOCAL attachments via methods GetContentAsString() or GetContentAsStream(), as shown in the example below.
  The content can be indexed just like the attachment details.
- Calling these "get content" methods on a REMOTE attachment is not supported. In such cases, the index will enter an error state, and a RemoteAttachmentIndexingException will appear in the index errors. To avoid this, always check whether the attachment is local before accessing its content, as demonstrated below.
- LINQ_index
- JS_index
public class Employees_ByAttachment :
AbstractIndexCreationTask<Employee, Employees_ByAttachment.IndexEntry>
{
public class IndexEntry
{
// The index fields:
public string AttachmentName { get; set; }
public string AttachmentContentType { get; set; }
public long AttachmentSize { get; set; }
public string AttachmentContent { get; set; }
}
public Employees_ByAttachment()
{
Map = employees =>
from employee in employees
// Call 'LoadAttachment' to get attachment's details and content
// Pass the attachment name, e.g. "notes.txt"
let attachment = LoadAttachment(employee, "notes.txt")
// Check whether the attachment is stored locally
// 'RemoteAttachmentFlags.None' indicates a LOCAL attachment
let isLocal = attachment.RemoteFlags == RemoteAttachmentFlags.None
select new IndexEntry()
{
// Index attachment DETAILS (available for both LOCAL and REMOTE attachments):
AttachmentName = attachment.Name,
AttachmentContentType = attachment.ContentType,
AttachmentSize = attachment.Size,
// Index attachment CONTENT (available only for LOCAL attachments):
// Call 'GetContentAsString' to extract the content
AttachmentContent = isLocal ? attachment.GetContentAsString() : null
};
// It can be useful to configure Full-Text search on the AttachmentContent index-field
Index(x => x.AttachmentContent, FieldIndexing.Search);
// This index processes Employee documents.
// It allows querying these documents based on the "notes.txt" attachment details & content:
// * Attachment details (available for both local and remote attachments)
// * Attachment content (available for local attachments only)
}
}
public class Employees_ByAttachment_JS : AbstractJavaScriptIndexCreationTask
{
public Employees_ByAttachment_JS()
{
Maps = new HashSet<string>
{
@"map('Employees', function (employee) {
var attachment = loadAttachment(employee, 'notes.txt');
var isLocal = attachment.RemoteFlags === 'None';
return {
AttachmentName: attachment.Name,
AttachmentContentType: attachment.ContentType,
AttachmentSize: attachment.Size,
AttachmentContent: isLocal ? attachment.getContentAsString() : null
};
})"
};
Fields = new Dictionary<string, IndexFieldOptions>
{
{
"AttachmentContent", new IndexFieldOptions
{
Indexing = FieldIndexing.Search
}
}
};
}
}
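The 'isLocal ? ... : null' guard used in both index definitions above works because the conditional operator only evaluates the branch it selects. A minimal sketch of that idea, outside RavenDB: ContentOrNull and the two delegates are hypothetical stand-ins, and the simulated failure is not the real RemoteAttachmentIndexingException.

```csharp
using System;

// Hypothetical helper mirroring the guard used in the index maps above.
static string ContentOrNull(bool isLocal, Func<string> getContent) =>
    isLocal ? getContent() : null;

// Stand-in for a local attachment: its content can be read.
Func<string> localContent = () => "sample notes text";

// Stand-in for a remote attachment: reading its content is simulated as a failure.
Func<string> remoteContent = () =>
    throw new InvalidOperationException("content is not available locally");

string fromLocal = ContentOrNull(true, localContent);
string fromRemote = ContentOrNull(false, remoteContent);

Console.WriteLine(fromLocal ?? "<null>");  // sample notes text
Console.WriteLine(fromRemote ?? "<null>"); // <null>
```

The failing delegate is never invoked for the remote case, which is the same reason the guarded call in the index does not touch the content of a remote attachment.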
Query the Index:
You can now query for Employee documents based on the attachment's details and/or content.
- Query
- Query_async
- DocumentQuery
- RQL
List<Employee> employees = session
// Query the index for matching employees
.Query<Employees_ByAttachment.IndexEntry, Employees_ByAttachment>()
// Can make a full-text search
// Looking for employees with an attachment content that contains 'Colorado' OR 'Dallas'
.Search(x => x.AttachmentContent, "Colorado Dallas")
.OfType<Employee>()
.ToList();
// Results:
// ========
// Results will include 'employees/1-A' and 'employees/2-A'.
// Only these 2 documents have an attachment by name 'notes.txt'
// that contains either 'Colorado' or 'Dallas'.
List<Employee> employees = await asyncSession
// Query the index for matching employees
.Query<Employees_ByAttachment.IndexEntry, Employees_ByAttachment>()
// Can make a full-text search
// Looking for employees with an attachment content that contains 'Colorado' OR 'Dallas'
.Search(x => x.AttachmentContent, "Colorado Dallas")
.OfType<Employee>()
.ToListAsync();
List<Employee> employees = session.Advanced
.DocumentQuery<Employees_ByAttachment.IndexEntry, Employees_ByAttachment>()
.Search(x => x.AttachmentContent, "Colorado Dallas")
.OfType<Employee>()
.ToList();
from index "Employees/ByAttachment"
where search(AttachmentContent, "Colorado Dallas")
Index details & content - for all attachments
The index:
- Call LoadAttachments() within the index definition to index the details & content of ALL attachments.
- Access to the content is available only for LOCAL attachments, and it can be indexed just like the details.
- Note that the index example below uses the Fanout index pattern: it generates an index-entry for each attachment of a document.
- LINQ_index
- JS_index
public class Employees_ByAllAttachments :
AbstractIndexCreationTask<Employee, Employees_ByAllAttachments.IndexEntry>
{
public class IndexEntry
{
// The index fields:
public string AttachmentName { get; set; }
public string AttachmentContentType { get; set; }
public long AttachmentSize { get; set; }
public string AttachmentContent { get; set; }
}
public Employees_ByAllAttachments()
{
Map = employees =>
// Call 'LoadAttachments' to get details and content for ALL attachments
from employee in employees
from attachment in LoadAttachments(employee)
// This will be a FANOUT index -
// the index will generate an index-entry for each attachment per document
// Check whether the attachment is stored locally
// 'RemoteAttachmentFlags.None' indicates a LOCAL attachment
let isLocal = attachment.RemoteFlags == RemoteAttachmentFlags.None
select new IndexEntry
{
// Index attachment DETAILS (available for both LOCAL and REMOTE attachments):
AttachmentName = attachment.Name,
AttachmentContentType = attachment.ContentType,
AttachmentSize = attachment.Size,
// Index attachment CONTENT (available only for LOCAL attachments):
// Call 'GetContentAsString' to extract the content
AttachmentContent = isLocal ? attachment.GetContentAsString() : null
};
// It can be useful to configure Full-Text search on the AttachmentContent index-field
Index(x => x.AttachmentContent, FieldIndexing.Search);
}
}
public class Employees_ByAllAttachments_JS : AbstractJavaScriptIndexCreationTask
{
    public Employees_ByAllAttachments_JS()
    {
        Maps = new HashSet<string>
        {
            @"map('Employees', function (employee) {
                const allAttachments = loadAttachments(employee);
                return allAttachments.map(function (attachment) {
                    var isLocal = attachment.RemoteFlags === 'None';
                    return {
                        AttachmentName: attachment.Name,
                        AttachmentContentType: attachment.ContentType,
                        AttachmentSize: attachment.Size,
                        AttachmentContent: isLocal ? attachment.getContentAsString() : null
                    };
                });
            })"
        };

        // The field name must match the name used in the map above
        Fields = new Dictionary<string, IndexFieldOptions>
        {
            {
                "AttachmentContent", new IndexFieldOptions
                {
                    Indexing = FieldIndexing.Search
                }
            }
        };
    }
}
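The fanout behavior of the maps above can be pictured with plain LINQ over in-memory data. The sketch below is not RavenDB code: the tuples, IDs, and sizes are made-up stand-ins, and it only illustrates how a second 'from' clause yields one entry per attachment.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for documents and their attachments.
var employees = new[]
{
    (Id: "employees/1-A", Attachments: new[]
    {
        (Name: "photo.jpg", Size: 18_000L),
        (Name: "notes.txt", Size: 300L),
    }),
    (Id: "employees/4-A", Attachments: new[]
    {
        (Name: "photo.jpg", Size: 25_000L),
    }),
};

// Same shape as the LINQ map above: one entry per (document, attachment) pair.
var entries =
    (from employee in employees
     from attachment in employee.Attachments
     select (employee.Id, AttachmentName: attachment.Name, AttachmentSize: attachment.Size))
    .ToList();

foreach (var entry in entries)
    Console.WriteLine($"{entry.Id}: {entry.AttachmentName} ({entry.AttachmentSize} bytes)");

// Several entries can point at the same document, so matches are collapsed per document.
var matchingIds = entries
    .Where(e => e.AttachmentSize > 20_000)
    .Select(e => e.Id)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", matchingIds));
```

Two documents produce three entries here. The number of entries grows with the number of attachments per document, which is worth keeping in mind for documents that carry many attachments.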
Query the Index:
- Query
- Query_async
- DocumentQuery
- RQL
// Query the index for matching employees
List<Employee> employees = session
.Query<Employees_ByAllAttachments.IndexEntry, Employees_ByAllAttachments>()
// Filter employee results by their attachments details and content:
// Using 'SearchOptions.Or' combines the full-text search on 'AttachmentContent'
// with the following 'Where' condition using OR logic.
.Search(x => x.AttachmentContent, "Colorado Dallas", options: SearchOptions.Or)
.Where(x => x.AttachmentSize > 20_000)
.OfType<Employee>()
.ToList();
// Results:
// ========
// Results will include:
// 'employees/1-A' and 'employees/2-A' that match the content criteria
// 'employees/4-A' and 'employees/5-A' that match the size criteria
List<Employee> employees = await asyncSession
.Query<Employees_ByAllAttachments.IndexEntry, Employees_ByAllAttachments>()
.Search(x => x.AttachmentContent, "Colorado Dallas", options: SearchOptions.Or)
.Where(x => x.AttachmentSize > 20_000)
.OfType<Employee>()
.ToListAsync();
List<Employee> employees = session
.Advanced
.DocumentQuery<Employees_ByAllAttachments.IndexEntry, Employees_ByAllAttachments>()
.Search(x => x.AttachmentContent, "Colorado Dallas")
.OrElse()
.WhereGreaterThan(x => x.AttachmentSize, 20_000)
.OfType<Employee>()
.ToList();
from index "Employees/ByAllAttachments"
where search(AttachmentContent, "Colorado Dallas") or AttachmentSize > 20000
Leveraging indexed attachments
- Access to the indexed attachment content opens the door to many different applications, including many that can be integrated directly into RavenDB.
- This blog post demonstrates how image recognition can be applied to indexed attachments using the additional sources feature. The resulting index allows filtering and querying based on image content.
Syntax
AttachmentsFor
// Returns a list of attachment details for the specified document (without binary data).
IEnumerable<AttachmentName> AttachmentsFor(object document);
| Parameter | Type | Description |
|---|---|---|
| document | object | The document object whose attachment details you want to load. |
// AttachmentsFor returns a list containing the following attachment details object:
public class AttachmentName
{
public string Name;
public string Hash;
public string ContentType;
public long Size;
public RemoteAttachmentParameters RemoteParameters;
}
public class RemoteAttachmentParameters
{
// The scheduled time to upload the attachment to the remote destination
public DateTime At;
// The identifier of the remote storage destination where the attachment will be uploaded
public string Identifier;
}
LoadAttachment
// LoadAttachment returns attachment details and methods to access its content.
public IAttachmentObject LoadAttachment(object document, string attachmentName);
| Parameter | Type | Description |
|---|---|---|
| document | object | The document whose attachment you want to load. |
| attachmentName | string | The name of the attachment to load. |
// LoadAttachment returns the following object:
public interface IAttachmentObject
{
public string Name { get; }
public string Hash { get; }
public string ContentType { get; }
public long Size { get; }
// The scheduled time when the attachment was uploaded to cloud storage
public DateTime? RemoteAt { get; }
// The identifier of the remote storage destination where the attachment is stored.
public string RemoteIdentifier { get; }
// The flags indicating whether the attachment is stored locally or remotely.
public RemoteAttachmentFlags RemoteFlags { get; }
// Methods to access the content of LOCAL attachments only
public string GetContentAsString();
public string GetContentAsString(Encoding encoding);
public Stream GetContentAsStream();
}
public enum RemoteAttachmentFlags
{
// No flags are set. The attachment is stored locally.
None = 0,
// The attachment is stored remotely in cloud storage rather than in the local database.
Remote = 0x1
}
LoadAttachments
// Returns a list of ALL attachments for the specified document.
public IEnumerable<IAttachmentObject> LoadAttachments(object document);
| Parameter | Type | Description |
|---|---|---|
| document | object | The document whose attachments you want to load. |