SNMP Support

SNMP support is available for Enterprise licenses only.

Overview

  • Simple Network Management Protocol (SNMP) is an Internet-standard protocol for collecting and organizing information about managed devices on IP networks. It is used primarily for monitoring network services. SNMP exposes management data in the form of variables (metrics) that describe the system status and configuration. These metrics can then be remotely queried (and, in some circumstances, manipulated) by managing applications.

  • RavenDB supports SNMP, which gives monitoring tools such as Zabbix, PRTG, and Datadog direct access to RavenDB's internal details. RavenDB exposes a long list of metrics: CPU and memory usage, total server requests, the loaded databases, and database-specific metrics such as the number of indexed items per second, document writes per second, the storage space each database takes, and more.

  • You can still monitor what is going on with RavenDB directly from the Studio, or by using one of our monitoring tools. However, using SNMP might be easier in some cases. As users start running large numbers of RavenDB instances, it becomes impractical to deal with each of them individually, and using a monitoring system that can watch many servers becomes advisable.

Enabling SNMP in RavenDB

  • To monitor RavenDB using SNMP you must first set the Monitoring.Snmp.Enabled configuration key to true.

  • To learn how to modify a configuration key, refer to the Configuration Overview article,
    which outlines all available options.

  • For example, add this key to your settings.json file and restart the server.

{
...
"Monitoring.Snmp.Enabled": true
...
}

SNMP configuration options

There are several configurable SNMP properties in RavenDB:

For SNMPv1 and SNMPv2c:
  • Monitoring.Snmp.Community
    The community string is used as a password.
    It is sent with each SNMP GET request and allows or denies access to the monitored device.
    Default: "ravendb"
For SNMPv3:
  • See article Monitoring Options for the full list of SNMP configuration keys.

The Metrics

Access metrics via monitoring tools

  • Querying the exposed metrics using a monitoring tool is typically straightforward (see this Zabbix example).

  • For a simplified setup, we have provided a few templates which can be found here.
    These templates include the metrics and their associated OIDs.

Access metrics via SNMP agents

  • The metrics can be accessed directly using any SNMP client tools, such as those shipped with Net-SNMP.
    Each metric has a unique object identifier (OID) and can be accessed individually.

  • The most basic SNMP commands are snmpget, snmpset and snmpwalk.
    For example, you can execute the following snmpget commands to retrieve the server's up-time metric.

    For SNMPv2c:
// Request:
snmpget -v 2c -c ravendb live-test.ravendb.net 1.3.6.1.4.1.45751.1.1.1.3

// Result:
iso.3.6.1.4.1.45751.1.1.1.3 = Timeticks: (29543973) 3 days, 10:03:59.73
  • "ravendb" is the community string (set via the Monitoring.Snmp.Community configuration key).
  • "live-test.ravendb.net" is the host.

    For SNMPv3:
snmpget -v 3 -l authNoPriv -u ravendb -a SHA \
-A ravendb live-test.ravendb.net 1.3.6.1.4.1.45751.1.1.1.3
  • -l authNoPriv - sets the security level to use authentication but no privacy.
  • -u ravendb - sets the user for authentication purposes to "ravendb".
  • -a SHA - sets the authentication protocol to SHA.
  • -A ravendb - sets the authentication password to "ravendb".
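
When these queries are issued from a script rather than typed by hand, it can help to assemble the argument lists in one place. The following is a minimal sketch, not part of RavenDB: the function names are hypothetical, and the flags are only the ones shown in the commands above.

```python
import shlex

# Server up-time OID, taken from the snmpget examples above.
UPTIME_OID = "1.3.6.1.4.1.45751.1.1.1.3"


def snmpget_v2c(host, oid, community="ravendb"):
    """Return the argument list for an SNMPv2c snmpget call."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def snmpget_v3(host, oid, user, auth_password, auth_protocol="SHA"):
    """Return the argument list for an SNMPv3 snmpget call (authNoPriv)."""
    return [
        "snmpget", "-v", "3", "-l", "authNoPriv",
        "-u", user, "-a", auth_protocol, "-A", auth_password,
        host, oid,
    ]


if __name__ == "__main__":
    # Only prints the commands. To run one, pass the list to subprocess.run();
    # that requires Net-SNMP to be installed and the server to be reachable.
    print(shlex.join(snmpget_v2c("live-test.ravendb.net", UPTIME_OID)))
    print(shlex.join(snmpget_v3("live-test.ravendb.net", UPTIME_OID,
                                "ravendb", "ravendb")))
```

Keeping the arguments as a list avoids shell quoting problems when a community string or password contains special characters.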

Access metrics via HTTP

Access single OID value:

  • An individual OID value can be retrieved via the following HTTP GET endpoint:
    <serverUrl>/monitoring/snmp?oid=<oid>

  • For example, a cURL request for the server up-time metric:

// Request:
curl -X GET "http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1.1.1.3"

// Result:
{ "Value" : "4.21:32:56.0700000" }
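
The same request can be made from a script. The sketch below uses only the Python standard library; the function names are hypothetical. It also assumes that the up-time value follows the [d.]hh:mm:ss[.fffffff] layout seen in the sample result above, which is an inference from that one sample rather than a documented contract.

```python
import json
import re
import urllib.request
from datetime import timedelta
from urllib.parse import urlencode

# Assumed layout: optional days, then hh:mm:ss, then up to 7 fraction digits.
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?$")


def single_oid_url(server_url, oid):
    """Build <serverUrl>/monitoring/snmp?oid=<oid>."""
    return f"{server_url.rstrip('/')}/monitoring/snmp?{urlencode({'oid': oid})}"


def fetch_oid(server_url, oid, timeout=10):
    """Return the raw "Value" string for one OID (performs a network call)."""
    with urllib.request.urlopen(single_oid_url(server_url, oid), timeout=timeout) as response:
        return json.load(response)["Value"]


def parse_timespan(value):
    """Convert a value such as "4.21:32:56.0700000" into a timedelta."""
    match = _TIMESPAN.match(value)
    if match is None:
        raise ValueError(f"unrecognized time span: {value!r}")
    days, hours, minutes, seconds, fraction = match.groups()
    # Pad the fraction to 7 digits, then keep 6: timedelta stores microseconds.
    micros = int((fraction or "0").ljust(7, "0")[:6])
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)
```

For example, parse_timespan(fetch_oid("http://live-test.ravendb.net", "1.3.6.1.4.1.45751.1.1.1.3")) would yield the server up-time as a timedelta, provided the server is reachable and the value has the assumed layout.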

Access multiple OID values:

  • Multiple OID values can be retrieved by making either a GET or a POST request to the following HTTP endpoint: <serverUrl>/monitoring/snmp/bulk

  • For example, cURL requests for the server managed memory and unmanaged memory metrics:

// Request (GET):
curl -X GET "http://live-test.ravendb.net/monitoring/snmp/bulk?oid=1.3.6.1.4.1.45751.1.1.1.6.7&oid=1.3.6.1.4.1.45751.1.1.1.6.8"

// Request (POST):
curl -X POST \
-H "Content-Type: application/json" \
-d '{ "OIDs": ["1.3.6.1.4.1.45751.1.1.1.6.7", "1.3.6.1.4.1.45751.1.1.1.6.8"]}' \
http://localhost:8080/monitoring/snmp/bulk

// Result:
{
    "Results": [
        { "OID": "1.3.6.1.4.1.45751.1.1.1.6.7", "Value": "410" },
        { "OID": "1.3.6.1.4.1.45751.1.1.1.6.8", "Value": "4" }
    ]
}
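
A script that polls several metrics at once could wrap the bulk GET endpoint as follows. This is a minimal sketch using the Python standard library; the helper names are hypothetical, and the response shape is assumed to match the sample result above (a "Results" array of "OID"/"Value" pairs, with values returned as strings).

```python
import json
import urllib.request
from urllib.parse import urlencode


def bulk_url(server_url, oids):
    """Build the bulk GET URL, repeating the "oid" parameter once per OID."""
    query = urlencode([("oid", oid) for oid in oids])
    return f"{server_url.rstrip('/')}/monitoring/snmp/bulk?{query}"


def results_to_dict(payload):
    """Map each OID in a bulk response to its (string) value."""
    return {item["OID"]: item["Value"] for item in payload["Results"]}


def fetch_bulk(server_url, oids, timeout=10):
    """Fetch several OID values in one request (performs a network call)."""
    with urllib.request.urlopen(bulk_url(server_url, oids), timeout=timeout) as response:
        return results_to_dict(json.load(response))
```

Returning a dictionary keyed by OID makes it easy to look up a specific metric without depending on the order of the entries in the response.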

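Get all OIDs:

  • The list of all OIDs, along with a description of each metric, is available via the endpoint: <serverUrl>/monitoring/snmp/oids
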
List of OIDs

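  • RavenDB's root OID is 1.3.6.1.4.1.45751.1.1
    The OIDs in the list below are relative to this root, except those written out in full (such as 1.3.6.1.2.1.1.3.0).
    For example, the server up-time metric, listed as 1.3, is queried as 1.3.6.1.4.1.45751.1.1.1.3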

  • The placeholders X, D, and I in the OID list below stand for:

    • X - the kind of garbage collection (GC):
      0 - any kind of collection
      1 - a generation-0 or generation-1 collection
      2 - a blocking generation-2 collection
      3 - a background collection (this is always a generation-2 collection)
    • D - Database number
    • I - Index number
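
To turn an entry from the list below into an OID that can be queried, the root OID is prepended and the X, D, and I placeholders are replaced with numbers. The following is a minimal sketch of that substitution; the function name is hypothetical, and the database and index numbers in the usage note are made-up values, since the real ones depend on the server.

```python
ROOT_OID = "1.3.6.1.4.1.45751.1.1"  # RavenDB's root OID, as stated above


def full_oid(relative, **placeholders):
    """Expand a relative OID such as "5.2.D.4.I.2" into a full OID.

    Placeholder values are passed by name, e.g. full_oid("5.2.D.1.1", D=1).
    Entries already written out in full (such as 1.3.6.1.2.1.1.3.0)
    must not be passed to this function.
    """
    parts = []
    for part in relative.split("."):
        if part in ("X", "D", "I"):
            if part not in placeholders:
                raise KeyError(f"missing value for placeholder {part}")
            part = str(int(placeholders[part]))
        parts.append(part)
    return ".".join([ROOT_OID, *parts])
```

For instance, full_oid("1.3") gives the server up-time OID used in the earlier examples, and full_oid("5.2.D.4.I.2", D=1, I=3) gives the index name OID for a hypothetical database number 1 and index number 3.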

| OID | Metric (Server) |
| --- | --- |
| 1.1.1 | Server URL |
| 1.1.2 | Server Public URL |
| 1.1.3 | Server TCP URL |
| 1.1.4 | Server Public TCP URL |
| 1.2.1 | Server version |
| 1.2.2 | Server full version |
| 1.3 | Server up-time |
| 1.3.6.1.2.1.1.3.0 | Server up-time (global) |
| 1.4 | Server process ID |
| 1.5.1 | Process CPU usage in % |
| 1.5.2 | Machine CPU usage in % |
| 1.5.3.1 | CPU Credits Base |
| 1.5.3.2 | CPU Credits Max |
| 1.5.3.3 | CPU Credits Remaining |
| 1.5.3.4 | CPU Credits Gained Per Second |
| 1.5.3.5 | CPU Credits Background Tasks Alert Raised |
| 1.5.3.6 | CPU Credits Failover Alert Raised |
| 1.5.3.7 | CPU Credits Any Alert Raised |
| 1.5.4 | IO wait in % |
| 1.6.1 | Server allocated memory in MB |
| 1.6.2 | Server low memory flag value |
| 1.6.3 | Server total swap size in MB |
| 1.6.4 | Server total swap usage in MB |
| 1.6.5 | Server working set swap usage in MB |
| 1.6.6 | Dirty Memory that is used by the scratch buffers in MB |
| 1.6.7 | Server managed memory size in MB |
| 1.6.8 | Server unmanaged memory size in MB |
| 1.6.9 | Server encryption buffers memory being in use in MB |
| 1.6.10 | Server encryption buffers memory being in pool in MB |
| 1.6.11.X.2 | GC info for X. Specifies if this is a concurrent GC or not. |
| 1.6.11.X.3 | GC info for X. Gets the number of objects ready for finalization this GC observed. |
| 1.6.11.X.4 | GC info for X. Gets the total fragmentation (in MB) when the last garbage collection occurred. |
| 1.6.11.X.5 | GC info for X. Gets the generation this GC collected. |
| 1.6.11.X.6 | GC info for X. Gets the total heap size (in MB) when the last garbage collection occurred. |
| 1.6.11.X.7 | GC info for X. Gets the high memory load threshold (in MB) when the last garbage collection occurred. |
| 1.6.11.X.8 | GC info for X. The index of this GC. |
| 1.6.11.X.9 | GC info for X. Gets the memory load (in MB) when the last garbage collection occurred. |
| 1.6.11.X.10.1 | GC info for X. Gets the pause durations. First item in the array. |
| 1.6.11.X.10.2 | GC info for X. Gets the pause durations. Second item in the array. |
| 1.6.11.X.11 | GC info for X. Gets the pause time percentage in the GC so far. |
| 1.6.11.X.12 | GC info for X. Gets the number of pinned objects this GC observed. |
| 1.6.11.X.13 | GC info for X. Gets the promoted MB for this GC. |
| 1.6.11.X.14 | GC info for X. Gets the total available memory (in MB) for the garbage collector to use when the last garbage collection occurred. |
| 1.6.11.X.15 | GC info for X. Gets the total committed MB of the managed heap. |
| 1.6.11.X.16.3 | GC info for X. Gets the large object heap size (in MB) after the last garbage collection of given kind occurred. |
| 1.6.12.0 | Monitor /proc/meminfo/ metrics (unix/linux). The description of each metric is available via endpoint <serverUrl>/monitoring/snmp/oids. See Get all OIDs. |
| 1.6.13 | Available memory for processing (in MB) |
| 1.7.1 | Number of concurrent requests |
| 1.7.2 | Total number of requests since server startup |
| 1.7.3 | Number of requests per second (one minute rate) |
| 1.7.3.1 | Number of requests per second (five second rate) |
| 1.7.4 | Average request time in milliseconds |
| 1.8 | Server last request time |
| 1.8.1 | Server last authorized non cluster admin request time |
| 1.9.1 | Server license type |
| 1.9.2 | Server license expiration date |
| 1.9.3 | Server license expiration left |
| 1.9.4 | Server license utilized CPU cores |
| 1.9.5 | Server license max CPU cores |
| 1.10.1 | Server storage used size in MB |
| 1.10.2 | Server storage total size in MB |
| 1.10.3 | Remaining server storage disk space in MB |
| 1.10.4 | Remaining server storage disk space in % |
| 1.10.5 | IO read operations per second |
| 1.10.6 | IO write operations per second |
| 1.10.7 | Read throughput in kilobytes per second |
| 1.10.8 | Write throughput in kilobytes per second |
| 1.10.9 | Queue length |
| 1.11.1 | Server certificate expiration date |
| 1.11.2 | Server certificate expiration left |
| 1.11.3 | List of well known admin certificate thumbprints |
| 1.11.4 | List of well known admin certificate issuers |
| 1.11.5 | Number of expiring certificates |
| 1.11.6 | Number of expired certificates |
| 1.12.1 | Number of processors on the machine |
| 1.12.2 | Number of assigned processors on the machine |
| 1.13.1 | Number of backups currently running |
| 1.13.2 | Max number of backups that can run concurrently |
| 1.14.1 | Number of available worker threads in the thread pool |
| 1.14.2 | Number of available completion port threads in the thread pool |
| 1.15.1 | Number of active TCP connections |
| 1.16.1 | Indicates if any experimental features are used |
| 1.17.1 | Value of the '/proc/sys/vm/max_map_count' parameter |
| 1.17.2 | Number of current map files in '/proc/self/maps' |
| 1.17.3 | Value of the '/proc/sys/kernel/threads-max' parameter |
| 1.17.4 | Number of current threads |

| OID | Metric (Cluster) |
| --- | --- |
| 3.1.1 | Current node tag |
| 3.1.2 | Current node state |
| 3.2.1 | Cluster term |
| 3.2.2 | Cluster index |
| 3.2.3 | Cluster ID |

| OID | Metric (Database) |
| --- | --- |
| 5.2.D.1.1 | Database name |
| 5.2.D.1.2 | Number of indexes |
| 5.2.D.1.3 | Number of stale indexes |
| 5.2.D.1.4 | Number of documents |
| 5.2.D.1.5 | Number of revision documents |
| 5.2.D.1.6 | Number of attachments |
| 5.2.D.1.7 | Number of unique attachments |
| 5.2.D.1.10 | Number of alerts |
| 5.2.D.1.11 | Database ID |
| 5.2.D.1.12 | Database up-time |
| 5.2.D.1.13 | Indicates if database is loaded |
| 5.2.D.1.14 | Number of rehabs |
| 5.2.D.1.15 | Number of performance hints |
| 5.2.D.1.16 | Number of indexing errors |
| 5.2.D.2.1 | Documents storage allocated size in MB |
| 5.2.D.2.2 | Documents storage used size in MB |
| 5.2.D.2.3 | Index storage allocated size in MB |
| 5.2.D.2.4 | Index storage used size in MB |
| 5.2.D.2.5 | Total storage size in MB |
| 5.2.D.2.6 | Remaining storage disk space in MB |
| 5.2.D.2.7 | IO read operations per second |
| 5.2.D.2.8 | IO write operations per second |
| 5.2.D.2.9 | Read throughput in kilobytes per second |
| 5.2.D.2.10 | Write throughput in kilobytes per second |
| 5.2.D.2.11 | Queue length |
| 5.2.D.3.1 | Number of document puts per second (one minute rate) |
| 5.2.D.3.2 | Number of indexed documents per second for map indexes (one minute rate) |
| 5.2.D.3.3 | Number of maps per second for map-reduce indexes (one minute rate) |
| 5.2.D.3.4 | Number of reduces per second for map-reduce indexes (one minute rate) |
| 5.2.D.3.5 | Number of requests per second (one minute rate) |
| 5.2.D.3.6 | Number of requests from database start |
| 5.2.D.3.7 | Average request time in milliseconds |
| 5.2.D.5.1 | Number of indexes |
| 5.2.D.5.2 | Number of static indexes |
| 5.2.D.5.3 | Number of auto indexes |
| 5.2.D.5.4 | Number of idle indexes |
| 5.2.D.5.5 | Number of disabled indexes |
| 5.2.D.5.6 | Number of error indexes |
| 5.2.D.5.7 | Number of faulty indexes |
| 5.2.D.6.1 | Number of writes (documents, attachments, counters, timeseries) |
| 5.2.D.6.2 | Number of bytes written (documents, attachments, counters, timeseries) |

| OID | Metric (Index) |
| --- | --- |
| 5.2.D.4.I.1 | Indicates if index exists |
| 5.2.D.4.I.2 | Index name |
| 5.2.D.4.I.4 | Index priority |
| 5.2.D.4.I.5 | Index state |
| 5.2.D.4.I.6 | Number of index errors |
| 5.2.D.4.I.7 | Last query time |
| 5.2.D.4.I.8 | Index indexing time |
| 5.2.D.4.I.9 | Time since last query |
| 5.2.D.4.I.10 | Time since last indexing |
| 5.2.D.4.I.11 | Index lock mode |
| 5.2.D.4.I.12 | Indicates if index is invalid |
| 5.2.D.4.I.13 | Index status |
| 5.2.D.4.I.14 | Number of maps per second (one minute rate) |
| 5.2.D.4.I.15 | Number of reduces per second (one minute rate) |
| 5.2.D.4.I.16 | Index type |

| OID | Metric (General) |
| --- | --- |
| 5.1.1 | Number of all databases |
| 5.1.2 | Number of loaded databases |
| 5.1.3 | Time since oldest backup |
| 5.1.4 | Number of disabled databases |
| 5.1.5 | Number of encrypted databases |
| 5.1.6 | Number of databases for current node |
| 5.1.7.1 | Number of indexes in all loaded databases |
| 5.1.7.2 | Number of stale indexes in all loaded databases |
| 5.1.7.3 | Number of error indexes in all loaded databases |
| 5.1.7.4 | Number of faulty indexes in all loaded databases |
| 5.1.7.5 | Number of indexing errors in all loaded databases |
| 5.1.8.1 | Number of indexed documents per second for map indexes (one minute rate) in all loaded databases |
| 5.1.8.2 | Number of maps per second for map-reduce indexes (one minute rate) in all loaded databases |
| 5.1.8.3 | Number of reduces per second for map-reduce indexes (one minute rate) in all loaded databases |
| 5.1.9.1 | Number of writes (documents, attachments, counters, timeseries) in all loaded databases |
| 5.1.9.2 | Number of bytes written (documents, attachments, counters, timeseries) in all loaded databases |
| 5.1.10 | Number of faulted databases |

| OID | Metric (Ongoing tasks) |
| --- | --- |
| 5.1.11.1 | Number of enabled ongoing tasks for all databases |
| 5.1.11.2 | Number of active ongoing tasks for all databases |
| 5.1.11.3 | Number of enabled external replication tasks for all databases |
| 5.1.11.4 | Number of active external replication tasks for all databases |
| 5.1.11.5 | Number of enabled RavenDB ETL tasks for all databases |
| 5.1.11.6 | Number of active RavenDB ETL tasks for all databases |
| 5.1.11.7 | Number of enabled SQL ETL tasks for all databases |
| 5.1.11.8 | Number of active SQL ETL tasks for all databases |
| 5.1.11.9 | Number of enabled OLAP ETL tasks for all databases |
| 5.1.11.10 | Number of active OLAP ETL tasks for all databases |
| 5.1.11.11 | Number of enabled Elasticsearch ETL tasks for all databases |
| 5.1.11.12 | Number of active Elasticsearch ETL tasks for all databases |
| 5.1.11.13 | Number of enabled Queue ETL tasks for all databases |
| 5.1.11.14 | Number of active Queue ETL tasks for all databases |
| 5.1.11.15 | Number of enabled Backup tasks for all databases |
| 5.1.11.16 | Number of active Backup tasks for all databases |
| 5.1.11.17 | Number of enabled Subscription tasks for all databases |
| 5.1.11.18 | Number of active Subscription tasks for all databases |
| 5.1.11.19 | Number of enabled Pull Replication As Sink tasks for all databases |
| 5.1.11.20 | Number of active Pull Replication As Sink tasks for all databases |