Deployment and configuration¶
Deployment¶
While it is possible to run the application with ./run_exporter.sh
, docker
deployment is the recommended approach. The project has an example docker/docker-compose.yaml
that could be deployed with:
docker-compose --file docker/docker-compose.yaml up
Configuration¶
By default, no configuration file is expected by the app. All options are taken either from the command-line interface (CLI) or the execution environment (ENV).
CLI options are the same as the config file directives. For example, config file directive timelag: 120
corresponds to CLI option --timelag=120
. Corresponding ENV options are upper-case names of config file directives prefixed with GRAPH_
(e.g. GRAPH_TIMELAG=120
). Config file takes precedence over the CLI, which in turn takes precedence over ENV.
Below is the default config file docker/app_config.yaml
with all the default values and corresponding ENV variables mentioned in comments:
---
# Azure AD tenant where the Service Principal with access rights
# to call the MS Graph API resides.
# (GRAPH_TENANT)
#
tenant: ""
# ClientId of the Service Principal with access to MS Graph API.
# (GRAPH_CLIENT_ID)
#
client_id: ""
# ClientSecret of the Service Principal with access to MS Graph API.
# (GRAPH_CLIENT_SECRET)
#
client_secret: ""
# Seconds to shift back the query time-frame for each of the periodic
# invocations of the parallelized extraction process.
# (GRAPH_TIMELAG)
#
timelag: 120
# Number of parallel streams to fetch time-domain data.
# (GRAPH_STREAMS)
#
streams: 2
# Time-domain size of the data request for each stream (seconds).
# (GRAPH_STREAM_FRAME)
#
stream_frame: 30
# Number of records to request from the MS Graph API in a single response.
# (GRAPH_PAGE_SIZE)
#
page_size: 50
# Backend type to store exported data. Accepts either `redis` or `log`.
# If set to `redis`, expects `queue_type`, `queue_key` and `redis_url`
# to be defined to store data in Redis. If set to `log`, outputs received
# records to Celery Worker log under severity INFO.
# (GRAPH_QUEUE_BACKEND)
#
queue_backend: redis
# Storage queue implementation type. Accepts either `list` or `channel`.
# If set to `list`, all records are accumulated for further processing.
# If set to `channel`, all records are pushed to a PUB/SUB channel for
# automatic relaying to all subscribers.
# (GRAPH_QUEUE_TYPE)
#
queue_type: list
# Name of the CHANNEL or LIST where extracted data is pushed.
# (GRAPH_QUEUE_KEY)
#
queue_key: ms_graph_exporter
# Connection string for Redis client. Follows `reids-py` URL schema with
# `redis://` for regular Redis instance and `rediss://` for TLS. In case
# authentication is required, use `redis://:<password>@redis.host.org:<port>`
# where `<password>` is URL-encoded.
# (GRAPH_REDIS_URL)
#
redis_url: redis://localhost:6379?db=0
# Maximum number of Greenlets to spawn for each data upload task. Storage
# task splits list of records into the appropriate amount of chunks not
# greater than `greenlets_count + 1` (byproduct of int-based modulo calc)
# and spawns each chunk upload as a Greenlet co-routine.
# (GRAPH_GREENLETS_COUNT)
#
greenlets_count: 10
# Enable/disable threat-safe blocking in Redis connection pool.
# (GRAPH_REDIS_POOL_BLOCK)
#
redis_pool_block: True
# Enable/disable `gevent.queue.LifoQueue` usage in Redis connection pool.
# (GRAPH_REDIS_POOL_GEVENT_QUEUE)
#
redis_pool_gevent_queue: True
# Maximum number of reusable connections maintained by the connection pool
# of the Redis client.
# (GRAPH_REDIS_POOL_MAX_CONNECTIONS)
#
redis_pool_max_connections: 15
# Time that blocking-enabled Redis client waits for connection to become
# available from the exhausted connection pool (seconds). Afterwards, raises
# Redis ConnectionError exception.
# (GRAPH_REDIS_POOL_TIMEOUT)
#
redis_pool_timeout: 1
Here is also the output of the app worker --help
option:
$ celery worker --app=ms_graph_exporter.celery.app --help
[...snip...]
User Options:
--app_config GRAPH_APP_CONFIG
YAML-based configuration file. By default, no config
file is expected by the app. All options are taken
either from the command-line interface (CLI) or the
execution environment (ENV). Config file directives
are the same as the CLI options listed below.
Corresponding ENV options are upper-case names of CLI
options prefixed with 'GRAPH_'. Config file takes
precedence over the CLI, which in turn takes
precedence over ENV.
--client_id GRAPH_CLIENT_ID
ClientId of the Service Principal with access to MS
Graph API.
--client_secret GRAPH_CLIENT_SECRET
ClientSecret of the Service Principal with access to
MS Graph API.
--greenlets_count GRAPH_GREENLETS_COUNT
Maximum number of Greenlets to spawn for each data
upload task.
--page_size GRAPH_PAGE_SIZE
Number of records to request from the MS Graph API in
a single response.
--queue_backend {redis,log}
Backend type to store exported data.
--queue_type {list,channel}
Storage queue implementation type.
--queue_key GRAPH_QUEUE_KEY
Name of the CHANNEL or LIST where extracted data is
pushed.
--redis_url GRAPH_REDIS_URL
Connection string for Redis client.
--redis_pool_block Enable threat-safe blocking in Redis connection pool.
--no-redis_pool_block
Disable threat-safe blocking blocking in Redis
connection pool.
--redis_pool_gevent_queue
Enable `gevent.queue.LifoQueue` usage in Redis
connection pool.
--no-redis_pool_gevent_queue
Disable `gevent.queue.LifoQueue` usage in Redis
connection pool.
--redis_pool_max_connections GRAPH_REDIS_POOL_MAX_CONNECTIONS
Maximum number of reusable connections maintained by
the connection pool of the Redis client.
--redis_pool_timeout GRAPH_REDIS_POOL_TIMEOUT
Time that blocking-enabled Redis client waits for
connection to become available from the exhausted
connection pool (seconds). Afterwards, raises Redis
ConnectionError exception.
--streams GRAPH_STREAMS
Number of parallel streams to fetch time-domain data.
--stream_frame GRAPH_STREAM_FRAME
Time-domain size of the data request for each stream
(seconds).
--tenant GRAPH_TENANT
Azure AD tenant where the Service Principal with
access rights to call the MS Graph API resides.
--timelag GRAPH_TIMELAG
Seconds to shift back the query time-frame for each of
the periodic invocations of the parallelized
extraction process.