This is a cookbook-style document outlining the steps for deploying the esmond codebase, its databases, and the various moving parts, and for running some initial tests on the deployment to make sure things are running smoothly.
Where external tools need to be installed, what is needed will be noted, and the user may need to refer to the documentation for those tools. Any known gotchas will be noted here.
pip install virtualenv
Install a Java 7 runtime of your choice (openjdk-7, etc.)
Install cassandra: http://cassandra.apache.org/download/
Install postgres and related development packages (libpq-dev, etc) if using that as the DB backend.
Install python development packages if your system does not already have them (python-dev, etc).
Get esmond source: (Checkout/install source where you want it.)
git clone https://github.com/esnet/esmond.git
Change directory to where the source code was checked out. This location will be referred to as ESMOND_ROOT from here on.
Copy the example conf file to wherever you want it, for example: cp devel/esmond-devel.conf esmond.conf
Set the following environment variables (modify paths as appropriate):
export DJANGO_SETTINGS_MODULE=esmond.settings
export ESMOND_ROOT=/home/parallels/esmond/esmond
export ESMOND_CONF=$ESMOND_ROOT/esmond.conf
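Before continuing, it can help to confirm that the three variables from the export lines above are actually set in the current shell. The following is a minimal sketch, not part of esmond; the function name missing_esmond_env is made up for illustration.

```python
import os

# Variable names are taken from the export lines above.
REQUIRED_VARS = ("DJANGO_SETTINGS_MODULE", "ESMOND_ROOT", "ESMOND_CONF")


def missing_esmond_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_esmond_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```

Running it in the same shell that will run manage.py gives a quick indication of whether anything was missed.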
Create the virtualenv:
virtualenv --prompt="(esmond)" venv
. venv/bin/activate
pip install -U pip
pip install -U setuptools
pip install -r requirements.txt
Execute the following:
mkdir $ESMOND_ROOT/tsdb-data
touch $ESMOND_ROOT/tsdb-data/TSDB
All binary components are installed into $ESMOND_ROOT/venv/bin - you may wish to add this to your $PATH.
Modify the esmond.conf file you have pointed to with the $ESMOND_CONF variable to set up the database connection.
If using sqlite as a backend, these directives will suffice:
sql_db_engine = django.db.backends.sqlite3
sql_db_name = %(ESMOND_ROOT)s/esmond.db
If using postgres (or another database that django is friendly with) directives like this are needed:
sql_db_engine = django.db.backends.postgresql_psycopg2
sql_db_name = esmond
sql_db_user = snmp
If using postgres or a similar server-based database, you will need to create the target database yourself and grant the necessary users access permissions to it.
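To double-check what the sql_db_* directives resolve to, including the %(ESMOND_ROOT)s expansion, a small reader along these lines may be useful. This is a sketch only: it assumes the directives live under a [main] section header and that the file is parseable with Python's configparser, neither of which is confirmed here, and read_db_settings is a hypothetical helper name.

```python
import configparser

# Hypothetical sample; the [main] section header is an assumption.
SAMPLE_CONF = """
[main]
sql_db_engine = django.db.backends.sqlite3
sql_db_name = %(ESMOND_ROOT)s/esmond.db
"""


def read_db_settings(conf_text, esmond_root):
    """Parse the sql_db_* directives, expanding %(ESMOND_ROOT)s."""
    # Supplying ESMOND_ROOT as a default makes it available for interpolation.
    parser = configparser.ConfigParser(defaults={"ESMOND_ROOT": esmond_root})
    parser.read_string(conf_text)
    section = parser["main"]
    return {
        key: section[key]
        for key in ("sql_db_engine", "sql_db_name", "sql_db_user")
        if key in section
    }


if __name__ == "__main__":
    for name, value in read_db_settings(SAMPLE_CONF, "/opt/esmond").items():
        print(name, "=", value)
```

For a real file, the text would come from reading the path in $ESMOND_CONF and the root from $ESMOND_ROOT.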
Populate schema with this command: python esmond/manage.py syncdb
You can load some example oidset fixtures with the following commands:
python esmond/manage.py loaddata oidsets.json
python esmond/manage.py loaddata test_devices.json
Additional DB administration commands are listed under the [api] section of the output generated by python esmond/manage.py help
The following directives in $ESMOND_CONF need to be tailored to your cassandra installation:
cassandra_servers = localhost:9160
cassandra_user =
cassandra_pass =
cassandra_replicas = 1
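If cassandra_init fails to connect, a quick reachability check on the configured servers can narrow things down. The sketch below assumes that multiple servers, if any, are given as a comma-separated list of host:port pairs; that format and both helper names are assumptions, not taken from the esmond source.

```python
import socket


def parse_servers(value):
    """Split 'host:port[,host:port...]' into (host, port) tuples."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition splits on the last colon, leaving the port on the right.
        host, _, port = item.rpartition(":")
        servers.append((host, int(port)))
    return servers


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in parse_servers("localhost:9160"):
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print("%s:%d is %s" % (host, port, state))
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or the keyspace.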
Try installing the esmond keyspace/schema in cassandra by executing the following command: python esmond/manage.py cassandra_init
If successful, output like this should be generated:
Initializing cassandra esmond keyspace
cassandra_db [INFO] Creating keyspace esmond
cassandra_db [INFO] Checking/creating column families
cassandra_db [INFO] Created CF: raw_data
cassandra_db [INFO] Created CF: base_rates
cassandra_db [INFO] Created CF: rate_aggregations
cassandra_db [INFO] Created CF: stat_aggregations
cassandra_db [INFO] Schema check done
cassandra_db [INFO] Waiting for schema to propagate...
cassandra_db [INFO] Done
cassandra_db [DEBUG] Opening ConnectionPool
cassandra_db [INFO] Connected to ['localhost:9160']
With cassandra running and configured, execute the test suite: python esmond/manage.py test -v2 api
Now with the database loaded and cassandra running, test to see if the persister can consume data.
Run memcached and configure the following lines in $ESMOND_CONF as appropriate:
espersistd_uri = 127.0.0.1:11211
espoll_persist_uri = MemcachedPersistHandler:127.0.0.1:11211
Execute $ESMOND_ROOT/util/poller_test_generator.py -W - you should see output approximately like the following:
<MemcachedPersistQueue: cassandra_1 last_added: 0, last_read: 0>
<MemcachedPersistQueue: cassandra_2 last_added: 0, last_read: 0>
<MemcachedPersistQueue: cassandra_3 last_added: 0, last_read: 0>
Generating 8 data points.
That program can be used to generate bogus poller data for testing - run it with the -h | --help flag to see further options.
Now, verify that the persister consumed the data from memcache and entered it into cassandra:
Execute $ESMOND_ROOT/util/dump_keys.py -p fake - you should see the following output (or something similar if you have different oidsets defined):
cassandra_db [INFO] Checking/creating column families
cassandra_db [INFO] Schema check done
cassandra_db [DEBUG] Opening ConnectionPool
cassandra_db [INFO] Connected to ['localhost:9160']
snmp:fake_rtr_a:FastPoll:ifInOctets:fake_iface_0:30000:2013
snmp:fake_rtr_a:FastPollHC:ifHCOutOctets:fake_iface_1:30000:2013
snmp:fake_rtr_a:FastPollHC:ifHCInOctets:fake_iface_0:30000:2013
snmp:fake_rtr_a:FastPollHC:ifHCInOctets:fake_iface_1:30000:2013
snmp:fake_rtr_a:FastPollHC:ifHCOutOctets:fake_iface_0:30000:2013
snmp:fake_rtr_a:FastPoll:ifOutOctets:fake_iface_1:30000:2013
snmp:fake_rtr_a:FastPoll:ifOutOctets:fake_iface_0:30000:2013
snmp:fake_rtr_a:FastPoll:ifInOctets:fake_iface_1:30000:2013
That program can be used to dump the row keys from the various column families in the cassandra esmond keyspace - run it with the -h | --help flag to see further options. It is meant as a debugging/testing utility.
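When scanning a long list of dumped keys, it can be handy to split them into parts. The sketch below treats the last two colon-separated fields as a sample frequency in milliseconds and a year, and everything before them as a path; that interpretation is a guess based on the sample keys shown above (30000 and 2013), and the names RowKey and parse_row_key are invented for illustration.

```python
from collections import namedtuple

# Field names are illustrative guesses, not taken from the esmond source.
RowKey = namedtuple("RowKey", "path frequency_ms year")


def parse_row_key(key):
    """Split a dumped row key into its path, sample frequency and year."""
    # Everything except the last two fields is kept as the path.
    *path, freq, year = key.split(":")
    return RowKey(tuple(path), int(freq), int(year))


if __name__ == "__main__":
    sample = "snmp:fake_rtr_a:FastPoll:ifInOctets:fake_iface_0:30000:2013"
    print(parse_row_key(sample))
```

Grouping parsed keys by their path makes it easier to spot a device or interface that is missing from the dump.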
Alternatively, you can log into cassandra using cassandra-cli and look at the various column families to confirm that the data was inserted.
Shut the persister down: kill `cat $ESMOND_ROOT/var/espersistd.manager.pid`
Execute curl http://localhost/v1/oidset/ (or whatever host/port is appropriate) and you should get a list of the oidsets you loaded from the fixtures. If you did not load the fixtures, an empty list is returned.
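The same check can be scripted. The following sketch assumes the endpoint returns a JSON document, which is not confirmed here; fetch_oidsets is a hypothetical helper, and the opener parameter exists only so the function can be exercised without a running server.

```python
import json
import urllib.request


def fetch_oidsets(base_url="http://localhost", opener=urllib.request.urlopen):
    """GET <base_url>/v1/oidset/ and decode the body as JSON."""
    url = base_url.rstrip("/") + "/v1/oidset/"
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(fetch_oidsets())
    except OSError as exc:
        # URLError is a subclass of OSError; reported rather than raised.
        print("Could not reach the REST API:", exc)
```

Adjust base_url if the API is served on a different host or port.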
To make sure auth is properly set up, execute $ESMOND_ROOT/util/timeseries_post_get.py with only the -U arg set to point at the rest api (default: http://localhost). The following output/error should be generated: esmond.api.client.timeseries.PostException: ‘PostData requires username and api_key for rest interface.’
Execute the following command to add a user that is allowed to post data through the api (note, this will give a user write access through the api so assign accordingly):
python esmond/manage.py add_timeseries_post_user <username>
Re-execute the timeseries_post_get.py script, now supplying the -u and -k arguments as appropriate (the -k argument is the api key string returned by manage.py). The following output should be generated:
api/client/timeseries.py:160: PostRawDataWarning: Payload empty, no data sent.
  self._issue_warning('Payload empty, no data sent.')
<DataPayload: len:5 b:1384804667000 e:1384804758000>
  + <DataPoint: ts:1384804667000 val:1000>
  + <DataPoint: ts:1384804697000 val:2000>
  + <DataPoint: ts:1384804727000 val:3000>
  + <DataPoint: ts:1384804757000 val:4000>
  + <DataPoint: ts:1384804758000 val:5000>
<DataPayload: len:5 b:1384804667000 e:1384804758000>
  + <DataPoint: ts:1384804667000 val:33.3333333333>
  + <DataPoint: ts:1384804697000 val:66.6666666667>
  + <DataPoint: ts:1384804727000 val:100.0>
  + <DataPoint: ts:1384804757000 val:133.333333333>
  + <DataPoint: ts:1384804758000 val:166.666666667>
If so, the authentication is set up properly (the PostRawDataWarning is there on purpose and does not indicate an error state).
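As a sanity check on the numbers in that output, the values in the second payload are consistent with each posted value divided by a 30 second interval. This is only an observation about the sample figures, not a description of how esmond computes rates internally; the 30 second interval is an assumption and the helper name is made up.

```python
# Values copied from the sample output above.
POSTED = [1000, 2000, 3000, 4000, 5000]
RETURNED = [33.3333333333, 66.6666666667, 100.0, 133.333333333, 166.666666667]
INTERVAL_SECONDS = 30  # assumed sample interval


def matches_per_second_rate(posted, returned, interval, tolerance=1e-6):
    """Check whether each returned value equals posted / interval."""
    return all(
        abs(p / interval - r) <= tolerance for p, r in zip(posted, returned)
    )


if __name__ == "__main__":
    print(matches_per_second_rate(POSTED, RETURNED, INTERVAL_SECONDS))
```

If your own output differs from the sample, comparing it this way can show whether the difference is in the posted values or in the interval.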
Memcached can lose data if it runs out of memory. A few configuration options can help prevent this.
Specify the ‘-M’ option. This tells memcached to return a failure if there is not any storage available rather than evicting some other item from the cache. That particular poll result will still be lost but the failure lets Esmond log the event.
Use ‘-m 1024’ or similar to give memcached plenty of RAM (the value is in megabytes).
Use the ‘espersistq’ utility or monitor memcached directly to make sure you have enough persist processes to handle the load. If the backlog is growing, add more processes by adjusting the ‘cassandra = CassandraPollPersister:4’ line in esmond.conf.
Finally, check for log entries stating “Memcache ‘set’ failed! Polling data lost!”
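One way to monitor memcached directly is to issue the stats command of its text protocol and look at memory use and evictions. The sketch below is not part of esmond; the function names are invented, and the host and port defaults simply mirror the 127.0.0.1:11211 example used earlier in this document.

```python
import socket


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send 'stats' to memcached and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii")


def parse_memcached_stats(text):
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memory_pressure(stats):
    """Summarise how full the cache is and how many items were evicted."""
    used = int(stats["bytes"])
    limit = int(stats["limit_maxbytes"])
    return {"used_fraction": used / limit, "evictions": int(stats["evictions"])}


if __name__ == "__main__":
    try:
        print(memory_pressure(parse_memcached_stats(fetch_stats())))
    except OSError as exc:
        print("Could not reach memcached:", exc)
```

A used fraction that stays close to 1.0 suggests the persist processes are not keeping up or that more memory is needed.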
For Cassandra, the main thing to tune is the Java heap size and the new generation (newgen) memory. The rule of thumb for a system with more than 4G of memory is to allocate 1/4 of the system memory to the heap, capped at 8G, and then set the newgen memory to 25-30% of the heap.
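The rule of thumb can be written out as a small calculation. This is a sketch under the stated rule only: it deliberately refuses systems with 4G or less, since the rule does not cover them, and suggest_heap is a hypothetical helper name. The resulting figures would typically be applied through MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh.

```python
GIB = 1024 ** 3


def suggest_heap(system_bytes, newgen_fraction=0.25):
    """Return (heap_bytes, newgen_bytes) following the rule of thumb."""
    if system_bytes <= 4 * GIB:
        raise ValueError("rule of thumb only covers systems with more than 4G")
    if not 0.25 <= newgen_fraction <= 0.30:
        raise ValueError("newgen_fraction should be between 0.25 and 0.30")
    # A quarter of system memory, capped at 8G.
    heap = min(system_bytes // 4, 8 * GIB)
    newgen = int(heap * newgen_fraction)
    return heap, newgen


if __name__ == "__main__":
    for total_gib in (8, 16, 64):
        heap, newgen = suggest_heap(total_gib * GIB)
        print("%dG system: heap %.1fG, newgen %.2fG"
              % (total_gib, heap / GIB, newgen / GIB))
```

For example, a 16G system gets a 4G heap, while a 64G system hits the 8G cap.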
Setting the key cache to around 512M should be more than enough cache for the keys. The timeseries rows are not great candidates for row caching.
Changes to the cassandra_replicas factor will not automatically be reflected in the Cassandra database. The setting is only used when the ‘esmond’ keyspace is first created. If the number of replicas needs to be changed later it should be done directly in Cassandra with the ALTER KEYSPACE command. See http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html for more information.
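As an illustration of what such a statement looks like, the helper below builds the CQL text for a given replica count. It is a sketch: the SimpleStrategy replication class is an assumption about how the keyspace was created, so check the existing keyspace definition before running anything, and alter_replication_cql is a made-up name.

```python
def alter_replication_cql(keyspace, replicas, strategy="SimpleStrategy"):
    """Build an ALTER KEYSPACE statement for a new replication factor."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (
        "ALTER KEYSPACE {ks} WITH REPLICATION = "
        "{{'class': '{cls}', 'replication_factor': {rf}}};"
    ).format(ks=keyspace, cls=strategy, rf=int(replicas))


if __name__ == "__main__":
    # The strategy class is an assumption; verify it against your keyspace.
    print(alter_replication_cql("esmond", 3))
```

After the replication factor is raised, a repair is generally needed on each node so that existing data is copied to the new replicas.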
If you ever delete rows from Cassandra it may be necessary to increase the value of tombstone_failure_threshold in the cassandra.yaml file. One full year of 30 second samples is just over 1 million values so deleting an entire row will leave behind enough tombstones to prevent any queries for that row key from working unless the threshold is increased. Note that these failures will normally show up as timeouts to the client (as of 2.0.7 at least) which can be misleading. The true cause of the failure does show up if the query is run with tracing enabled.
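The "just over 1 million" figure follows from simple arithmetic, shown here for a non-leap year. The comparison value of 100000 is assumed to be the shipped default for tombstone_failure_threshold; confirm it against your own cassandra.yaml.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Assumed default for tombstone_failure_threshold; check cassandra.yaml.
ASSUMED_DEFAULT_THRESHOLD = 100000


def samples_per_year(interval_seconds=30):
    """Number of samples in one non-leap year at the given interval."""
    return SECONDS_PER_YEAR // interval_seconds


if __name__ == "__main__":
    count = samples_per_year(30)
    print(count, "samples per year")
    print("exceeds assumed threshold:", count > ASSUMED_DEFAULT_THRESHOLD)
```

A fully deleted one-year row can therefore leave roughly ten times more tombstones than the assumed default threshold allows.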
The MX4J plugin can be used to get information about the state/health of a Cassandra server. This gives a pointer to the java source and instructions how to install:
It exposes an http interface that can be used to query JMX variables from cassandra and the OS as outlined here:
The script util/query_jmx.py imports a client library from esmond.api.client that can query one of these MX4J endpoints for a variety of information. There is a nagios wrapper for that client in util/nagios.