Deployment Cookbook

Introduction

This cookbook walks through deploying the esmond codebase, its databases and other moving parts, and then running some initial tests to confirm that the deployment works.

Where external tools need to be installed, this document notes what is required; refer to each tool's own documentation for installation details. Known gotchas are noted here.

Initial installation

  • Install python-pip

  • pip install virtualenv

  • Install a Java 7 runtime of your choice (openjdk-7, etc.)

  • Install cassandra: http://cassandra.apache.org/download/

  • Install mercurial

  • Install postgres and related development packages (libpq-dev, etc) if using that as the DB backend.

  • Install python development packages if your system does not already have them (python-dev, etc).

  • Install memcached

  • Get the esmond source: check it out or install it wherever you want it.

  • Change directory to where the source code was checked out. This directory is referred to below as ESMOND_ROOT.

  • Copy the example conf file to wherever you want it, for example: cp devel/esmond-devel.conf esmond.conf

  • Set the following environment variables (modify paths as appropriate):

    export DJANGO_SETTINGS_MODULE=esmond.settings
    export ESMOND_ROOT=/home/parallels/esmond/esmond
    export ESMOND_CONF=$ESMOND_ROOT/esmond.conf
    
  • Create the virtualenv:

    virtualenv --prompt="(esmond)" venv
    . venv/bin/activate
    pip install -U pip
    pip install -U setuptools
    
  • pip install -r requirements.txt

  • Execute the following:

    mkdir $ESMOND_ROOT/tsdb-data
    touch $ESMOND_ROOT/tsdb-data/TSDB
    
  • All binary components are installed into $ESMOND_ROOT/venv/bin - you may wish to add this to your $PATH.
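
Before moving on, it can help to confirm that the three environment variables set above exist and point at real paths. The following is a minimal sketch and not part of esmond; the function name check_esmond_env is hypothetical.

```python
import os

# Variable names are taken from the export lines above.
REQUIRED_VARS = ("DJANGO_SETTINGS_MODULE", "ESMOND_ROOT", "ESMOND_CONF")


def check_esmond_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append("%s is not set" % name)
    root = env.get("ESMOND_ROOT")
    if root and not os.path.isdir(root):
        problems.append("ESMOND_ROOT is not a directory: %s" % root)
    conf = env.get("ESMOND_CONF")
    if conf and not os.path.isfile(conf):
        problems.append("ESMOND_CONF is not a file: %s" % conf)
    return problems
```

Calling check_esmond_env() with no arguments inspects the current shell environment; passing a dict makes it easy to try out alternative settings.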

Set up database

  • Modify the esmond.conf file you have pointed to with the $ESMOND_CONF variable to set up the database connection.

  • If using sqlite as a backend, these directives will suffice:

    sql_db_engine = django.db.backends.sqlite3
    sql_db_name = %(ESMOND_ROOT)s/esmond.db
    
  • If using postgres (or another database that django is friendly with) directives like this are needed:

    sql_db_engine = django.db.backends.postgresql_psycopg2
    sql_db_name = esmond
    sql_db_user = snmp
    
  • If using postgres or a similar database server, you will need to create the target database and grant the necessary users access to it.

  • Populate schema with this command: python esmond/manage.py syncdb

  • You can load some example oidset fixtures with the following commands:

    python esmond/manage.py loaddata oidsets.json
    python esmond/manage.py loaddata test_devices.json
    
  • You can look at some additional DB administration commands by looking under the [api] section of the output generated by python esmond/manage.py help
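
To illustrate how the sql_db_* directives relate to a Django database definition, here is a minimal sketch using Python's standard configparser. It is not how esmond itself loads its configuration. The [main] section name, the SAMPLE_CONF text and the function django_db_settings are assumptions made for the example; the keys of the returned dict follow Django's DATABASES setting.

```python
import configparser

# Hypothetical file contents; the [main] section name is an assumption.
SAMPLE_CONF = """
[main]
sql_db_engine = django.db.backends.sqlite3
sql_db_name = %(ESMOND_ROOT)s/esmond.db
"""


def django_db_settings(conf_text, esmond_root, section="main"):
    """Map sql_db_* directives onto a Django DATABASES['default'] dict."""
    # Supplying ESMOND_ROOT as a default lets %(ESMOND_ROOT)s interpolate.
    parser = configparser.ConfigParser(defaults={"ESMOND_ROOT": esmond_root})
    parser.read_string(conf_text)
    opts = parser[section]
    return {
        "ENGINE": opts["sql_db_engine"],
        "NAME": opts["sql_db_name"],
        "USER": opts.get("sql_db_user", ""),
        "PASSWORD": opts.get("sql_db_password", ""),
        "HOST": opts.get("sql_db_host", ""),
    }


if __name__ == "__main__":
    print(django_db_settings(SAMPLE_CONF, "/home/parallels/esmond/esmond"))
```

With the sqlite directives, NAME expands to a path under ESMOND_ROOT and the user, password and host stay empty; with postgres they would come from sql_db_user, sql_db_password and sql_db_host.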

Test Cassandra/base install

  • The following directives in $ESMOND_CONF need to be tailored to your cassandra installation:

    cassandra_servers = localhost:9160
    cassandra_user =
    cassandra_pass =
    cassandra_replicas = 1
    
  • Try installing the esmond keyspace/schema in cassandra by executing the following command: python esmond/manage.py cassandra_init

  • If successful, output like this should be generated:

    Initializing cassandra esmond keyspace
    cassandra_db [INFO] Creating keyspace esmond
    cassandra_db [INFO] Checking/creating column families
    cassandra_db [INFO] Created CF: raw_data
    cassandra_db [INFO] Created CF: base_rates
    cassandra_db [INFO] Created CF: rate_aggregations
    cassandra_db [INFO] Created CF: stat_aggregations
    cassandra_db [INFO] Schema check done
    cassandra_db [INFO] Waiting for schema to propagate...
    cassandra_db [INFO] Done
    cassandra_db [DEBUG] Opening ConnectionPool
    cassandra_db [INFO] Connected to ['localhost:9160']
    
  • With cassandra running and configured, execute the test suite: python esmond/manage.py test -v2 api
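
If cassandra_init fails to connect, a quick check is whether anything is listening on the address given in cassandra_servers. The sketch below is a generic TCP probe, not an esmond utility. The function names are hypothetical, and the assumption that multiple servers are comma-separated is not confirmed by this document.

```python
import socket


def parse_servers(value):
    """Split a 'host:port[, host:port...]' string into (host, port) tuples."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        servers.append((host, int(port)))
    return servers


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the directive shown above, parse_servers("localhost:9160") yields one pair, which can be passed to is_listening to see whether the port accepts connections. An open port only shows that something is listening there, not that the keyspace is usable.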

Test the Persister

  • Now with the database loaded and cassandra running, test to see if the persister can consume data.

  • Run memcached and configure the following lines in $ESMOND_CONF as appropriate:

    espersistd_uri = 127.0.0.1:11211
    espoll_persist_uri = MemcachedPersistHandler:127.0.0.1:11211
    
  • Execute $ESMOND_ROOT/venv/bin/espersistd

  • Execute $ESMOND_ROOT/util/poller_test_generator.py -W - you should see output approximately like the following:

    <MemcachedPersistQueue: cassandra_1 last_added: 0, last_read: 0>
    <MemcachedPersistQueue: cassandra_2 last_added: 0, last_read: 0>
    <MemcachedPersistQueue: cassandra_3 last_added: 0, last_read: 0>
    Generating 8 data points.
    
  • That program generates bogus poller data for testing. Run it with the -h | --help flag to see further options.

  • Now, verify that the persister consumed the data from memcache and entered it into cassandra:

  • Execute $ESMOND_ROOT/util/dump_keys.py -p fake - you should see the following output (or something similar if you have different oidsets defined):

    cassandra_db [INFO] Checking/creating column families
    cassandra_db [INFO] Schema check done
    cassandra_db [DEBUG] Opening ConnectionPool
    cassandra_db [INFO] Connected to ['localhost:9160']
    snmp:fake_rtr_a:FastPoll:ifInOctets:fake_iface_0:30000:2013
    snmp:fake_rtr_a:FastPollHC:ifHCOutOctets:fake_iface_1:30000:2013
    snmp:fake_rtr_a:FastPollHC:ifHCInOctets:fake_iface_0:30000:2013
    snmp:fake_rtr_a:FastPollHC:ifHCInOctets:fake_iface_1:30000:2013
    snmp:fake_rtr_a:FastPollHC:ifHCOutOctets:fake_iface_0:30000:2013
    snmp:fake_rtr_a:FastPoll:ifOutOctets:fake_iface_1:30000:2013
    snmp:fake_rtr_a:FastPoll:ifOutOctets:fake_iface_0:30000:2013
    snmp:fake_rtr_a:FastPoll:ifInOctets:fake_iface_1:30000:2013
    
  • That program dumps the row keys from the various column families in the cassandra esmond keyspace. It is meant as a debugging/testing utility. Run it with the -h | --help flag to see further options.

  • Alternatively, you can log into cassandra using cassandra-cli and look at the various column families to see that the data was inserted.

  • Shut the persister down: kill $(cat $ESMOND_ROOT/var/espersistd.manager.pid)
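
When comparing the dump_keys.py output against what you expect, it can be easier to split the row keys into fields. The sketch below does that for the colon-delimited keys shown above. The field names (namespace, device, oidset, oid, interface, freq_ms, year) are guesses inferred from the sample keys, not esmond's own terminology, and both functions are hypothetical helpers.

```python
from collections import namedtuple

# Field names are inferred from the sample output, not taken from esmond.
RowKey = namedtuple(
    "RowKey", "namespace device oidset oid interface freq_ms year"
)


def parse_row_key(line):
    """Split one colon-delimited row key into its seven fields."""
    parts = line.strip().split(":")
    if len(parts) != 7:
        raise ValueError("expected 7 fields, got %d: %r" % (len(parts), line))
    return RowKey(*parts[:5], freq_ms=int(parts[5]), year=int(parts[6]))


def missing_keys(lines, expected):
    """Return the expected keys that do not appear in the dump output."""
    seen = {line.strip() for line in lines}
    return sorted(set(expected) - seen)
```

The cassandra_db log lines in the output are not row keys, so filter the output to lines starting with "snmp:" before passing them to parse_row_key.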

Set up REST api

  • Install apache2 (config examples are for current threaded)
  • Install mod_wsgi and make sure it was compiled against the same Python version that esmond runs under. Packaged builds (apt-get, et al) are commonly compiled against a different Python version than the one you want, in which case you will need to compile mod_wsgi from source.
  • See the example doc strings in $ESMOND_ROOT/esmond/wsgi.py, modify the paths as appropriate for your esmond deployment, and add the modified directives to httpd.conf.
  • Along with modifying the paths, set the group that your apache is running under (www, www-data, etc) as appropriate in the WSGIDaemonProcess and WSGIProcessGroup directives.
  • If using postgres or another database engine, it might be necessary to set sql_db_host, even if you are just running on localhost. If you get an apache “peer authentication failed for user” error, setting sql_db_host will rectify the problem.
  • Similarly, appropriate auth directives/configuration will need to be set up with the database engine so that the user specified in sql_db_user is able to connect from the processes running in apache, since the rules of engagement can differ from connecting locally or from the shell. This might involve setting sql_db_password in esmond.conf, modifying pg_hba.conf, etc.
  • Cassandra will not need any additional configuration as previous configuration steps are sufficient.
  • Re/start apache to pick up the configuration changes - check the apache error log to verify that mod_wsgi loaded and there are no other errors.

Test REST api

  • Execute curl http://localhost/v1/oidset/ (or whatever host/port is appropriate) and you should get a list of the oidsets you loaded from the fixtures. If you did not load the fixtures, an empty list is returned.

  • To make sure auth is properly set up, execute $ESMOND_ROOT/util/timeseries_post_get.py with only the -U arg set to point at the rest api (default: http://localhost). The following output/error should be generated: esmond.api.client.timeseries.PostException: ‘PostData requires username and api_key for rest interface.’

  • Execute the following command to add a user that is allowed to post data through the api (note, this will give a user write access through the api so assign accordingly):

    python esmond/manage.py add_timeseries_post_user <username>
    
  • Re-execute the same script, now supplying the -u and -k arguments as appropriate (the -k argument is the api key string returned by manage.py). The following output should be generated:

    api/client/timeseries.py:160: PostRawDataWarning: Payload empty, no data sent.
      self._issue_warning('Payload empty, no data sent.')
    <DataPayload: len:5 b:1384804667000 e:1384804758000>
      + <DataPoint: ts:1384804667000 val:1000>
      + <DataPoint: ts:1384804697000 val:2000>
      + <DataPoint: ts:1384804727000 val:3000>
      + <DataPoint: ts:1384804757000 val:4000>
      + <DataPoint: ts:1384804758000 val:5000>
    <DataPayload: len:5 b:1384804667000 e:1384804758000>
      + <DataPoint: ts:1384804667000 val:33.3333333333>
      + <DataPoint: ts:1384804697000 val:66.6666666667>
      + <DataPoint: ts:1384804727000 val:100.0>
      + <DataPoint: ts:1384804757000 val:133.333333333>
      + <DataPoint: ts:1384804758000 val:166.666666667>
    
  • If so, the authentication is set up properly (the PostRawDataWarning is there on purpose and does not indicate an error state).
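
As a sanity check on the sample output above, the values in the second payload are consistent with each raw value from the first payload divided by 30. The sketch below verifies that arithmetic. Reading the 30 as a 30-second sampling interval is an assumption based on the spacing of the sample timestamps; this is an observation about the printed numbers, not a description of how esmond computes them.

```python
import math

# Assumed interval, inferred from the 30-second spacing of the timestamps.
FREQUENCY_SECONDS = 30

# Values copied from the two payloads in the sample output.
raw_values = [1000, 2000, 3000, 4000, 5000]
printed_values = [
    33.3333333333,
    66.6666666667,
    100.0,
    133.333333333,
    166.666666667,
]

computed = [value / FREQUENCY_SECONDS for value in raw_values]

for got, shown in zip(computed, printed_values):
    # The printed values are rounded, so compare with a tolerance.
    assert math.isclose(got, shown, rel_tol=1e-9)
```

If your own run prints different numbers, rerun the same comparison with your values before concluding that anything is wrong.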

Memcached Configuration

Memcached can lose data if it runs out of memory. A few configuration options can help prevent this.

Specify the ‘-M’ option. This tells memcached to return a failure if there is not any storage available rather than evicting some other item from the cache. That particular poll result will still be lost but the failure lets Esmond log the event.

Use ‘-m 1024’ or similar to give memcached plenty of RAM.

Use the ‘espersistq’ utility or monitor memcached directly to make sure you have enough persist processes to handle the load. If the backlog is growing, add more processes by adjusting the ‘cassandra = CassandraPollPersister:4’ line in esmond.conf.

Finally, check for log entries stating “Memcache ‘set’ failed! Polling data lost!”
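
One way to monitor memcached directly is its text-protocol "stats" command, which reports memory use and the configured limit. The following is a minimal sketch, not an esmond utility; the function names are hypothetical, and the host and port defaults are taken from the espersistd_uri example earlier in this document.

```python
import socket


def parse_stats(text):
    """Parse the reply to memcached's text-protocol 'stats' command."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send 'stats' to a memcached server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_stats(data.decode("ascii", "replace"))


def memory_pressure(stats):
    """Return (fraction of the memory limit in use, eviction count)."""
    used = int(stats["bytes"])
    limit = int(stats["limit_maxbytes"])
    return used / limit, int(stats.get("evictions", 0))
```

A fraction that stays close to 1.0 suggests the persisters are not keeping up or that the ‘-m’ value is too small.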

Initial Cassandra Tuning

The main things to tune are the Java heap size and the newgen memory. As a rule of thumb, on a system with more than 4G of memory, allocate 1/4 of the system memory to the heap, capped at 8G. Then set the newgen memory to 25-30% of the heap size.
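
The rule of thumb can be written out as a small calculation. This is a sketch under the stated rule only; the function name is hypothetical. MAX_HEAP_SIZE and HEAP_NEWSIZE are the variable names used in Cassandra's cassandra-env.sh, which this document does not itself mention.

```python
def cassandra_heap_mb(system_mb, newgen_fraction=0.25):
    """Return (max_heap_mb, newgen_mb) following the rule of thumb."""
    if system_mb <= 4096:
        raise ValueError("the rule only covers systems with more than 4G")
    if not 0.25 <= newgen_fraction <= 0.30:
        raise ValueError("newgen_fraction should be between 0.25 and 0.30")
    # A quarter of system memory, capped at 8G.
    max_heap = min(system_mb // 4, 8192)
    return max_heap, int(max_heap * newgen_fraction)


if __name__ == "__main__":
    heap, newgen = cassandra_heap_mb(16 * 1024)
    print('MAX_HEAP_SIZE="%dM"' % heap)
    print('HEAP_NEWSIZE="%dM"' % newgen)
```

For a 16G machine this gives a 4096M heap with 1024M of newgen; at 32G and above the heap stays at the 8G cap.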

Setting the key cache to around 512M should be more than enough cache for the keys. The timeseries rows are not great candidates for row caching.

Changes to the cassandra_replicas factor will not automatically be reflected in the Cassandra database. The setting is only used when the ‘esmond’ keyspace is first created. If the number of replicas needs to be changed later it should be done directly in Cassandra with the ALTER KEYSPACE command. See http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html for more information.

If you ever delete rows from Cassandra, it may be necessary to increase the value of tombstone_failure_threshold in the cassandra.yaml file. One full year of 30 second samples is just over 1 million values, so deleting an entire row will leave behind enough tombstones to prevent any queries for that row key from working unless the threshold is increased. Note that these failures will normally show up as timeouts to the client (as of 2.0.7 at least), which can be misleading. The true cause of the failure does show up if the query is run with tracing enabled.
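
The "just over 1 million" figure follows from simple arithmetic, sketched below so it can be repeated for other polling intervals. The function name is hypothetical. Compare the result against the tombstone_failure_threshold value in your own cassandra.yaml (believed to default to 100000 in the 2.0 series, but check your installation).

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def samples_per_year(interval_seconds):
    """Number of samples stored in one year at a fixed polling interval."""
    return SECONDS_PER_YEAR // interval_seconds


if __name__ == "__main__":
    print(samples_per_year(30))  # 1051200
```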

More info: http://www.datastax.com/docs/1.1/operations/tuning

Cassandra monitoring hooks

The MX4J plugin can be used to get information about the state/health of a Cassandra server. The following page points to the java source and gives installation instructions:

http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J

It exposes an http interface that can be used to query JMX variables from cassandra and the OS as outlined here:

http://www.tomas.cat/blog/en/monitoring-cassandra-relevant-data-should-be-watched-and-how-send-it-graphite

The script util/query_jmx.py imports a client library from esmond.api.client that can query one of these MX4J endpoints for a variety of information. There is a nagios wrapper for that client in util/nagios.