
Building a Status Screen, part 2: Collecting Data

Fri, Apr 9th 11:47pm 2010: IVT

This is part 2 of a 3-part series on building a visual status screen, or dashboard, for your company. If you missed part 1 you can find it at Building a Status Screen, part 1: Hardware.

Now that you have the monitors neatly mounted in place it's time to start collecting some data to display on them, so the first thing to decide is exactly *what* you want to display. Write a list of items first, then we'll work through it and figure out how to retrieve each piece of data. In my case the initial list looked a bit like this:

  • Recent Twitter updates that mention us or our products.
  • Number of open bugs in our bug tracker.
  • Number of feature requests pending.
  • Number of packages that are in the QA queue awaiting testing.
  • Number of open support requests categorised by customer.
  • Number of customers categorised by product and project type.
  • Monthly revenue categorised by product and project type.
  • Number of running physical servers in each data centre.
  • Number of running virtual machines in each data centre and on EC2.
  • Current weather conditions in Melbourne, Australia.
  • Graphs of office power consumption broken down by desk area and power circuit.

Other things I'd like to display but don't have room for on the current screens:

  • Graph of phone lines in use.
  • Nagios report on service status and failures.
  • Visualisation of network traffic.
  • Video feed from front door of office.

If you glance down that list you'll notice that the data has to come from quite a few different places: some is stored in internal systems, some comes from external services. Additionally, some of it is quite volatile and can change by the second, while some varies quite slowly.

Collection Architecture
To get around the problem of different data sources and types I used an architecture based on a centralised datastore, and wrote a number of independent data collectors that each deal with their own specific type of data and update the datastore at the appropriate rate.

At any one point in time the datastore can then be relied on to contain a snapshot of the last set of data that has been collected, without concern for the status of any of the individual collectors.

In my case the datastore and all the collectors run on the same machine, the intranet server at Internet Vision Technologies. The datastore itself could be created using something like a MySQL database, but as a quick, low-effort starting point I just used a bunch of text files stored in a directory on the server. Ultimately I'll probably switch the datastore to MySQL, but having everything sitting around in text files appeals to my sense of minimalism and makes debugging during development very easy.
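To give you an idea, once a few collectors have been running the data directory ends up holding one small text file per value, something like this (the exact names depend on which collectors you write; these are illustrative):

/home/statusscreen/data/
  data-bugs
  data-stats
  data-twitter
  data-weather-temp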

Writing Collectors
The IVT intranet server is running Linux (Ubuntu, in this case) so there are plenty of options for scripting environments, but I chose to use PHP since that's what most of our other projects use and it's a language I'm very familiar with. The same thing could be done in Perl, Python, C, or even in a collection of BASH scripts and command line tools if you prefer. It really doesn't matter: just use whatever you're most comfortable with.

Each collector is just a self-contained PHP script that can be invoked directly on the command line or automatically run by CRON at a predetermined interval. Start by writing and running each script manually, then when you're happy that everything is working properly they can be added to a crontab file. We'll get to that in just a moment.

Twitter Collector
The first collector I wrote was a trivial Twitter collector. Twitter offers a simple API, and the TwitterSearch class by Ryan Faerman makes it incredibly easy to access from PHP. I downloaded the TwitterSearch class from http://ryanfaerman.com/twittersearch/ and put it in the same directory as the collector.

The code for the collector itself is stored in a file called "collector-twitter.php" located in "/home/statusscreen" and looks like this:

#!/usr/bin/php
<?php
// Location where all the collectors store their data files
$datapath = "/home/statusscreen/data/";

// Ryan Faerman's TwitterSearch class, downloaded separately
include_once( "/home/statusscreen/includes/TwitterSearch.php" );

// Search for mentions of the "#brillianz" hashtag
$search = new TwitterSearch( '#brillianz' );
$search->user_agent = 'phptwittersearch:jonoxer@gmail.com';
$results = $search->results();

// Write one pipe-separated line per tweet to the datastore
$fp = fopen( $datapath . 'data-twitter', 'w' );
foreach( $results as $result )
{
  $text = $result->text;
  $avatar = $result->profile_image_url;
  $user = $result->from_user;
  fwrite( $fp, $user . "|" . $avatar . "|" . $text . "\n" );
}
fclose( $fp );
?>

Even if you haven't used PHP before, the script should hardly need any commentary. It starts by specifying the location in which to store the data, includes Ryan's TwitterSearch class, then sets up a search for the "#brillianz" hashtag and executes it. Of course you'll probably want to use a different search.

The collector then opens a text file called "data-twitter" in "write" mode and loops over the search results. For each result it extracts the text of the tweet, the URL of the user's avatar, and the username that created the tweet. These are then inserted into the text file with a pipe separator before looping back to the next result and doing it again. Finally the file is closed.

The result is a text file that contains data about a set of tweets, and in part 3 of this series we'll be looking at how to process that data and display it on the status screens.

Oh yes, and one thing to keep in mind here is potential attacks. If we were taking this data and inserting it into a MySQL database we'd need to be careful to validate it first, otherwise we'd be opening ourselves up to SQL injection attacks from any random Twitter user!
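For example, if you did decide to push the tweets into MySQL, a prepared statement keeps hostile tweet text from being interpreted as SQL. This is just a sketch, with made-up credentials and table name:

<?php
// Hypothetical table and credentials; a prepared statement ensures the
// tweet text is treated purely as data, never as SQL
$db = new PDO( 'mysql:host=localhost;dbname=statusscreen', 'user', 'password' );
$stmt = $db->prepare( 'INSERT INTO tweets (user, avatar, text) VALUES (?, ?, ?)' );
$stmt->execute( array( $user, $avatar, $text ) );
?>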

Once you've created the script you can make it executable using:

chmod +x collector-twitter.php

Then run it manually using:

./collector-twitter.php

The result should be a set of entries in the "data-twitter" file.
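With a couple of made-up tweets, the contents might look something like this:

exampleuser|http://a1.twimg.com/profile_images/12345/avatar.png|Loving the new #brillianz release!
anotheruser|http://a1.twimg.com/profile_images/67890/avatar.png|Has anyone else tried #brillianz yet?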

Weather Collector
The weather collector is even simpler, and uses PHP's built-in XML parsing support to access the Google Weather API.

#!/usr/bin/php
<?php
// Location where all the collectors store their data files
$datapath = "/home/statusscreen/data/";

// Fetch current conditions for Melbourne from the Google Weather API
$weather_feed = file_get_contents( "http://www.google.com/ig/api?weather=melbourne,Australia" );
$weather = simplexml_load_string( $weather_feed );
if( !$weather ) die( 'weather failed' );

// Uncomment this line to explore the other data available in the feed
#print_r( $weather->weather->current_conditions->temp_c );

// Extract the temperature in degrees C and write it to the datastore
$temp_c = $weather->weather->current_conditions->temp_c['data'];
$fp = fopen( $datapath . 'data-weather-temp', 'w' );
fwrite( $fp, $temp_c );
fclose( $fp );
?>

You'll notice a commented-out call to "print_r". If you want to explore the other data available within the API you can just uncomment that line and execute the script manually. This example only collects the temperature in degrees C, but it could trivially be extended to collect other data and store it in other text files alongside the "data-weather-temp" file.
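For example, to also record the humidity and the general conditions you could add something like the following to the end of the script. I'm assuming the feed's "humidity" and "condition" elements here, so check the print_r output to confirm the exact structure before relying on them:

// Assumed element names: verify against the print_r output first
$humidity  = $weather->weather->current_conditions->humidity['data'];
$condition = $weather->weather->current_conditions->condition['data'];

$fp = fopen( $datapath . 'data-weather-humidity', 'w' );
fwrite( $fp, $humidity );
fclose( $fp );

$fp = fopen( $datapath . 'data-weather-condition', 'w' );
fwrite( $fp, $condition );
fclose( $fp );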

Custom Data Services
The two data collectors shown so far access external services with established APIs, but in many cases the data you need to fetch comes from internal systems with no existing API. The solution is to write your own data services to make that information available in a similar way to the external APIs.

The systems I needed to access were all running the Apache web server so once again I just used PHP to create simple scripts that would report specific values. For example, our server provisioning system runs on another Ubuntu Linux server and uses its own MySQL database to manage information about all the servers we run.

You need to think very carefully about security at this point because you're potentially exposing highly confidential information to anyone with a web browser. Make sure the system is behind a firewall, password protected, or otherwise restricted.
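One quick way to do that with Apache is an .htaccess file in the same directory that only answers requests from your internal network. This is just one approach, and the address range here is a placeholder for your own:

# Refuse requests from anywhere except the internal LAN
# (adjust the address range to match your own network)
Order Deny,Allow
Deny from all
Allow from 192.168.0.0/16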

In the web tree on the provisioning server I created a script that connects to the MySQL database on the server, runs a couple of queries to get some relevant data, and then reports those values when accessed using a web browser. The result is a simple web page that looks something like this:

br:123|ec:456

That page can then be loaded by a data collector and parsed to extract the values.
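As a rough sketch, a script like that might look like this. I'm inventing the database name, credentials, table, and column names purely for illustration; the only real requirement is that it prints values in the simple key:value format shown above.

<?php
// Hypothetical schema: database, credentials, table, and column names
// are all placeholders for your own provisioning system
$db = new PDO( 'mysql:host=localhost;dbname=provisioning', 'user', 'password' );

// Count running physical servers and running EC2 virtual machines
$physical = $db->query( "SELECT COUNT(*) FROM servers WHERE type = 'physical' AND status = 'running'" )->fetchColumn();
$ec2 = $db->query( "SELECT COUNT(*) FROM servers WHERE type = 'ec2' AND status = 'running'" )->fetchColumn();

// Report the values as a simple pipe-separated page
echo "br:" . $physical . "|ec:" . $ec2;
?>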

If you really wanted to you could implement a full web-services API using XML or something, but I couldn't be bothered. This is lightweight and works perfectly well.

Internal Data Collector
The final collector I'll show you is one example of the many collectors I've written to acquire data from our own internal systems such as the one described above. As we've just seen, the data service exposes values in a very simple format, so all we need to do is call the appropriate URL, parse the response, and stick the values into text files just like before. This should do the trick (the URL in the script is a placeholder, so substitute the address of your own data service):

#!/usr/bin/php
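<?php
// NOTE: this is a sketch; the URL below is a placeholder for your own
// data service, and the keys ("br", "ec") match the example output above
$datapath = "/home/statusscreen/data/";

// Fetch the raw "br:123|ec:456" style response from the data service
$raw = file_get_contents( "http://provisioning.internal/stats.php" );

// Split on the pipes, then on the colons, and write each value to its
// own text file in the datastore (data-br, data-ec, and so on)
foreach( explode( "|", trim( $raw ) ) as $pair )
{
  list( $key, $value ) = explode( ":", $pair );
  $fp = fopen( $datapath . 'data-' . $key, 'w' );
  fwrite( $fp, $value );
  fclose( $fp );
}
?>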

Obviously you'll need to substitute your own URLs and variables into these examples to suit your requirements, and your scripts may handle many more values: most of my collectors acquire about 4 to 8 data points.

Automatic Scheduling
On Linux systems it's really easy to schedule scripts for regular execution. In my case I have a file called "/etc/cron.d/statusscreen" that looks a bit like this:

* * * * * root /home/statusscreen/collector-stats.php
* * * * * root /home/statusscreen/collector-bugs.php
*/15 * * * * root /home/statusscreen/collector-weather.php
* * * * * root /home/statusscreen/collector-twitter.php

In this case the weather collector is executed every 15 minutes, while the other collectors are executed every minute. My real file has many more lines (and therefore collectors) than that, but it gives you the idea. If you set up your collectors to be executed automatically you can periodically "cat" each data file to see the current values, and you should see them change within a minute or so of the original data source changing.

So at this point you should have infrastructure in place to periodically collect data from a variety of sources and store it all in a central place. In the final installment I'll show you how I take that data and display it on the status screens up on the office wall.


