Alexandre's Notebook: Visualizing Data

Brief

To learn a bit about data visualization and simple data warehousing, I tried a small experiment: I set out to build a small visualization application.

In this post I'll cut the boring details as much as possible, so I've simplified several things that are not overly important.

I planned the following:

The goal is to create a visualization showing the number of vegetarians around the world. Imagine looking at a world map, with each country showing its number of vegetarians graphically. You should be able to zoom in on Europe, for example, to see which countries lead in vegetarian eating, or zoom out and see the whole world picture; click on a specific country and see its statistics over a month; choose a particular day of the month to visualize; and navigate the world map by dragging… this is what I decided to go for…

On the technical side, I was interested in using the Processing framework, and because I am a Ruby addict, this turned out to be a good excuse to play with JRuby.

Part 1 - Aggregating data

Normally this process involves a lot of work, but I had an easy task: I could collect clean data from another database. I'm interested in the table of vegetarian people. But what to collect, what to summarize, what to calculate?

NOTE: Specify the goal of the visualization up front, as precisely as possible; it will influence every design decision.

The kind of data aggregation to do depends on the visualization… So I decided that, even though I have tons more data available, I just want to see the overall count of veggies by country by day.

So, in warehouse fashion, let's choose the facts and dimensions:

Facts:

Number of vegetarians.

Dimensions:

Time.
Location (country).

Facts are generally numeric data that capture specific values.

Dimensions contain the reference information that gives each transaction its context. When dimensions are created, they should be enriched with as much information as possible (including calculated values).

The next step is to build the “warehouse”. For this I used a plain database where I created 3 tables:

Country:

Initially I only had the 2-character ISO code identifying each country, but I enriched the dimension with all the other values.
I used the geonames.org web service to collect the other values. Especially important are the geo coordinates of each country's bounding box, which were used to calculate the central latitude and central longitude of the country for use in the visualization.
Things like continent, population and capital can be used later to summarize data by continent, or to show the ratio of veggies to total population, veggies per square meter, etc… think of the possibilities… :)
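
The central point calculation can be sketched like this. It is only an illustration, not the original code: the BoundingBox struct, its field names and the sample values are all made up (rounded numbers, not real web service data). It also assumes the box does not cross the 180° meridian, where a plain midpoint would land on the wrong side of the globe.

```ruby
# Hypothetical bounding box: edges in decimal degrees.
BoundingBox = Struct.new(:north, :south, :east, :west)

# Midpoint of the box, returned as [latitude, longitude].
# Assumes the box does not wrap around the 180 degree meridian.
def bbox_center(box)
  lat = (box.north + box.south) / 2.0
  lon = (box.east + box.west) / 2.0
  [lat, lon]
end

# Illustrative, rounded values only.
box = BoundingBox.new(42.0, 37.0, -6.0, -10.0)
lat, lon = bbox_center(box)
puts "center: #{lat}, #{lon}"
```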

Time:

I did a “group by” day when collecting data from the database. I wanted a day as the finest level of granularity.
So from a day we can calculate day, month, year, day of week, weekday?, day in year, day in month, quarter, weekday name, etc…
What is this useful for? Well, imagine you want to compare the number of vegetarians on Wednesdays with Mondays, or do the same for quarters or months. Maybe as the summer months get closer, the number of veggies goes up a bit?
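
Deriving those calculated values from a single day is mostly a matter of asking Ruby's standard Date class. A minimal sketch, where the hash keys are hypothetical column names (the real dim_date columns are not listed in this post) and the sample date is arbitrary:

```ruby
require "date"

# Builds one hypothetical dim_date row from a Date.
def date_dimension(date)
  {
    day: date.day,
    month: date.month,
    year: date.year,
    day_of_week: date.cwday,            # ISO numbering: 1 = Monday .. 7 = Sunday
    weekday: date.cwday <= 5,           # true for Monday to Friday
    day_in_year: date.yday,
    quarter: (date.month - 1) / 3 + 1,  # integer division: months 1-3 give 1, etc.
    day_name: date.strftime("%A")
  }
end

row = date_dimension(Date.new(2007, 8, 15))
puts row.inspect
```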

Aggregating

With the basic schema laid out, it's time for data collection. I used the ActiveRecord part of the Rails framework, running on JRuby. It's not the first time I've used ActiveRecord standalone and I like it a lot… it simplifies data access hugely, and because it's all inside Ruby, a couple more lines of code, and voilà, all the extra calculated columns get done too.
These collected and calculated values are then inserted into a local MySQL database using the schema above: fact_vegetarian, dim_date and dim_country.
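
To show the shape of the aggregation without needing a database, here is a plain Ruby sketch of counting by country and day. It stands in for the ActiveRecord “group by” query and is not the original code; the Record struct and its fields are hypothetical.

```ruby
require "date"

# Hypothetical source row: one record per vegetarian.
Record = Struct.new(:country_code, :created_on)

# Counts records per [country_code, day] pair, which is the fact
# this warehouse stores (number of vegetarians by country by day).
def count_by_country_and_day(records)
  counts = Hash.new(0)
  records.each { |r| counts[[r.country_code, r.created_on]] += 1 }
  counts
end

sample = [
  Record.new("PT", Date.new(2007, 8, 1)),
  Record.new("PT", Date.new(2007, 8, 1)),
  Record.new("ES", Date.new(2007, 8, 1)),
  Record.new("PT", Date.new(2007, 8, 2))
]

count_by_country_and_day(sample).each do |(code, day), total|
  puts "#{code} #{day}: #{total}"
end
```

Each resulting pair would become one row in the fact table, pointing at the matching country and date dimension rows.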

I’ve collected values for a whole month.

I ended up with 225 lines of code for the warehouse part, with some comments… but no repeated code.

Part 2 - Building a Visualizer

Generically, what the visualizer does is shoot some queries at the database, filtered by the current view and by some global vars like date and country, and show the results as bubbles with sizes proportional to the number of veggies in each country.
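
One possible way to size the bubbles is to make the area, rather than the radius, proportional to the count, so big countries do not visually swamp the map. This is an assumption for illustration, not necessarily how the original sketch scaled them; bubble_radius and its max_radius default are hypothetical.

```ruby
# Radius for a bubble whose AREA is proportional to count.
# max_count is the largest count currently on screen and
# max_radius is the radius (in pixels) that count should get.
def bubble_radius(count, max_count, max_radius = 40.0)
  return 0.0 if max_count <= 0 || count <= 0
  max_radius * Math.sqrt(count.to_f / max_count)
end

puts bubble_radius(25, 100)   # a quarter of the max count gets half the max radius
puts bubble_radius(100, 100)
```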

The application was divided into different drawing components:

Show World Data: the opening scenario, showing the whole world's statistics for 1 month.
Show Country: used to show a specific country's stats.
Show Stats: a strip at the bottom showing a graph of the numbers over the month, where the x axis is the days of the month and the y axis is the number of veggies for a given day.
Show Buttons: buttons used to control zoom, reset, etc…

(Probably a refactoring will merge Show World Data and Show Country into a single drawing component, since they share a lot of repeated code.)

I created a separate module for each one, which were then mixed into the main class that inherits from Processing.Sketch, to avoid ending up with a big ball of spaghetti code :)
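
The module-per-component idea looks roughly like this in plain Ruby. All module, class and method names here are made up, the drawing methods just return strings, and the class does not inherit from the Processing base class so that the sketch runs on its own.

```ruby
# One module per drawing component (names are hypothetical).
module WorldView
  def draw_world
    "world bubbles"
  end
end

module CountryView
  def draw_country
    "country bubbles"
  end
end

module StatsStrip
  def draw_stats
    "stats strip"
  end
end

module Buttons
  def draw_buttons
    "buttons"
  end
end

# Stand-in for the main sketch class; the real one would inherit
# from the Processing sketch base class and actually draw.
class Visualizer
  include WorldView
  include CountryView
  include StatsStrip
  include Buttons

  attr_accessor :active_country

  # Picks the main view from the active country filter, then adds
  # the components that are always visible.
  def draw
    main = active_country ? draw_country : draw_world
    [main, draw_stats, draw_buttons]
  end
end

puts Visualizer.new.draw.inspect
```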

I defined some global vars, like:

mouse coordinates, for the draggable navigation.
zoom level, to keep track of the current zoom.
active month, filter for queries.
active country, filter for queries.
active day, filter for queries.

I made some things clickable, like the country codes displayed on top of the countries, so the user can filter and see the stats for a single country at the bottom. This is done by checking how far the mouse position is from the central point of a country.
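
The distance check can be sketched as below. It is a minimal illustration under some assumptions: centers is a hypothetical hash of country code to screen position, and the 15 pixel threshold is arbitrary.

```ruby
# Returns the code of the country whose center is nearest to the
# mouse, or nil if even the nearest one is farther than max_distance.
# centers maps a country code to its [x, y] position on screen.
def country_at(mouse_x, mouse_y, centers, max_distance = 15.0)
  nearest = centers.min_by { |_code, (x, y)| Math.hypot(mouse_x - x, mouse_y - y) }
  return nil if nearest.nil?
  code, (x, y) = nearest
  Math.hypot(mouse_x - x, mouse_y - y) <= max_distance ? code : nil
end

centers = { "PT" => [100.0, 200.0], "ES" => [130.0, 200.0] }
puts country_at(104, 203, centers).inspect
puts country_at(300, 300, centers).inspect
```

Picking the nearest center first avoids ambiguity when two small countries both fall within the click radius.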

Also at the bottom, the x axis of the stats strip lets the user click on a day of the month, so the user can select a particular day; that updates the world visualization to show the number of veggies on that day for the whole world.
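
Turning a click on the strip into a day of the month is a small piece of arithmetic. A sketch, assuming the days are laid out as equal-width slots along the x axis (the function and parameter names are hypothetical):

```ruby
# Maps a click at pixel x to a day of the month (1-based), or nil if
# the click falls outside the strip. strip_x is the left edge of the
# strip and strip_width its width, both in pixels.
def day_at(x, strip_x, strip_width, days_in_month)
  return nil if x < strip_x || x >= strip_x + strip_width
  slot = strip_width.to_f / days_in_month
  ((x - strip_x) / slot).floor + 1
end

puts day_at(125, 0, 310, 31).inspect
```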

And zoomed out, whole world view:

I ended up with 584 lines of code, with a big chunk of repeated code, and some comments…

Overall, making the visualization was a lot more work than the warehouse part, because I did a lot of fighting with correct coordinate positioning, getting a decent map, and keeping the map's country coordinates in sync with the zoom.
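
For the curious, the kind of positioning math involved can be sketched like this. It assumes the map image is a plain equirectangular projection (longitude mapped linearly to x, latitude to y), which may not match the map actually used, and it treats zoom as a scale factor plus a drag offset. All names are hypothetical.

```ruby
# Converts latitude/longitude in degrees to screen coordinates on an
# equirectangular world map of the given size, then applies zoom
# (scale factor) and the pan offset produced by dragging.
def to_screen(lat, lon, width, height, zoom = 1.0, offset_x = 0.0, offset_y = 0.0)
  x = (lon + 180.0) / 360.0 * width   # -180..180 maps to 0..width
  y = (90.0 - lat) / 180.0 * height   # 90..-90 maps to 0..height (y grows downward)
  [x * zoom + offset_x, y * zoom + offset_y]
end

puts to_screen(0, 0, 720, 360).inspect
puts to_screen(0, 0, 720, 360, 2.0, -100.0, -50.0).inspect
```

Keeping the zoom and offset in one function means the bubbles and the clickable country centers always go through the same transform as the map.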

Using JRuby was mostly a nice experience. There are a couple of things to learn at first, for example how to include Java libraries; no biggie. But I also had some type conversion issues when I tried to refactor the code at some point. I guess it's because of the Java types that the JRuby guys hide and convert automatically; maybe they can show up in some edge cases? … but then again, it might also be my inexperience with JRuby…

I used version 1.0 of JRuby. I think the JRuby guys have done great work, making all the millions of Java libraries out there accessible to the Ruby community. But of course, don't expect to write 100% Ruby code like you do with plain old Ruby; sometimes there's some Java lurking out of the JRuby box.

Processing is great and also has huge potential. I had a couple of troubles with 1 or 2 plugins I tried, but I ended up using the base distribution, which works and feels 100%. It's probably not intended for full applications, but more for sketches and small standalone visualizations, which is fine. I look forward to doing more stuff with it, it's fun!
