Retrieving And Using Data From Google Analytics

This is the second post in a two-part series exploring how to send custom variables to Google Analytics and pull information back out using those variables. It culminates in the release of a feature module that can track node types and generate blocks of the most popular nodes for each content type.

Google Analytics provides an API for other applications to have access to the wealth of information it has about the sites it tracks. In my previous post about tracking custom variables I described how to send Drupal node type information into Google Analytics. Here we will look at how to get this information back out and produce blocks displaying the most popular content by type. In the end I will share the feature module powering the Most Popular Posts block on this site.

The Google Analytics API Module

While the Google Analytics API is very well documented, there are some things you need to do to get up and running with it. The Google Analytics API module takes care of this for you. The module needs to be enabled, have its permissions set up, and have the site authorized to access the API for a particular account.

Making A Request

$request = array(
  '#dimensions' => array('pagePath'),
  '#metrics' => array('pageviews'),
  '#filter' => 'ga:customVarValue1=='. $node_type,
  '#sort_metric' => array('-pageviews'),
  '#start_date' => date('Y-m-d', time() - 2592000),
  '#max_results' => 10,
);
$data = google_analytics_api_report_data($request);

To start, build an array describing the query. Here I ask for the pagePath dimension and the pageviews metric; both come back in the results. The filter matches the node type, which was previously sent as a custom variable. The sort puts the highest pageviews first, and the start date is 30 days ago (2592000 seconds).

The google_analytics_api_report_data() function returns an array of up to 10 result objects.
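If you need the same query for several content types, the request array can be wrapped in a small helper. The following is a minimal sketch only: the function name example_popular_nodes_request() and its parameters are hypothetical and are not part of the Google Analytics API module, and it assumes the node type lives in custom variable slot 1, as in the request above.

```php
<?php
/**
 * Builds a request array for google_analytics_api_report_data().
 *
 * Hypothetical helper for illustration; not provided by any module.
 *
 * @param string $node_type Machine name of the content type, e.g. 'blog'.
 * @param int $days How many days back the report should start.
 * @param int $limit Maximum number of rows to request.
 * @param int|null $now Reference timestamp; defaults to the current time.
 */
function example_popular_nodes_request($node_type, $days = 30, $limit = 10, $now = NULL) {
  $now = isset($now) ? $now : time();
  return array(
    '#dimensions' => array('pagePath'),
    '#metrics' => array('pageviews'),
    // Assumes custom variable slot 1 holds the node type.
    '#filter' => 'ga:customVarValue1==' . $node_type,
    // The leading minus sorts from most to fewest pageviews.
    '#sort_metric' => array('-pageviews'),
    '#start_date' => date('Y-m-d', $now - $days * 86400),
    '#max_results' => $limit,
  );
}
```

Inside Drupal the result would be passed straight on, for example google_analytics_api_report_data(example_popular_nodes_request('blog')).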

There is a Lullabot article with even more examples of accessing the API.

Handling The Results

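$items = array();
if (!empty($data)) {
  foreach ($data as $page) {
    $dimensions = $page->getDimensions();
    // Metrics (pageviews) are available here if you want to show counts.
    $metrics = $page->getMetrics();

    // Strip the site's base path to get the Drupal path or alias.
    $alias = substr($dimensions['pagePath'], strlen(base_path()));
    // Resolve an alias to its internal path; unaliased paths come back as-is.
    $path = drupal_get_normal_path($alias);

    // Only keep node pages such as node/123.
    if (arg(0, $path) == 'node' && is_numeric(arg(1, $path))) {
      $id = arg(1, $path);
      // Use the stored title, and skip unpublished nodes.
      $title = db_result(db_query("SELECT title FROM {node} WHERE nid = %d AND status = 1", $id));
      if ($title) {
        $items[] = l($title, $alias);
      }
    }
  }
}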

When we get the results back, we want to do some processing before displaying them. Here we look up the title of the node being tracked instead of using the title reported to Google Analytics, which is based on the title tag and can contain the site name.

This array of links can be passed into theme('item_list', $items); to generate an HTML list.
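One detail worth guarding against: the substr() call in the loop above assumes every pagePath starts with the base path, and a reported path may also carry a query string. The following is a minimal sketch of a hypothetical helper, example_page_path_to_alias(), that handles both cases in plain PHP. It is an illustration, not code from the module.

```php
<?php
/**
 * Turns a Google Analytics pagePath into a Drupal path or alias.
 *
 * Hypothetical helper for illustration.
 *
 * @param string $page_path Path as reported, e.g. '/blog/my-post?page=2'.
 * @param string $base_path The site's base path, e.g. the result of base_path().
 */
function example_page_path_to_alias($page_path, $base_path = '/') {
  // Drop any query string or fragment.
  $path = preg_replace('/[?#].*$/', '', $page_path);
  // Remove the base path only when the reported path really starts with it.
  if (strpos($path, $base_path) === 0) {
    $path = (string) substr($path, strlen($base_path));
  }
  // Drupal paths carry no leading or trailing slash.
  return trim($path, '/');
}
```

In the loop this would replace the substr() line: $alias = example_page_path_to_alias($dimensions['pagePath'], base_path());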

Caching!

When making calls like this, caching should be used. In my case I use block caching. This significantly reduces both the database queries for the titles and the calls to the Google Analytics API.
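As a rough illustration of block caching in Drupal 6, a block opts in through the 'cache' key returned from the 'list' operation of hook_block(). The sketch below assumes a placeholder module named "example" and a placeholder block delta 'popular_nodes'; it shows only the 'list' operation and may differ from what the released module does. Block caching also has to be switched on in the site's performance settings for this to take effect.

```php
<?php
// Drupal 6 defines this constant in block.module; it is repeated here only
// so the sketch can run outside Drupal.
if (!defined('BLOCK_CACHE_GLOBAL')) {
  define('BLOCK_CACHE_GLOBAL', 0x0008);
}

/**
 * Implementation of hook_block() for a placeholder module named "example".
 *
 * Only the 'list' operation is sketched; 'view' would build the item list.
 */
function example_block($op = 'list', $delta = 0, $edit = array()) {
  if ($op == 'list') {
    $blocks = array();
    $blocks['popular_nodes'] = array(
      // In a real module this string would be wrapped in t().
      'info' => 'Most popular posts',
      // Cache one copy of the block for all users and all pages.
      'cache' => BLOCK_CACHE_GLOBAL,
    );
    return $blocks;
  }
}
```

A global cache fits here because the list of popular nodes is the same for every visitor; a block that varied by role or page would need one of the other BLOCK_CACHE_* constants.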

The Module

As I promised, the Google Analytics Node Tracking code is available. Read the installation instructions as there are some gotchas when you go to set this up. The code is very heavily documented if you want to learn even more about how it works.