4 Ways to make Couchbase do the hard work: Part I


Let's see those new users!

Views are the Couchbase way of generating indices so that you can query your data on attributes other than the document key. In this article we are going to explore Couchbase's Map Reduce system, which lets you access your data in a variety of ways.

Views are used for 3 main reasons:

  • Additional indices on which to query your data
  • Calculating and producing statistics and information
  • Fine filtering of larger data sets

The following is incredibly important to remember!

Views are updated when the document data is persisted to disk. There is a delay between creating or updating the document, and the document being updated within the view.

Couchbase documents are always consistent, but views are only eventually consistent (although you have the ability to change this). You can control how a view is queried by setting the stale parameter to one of the following values.

  1. false : Force a view update before returning data
  2. ok : Allow stale views
  3. update_after : Allow stale view, update view after it has been accessed
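As a rough sketch of how one of these values is passed, the snippet below builds a view query URL carrying the stale parameter. The host, the view port 8092, the bucket name and the design document and view names are assumptions based on a default local install and the names used later in this article; adjust them to your own setup.

```javascript
// Sketch: build a view query URL with a chosen `stale` value.
// Assumptions: default view port 8092 on localhost, a bucket called
// "users", and a development design document "dev_users" holding a
// view called "new_users".
const STALE_VALUES = ["false", "ok", "update_after"];

function viewUrl(stale) {
  if (!STALE_VALUES.includes(stale)) {
    throw new Error("stale must be one of: " + STALE_VALUES.join(", "));
  }
  const base = "http://localhost:8092/users/_design/dev_users/_view/new_users";
  const params = new URLSearchParams({ stale: stale });
  return base + "?" + params.toString();
}

// Force the view to be updated before results are returned.
console.log(viewUrl("false"));
```

Using false gives the freshest results at the cost of waiting for the index update, so it is usually worth reserving for queries that really need it.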

Data Setup

This article assumes that you have Couchbase installed on your local development environment; you can check out this earlier article on how to install on Ubuntu. You also need to create a bucket named users.

We are going to populate our node with some sample data so that we can quickly get down to some Map Reduce fun.

If you have Ruby installed, you can run the Ruby script below, which will insert 1000 documents with a variety of random values for the fields and dates from the last week. (You will need the couchbase gem to do this; see the guide to installing the Ruby Couchbase SDK.)

Initial View Setup

Let's open up our Couchbase admin console and start to create some views!

Browse to your console via http://localhost:8091 and enter your credentials. Click the Views tab and then Create Development View. We are going to name the design document '_design/dev_users' and the view 'new_users'.

The data that we inserted has the following format:

Let's pretend we are working for a company that is using a system with millions of documents and high throughput; perhaps we are tracking URLs or a mobile social game.

Daily Installs Users Query

Our company is keen to track new users entering the system; it's a metric that almost any business would want to track. We've already created the view 'new_users', so let's dive in and write some JS! Click edit on the new_users view and paste the following snippet into the map side of the editor. Before you hit save, select 'Count' in the reduce section by clicking on the link. Now save the view.

Some things to note about the mapping code: each document in our bucket is passed through this function, and we are passed both the document and its meta information. It is important to check that meta.type == "json", as non-JSON data can be saved too. It is also a good idea to have a doc type field on your documents. Couchbase recommends using as few buckets as possible, so a single bucket will often hold many different types of documents; a docType field allows us to filter on a specific type. In SQL terms, think of it as selecting from a table.

We then check that the document does indeed have a 'join_date' field and, if so, emit a key built with the Couchbase dateToArray() function.
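A minimal sketch of a map function matching that description might look like the following. The docType value "user" and the sample documents are assumptions for illustration. Inside a view, emit() and dateToArray() are supplied by Couchbase; they are stubbed here only so the sketch can run on its own.

```javascript
// Stand-ins so the sketch runs outside Couchbase; inside a view,
// emit() and dateToArray() are provided by the server.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}
function dateToArray(date) {
  const d = new Date(date);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}

// The part that would go into the map side of the view editor.
// The docType value "user" is an assumption for illustration.
const map = function (doc, meta) {
  if (meta.type == "json" && doc.docType == "user") {
    if (doc.join_date) {
      // Key is the date array; no value is needed for a count.
      emit(dateToArray(doc.join_date), null);
    }
  }
};

// Hypothetical documents: only the first one should be emitted.
map({ docType: "user", join_date: "2013-01-30T10:15:00Z" }, { id: "u1", type: "json" });
map({ docType: "order" }, { id: "o1", type: "json" });
map("raw bytes", { id: "b1", type: "base64" });
console.log(emitted);
```

Only the function assigned to map belongs in the view editor; the stubs and the sample calls are there purely to exercise it locally.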

The Couchbase dateToArray(date) function converts a JavaScript Date object or a valid date string such as "2012-07-30T23:58:22.193Z" into an array of individual date components.
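To get a feel for the shape of that array outside of a view, a local stand-in can help. The function below is a hypothetical helper named dateToArraySketch, not the built-in itself, and it assumes the components are reported in UTC with a 1-based month.

```javascript
// Local stand-in for Couchbase's built-in dateToArray(), for
// experimenting outside a view. Name and code are illustrative only.
function dateToArraySketch(date) {
  const d = date instanceof Date ? date : new Date(date);
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1, // months are reported 1-12
    d.getUTCDate(),
    d.getUTCHours(),
    d.getUTCMinutes(),
    d.getUTCSeconds(),
  ];
}

console.log(dateToArraySketch("2012-07-30T23:58:22.193Z"));
// [ 2012, 7, 30, 23, 58, 22 ]
```

The components run from the coarsest (year) to the finest (second), which is what makes grouping on a prefix of the array possible.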

This allows us to do some very neat things that we'll explore in the next few examples.

Now execute the query by clicking show results; the key should be null and the value should be 1000. This has just done a simple count of all our documents that match our criteria. Let's make it a little more useful.

Count of user base

Next, to filter results, click the dropdown button, set reduce to false and re-run the query. Now that we've turned off the 'count' reduce, the query returns the keys (our date array) and null for the value (we didn't set one), and below the key you'll notice the id of the document. The id is always included in the output, so you never need to emit it yourself.

Reduce set to false

You'll notice the date is now an array. We are going to use the group level option to get better granularity on our data. Set reduce to true again, select the group checkbox and set the level to 3. This means we are now grouping our users by the day they joined; level 1 would group by year and level 2 by month. With a large amount of data you could group down to the second (level 6).
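The effect of the group level can be sketched locally by counting keys that share the same leading components. This is only an emulation of the idea, not how Couchbase implements it, and the sample keys are made up.

```javascript
// Sketch: emulate the count reduce combined with a group level.
// Keys are date arrays as emitted by the map function; the sample
// keys below are invented for illustration.
function countByGroupLevel(keys, level) {
  const counts = new Map();
  for (const key of keys) {
    // Rows whose first `level` components match fall into one group.
    const prefix = JSON.stringify(key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return Array.from(counts, ([prefix, value]) => ({
    key: JSON.parse(prefix),
    value: value,
  }));
}

const sampleKeys = [
  [2013, 1, 30, 9, 12, 5],
  [2013, 1, 30, 17, 40, 51],
  [2013, 1, 31, 8, 3, 27],
  [2013, 2, 1, 22, 59, 0],
];

console.log(countByGroupLevel(sampleKeys, 3)); // one row per day
console.log(countByGroupLevel(sampleKeys, 2)); // one row per month
```

With level 3 the two users who joined on the 30th collapse into a single row with a value of 2, while level 2 merges both January days into one row.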

Grouped by level 3

Our data only covers a one-week range. Most likely we'd want to view the latest week's data, but what if we wanted to see only a subset?

Well, we can achieve this using the Couchbase startkey and endkey parameters.

Your data may differ slightly, as the Ruby script generates dates based on your current time, so bear that in mind for the next examples.

We only want data since the 30th of January? Then use the startkey.
Start key

We only want to retrieve data from the 30th of January to the 2nd of February? Then use the startkey and endkey respectively (this will include the 2nd of Feb data too).

Grouped and start and end key
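For reference, the same filters can be expressed as query string parameters. The sketch below is a hedged illustration: the year 2013 is an assumption, and the trailing {} on the end key is one common idiom for making sure every hour, minute and second of the end day falls inside the range, since objects sort after numbers in view key collation.

```javascript
// Sketch: build the query string for a grouped, ranged view query.
// Keys are JSON-encoded because view keys are JSON values.
// The year 2013 is an assumption; use the dates in your own data.
function rangeQuery(startkey, endkey, groupLevel) {
  const params = new URLSearchParams();
  params.set("group_level", String(groupLevel));
  if (startkey !== undefined) params.set("startkey", JSON.stringify(startkey));
  if (endkey !== undefined) params.set("endkey", JSON.stringify(endkey));
  return params.toString();
}

// Everything since the 30th of January, grouped by day.
console.log(rangeQuery([2013, 1, 30], undefined, 3));

// From the 30th of January up to and including the 2nd of February.
// The trailing {} sorts after any hour/minute/second component.
console.log(rangeQuery([2013, 1, 30], [2013, 2, 2, {}], 3));
```

The resulting string can be appended to the view URL after a question mark; the admin console fills in the same parameters through its filter form.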

In part 2 we will be looking at other views for querying the same dataset. We'll also include the dataset bundled together so you can load it into the node without having to use Ruby.

