Data Export

The Sandstorm Data Export feature will allow you to export your data as JSON files. Once you set up the Data Export feature for your account, you’ll be able to export data for the period of last 6 months up to yesterday.

How do I get started with Data Export?

Refer to Data Export Setup to configure your account for Data Export. It takes less than 10 minutes to set up.

What is the format of the data?

Your Sandstorm data will be organized into subdirectories like so:

Your S3 Bucket/revisions/
    1.json
    2.json
    ...
    N.json

To help you better sort through your exported data, we include two additional .csv files to guide you:

Status File

Your S3 Bucket/status.csv contains information about how many files your account has, and how many files are left to export. The columns are:

  • Total Revisions: The total number of revision files that contain your site’s data. This file is up to date within 5 minutes of sending data to Sandstorm.

  • Exported Revisions: The number of revisions we have exported. If you subtract this from Total Revisions, you’ll know how many revision files have not yet been exported.

  • Max Exported Revision Timestamp: The UNIX timestamp of the most recent data exported.

  • Max Exported Revision UTC Timestamp: The UTC timestamp of the most recent data exported (included for convenience and readability).

Here is what a sample status.csv looks like.

Index File

Your S3 Bucket/index.csv contains information about which files contain which days’ worth of data. The columns are:

  • Filename: The revision filename

  • Minimum Timestamp: The minimum timestamp of the data found in the revision file in UNIX format.

  • Maximum Timestamp: The maximum timestamp of the data found in the revision file in UNIX format.

  • Minimum UTC Timestamp: The minimum timestamp of the data found in the revision file in UTC format (for easy readability).

  • Maximum UTC Timestamp: The maximum timestamp of the data found in the revision file in UTC format (for easy readability).

Here is what a sample index.csv looks like.

How often can I export?

We will export all of your data for the period of last 6 months. We start an export cycle every 2 hours, going through each product and exporting as much data as possible in a given time limit. For external processing purposes, it's good practice to check the status.csv file to make sure all of the data you're expecting to process has already been exported. You can also check on the progress of your data export at https://game.sandstorm.co/data_export. The data export feature will only export data up to yesterday.

Our system keeps track of what has been exported so far. If you remove data from your bucket, we will not automatically repopulate your bucket with old data.

How To Read KM Data

Inside of each JSON file, you will find lines of JSON. Each line is a single hash separated by a newline. To parse, read each line one at a time.

  • _p is the person’s identity who did the event.

  • _p2 is the person’s second identity if you passed an alias command

  • _t is the UTC UNIX timestamp of when the event happened.

  • _n is the name of the event

  • Property names are surrounded by “ “, followed by a :

If you passed us an event with no additional properties the line would export as:

{"_p":"john", "_t":1234567890, "_n":"Visited Site"}

If you passed in an event with properties it would export as:

{"_p":"john", "_t":1234567890, "_n":"Visited Site",
  "url":"http://mysite.com", "campaign name":"my campaign"}

If you passed us just properties with no event it would export as:

{"_p":"john", "_t":1234567890, "url":"http://mysite.com",
  "campaign name":"my campaign"}

If you set each property as two separate calls they will be exported as two separate lines:

{"_p":"john", "_t":1234567890, "url":"http://mysite.com"}
{"_p":"john", "_t":1234567890, "campaign name":"my campaign"}

If you passed us an alias command it will be exported as:

{"_p":"john", "_t":1234567890, "_p2":"john@sandstorm.co"}

Notes For Comparing to Our Reports

  1. The .json files we export are identical to the ones we use when generating your reports.

  2. The timestamp is not adjusted for your local timezone, listed in your site’s settings.

  3. Event data is dumped in batches, so a .json file for one day may actually contain events from the day before or the day after. If you want to count up events for a particular day, use the timestamps for each entry and look in the other .json files in case the beginning/end of the day has instances in the other .json files.

  4. The data in the .json files are sorted on the time we received the data, not on the timestamp. Because of this, it’s normal to see events listed out of order in the export.

An Example JSON file

Attached is the file 300.json, which is an export of data from one of our demo accounts.

Processing the Data

You can parse the results of our export with a gem that our very own Clay Whitley built:

GitHub: km-export-processor gem

The core features are:

  • JSON compiler: Aggregates all Sandstorm JSON files in a directory, into a single (nonstandard) file.

  • JSON to JSON converter: Converts a non-standard JSON file into standard format.

  • JSON to CSV converter: Converts a non-standard JSON file into a CSV file, where each row is an event/property/alias.

  • Reimporter: Takes a non-standard JSON file, and an API key, and sends the data in the JSON file to the product using the KMTS gem.

  • Alias parser: Takes a non-standard JSON file, and separates out any alias calls into their own JSON file, to be processed separately.

  • Email parser: Takes a non-standard JSON file, and an identity. Parses any actions done by the identity in the JSON file, into a separate file.

📘This is not an official tool that Sandstorm built, but you can refer to this Ruby gem to parse the results of our export:

KM-db KM-instructions.md

Last updated