MySQL and Postgres as an External Data Source

You can connect Sandstorm to your MySQL or Postgres database. This will import all existing data and then continuously monitor your database for any new data and pass that into Sandstorm. This is a powerful and easy way to integrate Sandstorm with your app.

If you have a different SQL database (Oracle, SQLite, Mongo), we have not yet built support to directly read from these database types. Let us know that you are interested in seeing such an integration! In the meantime, please consider converting your data into .csv files and using our method.

It’s recommended you read People, Events, and Properties before getting started

Setting up your database

The Sandstorm SQL integration will only work with a read-only database. We enforce this for all users for our users peace of mind. Additionally we recommend that you create a separate MySQL user for Sandstorm to use.

The following SQL will accomplish this:

CREATE USER 'sandstorm'@'%' IDENTIFIED BY 'some_password';
GRANT select ON your_database.* TO 'sandstorm'@'%';
FLUSH PRIVILEGES;

Particularly for Postgres, we check for read-only status by ensuring the user you created does not have table privileges to INSERT, UPDATE, DELETE, TRUNCATE.

Just make sure you note the username (sandstormin this example) and password (some_password in this example) because you’ll need those when you create the connection.

This SQL provides access to all tables in a database. You can also grant access to the specific tables you will be querying with Sandstorm .

Updating your firewall rules

Sandstorm will connect to your MySQL server using one of the following IP addresses:

IP addresses:

75.101.132.88
75.101.161.87
75.101.165.228
184.73.221.24
174.129.16.231
54.196.40.13
54.174.103.148

You’ll want to make sure that your firewall allows connections to the port you run MySQL on (which by default is 3306) from these IP addresses. If you have other security needs please contact support so that we can help you figure out the best solution for your situation.

If your database is on AWS

If your database is hosted on Amazon Web Services in the US East (Northern Virginia) Region (us-east-1), then we would be locating your databases using the internal Amazon network. In that case, Sandstorm may connect to your MySQL server using one of the following internal Amazon IP addresses:

IP addresses:

10.187.169.236
10.180.220.215
10.154.138.252

Creating the connection

To create the connection:

Click Add a MySQL Data Source
Fill out the connection details you used in the previous step. You’ll need to specify the remote address of the MySQL server. If your server is not accessible over the internet please contact us so we can help you determine your best option.

Creating queries

You will need to create one query for each event.

Name: you can describe what the query is importing.
Record Sandstorm Event as: name the Sandstorm event that is recorded by this query.

When creating your SQL query you’ll need to make sure that the results contain the following fields:

KM_PERSON - this field will be used as the identity of the person. Typically you’ll want to select the email or username field from your database.
KM_TIME - this field will be used as the timestamp of the event. Typically you’ll want to select a date field from your database.
KM_LAST_RAN - this field is used so that Sandstorm knows which rows have already been synced. Typically you’ll want to select the id or timestamp field from your database.

See below for more details about each.

KM_PERSON

The value you select for KM_PERSON will be used as the identity for the person performing the event. As mentioned the best choice, when available, is the e-mail field form your table. To rename email as KM_PERSON you can use the SELECT...AS... syntax to name the field in the results (e.g. SELECT email AS KM_PERSON).

It’s possible that the identity field you want to use is in a different table. In this case you might need to use a SQL JOIN to join the two tables. For example if I have a payments table that I’m creating a query for I might do:

SQL

SELECT
  users.email AS KM_TIME...
FROM
  users, payments...
WHERE
  users.id = payments.user_id;

KM_TIME

The value you select for KM_TIME will be used as the date/time for the event you syncing. Sandstorm will not sync without this field. The column type can be any valid date field, including just a plain INT field with a UNIX epoch-based timestamp. You can use the SELECT...AS... syntax to name the field in the results (e.g. SELECT created_at AS KM_TIME).

Please note that Sandstorm will set the timezone to UTC/GMT (SET time_zone = "+0:00") for the connection before making any queries.

KM_LAST_RAN

When Sandstorm syncs it will sync at most 50,000 rows every hour per query. When it runs the next hour it will need to make sure that it selects the next 50,000 rows - making sure not to duplicate or miss any rows.

This is where the KM_LAST_RAN field comes in. There are a few key things to keep in mind:

You need to select a column that only increments when new data is inserted. An ID column is a perfect candidate, because it only goes up (after Sandstorm has processed the row with ID 500 it knows that no new row with a lower ID than 500 is going to be inserted). A timestamp or date/time column can work also as long as new rows are going to have later timestamps than the existing rows in the database. If users can insert rows with timestamps in the past than a timestamp is not a good choice.
You need to tell Sandstorm the last value that was used. You can do this by returning a field with the name KM_LAST_RAN. If you were using the id column for this purpose you can do SELECT id AS KM_LAST_RAN to accomplish this. Sandstorm will store the last value used here.
You need to add a condition to your SQL query that will ensure that when Sandstorm runs your query it only selects rows that have not been synced. The value that you last returned to Sandstorm will be available in the variable $KM_LAST_RAN_FOR_THIS_QUERY. If you are using the ID column you can use WHERE id > $KM_LAST_RAN_FOR_THIS_QUERY to accomplish this. You must be very careful when using timestamps for the this purpose. To ensure that Sandstorm does not miss any data it is recommended that you use a >= condition (e.g. WHERE created_at >= $KM_LAST_RAN_FOR_THIS_QUERY). This does mean that Sandstorm will get some duplicate rows, but Sandstorm will ignore the duplicates. Not doing this means that Sandstorm might miss some data depending on the granularity of your timestamps and the frequency that you get new data.
Your query will automatically be ordered by KM_LAST_RAN. You should not add ORDER BY or LIMIT clauses to your query.

🚧Note:
You should use the same field for KM_LAST_RAN and $KM_LAST_RAN_FOR_THIS_QUERY.

A full query might look like:

SELECT
  id AS KM_LAST_RAN
  ...
FROM
  users
WHERE
  id > $KM_LAST_RAN_FOR_THIS_QUERY

Additional Properties

Any additional fields you return in your SQL will be recorded as Sandstorm properties for the person/event. So let’s say you have gender and country columns in your table that you want to pass along when a user signs up.

You could do:

SELECT
  id AS KM_LAST_RAN,
  email AS KM_PERSON,
  created_at AS KM_TIME,
  gender,
  country
FROM
  users
WHERE
  id > $KM_LAST_RAN_FOR_THIS_QUERY

Whatever the name of the field is is what the property will be recorded as. If you want to use a different name you can use SELECT ... AS .... So if you had a field cpn that you wanted to record as Coupon Code you could use: SELECT cpn AS "Coupon Code".

Examples

Tracking when a user signs up

It’s quite easy to track all your signups:

SELECT
  users.login AS KM_PERSON,
  users.created_at AS KM_TIME,
  users.id AS KM_LAST_RAN
FROM
  users
WHERE
  users.id > $KM_LAST_RAN_FOR_THIS_QUERY

Imagine you also have a payment_plans table and you want to include the plan name when a user signs up, you can do:

SQL

SELECT
  users.login AS KM_PERSON,
  users.created_at AS KM_TIME,
  payment_plans.name AS 'Plan Name',
  users.id AS KM_LAST_RAN
FROM
  users, payment_plans
WHERE
  payment_plans.id = users.payment_plan_id AND
  users.id > $KM_LAST_RAN_FOR_THIS_QUERY

Tracking when a user pays you

SQL

SELECT
  users.login AS KM_PERSON,
  payment_transactions.created_at AS KM_TIME,
  payment_transactions.id AS KM_LAST_RAN,
  payment_transactions.amount
FROM
  payment_plans, users
WHERE
  users.id = payment_plans.user_id AND
  payment_plans.id > $KM_LAST_RAN_FOR_THIS_QUERY

Tracking when a user cancels their account

SQL

SELECT
  users.login AS KM_PERSON,
  users.canceled_at AS KM_TIME,
  users.canceled_at AS KM_LAST_RAN
FROM
  users
WHERE
  users.canceled_at IS NOT NULL AND
  users.canceled_at > $KM_LAST_RAN_FOR_THIS_QUERY

PreviousMySQL Argument Out of Range NextDetecting Duplicates

Last updated 1 year ago

SELECT users.login AS KM_PERSON, users.created_at AS KM_TIME, payment_plans.name AS 'Plan Name', users.id AS KM_LAST_RAN FROM users, payment_plans WHERE payment_plans.id = users.payment_plan_id AND users.id > $KM_LAST_RAN_FOR_THIS_QUERY

SELECT users.login AS KM_PERSON, payment_transactions.created_at AS KM_TIME, payment_transactions.id AS KM_LAST_RAN, payment_transactions.amount FROM payment_plans, users WHERE users.id = payment_plans.user_id AND payment_plans.id > $KM_LAST_RAN_FOR_THIS_QUERY

SELECT users.login AS KM_PERSON, users.canceled_at AS KM_TIME, users.canceled_at AS KM_LAST_RAN FROM users WHERE users.canceled_at IS NOT NULL AND users.canceled_at > $KM_LAST_RAN_FOR_THIS_QUERY