MySQL and Postgres as an External Data Source
You can connect Sandstorm to your MySQL or Postgres database. This will import all existing data and then continuously monitor your database for any new data and pass that into Sandstorm. This is a powerful and easy way to integrate Sandstorm with your app.
If you have a different SQL database (Oracle, SQLite, Mongo), we have not yet built support to directly read from these database types. Let us know that you are interested in seeing such an integration! In the meantime, please consider converting your data into .csv files and using our recurring CSV import method.
It’s recommended you read People, Events, and Properties before getting started here
Setting up your database
The Sandstorm SQL integration will only work with a read-only database. We enforce this for all users for our users peace of mind. Additionally we recommend that you create a separate MySQL user for Sandstorm to use.
The following SQL will accomplish this:
Particularly for Postgres, we check for read-only status by ensuring the user you created does not have table privileges to INSERT, UPDATE, DELETE, TRUNCATE.
Just make sure you note the username (sandstormin this example) and password (some_password in this example) because you’ll need those when you create the connection.
This SQL provides access to all tables in a database. You can also grant access to the specific tables you will be querying with Sandstorm .
Updating your firewall rules
Sandstorm will connect to your MySQL server using one of the following IP addresses:
IP addresses:
You’ll want to make sure that your firewall allows connections to the port you run MySQL on (which by default is 3306) from these IP addresses. If you have other security needs please contact support so that we can help you figure out the best solution for your situation.
If your database is on AWS
If your database is hosted on Amazon Web Services in the US East (Northern Virginia) Region (us-east-1), then we would be locating your databases using the internal Amazon network. In that case, Sandstorm may connect to your MySQL server using one of the following internal Amazon IP addresses:
IP addresses:
Creating the connection
To create the connection:
Go to Settings
Click Data Integrations
Click Add a MySQL Data Source
Fill out the connection details you used in the previous step. You’ll need to specify the remote address of the MySQL server. If your server is not accessible over the internet please contact us so we can help you determine your best option.
Creating queries
You will need to create one query for each event.
Name: you can describe what the query is importing.
Record Sandstorm Event as: name the Sandstorm event that is recorded by this query.
When creating your SQL query you’ll need to make sure that the results contain the following fields:
KM_PERSON - this field will be used as the identity of the person. Typically you’ll want to select the email or username field from your database.
KM_TIME - this field will be used as the timestamp of the event. Typically you’ll want to select a date field from your database.
KM_LAST_RAN - this field is used so that Sandstorm knows which rows have already been synced. Typically you’ll want to select the id or timestamp field from your database.
See below for more details about each.
KM_PERSON
The value you select for KM_PERSON
will be used as the identity for the person performing the event. As mentioned the best choice, when available, is the e-mail field form your table. To rename email as KM_PERSON
you can use the SELECT...AS...
syntax to name the field in the results (e.g. SELECT email AS KM_PERSON
).
It’s possible that the identity field you want to use is in a different table. In this case you might need to use a SQL JOIN
to join the two tables. For example if I have a payments table that I’m creating a query for I might do:
SQL
KM_TIME
The value you select for KM_TIME
will be used as the date/time for the event you syncing. Sandstorm will not sync without this field. The column type can be any valid date field, including just a plain INT field with a UNIX epoch-based timestamp
. You can use the SELECT...AS...
syntax to name the field in the results (e.g. SELECT created_at AS KM_TIME
).
Please note that Sandstorm will set the timezone to UTC/GMT
(SET time_zone = "+0:00") for the connection before making any queries.
KM_LAST_RAN
When Sandstorm syncs it will sync at most 50,000 rows every hour per query. When it runs the next hour it will need to make sure that it selects the next 50,000 rows - making sure not to duplicate or miss any rows.
This is where the KM_LAST_RAN
field comes in. There are a few key things to keep in mind:
You need to select a column that only increments when new data is inserted. An ID column is a perfect candidate, because it only goes up (after Sandstorm has processed the row with ID 500 it knows that no new row with a lower ID than 500 is going to be inserted). A timestamp or date/time column can work also as long as new rows are going to have later timestamps than the existing rows in the database. If users can insert rows with timestamps in the past than a timestamp is not a good choice.
You need to tell Sandstorm the last value that was used. You can do this by returning a field with the name
KM_LAST_RAN
. If you were using the id column for this purpose you can doSELECT id AS KM_LAST_RAN
to accomplish this. Sandstorm will store the last value used here.You need to add a condition to your SQL query that will ensure that when Sandstorm runs your query it only selects rows that have not been synced. The value that you last returned to Sandstorm will be available in the variable
$KM_LAST_RAN_FOR_THIS_QUERY
. If you are using the ID column you can useWHERE id > $KM_LAST_RAN_FOR_THIS_QUERY
to accomplish this. You must be very careful when using timestamps for the this purpose. To ensure that Sandstorm does not miss any data it is recommended that you use a>=
condition (e.g.WHERE created_at >= $KM_LAST_RAN_FOR_THIS_QUERY
). This does mean that Sandstorm will get some duplicate rows, but Sandstorm will ignore the duplicates. Not doing this means that Sandstorm might miss some data depending on the granularity of your timestamps and the frequency that you get new data.Your query will automatically be ordered by
KM_LAST_RAN
. You should not addORDER BY
orLIMIT
clauses to your query.
🚧Note:You should use the same field for
KM_LAST_RAN
and$KM_LAST_RAN_FOR_THIS_QUERY
.
A full query might look like:
Additional Properties
Any additional fields you return in your SQL will be recorded as Sandstorm properties for the person/event. So let’s say you have gender and country columns in your table that you want to pass along when a user signs up.
You could do:
Whatever the name of the field is is what the property will be recorded as. If you want to use a different name you can use SELECT ... AS ....
So if you had a field cpn that you wanted to record as Coupon Code you could use: SELECT cpn AS "Coupon Code"
.
Examples
Tracking when a user signs up
It’s quite easy to track all your signups:
Imagine you also have a payment_plans table and you want to include the plan name when a user signs up, you can do:
SQL
Tracking when a user pays you
SQL
Tracking when a user cancels their account
SQL
Last updated