Lars here, the guy who gets the honorable mention at the end of the post "for brainstorming Redshift performance" with Austin (the author of the post) :-)
I would be interested to know what their monthly Redshift bill is. The work they’ve done is really impressive, I’m just wondering if the cost savings justify all the time they’ve invested. Sometimes the right answer in these situations is just to throw more CPUs at the problem.
The problems they solved here are vanilla optimization for Redshift. Adding sort/dist keys on tables and pre-aggregating immutable data is stuff you're going to have to do at some point; throwing more CPU at it only helps so much.
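For anyone who hasn't done this before, here's a minimal sketch of both techniques in Redshift SQL. The table and column names (clicks, clicks_daily, user_id, clicked_at) are hypothetical, not from the post:

    -- Hypothetical event table: DISTKEY co-locates rows for joins,
    -- SORTKEY lets the planner skip blocks on time-range filters.
    CREATE TABLE clicks (
        user_id    BIGINT,
        clicked_at TIMESTAMP,
        url        VARCHAR(2048)
    )
    DISTKEY (user_id)
    SORTKEY (clicked_at);

    -- Pre-aggregate immutable history into a daily rollup so that
    -- dashboards scan the small table instead of the raw events.
    CREATE TABLE clicks_daily
    DISTKEY (user_id)
    SORTKEY (day)
    AS
    SELECT user_id,
           DATE_TRUNC('day', clicked_at) AS day,
           COUNT(*) AS click_count
    FROM clicks
    GROUP BY 1, 2;

The rollup only works because yesterday's events never change; you'd re-run it incrementally for new days rather than rebuilding the whole table.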
I'd like to know the actual footprint of their data. They mention some of their tables have "infinite rows", yet from their screenshots, the largest query is on "link_web_production.exit_link", scanning 3.9 million rows.
If you care to dig a little deeper into the things we discussed, we've written them up in a longer blog post:
https://www.intermix.io/blog/top-14-performance-tuning-techn...