
Redshift Pitfalls and How to Avoid Them - drob
https://blog.heapanalytics.com/redshift-pitfalls-avoid/
======
jhk727
Hi, author here. This post covers some of the surprising things we ran into
getting started with Redshift. Happy to answer any questions.

~~~
scapecast
Very interesting.

I have to ask - with 80 clusters, do you set up a cluster per customer? Or do
you put multiple customers on the same cluster?

Can you talk a bit more about how you've set up your workload queues in the
Redshift WLM? I can spot at least three different workloads in your post:

1) loads into Redshift 2) transformations once data is in Redshift 3) customer
ad-hoc queries

And then what node type do you use for your 80 clusters? I'm asking because
the smaller dense-compute clusters offer more I/O, which helps a lot when
you're loading data into the cluster.

~~~
jhk727
We offer our customers the option of managing their own cluster or having us
manage it for them. About 80% of our customers choose the first option. In
this case, cluster sizing and queue configuration is up to them.

When we manage the cluster, we set up a new cluster for each customer. The
node type we choose depends on the customer - for larger customers we'll
typically use the DS type nodes (with hard disk storage). The slower I/O isn't
a huge deal for us, since getting the data from our database to S3 is usually
the bottleneck in the process, rather than copying from S3 into Redshift.

