
Show HN: RudderStack, open-source CDI (a.k.a. open-source Segment) - soumyadeb
GitHub: https://github.com/rudderlabs/rudder-server/

===

Firstly, a big thanks to the HN community for showing us love and support in
our previous HN post (https://news.ycombinator.com/item?id=21081756). At that
point, we had just open-sourced the repo and were not fully prepared for a
Show HN. We wanted to share updates since then and also do our official Show
HN.

Updates since Sept 2019

1. Changed the name from Rudder to RudderStack :)

2. API compatibility with Segment

3. Open-source control plane, so no dependency on the hosted control plane for
open-source users.
(https://github.com/rudderlabs/rudder-server/blob/config-gen/utils/config-gen/README.md)

4. Multiple hosting options: Docker, Kubernetes, Terraform, Native.

5. ~30 integrations (https://rudderstack.com/) including cloud mode and device
mode

6. Support for all the popular data warehouses & lakes - Redshift, Snowflake,
BigQuery, S3, Google Cloud Storage, Azure Blob Storage

7. Detailed documentation - https://docs.rudderstack.com/

8. Multiple production deployments, including a few really large ones (our
largest deployment is sending a peak of ~40K events/sec, ~300M events/day)

9. Switched license from SSPL to AGPLv3 (after long discussions internally as
well as on HN)

10. Built some interesting Analytics & ML use cases

11. Launched our “paid plans” (primarily around managed hosting)

Wishing everyone the best in staying safe from COVID-19
======
adovenmuehle
I recently joined Mattermost, and am currently in the midst of getting us
switched over from using Segment to RudderStack.

The RudderStack team has been super responsive and it's great to be able to
support another open-source based business.

The biggest selling point for us is being able to maintain total control of
our data while barely having to do more than change an endpoint. It
also allows us to use one place to send all our analytics data across our
various platforms and it just works.

~~~
soumyadeb
Thanks team Mattermost. Super excited to be working with you guys!!

------
satyamkrishna
Here at @Grofers we have been using rudder for a while. A great solution for
people who want to create their customer data platform. Plus an awesome team
who is always happy to help out :)

~~~
soumyadeb
Thanks team @Grofers for the kind words and for helping us build this out as
our early patrons.

------
bberenberg
Always a bit frustrating to hit a pricing page and not see pricing. Please
take time to provide at least some info
[https://rudderstack.com/pricing/](https://rudderstack.com/pricing/)

~~~
soumyadeb
Fair point.

We don't post pricing because we're still figuring it out (always a tricky
subject for an OS project). We charge a platform fee and a small per-node
charge (beyond 1 node), but there are many dimensions there - node size,
number of nodes, etc. - and we haven't figured out all the combinations. The
platform fee generally starts at $2k/month, but we offer discounts for
startups, open-source projects, non-profits, etc.

Right now we are more focused on growing our open-source adoption & customers,
and we want to talk to our prospects to better understand this.

As a benchmark 1 node can handle ~1K events/sec.
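
A rough sizing sketch based on that benchmark and the figures quoted in the post (~40K events/sec peak, ~300M events/day). It assumes throughput scales linearly with node count, which the thread does not confirm; the function names and the `headroom` parameter are made up for illustration.

```python
import math

# "~1K events/sec per node" is the benchmark quoted above; treating it as an
# exact, linearly scaling figure is an assumption of this sketch.
EVENTS_PER_NODE_PER_SEC = 1_000
SECONDS_PER_DAY = 86_400


def nodes_needed(peak_events_per_sec: float, headroom: float = 1.0) -> int:
    """Estimate node count for a peak rate, with an optional safety factor."""
    required = peak_events_per_sec * headroom / EVENTS_PER_NODE_PER_SEC
    return max(1, math.ceil(required))


def average_rate(events_per_day: float) -> float:
    """Convert a daily event volume into an average events/sec rate."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(nodes_needed(40_000))                 # peak quoted in the post
    print(round(average_rate(300_000_000)))     # daily volume quoted in the post
```

Sizing for the peak rather than the daily average matters here: the average rate implied by 300M events/day is far below the quoted 40K/sec peak.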

------
tiuPapa
Bit of a novice question, but what is the point of a tool like this or
Segment? I have a vague idea that they help with gathering your analytics but
I am not entirely sure how. Is it like Google Tag Manager, which you can use
to control multiple analytics scripts (edit: this is a wrong comparison; I
guess it is like a map function where you send all your analytics data and it
then splits it and sends it to all the relevant services where you want the
data displayed)?
Sorry for an off-topic and noob question.

~~~
soumyadeb
Not at all, this is a great question and we get it all the time :)

Peter (Segment's CEO) had a great answer to this in the thread
[https://www.quora.com/What-is-the-advantage-of-using-Segment...](https://www.quora.com/What-is-the-advantage-of-using-Segment-over-Google-Tag-Manager)

To summarize and highlight the main reasons:

1) With GTM you still have to write code. Specifically, you need to figure out
how to send events to a destination following their API and JS library. With
Segment and Rudder, you just call a couple of generic functions: identify,
track, page, etc.

2) You often want a copy of the events in your own S3 or Redshift or
Snowflake. GTM doesn't help you there but Rudder/Segment can.

3) We have features like Replay, which lets you send historic events to a new
destination. Say you signed up for a new analytics tool and want to send it
all your historic events - Replay helps there.

4) Finally, Segment has a product called Personas which lets you create
audiences (e.g., give me all people who have done X but not Y, and then send
that to MailChimp for emailing). We too are working on it.
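
To make point 1 concrete, here is a minimal sketch of the "generic calls, many destinations" idea. It is not the RudderStack or Segment SDK: the `Event` and `Collector` classes, their fields, and the handler signature are all hypothetical, chosen only to show how one `track` call can fan out to several tools.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Event:
    """Hypothetical event shape, loosely modeled on identify/track/page."""
    type: str                      # "identify", "track", or "page"
    user_id: str
    name: str = ""
    properties: Dict[str, Any] = field(default_factory=dict)


class Collector:
    """Accepts generic calls and forwards each event to every destination."""

    def __init__(self) -> None:
        self.destinations: Dict[str, Callable[[Event], None]] = {}

    def add_destination(self, name: str, handler: Callable[[Event], None]) -> None:
        # In a real system the handler would translate the event into the
        # destination's own API format; here it is just a callback.
        self.destinations[name] = handler

    def _emit(self, event: Event) -> None:
        for handler in self.destinations.values():
            handler(event)

    def identify(self, user_id: str, traits: Dict[str, Any]) -> None:
        self._emit(Event("identify", user_id, properties=traits))

    def track(self, user_id: str, name: str, properties: Dict[str, Any]) -> None:
        self._emit(Event("track", user_id, name, properties))

    def page(self, user_id: str, name: str, properties: Dict[str, Any]) -> None:
        self._emit(Event("page", user_id, name, properties))
```

The application code only ever calls `identify`, `track`, or `page`; adding a new analytics tool means registering one more destination rather than touching every call site.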

~~~
tiuPapa
Oh interesting. So, how do you figure out where to send which data? Like say,
I want to track an event on Google Analytics where I want to see if a new user
is being redirected to my site from another site based on the Referer
header (no idea if this is a valid use case for GA, it's just something I have
implemented for one of my projects), how would Rudder figure out where to send
that data? Does the user have to give a list of events that they want to be
sent to GA? Or do I have to specify that I want an event to be sent to GA
every time I call the Rudder API?

~~~
soumyadeb
That's right. You would have to specify at a per-event level (by specifying a
flag in the Rudder JSON) where to forward that event.

We have something called Transformations (user-defined functions) by which you
can modify the event structure from the Rudder backend. You write the
transformation function (currently JavaScript) in our UI and it gets executed
in the backend on your event stream. Using the transformation, you can also
control where the event goes. This is helpful when, say, you want to change
the destination without pushing an update to the mobile app.
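
A sketch of the routing idea described above, written in Python purely to illustrate the logic (the thread says real transformations are written in JavaScript in the UI). The `integrations` map with an `All` default is an assumed convention for the per-event flag, and `transform` and `destinations_for` are hypothetical names, not RudderStack's actual interface.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def transform(event: Event) -> Event:
    """Hypothetical user transformation that decides routing server-side."""
    out = dict(event)
    # Assumed convention: "integrations" maps destination name -> enabled,
    # with "All" acting as the default for destinations not listed.
    routing = dict(out.get("integrations", {"All": True}))
    if out.get("event") == "Debug Ping":
        routing = {"All": False}            # drop noisy events everywhere
    elif out.get("type") == "page":
        routing["Google Analytics"] = True  # always send page views to GA
    out["integrations"] = routing
    return out


def destinations_for(event: Event, available: List[str]) -> List[str]:
    """Resolve which of the configured destinations should get the event."""
    routing = event.get("integrations", {"All": True})
    default = routing.get("All", True)
    return [name for name in available if routing.get(name, default)]
```

Because the routing decision lives in the backend function, changing it does not require shipping a new client build.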

~~~
tiuPapa
Got it, thanks. Is there any way of transforming old data? Let's say I want to
add a new platform just to store data about clicks, but some of the events
stored through Rudder don't have the element id. Is there any way I can give
these events a default element id before sending them off to the new platform?

~~~
soumyadeb
Yes, that is the goal of the user transformation. You can add an element_id
field to the event before it is forwarded to destinations.

Would love to understand your use case a bit more.
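
A minimal sketch of such a transformation, again in Python for illustration only (the thread says the real ones are JavaScript). The `properties` key, the placement of `element_id` inside it, and the `"unknown"` default are assumptions, not the documented event schema.

```python
import copy
from typing import Any, Dict

DEFAULT_ELEMENT_ID = "unknown"  # hypothetical placeholder value


def add_default_element_id(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with element_id filled in if it is missing."""
    out = copy.deepcopy(event)          # never mutate the stored event
    props = out.get("properties") or {}
    if not props.get("element_id"):
        props["element_id"] = DEFAULT_ELEMENT_ID
    out["properties"] = props
    return out
```

Events that already carry an `element_id` pass through unchanged, so the same function can run on both old and new events.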

------
soumyadeb
Founder here - happy to answer any questions

~~~
woile
It's not really clear to me what it does. It seems like a bus for data, but
how is it related to Kafka, for example? Sorry if it's too silly, I'm a bit
lost, and I think it may be useful to me. I've checked Segment's website and
some of their stuff, like a unified view of a customer, may be useful to me.
Regards

~~~
soumyadeb
That's right. What we have today is more of a bus but for customer event data.

Let's say you have a website or mobile app and you want to collect all the
user interaction events (clicks, searches, impressions) in a data-warehouse
like Snowflake.

RudderStack can do it for you. We have SDKs (Android, iOS, JS, Python etc)
which you can use to send events. We have a corresponding backend (you can run
it or we can host it) which will collect these events and dump into your
warehouse.

You often want to send (a subset of) these events to other 3rd-party tools
too, like Amplitude for analytics, Braze for marketing automation, and so on.
RudderStack can forward events to those too, so you don't have to embed
multiple 3rd-party SDKs and understand their libraries, etc.

The second part of this is to create a unified view of the customer, as you
mentioned, and take action on that (e.g. find all churning users and send them
an email). That's where Segment has a product called Personas, and we are
working on something similar, but it's not launched yet.

Even before that, though, you can build that customer view on top of the event
data in your data warehouse by using SQL or a tool like Looker. For example,
you can find all churning users by writing a simple SQL query and then send
that result on with something like Looker Action Hub.
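
As a rough illustration of the "simple SQL query" idea, the sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The `tracks` table, its columns, and the definition of churn (no events in the last 30 days) are assumptions made for the example, not the actual warehouse schema.

```python
import sqlite3
from typing import List

# Users whose most recent event is more than 30 days before :today.
CHURN_QUERY = """
SELECT user_id
FROM tracks
GROUP BY user_id
HAVING MAX(sent_at) < date(:today, '-30 days')
ORDER BY user_id
"""


def churning_users(conn: sqlite3.Connection, today: str) -> List[str]:
    """Run the churn query; `today` is an ISO date string (YYYY-MM-DD)."""
    rows = conn.execute(CHURN_QUERY, {"today": today}).fetchall()
    return [row[0] for row in rows]


def demo() -> List[str]:
    """Build a tiny hypothetical events table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (user_id TEXT, event TEXT, sent_at TEXT)")
    conn.executemany(
        "INSERT INTO tracks VALUES (?, ?, ?)",
        [
            ("alice", "search", "2020-01-05"),
            ("alice", "click", "2020-03-20"),
            ("bob", "click", "2020-01-10"),
        ],
    )
    return churning_users(conn, "2020-04-01")
```

In this toy data set only the user with no recent activity is returned; the resulting list is what would then be handed to an email or marketing tool.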

Happy to discuss more here or offline (email: soumyadeb@rudderstack.com)

------
samblr
In your pricing > faq - there is a mention of why you chose SSPL and NOT
AGPLv3!

And then you are on AGPLv3?

Can you please share your thoughts on why you moved to AGPLv3?

~~~
masonhensley
Ya, repo looks like SSPL too...
[https://github.com/rudderlabs/rudder-server/commit/db9d53a05...](https://github.com/rudderlabs/rudder-server/commit/db9d53a05d82057c4763fd4d8fe519989e3baabd)

~~~
soumyadeb
Yes, we very recently switched from SSPL to AGPLv3 (as noted in the updates
above).

~~~
masonhensley
But the repo says SSPL... in the README & in the LICENSE file. Isn't that the
source of truth? Or what matters to an organization evaluating adoption?

~~~
soumyadeb
Thanks for pointing it out. The LICENSE file was old, removed it now.

The README says AGPLv3. Or am I missing something?

------
namanyayg
Is this suitable if I want to stream events (e.g. click event data), around 1k
events/sec, to a Redshift database and see it happen in near real time?
Redshift should receive the data within 5-20 minutes.

Will RudderStack work for this use case, or will it require something in
between?

~~~
soumyadeb
Yes, this is a very standard use case. A single node RudderStack (m4.xlarge)
should be able to process more than 1K/sec. And Redshift should receive the
data in 30 mins (configurable parameter).

Please give it a shot. Email me soumyadeb@rudderlabs.com if you run into any
issues.

------
sa46
How does the AGPL license interact with the enterprise plan? If a company pays
for the cloud version that rudder hosts, is that company obligated by the
provisions of the AGPL?

~~~
soumyadeb
No, the cloud-hosted version is under a TOS which has an enterprise license.

------
awwaiid
How much can be run independently vs what the company provides? Are there
parts that are private? Or is it more of a service and hosting model?

------
zenincognito
I recently looked at Segment and was quoted 40k for Personas. I think what
you are doing is great, and if you can do personas, my org and I would be
eternally grateful.

~~~
knes
If you are looking into Segment Persona, I would love to pick your brain about
what you are trying to accomplish with it.

At getGensus.com we have a tool that allows you to build personas on top of
your existing data set and then sync these segments, golden records, etc. to
the tools of your choice. It is a bit of a different approach since the data
stays with our DB/Warehouse. I'd like to hear your thoughts.

If you have the time, I'd love to connect. Email is in my profile.

------
mcguire
What is a "CDI"? What is "Segment"?

Yes, I could look these things up, but if you are making an announcement, it
might be nice if you were clear.

~~~
soumyadeb
Sorry, Customer Data Infrastructure and segment.com. A platform which makes it
simple to collect event data from your apps and send it to 3rd party
destinations (including dumping into your warehouse).

We wanted to give an update since our last HN post, so we didn't go into depth
on what the product is. Kind of assumed everyone knows Segment, but yes, a
fair point.

------
jjeaff
I will be interested to take a look as soon as the ElasticSearch integration
is launched. We did look into Segment a while back, but $$$.

~~~
soumyadeb
One of our users has set up Logstash to load data from S3 to ElasticSearch.
Would that setup work for you?

------
dinrat
what is your k8s strategy?

~~~
indianCoder
Our Kubernetes Helm charts are open-sourced here. It should be
straightforward to install them on your current k8s cluster.

[https://github.com/rudderlabs/rudderstack-helm](https://github.com/rudderlabs/rudderstack-helm)

