I think the development of Python and Jupyter and other lesser-known things like Vega is much more interesting. Python is today the only "glue code" that puts all of it together, from data to insights.
Other than the expensive part, is it really such a bad thing? I feel like relational databases are a pretty good fit for a wide set of use cases and have a huge amount of tooling.
In short, is there any solution that "does everything you could possibly want" while ensuring you _never_ need to hire a data engineer? This is a holy grail that I don't think exists.
You have to normalize data taken from various sources of various age and complexity. So you really have to understand the data. You also have to really understand the questions.
I've worked with (and on) lots of these tools and projects; the complexity is never in the frontend, it's dominated by getting the data, getting the data right and into the right format.
If all you want in the end is a good-looking dashboard on a website then you might as well build it yourself; because of the cost structure that can even cost less than buying one of the BI frontend tools (there's not a lot of difference in development time, but the BI frontenders are more expensive because they are rarer and the licensing is high).
The people and their spreadsheets was the easy part to control.
Welcome to my reality.
Would I love a data architect and a domain expert in my team? Yeah.
Will I run around like a headless hen booking meetings with everyone who even hints at working with data? Yeah.
Is this the normal procedure for Data Scientists in big and old companies? More so than I would like.
Oh! And I forgot that the security department will constantly deny you access to data you need (until you force their hand).
(Disclaimer: I work in data engineering at Amazon and use those tools in my day to day)
Companies prefer well known products like Alteryx or Tableau because, despite the cost, it makes people easier to replace.
But I can't blame you for writing your own things. I'm currently replacing a large SSIS-based ETL process with Python, because I'm sick of SSIS randomly breaking.
PostgreSQL on the other hand - so good, so free!
They make it simple to get started and even without knowing what you are doing you can easily churn out something that works if it is simple, doesn't change often, doesn't need to scale and deals with small amounts of data.
But similar to
"It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."
RDBMS are the root cause.
There are no major systems out there of even moderate complexity that aren't built on an RDBMS.
I don't think this claim is accurate.
Counter-intuitively, Datomic is in violent agreement with /u/rqmedes where he said "A better alternative is having the data, data model and business logic tightly bound in one place. Not separated in multiple 'tiers'". Datomic inverts/unbundles the standard database architecture: the cached database index values are distributed out and co-located with your application code, so database queries are amortized to local cost. Immutability in the database is what makes this possible without sacrificing strong consistency – basically, if Git were a database, you'd end up at Datomic.
When one of these things changes, it changes the rest.
In theory, it could be used to provide that industrial strength abstraction layer between your Tableau/Looker/etc. and your bajillion weird and not-so-weird (RDBMS) data sources.
That would seem to make sense to me from the point of view of -- I would want my data visualization/analytics-type company to be able to concentrate on data visualization/analytics, not building some insane and never-ending data abstraction layer.
The part that surprised me was that Denodo could allegedly do a lot of smart data caching, thus speeding things up (esp hadoop-oriented data sources) and keeping costs down.
I'm guessing the other data virtualization providers can do similar.
The only barriers to Salesforce + Tableau adoption I noticed were cross-object JOINs and live vs cached data extracts.
Both issues were remedied by denormalizing the data prior to export. For example, a nightly flattened "view" of Opportunities with key related objects moved into columns.
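That nightly flattening step can be sketched with stdlib sqlite3 as a stand-in; the table and column names below are invented for illustration, not Salesforce's actual schema:

```python
import sqlite3

# Toy stand-ins for Salesforce objects: Opportunity plus a related Account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE opportunities (
        id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL,
        FOREIGN KEY (account_id) REFERENCES accounts(id)
    );
    INSERT INTO accounts VALUES (1, 'Acme');
    INSERT INTO opportunities VALUES (10, 1, 5000.0);

    -- Nightly "flattened" table: key related objects pulled into columns,
    -- so the BI tool never has to do a cross-object JOIN itself.
    CREATE TABLE opportunities_flat AS
    SELECT o.id, o.amount, a.name AS account_name
    FROM opportunities o JOIN accounts a ON a.id = o.account_id;
""")
rows = conn.execute(
    "SELECT id, amount, account_name FROM opportunities_flat"
).fetchall()
```

The point is only the shape of the trade-off: you pay for a denormalized copy once a night, and every downstream query against it is a single-table scan.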
Mulesoft is well suited to tackle the ETL challenges. Bringing them to the table could be a win for everyone.
In that case you may be interested in Dash (dash.plot.ly). It’s a free and open source library that you can use to create dashboards online with Python only.
We write our back ends with FastAPI, which is usually just a wrapper around our ML models. Then we serve both Dash and FastAPI with gunicorn, passing the uvicorn worker class via gunicorn's -k arg, which greatly increases the speed as well.
For personal projects you can use this stack in GCP's AppEngine standard environment to basically host your (relatively low traffic) apps for free.
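Concretely, that serving setup boils down to roughly one command; a hedged sketch, assuming the FastAPI app lives in `main.py` as `app` (module name, worker count, and port are placeholders):

```shell
# gunicorn manages the worker processes; the -k flag swaps in uvicorn's
# ASGI worker class, which is what speeds up the FastAPI side.
gunicorn main:app \
  --workers 4 \
  -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```

(Dash itself is WSGI under the hood, so in a setup like this it would typically run under a separate gunicorn invocation with the default sync workers.)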
The real issue has always been the organizational problems of larger teams and companies as data gets split into multiple silos and needs ETL and cleanup before it's useful. The new abilities we have gained have increased the complexity and scale which can lead to new challenges, but the tools are definitely getting better every day.
Don't forget Teradata.
I found the same thing with MicroStrategy. I spent a lot of time reverse engineering what I could from the MicroStrategy jars to expose additional functions in their plugin interface (which is so incomplete it shouldn't be advertised). But the reality is it's a 20+ year old system with front end updates; you can only put so many band-aids on it.
I think the only thing keeping MicroStrategy alive is its cube functionality and the businesses who have invested too much into it.
We seem to prefer the lite version, which is simpler.
If you look at the examples, you can click a button and go to a dynamic editor, which we rather like.
If JS and web browsers aren't your thing, there's a Python version called "Altair".
It's still really early, but feel free to have a play and create an app. Here is an example app examining using the Prophet forecasting library: https://nstack.com/apps/rdA647Q/
I'd love any feedback, and if you'd like to chat to learn more, reach out to me on email@example.com.
And next week you will have to do it all again, because it’s all manual.
None, because it's too much programming for IT to let business people have access to it, and it's not disguised as an office productivity app the way Excel is.
If they had access to it and had basic training on it that anyone already competent in any vaguely quantitative domain could handle, plenty of them could and would.
At least judging by my experience with SQL shells and similar tools that are both less powerful and less friendly than Jupyter + Python, and yet plenty of business people used them productively in enterprise environments (often right up until IT ripped them from their hands).
But that’s for data science which is (hopefully) the foundation of actionable BI.
Did I mention it's a nasty, fugly POS?
With all that’s happening we’re definitely looking to pick up the pace, and would love to work with more contributors on the free open source alternative at Meltano (www.meltano.com)
Edit: just wrote a quick post with some open questions I'd like to explore around this deal https://meltano.com/blog/2019/06/10/salesforce-is-acquiring-...
Contrast with https://www.tableau.com/, which has a sample graph and "See it in action" right at beginning.
Would like to see a few simple, visual stories about how one could derive business value from Meltano, ideally real-life use cases, but if you don't have those yet just make them up.
There are companies that will pay for it just to have the alternative available to places like Tableau.
In the meantime, one way to see more is to checkout our getting started guide: https://meltano.com/docs/quickstart.html and also our YouTube channel which has weekly "Demo Day" videos sharing our progress: https://www.youtube.com/meltano
Really appreciate you taking a look at what we're up to!
This looks great. Will check it out for sure. Keep up the great work.
I wonder if there are any other open source tools in this space?
Our vision is to glue the steps from ETL to dashboard together in an end-to-end solution. We pick whatever we consider best in class and integrate it. So far, we've got Singer, DBT, Jupyter Notebooks and Apache Airflow and we're using VueJS for both the product UI and our website.
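Singer in particular is just a JSON-lines-over-stdout protocol, so a toy tap fits in a few lines of plain Python (the stream and field names here are invented for illustration; a real tap would also emit STATE messages):

```python
import json
import sys

def write_message(msg: dict) -> None:
    """Singer messages are one JSON object per line on stdout."""
    sys.stdout.write(json.dumps(msg) + "\n")

def run_tap(rows):
    # A tap first announces the stream's schema...
    write_message({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"},
                                  "name": {"type": "string"}}},
        "key_properties": ["id"],
    })
    # ...then emits one RECORD message per row for a target to load.
    for row in rows:
        write_message({"type": "RECORD", "stream": "users", "record": row})

if __name__ == "__main__":
    run_tap([{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}])
```

Because the interface is just stdout, any tap can be piped into any target, which is what makes it a natural extraction layer to build on.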
We're also working on a blog post exploring what other acquisitions might happen in this space. We're adding suggestions to the spreadsheet as we hear them on Twitter, HN, etc.: https://meltano.com/blog/2019/06/10/first-looker-and-tableau...
Airflow is too heavyweight for me; I use SCons to do the workflow management.
Meltano must do a lot of work to integrate all this together. I wonder what the general user experience is. To me, it seems too heavyweight.
It is still beta but should be ready to go in weeks.
Here's an example of the top 100 stories on Hacker News:
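(The example itself didn't make it into the thread.) As a rough sketch, the top stories come from HN's public Firebase API; the helper names and formatting below are invented:

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"  # public, no auth required

def fetch_json(url: str):
    # Requires network access; returns the decoded JSON body.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def top_story_ids(limit: int = 100):
    # /topstories.json returns the ranked list of current top story ids.
    return fetch_json(f"{API}/topstories.json")[:limit]

def format_story(item: dict) -> str:
    # `item` is the dict shape returned by /item/<id>.json
    return f'{item.get("score", 0):>5}  {item.get("title", "(untitled)")}'

# Usage (hits the network):
#   for sid in top_story_ids(10):
#       print(format_story(fetch_json(f"{API}/item/{sid}.json")))
```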
I'm not so sure, I'd probably be too worried about spiders to focus much.
Would it be possible to chat with you about your past experience at Mattermark? I am trying to answer a few questions (not related to the company, but related to VCs and investing). If so, my HN username @ gmail. Thanks!
It’s still very TBD while we get the product to a place where those opportunities start to emerge. We definitely expect it to be an open core model though, similar to GitLab.
By remaining self-hosted we avoid the big expense (and risk) of storing users' data, and they can pick whatever cloud they want. Our team is 5 core members working at GitLab, and we have about a dozen contributors. So it’s kind of a startup within a late stage “startup” (unicorn).
To me, Salesforce looks like a big shared Excel file with a bunch of sheets. Tableau... well I can do the same thing with some scripts or spin up a web server.
To others, this tech is just magical. Pay the money, do the integration and it just works... And clearly people will pay a lot of money for things that "just work".
That's a massive waste of your time and effort. The maintenance costs become massive as well should you choose to create your own system. At the VERY least you should leverage existing free open-source tools such as Metabase or Superset or Dash, or free tools such as Google Data Studio or Mode Analytics, if you're not going to spend cash to get a tool like Periscope Data/Looker/Tableau. I mean this gently, but you likely underestimate the complexity of a reliable reporting/analytics infrastructure. Think about it this way - these tools are either collaborated on by a large open-source talent pool, or are created by teams of dedicated software engineers just as talented as you.
I've worked with quite a few companies in an analytics consulting type role, and your "I can do the same thing with scripts" statement is one I've heard countless times. The long-term maintenance costs and technical debt (and "rigidity cost") of rolling-your-own analytics far outweighs the cost of a true analytics platform.
If you decide to roll-your-own anyway, look at tools like DBT and Airflow to reduce long-term maintenance costs.
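For a sense of what a tool like Airflow buys you: its core abstraction is just tasks run in dependency order. A toy stdlib sketch (this is not Airflow's actual API, and the ETL step names are hypothetical):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_pipeline(tasks, deps):
    """Run each task after all of its upstream dependencies.

    tasks: name -> zero-arg callable; deps: name -> set of upstream names.
    This is the essence of what a DAG runner like Airflow schedules for you,
    minus retries, backfills, and monitoring - the parts you'd otherwise
    maintain yourself.
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order

log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
order = run_pipeline(tasks, deps)  # extract, then transform, then load
```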
Yeah, I work at a company in the analytics space and see that all the time. It piques the curiosity of people who are software developers (yet their core competency at their job is something else). They think it's a fun work side project and go after it. Write some Python scripts to do ETL and process the data... make a backend with pg, a web server, then do charts in d3.js.
A year later they have a bunch of nice demos for their bosses but nothing they can actually use in production, because it crashes, there's no UI for interactive queries, no reports for people in the business groups, no user management, etc. Then they drop it because they're busy with their actual job. So the cost of that engineer's time to do something that didn't work was about $20-30k over a year, while the product they could actually use in production was around the same price.
If you want analytics directly on the stream, then there are plugins available to support reading the query results of something like Kinesis Analytics or Confluent's KSQL.
Grafana/Kibana are the actual charting tools
The auto-chart generation is nice. But what about Tableau makes it more likely to be accurate? Aren't you just as likely to make an error in the SQL in Tableau as if you didn't use Tableau?
The only exception to "using any of these is better than creating your own" is large companies like Google and Facebook, where they have entire teams of engineers who are dedicated to creating an in-house SQL+Visualization tools. It is absolute hubris for one engineer to think they can make a robust analytics platform!
Since people aren't typing code, it can be more accurate to use, and it provides visual results beyond just a table that can be useful in detecting anomalies in your data.
If you are good, you aren't "scripting", you are making a rad MVC system.
Salesforce puts all the MVC together in nasty, nasty ways.
If this is something you guys are interested in, I started a company called Retool (https://tryretool.com) that is essentially Excel for developers. Imagine if every Excel cell — instead of being a cell — were instead a React component. So you drag and drop these components around, and you can connect them to any back-end datasource (postgres, APIs, etc.). So you could drag on a table and have it pull data from `select * from users` from postgres, and then drag on a button and have it `POST` the selected row back to your API, in order to ban a particular user. The goal is to let end users build CRUD apps (like Salesforce) around their existing datasources quickly.
If you guys have any feedback... I'd really appreciate it. We're just starting out, and really curious to get any feedback from developers. Thanks!
That made it not viable for me. Building from scratch was way cheaper with Upwork. Anyway, the product was cool.
Do you mind if I email you? (I can't see your email in your profile, but my email is firstname.lastname@example.org if you're interested in reaching out.) I'm really curious — just to learn — what kind of pricing plan might work for you. Would a per-app model work for you, for example? Thanks!
It was a light-hearted jab at the description, which made it sound like something completely original and never tried before :)
I hope he does well, those tools are super useful.
If I had a dollar for every time someone showed me a wrongheaded graph they were using to make business decisions, I could retire.
Is that fully accurate, though?
Where does that leave data scientists / data analysts? I know SQL very well, and I know python's data stack (numpy, pandas, matplotlib, plotly, seaborn, various stats toolkits). I have a strong understanding of the "programming ecosystem" i.e. concepts, terms, definitions, and so on. I understand (basic) computer architecture, I've used and am familiar with (basic) shell/terminal, and services like Docker/Heroku on the command line, and can certainly use GUI cloud tools for AWS, GCP, etc. I can read and understand code and how systems fit together. I've worked alongside engineers of all types.
But I'm not a software engineer. I don't tell people I "program" because my strongest skill is SQL and generally people do not refer to that as "programming".
1. It's hard to get considered to be a SWE in the first place when your job titles are more in the realm of data analyst. They'll toss out my resume for a fresh grad, much less someone with experience, without a second thought.
2. If I were to make the switch I'd likely have to start at level 0 on the scale ladder. I've already career changed once into tech, and at this point I do not wish to "reset" my experience another time
Never start a land war in Russia, and never neglect your data infrastructure if data is in any way a key business differentiator/fundamental in your market.
Massive customization, and then you are bound by this broke ass Object model in which to get it all done, between Apex and VisualForce nauseating crap.
I don't want to pay hundreds of thousands of dollars for the right to write database-driven web pages.
I thought that the point of Tableau was to provide a tool that end-users (who can't program) could use to interactively explore their data. That's not something you can replicate with a bunch of scripts.
The issue in practice is almost always getting the data in a workable state so that you can manipulate it easily in Tableau. In my experience in smaller and mid-sized places, Tableau tasks get punted to analytics and data science, because they are needed anyway to get and transform the data in the first place. And these people usually prefer and are capable of using more technical tools than Tableau. I know I would rather use Shiny or Dash.
Maybe that's not a difficult problem in larger corps.
And you should know that the ratio between those two is something like 1:1,000,000.
So, for the most of the world, tech is essentially "magic" to them.
Except it's not just a few weeks of dev time that makes up the overall cost. Consider infrastructure, maintenance, updates, support, training, etc,. Those things start to add up and you don't get the benefit of scale/community if you do it on your own. There's also the opportunity cost of building your own system when you could buy something existing and use that time to work on other things.
Otherwise, it will be hard to justify this high markup for a tool company.
It will be awesome if Salesforce can adjust their model and make Tableau spit out D3. Their desktop tool is nice for designing, but their server components seem frequently unnecessary for running the visualization. The catch is that creating serverless dynamic visualizations isn’t all that money-making and the cool UI/UX design tool is outside of OSS’ wheelhouse.
SalesForce has been pushing Einstein Analytics recently. I haven't used it, but I do see that moving an organization from Tableau to Einstein has a lot of costs involved so this would be a hard sell in many places. Having them both under one roof means they're able to bring a bunch of people across to their cloud and now that license revenue year over year is theirs with the additional data lock-in.
As someone that really dislikes vendor-forced lock-in and generally dislikes the way SalesForce controls your data, rate limits, maxes, licensing, etc, this move is about even more control up your stack that will seem like a "no brainer" to decisionmakers, which is dangerous. That said, I'm sure it will work well for some organizations.
EDIT: I would also love to see them spit out D3 or other open visual but then they'd be losing control of the secret sauce and the requirement for a license. Not sure there is an incentive to go that route.
Precisely what I thought. It looks like their annual revenue is ~ $1 billion, placing this price around 15x annual revenue.
However, Looker has about $131 million in revenue, so their purchase price was an even higher 20x annual revenue.
My conclusion is that these acquisitions are much less about sales revenue and much more about filling strategic holes in product offerings, and I can only assume it's a sellers market in that area.
At $1B/year in revenue there aren't a lot of companies that can realistically acquire you. At $131MM/year, there are.
But still I agree. Both are quite high.
CRM is a pretty darn good ticker for Salesforce. Why would you change it? It perfectly explains your core business
You make it sound like the ticker symbol justifies a $15.7B price tag...
I think we are many years away from $15B vanity ticker symbols.
Perhaps in 5 years we'll look back at this move as a prime example of the bubble we're currently in.
However, 'Tableau Research' has existed for years and its researchers regularly publish at major academic visualisation conferences like IEEE VIS (InfoVis/VAST) and EuroVis.
Can you clarify?
It would be nice if tableau would just generate static content that could be hosted anywhere.
There’s not a client tool for d3 as nice as tableau. I work with lots of scientists who learned tableau but aren’t really programmers and can’t figure out d3 or other libraries.
This deal makes a lot of sense for Salesforce. They should be (and are) on an acquisition spree.
But if I had stock options (or any kind of locked-up equity) in Salesforce, I'd be worried right now. Someone is going to be left holding the bag.
That said, Salesforce (based on my usage ~4 years ago) has truly awful baked-in BI and analytics, necessitating third party products and data engineering to fill that gap. Tableau will fix that, but I'm staggered at the price.
That said, Google has deep inroads with their apps suite, and it’s really a race to have their tech run business processes. That and their ecosystem like Insightly are making big inroads to CRM’s SBM market.
Their analytics is shit, but I think the important thing here is that they know their audience very well, and business people love Tableau.
They bought $15B worth of sales leads for cross selling their portfolio of services.
First, how do you know what all institutional investors are thinking?
Second, Sales and FCF straddle Earnings on the income statement. A focus on EV/Sales suggests that investors are optimistic about growth and ignoring the spending required to get there. A focus on EV/FCF suggests that investors are optimistic about increased efficiency and cutting costs.
I've worked with a lot of companies who spend months (if not years) integrating their data into a few disparate systems... The finance team has one system (and underlying data lake), the commerce team another, the marketing team another... If Salesforce thinks they can run the entire underlying data infrastructure in addition to the actual customer-facing functions, then this is a smart play.
Data is an asset and liability - when somebody else has all of yours in their proprietary platform and under the control of their cloud, that is a scary proposition to me.
Fair enough if you are running your infrastructure on open standard tech and common cloud platforms, but not locked away in Salesforce.
This is reminding me of the behavior of other large orgs, like Oracle.
This is to prevent cost overruns and solution capture, where every solution to your company’s problems becomes “give it to X vendor” and then X vendor kills a product line and you’re toast.
Salesforce needs to be careful or else they’ll hit that threshold where companies don’t want to use them because you as a client are too small. Google is facing this problem right now.
So while they might lose customers like you, there is clearly ridiculously large piles of money up for grabs if they diversify their products, rather than remain specialized. And, of course, any sufficiently good specialist is at risk of being acquired by one of these behemoth generalists.
There are also cases where a company will pick multiple vendors in an attempt to de-risk and/or for negotiating tactics. If you are fully dependent on a single vendor, the cost of migrating tends to skyrocket and the negotiating power moves towards the vendor.
This is particularly important with SaaS, as you lack the leverage to walk away. If you're fighting with a vendor, you had best resolve it by your contract date, as they will happily shut you off and wait for you grovel (and pay).
Actually does Amazon's marketplace make any profit?
Not sure if that's a relative bargain compared to this deal or if it makes the $15.7B look totally unrealistic. As of a few years ago the total revenue of the two firms wasn't that different.
Birst ended up getting acquired too. https://www.infor.com/news/infor-to-acquire-birst-infor
This is true, because Thoma is a PE firm and is literally in the business of acquiring and selling companies.
But now Salesforce has a bigger war chest to play with.
Looks like BI is the new hotness.
Personally I'm really interested in who if anyone will buy Snowflake.
With Google snapping up Looker ($2.6B) for Google Cloud, Salesforce's much bigger purchase of Tableau is a clear sign that the big guys see buying BI tools is a good way to expand the reach of their offerings into more of the business. We talk to companies every day that have made massive investments in data warehouses, viz tools, and high-paid data scientists, but they still aren't agile enough because they can't tie it all together so their people can find and use the right data when they need it. I read an article once about the failure of self serve BI and the reality is that you just end up creating more sprawl. People need tools to reduce the clutter and sprawl and stop the endless chain of emails trying to figure out what table to look at or query to use or if your source is still updating.
We built our data catalog and analysis hub for exactly this, and it's extremely validating to see the big guys like Salesforce and Google investing in expanding the user base of big data tools and I really hope we can be part of the solution of sorting it all out!
Let's break it down:
Both are overpriced.
Looker, however, has LookML.
Tableau obfuscates all code.
Looker has easier bolt-ons to Redshift/Postgres.
Tableau's BI tool-set is weaker than Looker's, albeit more widespread (more mature).
So, I think Google got a steal and SF is playing catch-up... at a high cost.
Plus SF's sunk costs in everything are going to make a $15B buy take at least two decades to pay off....
(I don’t know why you are being downvoted though, you’re entitled to an opinion same as me)
Or Tableau can continue to pretend that this isn't a real issue and stonewall customers and partners alike.
I think the actual visualisation part is neat, and better than many competitors, but many of the server-side parts are various levels of disastrous (as is their support), and their "data preparation" tool needs some serious improvements to be borderline useable.
15+ billion seems like a lot to me given how Tableau interacts with customers and partners alike, especially seeing how they are actively alienating existing enterprise customers, all in favour of new sales; but perhaps something will change for the better here.
We're a Microsoft-heavy shop, and I've been trying to get them to move to Power BI simply because it's far more fully featured, easier to use if you're familiar with the "Windows way" of working, and has streamlined administration/installation/licensing/configuration in Windows environments.
That said, it baffles me why I have to restart Tableau Server 3 or 4 times during installation, and why I have to restart it for trivial changes more generally. For a piece of software that specifically ships with a cluster controller and full-blown zookeeper, somehow their engineers (or "engineers", as I sometimes get the impression) manage to make things that should be trivially solvable with reloads, partial restarts or spawning new workers (e.g. SSL certificates for the built-in Apache webserver) require a complete restart of the whole node.
edit: Regarding Power BI -- I feel that Tableau Server is (for better or worse) one of the killer features for many enterprise customers, because it means all of your data can remain within your own infrastructure and does not have to rely on external cloud providers. If that is not a requirement in your organisation, Power BI might make sense depending on your overall IT landscape, as well as your users' specific needs. On the other hand, if your organisation requires hosting things yourself, I guess it doesn't matter how miserable the experience is for you as an administrator. That's basically Tableau Server in a nutshell.