Show HN: Transform Data Without Programming (easydatatransform.com)
114 points by hermitcrab on Oct 6, 2019 | 61 comments

Easy Data Transform is a tool to help you quickly and easily clean, merge, dedupe and analyze table and list data, without any programming.

It is aimed at professionals who have data to transform, but aren't programmers or data science professionals.

Use cases include:

* making a list of all the people in mailing list A that are not in mailing list B

* filtering a log file

* joining two spreadsheets

* renaming, reordering and adding/deleting columns in a table

* reformatting dates

* de-duplicating a postal mailing list
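For readers who do program, the first use case above (people in list A who are not in list B) amounts to an anti-join. A minimal sketch of the idea follows; the function name and the choice to compare addresses after trimming whitespace and lowercasing are assumptions for illustration, not a description of how Easy Data Transform works internally.

```python
def only_in_a(list_a, list_b):
    """Return entries of list_a that do not appear in list_b, keeping order.

    Assumption: two addresses match if they are equal after trimming
    surrounding whitespace and lowercasing.
    """
    def normalize(address):
        return address.strip().lower()

    in_b = {normalize(address) for address in list_b}
    return [address for address in list_a if normalize(address) not in in_b]


if __name__ == "__main__":
    # Hypothetical sample data.
    list_a = ["Ann@example.com", "bob@example.com", "cy@example.com"]
    list_b = ["ann@example.com ", "CY@example.com"]
    print(only_in_a(list_a, list_b))
```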

It is desktop software for Windows and Mac, so there is no latency and you don't have to upload sensitive data to a third party server.

At some point we plan to start charging. But the current beta is free until the end of November. And there may be another free beta after that.

We would love to get some feedback. Particularly from people using it to solve real world problems.

well, i know i can just download the app to see what it looks like but i can't right now so would be nice to see some screenshots of the app on the website.

I have improved the screenshot on the home page.

The website is very basic right now. But there is a video at: https://www.screencast.com/t/HEO16Ix7

I love this product and wish I had created it.

Here’s a podcast interview I recently did with OP about his product:


Worth listening to if you are interested in the decisions that go into creating, designing, naming, doing usability testing and promoting a desktop app like Easy Data Transform.

Cool tool. Tried it on an annoying dataset I know well. Three specific requests:

#1. The "Show First 10 Rows" dropdown... nice here would be "Show First 10 MOST FREQUENT Rows" ... helps get a view of the distribution of values

#2. A "Map" transformation - you can give it a list of input values and a list of one or more output values to which the inputs should be mapped. E.g. input values might be "New York", "Peekskill" and "Middletown" which map to "New York State" which can be placed in a new column (like the "If" transform)

#3. Finally because it's Hacker News... a "Function" transform allowing something like a JavaScript function to be applied to a column, with the output put in another column

>#1. The "Show First 10 Rows" dropdown... nice here would be "Show First 10 MOST FREQUENT Rows" ... helps get a view of the distribution of values

You should be able to do this with a pivot, then a sort. But pivot doesn't work with non-numeric values at present. Next release!
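For comparison, the "pivot, then sort" idea is roughly a frequency count followed by a descending sort. A minimal sketch in Python, assuming the column is already loaded as a list of values (the function name and sample data are hypothetical):

```python
from collections import Counter


def most_frequent(values, n=10):
    """Return the n most common values with their counts, most frequent first."""
    return Counter(values).most_common(n)


if __name__ == "__main__":
    # Hypothetical column of city codes.
    column = ["NY", "NY", "LA", "SF", "LA", "NY"]
    for value, count in most_frequent(column, n=2):
        print(value, count)
```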

>#2. A "Map" transformation - you can give it a list of input values and a list of one or more output values to which the inputs should be mapped. E.g. input values might be "New York", "Peekskill" and "Middletown" which map to "New York State" which can be placed in a new column (like the "If" transform)

You could do that with 'IF'. But I guess that could be a bit verbose and I should perhaps offer a 'Lookup' transform as well. The table lookup has the advantage that the lookup table can be created/modified by Easy Data Transform.
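To make the 'Lookup' idea concrete, here is a minimal sketch of a table lookup that writes the mapped value to a new column. The function name, column names and default value are assumptions for illustration only; this is not an existing Easy Data Transform feature.

```python
def apply_lookup(rows, source_col, target_col, lookup, default=""):
    """Return copies of rows with target_col filled from lookup[row[source_col]].

    Values missing from the lookup table get the default.
    """
    return [
        {**row, target_col: lookup.get(row[source_col], default)}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical lookup table, using the example from the request above.
    city_to_state = {
        "New York": "New York State",
        "Peekskill": "New York State",
        "Middletown": "New York State",
    }
    rows = [{"city": "Peekskill"}, {"city": "Boston"}]
    print(apply_lookup(rows, "city", "state", city_to_state, default="Unknown"))
```

Keeping the mapping in its own table means it can be edited without touching the transform itself, which is the advantage mentioned above.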

>#3. Finally because it's Hacker News... a "Function" transform allowing something like a JavaScript function to be applied to a column, with the output put in another column

Yes, an option to have some sort of scriptable transform would be very useful (even if it is slightly at odds with the "without programming" positioning). I personally loathe JavaScript, but I guess it would be easier to embed than, say, Python or Lua.

Thanks for the feedback.

I imagine you could do much of this with NiFi - https://nifi.apache.org/, though if your needs are simple, something like this would definitely be much easier to deal with.

Looking at the NiFi website, I assume it is aimed at IT professionals and has a steep learning curve.

Easy Data Transform is aimed at people who don't have either the skills, time or inclination to take on something like NiFi (which is most people!).

The aim with Easy Data Transform is that someone can use it to transform their data within a few minutes of first seeing it.

> I assume it is aimed at IT professionals and has a steep learning curve.

That's definitely true :)

I don't know how good this is, but from that page it looks like the "no-code required" approach to data transformation that I have seen as part of many solutions.

When I write data-transformation code, I always have the feeling that it's often too inter-connected and an approach, like the one this tool follows, would be nicer.

Somehow only the core idea of using these connected nodes is good; the rest of the UI is too clunky, so I drop down to "real" code again for some nodes, and sooner or later the mixing of nodes and code becomes too cumbersome and I drop down to "real" code for everything.

If you can code a solution, then you probably aren't in the core market for this product.

But, perhaps one day in the future, I might be able to add a script or plugin node, so you can add your own custom transforms.

I love these kinds of tools and this looks very useful indeed, but something about your page triggers my spidey sense. A pricing page or at least a hint at a business model would make that go away. Right now it feels like you’re just trying to get me to run a binary on my computer.

Edit: not saying you’re shady, just that it has a vibe of being shady :)

It is free while we are in beta. Because I hope that is going to result in more feedback. Also I don't feel comfortable charging for something that isn't quite production quality yet. But the plan is to charge an annual sub after it comes out of beta (price undecided).

I can see that might trigger some people to think it is of dubious provenance. Maybe I should put the above on a 'Buy' page?

BTW the software is digitally signed (and notarized on Mac) and we've been selling software online since 2005. http://oryxdigital.com/

I have added a 'buy' page that hopefully makes things a bit clearer. No decision has been made on the price yet. Perhaps $99 per year?

Strongly suggest perpetual licensing unless this is truly a service, i.e., you're offloading processing to the cloud (which frankly would be a deal-breaker for me if I were evaluating this for a client).

If this is a standalone binary, what I'd want to see as a user would be a one-time purchase and an optional annual support plan.

My other two products are perpetual licences with optional paid major upgrades and include support. But the world is changing and a yearly sub is attractive for vendors. Simple and with a more predictable cash flow. It also incentivizes the vendor to keep existing customers happy, rather than always chasing new customers.

"Everything-as-a-service" is definitely attractive to us the vendors, but I'm talking about the user experience.

With a recurring license, I'd expect:

  - Cloud computing
  - Cloud storage
  - Real-time collaboration
  - Hosting

Vendors rarely deliver on the promise of continuous improvement quickly enough that I could justify the recurring cost from that standpoint alone. Or I'm already happy with the features, so I'm just paying for feature bloat at that point. (That's definitely been the case for most of the consumer cloud apps I pay for ... which is why I'm slowly migrating back to the desktop in many areas.)

If there are no service features like real-time collaboration, then a recurring revenue model makes even less sense.

I mean, I get it. I've written software that I sell with annual licenses myself. But it depends on cloud services to work, so there are costs to me too. I think that's the point at which to step back and look at the architecture, and at whether it's better suited to a web app if recurring revenue is important. Just my two cents....

I disagree - as a vendor I think you should charge based on the value you provide to users, not the costs to run your service. That is to say I don’t think desktop software needs to have a “cloud” piece to it to make it valuable enough to users to justify an annual charge. Users don’t usually care where your app or service runs, just that it solves their pain points quickly and easily.

The question then becomes how you justify that charge, but I think you can legitimately say support and/or new feature development (especially if you allow customers to have some kind of input into that). Having guaranteed support from a company with the technical chops that Andy has would be worth it alone for me in this instance.

JetBrains have an interesting model with their IDEs whereby you can fall back to a perpetual desktop license for their products if you don’t want support or updates. Perhaps that’s a nice compromise option.

> you should charge based on the value you provide to users

Yes, but not infinitely, not for a static product. Otherwise I should also be willing to accept infinite punishment for whatever harm my product does as long as it exists.

If I create a hammer, it could still be generating value in a hundred years, but it could also be used to break windows and kill people. So if I deserve continuous payment, shouldn't I also be liable for the damages? Why should I be infinitely rewarded just because my tool had the potential to add value when it also had the potential to do harm?

> Having guaranteed support [...] would be worth it

I agree with you there. But I think it should be optional. It sounds like that's what JetBrains is doing, albeit in a roundabout way.

I agree. I'm developing something, and while a subscription is attractive to me, I feel it's not correctly aligned to what users deserve.

I'm burned out by all of these subscriptions and trying to go back to my "roots".

I pay a sub for some desktop software I use. I don't have a problem with it.

If you estimate that the average user will use the product for N years, you can charge a one-time fee of X or a yearly sub of X/N and make the same amount either way (assuming you got N right and forgetting inflation). A yearly sub is very simple to explain. And people who use the product for longer pay more, which seems fair. The sub also helps to finance the ongoing costs of development and support.
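As a rough worked example of that arithmetic (the figures X = 300 and N = 3 below are made up for illustration, not proposed prices):

```python
def yearly_sub_price(one_time_fee_x, expected_years_n):
    """Yearly price that matches a one-time fee X over N expected years."""
    return one_time_fee_x / expected_years_n


def subscription_revenue(one_time_fee_x, expected_years_n, actual_years):
    """Total paid by a subscriber who stays for actual_years (ignoring inflation)."""
    return yearly_sub_price(one_time_fee_x, expected_years_n) * actual_years


if __name__ == "__main__":
    x, n = 300, 3  # hypothetical one-time fee and expected years of use
    print(yearly_sub_price(x, n))          # yearly price X/N
    print(subscription_revenue(x, n, 3))   # stays exactly N years: same as X
    print(subscription_revenue(x, n, 5))   # stays longer: pays more than X
    print(subscription_revenue(x, n, 1))   # leaves early: pays less than X
```

The two models only break even when the estimate of N is right; users who stay longer than N years pay more under the sub, and users who leave early pay less.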

Pretty cool, reminds me of how Advanced Renamer handles batch renaming filenames through a visual stack of methods, like sorting, regex replacing, trimming, renumbering, etc. I think that's a really useful thing. There are lots of other weird online formatting tools I've seen over the years that perform things like this, but the experience is pretty poor. I will probably recommend this to my dad.

Congrats on releasing this. I had a quick look at the site and the manual and couldn't see a list of file formats that can be used. For example can I use it with JSON or XML files?

Not yet. Currently it can read delimited text (e.g. CSV) and XLS(X)* and write delimited text. But I do plan to add other input/output formats, depending on feedback.

I need to think a bit about how to flatten an XML/JSON doc into a table and then turn it back into an XML/JSON doc.

(*XLS(X) output currently only works on Windows, because it uses ActiveX. But I plan to have XLS(X) input/output on Windows and Mac at some point.)

Thanks. An FAQ would be useful with info like this. It's not difficult to flatten JSON but turning into a table is another issue altogether.
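To illustrate the flattening half of that, here is a minimal sketch that turns one nested JSON record into a flat dict of dotted column names. The dotted-path naming and the use of list indexes as path segments are assumptions chosen for the example; empty objects and arrays are simply dropped.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": value} style keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing "." added by the caller.
        flat[prefix[:-1]] = obj
    return flat


if __name__ == "__main__":
    # Hypothetical record.
    record = json.loads(
        '{"name": "Ann", "address": {"city": "Leeds"}, "tags": ["a", "b"]}'
    )
    print(flatten(record))
```

The table problem shows up as soon as records differ: a record with three tags produces a `tags.2` column that other records lack, so a choice has to be made between ragged columns and one row per list item.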

This may be just the tool I need. Working with a bunch of separate spreadsheets to compile a dataset of 3,000 vendors. Thanks!

You may also want to check out a rather powerful tool for aggregating spreadsheets - https://easymorph.com (I'm the founder)

Any feedback would be gratefully received: https://www.easydatatransform.com/support.html

This is a good idea. Honestly surprised it's not SaaS - although I get the privacy aspect.

I don't see any advantage to making this a SaaS. It would just result in more latency and potential privacy issues.

It is true that a desktop system may not be suitable for transforming million row datasets or processing that is running 24x7 - but that is not the market we are aiming for.

I am familiar with Patrick's article. Here is my take on it: https://successfulsoftware.net/2013/10/28/is-desktop-softwar...

TLDR : It depends.

I really enjoyed this article. It's still as applicable today as when it was written. Not everything needs to be a SaaS.

I think the benefit of a SaaS in this case would be:

1. Users always work with the latest version, so you only have 1 version to support
2. It would make monthly pricing an easier sell

But I think there are some downsides here, with an app that is solely about data:

1. If user's data has to flow through it, there are privacy, GDPR and intellectual property concerns (for both the SaaS vendor and customers)
2. Latency, since you're going to have to upload data
3. Possibly issues with bandwidth fees (I think most clouds only charge for egress bandwidth, but users are still going to want to download the processed data)
4. Monthly pricing is a big turn-off for a large segment

Similar to Google's Dataprep https://cloud.google.com/dataprep/ (not free)

And there's openrefine http://openrefine.org/

Both of these look like they are aimed at IT and data science professionals.

Why are you not offering a Linux Version?

It is written in C++/Qt, so it wouldn't be that hard to add a Linux version. But I'm not sure that the market that this is aimed at uses Linux in any appreciable numbers. You are the first to ask!

Also building binaries for Linux is a pain. Which distributions to support?

I was about to ask the same but then I realised I'm not in the target market. The number of "data noobs" on Linux is probably indeed quite low. Fair enough

>Which distributions to support?

Target flatpak and/or snap.

> Also building binaries for Linux is a pain. Which distributions to support?

just build static binaries for x86-64

or better, distribute the source code ;)

Building static binaries for a Qt application requires a very expensive licence. :0(

I do all my data science on linux machines (Kubuntu, but it shouldn't matter), so, like +1.

But would you pay for something like this? Or would you use a free tool such as R?

My gut feeling is that Linux users: a) Don't want to pay for software. b) Probably have the technical chops to roll their own solution in Python/R/SQL.

I haven't tried the thing yet, but if it made my life easier, I would most certainly pay for and use it. Or get my customers to do so.

Data pipelining, cleaning and feature construction is the most time consuming part of data science. It's almost always a struggle, and the process usually produces fragile and ephemeral code that will need to be rewritten to put into production. If you could provide a LabVIEW-like GUI thing to remove some of this drudgery, assuming you could hook it up to database and CSV back ends, or if there were a target which could do this, and the result were robust and could be deployed, it would make me much more productive than fiddling around with pandas or R data tables.

Maybe it isn't what you built, but I've said many times this is the most useful data science product; the one that is 10x easier than writing your own every damn time. Fancy woo-woo classifiers with alleged superpowers don't even begin to compare to tooling like this.

Interesting. I'd love to know how close or far it is from what you are looking for.

Currently it can input XLS/XLSX (on Windows) and delimited text (e.g. CSV) and can output delimited text. I am looking to support other inputs and outputs, e.g. JSON / ODBC / SQLite, depending on feedback.

well, I don't work on osx or windows, like, ever. ;-)

csv is the lingua franca for prototyping of course. For deploying (which could make your code more sticky in an enterprise setup), you need to hook up to real databases.

I'd suggest finding a local senior data scientist or consulting group and use them to drive your product feature development. I'm not sure who you had in mind for your end user, but I do think data science types (the ones who get paid; not smurfs who want a DS job some day) are a viable market for something like this.

>I'm not sure who you had in mind for your end user

People who have data they need to transform, but probably aren't programmers or data science professionals.

I would obviously be delighted if data science professionals want to buy it. But I assume they have lots of powerful tools already at their disposal.

But I am still learning about the market, so I could be completely wrong about where the opportunity is.

>People who have data they need to transform, but probably aren't programmers or data science professionals.

I can't imagine who that would be. Excel power users maybe?

Marketing people for a start. They often have vast reams of data (email lists, postal lists, analytics data) that they need to clean, reformat and analyze.

Looks like Alteryx basically?

I don't know a lot about Alteryx. But I understand it is enterprise oriented and much more expensive than the price point we are aiming for.

Yeah way more expensive - a license is like 5k.

It's free for non-commercial use though

Anyway - the space does need some competition so good luck with your project!

I love these kinds of programming tools, but do not understand the terminology. Using programming tools has always been called, eeer, "programming"? Is there something that I'm missing here? What's the point of saying "no programming" when you are, precisely, programming?

I don't think it is programming in any real sense. There are no variables and no loops. So I don't think it is Turing Complete. ;0)

Presumably because the end user of the product doesn't need to be a programmer.

Whatever their job title says, if somebody is writing programs using a (visual or textual) programming language, then they are programming.

So when my father puts a SUM() calculation into Excel via the UI he is programming?

That's a bit of a stretch.

indeed he is!
