Why I Created YADA

ricardobeat · on Aug 22, 2016

I surely wasted a lot of time reading the post, then readme on github, then "QuickStart", sifting through roughly a trillion lines of configuration and build options, and still failed to find a single code sample on fetching data or writing an adapter...

karma_vaccum123 · on Aug 22, 2016

the author recognized one problem (hairball perl left undocumented by previous coworker) but replaced it with another problem (overly complex and needlessly configurable java)

what the author replaced was a classic "first system"...but he created a classic "second system".

I wish people would just realize that the best route is to solve the problem at hand...and JUST the problem at hand, with the smallest and simplest codebase possible....these tend to be the near-optimal "third systems" that eventually emerge.

varontron · on Aug 22, 2016

There is distinction between the scenario you describe, and the goal of YADA, which is that with YADA we strive to make the first (or existing) system useful, and to keep the bar low, by favoring the indispensable skills like SQL; or, to simplify the first, second, third, or whatever system by reducing the number of components, redundant configurations, and points of failure. Further, YADA facilitates solving a problem that many of the optimal systems create, those related to workflow, collaboration across teams, efficiency, repurposing, etc.

Thanks for taking the time to review and comment!

nathancahill · on Aug 22, 2016

100 times this. What's the rule about frameworks? "Every system evolves towards supporting arbitrary programming|turing completeness". Something like that.

k__ · on Aug 22, 2016

This could be true, yes.

But I have the feeling, that good frameworks still make some stuff easier than others, while supporting arbitrary programming/turing completeness.

For example Cycle.js, which is a structural framework for reactive programming with observables. It's made for front-end developoment, but I can also model REST and WebSocket servers with it, since almost everything can be modeled as a mix of source-streams + (input-data) sink-streams (output-data).

Since it uses observables all over the place, it makes handling these data-streams much easier an clearer, by making them first class citiziens in the app.

trhway · on Aug 22, 2016

>almost everything can be modeled as a mix of source-streams + (input-data) sink-streams (output-data)

true. This is pretty much what i have been doing for 25 years and will be doing for the next like 20.

Reminds me about the time when one guy we had around read a book on finite state machines. It was, dare i say, an eye-opener for him. Since that moment until he parted with us, all what he was doing was FSAs.

k__ · on Aug 23, 2016

lol, sounds a bit like he did Heroin. :D

gregmac · on Aug 22, 2016

Yep. I have found the mantra "Build what you need" works best.

Only once you end up having to solve a separate problem that shares common attributes/patterns with the first should you you extract a common framework and rebuild both on it. You have a proven need and you're not just blindly building "someday this could be used for"-type code.

varontron · on Aug 22, 2016

This is an excellent point–more accessible advanced documentation. YADA has tremendous flexibility, and if you actually read the readme, then you know, a lot of use cases. There's always a trade-off between appealing to both experts and novices, but I'm certain we can balance things better.

There isn't currently an Adaptor authoring guide, so that would obviously be a good thing. There are some code samples for usage in user guide.

In any event, thanks for giving it a look and taking time to comment.

codemac · on Aug 22, 2016

I hadn't heard of YADA - anyone use it? Seems like a noble goal.

Kildea · on Aug 22, 2016

I started using at it a couple of months ago for a couple of small projects. I think it has real potential as a tool for dealing with integration of old projects and new ones, and for charting out transitions in a production environment. I have been looking at some 'technical debt' in the form of badly documented software written over several years (by several different employees) and it can be a nightmare. YADA offers a simple, flexible way to organize these sorts of processes.

exelius · on Aug 22, 2016

So it's an ETL library? Useful, but there are dozens (if not hundreds?) of free and commercial libraries, services, etc that do the same thing.

Hell, some companies (Teradata, Informatica) have built their entire business on this. Of course, their solutions are far more complex and scalable, but this is definitely not a new problem (or solution).

varontron · on Aug 22, 2016

Thanks for taking the time to review and comment.

ETL is a viable use case, one of the first in fact, dating back to 2011, in which we marshalled 100s of millions of RNASeq gene expression values and metadata into a dw.

Regarding other ETL tools, one of the goals of YADA is to make the process easier. The tools you describe are well known and robust, but as you point out, complicated and costly. YADA was designed to be much simpler. Further, it can also be used for a variety of other things, like SPJAs, data analytics, etc, securely, and without additional overhead.

room271 · on Aug 22, 2016

There is an unfortunate naming conflict here:

https://github.com/juxt/yada

although I guess that happens alot nowadays.

karma_vaccum123 · on Aug 22, 2016

Okay I think its safe to say we have all built this at one time or another...its almost a right of passage for an intermediate developer to grasp that moment when they realize the world can be encoded as a nested dictionary, and most databases can be abstracted to a lowest-common-denominator.

Then you realize it is a lossy transformation that doesn't expose any of the real power of the platform you are committed to, and you don't actually have to parameterize the world of possibilities (and there is no value in enabling this)...you tarball it and walk away.

varontron · on Aug 22, 2016

Thanks for taking the time to review and comment.

YADA doesn't abstract any underlying system, it abstracts access to the underlying system. You can think of it as a pluggable web service that you slap on, say, a legacy warehouse, except it can also reach into any other system, enabling access to data using the same standard syntax.

It's like a BI or ETL tool in that respect, except those tools wall off the work you've done and limit repurposing. BI tools generate reports and that's pretty much it. You couldn't, for example, point Spotfire to Cognos, you'd connect them both to the db instead, distributing the credentials, and possibly construct the same query in both places using their interfaces. With YADA, you could instead write the SQL once, and get the data from anywhere.

I do agree with your sentiment however, that as innovators we always think there is a better way, or more commonly, that their _must_ be a better way, because we don't understand the problem fully, or grasp the complexity of the current solution. Often we discover we're not the first to think of a great idea. I'm sure there are similar tools to YADA, but I also think YADA has potential to offer a wide array of users some options and combinations of features that are distinct in the marketplace.

qyv · on Aug 22, 2016

As a tool for data analysis I could see using something like this, but I don't think I could ever write an application on top of this. At the low levels you are running on top of hard-coded native DB queries, potentially across multiple data sources. What happens when you are 100's of queries into a project and someone needs to change the underlying data structure? That is a shaky foundation.

varontron · on Aug 22, 2016

Thanks for taking the time to comment. You're basically rephrasing the hammer cliche, which is accurate. First though, as your first sentence indicates, you agree we still need hammers. Right now, YADA is a great tool for prototyping, adhoc reporting, data analytics, ETL, and smaller SPJAs. One testament in our environment has been to repurpose the work we do for one endpoint in another, e.g. webapp and spotfire, or vendor app and ad hoc reports. We are working with a high-volume machine-learning group, so eventually we expect to be able to scale.

mjevans · on Aug 22, 2016

Which is why JSON for non-indexed attributes is useful.

falcolas · on Aug 22, 2016

There's another point in your development career where you realize that strings, numbers, maps, and lists are not nearly enough data types for more complex work. Of course, first you'll have to go through the phase where you fight to make them fit into those four types simply so you can use JSON.

Even when those four types are sufficient, you'll frequently find there are better ways to handle storage and transmission than serializing the data to JSON.

rch · on Aug 22, 2016

Then eventually you give up on all that and try to focus on building a datetime library that works the way you want it to.

falcolas · on Aug 22, 2016

Yup, done that one too. Ended up falling back on using the Linux zoneinfo database and accepting that I'm unlikely to do better (and keep up with it, no less).

azinman2 · on Aug 22, 2016

I would _highly_ recommend never to do this. Date/time is complex in so many subtle ways, least of which is staying up to date on changes. Because its so fundamental every language is likely to have a mature, maintained, and roughly canonical library to use.

ben_jones · on Aug 22, 2016

But what about JSON web tokens?

khoomeister · on Aug 22, 2016

Is this basically Presto?

zzz157 · on Aug 22, 2016

Is this a play on words with "Yadida"?

varontron · on Aug 22, 2016

It started as a acronym (albeit a tired one) for yet another data abstraction, and it caught on.

on Aug 22, 2016

[deleted]

dang · on Aug 22, 2016

ausjke · on Aug 22, 2016

It's written in java/tomcat after I read for 15+ minutes and then backed out, as I don't use java.

treehau5 · on Aug 22, 2016

This is a silly statement I see time and time again. Does the tool provide you value and solve a problem you have? Then use it. You are an engineer.

To show how silly it is, imagine if you were talking about another program that was incredibly useful like grep for example:

"It's written in c after I read for 15+ minutes and then backed out, as I don't use c."

luchadorvader · on Aug 22, 2016

grep is simple to use from a simple command. Setting up a Java server and maintaining it is quite a different challenge then running a command with a few parameters. I wouldn't use it because if it goes down, fuck. If I need to extend it or decide to change something in there, fuck. I would want to use something that is easier for me to fix and maintain or my coworkers to fix/maintain, then have a product built from multiple languages that require too many different types of people to keep it running.

treehau5 · on Aug 22, 2016

I wasn't suggesting you do something that would place a large burden on your coworkers, but there is a point where the usefulness of existing software is greater than any potential churn of learning it's running environment.

Suppose you were a ruby developer and you want to create a fully scalable, highly available JSON document store that you can easily index, store, and backup/rotate, and a solution like this wasn't available in your language. Would you create your own database to store the documents, implement your own consensus algorithms and searching algorithms like TL-DF?

Or would you just learn how to use elasticsearch and incur the cost of running it?

ben_jones · on Aug 22, 2016

Yup. It goes far beyond the effort taken to run it. Straight out of the preface of Site Reliability Engineering, a book written by Google SREs, "estimates 40-90% of the total costs of a system are incurred after birth".

Edit: This was meant to go under one of your post's child comments, and hopefully isn't too out of context misplaced here.

sesm · on Aug 22, 2016

If a Java web application is packaged properly, then running it as simple as 'java -jar myapp.jar'.

ausjke · on Aug 22, 2016

I already use javascript, c, c++, PHP, python, ruby, css, less, sass, jade, sql, etc for my daily work. However I do avoid java/tomcat and .net ecosystem, they're either too heavy, or not my favorite ecosystem. My comments are simply a reminder for those in the same boat so that they can save some time.

Are you going to pick up _all_ kinds of tools from a store? I do have my preferences there.

Bino · on Aug 22, 2016

I had the same experience. People who still use java should try to be more upfront with it, so others don't waste their time...