
Why I Created YADA - varontron
https://yadadata.com/2016/08/22/why-i-created-yada/
======
ricardobeat
I surely wasted a lot of time reading the post, then readme on github, then
"QuickStart", sifting through roughly a trillion lines of configuration and
build options, and still failed to find a single code sample on fetching data
or writing an adapter...

~~~
karma_vaccum123
the author recognized one problem (hairball perl left undocumented by previous
coworker) but replaced it with another problem (overly complex and needlessly
configurable java)

what the author replaced was a classic "first system"...but he created a
classic "second system".

I wish people would just realize that the best route is to solve the problem
at hand...and JUST the problem at hand, with the smallest and simplest
codebase possible....these tend to be the near-optimal "third systems" that
eventually emerge.

~~~
nathancahill
100 times this. What's the rule about frameworks? "Every system evolves
towards supporting arbitrary programming|Turing completeness". Something like
that.

~~~
k__
This could be true, yes.

But I have the feeling that good frameworks still make some stuff easier than
others, while supporting arbitrary programming/Turing completeness.

For example Cycle.js, which is a structural framework for reactive programming
with observables. It's made for front-end development, but I can also model
REST and WebSocket servers with it, since almost everything can be modeled as
a mix of source-streams + (input-data) sink-streams (output-data).

Since it uses observables all over the place, it makes handling these
data-streams much easier and clearer, by making them first-class citizens in
the app.
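
To make the source/sink idea concrete, here is a minimal sketch in plain Python. It is not the Cycle.js API; `main`, `run`, and the `"http"` channel name are hypothetical names chosen for illustration, and plain iterables stand in for observables. The app logic is a pure function from input streams to output streams, and a small runner does the wiring.

```python
from typing import Callable, Dict, Iterable, List

Stream = Iterable[str]


def main(sources: Dict[str, Stream]) -> Dict[str, Stream]:
    """Pure app logic: maps source streams (input) to sink streams (output)."""
    requests = sources["http"]
    # Each incoming request path becomes one response line.
    responses = (f"200 OK {path.upper()}" for path in requests)
    return {"http": responses}


def run(app: Callable[[Dict[str, Stream]], Dict[str, Stream]],
        inputs: Dict[str, Stream]) -> Dict[str, List[str]]:
    """Hypothetical runner: feeds the sources in and drains every sink."""
    sinks = app(inputs)
    return {name: list(stream) for name, stream in sinks.items()}


result = run(main, {"http": ["/users", "/orders"]})
print(result)
```

Because `main` only sees streams, the same function could in principle sit behind a REST handler, a WebSocket, or a test harness, which is the portability being described.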

~~~
trhway
>almost everything can be modeled as a mix of source-streams + (input-data)
sink-streams (output-data)

True. This is pretty much what I have been doing for 25 years and will be
doing for the next 20 or so.

Reminds me of the time when one guy we had around read a book on finite
state machines. It was, dare I say, an eye-opener for him. From that moment
until he parted with us, all he did was FSAs.

~~~
k__
lol, sounds a bit like he did heroin. :D

------
codemac
I hadn't heard of YADA - anyone use it? Seems like a noble goal.

~~~
Kildea
I started using it a couple of months ago for a couple of small projects. I
think it has real potential as a tool for dealing with integration of old
projects and new ones, and for charting out transitions in a production
environment. I have been looking at some 'technical debt' in the form of badly
documented software written over several years (by several different
employees) and it can be a nightmare. YADA offers a simple, flexible way to
organize these sorts of processes.

------
exelius
So it's an ETL library? Useful, but there are dozens (if not hundreds?) of
free and commercial libraries, services, etc that do the same thing.

Hell, some companies (Teradata, Informatica) have built their entire business
on this. Of course, their solutions are far more complex and scalable, but
this is definitely not a new problem (or solution).

~~~
varontron
Thanks for taking the time to review and comment.

ETL is a viable use case, one of the first in fact, dating back to 2011, in
which we marshalled 100s of millions of RNASeq gene expression values and
metadata into a dw.

Regarding other ETL tools, one of the goals of YADA is to make the process
easier. The tools you describe are well known and robust, but as you point
out, complicated and costly. YADA was designed to be much simpler. Further, it
can also be used for a variety of other things, like SPJAs, data analytics,
etc, securely, and without additional overhead.

------
room271
There is an unfortunate naming conflict here:

[https://github.com/juxt/yada](https://github.com/juxt/yada)

although I guess that happens a lot nowadays.

------
karma_vaccum123
Okay I think it's safe to say we have all built this at one time or
another...it's almost a rite of passage for an intermediate developer to grasp
that moment when they realize the world can be encoded as a nested dictionary,
and most databases can be abstracted to a lowest-common-denominator.

Then you realize it is a lossy transformation that doesn't expose any of the
real power of the platform you are committed to, and you don't actually have
to parameterize the world of possibilities (and there is no value in enabling
this)...you tarball it and walk away.

~~~
qyv
As a tool for data analysis I could see using something like this, but I don't
think I could ever write an application on top of this. At the low levels you
are running on top of hard-coded native DB queries, potentially across
multiple data sources. What happens when you are hundreds of queries into a
project and someone needs to change the underlying data structure? That is a
shaky foundation.
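
A minimal sketch of that failure mode, using Python's built-in `sqlite3` rather than anything from YADA. The `samples` table, its columns, and `STORED_QUERY` are hypothetical stand-ins for a stored native query and the schema it depends on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, gene TEXT, expr REAL)")
conn.execute("INSERT INTO samples (gene, expr) VALUES ('TP53', 4.2)")

# A hard-coded native query, saved once and reused by many callers.
STORED_QUERY = "SELECT gene, expr FROM samples"
rows_before = conn.execute(STORED_QUERY).fetchall()

# Someone changes the underlying structure: the column gets a new name.
conn.execute("DROP TABLE samples")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, gene TEXT, expression REAL)")

# The stored query still names the old column, so it now fails at run time.
try:
    conn.execute(STORED_QUERY)
    broke = False
except sqlite3.OperationalError as err:
    broke = True
    print("stored query failed:", err)
```

With one query this is a quick fix; with hundreds of stored queries, each one that names the changed column has to be found and edited by hand.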

~~~
varontron
Thanks for taking the time to comment. You're basically rephrasing the hammer
cliche, which is accurate. First though, as your first sentence indicates, you
agree we still need hammers. Right now, YADA is a great tool for prototyping,
ad hoc reporting, data analytics, ETL, and smaller SPJAs. One testament in our
environment has been to repurpose the work we do for one endpoint in another,
e.g. webapp and spotfire, or vendor app and ad hoc reports. We are working
with a high-volume machine-learning group, so eventually we expect to be able
to scale.

------
khoomeister
Is this basically Presto?

------
zzz157
Is this a play on words with "Yadida"?

~~~
varontron
It started as an acronym (albeit a tired one) for yet another data abstraction,
and it caught on.

------
ausjke
It's written in java/tomcat after I read for 15+ minutes and then backed out,
as I don't use java.

~~~
treehau5
This is a silly statement I see time and time again. Does the tool provide you
value and solve a problem you have? Then use it. You are an engineer.

To show how silly it is, imagine if you were talking about another program
that was incredibly useful like grep for example:

"It's written in c after I read for 15+ minutes and then backed out, as I
don't use c."

~~~
luchadorvader
grep is simple to use from a simple command. Setting up a Java server and
maintaining it is quite a different challenge than running a command with a
few parameters. I wouldn't use it because if it goes down, fuck. If I need to
extend it or decide to change something in there, fuck. I would rather use
something that is easier for me to fix and maintain or my coworkers to
fix/maintain, than have a product built from multiple languages that require
too many different types of people to keep it running.

~~~
treehau5
I wasn't suggesting you do something that would place a large burden on your
coworkers, but there is a point where the usefulness of existing software is
greater than any potential churn of learning its running environment.

Suppose you were a Ruby developer and you want to create a fully scalable,
highly available JSON document store that you can easily index, store, and
backup/rotate, and a solution like this wasn't available in your language.
Would you create your own database to store the documents, implement your own
consensus algorithms and searching algorithms like TF-IDF?

Or would you just learn how to use Elasticsearch and incur the cost of running
it?

