
Show HN: X to Elasticsearch Sync - sidi
https://medium.appbase.io/abc-import-import-your-mongodb-sql-json-csv-data-into-elasticsearch-a202cafafc0d?ref=producthunt
======
alx_
I did a bit of digging in the repository. And I didn't like what I found.

First off, this project seems to be based on:

[https://github.com/compose/transporter](https://github.com/compose/transporter)

But of course every mention of transporter or compose.io was removed from the
repository, including the original BSD-3 license, which is a clear violation
of the BSD-3 license terms.

Guys, this is not how open source works. You shouldn't try to claim all the
credit when your work is clearly a derivative of another work.

And this isn't even the worst thing about the project. While it claims to be
open-source software, almost all of its functionality has been removed from
the source code.

Thus, as open-source software, it's completely useless. If you build it from
source it probably does nothing, because the syncing code is closed source.

Yeah, I know you can download a pre-built executable. But I won't ever be
running closed-source freeware/commercial software in production.

No, thank you very much. Especially if the aforementioned software was written
by someone who completely lacks any notion of ethics.

~~~
sidi
Addressing your point directly: the project doesn't claim to be open-source
[https://github.com/appbaseio/abc#licensing](https://github.com/appbaseio/abc#licensing)
and the same is also mentioned in the blog post. The part of the project that
is open-source is available at
[https://github.com/appbaseio/abc](https://github.com/appbaseio/abc) (but
isn't relevant to the post here).

The import functionality is based on the transporter project[1], and to set
the record straight, we will be adding an acknowledgement for it in the next
binary release[2]. However, we are not redistributing the source, as `abc
import` isn't open-source. For anyone interested in why we aren't using the
transporter project as-is: we have 1.) changed the sink functionality, 2.)
added adaptors for SQL variants and CSV, and 3.) made a simpler interface.
Going forward, we are more interested in the easiest way to sync <X> to
Elasticsearch, which is different from transporter's goal of a generic ETL.

I do appreciate you bringing this up. We're very much just getting this out
there and want to do the right thing.

[1]
[https://github.com/compose/transporter](https://github.com/compose/transporter)

[2]
[https://github.com/appbaseio/abc/issues/77](https://github.com/appbaseio/abc/issues/77)

~~~
alx_
Sure, there is some mention of "!oss". It's not 100% clear though. I sure
missed it.

You are redistributing transporter's source though, because it's available
from the GitHub repository. Wtf are you even talking about? There is code
taken almost straight from transporter in your abc repository.

Sure, you've made changes. Why should anyone be interested, though, if it's
100% closed-source? "free while in beta" my ass.

~~~
sidi
We aren't redistributing transporter's source code. Where did you see that?

~~~
alx_
You are still redistributing source code that is derivative of transporter's
code (like goja_builder.go and other files). Not to mention that anyone can
just roll back commits in your repository and get to the original code of
transporter, with the original license, copyright, and even the authors list.

I think that's a license violation, because you are not keeping the original
copyright notice in the repository. If you think otherwise - whatever. I don't
want to argue over technicalities.

I think I'll just forward info to transporter's developers, so that they can
handle the situation the way they want. You can argue with them or maybe they
just don't care.

------
BrentOzar
I couldn't figure this out in the documentation - how are you keeping the data
up to date? Is there some kind of scheduled refresh that pulls all the data
from the database periodically, or how are you detecting which rows changed?

In particular, how are you implementing this in, say, the MSSQL importer?

> Adaptors may be able to track changes as they happen in source data. This
> "tail" capability allows a ABC to stay running and keep the sinks in sync.

~~~
TYPE_FASTER
I’m assuming they’re adding a trigger to the source table.

~~~
BrentOzar
> I’m assuming they’re adding a trigger to the source table.

Their video demo for SQL Server simply points the command line at a connection
string, and it then says 2 item(s) indexed. There wasn't a way to pick
specific tables.

That means if the trigger theory is correct, they're adding a trigger by
default to every single table in the database. That would be a remarkably bad
idea.

------
sidi
Hi HN, we created _ABC import_ as a convenient way to sync a data source to an
Elasticsearch index. It does three things really well imho:

1\. A small-footprint process / Docker container that can index or sync your
data source with ES, which is operationally simpler than relying on
application-layer logic,

2\. Supports on-the-fly transformations with JavaScript, as well as
configuration of mappings (e.g. if you want to set a specific analyzer on
your text fields, or set type mappings),

3\. Works with a wide variety of sources - Postgres, MySQL, SQL Server,
MongoDB, JSON, CSV, Elasticsearch and more coming soon.
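To make point 2 concrete, a per-document transform is just a JavaScript function. The exact signature abc expects isn't shown in this thread, so treat the shape below as an illustrative assumption rather than abc's actual API:

```javascript
// Hypothetical on-the-fly transform: reshape each source document
// before it reaches the Elasticsearch sink.
function transform(doc) {
  // Merge two source columns into the single field we want indexed.
  doc.full_name = doc.first_name + " " + doc.last_name;
  delete doc.first_name;
  delete doc.last_name;
  // Strip a field that should never reach the search index.
  delete doc.password_hash;
  return doc;
}

const out = transform({
  first_name: "Ada",
  last_name: "Lovelace",
  password_hash: "x",
});
console.log(out); // -> { full_name: 'Ada Lovelace' }
```

Mapping configuration (analyzers, field types) would sit alongside this in the pipeline config rather than in the transform itself.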

~~~
styfle
A couple questions with regards to the SQL -> ES:

1\. Does it sync deltas or do you have to import the whole table each time?

2\. Does it listen for changes to a table or does it require a manual
invocation?

3\. Can you explain more about the algorithm/implementation?

~~~
sidi
1\. and 2. It syncs deltas if you are using either Postgres (9.4 or above) or
MongoDB. For other sources like MySQL or MSSQL, it needs to import the whole
table each time. The tailing capability is DB-dependent; for instance, in
MongoDB's case we use the oplog to do this.

The way it works: after the initial import, if you are using the `--tail`
option, it listens for further changes via the oplog. More on setting up
oplog access over here -
[https://github.com/appbaseio/abc/blob/dev/docs/importer/adap...](https://github.com/appbaseio/abc/blob/dev/docs/importer/adaptors/mongodb.md)
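For anyone curious what `--tail` is actually consuming, you can peek at the oplog yourself from the mongo shell on a replica-set member (the `mydb.users` namespace is just an example, not anything abc requires):

```javascript
// Mongo shell, connected to a replica-set member.
// The oplog lives in the "local" database as a capped collection.
var oplog = db.getSiblingDB("local").oplog.rs;

// Most recent operation recorded for an example namespace:
oplog.find({ ns: "mydb.users" }).sort({ $natural: -1 }).limit(1);

// A tailing reader opens a tailable cursor on this collection and keeps
// iterating as new operations are appended - which is what --tail automates.
```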

For Postgres, similarly, a replication role and a replication slot need to be
created for decoding the write-ahead log. More on that here -
[https://github.com/appbaseio/abc/blob/dev/docs/importer/adap...](https://github.com/appbaseio/abc/blob/dev/docs/importer/adaptors/postgres.md).

This seems like a good primer on replication slots -
[https://postgresqlspace.wordpress.com/2015/06/18/replication...](https://postgresqlspace.wordpress.com/2015/06/18/replication-slots-in-postgres/).
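For a concrete picture, the Postgres-side setup looks roughly like this. The role and slot names are examples, and `test_decoding` is Postgres's built-in sample output plugin - whichever plugin the importer actually expects is in its docs, not assumed here:

```sql
-- postgresql.conf must have: wal_level = logical
-- (and max_replication_slots >= 1); restart Postgres after changing it.

-- A role that is allowed to stream the write-ahead log.
CREATE ROLE abc_sync WITH REPLICATION LOGIN PASSWORD 'example-password';

-- A logical replication slot the sync process reads decoded changes from.
SELECT * FROM pg_create_logical_replication_slot('abc_slot', 'test_decoding');

-- Peek at decoded changes without consuming them:
SELECT * FROM pg_logical_slot_peek_changes('abc_slot', NULL, NULL);
```

Note that an unconsumed slot retains WAL indefinitely, so drop it (`pg_drop_replication_slot`) if the sync process is decommissioned.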

------
rpedela
Can this sync to any ES server or just appbase? All the examples in the docs
are for appbase targets.

~~~
sidi
It can sync to any ES server. Linking to the import doc which expands on that
-
[https://github.com/appbaseio/abc/blob/dev/docs/appbase/impor...](https://github.com/appbaseio/abc/blob/dev/docs/appbase/import.md).

