
Hey Maayan and Or, nice project. At re_data we just went over a lot of your new updates, and it seems quite a large part of your project is "inspired" by code from our library, https://github.com/re-data/re-data, even the parts we are not especially proud of ;)

If you decide to copy not only ideas but a big part of the internal implementation, I think you should include that information in your LICENSE.

Cheers




Is the idea here that it's inspired by re_data due to using dbt transformations underneath or because it's reposted looking nearly the same? (or both?)

Looks like much of the lineage code is also largely a wrapper around this library: https://github.com/reata/sqllineage

Would be curious to understand the project's purpose and unique contributions vs. the underlying dependencies powering it as there seems to be some ambiguity. Is this just a wrapper around dbt transformations and a lineage library in one package? Can I just use them directly?


The dbt transformation part is what's "inspired": it uses the same models and the same logic (and part of the code) for generating them. We, for example, had a funny approach of computing metrics in 4 threads via multiple dbt models, and this is also done in elementary in a very similar way :)

The lineage part is independent (re_data uses lineage from dbt), so I haven't looked into that much.


While developing Elementary's dbt project, we looked into more than 60 dbt projects to learn from prior work, and we were inspired by different things in different places. You're right that we were inspired by a couple of techniques you used, one being that creative way to improve performance (though the 4-thread setting itself is the dbt recommendation in their docs). Another is using z-score for anomaly detection, which we saw in a number of related projects and which is widely used in the industry.
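For readers unfamiliar with the z-score technique mentioned here, a minimal Python sketch follows. It is not code from elementary or re_data; the function name, the threshold of 3.0, and the sample row counts are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def z_score_anomalies(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds threshold.

    Hypothetical helper: z = (value - mean) / population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so nothing can be flagged.
        return []
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Assumed sample data: twenty ordinary daily row counts and one spike.
daily_row_counts = [100] * 20 + [1000]
anomalies = z_score_anomalies(daily_row_counts)
```

In a monitoring package the mean and standard deviation would typically be computed over a trailing window per table and metric rather than over the whole series, but that detail is outside what this thread states.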

In terms of the lineage, you can see in the code that we mostly rely on the query and access history that exist in Snowflake and BigQuery to parse the queries and learn about the connections between nodes in the graph. We use other Python libraries like sqlfluff and sqllineage as low-level parsers for some specific use cases, which we extend, solving many things on top of them. Actually, we're heavy open-source users, depending on around 20 libraries, all MIT or Apache.


Okay, I'm happy that you admit inspiration in this comment (unlike the previously deleted one).

Also, I think it's more than just following re_data in a couple of places. Elementary's whole data monitoring part started much later than your lineage part, and it seems to follow what re_data did there at both the idea and implementation level. I'm sure the other 59 projects you mention were not dbt packages for data reliability (there was no other one on the dbt hub), which is what re_data is and what elementary now also tries to copy (seeing our traction).

As mentioned, it's open-source. You can use our code. But if you are doing that, state that clearly in the LICENSE.


I think mateuszklimek is pointing out that the MIT license requires you to include the redata copyright in your source.


Right on point, they don't even have a filled-out LICENSE in the repo:

> Copyright [yyyy] [name of copyright owner]

https://github.com/elementary-data/elementary/blob/master/LI...


Gotcha - I can see what you mean, appreciate the clarification


If you're going to make an accusation like this on HN, you should provide line-by-line evidence. Saying "you copied us" without any examples makes you not credible.


Sure, please compare: https://re-data.github.io/dbt-re-data/#!/overview?g_v=1 and https://docs.elementary-data.com/ graph png.

Elementary models like data_monitors_thread1, data_monitors_thread2, data_monitors_thread3, data_monitors_thread4, data_monitoring_metrics, latest_metrics, metrics_stats_for_anomalies, z_score, anomaly_detection, schema_changes, etc. existed before in re_data and do the same things. The *_thread4 models in particular are not similar to anything you normally do in dbt.

And these similarities are also visible in code, for example here: the same usage of the undocumented dbt context feature.

# elementary

{% macro get_monitor_macro(monitor) %}

    {%- set macro_name = monitor + '_monitor' -%}
    {%- if context['elementary'].get(macro_name) -%}
        {%- set monitor_macro = context['elementary'][macro_name] -%}
    {%- else -%}
        {%- set monitor_macro = context['elementary']['no_monitor'] -%}
    {%- endif -%}

    {{- return(monitor_macro) -}}
{% endmacro %}

# re_data

{%- macro get_metric_macro(metric_name) %}

    {% set macro_name = 're_data_metric' + '_' + metric_name %}

    {% if context['re_data'].get(macro_name) %}
        {% set metric_macro = context['re_data'][macro_name] %}
    {%- else %}
        {% set metric_macro = context[project_name][macro_name] %}
    {% endif %}

    {{ return (metric_macro) }}
{% endmacro %}
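For anyone not fluent in dbt Jinja, the pattern both macros share is a lookup by constructed name in a package namespace, with a fallback when the name is missing. A rough Python analogue, written only to illustrate that pattern: the `context` dictionary, `resolve_macro`, and the sample entries below are hypothetical stand-ins, not dbt's API or either project's code.

```python
def resolve_macro(context, package, macro_name, fallback_name):
    """Return context[package][macro_name] if present and truthy,
    otherwise the fallback entry from the same package namespace."""
    namespace = context[package]
    found = namespace.get(macro_name)
    if found:
        return found
    return namespace[fallback_name]


# Hypothetical namespace: callables keyed by name, like macros in a package.
context = {
    "some_package": {
        "row_count_monitor": lambda: "row_count",
        "no_monitor": lambda: "none",
    }
}

monitor = "row_count"
hit = resolve_macro(context, "some_package", monitor + "_monitor", "no_monitor")
miss = resolve_macro(context, "some_package", "unknown_monitor", "no_monitor")
```

This mirrors the elementary snippet's fallback to a fixed `no_monitor` entry; the re_data snippet instead falls back to the same name in the calling project's namespace.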


Pretty strong accusation. Are you sure re-data isn't "inspired" by Monte Carlo? :)


It is! But it doesn't have Monte Carlo code in it :)

And it's open-source so it's generally okay to do that, but it should be reflected in the LICENSE.



