Hacker News | ersatz_username's comments

The word “empty” carries no semantic content under this construction of the term. Terms like “empty” are always used contextually, which is why only a fool would correct their partner for mentioning the empty refrigerator by remarking on the presence of air.

Unless you think Sagan would have been surprised by the presence of an electric field (mediated by virtual particles…) between electrons and protons in an atom, it’s quite likely you’re choosing an obtuse understanding of the term.


This is kind of my point, Sagan was a great science communicator but really missed the mark here.

A solid wood table is not mostly empty by any common contextual definition. Photons do not pass through it freely, your hand won't pass through it freely, and the electron clouds of the carbon atoms in the wood are physical in almost every common sense of the term. They very much push back on any other electron cloud that comes near enough and are generally the only part of the atom that does the interacting.

While it might be true that almost all the mass is concentrated in the center of the atom, that's not what people mean by empty. Houses aren't considered empty just because most of the mass is in the foundation.


What does your data stack look like? I'll put something together specifically for you.


A lot of these tools are very, very different from each other, so it's hard to address each individually. Just by way of example, databend is a full-on data warehouse, while greatexpectations is a testing framework for evaluating data assertions (i.e. "I see there are nulls, but you wrote a test which says there shouldn't be").

Here are some things we think are really important, though:

1. Data quality testing ideally happens during CI, not after merge.

2. Developers come first. Virtually every aspect of the tool can be customized, modified, and extended down to the basic data model without changing any upstream core code. Want to build your own custom application on top of your data lineage? Great! Have at it!

3. Users should be able to own not just their own data but their own metadata. We go to great lengths to maintain feature parity between the cloud and self-hosted application.
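
To make the first point concrete, here is a minimal sketch of a data quality assertion that could run as a CI step before merge. It is not Grai's API: `find_nulls`, `run_not_null_checks`, and the sample rows are hypothetical names made up purely for illustration.

```python
# Hypothetical sketch: not-null assertions evaluated before merge.
# None of these names come from Grai or any real library.

def find_nulls(rows, column):
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def run_not_null_checks(rows, columns):
    """Map each failing column to the row indexes that violate not-null."""
    failures = {}
    for column in columns:
        bad_rows = find_nulls(rows, column)
        if bad_rows:
            failures[column] = bad_rows
    return failures


# Made-up sample data standing in for the output of a changed pipeline.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]

failures = run_not_null_checks(sample_rows, ["id", "email"])
print(failures)
```

A CI job could exit non-zero whenever `failures` is non-empty, so the pull request is blocked before the change lands rather than after.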


My bad, I was actually referring to databand [1]

[1] https://databand.ai/


I wouldn't personally draw such a bright line between monitoring and reliable CI/CD. That division definitely exists, but partly as a product of the complexity introduced by fragmented data systems. In some ways an ideal world is one where the need for extraordinarily complex monitoring tools is actually pretty limited because we have tools to validate end-to-end data pipelines before making code changes, if that makes sense.

We actually already do data monitoring as well although we haven't built the specific alerting features of Monte Carlo. There are quite a few tools that do that really well so it's not our focus at the moment.


It depends a bit on your stack. Out of the box it does a lot with the metadata produced by the tools you're using. With something like dbt we can do things like extract your test assertions, while for postgres we might use database constraints.

More generally we can embed the transformation logic of each stage of your data pipelines into the edge between nodes (like two columns). Like you said, in the case of SQL there are lots of ways to statically analyze that pipeline but it becomes much more complicated with something like pure python.

As an intermediate solution you can manually curate data contracts or assertions about application behavior into Grai but these inevitably fall out of sync with the code.
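
As a rough illustration of the idea (not Grai's actual data model or API), column-level lineage can be sketched as a directed graph whose edges carry the transformation logic. The `LineageGraph` class and all column names and SQL snippets below are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical column-level lineage graph; not Grai's real data model."""

    def __init__(self):
        # source column -> list of (target column, transformation logic)
        self.edges = defaultdict(list)

    def add_edge(self, source, target, transformation=""):
        """Record that `target` is derived from `source` via `transformation`."""
        self.edges[source].append((target, transformation))

    def downstream(self, column):
        """Return every column reachable from `column`, in breadth-first order."""
        visited = {column}
        ordered = []
        queue = deque([column])
        while queue:
            current = queue.popleft()
            for target, _ in self.edges.get(current, []):
                if target not in visited:
                    visited.add(target)
                    ordered.append(target)
                    queue.append(target)
        return ordered


# Made-up columns and transformations for illustration only.
graph = LineageGraph()
graph.add_edge("postgres.users.id", "warehouse.dim_users.user_id",
               "CAST(id AS BIGINT)")
graph.add_edge("warehouse.dim_users.user_id", "dashboard.active_users.user_id",
               "SELECT DISTINCT user_id")

affected = graph.downstream("postgres.users.id")
print(affected)
```

With a graph like this, a change to one source column can be traced to everything that depends on it before the change is merged.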

Airflow has a really great API for exposing task level lineage but we've held off integrating it because we weren't sure how to convert that into robust column or field level lineage as well. How are y'all handling testing / observability at the moment?


For testing:

- we have a dedicated dev environment for analysts to experience a dev/test loop. None of the pipelines can be run locally unfortunately.

- we have CI jobs and unit tests that are run on all pipelines

Observability:

- we have data quality checks for each dataset, organized by tier. This also integrates with our alerting system to send pagers when data quality dips.

- Airflow and our query engines hive/spark/presto each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets but it doesn't work at the column level because our internal version of Hive doesn't support that.

- we have a service that essentially surfaces observability metrics for pipelines in a nice UI

- our airflow is integrated with pagerduty to send pagers to owning teams when pipelines fail.

We'd like to do more, but nobody has really put in the work to make a good static analysis system for airflow/python. Couple that with the lack of support for column level lineage OOTB and it's easy to get into a mess. For large migrations (airflow/infra/python/dependency changes) we still end up doing ad hoc analysis to make sure things go right, and we often miss important things.

Happy to talk more about this if you're interested.


Really appreciate the kind words. If you don't mind my asking, what sort of issues were y'all experiencing that prompted you to start looking for solutions now?


Thanks so much! Really appreciate the kind words.

We haven't had anyone request Gitlab yet but would love to add support! Any chance you'd be willing to beta test for us? If so, shoot me an email at ian@grai.io :).

EDIT: It looks like the index issue is related to our search provider. Were you able to eventually load the page or is it fully blocking you?


Just tried it again, and I’m still having the same issue with search :(.


Weird, sorry about this! We just removed the search and redeployed the docs. Hopefully that will fix it until we can sort the problem out. Would you mind giving it another shot?


Just to be clear, the only limitation imposed by the license is preventing someone from reselling a cloud hosted copy of the tool. The code is otherwise totally free to use / fork / modify / etc.


That's great, it's not open source though so you shouldn't call it open source. Call it something else.


We are pretty open to feedback on licensing and have gone back and forth internally because, frankly, we'd rather use a copyleft license.

We believe a project like this needs financial backing and a dedicated team driving development along, but therein lies the tension. The common monetization paths either feature-lock critical self-hosted capabilities like SSO behind a paywall, monetize through a cloud hosted option, or both.

The Elastic license is an attempt to maintain feature parity between the cloud and self-hosted tool while still being protected from something like the big cloud providers ripping the code off altogether.

In all seriousness though, we would love to hear suggestions if you think there's a better path.


I personally don't have anything against the license you've chosen, and I respect your right to protect your efforts against usage you don't desire. I just think it's better to avoid using "open source" if going down the ELv2 path, and to use something like "source available" or "fair code" instead, so that this isn't misrepresented as what is commonly considered open source.

If you'd like further detail in regards to why I (and others) think this matters, I've previously written my thoughts up here: https://danb.me/blog/posts/why-open-source-term-is-important...


Thanks for the link. Some personal thoughts:

I think the effort to standardize what is meant by a term like "open source" is generally good, but I also think the meaning of language is always up for debate, and the OSI's definitions are only right if they are useful.

Of the two clauses you pulled out of the EL2 license, the first one - "You may not provide the software to third parties as a hosted or managed service ..." - seems fine to me as "open source", while the second - "You may not move, change, disable, or circumvent the license key functionality ..." - seems not-fine.

(So for what it's worth, because of that second clause, I am agreeing with you that this license shouldn't be called "open source" - but it seems unfortunate for OP if they aren't relying on that clause.)

I think the issue I have is with the 6th OSI definition you pulled out - "No Discrimination Against Fields of Endeavor" - it seems to me like that one could use some tweaking. I do think it's important that the ability to run "Derived Works" is not limited by "field of endeavor", but I think selling managed software as a service could be a specific carve-out to that. It seems totally reasonable and not violating the spirit of "open source" to say you can modify and self-host for any purpose, but you can't re-sell.


> It seems totally reasonable and not violating the spirit of "open source" to say you can modify and self-host for any purpose, but you can't re-sell.

Personally I would heavily disagree with that, and that statement is something I see as against the spirit of open source. In my view, open source and free software mainly intend to use licensing to put the freedoms and rights of the code & its users in front of those of its authors. Being able to re-sell has always been a significant point, and part of the spirit, in free software and open source.


It's just an honest disagreement. I don't think your opinion is the way, but I don't begrudge it.

I'm just not an ideologue. What matters to me is having as much software tooling that is as useful to me as possible. I consider tools that I can modify and run myself to be more useful than those that are proprietary. But I don't require or demand the ability to re-sell someone else's software; that isn't a capability that is useful to me. That capability is pretty much entirely only useful to Amazon and Google, and that's just not something I care about optimizing for.


This question is for my education alone, but since you seem quite passionate I am curious.

I just read a super long article about licensing to understand your comment as well as the article you wrote. Under these "source available" licenses, I can still sell the software within some kind of package correct? Like if I create my own PR linter I can use Grai and still sell it? I just can't host grai with some observability and sell it? Or am I misunderstanding?


Just to be clear for my responses, I am not a legal expert in any way.

> Under these "source available" licenses, I can still sell the software within some kind of package correct? Like if I create my own PR linter I can use Grai and still sell it?

"Source available" means the source is accessible. Whether you can sell the software depends on the license. In the case of the Elastic License v2 as used here, I believe you could re-sell the works, but you cannot re-license them, and the original limitations remain, including the restriction on providing the software as a hosted/managed service. There are other limitations too; the one around license key functionality could be a significant hindrance depending on specific use and implementation.

> I just can't host grai with some observability and sell it? Or am I misunderstanding?

That is kind of the most significant limitation, but ultimately you are subject to the detail of all limitations:

~~

>> You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software.

>> You may not move, change, disable, or circumvent the license key functionality in the software, and you may not remove or obscure any functionality in the software that is protected by the license key.

>> You may not alter, remove, or obscure any licensing, copyright, or other notices of the licensor in the software. Any use of the licensor’s trademarks is subject to applicable law.

~~

Note that there's nothing about selling at all. Also think about how widely that first limitation could cover different types of use-case. And, as touched on above, that second limitation could be used in quite a protective/combative way to make significant parts of the software unusable in re-use.


Totally fair and appreciate the (well written) thoughts.


> The Elastic license is an attempt to maintain feature parity between the cloud and self-hosted tool while still being protected

I don’t know enough about the elastic license but I very much prefer this approach. I’ve seen a lot of source available projects deliberately refuse to implement features, and just generally let the product managers spend time on dark pattern bait-and-switch to drive sales. It misaligns the incentives, and complicates the product offering. It’s infuriating for developers. This is much clearer for everyone.


> We believe a project like this needs financial backing and a dedicated team driving development

What benefits do you get from being open source other than the OS stamp of approval?

Perhaps the solution is to just go closed source. I'm all for open source, but I'm not the biggest fan of open core or source available. All it does is hurt the business with little benefit to me. I'd rather you make more money and support me or go full altruistic and make it truly open source.


The short answer is that we aren't open source in order to get anything out of it. Of course, to each their own, but I've personally gotten a ton of value from open core tools in the past.


Don't its customers get the benefit of being able to self-host and modify for their own internal use? Seems like a big benefit to me...


My point is if it's a commercial entity, I'd rather pay them to make the modifications and then maintain it than pay my own engineers to do it.


Yes, but if I want to make larger modifications than would make sense for the core project, I'd like to have the ability to self host my modified version (and ideally have a support contract as well, if they're into that).

So you asked what the point of doing this is for them, from a business perspective. I think the point is marketing / smoothing the sales process. I feel much better about using SaaS products that I know I can self host if necessary, even if I'm unlikely to actually do so.

Frankly, it's just the same reason I prefer any of my tools to be open source. I don't like using proprietary programming languages or frameworks, because I can't fix things that are broken even if I want to. This remains true even though I can count the number of times I've actually done this on one hand.


Pre-built integrations are a big part of what makes onboarding easy, but it sort of ends up in a catch-22 situation where whichever integrations get highlighted are only directly applicable to the people using those tools.

If you have a different toolset, onboarding will look exactly the same, though; there's nothing truly dbt-specific at work here. It's a good idea though! We really should put together a few other combinations so more people can see their own stack represented.

