Honestly, if the result sets are small enough, I just dump them to JSON and diff the files. But the output has to be fully deterministically sorted for that (in a sane world, "order by *" would be valid ANSI SQL).
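Something like this, as a sketch (sqlite3 is standing in for whatever driver you use; the table and columns are made up):

```python
# Dump a query's rows as canonical JSON (one object per line, keys sorted),
# so a plain `diff` of the two files is meaningful.
import json
import sqlite3

def dump_sorted(db_path: str, out_path: str) -> None:
    conn = sqlite3.connect(db_path)
    # No "order by *", so every column is listed explicitly to make the
    # row ordering fully deterministic.
    cur = conn.execute("SELECT id, name, amount FROM orders ORDER BY id, name, amount")
    cols = [d[0] for d in cur.description]
    with open(out_path, "w") as f:
        for row in cur:
            f.write(json.dumps(dict(zip(cols, row)), sort_keys=True, default=str) + "\n")
    conn.close()

dump_sorted("prod.db", "prod.jsonl")
dump_sorted("dev.db", "dev.jsonl")
# Then: diff prod.jsonl dev.jsonl
```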
Thank you for mentioning Data Diff! Founder of Datafold here.
We built Data Diff to solve a variety of problems that we encountered as data engineers: (A) Testing SQL code changes by diffing the output of the production and dev versions of a SQL query. (B) Validating that data is consistent when replicating it between databases.
Data Diff implements two algorithms: one for diffing within the same database and one for diffing across databases.
The former is based on a JOIN; the latter uses checksumming with binary search, which keeps network IO and database workload overhead minimal.
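For anyone curious, the cross-database idea boils down to: checksum a key range on both sides, and only when the checksums disagree split the range in half and recurse, so rows are only ever fetched for ranges that actually differ. A simplified toy sketch (not our production code; the items table, leaf size, and md5 choice are all illustrative):

```python
import hashlib
import sqlite3

def range_checksum(conn, lo, hi):
    # One checksum per key range; a real tool pushes this down as a single
    # aggregate query so only the hash crosses the network.
    cur = conn.execute(
        "SELECT id, payload FROM items WHERE id BETWEEN ? AND ? ORDER BY id",
        (lo, hi))
    h = hashlib.md5()
    for row in cur:
        h.update(repr(row).encode())
    return h.hexdigest()

def diff_range(a, b, lo, hi, out=None):
    # Binary search: recurse only into halves whose checksums disagree.
    out = [] if out is None else out
    if range_checksum(a, lo, hi) == range_checksum(b, lo, hi):
        return out  # identical range: no rows transferred at all
    if hi - lo < 64:  # small enough: fetch the rows and diff directly
        q = "SELECT id, payload FROM items WHERE id BETWEEN ? AND ?"
        out.extend(sorted(set(a.execute(q, (lo, hi))) ^ set(b.execute(q, (lo, hi)))))
        return out
    mid = (lo + hi) // 2
    diff_range(a, b, lo, mid, out)
    diff_range(a, b, mid + 1, hi, out)
    return out

# Demo: two copies of a table, one changed row; only its range gets fetched.
a, b = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (a, b):
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(i, f"row{i}") for i in range(1000)])
b.execute("UPDATE items SET payload = 'changed' WHERE id = 417")
print(diff_range(a, b, 0, 999))  # -> [(417, 'changed'), (417, 'row417')]
```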
It should, yeah. Our builders are based on BuildKit rather than Kaniko, and BuildKit optimizes for building container images in parallel and caching as much as possible. It also supports some more advanced types of caches, such as cache mounts: https://github.com/moby/buildkit/blob/master/frontend/docker...
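For example, a cache mount lets a RUN step keep a package manager's cache across builds without baking it into the image layer (illustrative Dockerfile; the Python/pip setup is just an example):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
COPY requirements.txt .
# The pip cache lives in the mount and survives between builds,
# but never ends up in the image itself.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```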
Both Kaniko and BuildKit can be run in rootless mode. We are not doing this; instead, we give every builder access to an isolated VM, so builds are also a bit quicker by avoiding some of the security tricks that rootless needs in order to work.
In AWS, we launch either Intel or Arm EC2 instances depending on the requested build platform (or both for multi-platform builds). When a project's builds are running, they have sole control of that instance, which is terminated when the builds are done.
To make this performant, we keep a certain number of spare "warm" machines ready for build requests, so that you don't have to pay the instance launch-time penalty yourself.
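In rough shape, the pool logic looks something like this (a toy sketch, not our actual scheduler; the boto3 parameters and pool size are placeholders):

```python
import boto3

POOL_TARGET = 5  # spare warm instances to keep ready (illustrative number)
ec2 = boto3.client("ec2")
warm: list[str] = []  # instance IDs standing by for a build

def launch_instance() -> str:
    resp = ec2.run_instances(ImageId="ami-00000000", InstanceType="c6i.4xlarge",
                             MinCount=1, MaxCount=1)  # placeholder AMI/type
    return resp["Instances"][0]["InstanceId"]

def acquire_builder() -> str:
    # Hand a warm instance to the incoming build, then top the pool back up,
    # so builds never pay the launch-time penalty themselves.
    instance_id = warm.pop() if warm else launch_instance()
    while len(warm) < POOL_TARGET:
        warm.append(launch_instance())
    return instance_id

def release_builder(instance_id: str) -> None:
    # Builds are single-tenant: the instance is terminated when they finish.
    ec2.terminate_instances(InstanceIds=[instance_id])
```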
Just to clarify: when you run depot build, does the build run locally, or does it run remotely on an EC2 instance? Also, it sounds like the instances are on your side, not in the customer's infrastructure. Compounding build time is a problem, but I think we solved it with the BuildKit cache. The setup you are describing, if I understand correctly, might be a no-go for enterprise customers. Maybe you are going after mid-market companies; in that case it might work. Just an opinion from my side.
I think Kyle answered this below: enterprises have the option to run the data plane of Depot in their own cloud account. In that model, the Depot CLI connects directly to that data plane without passing through any infrastructure on our side.
> I think we solved it with buildkit cache
One big thing we're doing here, if you're familiar with the BuildKit cache, is providing builds with a stable cache SSD that's reused between builds. This means we support all of BuildKit's caching features, including things like cache mounts that aren't directly supported in ephemeral CI environments. Plus, Depot doesn't need to save or load the cache to a remote store like S3 or the GitHub Actions cache; instead, the previous cache is immediately available on build start.
This may not be any better or different from what you're doing; I just wanted to mention the detail for anyone familiar with trying to make BuildKit more performant.
Hi there! Kyle here, the other half of Depot. This is correct: we have a self-hosted data plane model that larger enterprises can use if they want full control over the builders + build cache.
In that deployment model, the Depot control plane passes changes to make in the customer's environment via a small agent they run in their account. Here are some docs we put together for anyone that wants to go into a bit more detail: https://depot.dev/docs/self-hosted/architecture
This way, we can ensure that the catalog stays fresh, since the work of maintaining it is on the vendors' side, and they have an incentive to push updates to their catalog.
It also ensures complete transparency about how the catalog is built.
The tech behind it is a Next.js app that statically generates the website from the catalog data to ensure good SEO.
The end goal is to offer a central place where the community (vendors + users) can exchange information about the current capabilities of the connector market.
Next steps are:
- Add the capability for customers to provide comments / feedback on connectors that they have tried / used
- Add more info on the vendor pages (which destinations they support for writing data, pricing info)
A friend of mine would use fiberglass resin, Bondo, and plywood, because they wanted their sculptures (which they then boxed into custom-sized crates) to last for at least one hundred years.
Algolia's ex-CEO is now a partner at YC.
So the connection between the two is pretty strong; this will be resolved before the end of the day.