1/4. The max number of recipients of any prize is 3, but the prize can be split in half, and then one of those halves split in half again, for a 1/2 + 1/4 + 1/4 split. (It can also be split equally into 1/3 + 1/3 + 1/3.)
You can get the expected "single shard" performance in CockroachDB by manually splitting the shards (called "ranges" in CockroachDB) along the lines of the expected single-shard queries (what you call a "properly sharded database"). This is easy to do with a single SQL command. (This is what we do today; we use CockroachDB for strongly consistent metadata.)
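For illustration, a hedged sketch of what that single command looks like, assuming a hypothetical events table whose primary key starts with tenant_id (the table and split points are made up; ALTER TABLE ... SPLIT AT is CockroachDB syntax):

    -- Hypothetical table where the primary key prefix acts as the "shard" key.
    CREATE TABLE events (
        tenant_id INT,
        event_id  INT,
        payload   STRING,
        PRIMARY KEY (tenant_id, event_id)
    );

    -- Manually split the table's ranges along tenant boundaries so that
    -- single-tenant queries stay within a single range.
    ALTER TABLE events SPLIT AT VALUES (100), (200), (300);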
The difference between CockroachDB and a manually sharded database is that when you _do_ have to perform some cross-shard transactions (which you inevitably have to do at some point), in CockroachDB you can execute them (with a reasonable performance penalty) with strong consistency and 2PC between the shards, whereas in your manually sharded database... good luck! Hope you implement 2PC correctly.
The point about strong consistency and 2PC cross-shard is a good one. Even among other "auto-sharding" distributed relational databases, not all of them provide that.
This disconnect is called the "Tullock Paradox" in economics, after Gordon Tullock, who first posed the question in "The Purchase of Politicians" (1972) [couldn't find an online link].
You'll find a more recent discussion in "Why is There so Little Money in U.S. Politics?" (2003). [1]
Sometimes I think it's like dark matter: it's just out of the public eye in, I dunno, art sales or cushy jobs for family members that funnel money to one another.
Materialize | Engineering, Product, Marketing | NYC HQ + North America Remote + Europe Remote | https://materialize.com/careers
Materialize is a streaming database for real-time applications. Materialize lets you ask questions about your data, and then get low-latency, correct answers, which are kept incrementally updated as the underlying data changes.
Materialize is built on Timely Dataflow, a low-latency cyclic dataflow computational model, first introduced in the paper "Naiad: a timely dataflow system".
Materialize is a team of over thirty, primarily based in New York City but also open to remote positions in the EU and NA. We are hiring for all engineering positions (eng. manager, engineers from new grad to principal) as well as several non-engineering positions - for the full list, see https://materialize.com/careers
We are a team with significant experience in databases and distributed systems, and are looking to add more folks with that interest and/or experience. Materialize recently raised a $32m Series B led by Kleiner Perkins, which was lovingly hacker newsed: https://news.ycombinator.com/item?id=25277511
A high-speed collision in low orbit can change a circular low orbit into an eccentric elliptical orbit that intersects a higher circular orbit, but unless there is an additional accelerating event at that higher altitude, the debris cannot recircularize its orbit at that higher altitude.
There are thus two takeaways:
1. By definition, the new orbit still passes through the collision point, so part of the orbit will always be at low altitude, regardless of the collision dynamics. This means the debris will continue to decay over time, albeit perhaps at a slower rate (decay being roughly proportional to the time spent at low altitude).
2. While that eccentric orbit will cross the altitudes of higher circular orbits, it does so in a predictable fashion that can be routed around. The higher orbits are also much sparser, so the chance of it intersecting a satellite that is already present is very, very small.
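For intuition, a small vis-viva sketch (the labels r_c, a', and e' are mine): whatever velocity kick v' a fragment picks up at the collision radius r_c, the resulting orbit still contains r_c, so its perigee cannot be higher than the collision altitude.

    v'^2 = \mu \left( \frac{2}{r_c} - \frac{1}{a'} \right)   % vis-viva at the collision radius fixes the new semi-major axis a'
    r_p  = a'(1 - e') \le r_c                                % perigee is the minimum radius of an orbit that passes through r_c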
Thank you, others pointed this out too.
Proves that "celestial mechanics" is trickier than I thought.
I guess if a collision is messy enough, there would be secondary collisions between debris pieces, and it sounds like these in principle can push some junk into higher orbits. But I think the probability is really low; this should not be a concern.
Thank you for your kind words! We indeed have plenty of work to be done (and are thus hiring)! I'm curious, however, why you think this requires you to be all-in on Materialize. As you said better than I could have, dbt is amazing at keeping your business logic organized. Our intention is very much for dbt to standardize the modeling/business-logic layer, which allows you to use multiple backends as you see fit, in a way that shares the catalog layer cleanly.
Our hope is that you have some BigQuery/Snowflake job that is running up the bill because you hit redeploy 5 times a day, and you can cleanly port it over to Materialize with little work, because the adapter takes care of any small semantic differences in date handling, null handling, etc. Materialize then sits cleanly side by side with Snowflake/BigQuery, and you choose whether you want things incrementally maintained with a few seconds of latency by Materialize, or once a day by the batch systems.
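As a rough sketch of what such a port could look like (the model and table names are invented, and I'm assuming the dbt-materialize adapter's 'materializedview' materialization name): the same SQL a warehouse rebuilds on a schedule becomes an incrementally maintained view when the dbt target is Materialize.

    -- models/orders_by_user.sql (hypothetical dbt model)
    -- With a Snowflake/BigQuery target this would be rebuilt on a schedule;
    -- with a Materialize target it is kept incrementally up to date.
    {{ config(materialized='materializedview') }}

    select
        user_id,
        count(*) as order_count
    from {{ ref('orders') }}
    group by user_id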
My view is that you're likely going to want to do data science with a batch system (when you're in "learning mode" you try to keep as many things fixed as possible, including not updating the dataset), and then, if the model becomes a critical automated pipeline, rather than rerunning the model every hour and uploading results to a Redis cache or something, you switch it over to Materialize and never have to worry about cache invalidation.
In that situation (dual usage modes) I think I'd rather have the primary data store be Materialize, and just snapshot Materialize views back to your warehouse (or even just to an object store).
Then you could use that static store for exploration/fixed analysis or even initial development of dbt models for the Materialize layer, using the Snowflake or Spark connectors at first. When something's ready for production use, migrate it to your Materialize dbt project.
Given the way dbt currently handles backend switching (and the divergence of SQL dialects with respect to things like date functions and unstructured data), maintaining the batch and streaming layers side by side in dbt would be less wasteful than the current paradigm of completely separate tooling, but still a big source of overhead and synchronization errors.
If the community comes up with a good narrative for CI/CD and in-flight data testing on top of the above, I don't think I'd even hesitate to pull the trigger on a migration. The best part is that half of your potential customers already have their business logic in dbt.
I should clarify: I don't think that for the general case you have to go all-in on Materialize, but the case in my comment--where you are effectively using business logic within Materialize views as the "source of truth" across both your analytics and your operations--does require buy-in. Additionally, if I'm _already_ sending all of my data to a database or to my data warehouse, ETLing all of that data to Materialize as well is rather burdensome. Just because I technically could run Materialize side by side with something doesn't mean I necessarily want to, especially given that the streaming use case requires a lot more maintenance to get right and keep running in production.
I fully agree with you that for many data science cases, you're likely to stick with batching. Where I see Materialize being the most useful, and where I'd be inspired to use it and transform how we do things, is the overlap where the analytics team writes definitions (e.g., what constitutes an "active user"?), typically on the warehouse, but I want those definitions to be used, kept up to date, and made available everywhere in my stack, including analytics, my operational database, and third-party tools like marketing tools.
Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres. (Such a thing should be doable at minimum by running Materialize + Debezium side-by-side with Postgres and then having Postgres interact with Materialize via foreign data wrappers. It would need some fancy tooling to make it simple, but it would work.) In such a scenario, a Postgres + Materialize combo could serve as the "center of the universe" for all the data AND business definitions for the company, and everything else stems from there. Even if we used a big data warehouse in parallel for large ad hoc queries (which I imagine Materialize wouldn't handle well, not being OLAP), I would ETL my data to the warehouse from Materialize--and I'd even be able to include ETLing the data from the materialized views, pre-calculated. If I wanted to send data to third-party tools, I'd use Materialize in conjunction with Hightouch.io to forward the data, including hooking into subscriptions when rows in the materialized views change.
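A minimal sketch of the foreign-data-wrapper leg of that idea, assuming Materialize is reachable on its default port 6875 and already maintains a view called active_users (the host, names, and credentials are made up; this works only because Materialize speaks the Postgres wire protocol):

    -- Run inside Postgres: expose a Materialize view as a foreign table.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER materialize
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'materialize.internal', port '6875', dbname 'materialize');

    -- Credentials depend entirely on the deployment; placeholder user shown here.
    CREATE USER MAPPING FOR CURRENT_USER
        SERVER materialize
        OPTIONS (user 'materialize');

    -- Hypothetical incrementally maintained view living in Materialize.
    CREATE FOREIGN TABLE active_users (
        user_id   bigint,
        last_seen timestamptz
    ) SERVER materialize OPTIONS (schema_name 'public', table_name 'active_users');

    -- Postgres queries now read the always-up-to-date results.
    SELECT count(*) FROM active_users;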
For what I propose, there are some open questions about data persistence, high availability, the time it takes to materialize a view for the first time, and backfilling data, among other things. But I think this is where Materialize has a good chance of fundamentally changing how analytical and operational data are managed, and I think there's a world where data warehouses would go away and you'd just run everything on Postgres + Materialize + S3 (+ Presto or similar for true OLAP queries). I could see myself using Materialize for log management, or log alerting. I'm just as excited to see pieces of it embedded in other pieces of infrastructure as I am to use it as a standalone product.
Thank you very much for the elaboration, I really appreciate the thinking!
> Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres.
We're about to launch "Materialize as a Postgres read replica", where Materialize connects to a Postgres leader just as a Postgres read replica would, using the built-in streaming replication in newer versions of Postgres. It's currently in final testing before being released in the next month or two.
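Roughly, the Postgres-side prep for this kind of replica is the standard logical-decoding setup (a simplified sketch, not the feature's exact requirements; the publication name is just an example):

    -- postgresql.conf: allow an external subscriber to decode the WAL.
    --   wal_level = logical

    -- Create a publication for the replica to subscribe to.
    CREATE PUBLICATION mz_source FOR ALL TABLES;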
Also on our roadmap for Materialize Cloud are plug-and-play connections with Fivetran, Hightouch, and Census (and more) to bring in the business data, and allow you to, as you put it, make Materialize the central collection point for keeping all your views up to date.
Protobufs still get encoded and decoded by each client when loaded into memory. Arrow is a little bit more like "Flatbuffers, but designed for common data-intensive columnar access patterns".