Thank you! The thing I find tricky here is choosing the weight. I think maybe one obvious way you would want to weight samples would be for recency. E.g. if I have a table of user login events then I care about seeing more of the ones that happened recently but still want to see some of the older ones. Would the algorithm still work if I converted a `created_at` timestamp to epoch time and used that? Or would I want to normalize it in some way?
All of your `created_at` epoch times are going to be of similar magnitude, since they're all roughly the same distance from the start of the Unix epoch (1970-01-01T00:00:00 UTC). Recent timestamps are around 1.7 billion seconds, so even rows created years apart differ in weight by only a fraction of a percent. Using them directly as weights therefore gives you an essentially uniform sample: every row has about the same chance of being chosen.
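To put a number on that, here's a rough sketch in Python (the epoch value is a hypothetical "now", chosen just for illustration) comparing two rows created a day apart under both weighting schemes:

```python
EPOCH_NOW = 1_700_000_000      # hypothetical "now" in epoch seconds (~late 2023)
DAY = 86_400                   # seconds in a day

newer, older = EPOCH_NOW, EPOCH_NOW - DAY

# Raw epoch seconds as weights: the two rows are nearly interchangeable.
raw_ratio = newer / older
print(f"raw weight ratio:   {raw_ratio:.6f}")     # barely above 1

# Exponential decay with a 1-hour half-life: the newer row is 2**24 times
# more likely to be picked than the one-day-older row.
decay_ratio = 2.0 ** (DAY / 3600)
print(f"decay weight ratio: {decay_ratio:,.0f}")
```

The raw ratio is about 1.00005, i.e. a 0.005% preference for the newer row, which is why raw epoch weights behave like a uniform sample.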
Given what I glean about your scenario, I suspect you'd be better served by an exponentially decaying weight based on age (i.e., age = current_timestamp − created_at). For example, if you wanted each row to become half as likely to be selected for every additional hour of age, you could use `POW(2.0, -age / 3600)` as your weight, with age in seconds (a one-hour half-life):
duckdb> WITH Ages AS (
            SELECT UNNEST([0, 3600, 7200]) AS age
        )
        SELECT age, POW(2.0, -age / 3600) AS w FROM Ages;
┌───────┬──────┐
│  age  │  w   │
├───────┼──────┤
│     0 │ 1.0  │
│  3600 │ 0.5  │
│  7200 │ 0.25 │
└───────┴──────┘
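In case it helps to see the weight plugged into a complete sampler: one standard way to draw a weighted sample without replacement is the Efraimidis–Spirakis key trick (give each row the key `u**(1/w)` for uniform `u` and keep the k largest keys). The algorithm you're using may differ; this Python sketch, with hypothetical login-event rows, just shows the decay weight doing its job:

```python
import random

def weighted_sample(rows, k, now, half_life=3600.0, rng=random.random):
    """Pick k rows without replacement, favoring recent created_at values.

    Efraimidis-Spirakis: each row gets the key u**(1/w) for u ~ Uniform(0,1),
    and the k largest keys win. The weight w halves for every `half_life`
    seconds of age, matching POW(2.0, -age / 3600) from the SQL above.
    """
    def key(row):
        age = now - row["created_at"]
        w = 2.0 ** (-age / half_life)
        return rng() ** (1.0 / w)

    return sorted(rows, key=key, reverse=True)[:k]

# Hypothetical login events, one every 10 minutes; recent ids are small.
now = 1_700_000_000
events = [{"id": i, "created_at": now - i * 600} for i in range(100)]
print([e["id"] for e in weighted_sample(events, 5, now)])
```

Run it a few times and the sample is dominated by low ids (recent rows), with the occasional older row sneaking in, which sounds like the behavior you're after.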