
> Server actions are public, unauthenticated routes

Why can't they be authenticated? That seems like the obvious fix. Otherwise, how are you handing out the correct customer_id unless you authenticate somehow?

This scheme also complicates API key rotation, although you can work around it by trying to decrypt with both the old and new key if you use e.g. authenticated encryption.

This also has no mechanism for expiration (besides API key rotation). If you add an expiration time and sign it, then you essentially created an authentication token that you use as the customer_id.
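To make that concrete, here's a minimal sketch of what I mean (hypothetical helper; HMAC-SHA256 picked just for illustration): once you sign customerId plus an expiry, the result is functionally a bearer token.

  import javax.crypto.Mac;
  import javax.crypto.spec.SecretKeySpec;
  import java.util.Base64;

  // Hypothetical sketch: a signed, expiring customer_id is just an auth token.
  final class CustomerToken {
      static String mint(byte[] key, String customerId, long expiresAtEpochSec)
              throws Exception {
          String payload = customerId + ":" + expiresAtEpochSec;
          Mac mac = Mac.getInstance("HmacSHA256");
          mac.init(new SecretKeySpec(key, "HmacSHA256"));
          String sig = Base64.getUrlEncoder().withoutPadding()
                  .encodeToString(mac.doFinal(payload.getBytes()));
          // Payload stays readable; the signature prevents tampering, and the
          // verifier rejects the token once expiresAtEpochSec has passed.
          return payload + ":" + sig;
      }
  }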


Maybe we didn't phrase it as well as we should've. We meant to say API routes in general are public, and so the server actions could be called by anyone.

Authentication is definitely possible, but we were trying to brainstorm a way for users to have protected routes with as little setup as possible, the ideal being that they just pass customerId into a Provider component.

We also did think about things like registering an auth function but felt that being able to just pass in customerId would be a magical experience!

We definitely acknowledge that the current mechanism has flaws though -- it's really more of an experiment at the moment, and if it does indeed become very popular with users we would implement auth mechanisms like JWT and whatnot -- though that would kinda be reinventing the wheel.


The current mechanism has security flaws that are hidden by SSR. Try to implement the same flow in pure vanilla JS and you'll hit replay attacks instantly. This is the same vulnerability faced by companies that try to "hash the password on the client side to protect it": they've merely transformed the password into a different one (the hashed one) that has the exact same semantics as the original password for an attacker.

Your encrypted customer ID has the exact same semantics as the original customer ID for an attacker, and is insecure.


Yesterday, Today, Blockers. I.e. the typical standup update.

In an open office, room-less meetings are quite disruptive. I still remember what the completely unrelated team two rows away was working on 8 years ago since I listened to them talk about it for 10 minutes every day. (I also apologize to everyone else since our team did the same thing)

If "30" minute meetings start 5 minutes late, then you can only go 5 past reliably.


As a bonus: the vanilla JFA can only calculate unsigned distances, but you can extend it to signed distance computation using a simple trick: invert your JFA result and set it as the seed for a second JFA pass. (See https://blog.demofox.org/2016/03/02/actually-making-signed-d... for a better explanation)

That's a cool algorithm! I couldn't find resources on how it might be used to compute distance functions (though it seems like it can). It seems to be for approximating Voronoi diagrams.

The two problems are highly related, which is why it can do both. At the end of the algorithm, you have a per-pixel assignment of the (approximately) closest seed pixel. If you want a Voronoi diagram, then you color each pixel according to the identity of its seed pixel. If you want a distance function, then you color it with the distance to that pixel.
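If it helps, here's a rough CPU-side sketch of the idea (my own illustration, not from the linked post; real implementations run each step as a GPU pass):

  import java.awt.Point;

  // Rough JFA sketch: grid[y][x] starts as its own coordinates for seed
  // pixels and null everywhere else; after the passes, each cell holds the
  // (approximately) nearest seed.
  final class JumpFlood {
      static Point[][] run(Point[][] grid) {
          int n = grid.length;
          for (int step = n / 2; step >= 1; step /= 2) {
              Point[][] next = new Point[n][n];
              for (int y = 0; y < n; y++) {
                  for (int x = 0; x < n; x++) {
                      Point best = grid[y][x];
                      // Check self and the 8 neighbors `step` pixels away.
                      for (int dy = -1; dy <= 1; dy++) {
                          for (int dx = -1; dx <= 1; dx++) {
                              int ny = y + dy * step, nx = x + dx * step;
                              if (ny < 0 || ny >= n || nx < 0 || nx >= n) continue;
                              Point cand = grid[ny][nx];
                              if (cand != null && (best == null
                                      || dist2(x, y, cand) < dist2(x, y, best))) {
                                  best = cand;
                              }
                          }
                      }
                      next[y][x] = best;
                  }
              }
              grid = next;
          }
          // Color by seed identity for a Voronoi diagram, or by the distance
          // to the seed (sqrt of dist2) for a distance field.
          return grid;
      }

      static long dist2(int x, int y, Point p) {
          long dx = p.x - x, dy = p.y - y;
          return dx * dx + dy * dy;
      }
  }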

> You'll have a monolith, it might break out into frontend, backend and a separate service for async background jobs

And when you break these out, you don't actually have to split your code at all. You can deploy your normal monolith with a flag telling it what role to play. The background worker can still run a webserver since it's useful for healthchecks and metrics and the loadbalancer will decide what "roles" get real traffic.
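A minimal sketch of what that can look like, assuming an env var picks the role (names and port are made up; any HTTP server works):

  import com.sun.net.httpserver.HttpServer;
  import java.net.InetSocketAddress;

  // Hypothetical sketch: one monolith artifact, role chosen per deployment.
  public final class Monolith {
      public static void main(String[] args) throws Exception {
          // The deployment sets ROLE ("web", "worker", ...); default to web.
          String role = System.getenv().getOrDefault("ROLE", "web");

          // Every role runs the web server, so health checks and metrics
          // always work; the load balancer only routes user traffic to
          // instances whose role is "web".
          HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
          server.createContext("/healthz", exchange -> {
              byte[] ok = "ok".getBytes();
              exchange.sendResponseHeaders(200, ok.length);
              exchange.getResponseBody().write(ok);
              exchange.close();
          });
          server.start();

          if (role.equals("worker")) {
              runJobLoop(); // poll the queue and process background jobs
          }
          // "web" instances just keep serving requests
      }

      static void runJobLoop() throws InterruptedException {
          while (true) {
              // drain background jobs here
              Thread.sleep(1000);
          }
      }
  }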


If you are building the same binary for all microservices, you lose the dependency-reduction benefit microservices provide, since your build will still break because of some completely unrelated team's code.

If it is possible for that other team to merge a broken build, you are doing it wrong.

If you are concerned about someone else breaking your thing, good! You were going to eventually break it yourself. Write whatever testing gives you confidence that someone else's changes won't break your code, and, bonus, now you can make changes without breaking your code.


> If it is possible for that other team to merge a broken build, you are doing it wrong.

This assertion is unrealistic and fails to address the problem. The fact that builds can and do break is a very mundane fact of life. There are whole job classes dedicated to mitigating the problems caused by broken builds, and here you are accusing others of doing things wrong. You cannot hide this away by trying to shame software developers for doing things that software developers do.

> Write whatever testing gives you confidence that someone else's changes won't break your code, and, bonus, now you can make changes without breaking your code.

That observation is so naive that it casts doubt on whether you have any professional experience developing software. There are a myriad of ways any commit can break something that go well beyond whether it compiles or not. Why do you think that companies, including FAANGs, still hire small armies of QAs to manually verify if things still work once deployed? Is everyone around you doing things wrong, and you're the only beacon of hope? Unreal.


I haven't seen a broken build in at least nine years, not since I left the company with a merge process built out of bash scripts that took three hours and required manual hand-holding.

I am genuinely curious what situations you are seeing where builds are making it through CI and then don't compile.

It isn't always worth investing in quality, but when it is, it is entirely possible to write essentially bug-free software. I've gone seven months without a bug in production, and for the one bug we did see, we had a signed letter from product saying "I am okay if this feature breaks, because I think writing the tests that can verify this integration was going to take too long."

FAANG companies aren't prioritizing writing software well: they are prioritizing managing 50,000 engineers. Which is a much harder problem, but the management solutions that work for that preclude the techniques that let us write bug-free software.

One of the great things about startups is that it is trivial to manage five engineers, so there is no reason we have to write software badly.


> (...) it is entirely possible to write essentially bug-free software (...)

You lost what little credibility you had left.


You are absolutely right.

Of course, if people wrote bug-free code, then there would be no bugs!

Bug-free application code or bug-free test code, it's the same story.

If you write stuff and never have any bug, then either:

  - you are lying
  - you do not write much
  - you only write really simple things
  - you are Jesus, come back from heaven to shine your light on us, poor souls
The more complicated, intricate stuff you have, the more bugs you'll get (and only time will allow you to fix that).

Tests are great to define how you think it should work, and to ensure it keeps working that way. Take the time to think about the third point in the bullet list above.


This seems a bit much.

In the DVCS era, we have inexpensive branching. Do as thou wilt on your topic or epic branches. Rebase them against main/master before merging upwards. Fix what must be fixed first.

Main/master branch should never fail CI. If it does, there is something seriously wrong with your branch lifecycle and/or deployment process.


Sure, but "never fails CI" is a different assertion from "has no bugs", or even "deploys correctly in production".

If you test defensively[0], CI will catch the vast majority of functional bugs. Anything that is missed suggests at least two bugs: one in the tests, and one in the business logic.

A reliable deployment pipeline is outside of CI, but can be kept straightforward and minimal to constrain the scope of failures.

Bugs happen, and systems complexify. It is possible to manage both of those risks down to near-zero by the time code reaches production release candidate stage.

In some industries, this is more important than others, though -- obviously the goal is to match quality to business needs.

But I agree with our thread predecessor: I haven't seen a broken build make it anywhere near production in many years, and it's not because of the snarky dismissal that provoked my original response.

[0] In some situations, proper tests are not possible, and in others they are not practical. And I acknowledge that I'm omitting things like visual design/layout bugs, which is probably not reasonable.


I disagree. I write complex code, and it is essentially bug-free.

And no, I'm not Jesus, I just care a lot about quality and have spent the last 20 years finding ways and strategies to improve it.

Reducing the number of bugs does not mean being a god that writes bug-free code on the first draft. It means being able to detect and fix issues as early as possible. In my case I aim to always do that before letting myself push any code to git.

IMO, it only comes down to how much someone really cares about the quality, but here are some examples of what can be done and is very effective:

- Plan ahead your functional and technical design

- Carefully research existing code to confirm the feasibility of the design

- Use a statically typed language

- Use advanced static-analysis tools

- Avoid magic, write explicit code. Especially avoid runtime checks such as reflection. Ideally, everything should be checked statically one way or another.

- Never let a code path/branch/corner case be unhandled, however unlikely it is (and go back to step one to refine the design if a code path has been forgotten in the current design)

- Always have automated testing. The bare minimum is to unit-test all business logic, including all possible code paths. Ideally e2e tests are nice, but not always a good investment. Tests must be 100% independent and never depend on an external environment, otherwise it's going to be flaky at some point.

- Always manually test every feature and path related to my changes (especially don't skip testing the ones that I think are going to be ok) before pushing anything to git.

- Warnings and "optional" notices are unacceptable and must always be fixed (or disabled), otherwise the list will just keep growing, which reduces the visibility of any issue and normalizes having problems.

- Have a CI integration that applies all the automated checks mentioned in this list and make everything mandatory.

Each one of those actions on its own significantly reduces the number of bugs. If you combine them all, you can effectively reduce the number of bugs to pretty much zero. And since the earlier you find a bug, the cheaper it is to fix overall, I've also found out that in terms of productivity it's always worth the investment (despite many people claiming the opposite).


Even if it builds successfully, I've never worked anywhere where automated tests prevented 100% of problems and I doubt I ever will. For most systems of sufficient complexity you are testing in prod, even if you did a lot of testing before prod as well.

That's even more true for microservices, though, since I have yet to see a microservice architecture that automatically runs end to end tests before deploying.

The post I was replying to said "your build will still break": that's what I was taking issue with. In this day and age there is no reason our trunk build should ever be broken.


> I have yet to see a microservice architecture that automatically runs end to end tests before deploying.

One of the big tenets of independent services is that your APIs are contracts that don't change behaviour. As long as each individual service doesn't introduce breaking changes, the system as a whole should work as expected. If it doesn't, this is indicative of either 1) a specific service lacking test coverage, or 2) doing something wrong, e.g. directly reading from a microservice's database without going through an API.


> One of the big tenets of independent services is that your APIs are contracts that don't change behaviour

How is that any different from an API in a monolith not changing behavior?


Yes, I suspect some of the back and forth comes from the fuzziness of the term "broken build": whether it means the code literally doesn't compile, or it compiles but does the wrong thing.

I agree that you can prevent merges that cause compilation errors in nearly all cases!


What about when it’s you breaking your own thing?

A very large code base full of loosely related functionality makes it more and more likely a change in one part will break another part in unexpected ways.


You have a point, but I wouldn't say this is a big deal unless there is a mammoth dependency somewhere that slows things down to a crawl. Then maybe that one part of the codebase can be broken out into its own separate service.

But even then there are ways around this kind of problem with dynamic linking, pre-built binaries, and caching, but that is extra complexity that could be worse than managing multiple services. The Docker cache can usually handle this pretty well, though.


You’ll still get some isolation since not all pathways share the same code. It’s not all or nothing.

Anyone know how this compares to Terrain3D?

https://github.com/TokisanGames/Terrain3D


Terrain3D is very limited in what it can do.

You will quickly find a ceiling.


NixOS works really well for me. I used to write these kinds of idempotent scripts too but they are usually irrelevant in NixOS where that's the default behavior.

And regarding this part of the article

> Particularly with GitOps and Flux, making changes was a breeze.

I'm writing comin [1], which is GitOps for NixOS machines: you git push your changes and your machines fetch and deploy them automatically.

[1] https://github.com/nlewo/comin


The original complaint alleges that the training process requires copying the material into the model and thus requires consent of the copyright holder. (Copyright protects copying but notably not use, so the complaint has to say they copied it in order to have standing). Then it says they didn't have consent.

They also mention Books3, but they don't appear to actually allege anything against Meta in regards to it and are just providing context.

I don't think it actually changes anything material about this complaint if Meta bought all the books at a bookstore since that also doesn't give you the right to copy the works.

The original complaint is 2 years old though, so I don't really know the current state of argumentation.

https://www.courtlistener.com/docket/67569326/1/kadrey-v-met...

Note that incidental copying (i.e. temporary copies made by computers in order to perform otherwise legal actions) is generally legal, so "copying" in the complaint can't refer merely to this and must refer more broadly to the model itself being a copy in order to have standing.


Check out the parallel consumer: https://github.com/confluentinc/parallel-consumer

It processes unrelated keys in parallel within a partition. It has to track which offsets have been processed between the last committed offset of the partition and the tip (i.e. only what's currently processed out of order). When it commits, it saves this state, highly compressed, in the commit metadata.

Most of the time, it was only processing a small number of records out of order so this bookkeeping was insignificant, but if one key gets stuck, it would scale to at least 100,000 offsets ahead, at which point enough alarms would go off that we would do something. That's definitely a huge improvement to head of line blocking.
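Usage looks roughly like this (adapted from the project's README from memory; the exact builder options and poll callback signature have shifted between versions):

  import io.confluent.parallelconsumer.ParallelConsumerOptions;
  import io.confluent.parallelconsumer.ParallelStreamProcessor;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import java.util.List;
  import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;

  // Sketch: per-key ordering with far more parallelism than partitions.
  final class ParallelExample {
      static void run(KafkaConsumer<String, String> kafkaConsumer) {
          var options = ParallelConsumerOptions.<String, String>builder()
                  .ordering(KEY)          // order kept per key, not per partition
                  .maxConcurrency(1000)   // way beyond the partition count
                  .consumer(kafkaConsumer)
                  .build();

          ParallelStreamProcessor<String, String> processor =
                  ParallelStreamProcessor.createEosStreamProcessor(options);
          processor.subscribe(List.of("my-topic"));
          processor.poll(context -> System.out.println("processing: " + context));
      }
  }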


Disclosure (given this is from Confluent): I'm ex-MSK (Managed Streaming for Kafka at AWS), and my current company was competing with Confluent before we pivoted.

Yup, this is one more example, just like Pulsar. There are definitely great optimizations to be made on the average case. In the case of parallel consumer, if you'd like to keep ordering guarantees, you retain O(n^2) processing time in the worst case.

The issues arise when you try to traverse arbitrary dependency topologies in your messages. So you're left with two options:

1. Make damn sure that causal dependencies don't exhibit O(n^2) behavior, which requires formal models to be 100% sure.

2. Give up ordering or make some other nasty tradeoff.

At a high level the problem boils down to traversing a DAG in topological order. From computer science theory, we know that this requires a sorted index. And if you're implementing an index on top of Kafka, you might as well embed your data into and consume directly from the index. Of course, this is easier said than done, and that's why no one has cracked this problem yet. We were going to try, but alas we pivoted :)

Edit: Topological sort does not require a sorted index (or similar) if you don't care about concurrency. But then you've lost the advantages of your queue.


> traverse arbitrary dependency topologies

Is there another way to state this? It’s very difficult for me to grok.

> DAG

Directed acyclic graph right?


Apologies, we've been so deep into this problem that we take our slang for granted :)

A graphical representation might be worth a thousand words, keeping in mind it's just one example. Imagine you're traversing the following.

  A1 -> A2 -> A3...
  |
  v
  B1 -> B2 -> B3...
  |
  v
  C1 -> C2 -> C3...
  |
  v
  D1 -> D2 -> D3...
  |
  v
  E1 -> E2 -> E3...
  |
  v
  F1 -> F2 -> F3...
  |
  v
  ...

Efficient concurrent consumption of these messages (while respecting causal dependency) would take O(w + h), where w = the _width_ (left to right) of the longest sequence, and h = the _height_ (top to bottom) of the first column.

But Pulsar, Kafka + parallel consumer, et al. would take O(n^2) either in processing time or in space complexity. This is because, at a fundamental level, the underlying data storage looks like this:

  A1 -> A2 -> A3...
  B1 -> B2 -> B3...
  C1 -> C2 -> C3...
  D1 -> D2 -> D3...
  E1 -> E2 -> E3...
  F1 -> F2 -> F3...

Notice that the underlying data storage loses information about nodes with multiple children (e.g., A1 previously parented both A2 and B1).

If we want to respect order, the consumer will be responsible for declining to process messages that don't respect causal order, e.g. attempting to process F1 before E1. Thus we could get into a situation where we try to process F1, then E1, then D1, then C1, then B1, then A1. Now that A1 is processed, Kafka tries again, but it tries F1, then E1, then D1, then C1, then B1... And so on and so forth. This is O(n^2) behavior.

Without changing the underlying data storage architecture, you will either:

1. Incur O(n^2) space or time complexity

2. Reimplement the queuing mechanism at the consumer level, but then you might as well not use Kafka (or the others) at all. In practice this doesn't happen (my evidence being that no one has pulled it off).

3. Face other nasty issues (e.g., in Kafka parallel consumer you can run out of memory or your processing time can become O(n^2)).


Wanted to say thanks so much for writing this all out - I've always thought of ordering as being sort of inherently against the point of parallel streams, so it's interesting to hear about the state of the art and the benefits people are trying to glean! I don't think about stream processors terribly often, so I wasn't aware of how dependencies are mapped.

If you don't mind another followup (and your patience with my ignorance hasn't run out :P), wouldn't efficient concurrent consumption imply knowing the dependency graph before the events are processed? I.e., is it possible in any instance to get to O(w+h) in a stream?


No problem. :)

Yes, order needs to be known.

So no, it's not possible to do O(w+h) with streams partitioned by key. Unless, of course, you use a supplementary index, but then you might as well not use the streams storage at all and store the records in the same storage as the index.

It’s worth noting that Pulsar does something like this (supplementary way to keep track of acknowledged messages), but their implementation has O(n^2) edge cases.


Do you have an example use case for this? This does seem like something unsuited to kafka, but I'm having a hard time imagining why you would structure something like this.


Great follow-up question, thank you. I could talk about this "topic" for days, so I appreciate the opportunity to expand. :)

Let's imagine ourselves as a couple of engineers at Acme Foreign Exchange House. We'd like to track Acme's net cash position across multiple currencies and execute trades accordingly (e.g., hedging). And we'd like to retrospectively analyze our hedges to assess their effectiveness.

Let's say I have this set of transactions (for accounts A, B, C, D, E, F, etc.)

  A1 -> A2 -> A3 -> A4
  B1 -> B2 -> B3 -> B4
  C1 -> C2
  D1 -> D2 -> D3 -> D4
  E1 -> E2
  F1

Let's say that:

- E1 was a deposit made into account E for $2M USD.

- E2 was an outgoing transfer of $2M USD sent to account F (incoming £1.7M GBP at F1).

If we consume our transactions and partition our consumption by account id, we could get into a state where E1 and F1 are reflected in our net position, but E2 isn't. That is, our calculation includes both the $2M USD and the £1.7M GBP, when in reality we only ever held either $2M USD or £1.7M GBP.

So what could we do?

1. Make sure that we respect causal order. I.e., there's no F1 reflected in our net position if we haven't processed E2.

2. Make sure that pairs of transactions (e.g., E2 and F1) update our net position atomically.

This is otherwise known as a "consistent cut" (see slide 25 here https://www.cs.cornell.edu/courses/cs6410/2011fa/lectures/19...).

Opinion: the world is causally ordered in arbitrary ways as above. But the tools, frameworks, and infrastructure more readily available to us struggle at modeling arbitrary partially ordered causality graphs. So we shrug our shoulders, and we learn to live with the edge cases. But it doesn't have to be so.


I suppose it depends on your message volume. To me, processing 100k messages and then getting a page however long later as the broker (or whatever) falls apart sounds much worse than head-of-line blocking and seeing the problem directly in my consumer. If I need to avoid head-of-line blocking, I can build whatever failsafe mechanisms I need for the problematic data and defer to some other queueing system (typically: just add an attempt counter and replay the message to the same Kafka topic, and if attempts > X, send it off to wherever).
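That replay pattern, sketched (hypothetical helper; the header name and limit are made up):

  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import java.nio.charset.StandardCharsets;

  // Sketch of the replay pattern: count attempts in a header, requeue to the
  // same topic, and divert to a dead-letter topic once attempts > X.
  final class Requeue {
      static final int MAX_ATTEMPTS = 5; // illustrative limit

      static void handleFailure(KafkaProducer<String, String> producer,
                                ConsumerRecord<String, String> rec) {
          var header = rec.headers().lastHeader("attempts");
          int attempts = header == null
                  ? 0
                  : Integer.parseInt(new String(header.value(), StandardCharsets.UTF_8));

          // Past the limit: park it on a dead-letter topic for inspection.
          String target = attempts >= MAX_ATTEMPTS ? rec.topic() + ".dlq" : rec.topic();

          var out = new ProducerRecord<>(target, rec.key(), rec.value());
          out.headers().add("attempts",
                  String.valueOf(attempts + 1).getBytes(StandardCharsets.UTF_8));
          producer.send(out);
      }
  }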

I'd rather debug a worker problem than an infra scaling problem every day of the week and twice on Sundays.


It's interesting you say that, since this turned an infra scaling problem into a worker problem for us. Previously, we would get terrible head-of-line throughput issues, so we would use an egregious number of partitions to try to alleviate that. Lots of partitions is hard to manage since resizing topics is operationally tedious and it puts a lot of strain on brokers. But no matter how many partitions you have, the head-of-line still blocks. Even cases where certain keys had slightly slower throughput would clog up the whole partition with normal consumers.

The parallel consumer nearly entirely solved this problem. Only the most egregious cases where keys were ~3000 times slower than other keys would cause an issue, and then you could solve it by disabling that key for a while.


Yeah, I'd say Kafka is not a great technology if your median and 99ths (or 999ths if volume is large enough) are wildly different, which sounds like your situation. I use Kafka in contexts where 99ths going awry usually aren't key-dependent, so I don't have the issues you see.

I tend to prefer other queueing mechanisms in those cases, although I still work hard to make 99ths and medians align as it can still cause issues (especially for monitoring)


Follow on: If you're using kafka to publish messages to multiple consumers, this is even worse as now you're infecting every consumer with data processing issues from every other consumer. Bad juju


There's also a similar project from LINE: https://github.com/line/decaton.

