Hacker News | ijcd's comments


Yup, and AWS Code*


Typescript/React and Go. Ruby and Ember once upon a time.


Hundreds of services and databases to work out and sort through. Good luck building a global real-time video CDN, too. You could build your own faster. Microservice architectures mirror the org that built them; you wouldn't do it the same way for yourself.


In general, broad access was to code repos early on; some were gated. There's lots of collaboration and a need to study other code bases, read-only, for learning. It's microservices galore there, so one didn't tend to have access to production databases for services or systems you didn't work on; access was opt-in. Teams did their own devops for the most part.

The payout data likely wasn’t ripped from a DB but rather dashboards which customer service or partnerships likely had access to. Tier1 or Tier2 support kinda stuff.

This smells like a stolen backup, or maybe network access and HTTP scanning: finding the internal GitHub and maybe a support admin credential that allowed dashboard views.


The extractors directory had 765 extractors in it, only one of which is YouTube. The facts seem cherry-picked to support their case (well, duh). Hopefully that will matter. Given that YouTube is a tenth of a percent of the extractors, maybe there's hope.

FWIW, it's easy to find the code on Google. The release tarballs have the format youtube-dl-YYYY.MM.DD.tar.gz, and one can look in the Homebrew Cellar, for example, and see there was a release on 2020.09.20. ;)


Even if YouTube was the only extractor, it shouldn't make a lick of difference - not everything on YouTube is owned by the RIAA as they'd like to believe!


My favorite approach is encrypting all of a user’s data (everywhere) and just deleting the key from a central store instead of actually erasing.
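A minimal sketch of that crypto-shredding pattern. Everything here is illustrative: the in-memory key store stands in for a real KMS/HSM, and the SHA-256 keystream is a toy cipher; a real system would use AES-GCM or similar.

```python
import hashlib
import secrets

# Hypothetical central key store; in practice this would be a KMS/HSM.
key_store = {}

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || nonce || counter) blocks.
    Illustration only; use AES-GCM in production."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_for_user(user_id: str, plaintext: bytes) -> bytes:
    key = key_store.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

def decrypt_for_user(user_id: str, blob: bytes) -> bytes:
    key = key_store[user_id]  # raises KeyError once the key is shredded
    nonce, ciphertext = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))

def forget_user(user_id: str) -> None:
    """'Delete' the user's data everywhere by destroying only the key."""
    key_store.pop(user_id, None)
```

The ciphertext can live in backups, logs, and replicas; once the single key entry is gone, none of those copies is recoverable.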


Can you be certain that nobody has retained a copy of this key?


Can’t be certain that no one has copied the user’s unencrypted data either.


I don't know if this addresses the underlying issue.


I was about to suggest the exact same thing. It is literally what we do at my company. Of course, you then have to deal with searching and indexing. To mitigate this issue, we use UUIDs to reference users throughout our systems. Any fields that would link that UUID back to a real person or contain personal info are then kept encrypted. That way, we can still gather all the information from our various components, but if we need to, we can essentially make the user data unreadable.
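A toy sketch of that pseudonymization setup. The stores and names are hypothetical, and the plain PII dict stands in for a table that would actually be kept encrypted (or key-shredded, as above).

```python
import uuid

# Hypothetical stores: operational rows reference users only by UUID;
# anything identifying lives in a separate (encrypted, in practice) PII table.
pii = {}       # user_id -> {"name": ..., "email": ...}
orders = []    # operational rows, safe to retain after erasure

def create_user(name: str, email: str) -> str:
    user_id = str(uuid.uuid4())
    pii[user_id] = {"name": name, "email": email}
    return user_id

def erase_user(user_id: str) -> None:
    # Dropping (or key-shredding) the PII row pseudonymizes every
    # remaining reference to this UUID across the system.
    pii.pop(user_id, None)

uid = create_user("Ada", "ada@example.com")
orders.append({"user": uid, "total": 42})
erase_user(uid)
# Orders still aggregate and index fine, but can no longer be tied to a person.
```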


Doesn't this still leak information? "A person known by these 5 people was in contact with the suicide hotline."


Doesn't this leave you with a bunch of junk data taking up space?


What's the problem with actually just deleting this data when you're done with it though?


Because of FKs in a relational model. As an example, deleting a user/account might end up being a task of going through every reference to it, then the references to those references, and so on.

This is actually the reason some companies do not delete users/accounts [0].

[0] https://news.ycombinator.com/item?id=23005060
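The cascade chain described above can be sketched with SQLite's ON DELETE CASCADE; the schema is hypothetical, and many real systems avoid cascades precisely because walking the reference graph this way is expensive or risky.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) ON DELETE CASCADE)""")
conn.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("INSERT INTO posts VALUES (1, 1)")
conn.execute("INSERT INTO comments VALUES (1, 1)")

# One delete walks the whole chain: posts referencing the user,
# then comments referencing those posts.
conn.execute("DELETE FROM users WHERE id = 1")
assert conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0] == 0
```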


One of our popular accounts has about 70M rows. I can't imagine how we would go about deleting their data. We rotate out old data each month when it's no longer needed, but we still keep maybe 40M records (about 4M new get added each month).


Not all customers are cool with this approach.


For regulatory reasons or because of "you didn't really delete it"? This is similar to deleting a file but not overwriting the space it occupied on the disk, isn't it? The data is still "there", but it's not accessible by normal means.


Is that okay from a GDPR perspective? What if there's an exploit discovered in the implementation of the encryption? Or what if quantum computers can crack it easily in the future?


It is OK for GDPR; Axon uses it in their commercial GDPR module (better than ours, but same principle).

Broken encryption, machines 20 years better in the future, and quantum computers are all solved with the same trick. We use event sourcing.

Implement the best current encryption, delete everything except the event store, decrypt the events with the old encryption, and republish them; everything is now encrypted with the best current encryption in a new store. Then delete the deprecated old event store. Skip aggregates whose old key was deleted.
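A rough sketch of that migration pass. The store layout, key handling, and the pass-through ciphers are placeholders (not Axon's API); the point is the shape of the rewrite loop and the skip for already-erased aggregates.

```python
import secrets

# Hypothetical old event store and per-aggregate keys.
old_store = [
    {"aggregate": "user-1", "payload": b"event-1"},
    {"aggregate": "user-2", "payload": b"event-2"},
]
old_keys = {"user-1": b"k-old-1"}   # user-2's key was already shredded
new_keys = {}

def old_decrypt(key: bytes, blob: bytes) -> bytes:
    return blob  # placeholder for the deprecated cipher

def new_encrypt(key: bytes, blob: bytes) -> bytes:
    return blob  # placeholder for the current best cipher

new_store = []
for event in old_store:
    key = old_keys.get(event["aggregate"])
    if key is None:
        continue  # erased aggregate: its events never reach the new store
    plaintext = old_decrypt(key, event["payload"])
    nk = new_keys.setdefault(event["aggregate"], secrets.token_bytes(32))
    new_store.append({"aggregate": event["aggregate"],
                      "payload": new_encrypt(nk, plaintext)})
# old_store can now be deleted; new_store holds only current-cipher events.
```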


We had internal discussions around the "hacky" nature of the solution. Both sides had proponents. The proof was in the numbers and some teams utilized it, others did not. In the end it was a few-line solution that solved the problem neatly and didn't rely on a (terrifying) dynamic solution such as is proposed in this blog post. It was expected the "hack" would be temporary as we expected the Go GC to quickly improve to the point it was not necessary.


Since you seemed to have analyzed this carefully, why couldn't object pools be used to reduce collectable garbage in the first place?


That was done too, of course. There’s so much to get done in these big systems that it’s often most efficient to take the quick win and move on, especially when, as I mentioned, the world is expected to fix the problem for you for free.


IKR? “Ballast is hacky! Let me just build my own GC real quick.” FWIW, I think Go 1.14 will have the required knob in the runtime package.


The title is misleading. I did a science project on this in the early 1990s. The paper mentions that the retention of memories after decapitation has been known since the 1950s. The news here seems to be that they have developed a computerized system to allow for easier, more consistent study of this phenomenon. This reminds me of Feynman's Cargo Cult Science essay, where he talks about the research that went into how to research rats, and why that was good science that many others ignored.

