
Also interesting: deep sleep therapy where you keep the patient asleep for days or weeks. Mixed results back then; I imagine they didn't have good ways to differentiate the patients who would be helped vs harmed by the therapy.

https://en.wikipedia.org/wiki/Deep_sleep_therapy


I don't have access to that paper - and when I looked for TinyKVM all I found was the rpi-based project that uses the other definition of KVM. Is your project online somewhere? Or is it proprietary?

I can't publish/open-source it, sadly. But the paper I can share: https://www.dropbox.com/scl/fi/38e0la5m6zkc04tlm03w8/Introdu...

Also appreciate the reference. I just realized you're the libriscv author (and as pointed out includeOS contributor). Love all your work!

That's cool. Thanks dude.

Ultimately you have to give a human direct responsibility for it; or in your case, a series of humans.

I'd suggest you do buy a domain, but set up a legal/financial framework so that a long-standing law firm will keep up the payments for N decades (or for as long as the firm & its successors exist).


This is exactly the wrong advice; instead, put your data on the blockchain, preferably Bitcoin, Bitcoin Cash, or Ethereum.

"Free" data storage for as long as the internet exists.

Make a barcode pointing to your transaction with the data; anyone with a brain and time can figure it out from there.
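For what it's worth, on Bitcoin the usual vehicle for this is an OP_RETURN output. A minimal sketch of building just the script bytes by hand (no wallet, fee, or broadcast logic; the 75-byte direct-push limit is part of the script encoding, and the small overall payload cap is an assumption about default node relay policy):

    # Sketch: wrap arbitrary bytes in a Bitcoin OP_RETURN output script.
    # Standard relay policy only accepts small payloads, so keep data tiny.
    OP_RETURN = 0x6a

    def op_return_script(data: bytes) -> bytes:
        if len(data) > 75:
            raise ValueError("use OP_PUSHDATA1 for payloads over 75 bytes")
        # opcode byte, then a single-byte direct push of the payload
        return bytes([OP_RETURN, len(data)]) + data

    print(op_return_script(b"data to outlive me").hex())

Getting that script into a funded, mined transaction (and the barcode pointing at it) is the part that needs a wallet or a library on top.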


What generalizations suggests is what people have been doing successfully for hundreds of years to maintain things much more complicated than a website (a multi-generational family estate, for example). What you're suggesting has never been tried for any period of time. Tell me again how his advice is wrong and yours is right?

> "Free" data storage for as long as the internet exists.

Oh ye of so much faith.


? "Free" data storage for as long as the internet exists.

The big blockchains are robust, but I'm wondering whether at some point they might no longer need to keep the full blockchain around.


Whether it's right or wrong advice depends on the GP's audience.

No one else is going to mention that "separation" was misspelled four times?

If we can all hear the tiny violin, who cares?


Someone created something. Its value greatly exceeds the perceived "degradation of the environment" of a spelling mistake. Not acknowledging that says more about the pedant than the creator.

> mostly from Librivox books

That probably explains a lot. I've tried listening to some of those audiobooks - very hit and miss, mostly miss. Definitely amateur hour and mostly bad quality.


> These micro VMs operate without a kernel or operating system, keeping overhead low. Instead, guests are built specifically for Hyperlight using the Hyperlight Guest library, which provides a controlled set of APIs that facilitate interaction between host and guest

Sounds like this is closer to a chroot/unikernel than a "micro VM" - a slightly more firewalled chroot without most of the os libs, or a unikernel without the kernel. Pretty sure it's not a "virtual machine" though.

Only pointing this out because these sorts of containers/unikernels/vms exist on a spectrum, and each type carries its own strengths and limitations; calling this by the wrong name associates it with the wrong set of tradeoffs.


I guess if it uses CR3 it's a "process" and if it uses VMLAUNCH it's a "VM".

Heh. Going by that delineation we end up with very VM-ish containers and (now) very container-ish VMs. Though this seems like it's even more stripped down than a unikernel - which would also be a "VM" here.

I thought a chroot was not considered a real security boundary?

Chroot is a real security boundary as long as you use it properly. That said, namespaces on Linux are much superior at this point, so I can only recommend using `chroot` for POSIX compliance.

chroot is great for all sorts of things, but they're not security-related.

A lot of tools expect to do things to "your system" at absolute paths — chroot lets those tools operate against an explicitly wired-up, semi-virtualized simulacrum of your system, designed to pass through just the parts of those operations you want to your real host, while routing the rest of the effects into a "rootfs in a can" that you're either building up or will immediately throw away.

Think: debootstrap; or pivot-root; or mounting your rootfs to fix your GRUB config and re-run update-grub from your initramfs rescue shell.
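A rough Python sketch of that last flow, just to make the shape concrete (the /mnt/rootfs path and the update-grub command are illustrative; a real rescue session also bind-mounts /dev, /proc, and /sys before entering the chroot):

    # Sketch: enter a mounted rootfs and run a command "inside" it.
    # Needs root; does not set up /dev, /proc, /sys bind mounts.
    import os

    def run_in_chroot(new_root: str, argv: list[str]) -> int:
        pid = os.fork()
        if pid == 0:                    # child: confine it to the rootfs
            os.chroot(new_root)
            os.chdir("/")               # don't keep a cwd outside the new root
            os.execvp(argv[0], argv)
        _, status = os.waitpid(pid, 0)  # parent: wait and report the exit code
        return os.waitstatus_to_exitcode(status)

    # e.g. run_in_chroot("/mnt/rootfs", ["update-grub"])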


Yes. Anything that shares a kernel is a very weak security boundary as the kernel is complex and vulnerabilities are regularly discovered.

> you're making bad decisions and you should do better.

This part was unnecessary btw.


I wonder if this is a real solution. "Memory safety" has sure been pushed hard the last few years, but this feels more like a "we need to do something, this is something, we should do this" kind of response than anything that will really address the issue.

If security-through-virtualization had been the fad lately, would that have been proposed instead?


It is not a real solution. The people delivering memory-safe code today do not think their systems are secure against individual, lone attackers, let alone fully-funded state actors. The overwhelming majority of them, of all software developers, and of software security professionals probably think it is literally impossible to design and develop usable systems secure against such threats, i.e. systems that can achieve the desired requirements.

"Let us do this thing that literally every practitioner thinks cannot achieve the requirements, and maybe we will accidentally meet the requirements in spite of that" is a bona-fide insane strategy. It only makes sense if those are not "requirements", just nice-to-haves; which, to be fair, is the state of software security incentives today.

If you actually want to be secure against state actors, you need to start from things that work, or at least things that people believe could, in principle, work and then work down. Historically, there were systems certified according to the TCSEC Orange Book that, ostensibly, the DoD at the time, 80s to 90s, believed were secure against state actors. A slightly more modern example would be the Common Criteria SKPP which required NSA evaluation that any certified system reached such requirements.

But if you think they overestimated the security of such systems, so there are no actual examples of working solutions, then it still makes no sense to go with things that people know certainly do not work. You still need to at least start from things that people believe could be secure against state actors otherwise you have already failed before you even started.


> If you actually want to be secure against state actors, you need to start from things that work, or at least things that people believe could, in principle, work and then work down. Historically, there were systems certified according to the TCSEC Orange Book that, ostensibly, the DoD at the time, 80s to 90s, believed were secure against state actors. A slightly more modern example would be the Common Criteria SKPP which required NSA evaluation that any certified system reached such requirements.

Right. I was around for that era and worked on some of those systems.

NSA's first approach to operating system certification used the same approach they used for validating locks and filing cabinets. They had teams try to break in. If they succeeded, the vendor was told of the vulnerabilities and got a second try. If a NSA team could break in on the second try, the product was rejected.

Vendors screamed. There were a few early successes. A few very limited operating systems for specific military needs. Something for Prime minicomputers. Nothing mainstream.

The Common Criteria approach allows third-party labs to do the testing, and vendors can try over and over until success is achieved. That is extremely expensive.

There are some current successes. [1][2] These are both real-time embedded operating systems.

[1] https://www.acsac.org/2009/program/case/ccsc-Kleidermacher.p...

[2] https://provenrun.com/provencore/


> That is extremely expensive.

And it proves nothing. And it's as expensive for every update, so forget updates.


Provenrun, used in military aircraft, has 100% formal proof coverage on the microkernel.

We know how to approach this. You use a brutally simple microkernel such as SEL4 and do full proofs of correctness on it. There's a performance penalty for microkernels, maybe 20%, because there's more copying. There's a huge cost to making modifications, so modifications are rare.

The trouble with SEL4 is that it's not much more than a hypervisor. People tend to run Linux on top of it, which loses most of the security benefits.


> The trouble with SEL4 is that it's not much more than a hypervisor. People tend to run Linux on top of it, which loses most of the security benefits.

Well, yeah, that's a problem.

But the bigger problem is that this works for jets, as long as you don't need updates. It doesn't work for general purpose computers, for office productivity software, for databases (is there an RDBMS with a correctness proof?), etc. It's not that one couldn't build such things, it's that the cost would be absolutely prohibitive.


It's not for everything. But the serious verification techniques should be mandatory in critical infrastructure. Routers, BGP nodes, and firewalls would be a good place to start. Embedded systems that control important things - train control, power distribution, pipelines, water and sewer. Get those nailed down hard.

Diagrams like this scare me.[1]

[1] https://new.abb.com/docs/librariesprovider78/eventos/jjtts-2...


> Routers, BGP, ...

Well, but those need new features from time to time, and certification would make that nigh impossible. I'd settle for memory-safe languages as the happy middle of the road.


seL4 is a bit more than a hypervisor, but it's definitely very low-level. In terms of a useful seL4-based system, you may want to look at https://trustworthy.systems/projects/LionsOS/ – not yet verified, but will be.

Ooooo, thanks for the rare gem of a comment giving me fun things to look up.

I think certification overestimates security, absolutely. Certification proves nothing.

You can use theorem provers to prove correctness, but you can't prove that the business logic is correct. There's a degree to which you just cannot prevent security vulnerabilities.

But switching to memory-safe languages will reduce vulnerabilities by 90%. That's not nothing.


While these types of failures are the 50-70% problem, the remaining 30% seems like a big problem too, and the black hats will just concentrate more on those once the low-hanging fruit is removed with Rust, C#, Python, whatever.

50-70% defect reduction is significant, though.

It is true that black hats are going to focus on the remainder pretty much by definition because there’s no other choice. The rest is a problem and it needs a solution. But the fact that it exists is not a sound argument against choosing a memory-safe language.

Current solutions are not ideal, but they still eliminate a huge portion of defects. Today we can avoid all the memory issues and focus on the next biggest category of defects. If we keep halving possible defects, it won't take long before software is near defect-free.


I guess my point was it won't be a "tidal wave" of solved security issues. Now all the effort that went into finding buffer overflow and use-after-free errors just gets shifted to combing through code for logic errors and missed opportunities to tighten up checks. It's not going to be a 50-70% reduction. Maybe half that? I mean, it would help, but it's not going to fix the problem in a huge way at all.

It's a 90% solution, not a 100% solution. We don't have a 100% solution, not at any affordable cost.

> Why would anyone ever publish stuff on the web for free unless it was just a hobby

That's exactly what the old deal was, and it's what made the old web so good. If every paid or ad-funded site died tomorrow, the web would be pretty much healed.


That's a bit too simple. There are way fewer people producing quality content "for fun" than people who aim, or at least eventually hope, to make money from it.

Yes, a few sites take this too far and ruin search results for everyone. But taking the possibility away would also cut the produced content by a lot.

YouTube, for example, had some good content before monetization, but there are a lot of great documentary-like channels now that simply wouldn't be possible without ads. There is also clickbait trash, yes, but I'd rather have both than neither.


Demonetizing the web sounds mostly awesome. Good riddance to the adtech ecosystem.

The textual web is going the way of cable TV - pay to enter. And now streaming. "Alms for the poor..."

But, like on OTA TV, you can get all the shopping channels you want.


Not to be the downer, but who pays for all the video bandwidth, who pays for all the content hosting? The old web worked because it was mostly a public good, paid for by govt and universities. At current webscale that's not coming back.

So who pays for all of this?

The web needs to be monetized, just not via advertising. Maybe it's microtransactions, maybe subscriptions, maybe something else, but this idea of "we get everything we want for free and nobody tries to use it for their own agenda" will never return. That only exists for hobby technologies. Once they are mainstream they get incorporated into the mainstream economic model. Our mainstream model is capitalism, so it will be ever present in any form of the internet.

The main question is how people/resources can be paid for while maintaining healthy incentives.


No one paid you to write that?

Except I also pay my network provider to run the infrastructure.

I think you forgot that.


It costs the Internet Archive $2/GB to store a blob of data in perpetuity, and their budget for the entire org is ~$37M/year. I don't disagree that people and systems need to be paid, but the costs are not untenable. We have Patreon, we have subscriptions to your run-of-the-mill media outlets (NY Times, Economist, WSJ, Vox, etc.); the primitives exist.
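For scale, a quick back-of-the-envelope with just those two quoted figures (the 10% storage share is an assumed split, not an Internet Archive number):

    # Rough math: how much "forever" storage the quoted figures could endow.
    cost_per_gb = 2.0        # one-time store-it-in-perpetuity cost, $/GB
    annual_budget = 37e6     # ~$37M/year for the whole org

    storage_share = 0.10     # assume 10% of the budget goes to new storage
    new_gb = annual_budget * storage_share / cost_per_gb
    print(f"~{new_gb / 1e6:.2f} PB of newly endowed storage per year")  # ~1.85 PB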

The web needs patrons, contributions, and cost allocation, not necessarily monetization and shareholder capitalism where there is a never ending shuffle of IP and org ownership to maximize returns (unnecessarily imho). How many times was Reddit flipped until its current CEO juiced it for IPO and profitability? Now it is a curated forum for ML training.

I (as well as many other consumers of this content) donate to APM Marketplace [1] because we can afford it and want it to continue. This is, in fits and starts, the way imho. We piece together the means to deliver disenshittification (aggregating small donations, large donations, grants, etc).

(Tangentially, APM Marketplace has recently covered food stores [2] and childcare centers [3] that have incorporated as non profits because a for profit model simply will not succeed; food for thought at a meta level as we discuss economic sustainability and how to deliver outcomes in non conventional ways)

[1] https://www.marketplace.org/

[2] https://www.marketplace.org/2024/10/24/colorados-oldest-busi...

[3] https://www.marketplace.org/2024/08/22/daycare-rural-areas-c...


> There are way fewer people producing quality content "for fun" than people who aim, or at least eventually hope, to make money from it... But taking the possibility away would also cut the produced content by a lot.

...is that a problem? Most of what we actually like is the stuff that's made 'for fun', and even if not, killing off some good stuff while killing off nearly all the bad stuff is a pretty good deal imo.


Agreed. The entire reason why search is so hard is because there's so much junk produced purely to manipulate people into buying stuff. If all of that goes away because people don't see ads there anymore, search becomes much easier to pull off for those of us who don't want to stick to the AI sandbox.

There's a slight chance we could see the un-Septembering of the internet as it bifurcates.


Unless the reason for the death of the paid-content deal is AI vacuuming up all the content and spitting out an anonymous slurry of it.

Why would anyone, especially a passionate hobbyist, make a website knowing it will never be seen, and only be used as a source for some company's profit?


> and only be used as a source for some company's profit?

Are we forgetting the main beneficiaries? The users of LLM search. The provider makes a loss, or pennies per million tokens, while those users solve actual problems. Could be education, could be health, could be automating stuff.


The problem is not the ad sites dying. The problem is that even the good sites will not have any readers, as the content will be appropriated by the AI du jour. This makes it impossible to heal the web, because people create personal sites with the expectation of at least receiving visitors. If nobody finds your site, it is as if it didn't exist.

I'm not so sure.

I think the best bloggers write because they need to express themselves, not because they need an audience. They always seem surprised to discover that they have an audience.

There is absolutely a set of people who write in order to be read by a large audience, but I'm not sure they're the critical people. If we lost all of them because they couldn't attract an audience, I don't think we'd lose too much.


Exactly. Even if people don't publish information for money, a lot of them do it for "glory" for lack of a better term. Many people like being the "go to expert" in some particular field.

LLMs do away with that. 95% of folks aren't going to feel great if all of the time spent producing content is then just "put into the blender to be churned out" by an LLM with no traffic back to the original site.


ChatGPT puts trillions of tokens into human heads per month, and collects extensive logs of problem solving and of the outcomes of ideas tested there. This is becoming a new way to circulate experience in society. An experience flywheel. We don't need blogs; we get more truthful and aligned outcomes from human-AI logs.

You, for one, welcome our new AI overlords?

Blogs have the enormous advantage of being decentralized and harder to manipulate and censor. We get "more truthful and aligned outcomes" from centralized control only so long as your definition of "truth" and "alignment" match the definitions used by the centralized party.

I don't have enough faith in Sam Altman or in all current and future US governments to wish that future into existence.


But wouldn't it disincentivize those who create knowledge? AFAIK, most of the highly specific knowledge comes from small communities where a shared goal and socialization with like-minded individuals are the incentive to keep acquiring and describing knowledge for community members. Would it really be helpful to put an AI between them?

First issue: silos of information.

Second issue: who decides the weights of sources? This is the reason why every nation must have culturally aligned AIs defending their ways of living in the information sphere.


Yet 300M users are creating interactive sessions on ChatGPT, which can be food for self-improvement. AI has a native way to elicit experience from users.

Only middle-class and rich people could participate in "the old deal" Internet made by and for hobbyists. I think people forget this. It was not so democratized and open for everyone – you first had to afford a computer.

If you're a member of a yacht club, you can probably expect other members to help you out with repairs while you help them. But when a club has half the world population as members, those arrangements don't work anymore.


As if OpenAI won't end up offering paid access to influence these results, or advertise inside them. Of course they will, just like how Google started without ads.

It will be even more opaque and unblockable.


To quote Prince: ahh, now people can finally go back to making music for the sake of making music.

Remember in that time, less web content meant major media outlets dominated news and entertainment on TV and newspapers.

Paging Sergey

> How are you breaking up the documents? Are the documents consistently formatted to make breaking them up uniform? Do you need to do some preprocessing to make them uniform?

> When you retrieve documents how many do you stuff into your prompt as context?

> Do you stuff the same top N chunks from a single prompt or do you have a tailored prompt chain retrieving different resourced based on the prompt and desired output?

Wouldn't these questions be answered by the RAG solution the OP is asking for?


Not really; in my experience you need to try different strategies for different use cases.
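If it helps, this is the naive baseline I'd compare those strategies against: fixed-size chunking with overlap, then top-N retrieval by cosine similarity (embed() here is a stand-in for whatever embedding model you actually call, not a real API):

    # Baseline RAG retrieval: fixed-size overlapping chunks + top-N by cosine similarity.
    import math

    def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def top_n(query: str, chunks: list[str], embed, n: int = 5) -> list[str]:
        q = embed(query)                # embed() is assumed, supplied by the caller
        ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
        return ranked[:n]               # these get stuffed into the prompt as context

Whether uniform chunks or per-document preprocessing wins is exactly the kind of thing that changes per corpus, which is why the questions above matter.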
