poikroequ's comments

I would love to know what you're doing right and I'm doing wrong, because my experience with AI has been mostly crap. I've tried using AI to summarize articles, but then the AI will say something interesting, so I actually look at the article and quickly realize the AI summary was wrong.

You might want to double check that what the AI is telling you is actually accurate and not just blindly trust its output.


I'd have to give away some of my secret system-prompt sauce to tell you, but I'm sure the summarizations I get are solid; we've done extensive evaluation of truthfulness. I have tens of thousands of hours of calls summarized monthly from paying customers who rely on them to be accurate. We've only ever gotten hallucinations when we accidentally omitted the transcript, and even that was fixed by allowing the LLM to alert us when it didn't see a transcript.

I use an AI noise canceling/transcription service to help with my extensive meeting agenda.

I've lived in the US nearly 20 years, coming from Australia. Whether it's my accent or something else, most transcripts need something on the order of 20% of the meeting's duration for me to tidy and edit before they're free of, in some cases, quite asinine transcription errors.

Unless your customers are doing that, I have a hard time believing "zero hallucinations" on the principle of garbage in, garbage out. And even then, the prompts would be something along the lines of "transcribe this, but just ... do better".


These models make heavy use of RNG (random number generation), so it would be difficult to fingerprint them based on the output tokens. It may be possible to use specially crafted prompts that yield predictable results. Otherwise, you could just time how long it takes to generate tokens locally.
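
To illustrate the randomness (a toy sketch, not any particular model's code; the logits are made up): the next token is drawn from a temperature-scaled distribution, so repeated runs of the same prompt naturally diverge.

    import numpy as np

    def sample_next_token(logits, temperature=0.8, rng=None):
        # Temperature sampling: scale logits, softmax, then draw at random.
        # The random draw here is why two runs rarely produce identical text.
        rng = rng or np.random.default_rng()
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    logits = np.array([2.0, 1.5, 0.3, -1.0])
    print(sample_next_token(logits), sample_next_token(logits))  # may differ

Greedy decoding (temperature 0) is the mostly-deterministic special case that crafted prompts would have to rely on.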

There are already so many ways to fingerprint users that are far more reliable, though.


Someone should do this: scrape everything Microsoft publishes (don't forget MSN news), then create an online chatbot trained on all their data. Tout it all over the web. Then sit back and watch how quickly Microsoft moves to get it taken down.

Sheer hypocrisy.


I don't think you'd be breaking the law doing that. As long as you don't reproduce any of the MS-owned material in your output. Data isn't protected by copyright (in the US at least), so your AI could extract the knowledge from some text and present it in its own different way.

> As long as you don't reproduce any of the MS-owned material in your output.

That is exactly what you should be doing to call them out on their bullshit. From the article:

> "I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use," he opined. "Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

Which means that by their own logic you can just copy their content and reproduce it.


That won't stop them from suing you into oblivion. They can afford the lawyers; you waste a decade of your life.

An LLM can be trained to pick a URL from a database of known URLs, or from search results. It may choose the wrong URL, of course, but at least it wouldn't hallucinate fake URLs.

Why can't it just check the URL to see if it's valid and actually contains the data it thinks it contains, before recommending it to the user?

Don't modern LLMs like GPT-4 have the ability to load web pages?

The downside is it will slow down the output I guess.
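
Something like that can be wired up outside the model. A minimal sketch of the kind of pre-flight check being suggested, using the requests library (the function name and the expected terms are illustrative):

    import requests

    def url_seems_valid(url, expected_terms, timeout=5):
        # Hypothetical pre-check: fetch the candidate URL and confirm it
        # resolves and actually mentions the content the model claims.
        try:
            resp = requests.get(url, timeout=timeout, allow_redirects=True)
        except requests.RequestException:
            return False
        if resp.status_code != 200:
            return False
        body = resp.text.lower()
        return all(term.lower() in body for term in expected_terms)

    if url_seems_valid("https://example.com/docs", ["install", "configure"]):
        print("safe to recommend")

Each check costs an extra network round-trip per link, which is exactly the slowdown mentioned above.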


LLMs aren't "trained to pick a URL from a database of known URLs". Or at least, we have not collectively seen this yet; perhaps I am wrong. We explicitly wire things up so we can use LLMs to generate data that's useful to index with existing search databases.

Acting like LLMs are responsible for reliably identifying the source document is disingenuous, when a large part of an LLM's value as an asset is its role in laundering other people's intellectual property without attribution.


Try out phind.com for an example. The LLM generates a query for the search engine to fetch web results. It can include clickable links to the sources in its response.

They accomplish this by training/fine-tuning the LLM to output special tokens, or special syntax, which is interpreted in code to perform some action, such as calling out to an API. This is how ChatGPT and Gemini are able to automatically search the web, generate images, and such. Yes, LLMs just output tokens, but those tokens can be interpreted to perform actions in code.
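
As a rough idea of that interpretation step (the real token syntax is model-specific and not public; the <tool> delimiters and the search tool below are made up for illustration):

    import json
    import re

    # Hypothetical markup; real systems bake special tokens in during fine-tuning.
    TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

    def run_tool_calls(model_output, tools):
        # Scan generated text for tool-call markup and dispatch to real code.
        for match in TOOL_CALL.finditer(model_output):
            call = json.loads(match.group(1))  # {"name": ..., "args": {...}}
            result = tools[call["name"]](**call["args"])
            print(call["name"], "->", result)

    tools = {"search": lambda query: "top results for " + query}
    run_tool_calls('<tool>{"name": "search", "args": {"query": "phind"}}</tool>', tools)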


For what it's worth, that's just a tweet embedded on the page, not written by the author of the article. Inflammatory remarks like that are par for the course on Twitter.

The title is very clickbaity. These are not users downloading torrents in the normal sense. It's users that are using a specific piece of software that happens to utilize the BitTorrent protocol.

There is an enormous issue here. A service provider committed crimes against customers, and their justification is that the customers were using a protocol to exchange something. The service provider has no idea what that something was.

It's similar to arresting someone because they are speaking French. I don't speak French and I don't like people who speak French because sometimes French people say stuff I don't agree with. I don't know what they're saying but I hate it.


I use it a lot to ask dumb simple questions. I've been delving into a new tech stack at work, and I'm already familiar with the concepts, but I just don't know how to do those things in this specific tech stack. AI saves me a lot of time digging through documentation and SEO spam. It often gets me to the answer faster.

However, I usually only use it for dumb simple questions. When it comes to anything more complex or obscure, it often falls flat on its face and either hallucinates or misunderstands the question. Then I'll do an old-fashioned web search and find a clear-cut answer on Stack Overflow.

My experience has been AI is very unreliable right now and you simply can't trust what it tells you. So I only use it in very limited ways.


Totally agree. If Firefox Translations can be an extension, so can this.

> "I understand that publishers and authors have to make a profit, but most of the material I am trying to access is written by people who are dead and whose publishers have stopped printing the material," wrote one IA fan from Boston.

This really is the crux of the problem. Copyright should be "use it or lose it." If you don't make your books readily available, then you should have no right to demand copies of your book be removed from places like IA. It's not like these publishers are losing any money from books that literally nobody can purchase.


>If you don't make your books readily available, then you should have no right to demand copies of your book be removed from places like IA

What if an author explicitly doesn't want to distribute their works, or wants to distribute an alternative version of their works? There was the recent case of the company that owns the rights to Dr. Seuss choosing not to publish old versions of books they felt had racist depictions.

And who sets the standard for readily available? If I offer my book for sale for $100 is it readily available? At what price is something no longer readily available? Does it depend on the type of book? What if it's for sale broadly but not in the state where you live? What if it's free but must be read in person and cannot be taken home with you?


> What if an author explicitly doesn't want to distribute their works or to distribute an alternative version of their works?

Too bad. Once you publish it the first time, the cat is out of the bag. Eventually it's going to go into the public domain whether you like it or not.

> And who sets the standard for readily available? If I offer my book for sale for $100 is it readily available? At what price is something no longer readily available? Does it depend on the type of book? What if it's for sale broadly but not in the state where you live? What if it's free but must be read in person and cannot be taken home with you?

Good questions, but they can definitely be decided. $100 is probably fine. Regulators can decide. Yes. Not good enough. Not good enough.

We have frameworks for mandatory music licensing, we can do more things like that.


Should we apply that logic to GPLv3 code too? Just basically disregard the license, since knowledge should be completely free? Or maybe impose some burden on the maintainers to keep the code active and constantly changing, so that the codebase doesn't lose its license after an arbitrary period of time?

I'm genuinely wondering, because to me there's a clear parallel, yet in tech circles we almost always see defense of copyleft code (which I totally agree with; I'm extremely pro-GPL) and a very heavy bias towards maintainers. I know GPL code is already free, but we are talking about automatically putting copyrighted/licensed material in the public domain, which isn't GPL compatible.


> doesn't lose its license after an arbitrary period of time?

The whole thing that allows copyright in the US is in the Constitution:

> To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

The original term was 14 years, with one 14 year renewal allowed.

This was probably a little short, in my opinion. 25+20 seems reasonable; most works have no commercial value after 25 years, and 45 years is already a long time to keep things that have become cultural touchstones locked up. The present legal regime of life of the author plus 70 years is clearly excessive.

> but we are talking automatically putting copyrighted/licensed material in the public domain

This already happens: just after an unreasonably long period of time.

> which isn't GPL compatible.

Code which is in the public domain is freely compatible with code under the GPL.

The whole point is "to promote the progress of science and useful arts". Stuff kept locked away beyond its useful commercial life is no longer promoting progress (the authors have already gotten paid anything they're going to get). Indeed, most present works borrow deeply from the public domain, but their authors seek to return the same favor to future authors only a few generations later.

How long should Nintendo be able to rent-seek and convince/force the same people to buy the original Super Mario Bros. over and over again? Should that end in 2030, or 2080? What about operating systems of the 1980s - should they be locked up until 2090, even though no one will sell them to you?

At some point, there are substantial impediments to legitimate archival and research purposes; to keeping existing important systems working; and to allowing the free exploration and creativity that comes from remixing and building upon past works.


> The original term was 14 years, with one 14 year renewal allowed.

> This was probably a little short

Strongly disagree. And for anything that's distributed digitally it's an eternity.

The goal should be that as an adult you can build on the things you grew up with as a child. Anything longer than that is absurd. Remember that copyright is an infringement on your right to free speech, so extraordinary evidence should be required that any additional second of the copyright term actually fulfils its constitutional purpose. You don't need to allow maximum commercial exploitation to encourage more works. In fact, I would question whether commercial exploitation needs to be made possible at all, considering that humans are naturally driven to be creative and we have a giant corpus of creative works to fall back on, which can now be copied and distributed more easily than ever.

The current terms that won't even let your grandchildren benefit from the work your generation funded are an outright affront to the spirit of the constitution. At this point we would be better off scrapping the whole concept of copyright.


> And for anything that's distributed digitally it's an eternity.

On the flip side, there's a fair bit of fiction, etc, where the same author has been acting as steward of the series for 35 years. Having them still get proceeds from book 1 is an important part of the calculus to continue. There's a balance to be struck here.

I do think there should be a significant fee at renewal.

> The current terms that won't even let your grandchildren benefit from the work your generation funded are an outright affront to the spirit of the constitution.

The current terms are absurd, agreed.

> At this point we would be better off scrapping the whole concept of copyright.

Nah; I like there being a market to make expensive works of intellectual property, which depends upon copyright.


> Should we apply that logic to GPLv3 code too?

Well the GPL code was never being sold in the first place, so these rules might not apply at all. And it's still available the same way it always has been, so that suggests no need for intervention. Alternatively it would make sense to treat source code differently from books and photos and music and movies.

> lose its license after an arbitrary period of time

Of course GPL code would become public domain after an arbitrary period of time; that's how the public domain works. In the year 2024, why shouldn't anyone be able to reuse 1997 Linux code or pieces of Windows 95 in their own programs?


The GPL is a clever hack of a broken system. Given copyright existing, I like the GPL's protections. I would rather take no copyright, as that would address many (but not all) of the reasons the GPL exists.

> There was the recent case of the company that owns the rights to Dr. Seuss choosing not to publish old versions of books they felt had racist depictions.

This is actually exactly why I agree with OP. See also the changes made to Roald Dahl books. Future generations deserve to be able to read the content that their forebears produced as they produced it.

I'm supportive of an author's right to not initially publish something that they at the time are uncomfortable with being made public. There should be protections for that. But once something has entered into the public consciousness in a particular form, I'm not okay with a cultural censorship wave being able to memory hole the original copy and replace it with a sanitized version (or wipe it out entirely). They shouldn't be obliged to print content that they find objectionable, but that content needs to be accessible or we lose our history.

Messy and uncomfortable as it is, future generations have a right to see us as we were and are, not as the second-generation holder of our too-long copyright wishes we had been.


You have really hit the crux of the issue. We shouldn't allow corporations or individuals to control something that has become part of our shared culture. Beyond shortening copyright lengths to the absolute minimum required for the stated purpose of encouraging more creation, we probably also need limitations on author rights for works that have gained widespread public adoption, similar to how trademarks can become genericised. At some point you shouldn't get to decide if and how your creation is distributed, even if you can still demand royalties for a while.

Authors should have that inalienable right, and it should not be transferable via contract or any other means. Publishers, on the other hand, should have no such rights: they own the presses, and their inalienable right should be to refrain from using them.

Should I be allowed to commission a work under the understanding that I own the rights after it's created?

If yes, then how do you regulate who is or is not allowed to transfer ownership of rights to or from whom?


That part is already solved. Under current international law, if you commission a work then you own the economic rights, but the original author retains the moral rights. In fact, selling your moral rights is not possible.

It sounds like the suggestion is that retracting / completely discontinuing a book should only be part of the moral rights, not the economic rights.

I'm not sure how feasible that is, but it's not totally unprecedented. For example, one of the moral rights recognized by many countries is the right not to have your works destroyed. E.g. even if someone else owns the physical object of your painting, they are not allowed to set it on fire, and you could sue them if they did.


> It sounds like the suggestion is that retracting / completely discontinuing a book should only be part of the moral rights, not the economic rights.

It shouldn't be part of any rights. At best the author should be able to demand to not have his name associated with the work.


> Authors should have that inalienable right

Why? What do we as a society gain by allowing individuals continued control over parts of our culture, including the ability to erase them?


> What if an author explicitly doesn't want to distribute their works or to distribute an alternative version of their works?

Then they should lose the rights of the originals. The point of copyright is to enrich society not to satisfy any want of the author.


> Copyright should be "use it or lose it."

Which is basically how trademarks are. So we even already have a system in place to manage something like this.


No, it really isn't. That person, and you, don't understand how publishing - or mass production of any kind, it seems - works.

A publisher "stopping printing" of a book is completely normal - books are like any other mass-produced good, in that there are fixed and variable costs to production and a factory can't economically crank out more than a certain number of different things at once.

So, there are "printings" - i.e. a production run - and then that inventory is sold to distributors. When the inventory is sold out, it is "out of print." That does not mean it's not available - there's still stock at distributors, and likely on shelves.

When it sells out at distributors, then it is backordered.

It is completely normal for a publisher to wait until they feel there is enough pent-up demand for another printing - increasing the size of the printing to improve per-copy profit (or make it economically viable at all), and then sell it to distributors because the distributors think they can sell the inventory at a high enough rate.

Distributors don't want to keep around books that don't sell very fast, because that means they don't have warehouse space for books that do sell quickly. And if they have books that don't sell and need the warehouse space, the books might get remaindered (sold to a low-budget distributor for sale at well below original price) or destroyed (cover stripped as proof of destruction and the rest destroyed/recycled.)

Things have changed with digital press technology improvements, opening the door to more print-on-demand books - but printing one copy will never be anywhere close to as cheap as printing, say, 1000 copies.

There are also other reasons it might not be for sale, despite the author trying / wanting to sell it.

If you know nothing about how book printing, publishing, distribution, buying, and retail work, you probably shouldn't be forming opinions on how it should be subject to radically different regulation, much less offering them up.

https://pinestatepublicity.substack.com/p/book-distribution-...


Internet Archive isn't printing books, they are lending out digital copies. A publisher would have no reason not to sell a digital copy of their own. There are no production runs necessary on digital copies.

I never said the books have to be physically printed. Digitize the book and sell it online. Make it available through a kindle unlimited subscription. It doesn't matter, as long as it's readily available. Until then, they should have no right to sue to remove the books from IA.

Also keep in mind that, for many of the books, the authors are dead.


> Make it available through a kindle unlimited subscription. It doesn't matter, as long as it's readily available.

I disagree that requiring the customer to have a continued business relationship to retain the work should count as it being readily available.


> Authors want

> Publishers want

> Distributors want

So? I too want a golden goose protected with force by the government at no cost to me. What does the rest of society gain from this deal?


> No ISO! ISO for optical media is a legacy format.

This comes off as fairly ignorant. Virtual machines? Ventoy? There are lots of tools which can flash an ISO to a thumb drive or similar. ISO files are far more useful than just burning them to optical media.


A strictly by-the-book CD/DVD image, with a specific boot loader (e.g. isolinux) and an installation environment that expects to access its files on a real optical drive through the regular bundled ODD drivers, would be almost completely useless today. To make such disks (installation media for systems released before widespread support of USB boot) work without a real CD/DVD-ROM, you need to emulate BIOS functions and/or a compatible device, and also keep the whole image at hand, most likely by copying it to memory in full, to provide data for reading; only then can you boot the installer into that partially virtualized environment. Even older systems (from before the standardization of CD booting) may additionally need boot-floppy emulation.

The main reason you can easily copy installation images to USB sticks and other devices is that they are actually "hybrid" images, with a bag of tricks helping them also work like a disk drive and boot that way. There are two boot loaders that handle the different modes (and a third one in the UEFI boot partition, though accessing that is the firmware's problem), and the kernel knows that it can be started in various ways and has to look for the root filesystem in multiple places. Ventoy and company make use of the fact that everything is already prepared; the only thing left for them to do is add a menu item with the proper path and kernel boot options.

So you can run Linux from a USB stick; you can run it from there but with an intermediate boot loader (say, Plop chain-loaded by the main system to bypass slow USB access in an old BIOS through its drivers, and massively decrease boot times); you can run it from an extra partition added to your disk, from an image file, from memory, and so on. ISO is just a convenient legacy distribution format that does not help with any of that. And, as mentioned, there is no special "boot process" with UEFI: you simply copy a bunch of files to the USB flash drive, the firmware sees that one of them looks like a boot loader, and then it is started and can do anything it wants.
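
As a concrete illustration of the hybrid trick, a small sketch (the filename is an example) that peeks at an image's first sector: a strict ISO 9660 image keeps its system area zeroed, while a hybrid one carries MBR boot code there, ending in the 0x55AA boot signature:

    def looks_hybrid(path):
        # A plain ISO 9660 image normally has a zeroed system area; a hybrid
        # image has MBR boot code plus the 0x55AA signature at offset 510,
        # which is what lets it also boot as a disk drive.
        with open(path, "rb") as f:
            sector0 = f.read(512)
        has_mbr_sig = sector0[510:512] == b"\x55\xaa"
        has_boot_code = any(sector0[:440])  # non-zero bytes in the boot-code area
        return has_mbr_sig and has_boot_code

    print(looks_hybrid("ubuntu-24.04-desktop-amd64.iso"))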


Ventoy and flash tools should in theory support img files just fine; if anything, for virtual machines img files should be easier to boot than ISOs (no need to emulate a CD drive).

Modern Linux ISOs are a sort of hacked hybrid ISO/IMG, where keeping support for burning to CDs (the ISO part) has some trade-offs (such as the workarounds needed for persistent storage and multiple partitions).


"In theory" being the key word here. For whatever reason, it's a pita to use raw images with both virtualbox and VMware. You have to resort to third party command line tools to convert the image (qemu-img).

Exactly. And it's not like they'd need to ship two versions of the installer; a single hybrid ISO that works both ways is what basically every other distro already does.

Most Linux distributions ship ready-made VM images which are easier to turn into VMs. See https://news.ycombinator.com/item?id=40610332

I could never remember all that and I would always have to refer to documentation to know how to write out the full commands.

On the other hand, I can download an ISO for almost any popular Linux distro and easily install it without reading a single word of documentation, even if I've never used that distro before.


Yes, these commands have way too many options for anyone to remember. The commands are best wrapped up in a script. An ISO install is usually a manual process that can take time, but it is definitely easier.

Wait, what? You don't need to learn complicated commands.

Just interactively set up a virtual machine, and at the screen that offers you the choice of disk size, select the option to use an existing disk and pick it.


Not complicated at all, but it's still easier to reboot to bare metal if you're doing it right and all you want to do is run one OS at a time.

It is e.x.a.c.t.l.y the same. Instead of selecting an image for the virtual DVD drive, you select an image for the hard drive, or an image for a USB drive. It is as fast and as simple either way, with the advantage of not having to go through the install process if you just boot, for example, the hard drive image. The downside is possibly download size compared to, say, downloading a netboot ISO that will install the latest packages from the very start.

Good to get your message, upvoted.

Sounds to me like you're one who is doing the VM right, as effectively as anybody.

I wonder what people think about this:

When I was first doing VMs, I would have the image in a file and it had to be stored somewhere.

All I would have in the file would be an OS and maybe a couple of apps installed.

Really only takes up a few GB of drive space as long as I don't try to store any massive or valuable data within the image.

I wouldn't want that kind of data in my images anyway.

And I had plenty of drive space so I didn't need any compression or space-saving measures to be applied for storing images.

In this case the working image takes up the exact same space on some drive either way.

Might as well let the image stay installed on its own partition, so I could boot into it the bare-metal way when I wanted to, or alternatively use it as a VM after rebooting into a powerful enough host OS previously installed on a different partition.

Now in the terabyte world I've got more spare drive space than ever, and with GPT drive layout I can have as many partitions as I want.

Definitely learning as I go.


ISOs make little sense over a regular disk or filesystem image for just about every use case except burning to optical media, a use case I understand to be quite rare (but not completely gone) nowadays.

I know nothing about Ventoy, though.


Optical media was designed for distribution; ISOs and their use cases evolved around the same goal.

A good example is VirtualBox Guest Additions that packs several drivers for multiple OSs in a single ISO and leverages the autorun mechanism to simplify automation for end users.


> ISOs make little sense over a regular disk or filesystem image for just about every use case except burning to optical media ...

Uh, what!? Just about every single Linux distro has .iso files available.

And it typically makes way more sense to dump the .iso to a USB stick than to optical media, because it speeds up the installation big time. That's the case for Ubuntu, for example.

And as an example of why optical media are kinda falling into irrelevancy: some distros, like Ubuntu, ship an ISO that doesn't even fit on a DVD anymore. It is an ISO meant to be dumped to a USB stick, not optical media.

Now of course the downside is that there is something very nice about burning an .iso to write-once optical media: once you've verified that the disc's cryptographic checksum is correct, you know the disc's content won't change, so there's no need to recheck it (at least not for security purposes).

With a USB stick, that's not the case.
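
The verification step itself is the same for a file on a stick as for a disc; a minimal sketch of checking an image against a published checksum (the filename and digest are placeholders):

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        # Stream the image through SHA-256 so multi-GB files
        # can be verified without loading them into memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    published = "0123...abcd"  # from the distro's SHA256SUMS file, for example
    print(sha256_of("ubuntu-24.04-desktop-amd64.iso") == published)

The difference is that a write-once disc only ever needs this once, while a writable stick can drift after the check.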

> I know nothing about Ventoy, though.

...


> Uh, what!? Just about every single Linux distro has .iso files available.

That doesn't really mean much. The point is that there are practically only disadvantages to ISOs, and using image files instead would make more sense nowadays.


> The point is that there are practically only disadvantages to ISOs

Except burning to "legacy" optical media.

That can't be altered once burnt. If I could write it on a stone tablet I would (more durable).

Also, the author comes off as arrogant/rude, calling the people who miss ISO "older members". Maybe I'm "old", no longer being a twenty-something, but I'm also not (yet) "old".


If you wanted to operate using IMG files as reliably as ISOs, what standardization body would you trust the IMG files to conform to?

I guess it depends on how sensible different levels of reliability are for your particular application.


https://bkhome.org/news/202112/why-iso-was-retired.html

I mean, the author has gone into depth about this.


Very enlightening.

Absorbed it at the time and followed up with experimentation ever since.

Edit: experimentation was underway long before this was published.

There's also a Part 2:

https://bkhome.org/news/202112/why-iso-was-retired-part-2.ht...

>I have whittled away at the use-cases in favour of using the iso file,

Very well done, and I would rather not have to use ISOs myself, since the only remaining thing they are perfect for is distribution, where it doesn't look like they will be beaten for a very long time.

I would rather not touch ISOs any other time, but I'm going to have to maintain the skill anyway.

There's definitely nothing better for Windows than ISOs and Windows is huge with very sophisticated imaging built-in.

Kauler just happens to make an even more sophisticated IMG than most would; for this distribution I can handle it with no drawbacks compared to a standard ISO file, and it works way better than a funky "hybrid" ISO.

