17 USC 106 gives copyright holders exclusive rights to reproduce and distribute copies; no exemption exists for downloading digital copies because you own the physical book, and fair use (17 USC 107) is unlikely to apply when commercial alternatives exist and you’re copying entire works from unauthorized distributors.
> you’re copying entire works from unauthorized distributors
Yep, this sounds like an issue. So the idea from the early MP3 days of "let me download these files as a backup before I lend my CD collection to my cousin" is not a real option.
Your observations for using a vector DB for retrieval-augmented generation are consistent with my own.
For my applications, I use pgvector, since I can also use fulltext indexes and JOINs with the rest of my business logic, which is stored in a Postgres database. This also makes it easier to implement hybrid search, where the fulltext results and semantic search results are combined and reranked.
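The "combined and reranked" step can be sketched with reciprocal rank fusion (RRF), one common way to merge a fulltext result list with a vector search result list; the document IDs and result lists below are hypothetical, standing in for the output of a fulltext query and a pgvector nearest-neighbour query:

```python
def rrf_merge(fulltext_ids, semantic_ids, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    Documents appearing near the top of either list (or in both) float up;
    k=60 is the constant commonly used in the RRF literature.
    """
    scores = {}
    for ranked in (fulltext_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists, best match first:
fulltext = ["d3", "d1", "d7"]   # e.g. from a tsvector/tsquery search
semantic = ["d1", "d5", "d3"]   # e.g. from an embedding <-> ORDER BY query
merged = rrf_merge(fulltext, semantic)
```

The appeal of RRF here is that it only needs ranks, not scores, so you never have to put BM25 scores and cosine distances on a common scale.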
I think the main selling point of standalone vector databases is scale, i.e., when you have a single "corpus" of over 10^7 chunks and embedding vectors that needs to serve hundreds of req/s. For my application, the overhead of maintaining a separate database that has to be kept in sync with the primary one didn't make sense.
It's an incentives problem. At research universities, promotion is contingent on research output, and teaching is often seen as a distraction. At so-called teaching universities, promotion (or even survival) is mainly contingent on throughput, not on measures of pedagogical outcomes.
If you are teaching faculty at a university, it is against your own interests to invest time in developing novel teaching materials. The exception might be writing textbooks, which can be monetized but is typically still a net-negative endeavor.
Unfortunately, this is a problem throughout our various economies, at this point.
The professionalization of management, with its over-reliance on quantitative simplification - on "metrics" - along with the scale this enables and the way it obscures fundamental issues, tends to produce exactly these results. This is, of course, well known in business schools, and efforts are generally made to ensure graduates are aware of some of the downsides of "the quantitative." Unsurprisingly, over time there is a kind of forcing that drives these systems toward the results you describe.
It's usually the case that the imposition of metrics, optimization, etc. - "mathematical methods" - is quite beneficial at first, but once systems have been improved in sensible ways based on the insights gained, less desirable behavior begins to emerge. Multiple factors, including basic human psychology, play into this ... which I think is getting beyond the scope of what's reasonable to include in this comment.
There's a tool I use called Petal https://www.petal.org/reference-manager. The free tier allows up to 1GB of PDFs, which I believe are processed by GROBID and chunked for LLM QA.
The feature I find most useful is the table automation which I use for literature review, since it lets me run the same QA prompts on a collection of documents all at once.
> > VUDA only takes kernels in SPIR-V format. VUDA does not provide any support for compiling CUDA C kernels directly to SPIR-V (yet). However, it does not know or care how the SPIR-V source was created - may it be GLSL, HLSL, OpenCL.
So the answer is no, it can't be used with kernels that rely on cuBLAS or cuDNN, which rules out almost all ML use cases.