kernelsanderz's comments

For another library that has great performance, plus features like full-text indexing and the ability to version changes, I’d recommend LanceDB: https://lancedb.github.io/lancedb/

Yes, it’s a vector database, so it brings more complexity. But you can use it without creating any indexes, and it also has excellent zero-copy Arrow support for Polars and pandas.


Since a lot of ML data is stored as parquet, I found this to be a useful tidbit from lancedb's documentation:

> Data storage is columnar and is interoperable with other columnar formats (such as Parquet) via Arrow

https://lancedb.github.io/lancedb/concepts/data_management/

Edit: That said, I am personally a fan of Parquet, Arrow, and Ibis. There are so many data wrangling options out there that it's easy to get analysis paralysis.


Lance is made for this stuff; parquet is not.

How well does it scale?

Also worth checking out https://github.com/jasonwhite/rudolfs

I've been using it to store datasets via Git LFS. It's written in Rust and has been very reliable.


I’ve been using https://github.com/jasonwhite/rudolfs, which is written in Rust. It’s high-performance but doesn’t have all the features (such as auth) that you might need.




Some heroes don’t wear capes. They wield scripts, API calls, and a bit of luck.


100%


The actual podcast interview is here: https://youtu.be/dzQlRt3y5mU?si=qfS0DlPVcjBn-0zd


I use JupyterLab via an SSH tunnel.
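In case it helps, here is a rough sketch of the tunnel setup as a small Python helper that builds the `ssh` command. The host `user@gpu-box`, the helper name `jupyter_tunnel_command`, and port 8888 are placeholders I picked for illustration. It also assumes JupyterLab is already running on the remote machine, started with something like `jupyter lab --no-browser --port 8888`.

```python
import shlex


def jupyter_tunnel_command(
    host: str, local_port: int = 8888, remote_port: int = 8888
) -> list[str]:
    """Build an ssh argv that forwards a local port to JupyterLab on `host`."""
    return [
        "ssh",
        "-N",  # run no remote command, only forward ports
        "-L",
        f"{local_port}:localhost:{remote_port}",  # local port -> remote localhost:port
        host,
    ]


if __name__ == "__main__":
    cmd = jupyter_tunnel_command("user@gpu-box")  # hypothetical host
    print(shlex.join(cmd))  # prints: ssh -N -L 8888:localhost:8888 user@gpu-box
```

Run the printed command in a terminal (or pass the list to `subprocess.run`), then open `http://localhost:8888` in a local browser.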


Is anyone else not seeing this in their console? I still only see the old Sonnet.


Non-paywalled link: https://archive.is/lCdK3

