LanceDB! We love it.


What made you go with lance?


The performance was the best of the couple of options we evaluated, both in terms of scale and latency. I also like the Arrow/dataframe interface. We use Arrow everywhere else at Shaped so it was a natural integration.
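To give a flavour, here's a minimal sketch of the kind of Arrow-native workflow we mean (illustrative, not our production code):

    import lancedb
    import pyarrow as pa

    # Connect to a local LanceDB database (object storage also works).
    db = lancedb.connect("./embeddings")

    # Ingest an Arrow table directly -- no format conversion step.
    schema = pa.schema([
        pa.field("id", pa.int64()),
        pa.field("vector", pa.list_(pa.float32(), 2)),  # fixed-size vectors
    ])
    items = pa.table(
        {"id": [1, 2, 3], "vector": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]},
        schema=schema,
    )
    table = db.create_table("items", data=items)

    # Vector search; results come back as Arrow too.
    results = table.search([0.1, 0.2]).limit(2).to_arrow()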


Yes, when integrating Shaped you connect the data sources needed for ingestion: interactions, items and users. The Shaped interface then lets you select which exact fields should be used for creating a Shaped model. We provide a full SQL interface to do this, which gives a lot of flexibility.
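For example, a field-selection query might look something like this (a hypothetical sketch -- the table and column names are made up, not an actual Shaped schema):

    # Hypothetical illustration only -- not Shaped's real schema.
    interactions_query = """
        SELECT user_id,
               item_id,
               event_type,
               created_at AS timestamp
        FROM analytics.app_events
        WHERE event_type IN ('view', 'click', 'purchase')
    """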

Our dashboard provides monitoring to help you understand what data has been ingested and view data quality over time. We expect customers to monitor this, but we also have alerts on our side and jump in to help customers if we see anything unexpected.

The dashboard also shows training metrics over time (how well does the model predict the test set after each retrain?) and online attribution metrics (how well does the model optimize the chosen objective?).

Customers can disable retraining if they want (which essentially pins the model version to the current one). We can do model version rollbacks on our side if we see an issue or if requested, but it's not a self-serve feature yet. Because we've made it easy to create or fork a Shaped model, we've often seen customers create several models as fall-backs that rely on more static data sources or are checkpoints of a known-good state.


Australia represent! Although we're based in NYC, we're still a mostly Aus/international team over here. It's great!

The biggest change is some of the less sexy stuff, like scale and security. E.g. we're now able to scale to companies with 100M+ MAU and 100M+ items, and we have a completely tenant-isolated architecture, with security as a top priority.

We've also made the platform more configurable at lower levels, and we've found that people like choosing their own models and experimenting rather than just relying on our system.

Finally, we launched search only a couple of months ago and are currently heavily focused on building a best-in-class experience there.


Thank you! Would love to catch up sometime assuming you're in NYC with the rest of the Pinecone team!

Yes, by 100M+ users we definitely mean end-users. It wasn't intentional to mislead, so thanks for flagging -- we'll update.


Compared to Vespa, we're much easier to get set up on. A big part of this is that we have real-time and batch connectors to all the leading CDPs and data warehouses. E.g. if you're on Amplitude it takes < 10 minutes to stream data directly to Shaped and start seeing initial results.

Being quicker to set up also means it's quicker to build and experiment with new use-cases. So you can start with a feed-ranking use-case the first week and then move to an email-recommendation use-case the next week.

In terms of actual performance and results, we've never gone head-to-head in an A/B test, so I'm honestly not sure of the specifics there!


Thanks, so it's the connectors -- a nice differentiator. Seamless integrations are harder than they seem.


The short answer is: we're better at recommendations and personalization and lean towards more technical teams (e.g. those with data/ML experience). They're better at traditional search and, these days, lean towards less technical teams.

Longer answer is in our blog post about it: https://www.shaped.ai/blog/shaped-vs-algolia-recommend :)


Cool! Any live demos we can try?


Yes, play.shaped.ai! We just opened that up without a login gate for this post. Let me know what you think. I should also mention that these demo models run on our cold tier so that nothing breaks; in production there's a big speed-up.


We have a library of about 100 algorithms which you can choose from, or by default we automatically choose one based on your objective.

The majority of them are open-source models we've forked and improved. As an example, we integrated gSASRec last week (https://github.com/asash/gSASRec-pytorch) and added a couple of improvements around scale and the ability to use language and image features. We use LLMs for encoding unstructured data, and we host these ourselves, although OpenAI and Gemini are used for error message parsing and intelligent type inference -- things not on the real-time path.
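To sketch the unstructured-feature idea, here's roughly what encoding item text into dense features looks like, using the open-source sentence-transformers library as a stand-in (we host our own models, so this is illustrative rather than what we actually run):

    from sentence_transformers import SentenceTransformer

    # Stand-in encoder -- Shaped hosts its own models internally.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    item_texts = [
        "Wireless noise-cancelling headphones",
        "Waterproof trail-running shoes",
    ]
    # Dense text embeddings become extra item features for the ranker.
    text_features = encoder.encode(item_texts)  # shape: (2, 384)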

More info here: https://docs.shaped.ai/docs/overview/model-library


Thank you!

Would love to chat; we've had several customers come over from Algolia and they've seen significant uplift. I can share more if you want to message me at tullie@shaped.ai.

Our pricing is competitive with Algolia's, to give you an idea there. We really wanted to get a pricing calculator done before this post but ran out of time. Keep an eye out for it over the next month!


> pricing is competitive with Algolia's

Not sure if that's good or bad, based on HN complaints about Algolia's pricing.


The build vs. buy decision does come up, but like you mentioned, the product direction of Shaped is to provide primitives for search and recommendation, so users who want to build can use Shaped to build quicker (e.g. integrating it behind their pseudo-recommendation engine). In truth, we have multiple levels of abstraction in Shaped, allowing more technical teams to integrate like this and less technical ones to have more of an end-to-end integration experience.

The other related market trend we think about here: recommendation is going through a similar journey to what search did 10 years ago. Search at some point leaned towards build, but over time the technology became democratized, and then companies like Elastic and Algolia had offerings that pushed search to lean towards buy. We're seeing recommendations go through the same revolution now that the technologies and system design (e.g. 4-stage recommenders) have solidified. It's the data that makes these systems unique between companies, not the infrastructure or algorithms.
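For anyone unfamiliar, the 4-stage design looks schematically like this (the helper functions are illustrative placeholders, not our actual code):

    # Schematic sketch of a 4-stage recommender. retrieve_candidates,
    # is_eligible, score_model and apply_business_rules are placeholders.
    def recommend(user, k=10):
        # 1. Retrieval: cheaply pull a broad candidate set (thousands).
        candidates = retrieve_candidates(user, limit=1000)

        # 2. Filtering: drop items the user shouldn't see
        #    (already purchased, out of stock, policy rules).
        candidates = [c for c in candidates if is_eligible(user, c)]

        # 3. Scoring: a heavier ML model ranks the survivors.
        scored = [(c, score_model(user, c)) for c in candidates]

        # 4. Ordering: apply business logic (diversity, boosts, pins)
        #    before returning the final top-k.
        ranked = apply_business_rules(sorted(scored, key=lambda s: -s[1]))
        return [c for c, _ in ranked[:k]]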


Thanks for the first question!

We run online A/B tests to objectively measure quality against our ranking algorithms and other baselines. As you mentioned, it's crucial that the measure of quality chosen for these tests is fair and correlates with the topline business objective. E.g. if you just evaluate clicks, then the system will show click-baity content and perform worse overall.

To handle this, we make it really easy to define different objectives and experiment with how each changes results. So although we don't claim to solve the issue directly, we believe that if users can quickly experiment with different proxy objectives, they'll be able to find the one that correlates with their topline objective quicker.
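As an illustrative sketch of what we mean by a proxy objective (the event names and weights here are made up):

    # Illustrative only: a proxy objective that blends several signals,
    # so optimizing it doesn't collapse into pure click-bait.
    OBJECTIVE_WEIGHTS = {
        "click": 0.2,
        "long_view": 0.5,   # e.g. watched/read past a threshold
        "purchase": 1.0,
    }

    def session_reward(events):
        """Score a session's events under the chosen proxy objective."""
        return sum(OBJECTIVE_WEIGHTS.get(e, 0.0) for e in events)

Swapping in a different weighting is a one-line change, which is what makes it cheap to test proxies against the topline metric.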

