Hacker News | aarnphm's comments

Hi srameshc, the core of BentoML is still considered MLOps. A lot of our customers are pretty much MLOps users. However, LLMOps seems like a natural progression of the product, given that a lot of our users now want to experiment/build with LLM-based services.


Thanks for the recommendation, I'm actually working on something similar for this part of the docs (I'm also working at BentoML).


You can pretty much make the same argument about Docker. Docker abstracts away runc, runc abstracts away cgroups.

I don't think calling it abstraction is correct. OneDiffusion is designed to be opinionated and to help users run diffusion models easily. It bakes in best practices, default options, and optimisations so that developers can move faster.

You can also make that argument about debugging any AI application in production. If you were just writing a simple web server to run this model behind an API, there is a plethora of things to think about: scaling, resource utilisation, load balancing, batching, etc. Hence the argument for creating a standardized interface for these types of serving problems.

We haven't even touched k8s, throughput, latency, or serving optimisation, but you get the idea of building this from scratch versus using a library such as OneDiffusion.

Of course we are also working to improve the library and include more features, so looking forward to hearing more constructive feedback!


I also think Docker is an excellent comparison, but the way I see it, it actually reinforces my point.

Docker does not change the API. I can take any simple program and run it with Docker, no modifications needed. That also implies that Docker is fully optional. If I ever need to, I can just run the program outside of Docker. Or I could run gdb inside the Docker container, so it adds no complexity to debugging.

OneDiffusion is exactly the opposite. To use it, I need to rewrite my app for its API. Once done, I'm 100% locked in and it won't work without the framework anymore. And if there are any issues, I always have to check all the OneDiffusion source code, too, because it is impossible to know a priori whether the issue was caused by my code or by the framework.

Just imagine if Docker would require you to recompile your OS from scratch for every update: only hardcore Gentoo fans would use it. But that's the level of commitment that OneDiffusion asks of me.


Currently on main, 8-bit and 4-bit quantization are supported.

One can simply do

```openllm start falcon --model-id tiiuae/falcon-40b-instruct --quantize int4```

Beware that there is no free lunch, meaning the quality of inference will degrade a lot when using int4 quantization.
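
To get a feel for why 4-bit costs more quality than 8-bit, here is a rough sketch. It assumes plain symmetric round-to-nearest quantization on made-up Gaussian weights, which is not necessarily the scheme OpenLLM or its backends use; the helper names are hypothetical.

```python
import random

def quantize_dequantize(weights, bits):
    """Round-trip weights through a symmetric uniform grid with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    # Snap each weight to the nearest grid point, then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)

random.seed(0)
# Synthetic stand-in for a weight tensor (assumption: small zero-mean Gaussian).
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
print(f"int8 mean abs error: {err8:.6f}")
print(f"int4 mean abs error: {err4:.6f}")
```

With only 15 usable levels instead of 255, the 4-bit grid is much coarser, so the rounding error per weight is correspondingly larger. Real quantizers use smarter tricks, but the direction of the trade-off is the same.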


Hi all, I'm the main maintainer from the OpenLLM team here. I'm actively developing the fine-tuning feature and will release a PR soon enough. Stay tuned. In the meantime, the best way to track the development workflow is on our Discord, so feel free to join!


Thanks for the great project! Any chance your team might consider a more open platform than Discord for posting updates? I personally find Discord hard to use, and there's no way to have a sensible subscription (like RSS). Discord is usually muted.


Discord is a black hole where information goes to die. Its search and scrollback are awful. It's awful at being an archive, as finding anything that was asked more than a day or two ago is impractical.

To use Discord in good faith and with open eyes, you have to prioritize communication in the present, and give up hope of archiving anything that was said for people who might need the information in the future.


Discord is just a rich IRC replacement. You can log and search in IRC too, but nobody seriously tries to archive information for later research. And the big difference is that it's all closed and operated by one entity that can change conditions at will. Don't even try to use it for anything other than real-time chat.


"Discord is just a rich IRC replacement"

That's only half true. Yes, Discord does allow a "rich" chat experience, with channels and servers, but there the similarities end.

IRC is based on an open protocol, with many open source clients available for it, and a decentralized server infrastructure.

Discord is closed and centralized, with only a single client available for it.

You can easily log IRC channels, but there is no easy way to do that on Discord, if it can be done at all.

I've logged every channel I've ever visited on IRC, and I can use powerful text tools to regex search through all of my conversations on IRC and have the results appear instantly. Nothing remotely like that is possible with Discord.
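
For anyone who hasn't tried it, the kind of search I mean looks roughly like this. It's a minimal sketch that assumes one plain-text `*.log` file per day in a single directory; the layout, file names, and the `search_logs` helper are all made up for illustration.

```python
import re
import tempfile
from pathlib import Path

def search_logs(log_dir, pattern):
    """Yield (file name, line number, line) for every log line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, lineno, line.rstrip("\n")

# Demo on throwaway logs so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2023-06-19.log").write_text(
        "12:01 <alice> how do I load the model on GPU?\n12:02 <bob> lunch?\n",
        encoding="utf-8",
    )
    Path(tmp, "2023-06-20.log").write_text(
        "09:30 <carol> CPU works, gpu fails with OOM\n",
        encoding="utf-8",
    )
    hits = list(search_logs(tmp, r"\bgpu\b"))

for name, lineno, line in hits:
    print(f"{name}:{lineno}: {line}")
```

In practice `grep -ri` over the log directory does the same job; the point is only that the logs are plain files you own.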

Paging through IRC logs is virtually instant on a modern terminal, while Discord makes you wait a long time between every other page load, so if you need to look through more than a handful of pages it's incredibly slow and painful.

Some IRC channels have their logs published on the web, making them fully searchable through web search engines, but to my knowledge no Discord channels do that.

What happens in Discord stays in Discord.


Grepping through IRC logs has a 10x better UX


Furthermore, you risk getting banned for deleting messages you wrote in the past


For gaming communities (where you'd use voice chat), Discord was great. Easy to set up, free as in beer, runs in the cloud. The alternatives back in the day did not have these features. They were either expensive (Ventrilo), or bad quality (Ventrilo and Skype latency/quality), or proprietary (only Mumble wasn't; TeamSpeak, Ventrilo, etc. were), or lacked community features (Ventrilo), or had very archaic ones (TeamSpeak, Mumble), or you'd have to self-host (all but Ventrilo). It was also before GDPR existed. So Discord happily used and abused that unique position.

It's a shame it's being used for general communities that don't use or need the voice chat feature. Especially when it's an official community for a place, given their stance on third-party clients and privacy issues.

If you don't need voice chat, Zulip, Mattermost, Revolt, Discourse, and many others would suffice (Linen recently got featured on HN). If you do, I think even Signal would suffice these days.

For Discord search, Answer Overflow was recently featured on HN [1].

[1] https://news.ycombinator.com/item?id=36383773


agreed.


I find their search amazing. What's your issue with it?


Here's just one issue:

They stem words aggressively, so searching for "repeater", which is a less common, specific term, gives you results including "repeat", a commonly used word. And there's no way to do an exact word search.
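
To illustrate the effect, here is a toy sketch. The suffix-stripping stemmer below is a crude stand-in I made up; it is not Discord's actual search algorithm, which I have no visibility into.

```python
def naive_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ers", "er", "ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemmed_search(query, messages):
    """Match messages containing any word with the same stem as the query."""
    target = naive_stem(query)
    return [m for m in messages if any(naive_stem(w) == target for w in m.split())]

def exact_search(query, messages):
    """Match messages containing the query as a whole word, unstemmed."""
    return [m for m in messages if query.lower() in m.lower().split()]

messages = [
    "the repeater on the hill is down",
    "please repeat the question",
    "he repeated it twice",
]
stemmed = stemmed_search("repeater", messages)
exact = exact_search("repeater", messages)
print(len(stemmed), "stemmed hits vs", len(exact), "exact hit")
```

Because "repeater", "repeat", and "repeated" all collapse to the same stem, the stemmed search returns every message, while the exact search returns only the one about the repeater.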


The issue is it's not indexed by Google


There was a recent post about an open source tool for indexing Discord content and making it available for Google search:

https://news.ycombinator.com/item?id=36383773


have you used google lately? might as well not be indexed with all the seo spam you get as top results


> have you used google lately? might as well not be indexed with all the seo spam you get as top results

I just googled "how to use openllm" as an example to test your thesis, and the results look very relevant to me.

https://www.google.com/search?client=safari&rls=en&q=how+to+...


You might want to glance again because all of those results are for a different product.


Top of the results page says:

"Showing results for how to use openlm

Search instead for how to use openllm"


FYI, specifying the nfpr=1 query string parameter will disable Google's idiot attempt to try to be helpful by searching for something other than that which you want to link to.
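
If you build such links in code, a minimal sketch looks like this. The `google_search_url` helper is hypothetical, and what `nfpr=1` does is taken from the comment above rather than from any official documentation.

```python
from urllib.parse import urlencode

def google_search_url(query, exact=True):
    """Build a Google search link, optionally adding nfpr=1 to keep the query as typed."""
    params = {"q": query}
    if exact:
        params["nfpr"] = "1"
    return "https://www.google.com/search?" + urlencode(params)

url = google_search_url("how to use openllm")
print(url)
```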


when I click this google gives me results for “how to use openlm” a commercial product, they literally change your search term if there’s a product that fits


Related: As an operator/mod/admin it's fairly straightforward to bridge a Discord channel to Matrix (and, if one so desires, from there to IRC), allowing users not on Discord to participate. Conservative mods concerned about spam can start with an allowlist for which servers can join.

https://github.com/matrix-org/matrix-appservice-discord


I know this isn't a great time for reddit, but I just made this on your behalf:

https://www.reddit.com/r/OpenLLM/

I much prefer the HN/Reddit discussion format to Discord and even Stack Overflow.


plugging the open source and self hostable https://revolt.chat which i've found to have great UX and be very performant compared to discord.


I'm liking revolt. Thanks for the suggestion.


good alternative: https://www.linen.dev/


s/rd/urse/g


HAHA this was one of my panel interview questions at Goooog'

Q: "How do you do a search and replace for a string in VI"

Me: "I can't recall right now, I'd just google it"


What an insulting interview question, I hope it was just in jest or at the end looking to pad the time

However, it did make me realize hidden therein is an actual interesting interview question, similar to the "describe what happens when you type an address into the browser's URL bar and hit enter": describe what happens after you type `:s/foo/bar` and hit enter. Followup version: what about `:%s/foo/bar`? The kind of thing that can be interesting to watch them reason through even if they don't know the answer, or even know what those syntaxes do.
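
For anyone who wants to check their reasoning, here is a small Python simulation of the difference. It is only a sketch: Python's `re` syntax is not Vim's regex dialect, and the `substitute` helper and its parameters are made up for illustration.

```python
import re

def substitute(lines, pattern, replacement, current=0, whole_file=False, global_flag=False):
    """Mimic :s (current line only) or :%s (every line).

    Without the g flag, only the first match on each affected line is replaced.
    """
    count = 0 if global_flag else 1  # re.sub treats count=0 as "replace all"
    targets = range(len(lines)) if whole_file else [current]
    result = list(lines)
    for i in targets:
        result[i] = re.sub(pattern, replacement, result[i], count=count)
    return result

buffer = ["foo foo", "foo baz"]
only_current = substitute(buffer, "foo", "bar")                   # like :s/foo/bar
every_line = substitute(buffer, "foo", "bar", whole_file=True)    # like :%s/foo/bar
every_match = substitute(buffer, "foo", "bar", whole_file=True, global_flag=True)  # like :%s/foo/bar/g

print(only_current)
print(every_line)
print(every_match)

# The joke upthread, applied literally:
print(re.sub("rd", "urse", "Discord"))
```

So `:s` touches only the cursor line, `%` widens the range to the whole buffer, and without `g` the second `foo` on a line survives.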


Alt proposed answer "I'd install emacs".


Side question: why are people working on open source projects communicating through Discord so much nowadays?

are discord conversations persisted and indexed on search engines ?


I find Discord quite versatile and a bit overwhelming at the same time. As to SEO, see https://news.ycombinator.com/item?id=36383773

AFAIK most of the gamers choose it for voice chat (Anyone remember TeamSpeak?)


In Europe, TeamSpeak is still very popular.


I used to play EVE Online a fair bit, and always thought it interesting how some of the groups used Discord but only for text communications. Voice was still done over Teamspeak or Mumble.


When I played EVE, Mumble was the de facto standard for voice comms, since it supported hundreds of pilots at once, which happened many times during joint ops. XMPP was used for text chat and pings.


My understanding from asking several people, since I hate discord and want to know why people insist on using it, is that it’s a free alternative to Slack. Simple as that.

But it’s crazy, people are aggressive about Discord for some reason. I maintain an OpenAI SDK package for .Net, and I had some random person decide they wanted it to be a Discord community, so they created a Discord claiming it was the official community discord for my library, and submitted a PR updating my readme to say that it’s my project’s official Discord. They also replied to several issues and pull requests telling people to discuss it on that discord. If Discord isn’t paying this person in some guerrilla marketing tactic, they should be...


Because it's easy, free and it just works.

Very few people actually care about indexing the conversations.


So all knowledge is lost and questions have to be asked and answered again and again?


That didn't stop IRC being popular in the 1990s.

There has long been a place in the ecosystem for ephemeral chat. Often alongside non-ephemeral things like written documentation.


People didn't put documentation in IRC channels because they didn't want to answer the same questions over and over. Info went into a wiki, and you would get flamed for asking a question on IRC that was answered on the wiki. Discord is not a good place to stash documentation.


It's OK, you get scolded for asking an FAQ in many Discord "servers" as well.


> That didn't stop IRC being popular in the 1990s.

IRC chats, especially in opensource projects channels, could and would be archived, published over the web and indexed by search engines.


In my experience, I don’t think I’ve ever seen an IRC log in a search result.

#haskell on Libera is publicly logged, but I couldn't get Google to return a quoted phrase from a message a few weeks ago.

Many people on IRC don’t enjoy being in logged channels. I’ve also heard that there are GDPR implications to publicly logging people’s messages without their consent.

Discussion of the difficulty and downsides of IRC logging, from a couple of years ago:

=> https://news.ycombinator.com/item?id=22892015

=> https://web.archive.org/web/20200417001532/https://echelog.c...

The HN blowback to developers choosing to use Discord is just wildly out of proportion.


No, it's not. If you work on an open source / open development project, it totally makes sense to avoid walled gardens for the community chat/forum (a few years ago it was public Slack instances, nowadays it's Discord servers).


So just like Discord then..


I wasn't aware that was being done.

Can you show me how to access the archives of the ask-for-help channel on the openllm Discord server? Right now they're discussing "loading models on CPU vs GPU". No matter how explicit I got, google did not find the discussion.


It's up to the server owners/admins to configure archiving, same as IRC.


No


Also, monks being the only ones who could read and write didn't stop religion from being popular in the Middle Ages.

/s

C'mon


Isn't there something really nice about it though? It seems to me that most every community gradually evolves into one where every new message from a new-ish member is answered by something like "Duplicate, please search first!". And this in turn makes those newcomers either go away, become passive lurkers, or become part of the "hive-mind" (as only likeminded questions get answered).

On the other hand, if people have to actually converse to get an answer to their questions (like back in the real world), newcomers can more rapidly become part of the community, and help make it more diverse.


I just recently saw a post where someone said something similar about Reddit versus traditional forums.

There's a balance between engaging with new members and not turning it into a time sink for older members. This is probably a good use case for LLMs.


LLMs could indeed address the first part, but not the second, of bringing the newcomers in via actual conversation with the older members. The only good solution I encountered to this is of having some (preferably not too experienced) member(s) actively take upon themselves the role of welcoming newcomers and answering their questions, whether that's in an official or unofficial capacity.

This to me is the real way through this "Eternal September", where in every "cohort" of newcomers, one or more choose to stay close to the doorway to welcome and guide the next cohort.


Newcomers are also different. Some are actually experienced vs some are real newbies.

I'm wondering how we could learn from games, making the content also adaptive to user levels/experiences.

It’s prob also the key agenda in education.


The best of both worlds - a friendly community that welcomes newbies, with a searchable archive - is possible. Limiting to only chat-based support means that support is bottle-necked by the folks who are available and engaged at the time of the question, and that knowledge will "drop out" of the community as people forget it.


Apologies for my skepticism, but is it just "possible", or do you actually have an example of a long-lived community that remained fully welcoming to newbies while utilizing a searchable archive?

In any case, I'm not arguing that it's impossible, but rather that the more comprehensive the archive, the less welcoming the community would tend to be, all other things being equal. To take it to the extreme, I'll posit the following law: "A well-curated archive is the grave of a community"


Hard disagree. If anything you'll find that the most knowledgeable members get burned out answering the same questions over and over again, so they begin to simplify their answers until they just become copy pasta.

You can still have channels open to welcoming new people while at the same time having a large archive of answered questions so that over time a reservoir gets built.

Saying that the same questions getting asked over and over again by new people is somehow a more welcoming community, is like saying that there's any meaningful interaction happening when two people say "What's up?" followed by the response "not much". It's a handshake protocol equivalent without actual depth.


    I see friends shaking hands
    Saying, "How do you do?"
    They're really saying
    I love you.


A very reasonable question, and I'll admit that I'm not deeply-entrenched in enough technical communities to give you an actual example. But yeah, intuitively I do agree with the sibling commenter - a well-curated archive is a tool of technology which allows skilled respondents to preserve their time and energy for new and interesting questions. A pointer to search is not necessarily dismissive - there is a world of difference between the following _technically_ equivalent responses:

* FFS, read the fuckin' archive noob and stop wasting our time

* Hey there, thanks for asking! This is actually a pretty common question, and we have guides written up for just this case. Try entering some of your search terms here [link], and come back with a follow-up question if that doesn't help you!

But yes, in fairness, I'll certainly agree that a community which _chooses_ to respond as the former will stagnate and die.


No, questions don't have to be asked and answered again and again, because all the knowledge is lost, full stop. No one would know anything.


Not lost enough to use as a transient space for sharing secret intelligence reports.


Maybe it doesn't matter because this type of knowledge is relevant for current week only?


indexing conversations is secondary for gaming but primary for FOSS projects, and Discord sucks at that. it's like wiping your ass with a fork.


short answer: because it's one of the options with least friction to get running

a lot of people who are into tech stuff already have a discord account making joining the community a one click process, the instant nature of it seems to appeal to younger users more than async forums, it's a fairly mature platform so it has a bunch of moderation/customization/integration features you might want, etc.

> are discord conversations persisted and indexed on search engines ?

nope (and that is a drawback many point out)


Isn't it a generation thing? If I had the choice everyone would be on IRC still.


I've used IRC for a long time and still do, but I do think Discord has a nicer UX for most use cases. In particular, building communities around clusters of channels ("servers") and support for rich media (yes, some old people might call that a downside) increase the appeal for most people. It's also a lot more work to have a persistent connection on IRC (bouncers).

My main problem with Discord is that it's someone else's centralized, for-profit company and has no apparent barriers to enshittification[0]. As Reddit recently demonstrated, it's probably a mistake to build communities on top of something like that.

Matrix is a good candidate for a modern successor to IRC. It's not quite as slick a UX as Discord, but it addresses the main advantages Discord has over IRC.

[0] https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys


Matrix would be my choice as well but good luck getting people to use the uglier alternative. Discord is great.


The problem with IRC is that it's crucial to have really robust read state synchronization across desktop and mobile these days.

Slack was the first to really get that right, and Discord effectively emulated them and made it available for free.

IRC users could get there with bouncers, but those were always a lot harder to get going with.


Practically all of my friends grew up with IRC, we are in our late 30s, 40s, early 50s.

We might reminisce about irc but we all prefer discord.

Even the searchability of indexed irc has been surpassed by other knowledge sites. It would have to be something extremely niche these days where the only source of info is in an irc chat log


IRC doesn't even have history, one of the most basic requirements for a modern rudimentary chat app. It's ridiculous to suggest using it in 2023 when it doesn't have features a freshman homework assignment chat app has.


1. People like talking to each other on Discord. 2. No. :/



What's the rationale for telemetry tracking?

https://github.com/bentoml/OpenLLM/blob/main/src/openllm/uti...


They have a section about it in the README:

https://github.com/bentoml/OpenLLM#-telemetry


Very cool, btw it's not mentioned in the readme so I assume it's only for running full precision models or do quantized GGML/GPTQ/etc. also work with it?


Hi there, 8-bit and 4-bit quantization are currently supported on main. GPTQ support is a work in progress, as is GGML.


GPTQ support would be amazing (AutoGPTQ is an easy way to integrate GPTQ support - it's basically just importing autogptq and switching out 1 line in the model loading code).


How can we stay tuned if we can't do tuning? :P


Fine-tuning is coming up in the next release!

You can actually try it out on the main branch :P


Again?? I definitely have to try out BentoML then to see what the hype is about!!


BentoML is an amazing tool that allows you to quickly and easily deploy your machine learning models as APIs. It is extremely user-friendly and easy to use, and the results are amazing. I highly recommend BentoML for anyone who wants to deploy their machine learning models quickly and easily.

