Show HN: Answer Overflow – Indexing Discord content into the web (answeroverflow.com)
333 points by rhyssullivan1 on June 18, 2023 | 98 comments
Hi!

I'm Rhys, and I develop Answer Overflow, a search engine for Discord channels. Answer Overflow indexes content from channels into Google, making it discoverable on the web.

I'm sharing this again after seeing a lot of discussion during the Reddit blackout about the inaccessibility of information sent in Discord servers.

Answer Overflow is a verified bot in over 100 communities, fully complies with the Discord ToS, and is open source! https://github.com/AnswerOverflow/AnswerOverflow

Check out some of the communities here!

T3 Community - https://www.answeroverflow.com/c/966627436387266600

C# - https://www.answeroverflow.com/c/143867839282020352

Reactiflux - https://www.answeroverflow.com/c/143867839282020352

All - https://www.answeroverflow.com/browse

Please let me know what feedback you have, thanks for checking it out!




Genuine question: I love Discord, but how on earth is it possible that such functionality was not built-in to begin with?

I really don't understand how the need for indexing and search was overlooked.


I think it's due to how Discord evolved as a platform

Discord started as "your private place for your friends to talk" during a time when there were a lot of privacy issues with other communication methods.

Then as it grew beyond this scope of being a private place for friends, it would have been good for indexing to be added but indexing a normal text channel is really hard since you don't know where the conversation starts / stops to submit to a sitemap.
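
To illustrate why the start / stop problem is hard, here is a minimal sketch of the most obvious heuristic: start a new conversation whenever there is a long pause. The `Message` type, the `split_by_gap` name and the 30-minute threshold are all assumptions made up for this example, not anything Discord or Answer Overflow is known to use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    author: str
    sent_at: datetime
    text: str


def split_by_gap(messages, max_gap=timedelta(minutes=30)):
    """Group messages into conversations, starting a new one after a long pause."""
    conversations = []
    for msg in sorted(messages, key=lambda m: m.sent_at):
        # Start a new conversation for the first message, or when the pause
        # since the previous message is longer than max_gap (hypothetical value).
        if not conversations or msg.sent_at - conversations[-1][-1].sent_at > max_gap:
            conversations.append([msg])
        else:
            conversations[-1].append(msg)
    return conversations
```

A heuristic like this breaks as soon as two conversations overlap in time or one question sits unanswered for a day, which is why a plain text channel gives no reliable unit to list in a sitemap.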

Now we've got large public communities and forum channels so it's possible they roll out their own version soon, but it does still slightly go against how their product was originally created so there may be some hesitation with adding it due to not knowing what the community reaction will be like.


>Discord started as "your private place for your friends to talk" during a time when there were a lot of privacy issues with other communication methods.

Discord started as a way for gamers to chat with one another. Initially the developers even wanted to sell games directly from the platform [1].

I think it would be incorrect to position Discord as a privacy-oriented platform when the desktop client needs to be run in a sandbox because there's no real way to disable data collection.

1. https://www.pcgamer.com/the-discord-game-store-is-now-open/


This.

Discord came about because all the instant messaging services (e.g. AIM, MSN) had recently died, Skype was hot flaming gasoline garbage, Teamspeak and Ventrilo were tedious and expensive (for gamers), and otherwise there were no reliable, easy, free, convenient means of voice communication.


Mumble was the best free voice communications app and briefly took over after Teamspeak fell out of popularity.

Discord was really the best place to hang out with your friends outside of games. Social media was too distant and Steam friends didn't do enough. The old IRC chatroom / Discord format with voice comms and screen sharing is THE way to relate to friends online now.


It's beyond ironic that Discord called itself that, being the successor to OpenFeint and its privacy lawsuit scandal, and being proprietary.

Now it is one of the most privacy-hostile AND preservation-hostile platforms around.


Unfortunately, it's a natural result of Discord moving from being a useful little service to a "platform" with investors and needing to constantly be updated with useless nonsense to keep the "value" of the product alive.

Realistically, once everything was up and running, and they had moved their DB over to their current platform [1], someone should have taken the keys away from them and just said "Discord is done, it's complete". We likely wouldn't be having this much of a problem with useful information being hidden away behind Discord server invite URLs.

[1] https://discord.com/blog/how-discord-stores-trillions-of-mes...


The solution to the growth problem is incubating complementary products.


For regular growth, maybe, but not for investment-fueled hyper-growth.


Every big, lasting company has a product portfolio: Google, Amazon, Meta, Microsoft... A single product will leave you vulnerable to disruption, and the aforementioned problem.


Correct. However, I feel the specific companies you mention didn't aim for hyper-growth, and had space to diversify their product portfolio. Discord, much like Reddit, is a one-trick pony. The weird thing to me is, Reddit feels like it should've been in the Google/Amazon/Meta/Microsoft bucket - yet somehow, through all these years, it failed to diversify.


I'm not sure Discord knew what it wanted to be. Private for your friends, but not end-to-end encrypted. Chat for streamers with rooms, but not a streaming platform. (They tried. Twitch also tried to make a Discord-like desktop app.)

Now they seem to be leaning into being Slack (notice that you can switch accounts, so your coworkers don't know you're xXxedgygamer69xXx or whatever.)

My takeaway is to always be a little scared about accepting investments. Your investors will make you hire people, who will want to work on something. The end result is a Frankenstein's Monster of a product.


It wasn't meant to be "private" in the e2e sense. It was just meant to be a modern reimagining of Ventrilo/TeamSpeak, which were group chat programs for gamers. The video call stuff came after (since it was strange to not have video if you were already voice chatting in the year 201X). They want to branch outside of servicing online gaming communities but that was their origin, and you can still see how that culture affects their product design today. A good way of understanding the product is to just ask "would online gaming communities use this?".


Not everything should be preserved forever. It's actually really nice to be able to talk online and not have it form a permanent record that can be instantly referenced by anyone.


Correct. However, some things should be preserved forever and accessible. It's really nice to be able to put a piece of scientific jargon, or a text of an error message, into a search box, and get back links and references to material you can peruse at your own pace and discretion - as opposed to having to join a closed community and keep asking people, hoping someone who knows the answer and is willing to help spots the message before it disappears in the flood of ongoing conversations, and monitoring said flood so that you catch the answer in time to ask follow-up questions, etc.

Point being: different needs require different tools. Current trend is doing everything in closed, ephemeral groups.


Also, there's the perpetual issue of people who just want to be assholes to other people on line and not actually help. This has existed in internet based chat platforms and is why they will never be better than documentation.


> Also, there's the perpetual issue of people who just want to be assholes to other people on line and not actually help. This has [always] existed in internet based chat platforms and is why they will never be better than documentation.


docs/man pages should still be made, and people who are assholes should be banned.


It's not made for knowledge discovery; it's for gamers. Just look at that busy UI! The content is assumed to have no historical value.


But because it's free, and forums suck, people are using it for knowledge


Forums don't suck; it's about accessibility. The biggest strength of Discord is probably that they let you discover communities and even take part in the conversation without signing up. It's just like IRC back in the day: choose a username and off you go, and if you like it, converting the temp account to a normal one is just 2 clicks.


Forums are terrible compared to Usenet, multiple clicks to see new posts, at least Discord has alt-shift-up/down


It makes no sense to index the vast majority of content. You would need to cherry pick really hard among all the noise to find the stuff worth putting online.


I would argue it makes no sense to index the vast majority of content without good search. If your search is good enough, you can index everything and then surface only the good stuff at query time.


Indexing trillions of messages is outrageous.


That's why "archive to the web and let Google handle it" is a good solution.


Interesting comment. I would think Reddit is similar in terms of content, yet “site:reddit.com <query>” is common as a general search pattern (pre-blackout)


Discord is made for realtime chat and instant messaging. People don't put a lot of thought into the vast majority of what they write. The format of Reddit encourages a somewhat slower conversation and therefore more thought-through comments. You barely even have time to edit a message before it's already irrelevant in Discord.


That depends a lot on the channel. Some channels are exclusively monthly public announcements, for example of new versions of software, bugs, or conferences. Or job opportunities with instructions on how to apply.

Some channels contain a high level of knowledge in the same way as a forum, knowledge you won't find posted again, perhaps because the person who posted it no longer works at the company. Some channels are used for work purposes, similar to Slack and MS Teams, and contain async information across timezones about product development, or bug-investigations updates, or design discussions, which are more real-time but valuable to be able to find again. Some channels contain the information needed to enter and participate in multi-month long competitions. The information is not posted anywhere else, even though that would be useful and well suited to a blog or issue tracker.

Some places are using Discord not because it's their favourite thing, but because many other companies and projects in the same field are now using Discord, so it's become expected by users (like Twitter). After all nobody wants to install yet another application, but if everyone in a field already uses Discord for work, then that's what you have to use as well.

My first year of Discord was entirely because it was required by my job, as it was the way everyone communicated at that company.

The point is it was the only way everyone communicated at that company aside from a negligible number of emails (maybe 10 emails in a year), and some Telegram channels with outside parties. So nearly every formal announcement and work-related message, as well as real-time chat went through the company's internal Discord server. When I occasionally searched the company dev channels I found a treasure trove of relevant technical knowledge I couldn't find anywhere else. Knowledge I couldn't get in a reasonable time by asking current people. But being Discord search rather than a web page, it was really hard to keep track of things found, and that knowledge was effectively lost.

At my current job we use Teams the same way the last company used Discord. Teams is worse.


Discord has 'indexing' and search, just like how Slack does. It's just not on the public & open web - only searchable inside of Discord.


Pretty sure that wasn't the spirit of what GP was asking


> I really don't understand how the need for indexing and search was overlooked.

It wasn't overlooked. The point is to make it difficult for outside users to access information unless they sign up.


What I wonder is why anyone who cares about archiving/search would choose to use Discord.


Apparently because it is very easy to set up and offers a place where people can join.

More and more open source projects are using it and I don't really like it, but what easy alternatives can you recommend to them?

Genuine question, as it is an open issue for me. I want to focus on my project, not on setting up and maintaining a forum, mailing lists, etc. on top of that.


> what easy alternatives can you recommend to them

Github discussions.

A project I follow moved all their Q&A from discord and it's a joy to both search and ask now.

Discord is actively hostile to anyone simply wanting to browse.


I would much rather engage with a community on Discord than on Github discussions/Reddit/a forum. The inability to easily browse historical posts is a feature to me that makes it much less risky to ask questions and join a community. I don't have to spend a lot of time searching and effort to phrase my questions just right to avoid getting yelled at by someone that I should have used the search instead of posting.


But then instead of putting X amount of effort up-front to do search, you have to put in 10X amount of effort to join a community and ask someone about the thing. It's not a good trade, IMHO.


In my experience, most people on GitHub discussions seem to be more than happy to help out, and over time it also builds up a reservoir of information that anyone else can tap into. GitHub discussions search is relatively good.

Whereas on discord, you might get a great answer which helps you and anyone who is immediately following that conversation but that's it.


Gitter is an ignored solution.


Those are the biggest aspects, but the private aspect is actually an important part of why many open source projects are using it. They don't really want to be having those sort of conversations in github issues or the discussions page or anywhere indexable.

Like getting on a call with a coworker or a collaborator on the project. You can technically record it or leave it open to the public, but most people do not because they just see it as not "informing the public" worthy.


I don't know either, but I'm not the Discord target audience. I've tried but it gives me a headache like a Vegas casino.

I'd rather see something simple and plain (and functional) like phpbb.


Matrix, but that doesn't really fix the searchability problem.


Discord is a chatroom first. What non-enterprise chat comes with archives?

A forum is totally different.

And even then, forums weren’t designed to be archived from the start. People just wrote web crawlers and search engines.

(I know Discord has some forum-like functionality now but the point stands.)


Discord does have search, but I really hope they do not improve it.

The lack of good search really prevents the hostility towards new users that you often see on Reddit/forums where every question is instantly answered by a one liner "use the search" reply.

Discord communities are some of the most friendly and welcoming communities I have ever encountered on the internet. I think a large part of it is the chat nature and inability to easily pull up old comments.


I always use discord search to find answers, but a lot of people don't bother. I'm not sure answering the same noob questions over and over is fun for anyone


Maybe not, but it makes a community feel very welcoming


I'd rather have the tools to find things without having to interact with humans.


No one is automatically entitled to get the output of a community while sitting on the sideline, although a community may choose to make it available.

Perhaps you should consider getting your answers from ChatGPT in the cases where a community has decided to be for themselves instead of the greater internet.


> No one is automatically entitled to get the output of a community while sitting on the sideline, although a community may choose to make it available.

That's the difference between the "cozy web" and the web 10-15 years ago. The communities used to grant everyone the right "to get the output of a community while sitting on the sideline" by default; in fact it was weird to gatekeep things that could be useful to others - the only thing that was gatekept was "write access", and it was done based on attitude, not on someone's ability to do networking and invest a large amount of time on an ongoing basis.

Of all groups, it's both ironic and sad to see this happening to open source projects in particular. Those projects owe their very existence - and people participating in them owe their skills - to that open, indexed, no-strings-attached knowledge-sharing culture.


The alternative here is that people are more of a burden on the community, not less. It's more disruptive to have noobs asking the same questions and having to be answered by community members every time. At least from where I sit.

I don't think asking noob questions makes you part of the community in the way it matters here - actually contributing.


Rhys - are you sure the consent functionality is working? I'm seeing indexed posts by users who are in a time zone that makes it very unlikely they have consented in the last hour or so.

The one user whom I contacted said they had never clicked the green consent button.

EDIT - turns out those posts were only visible to me when I was logged in to both sites (which makes sense).

It wasn't obvious this was the case and checking incognito shows things correctly.


Glad we got this resolved and it was all working properly. The site does need to do more to make it clear when you're viewing a private message while signed in. I've added it to the backlog, sorry about that!


While I see the value here, I don't really think most Discord communities are appropriate to be indexed. It breaks the whole cozy web aspect of it. [1]

[1] https://maggieappleton.com/cozy-web


The "cozy web" is out of control these days. A lot of social utility is lost by default because everyone uses Whatsapp and Discord and other such information black holes, places where knowledge goes to die. It's OK if you're using these to chat with your family or friends, but it's kind of... less OK, when every open source project these days, including major programming languages, tells you to join their Slack or Discord for support and learning.

What's happening is that these "communities" demand that you commit first, and refuse to provide value to passive participants. If that sounds reasonable to some, let me point out that the entire value of the Internet is built on doing the opposite. Wikipedia, Reddit, StackOverflow, everything that you can find through a search engine - those are all resources made available by people and groups that, for various reasons, decided to share knowledge instead of hoarding it, and to invite passive participation instead of demanding active commitment. The good days of the Internet, the ones people mourn, back before it got fully commercialized? They were built on the sentiment of openly sharing information, giving it away "pay it forward" style - not gate-keeping it in webs of trust, and/or demanding that people pay with effort.

Maybe I'm too old, but I hate the "cozy web" with passion.


I was an active participant of the 90s web, and in fact a lead editor and forum moderator for a popular turn of the millennium news site, so I understand the frustration you're sharing.

That said, I'd argue it's not the "cozy web" that's out of control, but instead the "dark forest" that has forced the creation of the cozy web. The cozy web is the only bastion of the internet left where there's still some semblance of the pay it forward community aspect of the early web.

Yes, it is at the cost of not being indexed, but it's the only way of having the genuine sorts of conversations and creation with people of shared interests that typified the early web now.


> Maybe I'm too old, but I hate the "cozy web" with passion.

I don't know if you remember the net/web split but that's exactly what it felt like. Net people would crap on port 80, demand you install a news client and add some byzantine undocumented header or join an IRC channel and send custom DCC commands. There was also a lot of gatekeeping and making fun of the normies ("I may be a nerd but look at Bill Gates, one day I'll be your boss.")

It was a culture I really didn't enjoy and I mostly stayed out of because everyone seemed so interested in exclusivity. Not too many people seem to remember those communities either which says a lot.


I came on-line at the tail end of it, when the denizens of the old were trying to find their footing in the new. IRC, mailing groups, early phpBB boards. This means I could've missed some stronger forms of gatekeeping, but the ones I do remember were all what I'd consider pretty good and desirable: it was gatekeeping based on knowledge, or interest in getting one. That is, all natural gates where the act of crossing them ensured you also could enjoy and contribute to the commons. And, importantly, they were mostly gating only write access, not read access.

The overall feel was, the gatekeeping served to bounce off trolls (before that name was common), and to redirect clueless newbies onto a path where they could either go away, or stay, learn a little bit, and then arrive at the gates again, only to find them wide open. Contrast that with the "cozy web", where the gatekeeping just tries to protect the community from the entire outside world. That's a huge change in overall feel - friendly and inviting vs. apprehensive and afraid. Viewing people as potential friends by default, vs. viewing them as potential enemies.

> and making fun of the normies ("I may be a nerd but look at Bill Gates, one day I'll be your boss.")

RE that, I may be biased, but I find it fully justified. It's not like nerds won this in any way - STEM interests, mastery of skills and concepts outside of the normie-culture-approved list (i.e. arts and performances - sports of every kind, playing or singing music, painting, writing, etc.), intellectually deep fiction, and clear thinking in general are still frowned upon and actively discouraged by society.

While the "revenge of the nerds" memes, "Jocks being bosses in high school, nerds being bosses at work" was a good joke / dream to discharge some frustrations over, it didn't materialize either. On the contrary - if you look carefully, most of the successful bosses are high-school jocks too, and I'm talking about fields like finance and tech too. That's because entrepreneurship and playing the market is a jock's game, not a nerd's game. You win it by looking good, talking smooth, and not caring much about the accuracy of what you say - not by knowing a lot, having strong mental models, and treating truth as valuable for its own sake.

> It was a culture I really didn't enjoy and I mostly stayed out of because everyone seemed so interested in exclusivity.

Unless you're talking about those much earlier communities, way before Eternal September, I have a different view. Exclusivity can be good, and back in the IRC/early phpBB era, most exclusivity was of this kind - that is, anyone was welcome, they just had to show minimum effort up front. Contrast that with today's "cozy web", where everything is exclusive by default, and the exclusivity is of the bad kind: secret clubs to which you get invited by existing members and/or both write and read access are behind gates that require great and ongoing investment of time and effort (i.e. keeping up with the flow of the live chat).

Maybe it's the nerd in me showing, but the cozy web is way too personal in this sense.


> I came on-line at the tail end of it, when the denizens of the old were trying to find their footing in the new. IRC, mailing groups, early phpBB boards.

Ah, you're actually talking about a time slightly before the one I'm talking about. I agree with what you mean when it comes to good-natured acculturation. A lot of those people left for the Web (or used both). Later on, a lot of the people who still used non-web services became more defensive about their parts of the net and doubled down on it.

> While the "revenge of the nerds" memes, "Jocks being bosses in high school, nerds being bosses at work" was a good joke / dream to discharge some frustrations over, it didn't materialize either.

The reason I didn't like it was because it classified the world into two types of people. Were band kids nerds or jocks? I never identified strongly with either end of the spectrum due to growing up in a low income area, so I found the entire thing to be problematic and exclusive in the wrong way.


I am an advocate for knowledge sharing and have previously contributed (a tiny amount) to the community mentioned above, Reactiflux. There, I was able to share my knowledge freely without fear of being penalized or judged through a voting system, or being heavily moderated as is the case with Wikipedia or StackOverflow. I also didn't have to worry about my contributions being eternally indexed on the internet. As a contributor, this is a feature (much less so for the lurker).

On that note, I recently had to request a deletion from Internet Archive because I shared content on my personal website that violates a ToS (it's a Slack archive that I have already anonymized). Unsurprisingly, my request went unanswered.


We seem to have interesting differences in perspective.

> There, I was able to share my knowledge freely without fear of being penalized or judged through a voting system, or being heavily moderated as is the case with Wikipedia or StackOverflow.

Private communities, especially chats, come with - IMO much stronger and more impactful - built-in judging by peer pressure. That is, if someone doesn't like your contribution, it (or you) might get ridiculed in front of the entire community. At the very best, you'll have to defend the merit of what you wrote, which is kind of like replying to criticism on Reddit/HN, except you have to do it real-time. I personally vastly prefer the voting system on discussion boards. Less noise, takes more time to settle, lets you get positive feedback too (this is now partly solved in group chats via reactions), and of course:

> I also didn't have to worry about my contributions being eternally indexed on the internet. As a contributor, this is a feature (much less so for the lurker).

As a contributor, I never thought about it as a feature - on the contrary, I'm less willing to contribute something to a community (as opposed to small group of real life friends and family members) when said community is staying unindexed and unlogged - denying access to information to lurkers, and also to future community members, and even to current community members, as on such platforms search, if it exists, is so bad that it may as well not be there (also group chats make this structurally hard, too). I just don't like, and never liked, contributing anything to knowledge black holes.


I agree, but the old web did have some information black holes, like IRC, unarchived mailing lists, junky forums, and more I probably can't remember


Most Discord communities aren't meant to be indexed, I agree! Thanks for linking that article, it was interesting to read.

There are lots that have support channels, though, for programming libraries, for games, etc., and having all of that content locked away can be really damaging.

One of the interesting things I've noticed is when a community for a more niche game / programming library joins Answer Overflow, they often shoot up to being top performers on the site which is great to see.

Along with that, not all channels are indexed, mainly just help channels. What's nice with this is it keeps that cozy feeling of a private place to talk, while helping more people find a community they will enjoy and keeping information accessible.

Long term, I'd like to implement forms of anti-abuse tools for communities to use so they can understand what the types of people who join their server from Answer Overflow are like. For example, if it turns out that 90% of the people who join are abusive, then it'd make sense for them to turn off indexing.

You could possibly make the argument that, for the long-term health of some communities, having indexed content helps to keep the community active.


Thanks for the thoughtful response. Glad to see this is something you care about preserving.

Good to see you're careful to only share particular channels.

I have more thoughts on marketing this and also on guidelines for server administrators implementing search indexing. For marketing, most importantly, it could be good to make it clear you're focused on selective sharing only of channels which it would be a public good to make indexable. For administrator guidelines, most importantly, I think there should be several measures to ensure that users are aware of and agree to having their communications in particular channels publicly indexed.

I ran this by GPT-4 for some more context and detail. [1]

I think with measures like this we may be able to realize the good of indexing without going so far as to drive away the safety of the walled garden aspect of Discord.

As an aside, for users of existing Discords, I encourage you to learn to use the search features built into Discord. Discord itself indexes servers and the search has good filtering functionality. I suspect if you already know which Discord server has the information you're looking for, you'll have a better experience with the internal search than trying to lean on Google.

If you want to do better than the internal search, perhaps creating a vector store of the channel and setting up an AI chat application in front of it would be a solution.
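
As a rough sketch of what the retrieval half of that idea might look like, the example below builds a toy in-memory vector store. Everything here is an assumption for illustration: `embed` is a simple bag-of-words counter standing in for a real embedding model, and `VectorStore` is a made-up class, not any particular library's API. A chat application would pass the top results to a language model as context.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and ranks by similarity."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Word counts only match exact terms, so a real setup would likely swap `embed` for a learned embedding model while keeping the same add / search shape.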

[1] https://chat.openai.com/share/254632c2-c25b-4299-88c9-2ce49e...


Most discord communities that are big enough to get indexed were supposed to be forums anyway, or part of one.


Indexing Discord is going to be tough. The reason is that context is all over the place:

Question in one message. Then two unrelated messages. Then a partial answer by somebody. And so on.

It’s even worse than indexing a PDF. Just breaking stuff into paragraphs and generating embeddings isn’t going to cut it.
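
One way to picture a partial fix, assuming each message records which earlier message it replies to: group messages by reply chain before indexing, so a question and its answers end up in one chunk. The `reply_to` field and the `group_by_reply_chain` name are hypothetical, and messages posted without using a reply feature would still end up as isolated chunks.

```python
from collections import defaultdict


def group_by_reply_chain(messages):
    """Group message texts by the root message of their reply chain.

    messages: dicts with 'id', 'text' and optional 'reply_to' (hypothetical
    field holding the id of the message being replied to), in chronological order.
    """
    root_of = {}
    chains = defaultdict(list)
    for msg in messages:
        parent = msg.get("reply_to")
        # A reply inherits the root of its parent; a message with no known
        # parent starts its own chain.
        root = root_of.get(parent, msg["id"])
        root_of[msg["id"]] = root
        chains[root].append(msg["text"])
    return dict(chains)
```

Each resulting chain could then be embedded or indexed as a single unit instead of message by message.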


I imagine this will only work on (and only index) threads, so the context can be gathered from the thread title/body, and the underlying messages reflect the discussion.

Some communities I'm in have #support channels which only support threads. So you create a thread, add a title and a body message and people can reply to your thread by clicking on it. There's no way to post individual messages; only comments in threads.

Thread overview: https://i.imgur.com/jfvrRtG.png

Opening a thread: https://i.imgur.com/pqGrARI.png

This solves your context problem. Still not sure if this is the direction we want to go in. This just proves to me that Discord is not the right tool for the problem at hand.


Welcome back. How does this compare to Linen (https://github.com/linen-dev/linen.dev#readme), which claims to support Slack and Discord? I do see the license difference, but didn't know if that was the major differentiator


Couple key differences:

- Answer Overflow works on a consent basis for displaying messages (https://docs.answeroverflow.com/user-settings/displaying-mes...), while Linen does all the messages in a community. The consent system Answer Overflow has helps a lot with respecting user privacy while also getting content indexed.

- Linen appears to be building out a competitor to Slack & Discord while Answer Overflow is focused on building on top of those platforms, so we've got very different roadmaps. From what I can gather from the Linen roadmap, they're implementing things like voice chat, private channels, etc., whereas with Answer Overflow some of the things I'm focused on are answer automation, tracking outdated answers, analytics for where to improve your docs, etc.

- Answer Overflow is pretty much only focused on Discord servers. It wouldn't be too hard to support both Slack and Discord, but what's nice about focusing on Discord for now is that it helps with our goal of being the best indexing tool specifically for Discord.

- Global search (https://www.answeroverflow.com/search): you can search all Answer Overflow communities at the same time.

The team at Linen have built out a great product though and it's cool watching them succeed with it!
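
For anyone wondering what consent-based display means in practice, a rough sketch is below. The `Message` shape, the `publicly_visible` helper, and the set of opted-in author IDs are all assumptions made up for illustration, not the actual Answer Overflow implementation:

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical shape: a message tagged with the ID of the user who wrote it.
    author_id: str
    content: str


def publicly_visible(messages: list[Message], consenting_author_ids: set[str]) -> list[Message]:
    """Keep only messages whose author has opted in to public display."""
    return [m for m in messages if m.author_id in consenting_author_ids]
```

Messages from users who have not opted in are simply left out of the public page, while the rest of the thread still gets indexed.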


People who give their consent to Discord to host their writing don't necessarily do so for third parties. Isn't there a copyright issue here?


Not really for a few reasons:

- The API grants you essentially a sublicense to the data, since Answer Overflow is a bot going through the official API and following the ToS properly, that should cover it for any potential issues

- Answer Overflow gets consent from users to use their messages https://docs.answeroverflow.com/user-settings/displaying-mes...


Oh you're getting user consent, so that makes sense. But I don't really understand how the API would grant you a sublicense to host written works.


It's from this section of the terms of service:

https://discord.com/developers/docs/policies-and-agreements/...

> Subject to your compliance with the Terms, we grant you a limited, non-exclusive, non-sublicensable, non-transferable, non-assignable, revocable license to access and use the APIs and Documentation we make available to you solely as necessary to integrate with, develop, and operate your Application

When you post on Discord, you grant them a transferable license to your content and that's one of the ways they use it

Disclaimer that it's probably more complicated than that and I'm a software engineer not a lawyer


This is awesome and timely!

I’ve been wanting to set something like this up for the nullbits server for a while. When I picked discord instead of a forum, I wasn’t counting on the growth we saw. There’s a lot of friction for new folks who aren’t yet on discord, and there’s a lot of knowledge in the server that’s locked behind discord.

Just set everything up! My only feedback is that enabling indexing for all of our text channels took a while doing them all individually, but that’s kind of on me for not enabling forums for help requests until now.


Welcome to the Answer Overflow community! I agree it'd be good to have a quicker way to set up multiple channels - to be honest it's kind of far down the backlog as it's pretty rare for a server to have many, but the UX could be improved there

If you have any other feedback, please send it to me on Discord so I make sure I see it - thanks!


There are several issues with surfacing search results from Discord as mentioned before in the thread, and even if all of them are resolved the biggest one remains relevance.

Unless a general purpose web search engine introduces a special Discord 'tab', like the ones that already exist for Images/News/Videos, there is no way for a search engine to assign relevance to anything said on Discord, because there is no authority or link-graph-based credibility for any message. In other words, a mention of 'blue widgets' on Discord is competing with millions of web pages mentioning 'blue widgets', which all have some kind of built-in relevance. If the idea is that this will be achieved through people linking to an aggregator like this website, then perhaps, but the approach does suffer from the chicken-and-egg problem.


I'm mostly interested in surfacing content on pretty specific topics with clear keywords.

But also either answeroverflow.com will gain some domain authority over time, or the communities will be hosted on domains that already have some.


I can imagine obvious use cases for data surveillance, OSINT and so on. But I'm happy to see an implementation of a semantic search engine powered by an LLM.


I was talking about needing a solution like this just a second ago. Down from the heavens, descends this. I'll be sure to give it a try!


Me too! I am trying to build a Discord-based remote course and am excited to read through the code here and see if it matches my needs, or can be tweaked to do so.

Once I do that I'd like to DM you with some questions mid-kid.

Nice job on getting so much implemented and open for users!


Send me a message if you have any questions! Happy to help with getting it set up


This might sound a little bit picky, but from a cursory look around the project, it feels a bit too corporate and platform-ey for my tastes. I'm only interested in two things: generating (ideally static, and SEO-friendly) web pages out of a Discord forum channel, and self-hosting it so we can archive the data ourselves (and won't be bound to the content policies of answeroverflow.com). All of the extra bells and whistles, with the bot auto-managing channels, analytics, AI and whatever else, are superfluous and make me sweat a little, as I'll have to comb through the documentation to make sure everything is set up correctly. It's also really a shame to read that self-hosting will be a "Pro" feature. I'll give props for considering users wanting to opt out, however, and it does at least seem rather simple to set up.


Where did you see self hosting is a pro feature? My bad if the website gives that impression. It will be free, the whole codebase is MIT licensed.

As for all the extra bells and whistles, they’re mainly for people who are doing community support at scale and need them, and those would be the paying customers - I do sort of need a way to support myself so I can buy groceries. The core of the product that matters is free and working well for indexing content, so now the focus is “what else can we do to improve community support as a whole?”

As for self hosting, if you submit a PR for supporting it I’d be happy to get that merged but it’s not really a priority at the moment. The codebase is set up to be pretty easy to make a self hosted version though.


Haha, that's fair. I'll consider trying to set it up myself and see how it goes.

I got the idea that it was a pro feature from the roadmap list on the website, where it's listed as "coming soon", and "pro" is only mentioned when you click on the waitlist join link. If it means custom domains, it might be better off being listed as "custom domains" or something similar. That's what it's called on Google Apps and such. It also doesn't help that the roadmap on the website doesn't match the one on the GitHub page; I thought the roadmap features on the GitHub page might be pro features as well.


Ah I see how that’s confusing, sorry about that! I’ll update it in both places to make that clearer


Cool idea. There have been cases where I had to create a burner account just to access a Discord community and its walled content.


Soon Discord will pull a Reddit and shut down your app.

Good luck!


If this takes off you may very well get a letter from Stack Overflow lawyers over the name. It's your choice if you want to take that risk, but just FYI.

(And to be honest, I think they would be justified too; I initially assumed it was related to Stack Overflow based on the title, but it turns out it's not. This is the sort of confusion trademarks are intended to protect against.)


Under their own guidelines it's fine https://stackoverflow.com/legal/trademark-guidance

> Do name your application with something unique. Including one of the terms, "Stack" or "Exchange" or "Overflow" in your product name is generally okay.

It's a different enough product that I feel comfortable with it - Stack Overflow is only for programming while Answer Overflow is for all topics. Along with that Overflow is a pretty generic word and if you wanted to get super technical with it, the context I'm using the word in is "I have so many answers they're overflowing" while theirs is a reference to a programming term.

We'll see and I'm not a lawyer but given that their trademark guidelines allow it, I feel comfortable


That part specifically refers to things built on the Stack Overflow API. And "generally okay" is of course hardly a guarantee. "Overflow" as a word is fine, obviously, but it does sit in the same "get answers" space – I can name my restaurant "Best Apple", but I'll have more problems if I name some piece of electronics "Best Apple".

It's your site, you can do what you want with it and you're free to ignore my comment – that's fine! But personally, I wouldn't have named it Answer Overflow.


Ehhh according to their guidance the use of the word “overflow” seems supported.

https://stackoverflow.com/legal/trademark-guidance


Good on you (and everyone) for releasing it as Free Software. Just the target is unfortunate.


It would be useful if clicking an image opened it in an imagebox or expanded it inline.


Um, why Google? So your indexes can be polluted with their shitty advertising? Why not expose your index as a service? I mean really, WTF not?


As much as I like this project, Discord is an absolute disaster. The only reason communities move there is because it's free.


Slack just lets everyone do it.


Telegram needs that too


nice, there is a lot of good stuff on discord!


I'm sure Discord and their communities are absolutely ecstatic about opening up the doors to OpenAI and others to scrape their collective work for the latest LLM.

Walled gardens are going to get a whole lot stricter.



