Show HN: IngestAI – NoCode ChatGPT-bot creator from your knowledge base in Slack (ingestai.io)
166 points by Vasyl_R on Feb 23, 2023 | 112 comments



Hey Everybody!

We are super stoked to present IngestAI to you today. It is the fastest way to build contextually intelligent ChatGPT-like bots inside your own WhatsApp, Slack, or Discord that answer queries from your knowledge base, documentation, or educational materials.

IngestAI is a very useful tool for a diverse range of businesses that maintain a company knowledge base or run customer support. IngestAI saves them money by providing precise and relevant answers about your product 24/7.

You can upload your technical documentation, information from previously resolved support tickets, educational program content, e-commerce product descriptions, or any other information relevant to your business case.

Key Features:

1. Flexibility: IngestAI supports different file formats for uploading, such as txt, MS Word, PDF, and Excel, with many more coming very soon.

2. More upload sources: IngestAI currently supports URL links, and integrations with Notion and Confluence are coming in March '23.

3. Built for a global community: integrations with Slack, Discord, and Telegram; 2/24 release: WhatsApp; 2/25 release: API (meaning integration with Shopify, Etsy, Magento, etc., or even with your custom CRM/ERP); March '23 release: MS Teams and Facebook Messenger.

4. AI first: IngestAI harnesses the power of OpenAI to provide precise AI-generated answers relevant to the uploaded context.

5. Customizable: go beyond simple queries using IngestAI prompt templates – use the ones we have pre-made, edit any of them to your needs, or create a new one from scratch.

For enterprise clients who want to use IngestAI with sensitive information, we offer the option to store all the information locally on-site or in your own AWS S3 cloud storage.

Please join our community: Discord : https://discord.gg/kMpbueJMtQ Twitter : https://twitter.com/ingestaiio

Docs: https://ingestai.io/docs

We would be happy to hear your feedback. Hope you love it! Team IngestAI


Would be cool to know some technical details, like are you fine-tuning GPT-3 on OpenAI, or did you build something yourself on top of an open-source pretrained model?


Hey there, you're welcome to join our Discord server: https://discord.gg/kMpbueJMtQ - we share some technical details there, and we also post our daily progress in our #updates channel. Have you heard of LangChain?


First time I've heard about it, but I just checked it out - seems pretty cool, but you still need to feed some model into it, right? Many examples seem to use OpenAI.

I see a bunch of apps using LLMs popping up like mushrooms in a forest, but how do you fine-tune one for your dataset? The biggest (GPT-3 davinci) model on OpenAI is not available for fine-tuning.


thanks! We tried to make it as simple as possible from the user's perspective. You just upload your knowledge base, create a Slack / Discord or other bot, and start using it. And sure, we use OpenAI too. Yes, you're right - we use the Davinci model, and then we use LangChain and other approaches that we're exploring to compare their performance. Have you heard about LangChain?
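
For anyone curious what that kind of pipeline typically looks like, here is a minimal sketch of a LangChain-style retrieval setup (illustrative only, not IngestAI's actual code - the file name, chunk sizes, and model choice are assumptions):

  from langchain.llms import OpenAI
  from langchain.embeddings import OpenAIEmbeddings
  from langchain.text_splitter import CharacterTextSplitter
  from langchain.vectorstores import FAISS
  from langchain.chains import RetrievalQA

  # Split the uploaded knowledge base into chunks and embed them into a vector store.
  raw_text = open("knowledge_base.txt").read()  # hypothetical uploaded file
  splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
  store = FAISS.from_texts(splitter.split_text(raw_text), OpenAIEmbeddings())

  # At query time, retrieve the most relevant chunks and let the LLM answer from them.
  qa = RetrievalQA.from_chain_type(
      llm=OpenAI(model_name="text-davinci-003", temperature=0),
      chain_type="stuff",
      retriever=store.as_retriever(),
  )
  print(qa.run("How do I reset my password?"))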


I see a Notion example in LangChain, so I assume you guys did something similar. Do you know if there is a maximum size of documents that it supports?


yes, we can say it's something similar) Do you mean the maximum size our app supports? Currently we support files in different formats, but with a 10 MB size limit.


Anyone care to recommend some open source options?

I've read about GPT-J, GPT-NEO, and Bloom. I understand they aren't as effective/intuitive as OpenAI's stuff, but unlike OpenAI, they are actually open.
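
(From what I've read, getting one of them generating locally is only a few lines with Hugging Face Transformers - a rough sketch, where the model choice and prompt are just placeholders; the open question is how the output quality compares:)

  from transformers import pipeline

  # GPT-Neo 1.3B is small enough to run on a single consumer GPU, or slowly on CPU.
  generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

  prompt = "Context: <paste relevant docs here>\nQ: How do I reset my password?\nA:"
  print(generator(prompt, max_new_tokens=100)[0]["generated_text"])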


That is a good point that you have brought up.

We plan to be agnostic about the underlying LLMs as the ecosystem matures.


Really nice, congrats on the launch. I wonder how safe my confidential information is with your thing?


We use AWS for both servers and storage. How secure would you call AWS?


Depends on the application being hosted.

Also: https://firewalltimes.com/amazon-web-services-data-breach-ti...


Is my slack stored in plain text on your AWS server?


we don't store conversations at all for the moment. I mentioned that in previous comments too. That's something some people ask, but we don't have 'conversation memory' for the moment))


Are you using Private channels and 1:1 chats as training data?


that's something I mentioned in some of my previous answers - this can have both a positive and a negative impact on the contextual intelligence of your bot, so we're now playing around with it and trying to see whether it hurts context-awareness. Do you believe incorporating chat memory into the context is something people would need?


prompt: "what do my employees really think about me?"


that would be a good one :)) That's why we don't add chat memory for the moment :)))


hey, we are not learning from Slack history. The Slack bot takes your knowledge base as the input (markdown, docs, etc.) and answers the queries asked by the user.


Privacy policy:

> We will not use or share your information with anyone except as described in this Privacy Policy.

...

> We want to inform our Service users that these third parties have access to your Personal Information. The reason is to perform the tasks assigned to them on our behalf. However, they are obligated not to disclose or use the information for any other purpose.

Arreeee they?

So let me get this straight, you want to take my Super Important Private Data, like, you know, my entire corporate slack history.

You'll feed it to some arbitrary third party(s) (e.g. OpenAI, whose privacy policy is flat out 'we'll use that as training data'), and they are...

> obligated not to disclose or use the information for any other purpose.

Other than what exactly? Provide some nebulous service to you? Like... training a model on it, or storing it and using it for training other models later, or..?

haha... there is: No. Way. That is happening.


So if my competitor is using IngestAI and OpenAI uses their data to train ChatGPT, could I literally just ask ChatGPT to tell me some secrets from my competitor's internal communication?


2023 is going to be a very exciting time for security engineers.

It won't have the data, but it might have enough of an understanding of the data to leak important information.


While the model clearly can't retain all data, ChatGPT can regurgitate a lot of stuff verbatim.

Prompt:

> Recite the first two paragraphs of Neuromancer.

Response:

> Certainly! Here are the first two paragraphs of "Neuromancer" by William Gibson:

> "The sky above the port was the color of television, tuned to a dead channel.

> 'It's not like I'm using,' Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. 'It's like my body's developed this massive drug deficiency.' It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese."

(I have not checked how far you can get it to continue)

So perhaps it'll be a question of whether enough of your employees are feeding it copies of your data for it to retain it...


I bet that getting the right prompts won't be easy so it will probably fly under the radar and not immediately be detected. You can't search these weights with command-f. Fun times ahead...


Prompt engineering is only getting better. Also,

> You can't search these weights with command-f

Sometimes you can, https://clementneo.com/posts/2023/02/11/we-found-an-neuron


And good luck trying to add data to it without corrupting some other data it has encoded.


Does this problem disappear when using the Azure version of the service? If not, this is a pretty obvious market need: LLM + privacy.


Most AI companies won't want to offer that. They want to know if someone is using their service to instigate the next mass shooting or ethnic genocide.


a good point. And what about companies that have on-premise storage?


yes, with OpenAI and also our type of apps, security engineers have to move to the next level. And companies have to understand that it's context-aware only based on the knowledge base you upload. It can't go and grab some data from your PC just because someone asks for it in chat))

BTW, Thanks for your comments! Appreciate it a lot.


This is a well-known problem with this technology (although I haven't seen an official term for it, so we have been calling them "recovery attacks"). It's apparently the reason companies like Amazon have banned internal use of services like ChatGPT. I should add that while it has been proven to occur, the likelihood of something like this is very low. It's going to be a rare occurrence.


thanks for sharing. Do you think that if everything were on the client's local server or cloud, there would still be some, even rare, occurrence of that?


The problem could still occur, but you would have to be capturing all the queries to your internal LLM systems and then using that data for training. You have complete control of the model, so you could just choose not to do that, and I would think data leaks of this nature would be less of a concern for an internal environment anyway. You would know that only authorized individuals have access to the data. I suppose there could still be a very small chance of leaking data to unauthorized employees, but if a rogue employee wants to access data they should not have access to, fishing an LLM would probably be the least productive way to do that. Your access logs for the LLM system would clearly display the attempts.

Some commercial services are starting to offer "Enterprise" licenses that prohibit the collection and use for training of your data and that would address the concern as well.


If a server was misconfigured, OpenAI could have been trained on non-public information. You can also poison OpenAI's dataset if you know it has been pulled by the service.


on a higher level of understanding, yes. But it would answer queries / be contextually intelligent only based on the previously uploaded knowledge base. Did I get your point right?


Probably gonna get downvoted for this but people on HN take the Privacy Policy of brand new websites way too seriously.

This is probably some template they downloaded from the web, not some sly document they had their nefarious lawyers put together.


On the other hand, our company has recently mandated that all the software we use needs to have its license checked over by a lawyer, because we've found some fairly nefarious things in some of the software licenses we're using. I don't want to reveal where I work, but let's say there are plenty of people working there who understand software licensing already, and some of the things were still missed until put under a microscope.

So maybe it pays to be skeptical / cautious?


As a SaaS vendor I’ve had an uptick in one off requests from legal teams to make changes to our terms.

So far, nothing deal breaking (largely because there isn’t anything nefarious in our terms), but I wonder if this will be a trend moving forward.


thanks for your comment! Are data privacy concerns the main SaaS challenge from your POV?


No, so far it has been about limiting their liability in the event of something bad happening on the platform.


Every app using GPT will go through OpenAI's API, and all that data is stored and used for training future models. No amount of goodwill or non-nefarious intentions on IngestAI's side makes any difference.


thanks for your comment and support. That's basically what I'm trying to say in every third comment here. Data privacy has been a buzz topic for some time, and with OpenAI it became even more viral. It's not about IngestAI or any other SaaS, it's a much wider topic, and I totally agree (thanks!).


To be clear, I'm saying that your claim that data won't leak to third parties because you store it in AWS is deceptive. There's no way you can protect the users of your product from having their data soaked up by OpenAI as long as you use their service instead of an open-source LLM.


> This is probably some template they downloaded from the web

But that's worse. Don't you see how that is worse?

They want to have access to all the knowledge of the company, and they can't even articulate what they will and won't do with it?

> people on HN take the Privacy Policy of brand new websites way too seriously.

People should take Privacy Policies of every website they work with more seriously.


We will work to fix this


thank you so much for your support! I hope there are more people who think like you and are a bit 'empathetic' about new services. We improve with every single client, every single request, and every issue we face. So yes, data privacy is very important, but it's not only about IngestAI, I believe. Most SaaS solutions use APIs and run on AWS. So, again, it's a wide topic, a buzz topic right now. Thanks again)


This is probably the output of asking an LLM to generate a privacy policy.


It seems like the companies who throw caution to the wind and use this will end up being more productive in the short term and win out over more cautious entities. Thus we will only end up with more companies who use these tools or holdouts caving to pressure from competition. This will cause big problems by the time post-nut clarity arrives. We are being dazzled by the tech and forgetting our principles again. We are being pulled down this path whether we want to or not.

It also doesn't help that OpenAI is partnered with Microsoft. I would start with the mindset that all data given to OpenAI through any of these tools goes to Microsoft. Why would you give anything to a competitor?


thanks for your comment. I believe that if data privacy is almost the only topic discussed here, that's good. It means people like the idea but at the same time are concerned about personal / sensitive data. And to be honest, I hope that will be our biggest challenge, as I believe it's a bit easier to get a good, solid data privacy policy compliant with all legal and moral requirements than to build a product. At least that's what I hope))


This is hard to take seriously with such a policy. I think this is just somebody playing around with the OpenAI API. I expect serious competitors to have a better policy that is actually usable for corporations; it might take a year, but I see no way this gets adopted in a broader way. So far OpenAI has a monopoly, but there will be competitors if you give them some time.


sure, BERT is coming very soon, and I think it will bring even wider adoption of AI. And yes, data privacy will become an even bigger buzz topic. The next few years will belong to security engineers and data privacy lawyers, don't you think?


I don't fully disagree, but the reason I would care less in this case is that I assume most/all of what you would be feeding it is non-sensitive documentation.


I don't think that matters? Sensitive could be just discussing designs, outages, hiring, interview feedback?

There's a lot of stuff in the average Slack account that people don't want on the internet, let alone in an LLM which could potentially expose it to the entire world?

Maybe companies like Slack will release integrations natively so it won't matter so much.


Who is feeding interview feedback to a third party, regardless of whether it's IngestAI or not?


If you send all of your slack communications to IngestAI, it would include possibly channels where you discuss interview feedback. That's what the parent poster is saying.


And I am saying that that was never the intended purpose of this product from what I read. That was my whole point in my OP. I agree that there are unique issues with products like these, but it's not alone; at the end of the day you should not be feeding sensitive data to third-party applications like this.

Edit: This whole thread is goofy. It is the equivalent of asking what would happen if you published your entire internal email history online.


please see my comment in this thread. We're ONLY learning from your knowledge base and not from your Slack chat history.


Yup, that's what I was trying to say from the beginning and why I was so confused by all the concerns.


hey, we are not learning anything from your slack history or channels.

The way IngestAI works is that it takes your knowledge base as the input (markdown, docs, etc.) and answers the queries asked by the user based on that knowledge base. Our primary use case has been to simply learn from companies' public documentation and help answer the queries within their Slack/Discord community.


Data privacy is a buzz topic! ;)) We could discuss that for days :))))


thanks for your comment, Infecto. Totally agree with you. And again, for those who want to use it with sensitive data, there's an option to use their own AWS S3 cloud storage, or, for enterprise clients, we can even deploy our app locally. Agree?


By using your data as training data, OpenAI might disclose your stuff to the world at large


Most startups are clueless and ignorant about data governance.


would you be able to recommend a good startup or service that provides good data privacy governance for startups? We would like to learn more and get this point as right as possible. But you're right to some degree - we're builders, and we need the help of professionals with data governance.


Ok so, you replied to everyone on this sub-thread except the top-level comment. Why is that?

Your first task to improve your privacy policy is to review whether you really, absolutely, for reals, can require OpenAI to follow this: "they are obligated not to disclose or use the information for any other purpose."

Because, it looks like you can't, and OpenAI will absolutely use your customers' data for their own purposes, so you probably should remove this line from your privacy policy at minimum.


First of all - this sounds like a concept with super high potential. The big hurdle I'd see that would keep companies from experimenting with the current version is that there's no info on data protection & security.

You won't get companies to share any internal info with a tool like that until that's out of the way - and even then it might require a lot of trust-building. Getting the certifications will be quite a chore though.


Thank you so much for your comment and support! It's amazing how we're being received here! Yes, we understand that to win enterprise companies we have to improve our privacy policy. Do you think adding the possibility for our clients to store their data in their own AWS S3 cloud would be a solution?


For what I'd call "stupid" compliance (GDPR,...), allowing for regional storage of data at rest is definitely important.

For any internal data where protecting it is a core interest of the company you'll also either need to prove your whole handling is trustworthy or you'll need to design a process that can do all the steps (preprocessing/vectorization/...) on customer controlled hardware.

If your solution can run in a customer owned environment without any external dependencies, that could save you a lot of auditing/certification.


Hey, uh, real quick Vasyl, could you click on all the images below that contain a stoplight?


Hey there, really sorry, but what do you mean by that? (sorry)


I mean that I suspect I am replying to an underwhelming demo.


This seems vaguely xenophobic because the OP clearly doesn’t have perfect English but it’s also funny.


I dunno, I feel pretty comfortable

https://news.ycombinator.com/item?id=34912932


well, it depends on how biased one is, I believe. No?


ahahah! Sorry! That's a good one!!!!! Amazing))) Love that! :)))


Bot confirmed, you have failed the captcha.


I thought I found some captcha on our web-site.. omg)) It was a good one)) Thanks))


I love the idea, but no thanks. I'd prefer to keep a version of this running in my own cluster, without data privacy woes.


Start for Free.... then what? give you all my data and then you hold it hostage for a fee which is undisclosed up front.


thanks for your comment, Werds, but why so negative? :)) I mention in almost every second answer that we also offer the possibility for clients to store their data in their own AWS S3 cloud storage, or, if you're an enterprise client, we can even put our app on your servers locally. Would that resolve your doubts? Have you had a bad experience with data privacy?


You're still sending the data to OpenAI in the prompts. That's sending arbitrary text from my corporate slack to a 3rd party. Where you host my data before you send it to wherever isn't relevant.

Also, "We host with AWS" isn't really a response to "Will my data be secure?"


hey, to clarify, we are not learning from your Slack history or Slack channels at all.

IngestAI is about learning from your knowledge base (markdown, docs, Notion, Confluence) and using that to answer queries for users within a Slack channel. Our primary use case has been to simply learn from companies' public documentation and help answer the queries of their community.
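
To illustrate the shape of that flow, here is a minimal sketch of how such a bot could be wired up with Slack's Bolt SDK (not our actual code - answer_from_knowledge_base is a placeholder for the retrieval step described above):

  import os
  from slack_bolt import App

  app = App(token=os.environ["SLACK_BOT_TOKEN"],
            signing_secret=os.environ["SLACK_SIGNING_SECRET"])

  def answer_from_knowledge_base(question: str) -> str:
      # Placeholder: embed the question, retrieve the matching chunks from the
      # uploaded docs, and ask the LLM to answer using only those chunks.
      ...

  @app.event("app_mention")
  def handle_mention(event, say):
      # Reply in-channel with an answer drawn from the uploaded knowledge base.
      say(answer_from_knowledge_base(event["text"]))

  if __name__ == "__main__":
      app.start(port=int(os.environ.get("PORT", 3000)))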


This looks cool and it is fulfilling a need for me (I created a duct tape prototype of a similar solution using LangChain).

I want to get up and running in the 2 mins advertised, but am struggling to find the docs.

Could anyone point out where these are? Is there maybe a video I can follow to test this?


Thanks for your feedback!

We have our docs right on our web page: https://ingestai.io/docs but you're very welcome to join our Discord server and we'll guide you throughout the process in case you have any issues: https://discord.gg/kMpbueJMtQ

And yes, we have a video in our docs too - did you find it yet?


Suppose I want my bot to reply with a custom message for a particular query - say, a 'girlfriend' query - how should I do that? I've tried to upload documents in the Library, but what comes after that?



This is awesome! Congrats IngestAI team!!

Do you have any perf numbers, in terms of size and response times? Is there a list of file formats you support? Is it possible to choose the LLM model I prefer? What does pricing look like?

Again, great execution and useful tool. Thank you for the launch and good luck!


Thanks a lot for your comment and support! Yes, right now we support URL links plus txt, MS Word, Excel, and PowerPoint formats, and expanding the supported file formats is one of our top priorities. So please join our Discord server and stay tuned) Thanks for your idea to add the possibility to choose among LLM models - we can add it if users ask for it. May I ask you to copy/paste this request into our #feature-request channel on our Discord?


That's very cool. How does it work? Do you train a model for each "library"?


Lately LangChain[1] and GPT-Index[2] are very popular libraries for this sort of thing.

This is the vid I started out with https://www.youtube.com/watch?v=rBEHPxVHb5c

Also that channel seems to be great. They focus only on LangChain and GPT-Index lately and post videos every few days https://www.youtube.com/@echohive

Both have Discord servers as well.

Additionally there's also this website for GPT-Index ( https://llamahub.ai/ ) where people add different "connectors", like loading up your .md files, .docx files, your notion, your slack, etc.

  [1] https://langchain.readthedocs.io/en/latest/
  [2] https://gpt-index.readthedocs.io/en/latest/
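
The GPT-Index flavour is even shorter - roughly like this (as the API looked in early 2023; the "data" folder and the question are just examples):

  from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

  # Read every file in ./data, embed the chunks, and build an in-memory vector index.
  documents = SimpleDirectoryReader("data").load_data()
  index = GPTSimpleVectorIndex(documents)

  # Queries are answered by retrieving the closest chunks and passing them to the LLM.
  print(index.query("What does the setup guide say about API keys?"))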


They say they use LangChain, so it's going to be structurally similar to this MVP side-project: https://thundergolfer.com/infinite-ama.

That's a chat-bot backed by LangChain and OpenAI, which can answer some questions from my knowledge base.

It's WIP, I want to add my HN comments into the mix, but the code is here: https://github.com/thundergolfer/modal-fun/tree/main/infinit...


we're using LangChain underneath, and we're also doing R&D on fine-tuning to compare the performance of different approaches. Can you tell me what file format your knowledge base is in? Is it PDF, MS Word, or something else?


Very cool! I was recently looking at dumping our slack data and fine tuning some openai model to do something very similar at a place I work with currently.

Were there any pain points in fine tuning you wish you knew before you built everything?


Thanks for your feedback! With IngestAI you can make a Discord or Slack bot within minutes, with no code. And if you face any technical issues, please let us know on our Discord or by e-mail and we'll be happy to assist you.

We're going down two paths - fine-tuning and working with embeddings - and have yet to see which performs better.
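
For context, the embeddings path in its simplest form looks something like this (a rough sketch against the 2023-era OpenAI Python API, not our exact implementation - chunking, storage, and error handling are omitted):

  import numpy as np
  import openai

  def embed(texts):
      resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
      return np.array([d["embedding"] for d in resp["data"]])

  # Hypothetical knowledge-base chunks; in practice these come from uploaded docs.
  chunks = ["Reset your password from Settings > Security.",
            "Invoices are emailed on the 1st of each month."]
  chunk_vecs = embed(chunks)

  question = "How do I change my password?"
  q_vec = embed([question])[0]

  # Cosine similarity picks the chunk to stuff into the prompt as context.
  scores = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
  context = chunks[int(np.argmax(scores))]

  answer = openai.Completion.create(
      model="text-davinci-003",
      prompt=f"Answer using only this context:\n{context}\n\nQ: {question}\nA:",
      max_tokens=100,
  )["choices"][0]["text"]
  print(answer)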

What kinds of file types do you store your knowledge base in?


Really cool! I actually tried to build something of this sort using OpenAI, but building trained models was hard. I did hear about LangChain, but have yet to experiment with it.


Thanks for your feedback and support! Amazing))) Or you can experiment with IngestAI now)) Have you tried it? Creating a Telegram bot takes less than 2 minutes - I promise..


Interesting product, but it has no pricing information anywhere.


thanks for your comment! Yep, we'd like to define it during our customer discovery, after we gather a lot of feedback and crystallize our understanding of the features users like and ask us to add. Maybe you'd be so kind as to say how much this kind of product should cost from your POV?


How is this using ChatGPT without a public API?


At the moment we're using OpenAI, and the aim is to provide a ChatGPT-like chat experience. Having ChatGPT API access would make it ideal, though.
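
(In practice, "ChatGPT-like" here means a chat-style prompt sent to the regular Completions endpoint - a rough sketch, not necessarily how IngestAI phrases its prompts:)

  import openai

  # Simulate a chat turn with the legacy Completions API and a davinci model.
  history = "User: How do I reset my password?\nAssistant:"
  resp = openai.Completion.create(
      model="text-davinci-003",
      prompt="You are a helpful support bot. Answer from the docs provided.\n\n" + history,
      max_tokens=150,
      stop=["User:"],
  )
  print(resp["choices"][0]["text"].strip())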


OP is either 12 or a bot.

Please don’t give any of your private and confidential data to these people without due diligence.


Sorry, I couldn't get what '12' means here? From your knowledge base, we use LangChain to get the right context and answer. How do you envisage a solution of this sort working? Happy to learn and make it better.


this is awesome, my son is using this with a bot in discord to learn python


wow, that's amazing! Do you mean he's using our tool or something similar?


your tool, I created a library using a python tutorial and connected it to a discord bot :)


Integrate with MS Teams and it will be game-changing..


Won't Microsoft themselves come up with that sort of integration? I feel the opportunity here is probably along the lines of Intercom (a support desk) that can be present in the chat tool of the client's/customer's choice?


Not sure if they would come up with that in the near future. Maybe just adding OpenAI / ChatGPT - that's possible, but I don't think they'd add contextually intelligent bots that answer queries from your knowledge base.


Right, you should try tapping into the old conversation history in these support desks - it would make a good difference in the replies. I didn't see any fine-tuning on the website or in the docs - do you have plans for that? I'm afraid just LangChaining would not make it completely context-aware.


thanks for your feedback, again! We're considering adding memory, but in some cases, like group chats, it can add a kind of background noise to the context of your chatbot, so we have to be careful with that. Sure, we have docs - here's the link: https://ingestai.io/docs - and please join our Discord server: https://discord.gg/kMpbueJMtQ where we do our best to assist our users live or even guide them through the process if needed.


thanks for your comment! Sure, it's even stated on our website - our closest releases are WhatsApp on 2/24 and API on 2/25. But we're also working on adding MS Teams, which should come in March 2023.

Do you use mainly MS Teams in your organisation?


ingest.ai PASS

ingestai.io Now I feel complete



