
Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. It asks GPT for Python code and then executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...
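For illustration, a minimal sketch of the pattern being described, with the LLM call stubbed out as a fixed string (the function name and chunking logic here are invented, not the library's actual output) — whatever code string comes back gets run verbatim:

```python
# Stand-in for the text an LLM might return when asked for chunking code.
# In the real library this string would come from a GPT-4 API call.
llm_response = '''
def semantic_chunk(text, max_len=40):
    # naive sentence-based chunking, standing in for generated code
    chunks, current = [], ""
    for sentence in text.split(". "):
        if len(current) + len(sentence) > max_len and current:
            chunks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        chunks.append(current.strip())
    return chunks
'''

namespace = {}
exec(llm_response, namespace)  # executes untrusted code: this is the core risk
chunks = namespace["semantic_chunk"]("First point. Second point. Third point.")
```

The `exec` call is the crux of the thread below: the generated string runs with full interpreter privileges.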


You seem to have found a very polite way of highlighting this.

I assume that in the not-so-distant future, a malware scanner will detect this and refuse to let you run it locally.


Yeah, we were playing around with doing some semantic chunking. Works okay for some use cases. We have some ideas to go further on that.

Generally we have found that recursive chunking and character chunking tend to be short-sighted.


Don't you find it dangerous to just run the code w/o any sanitizing?

Why not capture a few strategies that the LLM returns as code that can be properly audited (and run locally, improving overall performance)?


It is dangerous; that's part of the reason we haven't productized it further. One of the ideas we had for productizing the capability was to leverage edge / lambda functions to compartmentalize the generated code. (Plus that becomes a general extensibility point for folks who are not using semantic code generation and simply want to write their own code.)
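A rough sketch of that compartmentalization idea, using a subprocess with a timeout as a local stand-in for an edge/lambda sandbox (the `run_isolated` name is illustrative, not from the library, and a real lambda would give much stronger isolation than a child process):

```python
import os
import subprocess
import sys
import tempfile

def run_isolated(snippet: str, timeout: float = 5.0) -> str:
    """Write a generated snippet to a temp file and run it in a fresh
    interpreter, so a crash or hang cannot take down the host process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(snippet)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout.strip()
    finally:
        os.unlink(path)

out = run_isolated("print('chunked')")
```

This only contains crashes and hangs; the child still has filesystem and network access, which is why a remote sandbox is the more interesting target.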

The idea of auditing the strategy is interesting. The flow we have used for the semantic chunkers to date has been along these lines: 1) use the utility to generate the code snippets (with some manual inspection), 2) test the code snippets against some sample text, 3) validate the results.
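The three steps above could be sketched roughly as follows — a crude static audit followed by a run-and-validate pass. All names here (`audit_snippet`, `validate_chunker`, the required `chunk` entry point) are hypothetical, and an AST import scan is only a first line of defense, not a real sandbox:

```python
import ast

# Modules a generated chunker has no business importing (illustrative list).
FORBIDDEN = {"os", "subprocess", "socket", "shutil", "sys"}

def audit_snippet(source: str) -> bool:
    """Step 1: crude static check — reject snippets importing risky modules."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name.split(".")[0] for a in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            if FORBIDDEN & set(names):
                return False
    return True

def validate_chunker(source: str, sample: str) -> bool:
    """Steps 2-3: run the audited snippet on sample text and sanity-check
    that it produced non-empty string chunks."""
    if not audit_snippet(source):
        return False
    namespace = {}
    exec(source, namespace)  # still exec, but post-audit
    chunks = namespace["chunk"](sample)
    return bool(chunks) and all(isinstance(c, str) and c for c in chunks)

snippet = "def chunk(text):\n    return [p for p in text.split('\\n\\n') if p]"
ok = validate_chunker(snippet, "para one\n\npara two")
```

The audit step is easy to evade (e.g. `__import__` dodges the import scan), which is why the combination with process-level isolation matters.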


Why not use Stanford Stanza?

