Open Sourcing Active Question Reformulation with Reinforcement Learning (googleblog.com)
36 points by jamesjue 5 months ago | 2 comments

I haven’t read the paper yet, but from the summary, the agent rewrites questions and learns to find “more rewarded” reformulations. How does the environment derive a reward signal that is useful for judging the quality of a reformulation? That seems as hard as the original problem to me.

I looked at the paper (too fast, so I could be talking bullshit). The technique consists of deriving, from a question, a list of differently phrased versions of the same question; if the existing QA system fails with one phrasing, it can retry with another phrasing of the same request. They may also combine the answers to sub-questions into the final answer shown to the user, but this must have many issues.

Nothing directly rewards the reformulations, but the overall answer can be rewarded by user feedback. Yes, this indirection still seems like an issue.
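To make the indirection concrete, here is a minimal sketch of the retry loop as I understand it, not the paper's actual method. The names `reformulate`, `qa_system`, and `credit_feedback` are hypothetical, and I am assuming the QA system returns None when it fails.

```python
def answer_with_retries(question, reformulate, qa_system):
    """Try the original question, then each rephrasing, until one is answered.

    `reformulate` and `qa_system` are hypothetical callables:
    reformulate(question) -> list of rephrasings,
    qa_system(question) -> answer string, or None on failure (an assumption).
    """
    for candidate in [question, *reformulate(question)]:
        answer = qa_system(candidate)
        if answer is not None:
            return candidate, answer
    return None, None


def credit_feedback(scores, used_phrasing, user_reward):
    """Credit feedback on the final answer to the phrasing that produced it.

    This is the indirection: the reward is observed on the answer only,
    and the phrasing inherits it.
    """
    scores[used_phrasing] = scores.get(used_phrasing, 0.0) + user_reward
    return scores
```

The phrasing only ever gets credit through the answer it led to, so a good rephrasing sent to a weak QA system would still look bad.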

What I would do instead is cluster highly similar or alternative formulations of the same question from different users, and then store, for each frequently asked question, a list of user-generated reformulations. Of course, the list would be tailored to the user's profile.
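A rough sketch of what that clustering could look like. The similarity measure here (Jaccard overlap on lowercase word sets) and the 0.5 threshold are placeholders I picked for illustration, not a claim about what would work in practice, and per-user profiling is left out.

```python
import re


def tokens(question):
    """Lowercase word set; a stand-in for a real similarity heuristic."""
    return frozenset(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_questions(questions, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative
    (its first question) is similar enough, else start a new cluster."""
    clusters = []
    for q in questions:
        q_tokens = tokens(q)
        for cluster in clusters:
            if jaccard(q_tokens, tokens(cluster[0])) >= threshold:
                cluster.append(q)
                break
        else:
            clusters.append([q])
    return clusters
```

Each resulting cluster is then the stored list of reformulations for one common question. Greedy assignment depends on input order, so a real system would probably want something sturdier.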

I am not saying here how the similarity or identity of user formulations would be determined, but I have a couple of heuristics in mind.
