
I take a different stance: the scientific method is all you can trust. Word of mouth is also inherently biased (e.g. someone who works at an even moderately decent company may paint a rosier picture due to in-group dynamics or even mild forms of Stockholm syndrome, just as an ex-employee might be overly negative), and it's up to us to identify the potential sources of bias in these anecdotes and take them with the appropriate amount of salt.

Same goes for online reviews. Even if a scummy company deletes 1 star reviews, there might still be 2 star ones, or 3 star ones that are (in my experience, anyways) more grounded than their more impulsive 1 and 5 star counterparts. A keen applicant of the scientific method would question an unnatural distribution of high reviews with low information density, not because of prior accusations of foul play, but simply because one wants to come up with a reasoned theory of how the dynamics of review systems play out in general. You may not necessarily have hard evidence that intentional shenanigans are occurring in any given review system, but you can still make up your mind about the likely factors for why reviews are the way they are and how much weight you're comfortable putting in them.




It's not clear how you would apply the scientific method to a site that can hide, convince, bribe, censor, etc. its users, so the information you gather is too incomplete for a good analysis. Even more so when people have a short attention span and the scientific method takes effort.


This is just my personal interpretation, but the way I approach it is by creating a trustworthiness score for the set of reviews. For example, too many 5 star reviews without comments and without any low stars might earn a low trustworthiness score because the distribution looks unnatural compared to the distribution for similar competitors. Or a set of 6 reviews, no matter their distribution, will also receive a low trustworthiness score simply because there's not much to go on. Reviews of a restaurant complaining about one particularly egregious experience earn a low trustworthiness score since they look like outliers with emotions running wild. Review sets with a decent number of 3 star reviews containing several well-articulated paragraphs tend to earn a high trustworthiness score. Etc.
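
To make it concrete, a toy version of what I'm describing might look something like this (the thresholds and weights are made up on the spot, purely illustrative, and it skips the "compare against competitors" part):

    # Toy heuristic: score how much to trust a SET of reviews,
    # not how good the product is. Thresholds/weights are illustrative only.
    def trustworthiness(reviews):
        # reviews: list of (stars, text) tuples, stars in 1..5
        if len(reviews) < 10:
            return 0.1                    # too few reviews to go on
        stars = [s for s, _ in reviews]
        texts = [t for _, t in reviews]
        score = 1.0
        five_star_share = stars.count(5) / len(stars)
        commented_share = sum(1 for t in texts if len(t.split()) > 20) / len(texts)
        mid_share = sum(1 for s in stars if s in (2, 3, 4)) / len(stars)
        if five_star_share > 0.9 and commented_share < 0.2:
            score -= 0.5                  # wall of silent 5-stars looks unnatural
        if mid_share < 0.05:
            score -= 0.2                  # no moderate voices at all is suspicious
        score += 0.2 * min(commented_share, 0.5)   # substantive text earns trust
        return max(0.0, min(1.0, score))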

So, rather than the score being a dimensionless good-or-bad scale, it's a meta-analysis of the reviews, judged along multiple dimensions. This means that some companies/restaurants/products simply don't provide enough information for me to form a conclusion, despite having a number of reviews that lean towards either "good" or "bad". And that's ok, because the very fact that I've considered so many different dimensions also tells me that there isn't necessarily a single absolute best option.

This, in my mind, seems like a more accurate depiction of reality than blindly ranking by number of 5 star reviews.


The problem I see with that approach is that modern transformer-based NLP models like GPT-3 make generating this sort of "review" text almost free. The value you place in those reviews comes from the perceived effort put into them. When that effort goes to zero... it's just more automated spam reviews with a "believable" statistical spectrum.

Honestly, I don't see how to trust reviews if the reviewers have no skin in the game (either their reputation or money). If identities and reputations can be faked/generated at low cost, then we're back to just money. One theory of advertising (like McD and bigC) is that they throw money at pretty ads to convince you that lots of people have given them money, so they must be good... it's pay to play, but it's not completely wrong. Limiting reviews to those who have actually bought the item also helps, if you time limit/delay them and weight by the cost of shipping/stocking.
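
A crude sketch of that kind of weighting (the field names and constants here are invented for illustration, not from any real marketplace):

    # Illustrative only: down-weight reviews that cost the reviewer nothing.
    from datetime import datetime, timedelta

    def review_weight(review, item_price, now=None):
        now = now or datetime.utcnow()
        if not review["verified_purchase"]:
            return 0.0                      # no skin in the game
        age = now - review["purchase_date"]
        if age < timedelta(days=14):
            return 0.0                      # too early to have really used the item
        # the more the reviewer actually paid, the more a fake review "costs"
        return min(1.0, item_price / 100.0)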


GPT-3 is certainly a valid thing to consider, but I think it has some weaknesses (e.g. there's an uncanny valley to generated text in terms of the balance between relevance, novelty and coherence), and even the most sophisticated GPT-3 cannot, by itself, mess with every marker (e.g. volume, distribution of content quality, the conflict of interest between having generated low reviews and the ability to doctor review numbers, nuances associated with weasel words in real text vs in the training set, variance in emotions/tone/interjections, reviewer consistency, aggregator reputation, etc).


Makes me want to build an ML system to rate ratings. Of course that could escalate into an arms race.
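
Something like this, maybe (pure toy sketch; it assumes you could somehow get labels for manipulated vs. organic review sets, which is of course the hard part):

    # Toy sketch of "rating the ratings": classify whole review sets
    # from distributional features. Labels are assumed to exist.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def review_set_features(stars, texts):
        stars = np.asarray(stars, dtype=float)
        lengths = np.array([len(t.split()) for t in texts], dtype=float)
        return [stars.mean(), stars.std(), float((stars == 5).mean()),
                lengths.mean(), lengths.std(), float(len(stars))]

    # X = [review_set_features(s, t) for s, t in labeled_sets]  # one row per review set
    # y = [...]                                  # 1 = looks manipulated, 0 = organic
    # clf = RandomForestClassifier().fit(X, y)
    # clf.predict([review_set_features(new_stars, new_texts)])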



