
An empirical study of obsolete answers on Stack Overflow [pdf] - fogus
https://arxiv.org/abs/1903.12282
======
awinter-py
UX for fact checking, arbitration, correctness + staleness will be critical to
the survival of information societies

SO not having a good way to retire these is the same issue as twitter's 'get
the facts' banner

IMO wikipedia is the only organization that's solved it, and my sense from the
outside (I've only written 1 article) is that it's with editor-gatekeepers,
not with tools (though there are some bots)

~~~
swalsh
Wikipedia demonstrates that an authoritarian governance model is more
functional than essentially anarchy. But that does not mean we should settle
for it. Wikipedia is known for its restrictive culture. A lot of good
information does not make it past the guards. We can probably do better.

~~~
DaiPlusPlus
I remember when Wikipedia first started cracking-down on “Trivia” sections.
And yet sites like Everything2 still died.

What if Wikipedia allowed articles to have two versions: the normal, policed,
articles we have today - and adding a “post anything, as long as you provide
citations” version for sticking all the trivia and extra details that would
normally not be allowed on a main Wikipedia page.

~~~
bawolff
What would be the benefit of that over just having a different site dedicated
to that.

I think there is plenty of room for all types of sites on the internet, but
they don't have to all be hosted by the same group.

------
payne92
What are the issues with having a "mark obsolete" flag that users can check?
(with an optional comment)

At a minimum, that would be an input to the presentation ranking -- old,
flagged items would drift to the bottom.
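
A minimal sketch of how such a flag could feed the ranking. The `Answer` record, its field names, and the penalty weight are all assumptions made up for illustration; nothing here reflects how Stack Overflow actually ranks answers.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    # Hypothetical record; field names are assumptions for illustration.
    title: str
    score: int            # net upvotes
    age_years: float      # time since the answer was posted
    obsolete_flags: int   # number of users who checked "mark obsolete"


def rank_key(answer: Answer, flag_weight: float = 2.0) -> float:
    """Higher is better. Each flag costs more the older the answer is."""
    penalty = flag_weight * answer.obsolete_flags * answer.age_years
    return answer.score - penalty


def rank(answers: list[Answer]) -> list[Answer]:
    # Unflagged answers have zero penalty and sort by score as usual.
    return sorted(answers, key=rank_key, reverse=True)


answers = [
    Answer("old, flagged", score=120, age_years=10.0, obsolete_flags=8),
    Answer("recent", score=15, age_years=1.0, obsolete_flags=0),
    Answer("old, unflagged", score=40, age_years=9.0, obsolete_flags=0),
]
print([a.title for a in rank(answers)])
```

With these made-up numbers the old, heavily flagged answer drops below both others even though it has the most upvotes, while the old but unflagged answer is untouched.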

Long-tail "flotsam and jetsam" content is a huge problem, generally, not just
for software development information.

~~~
lucb1e
Generally agree with the "mark obsolete" button, but I don't think the comment
should be optional. If it's optional, you could mark anything as obsolete and
you shift the burden of proof to the author (who may be long gone) or some
community member to jump in, which sounds ripe for abuse to me. Rather, you
should add why this is no longer current and let people with enough rep points
verify it, similar to how editing someone else's post goes into an edit queue
if you don't have enough reputation.
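
As a rough sketch of that workflow: a flag that cannot be created without a reason, and that only counts as confirmed once enough high-rep reviewers approve it. The class name, the reputation threshold, and the number of approvals are assumptions for illustration, not real Stack Exchange settings.

```python
from dataclasses import dataclass, field

# Both thresholds are assumptions for illustration, not real site settings.
MIN_REVIEWER_REP = 2000
APPROVALS_NEEDED = 2


@dataclass
class ObsoleteFlag:
    answer_id: int
    reason: str
    approved_by: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # The explanation is mandatory: the flagger carries the burden of proof.
        if not self.reason.strip():
            raise ValueError("explain why the answer is no longer current")

    def approve(self, reviewer: str, reputation: int) -> bool:
        """Record an approval; reviewers below the rep threshold are ignored."""
        if reputation < MIN_REVIEWER_REP:
            return False
        self.approved_by.add(reviewer)
        return True

    @property
    def confirmed(self) -> bool:
        # The flag takes effect only after enough distinct reviewers agree.
        return len(self.approved_by) >= APPROVALS_NEEDED
```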

Might be relevant to mention that I'm quite active on the security
stackexchange and regularly review the suggested edits queue (we don't have a
constant backlog like stackoverflow does). Feel free to point out if you think
this is not a nail for my hammer.

~~~
jrumbut
I think for obsolete, you should give a link to the updated version rather
than a comment (like duplicate is today) or it could be a vote that is
balanced among other signals rather than a cause for deletion or deep
archiving (since obsolete systems maintainers need help too).

I tend to think the real problem is the overly strict conception of duplicate.
Over time, the way people will ask a question and the way people will answer
it changes.

5-10 years ago almost every JS question was a jQuery question too, now not so
much. As someone who lived through that I can very easily translate to the
less jQuery-centric present, but someone who started learning JS/React last
week can't. A new rendition of such a question/answer would be a duplicate for
me, but the old one would be obsolete to the new developer.

I think the best way forward is that both duplicate and obsolete should be
soft signals rather than reasons for closing.

~~~
crispyambulance
> I tend to think the real problem is the overly strict conception of
> duplicate.

It's true, people are so damn trigger-happy marking questions as duplicates.

I've seen new, well-posed questions on up-to-date frameworks get marked as
dupe because 10 years ago someone asked a related question on an obsolete
tech. The reason it was marked as dupe was simply because someone took it upon
themselves to write up a sprawling smug "canonical" answer to a shitty old
question that happened to cover the new subject matter.

It's much better to keep the old questions and answers, to just answer each
question (and no more), and to create new questions as needed. Why not? it's
not like they're running out of disk space.

I think the solution here is to encourage specific answers to specific
questions, let folks sort out the historical minutiae based on timestamps and
subject. Anything more elaborate is asking for mix-ups and confusion.

~~~
misnome
> The reason it was marked as dupe was simply because someone took it upon
> themselves to write up a sprawling smug "canonical" answer to a shitty old
> question that happened to cover the new subject matter.

This seems an optimistic view. What it feels like an awful lot of the time is
that it just has some of the same outside appearances and is a completely
different question, but the people marking it as dupe don’t read it carefully
enough, or don’t know enough about the subject to realise the differences.

------
kevin_thibedeau
There doesn't seem to be any remedy to the problem of poor, obsolete, or
outright wrong answers being selected as the checked answer when the asker
disappears. I frequently have to scroll past the approved one because there's
often gold hiding below it.

~~~
MrZander
I ran into this yesterday. I got an upvote on an answer I posted 7 years ago
that was marked accepted. There was a much better answer with 3x the upvotes
at the bottom of the list and the OP account was no longer active.

Wondering if there was a protocol for changing the accepted answer, I searched
meta. I found the consensus is: 'Accepted' is at the sole discretion of the OP
and doesn't mean it's correct, just that it answered the OP's question.
Which I think is BS as it elevates the answer to the top of the list and gives
it credibility.

Why is the OP the czar that chooses the correct solution just because they
asked the question first? Honestly, the OP is often the _least_ qualified
person to validate an answer's correctness.

~~~
sgillen
Sort of think there should be two systems: an OP-selected answer and a
community selected answer. This is already what happens to some extent, it’s
just a matter of UX I think to clearly mark the answers the community thinks
are good with something akin to the green check you get (and I guess more
importantly move these to the top)

I’m not sure this matters too much for experienced devs, it’s really just a
noobie trap to only look at a low upvote but chosen answer right?

~~~
MrZander
I like the two system approach, that is a good idea.

Perhaps it is just a matter of properly ordering the answers, maybe giving
community votes precedence over the accepted answer?
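
A minimal sketch of that ordering, where the community score decides the position and the accepted mark only breaks ties. The dictionary keys and the sample scores are made up for illustration.

```python
def order_answers(answers: list[dict]) -> list[dict]:
    """Sort by community score; the accepted mark only breaks ties."""
    return sorted(answers, key=lambda a: (a["score"], a["accepted"]), reverse=True)


# Made-up data: an accepted answer with fewer votes than a later one.
answers = [
    {"id": "accepted", "score": 4, "accepted": True},
    {"id": "buried", "score": 12, "accepted": False},
    {"id": "zero-vote", "score": 0, "accepted": False},
]
ordered = order_answers(answers)
print([a["id"] for a in ordered])
```

The accepted answer keeps its check mark in this scheme; it just no longer gets pinned above answers the community rates higher.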

For example, here is the question I was talking about:
[https://stackoverflow.com/questions/11970586/apl-removing-el...](https://stackoverflow.com/questions/11970586/apl-removing-elements-from-array/12048902)

My answer should not be accepted, I was brand new to APL when I answered it.
The "correct" answer is not accepted and is buried below two 0-vote answers.

The thing is, my answer _works_, just not well. I can easily see myself
overlooking a better solution in a case like this. Maybe I'm just lazy though
haha.

------
pelasaco
That's something that was long discussed on Meta, especially to answer the question:
"Is it fair to down-vote answers which were right in 2009, but aren't right
anymore?" and the answer was "Yes, if you want to keep your points, you should
maintain your answer as long as it exists" which is a kind of no-go for
somebody like me with more than 14k points and more than 500 answers...

~~~
thomascgalvin
> "Yes, if you want to keep your points, you should maintain your answer as
> long as it exists"

This is just one more example of the absolutely toxic culture within Stack
Overflow itself. _Everyone_ uses it, but we all get to it from Google.

Nobody I know bothers answering questions, and I don't think more than a half-
dozen people I know have even submitted questions. If you do, the gatekeepers
are going to jump down your throat.

~~~
NateEag
I've answered a few questions casually on StackOverflow, when I ran into
something that was unanswered and I knew what the answer was, but I've never
invested seriously in it, as you can see from my profile:

[https://stackoverflow.com/users/1128957/nateeag](https://stackoverflow.com/users/1128957/nateeag)

I've also never had a gatekeeper jump down my throat.

I think you are overstating how bad SO's culture is.

I have definitely seen problems there, but they aren't the whole of the story.

(Though SO itself has entirely lost my goodwill due to the Monica Cellio
incident: [https://meta.stackexchange.com/questions/342039/firing-commu...](https://meta.stackexchange.com/questions/342039/firing-community-managers-stack-exchange-is-not-interested-in-cooperating-with/342950#342950) )

------
staycoolboy
The call to action, section 5, is exceptional for an academic paper. I'm
surprised the first suggestion doesn't already exist on S.O.

Honestly this is my biggest complaint with the web in general: immortal anti-
information. Proposing analysis and strategies for combatting it on curated
platforms is a great first step.

I think also implicit in this discussion is the role of the readers to vote
with their mouses, so to speak. Without feedback from users, the mechanisms
can't work effectively. Which is why I try to upvote as much as reasonable on
HN and SO.

~~~
boomboomsubban
>Without feedback from users, the mechanisms can't work effectively.

How positive are you that your knowledge is current before you vote? Users
will confirm the information that they know, which is just as likely to be
outdated.

~~~
staycoolboy
Good question. All I got is ... law of averages? ;-)

Damn.

------
j4ah4n
I kind of feel the same issue is being expressed in search engines as well. As
time progresses, more relevant answers are moving down the list. Using Google
for example, I'm finding that I have to employ the "Tools -> Time Range"
filter to get better, more relevant results.

------
ape4
It needs to be versioned. Maybe you are working in an environment with an old
C++ compiler that can't do the latest C++ tricks. You want best practices from
state of the art 10 years ago.

~~~
the_jeremy
I have answered >100 questions on SO because I like answering questions. I
would answer significantly fewer if I was required to give the range of
versions that my answer worked on, because I only know that it works in my
environment. If I could give just one version, then others would have to keep
asking if my answer was best practice on _their_ version. I suppose listing at
least one version it works on is better than the current setup, though.

~~~
asdff
The current setup is entirely obtuse. I suck at python. Sometimes I find an
answer that solves my problem, and lo and behold, it's python 2 and I have to
read up on the changes to that function between the two versions and see if I
can even use the answer as written. If there was just some little tag in the
op question (Python=2.7), that would save me hours. Multiply that anecdote by
the mountains of novice traffic the site receives, and you can see how a lack
of versioning makes a lot of answers worthless for a lot of people.

For your concern, if op said they needed help with version 2.7, presumably
you'd write your answer with that in mind. Or you would say how you got it
working in 3.8, and exactly how you set up your environment. If someone asked
you about another version, you can throw your hands in the air and say "I
don't know, but it works in 3.8 with these packages installed," and that would
be a perfect response that shows others how to reproduce your work.
Reproducibility should be standard practice, and you shouldn't be reliant on
dubious context and dates and guesswork to reproduce an example in a website
devoted to technical help.
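
To make the tag idea concrete, here is a rough sketch of what a reader could be shown if every answer carried a single "tested on" version. The function names and the major-version heuristic are assumptions for illustration, and the parser only handles plain dotted numbers such as `2.7` or `3.8`.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.8' into (3, 8). Only plain dotted numbers are supported."""
    return tuple(int(part) for part in text.split("."))


def version_note(tested_on: str, reader_version: str) -> str:
    """Compare the answer's tested version with the reader's version."""
    tested = parse_version(tested_on)
    reader = parse_version(reader_version)
    if tested == reader:
        return "tested on your version"
    if tested[0] == reader[0]:
        # Heuristic only: the same major version is no guarantee.
        return "tested on the same major version"
    return "tested on a different major version, check before using"


print(version_note("2.7", "3.8"))
```

Even one declared version is enough for this kind of warning; the answerer does not need to know the full range of versions the answer works on.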

------
ringshall
It might be helpful to have the age of comments listed as part of their
metadata, ie alongside the date the comment was posted. Some formatting could
be added (eg red highlight for comments > x years).

I know this sort of feature is useful on newspaper websites - The Guardian
will flag stories older than some limit as being potentially out-of-date.

~~~
lucb1e
> It might be helpful to have the age of comments listed

That's... currently the case?

> red highlight for comments > x years

I don't find age has a 1:1 correlation with it being outdated. If some advice
doesn't make sense to me, I'd look at the dates of this and other answers,
because most often there will be newer answers (lower voted because they
haven't existed as long / aren't seen as much) and/or comments added to the
answer indicating how to do it in python3 or whatever the new thing is.

Sometimes posts from 2009 help me, sometimes posts from 2018 are outdated.
_Maybe_ this could work if a time limit is configured per tag, but even then,
I expect it wouldn't be very helpful.
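
For reference, a per-tag limit would not need much machinery. The sketch below assumes a hypothetical lookup table of limits in years (the numbers are invented) and applies the strictest limit among a post's tags; whether the resulting highlight is useful is a separate question.

```python
from datetime import date

# Hypothetical per-tag limits in years; the numbers are made up.
TAG_LIMIT_YEARS = {"javascript": 3, "python": 5, "c": 15}
DEFAULT_LIMIT_YEARS = 5


def is_possibly_outdated(posted: date, tags: list[str], today: date) -> bool:
    """True if the post is older than the strictest limit among its tags."""
    age_years = (today - posted).days / 365.25
    limit = min(
        (TAG_LIMIT_YEARS.get(tag, DEFAULT_LIMIT_YEARS) for tag in tags),
        default=DEFAULT_LIMIT_YEARS,
    )
    return age_years > limit


today = date(2020, 6, 1)
print(is_possibly_outdated(date(2009, 3, 11), ["c"], today))
print(is_possibly_outdated(date(2009, 3, 11), ["javascript"], today))
```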

~~~
ringshall
>> It might be helpful to have the age of comments listed

> That's... currently the case

Is it currently the case? I'm not seeing it, though I may be missing it.

I do see, at the bottom of comments, something like

:: edited Apr 23 '15 at 8:40 / answered Mar 11 '09 at 21:11

The date the question was /asked/ does have an age, though, which may be what
you're referring to. For the problem at hand, it's the age of the answers that
matters more than the age of the questions.

> I don't find age has a 1:1 correlation with it being outdated.

No, but there is a correlation.

------
elliekelly
I’m definitely in the minority here but when I’m learning a new language I
kind of prefer to use a slightly outdated resource. When you get stuck and
look for help you get the gist of the answer you’re looking for but not the
solution and then you have to figure out the rest. It’s like a hint that gets
you 50-90% of the way there.

When you’re in the middle of work and not actively trying to learn new
material I’m sure the obsolete answers are frustrating but I don’t think the
outdated information is entirely useless.

------
oblib
I agree with the conclusion of the study (and payne92's suggestion here) but I
haven't found obsolescence to be a huge issue because I generally take the
time to review all the answers provided and the comments on them. Often times
obsolete answers are noted in the comments.

But there is room for improvement.

------
andersco
This seems to imply that answers are binary, either obsolete or not, while in
my experience they are often only partially outdated. How is that treated
in this study? Additionally, I find that for high traffic answers someone will
often have posted an update to the obsolete response.

------
minimaxir
Per the paper, they are using an SO archive dump from _2017_ , which is ages
ago in internet time, although admittedly the problems with SO comments
extend even before that.

It looks like the latest archive dump (March 2020) is available in BigQuery,
e.g.: [https://console.cloud.google.com/bigquery?p=fh-bigquery&d=st...](https://console.cloud.google.com/bigquery?p=fh-bigquery&d=stackoverflow_archive&t=202003_comments&page=table)

~~~
lucb1e
While I like the irony of this paper using an outdated dataset, I don't think
much changed on stackexchange since 2017 in this regard.

------
AndrewKemendo
How could SO incentivize the community to clean that up?

Given that SO's content is basically 100% volunteer/user generated and free to
access, it seems like the first step would be to allow users to flag obsolete
answers with a very visible and obvious UI element.

Maybe second would be SO fielding a team of experts as "pruners" that would
delete/update the flagged answers.

~~~
eitland
Or maybe we could just let content that doesn't receive upvotes sink towards
the bottom:

- a fresh upvote is 1 point

- one year old upvote is 0.5 point

- a two year old upvote is 0.25 points

If nobody votes, answers' relative positions stay the same.

Old answers aren't worthless but it also isn't impossible to lift updated
content to the top later.

A person might vote for the same answer again after half a year, but it really
just refreshes the vote to 1 instead of adding more value to it.
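
The scheme above amounts to a one-year half-life on every upvote. A minimal sketch, assuming a hypothetical `DecayingScore` class that keeps only the latest upvote per voter so that re-voting refreshes the vote instead of adding a second one:

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 365.0  # an upvote loses half its weight every year


def vote_weight(cast_at: datetime, now: datetime) -> float:
    """1.0 for a fresh vote, 0.5 after one year, 0.25 after two."""
    age_days = (now - cast_at).total_seconds() / 86400
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


class DecayingScore:
    """Hypothetical per-answer score; names are assumptions for illustration."""

    def __init__(self) -> None:
        self._votes: dict[str, datetime] = {}  # voter -> time of latest upvote

    def upvote(self, voter: str, when: datetime) -> None:
        # Voting again overwrites the old timestamp: a refresh, not an extra vote.
        self._votes[voter] = when

    def score(self, now: datetime) -> float:
        return sum(vote_weight(cast_at, now) for cast_at in self._votes.values())


now = datetime(2020, 6, 1)
answer = DecayingScore()
answer.upvote("alice", now)
answer.upvote("bob", now - timedelta(days=365))
answer.upvote("carol", now - timedelta(days=730))
print(answer.score(now))
```

Because the decay is exponential, the passage of time multiplies every existing score by the same factor, so answers that get no new votes keep their relative order.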

I'm not a fan of deletionism and in Stack Overflow's case I'm fairly certain it
has destroyed value for millions both in knowledge and reputation.

(Why? For years around 2009 - 2015 or something whenever you found a good
answer that solved your problem it would most probably be flagged for removal
or something.)

~~~
Avamander
This would not be implemented because you suggested something that makes high-
point users lose points.

~~~
eitland
_They_ will not do it.

They will also not allow duplicates or low-effort questions, even if
others are lining up to answer those low-quality questions.

Whoever wants to improve the world as much as stackoverflow once did - and
make a bunch of cash in the process - could try some of these ideas.

Someone will probably counter with the assumed fact that if there had been
money in it someone would have done it, to which I counter with the two economists
who went down the street, saw some money on the ground and walked straight
past it since "if the money was real someone would have taken it already".

~~~
wool_gather
There was an upheaval on the Stack network over the last 10-12 months; part of
the fallout is a lot of users pulling together on a non-profit take on QA:
[https://codidact.org](https://codidact.org)

Still in the early stages, but it's promising.

~~~
eitland
Thanks, looks promising.

The actual instances were somewhat hidden as far as I could see.

------
QuasiGiani
This of course(!) sounds as if like it'll like be quite good.

But. I am more interested in the inevitable corollarial follow-up:

 _An_ Thorough & Ignominious _Probing Investigation Into The Problematic
Presentation Of Pseudo-Intellectual Wankery On Hacker News_

------
darepublic
It would be nice to have answers marked as obsoleted and then a little
historical breakdown of the greater changes that have come about to make that
answer obsolete

