
Why there isn’t an Apache Arrow article in Wikipedia - riboflavin
https://www.dremio.com/why-apache-arrow-wikipedia/
======
TallGuyShort
Having worked on commercially-resold Apache projects, can't say I argue with
Wikipedia a whole lot on this. It seems to me it should be let in, but it's a
bit silly to go and call them out like this on a corporate blog, IMO.

Dremio does benefit from Apache Arrow publicity and notoriety, even if they
don't profit directly. Having a de-facto standard data format and open-source
engines is a selling point for some. That's why Dremio explicitly calls it out
on their own website. It also never hurts in the recruiting department. (edit:
there's a reason the article was submitted by someone working in marketing &
strategy)

>> I’m wondering if Wikipedia can continue to be considered a reliable source
of information for technical folks who want to learn more about the vast
system of Apache open source software projects.

Sign up for the Olympics, because that's a hell of a leap. You didn't get your
page in, it's really not much of a reflection on the rest of Wikipedia. It's
an open-source project. It should have its own freely available documentation
that fills much the same purpose anyway. If I want to learn about Apache X, I
go straight to x.apache.org. They concede that it's not an end-user product
anyway, so I'd think their key audience knows how to find an open-source
project website. Lower the bar too far the other way, and there are plenty of
semi-open-source projects whose marketing departments would be all over using
Wikipedia to their own ends - I've seen my own former employer do this for
their Apache projects.

~~~
duskwuff
> I’m wondering if Wikipedia can continue to be considered a reliable source
> of information for technical folks who want to learn more about the vast
> system of Apache open source software projects.

I'm confused why the writer thinks it _should_ be!

The Apache Foundation is a big tent. There are some clearly notable projects in
there (like Apache httpd), but there's also a lot of really obscure crap that
basically nobody outside ASF cares about (like Apache Creadur or Apache Pony
Mail). Expecting Wikipedia to document every Apache project is ridiculous.

Is this particular project notable enough for a Wikipedia article? I don't
know a lot about it, so I can't say for sure. But the article drafts that I've
seen don't convince me that it is.

~~~
lallysingh
> Expecting Wikipedia to document every Apache project is ridiculous.

Wikipedia has a lot of rather obscure entries. Long, long lists that I think
easily under-rank the Apache project.

I'm not saying you're wrong, but the bar for notability is kinda vague. Lists
of every episode of a series, every kind of kimchi, etc.

------
mxfh
It was a pain to get _gitlab_ in 5 years ago after a "controversial" deletion,
so it wasn't available for simple undeletion. Domain specific knowledge has it
notoriously hard with wikilawyers who, at large, seemingly stopped adding new
things to their world view 15 years ago.

Then it becomes a game of jumping through hoops and hoping you end up with a
kind wiki-landlord or knowing a friendly wikipedia admin.

Doing the latter by announcing your concern on social media and hoping a
sympathetic admin picks it up, might be the easiest on human time and
resources, just let them copy your reasonably well sourced article draft from
your personal space and see what happens.

~~~
sien
At one point about 10 years ago the entry for Atlassian was deleted for not
being notable.

~~~
Geimfari
It was deleted in 2005 for being a blatant advert, and again in 2010 for the
same reason. I doubt it was actually a notable company in 2005.

[1]
[https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...](https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Atlassian_Software_Systems)

[2]
[https://en.wikipedia.org/wiki/Special:Log?type=delete&page=A...](https://en.wikipedia.org/wiki/Special:Log?type=delete&page=Atlassian&wpdate=2010-01-19)

~~~
mxfh
Even if it was clumsy self-promotion, or the work of over-ambitious fans with
no clue about Wikipedia's inner mechanics, that shouldn't set back a legitimate
interest in information about a given company or other entity by many years.
After a deletion it's just orders of magnitude harder for anyone to get an
article restored,
compared to an entity which didn't have the "luck" to get added to wikipedia
too early. Deletion history shouldn't have that much of a say on actively
developing entities as it has now.

~~~
tptacek
The exact opposite thing is true. If the article is bad, it needs to not be on
the site. What's important are reliable articles, not how many articles there
are. It's perfectly fine for a topic we know will be more obviously notable in
the coming years to stall for an article until a decent one can be written.

This has been the ethos of the project practically since its inception. It's
always startling to see people questioning Wikipedia's premises, since it
seems pretty clearly to be one of the most successful volunteer projects in
the entire history of the Internet.

~~~
bjourne
That is not what Wikipedia's policies say. They say that if a topic fulfills
the notability criteria there should be an article for it. They do not say
that if an article is bad it should be deleted - rather the contrary - if an
article is bad, improve it!

This was the ethos of the project in the beginning but is not the ethos
anymore. People have realized how valuable it is for companies and other
actors to have their own article on Wikipedia. Therefore Wikipedians have
created a very bureaucratic system for deciding which articles should be
created. And people like to wield power. For example, by rejecting perfectly
good articles.

~~~
tptacek
This article was struck for not meeting the notability criteria, which
involves citing reliable sources that make a straightforward claim of
notability. It's not a perfectly good article.

------
tptacek
Open source projects are particularly tricky for Wikipedia. There are tens of
thousands of them. Their owners are often passionate. They compete with each
other, so there's incentive to write hard-to-adjudicate competing claims. Many
have commercial backing, which further warps incentives. The projects
themselves are highly technical; many, like Arrow, are software development
tools and components. There are few authoritative sources that reliably track
open source projects. Keeping up involves directly following bug trackers and
message boards and then synthesizing a narrative, which is the definition of
"original research", forbidden in the encyclopedia.

It's likely that Arrow does deserve a WP article. But Arrow's sponsors
misunderstand more about Wikipedia than Wikipedia does about Arrow. Writing a
defensible article about their project will require work; in particular,
they're going to need to spend the time tracking down authoritative sources
for why Arrow is notable, and those claims will probably need to be something
more persuasive than "hundreds of companies use it"; hundreds of companies use
all sorts of things that aren't, and shouldn't be, featured in their own
encyclopedia articles.

I understand the impulse behind "this project is important; it should have a
Wikipedia article". But when you take a step back and accept what Wikipedia
actually is, rather than what you think it should be, you're left with the
question: do we really need to feature this particular piece of software in
its own encyclopedia article? 20 years from now, will people still be getting
value from it? Whatever value that might be, will it outweigh the 20 years of
other people's volunteer efforts to maintain the article, keeping it free of
vandalism and ensuring that it doesn't surreptitiously turn into a promotion
piece for some company or another?

The answers might be "yes". But I don't see much evidence that this piece
considered the questions.

Lots of things that don't seem deserving have in-depth Wikipedia coverage.
Many of those things probably really don't belong in an encyclopedia! But
there are two sides to this problem: the merit of the topic, and the cost, in
volunteer time, of including it. A marginal topic can be defensible if it's
easy to reliably cover it. A seemingly important technical topic might not be
if the only way to say anything interesting about it is to write original
research directly into its article.

 _Late edit_

A useful tip for getting your open source project covered in its own Wikipedia
article: don't have the Chief Marketing Officer of the company that owns the
project write the article.

~~~
ghaff
We're basically into the deletionist vs. inclusionist debate that is at least
somewhat orthogonal to what laypeople think of as notability. Is a Pokemon
character notable? Not really. But because of the enthusiastic fan base tons
have been written about them.

On the other hand, whether you're talking open source projects beyond the big
names, corporate executives, or just people who are reasonably well known
within fairly large communities, there just isn't a lot of independently
sourced published material about them, especially in mainstream pubs--which
(somewhat both understandably and ironically) Wikipedia tends to prefer. You
even have people with tons of hits on Google but there isn't a ton of info
_about_ them online.

~~~
tptacek
What "debate"? This isn't a live debate. There is a faction of people, some of
whom are involved with Wikipedia, that want it to be something other than a
tertiary-source encyclopedia, just like there are people who want to be able
to write blog posts as Stack Overflow comments. It's true that they will never
stop advocating for these changes, but there's no evidence that the projects
themselves are going to cave.

~~~
ghaff
Maybe it's not a debate so much as a tension--and it's a real one. Personally,
I haven't contributed anything to Wikipedia in years. It's useful, I see its
flaws, but I certainly don't care enough to push on it for the most part.

~~~
tptacek
I'm exactly the same way. For instance: I did some writing about macOS
security in the macOS articles, way back when, and most of it got struck
because I couldn't cite it properly. It was frustrating to write a
straightforward statement, like "the macOS Seatbelt sandboxing mechanism uses
s-expressions", and have it get struck.

But I came quickly to realize the project was right. Without a reliable
_secondary_ source, I was effectively conducting research in the pages of the
encyclopedia. What I learned from that was: I shouldn't be writing
encyclopedia articles; the technical writing I do tends not to be tertiary.

It's fine – good, in fact – if most people don't write much in Wikipedia. It's
its own special thing. You can't argue with its success: it might be the most
successful project in the history of the Internet, and a long-term contender
for one of the most successful volunteer knowledge projects ever.

------
jccalhoun
>Arrow is designed to serve as a shared foundation for SQL execution engines,
data analysis systems, storage systems, and more – think Pandas, Spark,
Parquet, etc. Engineers across the community are working together to establish
Arrow as a standard for columnar in-memory processing.

I like to think I'm fairly techy for a non-programmer but I have no idea what
that means. That might be part of their problem if that is the description in
their wikipedia entry.

~~~
ggggtez
I believe it's what the kids call "buzzword bingo".

------
tetromino_
See
[https://en.wikipedia.org/wiki/Wikipedia:Notability](https://en.wikipedia.org/wiki/Wikipedia:Notability)
- all you need to show is that Apache Arrow has received significant coverage
in reliable sources that are independent of the subject.

So: find conference papers/talks by people not affiliated with Apache or the
Apache Arrow project and that discuss Apache Arrow. Figure out how to
incorporate the tidbits about Arrow from those papers into the article text.
Add sources in footnotes. Done.

~~~
sb057
A version which was rejected included the following (non-Apache-affiliated
[afaik]) references:

[https://www.xenonstack.com/insights/what-is-apache-
arrow/](https://www.xenonstack.com/insights/what-is-apache-arrow/)

[https://link.springer.com/chapter/10.1007%2F978-1-4842-1311-...](https://link.springer.com/chapter/10.1007%2F978-1-4842-1311-7_5)

[https://www.biorxiv.org/content/biorxiv/early/2016/08/23/071...](https://www.biorxiv.org/content/biorxiv/early/2016/08/23/071092.full.pdf)

[http://delivery.acm.org/10.1145/3110000/3103003/p138-Maas.pd...](http://delivery.acm.org/10.1145/3110000/3103003/p138-Maas.pdf)

[https://www.theregister.co.uk/2016/02/17/apache_arrow_toplev...](https://www.theregister.co.uk/2016/02/17/apache_arrow_toplevel_project/)

[https://www.cio.com/article/3034279/big-data-gets-a-new-
open...](https://www.cio.com/article/3034279/big-data-gets-a-new-open-source-
project-apache-arrow.html)

[https://www.infoworld.com/article/3033446/hadoop/apache-
arro...](https://www.infoworld.com/article/3033446/hadoop/apache-arrow-aims-
to-speed-access-to-big-data.html)

[https://sdtimes.com/apache/guest-view-first-release-
apache-a...](https://sdtimes.com/apache/guest-view-first-release-apache-
arrow/)

[https://www.infoq.com/news/2016/12/le-dem-apache-
arrow/](https://www.infoq.com/news/2016/12/le-dem-apache-arrow/)

[http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-
parq...](http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parquet-and-
orc-do-we.html)

[https://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-
par...](https://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parquet-and-
orc-do-we.html)

~~~
tptacek
The 1st source is a blog post on a consulting company website.

The 2nd mentions Arrow only in passing, after several pages of coverage of
Spark; Arrow is covered only in relation to Spark. It's a reliable source but
doesn't clearly establish notability.

The 3rd mentions Arrow hardly at all; it's an implementation detail, mentioned
just once, in a paper about something else.

I can't fetch the 4th.

The 5th, a story in The Register, is reliable and probably does go towards
notability, though it seems to sort of argue against it (the gist of the
article is that it's surprising that Arrow has been made a top-level project
at all).

The 6th, in CIO, is a recap of a press release. Trade press PR recaps
shouldn't be WP:RS, but WP will often accept them, or would when I was
patrolling AfD; it's luck-of-the-draw. The admins who shot down Arrow's page
were smart enough not to accept it.

The 7th, in InfoWorld, is promotional as well, but it's at least written in
some depth. It's a straightforward notability claim. The Arrow article should
draw more clearly from it, in the opening paragraph.

The 8th, in SDTimes, is written by someone affiliated with the project itself;
it's citable, but WP probably won't accept it independently as grounds for
notability.

Same, in effect, for the 9th, which is just a recap of an interview with the
project author.

The 10th and 11th are just blog posts. They're citable if they're not
contentious, but they usually won't be acceptable as WP:RS for notability.

~~~
bjourne
Blog posts are prima-facie evidence of notability. Same thing with mentions in
published articles. From the book (second link):

"Recognizing that Value Vectors meet the needs of other data processing
engines, in February 2016, the Apache Software Foundation announced Apache
Arrow as a top-level project, bypassing the standard Incubator process.
Committers to the project include developers from other Apache projects such
as Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas,
Parquet, Phoenix, Spark and Storm.

Apache Arrow enables execution engines like Spark to take advantage of the
latest operations included in modern processors, for fast analytical data
processing. Columnar layout of data allows for better use of CPU caches by
placing all data relevant to a column operation in as compact of a format as
possible. ...

Apache Arrow software is available under the Apache License v2.0.

Dremio, a startup led by Jacques Nadeau, chair of the Apache Drill and Apache
Arrow Project Management Committees, leads the development."

In the past, this and the other sources would have been more than enough to
establish notability. I know that because I have created Wikipedia articles on
subjects much less notable than that. The problem for Apache Arrow isn't that
it isn't notable enough, it is that people have already tried four times to
get it included in Wikipedia so the Wikipedians voting on new page inclusions
are getting suspicious about it.

~~~
tptacek
If you want to sum up something like 10 years of debate and consideration of
the role of blogs as sources (it’s much more complicated than that they’re not
allowed) by saying, in effect, “you’re all wrong”, well you do you.

~~~
bjourne
I'm merely saying that you are wrong. Blogs are not always reliable sources in
the Wikipedia world, but they can absolutely be used as evidence for
notability.

~~~
tptacek
Not routinely, and not most blogs. As you can clearly see from the admin
comments _on this Arrow post_.

~~~
bjourne
Yes, routinely. You can find plenty of articles which had much less support in
sources when they were created here
[https://en.wikipedia.org/wiki/Category:AfC_submissions_by_da...](https://en.wikipedia.org/wiki/Category:AfC_submissions_by_date)
That Wikipedians rejected the article is a moot point because the argument is
that the rules are not applied consistently.

~~~
tptacek
Blogs are _not_ a consistently reliable source, particularly for notability
claims. It depends on the subject and on the blog. I'm not making this up; I
spent a year doing AfD patrol, and this was probably the most frequently
debated point in AfD arguments.

Obviously, they can't always be WP:RS, because then literally everything would
be "notable", since anyone can stand up a blog about anything. You can't even
logically assemble the argument you're trying to make.

~~~
bjourne
I didn't claim that blogs were consistently reliable sources. I claimed that
they were _routinely_ used as evidence of notability. Evidence of notability
!= Reputable source.

I'm not making anything up either; I have penned several articles on Wikipedia
and gotten them through the AfC process with much less notability evidence
than the Apache Arrow draft had. The difference was that I used to be an
established contributor so the rules were not as harsh against me as they are
against newbies and unknown contributors.

Also, you can look at the link I gave you and see that the notability rules
are not uniformly applied.

------
xibalba
As a strategy for getting Dremio on the front page of HN and thus on the radar
of a large group of tech people (i.e. Dremio's prospects), this article is
very clever.

As a critique of Wikipedia, not so much.

~~~
jessaustin
This seems like a good reason to flag it.

------
Ninjaneered
Here's the link to the draft:

[https://en.wikipedia.org/wiki/Draft:Apache_Arrow](https://en.wikipedia.org/wiki/Draft:Apache_Arrow)

And some possible additional sources:

* [https://www.forbes.com/sites/forbestechcouncil/2019/09/24/dr...](https://www.forbes.com/sites/forbestechcouncil/2019/09/24/dremio-helps-reduce-proximity-to-your-data/amp/)

* [https://www.businesswire.com/news/home/20180906005114/en](https://www.businesswire.com/news/home/20180906005114/en)

* [https://thesiliconreview.com/2016/02/apache-arrow-is-the-new...](https://thesiliconreview.com/2016/02/apache-arrow-is-the-new-open-source-project-for-big-data)

~~~
tptacek
The first article is a paid promotion piece, which WP won't accept as an RS.

The second is a press release by Arrow's sponsoring company, which, obviously,
WP won't accept as an RS.

I have no idea what "The Silicon Review" is; this is the first time I've ever
seen it. To the extent it's not a pay-to-play trade publication, it might
qualify as a notability-establishing source. The fact that the "Review" does
not itself have a WP page might make it harder to claim it's reliable, since
it suggests nobody else knows what it is, either.

~~~
Ninjaneered
Looks like my lateral reading was sub-par (actually I didn't even try, just a
quick Google/post).

The "Silicon Review" one looks like a pay-to-play as well after further
review, it's used in citation on a few other Wikipedia articles, but as far as
I can tell, and due to some anecdotal stories, it doesn't look good.

* [https://www.reddit.com/r/PublicRelations/comments/bha6hs/sil...](https://www.reddit.com/r/PublicRelations/comments/bha6hs/silicon_review_emailed_me_about_an_offer_to/)

* [https://arpr.com/blog/4-pay-for-play-scams/](https://arpr.com/blog/4-pay-for-play-scams/)

Good catch, thanks for spending the time to review my links. Reading your
comments above, I largely agree. It's a high bar (mostly) to get an article on
Wikipedia, and that's a good thing. It allows us to read the majority of
content on Wikipedia without too much suspicion.

------
qwerty456127
Once I witnessed awesome articles [others added and I used with delight] on
open source frameworks as well as some minor facts [I added] on other subjects
deleted for being "insignificant" I decided I'm not donating to Wikipedia
until this bullshit ends.

Wiki articles are not videos, they take humble disk space to host so I can't
recognize any reason in dismissing "insignificant" information other than a
stupid rule.

IMHO whatever can be considered a piece of knowledge should be there.

BTW nearly the same applies to StackOverflow - thanks to high reputation
points I earnt during the early days I can see deleted questions and answers
and I often see really interesting (having three-figure upvote scores and
dozens of stars) questions and very informative (also heavily upvoted) answers
deleted.

------
oefrha
[https://en.wikipedia.org/wiki/Draft:Apache_Arrow](https://en.wikipedia.org/wiki/Draft:Apache_Arrow)

> REVIEWERS: Please note that the submitting editor is the chief marketing
> officer and vice president of strategy at this company.

Yeah, sorry, big no no there.

Disclosure: consider myself a Wikipedian to some extent, got a couple hundred
edits on Wikipedia.

~~~
xeeeeeeeeeeenu
As long as you abide by WP:N, WP:NOR and WP:NPOV, writing articles about
yourself is perfectly acceptable on Wikipedia and doesn't break the rules.

~~~
oefrha
You may want to review
[https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest](https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest).
Acceptable — sure, sometimes. Perfectly — no. It’s a strongly discouraged
practice.

------
mistrial9
sadly, I can jump in on the "Wikipedia fails" train here, also. In about five
attempts to really change an article (different ones) in about five years,
every single change was rejected, as far as I know. The changes were
different, one was writing style and order of facts on a public historical
event in this century; one was adding a lot of detail to the description of a
popular fantasy fiction series; one was removing a controversial and
provocative one-liner at the top of a page about people at the edge of
(western) society; and another .. hmm I forget now, because I just gave up !

My aging colleague tells me, just keep doing the changes, they can't stop
everything. However, my direct (and limited) experience is.. they do stop
everything (that I try). I was logged in twice and used anonymous three times,
and added citation a bit, too.

To the point of the article, FOSS projects in wikipedia ? hmm maybe there
could be a clear category for that ? software projects _are_ proliferating
rapidly.. dunno

~~~
wffurr
Did you read the reasons for rejection and try to modify subsequent
submissions to better comply with Wikipedia's published guidelines?

~~~
mistrial9
yes, I did, and I feel that this revert behavior was more hazing/article
control than substantive in all cases but one, and with that one I don't
personally agree.

~~~
ErrantX
Have you got an example?

------
pradn
It looks like there aren't enough independent, non-commercial articles to use
as references. This is somewhat common for many newish technical projects. Add
some academic papers, some usage numbers, some summary blog posts that aren't
related to the project. Wiki editors are very suspicious of people from
companies editing articles related to their work.

~~~
PeterCorless
I'm not so sure it's falling afoul of 'not-notable' so much as WP:COI.

------
mikl
Why do you care about having a Wikipedia page for Arrow? Why is it important
enough to whinge about on HN?

Wikipedia is much like Stack Overflow these days, the community has become
hostile to newcomers who fail to meet their somewhat arbitrary but very
exacting standards for what is allowed on their site.

Fortunately, you can just publish your own web site. No need to be bothered
about not being on WP.

------
dredmorbius
For those who think that edit wars, content disagreements, and inaccuracies
are any special realm of Wikipedia, they're not.

One of the best examples I've encountered demonstrating this is a 19th century
edit revision war between the British and American publishers of Chambers's
Encyclopaedia, on the topics of Free Trade, Protection Duties, Slavery, and
certain salacious particulars concerning His Royal Highness, the Prince of
Wales.

[https://old.reddit.com/r/dredmorbius/comments/4xe2k1/chamber...](https://old.reddit.com/r/dredmorbius/comments/4xe2k1/chambers_encyclopaedia_editorial_statement/)

What's novel concerning Wikipedia is that these disputes (as with those of
free software vs. proprietary software) tend to occur, or at least leave
significant evidence, in the open public record.

------
thrower123
The hard-line Wikipedia deletionists should be deleted themselves. The
argument is always brought up, like StackOverflow, that they have to be
ruthless or it turns into an Eternal September dumping ground of garbage, but
the quality is already very uneven and gatekeeping like Cerberus doesn't help
further that goal. There's already a toxic Dead Sea effect where the pedantry
and politicking has chased out a lot of people that would contribute; who the
hell wants to bother putting in some hours writing something up if it is just
going to be summarily deleted?

Bandwidth and hard drives are cheap.

Just spitballing, but it'd be nice if Wikipedia worked a little more like
Linux distro repositories. Keep the tightly curated articles in a "core", but
leave room for "community" or "nonfree" collections if you want to turn them
on.

~~~
CharlesColeman
> Just spitballing, but it'd be nice if Wikipedia worked a little more like
> Linux distro repositories. Keep the tightly curated articles in a "core",
> but leave room for "community" or "nonfree" collections if you want to turn
> them on.

I think that's a fantastic idea, especially if it would lead to a drastic
reduction in the number of articles served from the main Wikipedia domain (to
a number that can meet some reasonable quality and maintenance standard, maybe
10 times the size of the most comprehensive print encyclopedia, or 1/6 of
Wikipedia's current size) [1].

[1] [https://newrepublic.com/article/101795/encyclopedia-
britanni...](https://newrepublic.com/article/101795/encyclopedia-britannica-
publish-information): "The 2002 Britannica contained 65,000 articles and 44
million words. Wikipedia currently contains close to four million articles and
over two billion words..."

------
julianlam
The whole concept of "notability" in Wikipedia-land is subjective as hell.
Whether your article makes it in is simply a matter of rolling the dice the
first time you submit the article.

I created an article for NodeBB, a piece of forum software used worldwide by
companies small and large (including several triple A gaming companies). We
got AfD'd, and now every time someone creates an article for NodeBB, the AfD
is brought up and the entire discussion ends as soon as it has begun.

We even created an article the _suggested_ way, by submitting a draft for
review. It got reviewed alright... instant rejection because they felt it
looked like an ad. We made changes, but nobody ever took a second look at the
article.

Of course, a number of defunct open-source (and some proprietary) forum
softwares with zero sources are still allowed on Wikipedia, simply due to the
fact that they made it through when nobody was looking :)

One could argue that we shouldn't be writing our own articles (and they'd be
right), so we just quietly accepted our judgement and market NodeBB based on
the merits of the software, instead of whether it appears in some arbitrary
ranking of forum software.

That said, it'd still be nice if we were listed in the Wikipedia list of forum
softwares.... _sigh_, a guy can dream.

~~~
zamadatix
From your own description it doesn't sound subjective as much as understaffed.

------
jabvigWe
Add it to the Free Software Directory!

[https://directory.fsf.org/wiki/Main_Page](https://directory.fsf.org/wiki/Main_Page)

------
aaron695
View the original declined drafts here -

[https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro...](https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arrow&action=history)

Geez, if you want to use Wikipedia as an ad, put a bit of effort in. When did
marketing become so lazy as to blame the platform?

Although this meta ad is possibly a far better payoff.

------
michelpp
We're having the same issue getting the GraphBLAS API article to be accepted:
[https://en.wikipedia.org/wiki/Draft:GraphBLAS](https://en.wikipedia.org/wiki/Draft:GraphBLAS).
At first it was summarily deleted overnight, now we're stuck in Draft for who
knows how long.

------
nanoscopic
This reminds me of my "war" to get an article for my parser XML::Bare.

There was a time when there was a comparison page for XML parsers, and many
parsers had articles.

Parsers that still exist on Wikipedia and should be removed, if they are to
stay true to their war on having useful software info in Wikipedia:

[https://en.wikipedia.org/wiki/Category:XML_parsers](https://en.wikipedia.org/wiki/Category:XML_parsers)

The original argument was that if you can find a citation in print you can
have whatever it is on Wikipedia, but that ceased to be true years ago and it
has become a popularity contest and power struggle with obnoxious Wikipedia
editors.

------
scarejunba
[https://en.wikipedia.org/wiki/Draft:Apache_Arrow](https://en.wikipedia.org/wiki/Draft:Apache_Arrow)

This reads like it was written by the guy who wrote it. It can do this. It
efficiently does that. It’s all promotional content. Not useful.

------
ggggtez
I've never heard of it. Add my vote to removing the article.

Cry more, company I never heard of either.

------
lmeyerov
For context, some other companies contributing to it are in the GPU space, so
orthogonal to CPU-centric Dremio: Nvidia, Blazing SQL, and Graphistry (us).
Likewise, the pydata big guns intersect a bit here: conda, pandas, ... . This
effort got a BOSSIE award for GPU dataframes this year and is taking off now
that it is becoming usable for more than just framework devs. The reason we
all rely on it is because a standardized columnar IO streaming format is an
awesome idea for compositional HPC.

It does sound like maybe Dremio's CMO wrote the original articles and it came
off centered on them? (Did not have a chance to read.)

~~~
tptacek
Yes: Dremio's CMO wrote the article, and the article was overtly promotional.
Of course it got killed.

~~~
TallGuyShort
And the same guy submitted it here 3 times. Also overtly promotional.

riboflavin, genuine suggestion: ask one of the PMC members or committers to
rewrite the article from scratch from an engineer's perspective, source
everything, demonstrate notability, and resubmit. If they still don't take it,
move on with your life. ... but you might have generated a lot of ill-will
with the Wikipedia elites here already.

------
est31
Hmmm this reminds me of the battle to get a Wikipedia page approved for
Minetest, the biggest FLOSS voxel engine out there:

[https://en.wikipedia.org/wiki/Draft:Minetest](https://en.wikipedia.org/wiki/Draft:Minetest)

[https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...](https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Minetest)

------
ForHackernews
I've literally never heard of this piece of software, and it's fair to say I'm
much more interested in FLOSS than the average person on the internet. Why
should this thing have its own article and not just appear in a list of Apache
foundation projects?

~~~
m_ke
Because it's actually a pretty big deal for the (python) data science
ecosystem.

------
aabbcc1241
You're free to post anywhere, and each site's admin/helper/whatever-title is
free to do their own censorship or moderation.

It's the nature of the web.

------
ksec
It was the same with 802.11ax aka WiFi 6.

Someone decided all the technical information on the subject was irrelevant
and deleted the entire Data Rate and Technical Improvement sections. Another
reason given was that those details were not finalised.

While it was a little frustrating that that useful information was gone, one
could always find it in other sources and media. But they also deleted the
whole section on DensiFi [1], where all the major companies (Apple, Broadcom,
Cisco, Intel, Qualcomm, Huawei, Samsung and others) behind 802.11ax decided to
do the work behind closed doors. TL;DR: they were trying to push 802.11ax to
market earlier despite all the unresolved issues.

So I decided to add only the DensiFi section, and it was constantly being
deleted within 24 hours. After a few weeks of fun the page simply went back to
the original, where Data Rate and Improvement are back but the DensiFi section
is totally gone. So it turns out it wasn't the technical section they were
trying to get rid of.

P.S. We should be glad someone in the working group discovered this and called
it out. The current WiFi 6 / 802.11ax situation and UX is much better than
what we had when 802.11ac was shipped, although this came at the expense of a
roughly two-year delay to the standard.

[1] [https://mlexmarketinsight.com/insights-center/editors-picks/...](https://mlexmarketinsight.com/insights-center/editors-picks/antitrust/north-america/doj-probes-role-of-special-interest-group-in-new-wifi-standard)

------
foota
Can HN create a draft that would be accepted? :)

~~~
thebooktocome
Nope, this violates Wikipedia's policy against "meat puppetry".

~~~
ErrantX
Not at all! The main thrust of that policy is for discussions. Improving
Wikipedia by conspiring to write a compliant article is explicitly allowed!

~~~
thebooktocome
Tell your deletionist editors that.

------
cdeil
I think the reason this is being discussed now is that yesterday I tried to
re-submit the Apache Arrow article. Here's what I wrote:
[https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro...](https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arrow&oldid=931353298)
It was rejected / reverted 10 minutes later by a Wikipedia editor. The blog
post from Justin was in July 2019
([https://www.dremio.com/why-apache-arrow-wikipedia/](https://www.dremio.com/why-apache-arrow-wikipedia/))

There are many interesting and good points in the discussion here, thank you!

To add my 2 cents:

- Apache Arrow is notable and deserves a Wikipedia page. It might not have
been when someone first tried to create a Wikipedia page for it in 2017 (see
[https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro...](https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arrow&action=history)),
but in the three years since it has become a major project, see e.g.
[https://blogs.apache.org/foundation/entry/the-apache-softwar...](https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces46)
Notability is clearly subjective and depends on what the author and reviewer
find interesting. In the variant I submitted yesterday I tried to make it
clear why it's notable: Apache Arrow is a standard format that connects
different languages, runtimes, data systems, and communities, e.g. the Python
and Java data communities. See e.g.
[https://wesmckinney.com/blog/apache-arrow-pandas-internals/](https://wesmckinney.com/blog/apache-arrow-pandas-internals/)

- Apache Arrow is, to my knowledge, partly the brainchild of Wes McKinney,
creator of pandas. It's his attempt (looking strongly like success) to resolve
a major issue in data science.

- I think it's a good point Justin made at
[https://www.dremio.com/why-apache-arrow-wikipedia/](https://www.dremio.com/why-apache-arrow-wikipedia/)
that it's bad that Wikipedia editors reject articles on stuff they know
nothing about. If you look at their profiles, they don't seem to have any
knowledge of or interest in technology or software. That's not a good system.

- I haven't really contributed to Wikipedia before, and I don't understand the
rules, I admit that. Probably what I did yesterday was just not following
their process, and that's the reason my edit was reverted. I guess it's also
true that Justin at first didn't do a great job at submitting an impartial,
non-PR article. However, my understanding from looking at some drafts and the
talk page is that he then took the editor comments into account, and the last
variant of the page he tried to submit in July 2019 was OK.

- So overall I think the answer to the question "Why isn't there a Wikipedia
page on Apache Arrow?" is that it's an unfortunate case of authors and editors
not doing a great job. At least I'm pretty sure I didn't do a good job
yesterday. I wanted to help, but only had an hour, not a day to learn how
Wikipedia ticks and to do more research to find better references. I hope
someone with more experience in Wikipedia and Arrow will try to re-write and
re-submit the Wikipedia article in the future.

- The rule to discourage (or forbid?) people involved with Apache Arrow from
contributing to its Wikipedia page is unfortunate. I recently started to use
it and learn about it, but I don't know much about it at this point. E.g. Wes
McKinney has written at this point 8 high-quality blog posts about it
([https://wesmckinney.com/archives.html](https://wesmckinney.com/archives.html))
- those don't count as references? Even if he or the Apache Arrow team wrote a
paper about it, it wouldn't count because it's a primary source, and Wikipedia
only wants secondary sources to establish notability? There are ~ 100 videos
on YouTube, and many blog posts and a few podcasts (e.g.
[https://softwareengineeringdaily.com/2016/07/17/apache-arrow...](https://softwareengineeringdaily.com/2016/07/17/apache-arrow-with-uwe-korn/))
that mention Apache Arrow. Naturally almost all of them are from Apache Arrow
contributors, or from companies using Apache Arrow.

- Apache Arrow has an interesting story, and it has evolved over the past
years and will keep evolving, so I think exactly for that reason a Wikipedia
page would be good to have, since the current project page and old blog posts
don't capture that well.

------
zeveb
One could perhaps be forgiven for wishing that the deletionists would … delete
themselves.

Seriously, though, bytes are cheap, and an article sitting somewhere in
Wikipedia doing nothing and bothering no-one is pretty damned cheap too.

~~~
SpicyLemonZest
I dunno, this kind of thing seems like exactly the canonical argument for
deletionism. Maybe there's no cost to a page sitting on Wikipedia describing,
like, some guy's special attack from Naruto. There are reasonable arguments
that allowing things like that would set a bad precedent and encourage
behavior that doesn't help the project, but I admit it's pretty tenuous.

There are obvious and important costs if Wikipedia articles start being
perceived as promotional material rather than encyclopedia entries.

~~~
CharlesColeman
> Maybe there's no cost to a page sitting on Wikipedia describing, like, some
> guy's special attack from Naruto.

There is a cost, but it's measured in hours of maintenance labor not bytes of
storage.

If Wikipedia wants to maintain a semblance of accuracy [1] in the face of
declining participation, it needs to concentrate its labor resources rather
than spread them out.

[1] which IMHO is vital given its unwise prestige as arbiter of truth

~~~
sgift
Since Wikipedia's concentration of labor is itself a source of declining
participation[1], it's doubtful that continuing this behavior will result in
anything other than a death spiral, with even fewer people ready to do the
work, more concentration, even fewer .. and so on.

[1] Among other reasons, more here:
[https://en.wikipedia.org/wiki/Wikipedia:Why_is_Wikipedia_los...](https://en.wikipedia.org/wiki/Wikipedia:Why_is_Wikipedia_losing_contributors_-_Thinking_about_remedies)

~~~
CharlesColeman
> Since Wikipedias concentration of labor itself is a source of declining
> participation[1] it's doubtful that continuing this behavior will result in
> something else than a death spiral

I'm not as interested in the viability of Wikipedia's culture as in the
reliability of Wikipedia as a resource given its prominence. I'd take a dead
Wikipedia over one that's lively and fun but full of crap and poorly-checked
influence attempts.

It's never going to recapture its halcyon days, so it's going to have to
evolve with the times in more ways than one.

