Wikipedia: a wild success and an utter failure

Wikipedia's technology is outdated and egotists have infested the contributor ranks. But it works. Here's a look at the two faces of Wikipedia.

Reading an old post by the wonderful Jason Scott about “The Great Failure of Wikipedia”, and then a more recent one by Megan Garber titled “Why did Wikipedia succeed while other encyclopedias failed?” about research by Benjamin Hill, reminded me of something I’ve wanted to write for a long time: how Wikipedia is a wild success despite its utter failure.

Hill’s big finding is that Wikipedia became so popular because its founders didn’t buy into the “if you build it, they will come” meme: they knew they had to write and edit and evangelize to get people on board instead of fetishizing technology.

And the evangelizing worked. I couldn’t live without Wikipedia anymore, it’s that good. Frankly, I’m so in love with Wikipedia that I’d gladly pay to keep it running, and so I did: last year, I donated. You should too.

Procedural whackjobs

But Wikipedia works its magic through what is actually a massively inefficient publishing process. Important stuff gets deleted by idiots, discussions about unimportant issues drag on, pages get deleted because they’re somehow deemed not notable enough, and so on. People get pissed off and nine times out of ten they’re right to be mad. Wikipedia has accumulated its fair share of self-righteous, power-hungry people with too much time on their hands.

Wikipedia works despite its guardians and community standards, not always because of them.

Luckily Wikipedia is such a big place that each subsection has grown its own contributor culture. As a senior during my undergraduate studies, I contributed pretty actively to Wikipedia, doing research about philosophical pragmatism and related articles. Surprise: I enjoyed it, it was great. The procedural whackjobs tend to leave literature, philosophy and science alone.

Share!

As a contributor, it’s more pleasant if you actually understand that Wikipedia has a very specific purpose, and that is to make all kinds of subjects and concepts understandable to a general audience, with a (good!) bias towards perpetuating the common wisdom and mainstream ways of thinking. This is Jason Scott’s old beef with Wikipedia and it doesn’t make sense to me.

If you have uncovered fantastic new information about a certain topic, you should write an essay or a book or a blog post and then maybe that will get incorporated into relevant Wikipedia articles.

If you have a particularly zesty way of writing and are worried that *-for-brain editors will eat it all up and regurgitate it as a bland soup, you’re probably right, and you should find another place to share your knowledge.

If you’re bringing a fresh new angle to a subject, a new way of thinking about things, for Pete’s sake, don’t waste it on Wikipedia.

You’ll find that you’ll feel much better about Wikipedia if you look at it as just one repository of knowledge, rather than as a grand unifying thing.

Create your own knowledge base, answer questions on StackOverflow and Quora instead of writing about it on Wikipedia, blog about your area of expertise, release your rights-free images on Flickr (I do) instead of Wikipedia.

The important part is that you share knowledge. Wherever you want, really, as long as we can find it.

Feeding the machine

Wikipedia is about people, not technology. But here’s a less charitable interpretation: Wikipedia has to be about people because it never cared about technology.

Editors manually create endless list pages, like all people born in 1603 or people from Rhode Island, because Wikipedia’s data model, viz. no data model at all, doesn’t allow these pages to be autogenerated from simple database queries. Same thing for disambiguation pages, figuring out which pages map to which translations, and linking broader topics together with more specific articles.
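To make the point concrete, here is a minimal sketch of what “autogenerated from simple database queries” could look like if biographies carried a few structured fields. Everything in it is an assumption for illustration: the person table, its columns, the helper names and the placeholder rows are made up, and none of it reflects how MediaWiki actually stores pages.

```python
import sqlite3

# Hypothetical structured records: (name, birth_year, birthplace).
# Placeholder rows only, not real people.
ROWS = [
    ("Person A", 1603, "Elsewhere"),
    ("Person B", 1603, "Elsewhere"),
    ("Person C", 1975, "Rhode Island"),
    ("Person D", 1960, "Rhode Island"),
]


def make_db():
    """Build an in-memory database with one row per biography."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (name TEXT, birth_year INTEGER, birthplace TEXT)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", ROWS)
    return conn


def people_born_in(conn, year):
    """The 'people born in <year>' list page, as a query."""
    cur = conn.execute(
        "SELECT name FROM person WHERE birth_year = ? ORDER BY name", (year,)
    )
    return [row[0] for row in cur]


def people_from(conn, place, born_after=None):
    """The 'people from <place>' list page, optionally filtered by birth year."""
    sql = "SELECT name FROM person WHERE birthplace = ?"
    params = [place]
    if born_after is not None:
        sql += " AND birth_year > ?"
        params.append(born_after)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_db()
    print(people_born_in(conn, 1603))
    print(people_from(conn, "Rhode Island", born_after=1972))
```

With records like these, a list page is a query rather than something an editor has to maintain by hand, and a new biography shows up on every relevant list the moment it is saved.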

We’re all complaining about our crappy CMS but our misfortune pales in comparison to MediaWiki and the way it devours Wikipedia contributors’ cognitive surplus and cajoles them into repetitive manual labor that you figure, this being the 21st century and all, computers would do for them.

Wikitext

Then there’s wikitext, which once upon a time had a Markdown-like elegance but has now spiraled so out of control that most local or topical wikis fail before they’ve started: potential contributors take one look at the syntax, decide rocket surgery might be more within their cognitive capacities and run away before contributing even a single word.

Frankly, considering how hard it is for non-techies to write in MediaWiki, I’m surprised that a local city wiki like the Davis Wiki has ever gotten off the ground. It has certainly survived against all odds. And I’m not surprised they’re looking to get rid of MediaWiki.

The Wikimedia Foundation has been looking for a rich text editing software developer for a long time. They try, but I don’t know if they can truly solve much considering MediaWiki is such a decrepit codebase. All improvement is bound to be the electronic equivalent of dodging landmines.

Blobs and bots

Part of the problem is that Wikipedia and its engineers are introducing ever more (confusing) wiki syntax to cope with semi-structured data. Semi-structured data are things like a person’s birth date and current residence, anything that’s not a blob of prose. Structure can give content a second life in maps and timelines, and makes it easier to find what you need, like famous people from Rhode Island born after 1972.

Getting any of that good stuff out is really hard, which is why DBpedia (“a community effort to extract structured information from Wikipedia and to make this information available on the Web”) deserves so much kudos. Of course, if Wikipedia’s data model were anywhere near reasonable, creating an api.wikipedia.org wouldn’t take a separate project like DBpedia. Instead, it would be a good day’s work for a software engineer and that’d be that.

Wikipedia bots alleviate some of the drudge work by gardening and cleaning Wikipedia automatically while crawling through its pages. For example, many American city pages were created and are updated with new census information and maps without human intervention. Thank you, rambot. But these bots themselves are convoluted pieces of technology. Wikipedia’s data model means they have to fudge raw text without stepping on anything real humans have written, which is not easy.
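A rough sketch of the kind of text surgery this implies, assuming a page whose infobox puts each field on its own line. This is not how rambot or any real bot is implemented; the function name, the sample page and the field name are hypothetical, and real wikitext has many more edge cases (nested templates, multi-line values, comments) than a single regular expression can handle.

```python
import re


def update_infobox_field(wikitext, field, new_value):
    """Replace the value of a '| field = value' line, leaving all other text alone.

    Assumes each field sits on its own line, which real pages do not guarantee.
    """
    pattern = re.compile(
        r"^(\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*)(.*)$",
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in new_value.
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_value, wikitext, count=1
    )
    if count == 0:
        raise ValueError(f"field {field!r} not found")
    return updated


# Hypothetical page: a template block followed by human-written prose.
SAMPLE_PAGE = (
    "{{Infobox settlement\n"
    "| name = Exampleville\n"
    "| population = 1000\n"
    "}}\n"
    "Exampleville is a town that real humans wrote about.\n"
)

if __name__ == "__main__":
    print(update_infobox_field(SAMPLE_PAGE, "population", "1234"))
```

Even this toy version has to guess where a value starts and ends, which is exactly the kind of guessing a real data model would make unnecessary.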

Bots help, just not enough.

Wikipedia is tragic that way: there has been no money, no strategy and no guts to take the software to the next level for ages now, so we’re stuck with a patchwork of fixes and tweaks on top of software that was already out of date when it was first released in 2002.

Inside the sausage factory

As a reader, you don’t really notice that developers have a hard time getting meaningful data out of this huge bank of knowledge, you don’t notice that professors and experts get frustrated fighting with nitwits about stuff those experts know inside-out, you don’t notice that many early contributors never return, you don’t notice the vandalism, you don’t notice how many people whose contributions we’d cherish are put off by that horrible, horrible wikitext syntax.

(What English readers also don’t notice is that local versions of Wikipedia, like the one in Dutch, are even more inconsistent in their quality than the English-language flagship. Wikipedia wins by sheer numbers, and when those numbers aren’t present, quality suffers.)

But here’s the thing: the common wisdom that garbage in means garbage out doesn’t actually apply to Wikipedia. In the Wikipedia model, you put in lots of raw material that’s decidedly less than perfect, but the stuff that comes out is actually damn tasty. In other words: Wikipedia is a sausage factory.

Wikipedia needs to knock out the bullies and improve its tech, because both are making Wikipedia less great than it could be. But while it does so, let’s also just take a minute to appreciate the enormous value of this thing that we’ve created, we, together, people from all over the world.

Stijn Debrouwere writes about statistics, computer code and the future of journalism. Used to work at the Guardian, Fusion and the Tow Center for Digital Journalism, now a data scientist for hire. Stijn is @stdbrouw on Twitter.