Git in the newsroom

written by Stijn Debrouwere
A quick overview of why journalists shouldn’t take version control principles and metaphors too literally.

I have no idea what it is, but every writer or budding techie in the newspaper industry who stumbles on Git and GitHub, or any version control system really — enjoying some of that computational thinking, are we? — suddenly goes “Oh. My. God. We should write stories like you guys write code in Git, with forks and branches and commits and issue tracking and history.”

It’s one of those wonderful moments when different fields of study come together and cross-pollinate.

Applying some of the best practices of the IT world to journalism is actually a good (if half-baked) idea. But I sometimes fear people might be taking the version control metaphor a bit too literally, unaware of why you cannot simply use version control as-is in a journalistic context.

One

Version control systems like Git are line-based: they diff and merge text one line at a time. For narrative text, where a paragraph is usually one long line, that effectively means paragraph-based. Not only does this make it difficult to spot the actual changes in a piece of text, it also makes it impractical to, say, merge a spelling fix you’ve done in a separate branch into the master branch: if anyone has touched that paragraph since, Git flags the entire paragraph as a conflict, and resolving it by picking one side throws away the other side’s updates.

My blog is actually version-controlled. Feel free to check on GitHub how absolutely useless the diffs are when I update a post.
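To make the paragraph problem concrete, here is a minimal sketch using Python’s standard difflib module. The two-paragraph story is made up for illustration, and the helper names line_diff and word_diff are my own, not part of any tool. It compares the same typo fix twice: once line by line, roughly the way a version control diff would report it, and once word by word.

```python
import difflib

# Hypothetical story: each paragraph is stored as one long line of text.
before = [
    "The council voted on Tuesday to aprove the new budget, which raises spending on roads by ten percent.",
    "Opposition members walked out before the vote.",
]
after = [
    "The council voted on Tuesday to approve the new budget, which raises spending on roads by ten percent.",
    "Opposition members walked out before the vote.",
]


def line_diff(old, new):
    """Return only the removed/added lines of a line-based unified diff."""
    return [
        line
        for line in difflib.unified_diff(old, new, lineterm="", n=0)
        # Keep "-..." and "+..." content lines, skip the "---"/"+++" file headers.
        if line[:1] in "+-" and not line.startswith(("+++", "---"))
    ]


def word_diff(old_line, new_line):
    """Return (removed_words, added_words) for a single paragraph."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old_words[i1:i2])
            added.extend(new_words[j1:j2])
    return removed, added


changed_lines = line_diff(before, after)
removed, added = word_diff(before[0], after[0])

print("\n".join(changed_lines))  # the whole paragraph, once removed and once added
print(removed, added)            # the single word that actually changed
```

The line-based view reports the full paragraph as deleted and re-added, and leaves it to you to find the one word that differs. The word-based view is closer to what an editor wants to see, but it is not what a line-based history records.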

Two

To get the most out of version control, you’re supposed to make atomic commits, which means that every batch of changes you make should have one specific purpose and one specific purpose only. If you fix a typo, reorder a couple of paragraphs and change the title, those changes merit three different commits.

For code, atomic commits work really well and are hardly any trouble; you’ll have to commit your changes maybe every half an hour and force yourself not to switch to a different task too often. Reasonable enough. For editing prose, be prepared to make a commit, complete with a message describing your changes, about every five seconds. Let’s see how long that stays fun.

Three

Merging code can sometimes be a challenge, although it’s relatively painless in Git. You can easily isolate a block of code and transplant it onto other code. A sentence, however, is a fragile thing that can imply all sorts of things and needs to fit with the next and previous sentences and the paragraph in general.

You might want to haul sentences over from one branch to another: say, a crucial fact you corrected in the web edition of a story that also belongs in your longer, in-the-works print edition. Merging at its best. But think about how time-intensive that would actually be: you have to cherry-pick exactly those commits you want to merge in. Not only that, you will never be able to avoid doing a bit of double work, because the two branches will likely be different enough that the same sentence or paragraph may make absolute sense in the one version but look weird in the other.
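Here is a rough sketch of why that web-to-print merge goes wrong, written as a deliberately naive three-way merge in Python. It is much simpler than what Git actually does: the function merge_lines is hypothetical, it assumes all three versions have the same number of paragraphs, and the story text is invented. The one thing it does share with a line-based tool is the unit of comparison, which is the whole paragraph.

```python
def merge_lines(base, ours, theirs):
    """Naive three-way merge with one paragraph per line.

    Assumes base, ours and theirs have the same number of paragraphs,
    which real merge tools do not require.
    Returns (merged, conflicts); conflicting paragraphs are None in merged.
    """
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)       # both sides agree, or only our side changed
        elif o == b:
            merged.append(t)       # only their side changed this paragraph
        else:
            merged.append(None)    # both sides changed it, differently
            conflicts.append(index)
    return merged, conflicts


# Common ancestor of the two editions (hypothetical text).
base = [
    "The mayor said the bridge would open in May and cost 4 million euros.",
    "Work starts next week.",
]
# Web edition: one fact corrected in the first paragraph.
web = [
    "The mayor said the bridge would open in May and cost 6 million euros.",
    "Work starts next week.",
]
# Print edition: first paragraph expanded, second paragraph tweaked.
print_edition = [
    "The mayor, speaking after the vote, said the bridge would open in May and cost 4 million euros.",
    "Work starts next week, weather permitting.",
]

merged, conflicts = merge_lines(base, web, print_edition)
print(conflicts)  # indexes of paragraphs a human has to sort out by hand
print(merged)
```

The second paragraph merges cleanly because only the print edition touched it. The first paragraph conflicts even though the two edits change different words, so someone still has to reread both versions and stitch the corrected figure into the longer sentence by hand.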

Code can take some manhandling; writing can’t. Copy-pasting and a light rewrite suddenly don’t seem so bad.

Four

Collaboration on stories is really, really hard. Much harder than collaborating on code, even though that’s not always easy either. Look at how hard it is to work together on a piece even when you have the real-time feedback you get in a Google Doc. It’s something you generally only want to do if you absolutely can’t avoid it, say, to get a bit of a turbo boost for breaking news.

Now imagine having to do that same collaborative writing exercise in total isolation from each other, like you’d do in a Git-based workflow. Then merge the results together and see what happens. Writing code alongside each other is more like writing different stories about the same thing than actually collaborating on one story, which is why it works for code, and why it won’t work for journalists or wiki authors.

It’s a metaphor, people!

Journalism may need version control, but it needs its own special kind of version control, and that’s something we haven’t invented yet.

Stijn Debrouwere writes about statistics, computer code and the future of journalism. Used to work at the Guardian, Fusion and the Tow Center for Digital Journalism, now a data scientist for hire. Stijn is @stdbrouw on Twitter.