Inside the Guardian’s CMS: meet Scribe, an extensible rich text editor (theguardian.com)
225 points by lebek on Mar 20, 2014 | 86 comments

Let me diverge for a moment:

Every time I see this, I wonder why browsers (and the W3C) don't add more functionality to textarea.

ContentEditable always felt like an ugly hack to me. A rich set of APIs on textarea could move the web forward. Imagine building an IDE in a textarea. That should be possible.


Great work nonetheless, very clean code and very nice documentation!

Related: Why isn't there a wrapping version of <input type=text>? For a single line of text which you wish to wrap, using a <textarea> with JS hooks to prevent pressing the return key is a terrible hack. Why can't you do <input type=text wordwrap>? Or <input type=text style="word-wrap: break-word;">?

Also, in this day and age of sans-serif, variable-width fonts, why do we have to specify rows and cols on <textarea>?!

Serifs are irrelevant, and browsers have had variable-width fonts since Mosaic, WorldWideWeb and ViolaWWW, so making textareas use a monospaced font and specifying their size like that seems to have been just laziness. If there was a good reason, I'd be interested to know it.

You should join the ietf and w3c mailing lists.

Then you will understand why things don't move faster.

Could you explain a little for the 99% who won't join those mailing lists?

Politics and ego over an asynchronous channel.

What stops people from getting things done?


I don't understand why companies and CMS projects think that everything needs to happen in the browser.

Especially in situations like publishing, when the content creation is being done 100% by people under employment or contract. Give them a native app or browser plugin for authoring, and you can solve all these problems with much greater sophistication.

In the distant past I used to use Microsoft Content Management Server to power websites. It lacked most of the modern features of a web CMS, but it provided a native app and IE plugin for authoring. In terms of code control and consistency it kicked the ass of even the best JS libraries available today.

A native authoring app, side loaded onto employee machines, would be a much more powerful authoring solution. It would produce clean HTML code and push it into the CMS via an API call.

Remember XHTML2? XForms? Yeah, it was a pain to write but FULLY EFFIN extendable... so much so that browser vendors did not want to implement this stuff and came back with hacks like web components and shadow DOM, making everything even more painful to write and extend... So there was an opportunity, and it was missed.

Maybe because it would break backwards compatibility (and possibly the spec) for the tab key?

It wouldn't be hard to add that functionality. An attribute could specify whether tab blurs and focuses the next input or actually inserts a tab.
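A rough sketch of what such a switch could look like, written as a pure function over a snapshot of the field rather than against any real browser attribute. The names `FieldState`, `TabMode` and `handleTab` are made up for illustration; no such attribute exists in HTML today.

```typescript
// Snapshot of a text field: its value and the current selection range.
interface FieldState {
  value: string;
  selectionStart: number;
  selectionEnd: number;
}

// Hypothetical policy mirroring the proposed attribute:
// either Tab moves focus as usual, or it inserts a tab character.
type TabMode = "focus-next" | "insert-tab";

// Returns the new field state when Tab should insert a character,
// or null to signal that the default focus change should go ahead.
function handleTab(state: FieldState, mode: TabMode): FieldState | null {
  if (mode === "focus-next") {
    return null;
  }
  // Replace the selected range (possibly empty) with a single tab.
  const value =
    state.value.slice(0, state.selectionStart) +
    "\t" +
    state.value.slice(state.selectionEnd);
  const caret = state.selectionStart + 1;
  return { value, selectionStart: caret, selectionEnd: caret };
}
```

In a page, a keydown listener would call something like this and only prevent the default action when the result is not null, so keyboard users keep the normal focus behaviour unless the field opts out.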

Accessibility seems like an issue then?

first, good luck to the guardian on this endeavor! :+)

second, i can't help observing that efforts to use contenteditable seem to start with a rush of success -- as the early results are always very impressive -- but then seem to quickly bog down in the particulars.

bug-reports come in which are difficult to reproduce, typically originating from idiosyncrasies in an o.s., or a browser, or (in one thorny case) a _combination_.

and although people have much enthusiasm for solving these glitches at first, the slog is generally endless, and, eventually, it wears down even the most determined.

at least, that's what i have observed, enough so that -- once i started experiencing the bramblebushes too -- i pushed the contenteditable strategy off to the side.

which was easy, since i've been a long-time supporter for light-markup. no, not markdown, since that thing is too primitive, and forked, and fragmented nowadays, and will give you a big dose of misery down the line.

i built my own light-markup -- "zen markup language", extension .zml, based on the project gutenberg corpus.

i have also built a phalanx of e-book authoring-tools over the last 20 years, and it's all come together now.

the trick is making an editor that will be acceptable to both the light-markup adherents _and_ the wysiwyg folks, and i think i cracked that nut. i'd like your feedback on pre-release versions of some new stuff i'll have soon. e-mail me at bowerbird@aol.com if you'd like to play...

and again, best of luck to the guardian people on this!


That said, a plain contentEditable is just fine already if you can prevent the combinatoric explosion of browsers+OSes. Like, say, if you're shipping a node-webkit application.

In my web shop, our first CMS (circa 2003) utilized a WYSIWYG editor which was a disaster. For all of the supposed benefits it brought to the editing / writing process for our clients, the reality is that upon hitting 'save', the style/formatting issues on the front-end were awful, having picked up junk from MS Word, or copied web pages, or massive embedded images, you name it. It threw our shop into perpetual customer-support mode as our clients struggled to get anything to look right, or consistent.

For our 2nd client CMS, we chucked the WYSIWYG, spent a little more time helping our clients understand Markdown (actually Textile - this took a few minutes of our and their time). There was a little squirming involved on the part of our clients, but after a day or so, they understood it. Best part, it just works. It ensures that our site renders perfect, semantically correct HTML!

WYSIWYG in RTEs has grown up a bit, as we can see in Guardian's latest attempt. However, its largest flaw, from our perspective, is that it outputs HTML! HTML is just not a proper end-format. It is too difficult to reverse engineer. Markdown, on the other hand, can be rendered into many formats. It can be output as HTML, raw text, truncated text, etc. So, while I applaud Guardian for releasing this, it saddens me that we're still attempting to improve on browser-based, user-facing tools which output HTML.

I'd love to read more about the Guardian CMS that Scribe was written for and how the writers find using it. Do they like it?

When I've worked on news sites in the past, many of the journalists preferred writing in their desktop word processor of choice and cutting and pasting into the CMS as the final step before publishing.

The CMS was probably viewed as a necessary evil and I don't remember there being much love for it from the people who had to spend hours in it every day.

Partly this was due to the CMS not working offline and it just wasn't as pleasant to use as the software the writers have used for years, which is understandable.

They're quite happy with it (except when it breaks, understandably). Mostly, they just want it to work as they expect (i.e. like Word or Google Docs), so the goal is for them to not notice it. Doing the Right Thing on paste from GMail or Google Docs (a very frequent use case) is therefore crucial. At the same time, we want to rely on Scribe to enforce correct, standard typography rules, valid markup (unlike if it were free-form HTML), etc.

We're only at the beginning, but the curly quotes plugin mentioned in the blog post is a good example of that. Other ideas in the pipeline include automatically enforcing and converting to UL/LI lists instead of paragraphs with bullet point characters, warning on punctuation issues, etc.
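To give a sense of what a typography rule like this involves, here is a minimal sketch of straight-to-curly quote conversion on a plain string. It is not the Scribe plugin or its API: `curlQuotes` is a hypothetical name, and a real editor plugin would also have to work on DOM text nodes and keep the caret in place.

```typescript
// Hypothetical helper, not Scribe's implementation.
// Converts straight quotes in plain text to curly quotes with a simple rule:
// a quote at the start of the text, or after whitespace or an opening bracket,
// is an opening quote; every other quote is a closing quote or an apostrophe.
function curlQuotes(text: string): string {
  return text
    // Opening double quotes.
    .replace(/(^|[\s(\[{])"/g, "$1\u201C")
    // Any remaining double quote closes.
    .replace(/"/g, "\u201D")
    // Opening single quotes (also allowed directly after an opening double quote).
    .replace(/(^|[\s(\[{\u201C])'/g, "$1\u2018")
    // Remaining single quotes are closing quotes or apostrophes.
    .replace(/'/g, "\u2019");
}
```

The rule is deliberately naive. Cases such as a leading apostrophe in a contraction would need extra handling.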

Scribe has also allowed us to integrate contextual options, such as buttons to add images when the caret is on an empty line, or a button to embed any URL pasted into the body.

So the biggest challenge is therefore to provide a reliable UI that responds as one would expect, while allowing extra features to be built on top of it without too much effort.

Wow it sounds great and thanks for taking the time for the detailed answer. If there exists now or in the future a screencast or gif of a power user writing an article on the CMS please share it on HN. Although I have a feeling it'd make some devs stuck with an older CMS weep a little.

I used to work on a CMS in the telecoms sector. We found the authors absolutely hated using online editors and much preferred Word. We decided to use WebDAV to let them directly edit articles in Word. They were able to hit save in Word, refresh the browser and see their changes instantly. Because we were parsing the Word docs ourselves we were able to choose which formatting to support. This meant we could allow different size headers for example but not different fonts.

Perhaps CMSes would serve writers better if there was a focus on providing amazing import tools that work with existing file formats, like .doc, that writers prefer and have used their whole careers. Maybe there is too much effort expended on reinventing the word processor in a CMS.

This does look promising.

Markdown gets a lot of love from developers, but in my experience with clients across many industries (including journalism), it's a non-starter. If you're used to looking at markup in an editor, Markdown is an abstraction you can tolerate. But the average non-technical person tasked with the job of updating a blog or website sees Markdown as something like pig latin - a thing that makes communication more complicated.

I hear you. But as somebody used to looking at code in an editor, I actively attempt to avoid WYSIWYG tools in favour of reStructuredText/Markdown, rather than it being "an abstraction I can tolerate", after fighting with formatting in WYSIWYG editors once too often.

WYSIWYG formatting woes are all real, they're just not something users understand. (My head explodes when classes are stripped off of a div every time the form is submitted...)

If I were the King of the Internet and had to choose one tool for text entry it would be a system sort of like Ghost/Prose.io, where you use Markdown, but can see the results. So even laymen can get the swing of the tool developers prefer. As King, I would also levy a tax.

This pretty much nails it and is very tricky to get right. Nearly every other contenteditable editor out there messes up the undo stack, copy-pasting and new-lines, some more than others. The extensibility feature is also a great relief.

However, I am curious as to their stance on Markdown and if it was ever considered.

Markdown was indeed considered, but we ended up rejecting the idea for a variety of reasons. Given the interest around this decision in several comments here, maybe we should write up about it as well.

If its output is semantic and doesn't suffer from inline styles madness (like Chrome likes to mess up contentEditable), you can just convert HTML to MD without lots of issues. Sure it's lossy but it's up to you to decide what functions to expose to the user.
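As an illustration of how small that conversion can be when the markup is clean, here is a sketch that handles only `<p>`, `<b>`/`<strong>`, `<i>`/`<em>` and `<a href>` without extra attributes. The function name `htmlToMarkdown` is hypothetical, the approach is regex-based rather than a real parser, and it assumes well-formed, non-overlapping tags and leaves HTML entities untouched.

```typescript
// Hypothetical converter for a small, clean subset of HTML.
// Assumes well-formed input with no attributes other than href on <a>.
function htmlToMarkdown(html: string): string {
  return html
    // Bold: <b> or <strong> becomes **text**.
    .replace(/<(b|strong)>([\s\S]*?)<\/\1>/g, "**$2**")
    // Italic: <i> or <em> becomes *text*.
    .replace(/<(i|em)>([\s\S]*?)<\/\1>/g, "*$2*")
    // Links: <a href="url">text</a> becomes [text](url).
    .replace(/<a href="([^"]*)">([\s\S]*?)<\/a>/g, "[$2]($1)")
    // Paragraphs become blocks separated by a blank line.
    .replace(/<p>([\s\S]*?)<\/p>/g, "$1\n\n")
    // Any other tag is dropped and its text kept: this is the lossy part.
    .replace(/<[^>]+>/g, "")
    .trim();
}
```

For example, `<p>Hello, <i>world</i></p>` comes out as `Hello, *world*`. Inline styles or nested block elements would need a proper parser.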

This is awesome. Copying and pasting text from Word files seems to work fairly well, instantly generating HTML with all the cruft removed.

To download it to your desktop and try it out locally get [these files](https://github.com/guardian/scribe/tree/gh-pages)

The distribution files are kept in the `dist` branch. The tree you linked to is just for the example. We encourage consumers to download these files using Bower (`bower install scribe`)


Thank you!

I'm so so glad.

Just two weeks ago I thought we'd have to implement the same thing for the same reasons: old editors were good in handling browser inconsistencies but bad in being too tightly coupled to their dated UIs, new editors sucked at generating semantic markup.

Thank you for making this public.

I'm hugely impressed by some of the open source code that comes out of the Guardian offices.

Hmmm...this may just be my favorite post on HN in a while... I have been working on a UI builder tool and have to some extent built the kind of functionality in this editor.

It looks, at least from first impression, that you guys have done a much better job... hopefully it turns out to be the perfect solution for my needs. Thank you!

I'm a tad late, but I should mention a really important library I wrote that changes HOW one writes editors - http://github.com/amark/monotype . Save your caret, do whatever transformation you want directly with HTML manipulation, then you restore your caret. Done, that simple - don't even bother trying to use the browser's API, because yes they are wildly unreliable in their behavior. The kicker with my library is it saves your caret (your selection) based on actual content, rather than the DOM tree, so it still is able to accurately restore it even if you completely delete and replace the DOM tree.
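The general idea of a content-based caret can be sketched without the DOM: record the caret as a character offset into the flattened text, then find that offset again in whatever tree exists afterwards. This is only an illustration of the concept, not monotype's code or API. The node types and the `saveCaret`/`restoreCaret` names are invented here, and the sketch assumes the text content itself is unchanged by the transformation.

```typescript
// Minimal stand-in for a DOM tree: a node is either text or an element with children.
type TextNode = { kind: "text"; text: string };
type ElementNode = { kind: "element"; tag: string; children: DocNode[] };
type DocNode = TextNode | ElementNode;

// A caret expressed the DOM way: a specific text node plus an offset inside it.
interface Caret {
  node: TextNode;
  offset: number;
}

// Collect text nodes in document order.
function textNodes(root: DocNode, out: TextNode[] = []): TextNode[] {
  if (root.kind === "text") {
    out.push(root);
  } else {
    for (const child of root.children) {
      textNodes(child, out);
    }
  }
  return out;
}

// Save: count the characters that precede the caret in the flattened text.
function saveCaret(root: DocNode, caret: Caret): number {
  let total = 0;
  for (const node of textNodes(root)) {
    if (node === caret.node) {
      return total + caret.offset;
    }
    total += node.text.length;
  }
  throw new Error("caret node is not inside root");
}

// Restore: walk the (possibly brand new) tree until the saved offset is reached.
function restoreCaret(root: DocNode, saved: number): Caret {
  const nodes = textNodes(root);
  let remaining = saved;
  for (const node of nodes) {
    if (remaining <= node.text.length) {
      return { node, offset: remaining };
    }
    remaining -= node.text.length;
  }
  // Saved offset is past the end of the text: clamp to the end of the last text node.
  const last = nodes[nodes.length - 1];
  if (!last) {
    throw new Error("tree has no text nodes");
  }
  return { node: last, offset: last.text.length };
}
```

Because the saved value is just a number, the tree can be thrown away and rebuilt between the two calls, which is the property described above.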

This could be helpful, although we haven’t needed this sort of facility yet luckily. Thanks for posting.

How does it work?

Just do:


Then perform your manipulation, when done call:


Tim Down! Thank you so much for your rangy library, it is a noble library, and I respect all your hard work trying to normalize the Range API. Monotype is similar to your new Text module but behaves a bit differently, please email me so we can talk more! mark at accelsor.com

Shamelessly promoting my text selection and range API polyfill, https://github.com/luwes/selection-polyfill

Just to make clear, this would enable Scribe to work in older IE browsers.

Nice! What persuaded you to write this polyfill?

The Guardian has a great approach to tech; they've done some really cool stuff with Scala / Play and released some great code.

I'm surprised not to see Aloha editor [0] in their list of "Existing Solutions".

[0] - http://www.aloha-editor.org/

I should’ve added that! There are many…

And how about Codemirror? It's mainly used for monospace code editing and it's missing a UI and editing widgets, but the configurability and maturity should blow everything else out of the water. A demo showing it's not only usable for monospace fonts in a single font-size: http://codemirror.net/demo/variableheight.html

I really like Codemirror, it was simple to get set up in https://storytel.la/ and easy to extend. I'm a few versions behind and didn't realise it handles mixed width fonts, so thanks for mentioning it.

I suspect the best way to make rich text editing work across browsers is to do what the current google docs editor does: Drop contentEditable and just implement your own editor in javascript. Read key and mouse events and manipulate the DOM.

It's a bit more work up front, but possibly less work than trying to get four different implementations of contentEditable to behave the way you want. And this way you own the editing logic.

Do you know any good libraries that do this?

I tried building one a long time ago. The hard parts are mouse events (ie moving the caret based on a pixel coordinate) and pasting.

Google docs solves the mouse event problem by rendering every glyph offscreen and measuring its bounding box. That is intensive and difficult if your font is bidi or changes by context (eg Arabic, accents). I tried wrapping every glyph in a span but it wiped out the available RAM. That may not be a problem anymore in 2014.

Honestly, I haven't looked into it. The bounding box thing sounds messy.

But on chrome, firefox, and probably safari, it appears that you can call document.getSelection() in a click handler, and the returned object will be an appropriate Caret selection with offset and node.

I recall that the selection object in older versions of IE does not offer the character offset, so you have to copy the selection and move it one character at a time until you get to the beginning of the text node. IE9 and above appear to have modern selection objects (according to the MS docs). I don't know if they're updated onclick.

Does anyone else notice how ridiculously bloated theguardian.com is? It seriously takes 5 seconds to completely load on my modern machine.

There is a new responsive site which is going to be phased in soonish (no definite time frame though). It's much better performance-wise.

The beta site is a very bad desktop experience compared to the current site. It's a bit like when gmail was redesigned - needs an ultracompact view with way more information density.

I thought it was interesting how they appear to have inlined the CSS and JS into the HTML for their new responsive site. Is this becoming a common practice?

I guess they must have worked out that their CSS and JS was lean and minified enough that despite the extra overhead of it being in the HTML (which is even less when gzipped, I guess) and not being served from cache, it was faster than making another 2 requests. I guess there is some small overhead to fetching a file from the browser cache but surely it is tiny?

Would be interested to get more insight into this aspect of the site if anyone knows anything!

There's a new trend to try and put the really important CSS rules inline so your content can render quickly then fill in the rest later without making the initial render look like it was designed by design blind programmers, i.e. the unmodified Times New Roman HTML page.

The Grauniad is also trying progressive image loading.

Unfortunately it actually just looks shit, it's like peeking behind the curtains in Oz as your content kinda half renders then renders again properly, twitching into life like some sort of half botched Frankenstein experiment.

Notice the `loadFontsAsynchronously` and the `loadCssFromStorage` in the js.

In some ways I like that the Grauniad's team is experimenting, in others it's frustrating to see how the creaky and leaky HTML/web browser combo is still justifying people doing all sorts of crazy things to try and get a simple page to render quickly 25 years after the web was born. And "responsive" designs that are really just mobile designs stretched to a one-size-fits-all simply because browsers are too fucking stupid to tell you what they actually need.

My Grump is in full swing today.


The irony.

One of their senior engineers gave a good talk on this recently: https://speakerdeck.com/patrickhamann/css-and-the-critical-p...

Inline styles have always been common practice for some people...

My offshore teammates littered our repo with inline styles for no reason. It completely changed the UI I built for the projects. I spent several days this week trying to fix it and I am still not done.

beautiful, they must send down at least 75% less data down the pipe compared to their desktop version.

It's almost instant on my 3 years old machine (Win7 + Chrome)

You probably visited the site before, and had some information cached. Clear your cache and get back to me on the results. I'm curious.

Actually, for some reason I automatically get the new (mobile) version. Same as linked by OliverJAsh.

I'm excited about using this in my next project. Love the fact they kept it simple with this one.

Anyone using angularjs should seriously consider textAngular. I'm currently using it on an enewsletter editor app and it's pretty darn sweet.


It’s easy enough to write a directive to plug Scribe in to Angular with bidirectional binding. I’ll post an example soon, but it’s simply a case of listening to content changes in Scribe and updating the Angular model using ng.NgModel.NgModelController.

I hope the name is in homage to Scribe, the markup language and word processor. There was a time when you might write up your thesis in Scribe (even as, in parallel, some people chose LaTeX).

I am shocked how misinformed and technically misled a pro team like the Guardian is. Saving your stories in HTML in a CMS? This is like using Joomla for hobbyist sites or something like that, it will kill your archive and it will be a pain reusing content. A good Content Management Strategy is: use LibreOffice for writing, create a plugin that publishes to the archive and puts articles in the pipeline for redaction, save content to DocBook and be prepared for any kind of reuse of that content.

We only save inline text in HTML. Text “elements” along with any other media “elements” are stored as JSON and we attach metadata to each element.

nice to see someone trying to fix contenteditable as an api, thanks for the hard work and sharing it!

This looks really nice. I can't wait to play with it on an upcoming project. Thanks to The Guardian for open sourcing it.

I'm curious what support on mobile is like. I tried the demo on my Nexus 7 and it worked well for the two minutes I played. However I skimmed the browserinconsistencies.md file and did not see any specific mention of mobile browsers. Mobile is a place where most of the existing solutions fall on their face. Having good mobile could really drive adoption.

I haven’t done any testing on mobile, but we could add a few platforms to our integration tests to see what the support is like, perhaps! It would be good to test.

First of all, props of course to the Guardian's team, not only for devoting resources to improvement of the CMS field, but open-sourcing it...a concept that is still mostly alien to the modern newsroom.

As far as I know, the solution they have here is impressive and as good as the state-of-the-art, in terms of usability and modularity...but it still can't overcome the major quirks that come up with rich-text editors.

For example, I typed in the following in the demo (http://guardian.github.io/scribe/):

Hello, world

Why italics?

So the error here is that after typing "world", I switched off the italics and hit line break/Enter. However, the italics-mode persisted into the next line. This was the generated HTML:

      <p>Hello, <i>world</i></p><p><i>Why italics?</i></p>

As a programmer, I can appreciate why this might happen, and I know how to fix it...but this is the kind of unexpected behavior that is the bane of the layperson, so much so that with each new rich-text environment -- whether it be Word, Google Docs, TinyMCE, etc -- they have to come up with a whole new list of hacks to get around these quirks.

edit: the rest of this is addressed to a general "you", not to "you, the Guardian developers", as in, "why didn't you just do Markdown"...though if the Guardian took the lead in that, I'd most definitely upvote that too ;)

I think rich-text editors are fine for the very layperson. But I think for professional reporters, there needs to be a move toward the expectation that they all learn Markdown. Note, I'm not saying that everyone needs to learn HTML...but Markdown is basically the exact subset of text formatting a professional online writer needs to communicate 99.9% of their reportage material, with the rest being made up through plugins/shortcode/embed, as is currently the case for most online CMSes.

Markdown can be written in any editor and is portable to a huge variety of systems and services. More importantly, even without a specialized editor, Markdown still has human-friendly structure. What else does a writer need?

Before you say: oh but we can't expect our writers to learn code-like things...this is not true at all. As an intern at the Denver Post, I spent at least a day learning the in-house editor, which was Windows-only, designed for the print-publishing workflow, and had all manner of arcane key combinations to add editing marks (again, for print, and not the web). Everyone was expected to learn it, and everyone did fine.

But unlike Markdown, this in-house closed source coding system was...well, shitty as most industry-specific codes are...and once in a while someone would accidentally "un-hide" notes meant for an editor's eyes only, which would then show up in print. What I love about Markdown is that you don't even have to really know it to write what you need...hitting Enter creates a paragraph break, both in your text editor and the platform to which you publish. How much easier can you get?

The behaviour you describe around italics is completely native to the browser, so it will at least be consistent across uses of `contentEditable`. I agree that it is annoying, but there were much bigger fish to fry (https://github.com/guardian/scribe/blob/87d3ed1a7f28d9fcdcc0...).

You make some very good points about Markdown, and we spent awhile thinking about whether to go down that road or not. Although there is a technical barrier to access for Markdown, that’s not the reason we decided against it. Maybe we should do another blog post to follow up with more details regarding our decisions. (Or, I’ll come here with more details in just a bit.)

Would love to hear the reasons why you decided against markdown, it was something I was just discussing at the beginning of the week with my team (who work on news sites).

So, assuming unlimited developer time, you would want Scribe to correct this?

I'm just trying to get a feel of Scribe's scope... Does it want to fix all inconsistencies, even pure UI irritants, or is it mostly about ensuring correct content?

I'm tempted to write a plugin that "types" space-delete after every CR. That's the quickest and nastiest fix I can think of...

Can you add an issue here (https://github.com/guardian/scribe/issues) and we can talk solutions? I have a few in mind!

Generally speaking, it should patch behaviour to the point where it doesn’t get in the way of the user and produces clean and semantic markup. We have found that patching some of the smaller behaviour obscurities would not be trivial. If it matters to the community though, we (the open source community) can definitely get them patched!

I have to reiterate...the product's relatively few shortcomings aren't yours, or even real shortcomings depending on use-case...I'm just bemoaning the lack of digital literacy overall of digital journalists, something I'm sure you have plenty of insight on in terms of your non-dev colleagues. And switching to Markdown is not a change that the dev team can enforce on the newsroom...but it'd sure make people's lives easier in the long run, in my opinion...

I couldn't agree less.

Markdown is an abstraction that professional writers do not need. MacWrite et al solved the text-editing problem 30 years ago. Press the "paragraph" button (carriage-return); you get a new para in the content and on the screen. Press the universal shortcut for italic (cmd-I) and you get italics, in the content and on the screen. And so on.

Markdown is a hack that works around two inconvenient truths. First, in-browser editing has traditionally been shitty. Second, for those working in a coding milieu, you can't just type 'less someasciifile.txt' and have it display with the right italics, bold and breaks.

If you're a professional writer for print, none of this has bothered you since WordStar. You write in your tool of choice (most places I've worked it's been Word, though I use TextEdit). You save. The designer brings it into Quark or, latterly, InDesign, which preserves the formatting. End of. You don't need to think of "markup language" at any stage in the process, and nor should you.

Expecting the rest of the world to downgrade to a hack devised for the web is entirely backasswards. Scribe looks like a good attempt to bring in-browser editing up to the standard everyone else has enjoyed for 30 years, rather than dragging everyone down to the hack level of Markdown. Good luck to them.

(My background: former full-time consumer magazine editor, now freelance; semi-pro coder.)

Sorry, but how did MacWrite solve the text-editing problem? I'm guessing you don't mean the actual dead program, unless it exists under another name. So what module could you be referring to? And is it part of Windows platforms?

And how does your process handle the checking of URLs? I've done newspaper pasteup with PageMaker since high school...as far as I remember, there was not a simple way to double check URLs. Actually I don't even remember there being such a concept as embedding hyperlink URLs into what would be print newspapers, but that's beside the point...

And correct me if I'm wrong, but where does the MacWrite magic turn into Web content? You ended your description at "designer brings it into Quark". That's not the Web. That's not even...anything.

And of course, I'm talking about a lot more than the Web here, I'm talking about a portable format that can be read without any special text editor at all. How does MacWrite fit into that?

I'm talking UI. MacWrite established a fast, efficient, clear UI for entering text, one that has been used by professional writers for the last 30 years.

Getting that into your medium of choice, whether that be print, the web, or whatever, is a SMOP. The Guardian have chosen to tackle this SMOP. Markdown chose to hack around it instead, but at the cost of usability.

>First of all, props of course to the Guardian's team, not only for devoting resources to improvement of the CMS field, but open-sourcing it...a concept that is still mostly alien to the modern newsroom.

Django was originally developed by a Kansas newspaper.

And the NY Times has done a great job:


That said, I think those are exceptions, not the rule.

I don't know how newspapers do it these days, but back when I was doing desktop publishing for some small magazines, I used to reduce everything to plaintext before shoving it into Pagemaker.

Genuine question: why do people prefer Markdown to - say - a HTML subset consisting of h1,h2,h3,p,img,i,b and maybe br?
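For what it's worth, holding input to a subset like that is not much code. Below is a sketch that strips every tag outside the list and keeps the text. `restrictToSubset` is a made-up name, the approach is a plain regex pass rather than a parser, and it is not a security sanitizer: attributes on allowed tags pass through untouched.

```typescript
// The subset proposed above.
const ALLOWED_TAGS: string[] = ["h1", "h2", "h3", "p", "img", "i", "b", "br"];

// Hypothetical helper: removes any opening or closing tag whose name is not
// in ALLOWED_TAGS, leaving the text between tags in place.
function restrictToSubset(html: string): string {
  return html.replace(
    /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag: string, name: string): string =>
      ALLOWED_TAGS.indexOf(name.toLowerCase()) !== -1 ? tag : ""
  );
}
```

So `<div><p>Hi <span>there</span></p></div>` would be reduced to `<p>Hi there</p>`.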

Tables are a problem, but tables are just as horrible to write in Markdown as they are in HTML.

This is quite simply stunning.

I've used ckeditor 4 recently and found that pretty impressive (it was a system for a company of charity workers, not a hope in hell that markdown was going to happen) and it was good enough.

This looks like something I'd use for my own stuff.

Does anyone know why Chrome uses &nbsp; (non-breaking-space) before and after inline tags such as anchor and italics? This behavior causes lines to sometimes break much sooner than needed.

Any links to see a demo?
