
Inside the Guardian’s CMS: meet Scribe, an extensible rich text editor - lebek
http://www.theguardian.com/info/developer-blog/2014/mar/20/inside-the-guardians-cms-meet-scribe-an-extensible-rich-text-editor
======
pothibo
Let me diverge for a moment:

Every time I see this, I wonder why browsers (and the W3C) don't add more
functionality to textarea.

ContentEditable always felt like an ugly hack to me. A rich set of APIs on
textarea could move the web forward. Imagine building an IDE in a textarea.
That should be possible.

</rant>

Great work nonetheless, very clean code and very nice documentation!

~~~
TazeTSchnitzel
Related: Why isn't there a wrapping version of <input type=text>? For a single
line of text which you wish to wrap, using a <textarea> with JS hooks to
prevent pressing the return key is a terrible hack. Why can't you do <input
type=text wordwrap>? Or <input type=text style="word-wrap: break-word;">?

Also, in this day and age of _sans-serif, variable-width_ fonts, why do we
have to specify rows and cols on <textarea>?!

~~~
danellis
Serifs are irrelevant, and browsers have had variable-width fonts since
Mosaic, WorldWideWeb and ViolaWWW, so making textareas use a monospaced font
and specifying their size like that seems to have been just laziness. If there
was a good reason, I'd be interested to know it.

------
phirschybar
In my web shop, our first CMS (circa 2003) utilized a WYSIWYG editor which was
a disaster. For all of the supposed benefits it brought to the editing /
writing process for our clients, the reality is that upon hitting 'save', the
style/formatting issues on the front-end were awful, having picked up junk
from MS Word, or copied web pages, or massive embedded images, you name it. It
threw our shop into perpetual customer-support mode as our clients struggled
to get anything to look right, or consistent.

For our 2nd client CMS, we chucked the WYSIWYG, spent a little more time
helping our clients understand Markdown (actually Textile - this took a few
minutes of our and their time). There was a little squirming involved on the
part of our clients, but after a day or so, they understood it. Best part, it
just works. It ensures that our site renders perfect, semantically correct
HTML!

WYSIWYG in RTEs has grown up a bit, as we can see in Guardian's latest
attempt. However, its largest flaw, from our perspective, is that it outputs
HTML! HTML is just not a proper end-format. It is too difficult to reverse
engineer. Markdown, on the other hand, can be rendered into many formats. It
can be output as HTML, raw text, truncated text, etc. So, while I applaud
Guardian for releasing this, it saddens me that we're still attempting to
improve on browser-based, user-facing tools which output HTML.

------
eliot_sykes
I'd love to read more about the Guardian CMS that Scribe was written for and
how the writers find using it. Do they like it?

When I've worked on news sites in the past, many of the journalists preferred
writing in their desktop word processor of choice and cutting and pasting into
the CMS as the final step before publishing.

The CMS was probably viewed as a necessary evil and I don't remember there
being much love for it from the people who had to spend hours in it every day.

Partly this was due to the CMS not working offline and it just wasn't as
pleasant to use as the software the writers have used for years, which is
understandable.

~~~
theefer
They're quite happy with it (except when it breaks, understandably). Mostly,
they just want it to work as they expect (i.e. like Word or Google Docs), so
the goal is for them to not notice it. Doing the Right Thing on paste from
Gmail or Google Docs (a very frequent use case) is therefore crucial. At the
same time, we want to rely on Scribe to enforce correct, standard typography
rules, valid markup (unlike if it were free-form HTML), etc.

We're only at the beginning, but the curly quotes plugin mentioned in the blog
post is a good example of that. Other ideas in the pipeline include
automatically enforcing and converting to UL/LI lists instead of paragraphs
with bullet point characters, warning on punctuation issues, etc.
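
As a rough illustration of the kind of typographic rule such a plugin might enforce, here is a minimal sketch in TypeScript. It is not Scribe's actual plugin: `curlyQuotes` is a hypothetical helper that works on plain text only, and it assumes a quote is an opening quote when it sits at the start of the text or follows whitespace or an opening bracket.

```typescript
// Hypothetical helper, not part of Scribe: converts straight quotes in plain
// text to curly quotes using a simple "what came before?" heuristic.
function curlyQuotes(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch !== '"' && ch !== "'") {
      out += ch;
      continue;
    }
    const prev = i === 0 ? "" : text.charAt(i - 1);
    // Assumption: a quote opens if it is first, or follows whitespace or an
    // opening bracket; otherwise it closes (or is an apostrophe).
    const opens = prev === "" || /[\s(\[{]/.test(prev);
    if (ch === '"') {
      out += opens ? "\u201C" : "\u201D";
    } else {
      out += opens ? "\u2018" : "\u2019";
    }
  }
  return out;
}
```

A real editor plugin would also have to skip over markup and keep the caret in place, which is where most of the difficulty is likely to be.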

Scribe has also allowed us to integrate contextual options, such as buttons to
add images when the caret is on an empty line, or a button to embed any URL
pasted into the body.

The biggest challenge is therefore to provide a reliable UI that responds
as one would expect, while allowing extra features to be built on top of it
without too much effort.

~~~
eliot_sykes
Wow it sounds great and thanks for taking the time for the detailed answer. If
there exists now or in the future a screencast or gif of a power user writing
an article on the CMS please share it on HN. Although I have a feeling it'd
make some devs stuck with an older CMS weep a little.

------
subpixel
This does look promising.

Markdown gets a lot of love from developers, but in my experience with clients
across many industries (including journalism), it's a non-starter. If you're
used to looking at markup in an editor, Markdown is an abstraction you can
tolerate. But the average non-technical person tasked with the job of updating
a blog or website sees Markdown as something like pig latin - a thing that
makes communication more complicated.

~~~
mercurial
I hear you. But as somebody used to looking at code in an editor, I actively
attempt to avoid WYSIWYG tools in favour of reStructuredText/Markdown, rather
than it being "an abstraction I can tolerate", after fighting with formatting
in WYSIWYG editors once too often.

~~~
subpixel
WYSIWYG formatting woes are all real, they're just not something users
understand. (My head explodes when classes are stripped off of a div every
time the form is submitted...)

If I were the King of the Internet and had to choose one tool for text entry
it would be a system sort of like Ghost/Prose.io, where you use Markdown, but
can see the results. So even laymen can get the swing of the tool developers
prefer. As King, I would also levy a tax.

------
aleem
This pretty much nails it and is very tricky to get right. Nearly every other
contenteditable editor out there messes up the undo stack, copy-pasting and
new-lines, some more than others. The extensibility feature is also a great
relief.

However, I am curious as to their stance on Markdown and if it was ever
considered.

~~~
theefer
Markdown was indeed considered, but we ended up rejecting the idea for a
variety of reasons. Given the interest around this decision in several
comments here, maybe we should write up about it as well.

------
hypertexthero
This is awesome. Copying and pasting text from Word files seems to work fairly
well, instantly generating HTML with all the cruft removed.

To download it to your desktop and try it out locally, get [these
files](https://github.com/guardian/scribe/tree/gh-pages)

~~~
OliverJAsh
The distribution files are kept in the `dist` branch. The tree you linked to
is just for the example. We encourage consumers to download these files using
Bower (`bower install scribe`)

[https://github.com/guardian/scribe/tree/dist](https://github.com/guardian/scribe/tree/dist)

~~~
hypertexthero
Thank you!

------
danabramov
I'm so so glad.

Just two weeks ago I thought we'd have to implement the same thing for the
same reasons: old editors were good in handling browser inconsistencies but
bad in being too tightly coupled to their dated UIs, new editors sucked at
generating semantic markup.

Thank you for making this public.

------
onion2k
I'm hugely impressed by some of the open source code that comes out of the
Guardian offices.

------
Edmond
Hmmm...this may just be my favorite post on HN in a while... I have been
working on a UI builder tool and have to some extent built the kind of
functionality in this editor.

It looks, at least from first impression, that you guys have done a much better
job...hopefully it turns out to be the perfect solution for my need. Thank you!

------
marknadal
I'm a tad late, but I should mention a really important library I wrote that
changes HOW one writes editors -
[http://github.com/amark/monotype](http://github.com/amark/monotype) . Save
your caret, do whatever transformation you want directly with HTML
manipulation, then you restore your caret. Done, that simple - don't even
bother trying to use the browser's API, because yes they are wildly unreliable
in their behavior. The kicker with my library is it saves your caret (your
selection) based on actual content, rather than the DOM tree, so it still is
able to accurately restore it even if you completely delete and replace the
DOM tree.

~~~
timdown
How does it work?

~~~
marknadal
Just do:

    monotype.save(editor);

Then perform your manipulation, when done call:

    monotype.restore();

Tim Down! Thank you so much for your rangy library, it is a noble library, and
I respect all your hard work trying to normalize the Range API. Monotype is
similar to your new Text module but behaves a bit differently, please email me
so we can talk more! mark at accelsor.com

------
luwes
Shamelessly promoting my text selection and range API polyfill,
[https://github.com/luwes/selection-polyfill](https://github.com/luwes/selection-polyfill)

Just to make clear, this would enable Scribe to work in older IE browsers.

~~~
OliverJAsh
Nice! What persuaded you to write this polyfill?

------
pea
The Guardian has a great approach to tech; they've done some really cool stuff
with Scala / Play and released some great code.

------
alistairjcbrown
I'm surprised not to see Aloha editor [0] in their list of "Existing
Solutions".

[0] - [http://www.aloha-editor.org/](http://www.aloha-editor.org/)

~~~
OliverJAsh
I should’ve added that! There are many…

~~~
arnehormann
And how about Codemirror? It's mainly used for monospace code editing and it's
missing a UI and editing widgets, but the configurability and maturity should
blow everything else out of the water. A demo showing it's not only usable for
monospace fonts in a single font-size:
[http://codemirror.net/demo/variableheight.html](http://codemirror.net/demo/variableheight.html)

~~~
nomadcoop
I really like Codemirror, it was simple to get set up in
[https://storytel.la/](https://storytel.la/) and easy to extend. I'm a few
versions behind and didn't realise it handles mixed width fonts, so thanks for
mentioning it.

------
dunham
I suspect the best way to make rich text editing work across browsers is to do
what the current google docs editor does: Drop contentEditable and just
implement your own editor in javascript. Read key and mouse events and
manipulate the DOM.

It's a bit more work up front, but possibly less work than trying to get four
different implementations of contentEditable to behave the way you want. And
this way you own the editing logic.
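
To make "owning the editing logic" concrete, here is a minimal sketch in TypeScript of the model side of such an editor. It is an illustration under stated assumptions, not how Google Docs works: `EditorState` and `applyKey` are hypothetical names, the document is a single plain string, and key names follow the standard `KeyboardEvent.key` values. Rendering the state back into the DOM is left out.

```typescript
// Hypothetical editor model: the text plus a caret offset into it.
interface EditorState {
  text: string;
  caret: number;
}

// Returns the next state for one key press; the input state is not mutated.
function applyKey(state: EditorState, key: string): EditorState {
  const { text, caret } = state;
  switch (key) {
    case "Backspace":
      if (caret === 0) return state;
      return {
        text: text.slice(0, caret - 1) + text.slice(caret),
        caret: caret - 1,
      };
    case "ArrowLeft":
      return { text, caret: Math.max(0, caret - 1) };
    case "ArrowRight":
      return { text, caret: Math.min(text.length, caret + 1) };
    default:
      // Treat single-character keys as text input; ignore everything else.
      if (key.length !== 1) return state;
      return {
        text: text.slice(0, caret) + key + text.slice(caret),
        caret: caret + 1,
      };
  }
}
```

In a browser, a `keydown` handler would call `applyKey` with `event.key` and then redraw from the returned state, so the browser's own editing behaviour never touches the content.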

~~~
sunir
Do you know any good libraries that do this?

I tried building one a long time ago. The hard parts are mouse events (ie
moving the caret based on a pixel coordinate) and pasting.

Google docs solves the mouse event problem by rendering every glyph offscreen
and measuring its bounding box. That is intensive and difficult if your font
is bidi or changes by context (eg Arabic, accents). I tried wrapping every
glyph in a span but it wiped out the available RAM. That may not be a problem
anymore in 2014.
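
Once glyph widths have been measured, the pixel-to-caret step itself can be sketched roughly as below (TypeScript). This is an illustration, not Google Docs' code: `caretOffsetAt` and the width array are hypothetical, and it assumes a single left-to-right line where each glyph has a fixed, context-independent width, which is exactly the assumption that fails for bidi text and contextual shaping.

```typescript
// Hypothetical sketch: map a click's x coordinate to a caret offset, given
// per-glyph widths that were already measured elsewhere (e.g. offscreen).
// Assumes one left-to-right line and context-independent glyph widths.
function caretOffsetAt(x: number, glyphWidths: number[]): number {
  let left = 0; // x position of the current glyph's left edge
  for (let i = 0; i < glyphWidths.length; i++) {
    const width = glyphWidths[i];
    // A click in the left half of a glyph places the caret before it.
    if (x < left + width / 2) {
      return i;
    }
    left += width;
  }
  // Past the midpoint of the last glyph: the caret goes at the end of the line.
  return glyphWidths.length;
}
```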

~~~
dunham
Honestly, I haven't looked into it. The bounding box thing sounds messy.

But on chrome, firefox, and probably safari, it appears that you can call
document.getSelection() in a click handler, and the returned object will be an
appropriate Caret selection with offset and node.

I recall that the selection object in older versions of IE does not offer the
character offset, so you have to copy the selection and move it one character
at a time until you get to the beginning of the text node. IE9 and above
appear to have modern selection objects (according to the MS docs). I don't know
if they're updated onclick.

------
cantbecool
Does anyone else notice how ridiculously bloated theguardian.com is? It
seriously takes 5 seconds to completely load on my modern machine.

~~~
OliverJAsh
[http://www.theguardian.com/uk?view=mobile](http://www.theguardian.com/uk?view=mobile)

~~~
tommyd
I thought it was interesting how they appear to have inlined the CSS and JS
into the HTML for their new responsive site. Is this becoming a common
practice?

I guess they must have worked out that their CSS and JS was lean and minified
enough that despite the extra overhead of it being in the HTML (which is even
less when gzipped, I guess) and not being served from cache, it was faster
than making another 2 requests. I guess there is some small overhead to
fetching a file from the browser cache but surely it is tiny?

Would be interested to get more insight into this aspect of the site if anyone
knows anything!

~~~
mattmanser
There's a new trend to try and put the really important CSS rules inline so
your content can render quickly then fill in the rest later without making the
initial render look like it was designed by design-blind programmers, i.e. the
unmodified Times New Roman HTML page.

The Gruand is also trying progressive image loading.

Unfortunately it actually just looks shit, it's like peeking behind the
curtains in Oz as your content kinda half renders then renders again properly,
twitching into life like some sort of half botched Frankenstein experiment.

Notice the `loadFontsAsynchronously` and the `loadCssFromStorage` in the js.

In some ways I like that the Gruand's team is experimenting, in others it's
frustrating to see how the creaky and leaky HTML/web browser combo is still
justifying people doing all sorts of crazy things to try and get a simple page
to render quickly 25 years after the web was born. And "responsive" designs
that are really just mobile designs stretched to a one-size-fits-all simply
because browsers are too fucking stupid to tell you what they actually need.

My Grump is in full swing today.

~~~
TazeTSchnitzel
*Grauniad

~~~
pessimizer
The irony.

------
lewisflude
I'm excited about using this in my next project. Love the fact they kept it
simple with this one.

------
spartanatreyu
Anyone using angularjs should seriously consider textAngular. I'm currently
using it on an enewsletter editor app and it's pretty darn sweet.

[http://textangular.com/](http://textangular.com/)

~~~
OliverJAsh
It’s easy enough to write a directive to plug Scribe in to Angular with
bidirectional binding. I’ll post an example soon, but it’s simply a case of
listening to content changes in Scribe and updating the Angular model using
ng.NgModel.NgModelController.

------
mherdeg
I hope the name is in homage to Scribe, the markup language and word
processor. There was a time when you might write up your thesis in Scribe
(even as, in parallel, some people chose LaTeX).

------
GamboMama
I am shocked how misinformed and technically misled a pro team like the
Guardian is. Saving your stories in HTML in a CMS? This is like using Joomla
for hobbyist sites or something like that, it will kill your archive and it
will be a pain reusing content. A good Content Management Strategy is: use
LibreOffice for writing, create a plugin that publishes to archive and puts
articles in the pipeline for redaction, save content to DocBook and be
prepared for any kind of reuse of that content.

~~~
OliverJAsh
We only save inline text in HTML. Text “elements” along with any other media
“elements” are stored as JSON and we attach metadata to each element.
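
For readers trying to picture that layout, a minimal sketch in TypeScript might look like the following. The type and field names (`TextElement`, `MediaElement`, `html`, `url`, `metadata`) are assumptions made up for illustration; the comment above does not describe the Guardian's actual schema.

```typescript
// Hypothetical shapes; field names are illustrative, not the Guardian's schema.
interface TextElement {
  type: "text";
  html: string; // inline markup only, e.g. <i>, <b>, <a>
  metadata: Record<string, string>;
}

interface MediaElement {
  type: "image" | "video" | "embed";
  url: string;
  metadata: Record<string, string>;
}

type ArticleElement = TextElement | MediaElement;

// Reuse without parsing block-level HTML: pull out just the text elements.
function textOnly(body: ArticleElement[]): string[] {
  return body
    .filter((el): el is TextElement => el.type === "text")
    .map((el) => el.html);
}

// Example article body with made-up content.
const body: ArticleElement[] = [
  { type: "text", html: "Hello, <i>world</i>", metadata: { role: "intro" } },
  { type: "image", url: "https://example.com/photo.jpg", metadata: { credit: "Example" } },
  { type: "text", html: "Second paragraph.", metadata: {} },
];
```

With a structure like this, the block-level layout lives in the JSON array, and HTML is confined to the inline formatting inside each text element.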

------
etherealG
nice to see someone trying to fix contenteditable as an api, thanks for the
hard work and sharing it!

------
paulyg
This looks really nice. I can't wait to play with it on an upcoming project.
Thanks to The Guardian for open sourcing it.

I'm curious what support on mobile is like. I tried the demo on my Nexus 7 and
it worked well for the two minutes I played. However I skimmed the
browserinconsistencies.md file and did not see any specific mention of mobile
browsers. Mobile is a place where most of the existing solutions fall on their
face. Having good mobile support could really drive adoption.

~~~
OliverJAsh
I haven’t done any testing on mobile, but we could add a few platforms to our
integration tests to see what the support is like, perhaps! It would be good
to test.

------
danso
First of all, props of course to the Guardian's team, not only for devoting
resources to improvement of the CMS field, but open-sourcing it...a concept
that is still mostly alien to the modern newsroom.

As far as I know, the solution they have here is impressive and as good as the
state-of-the-art, in terms of usability and modularity...but it still can't
overcome the major quirks that come up with rich-text editors.

For example, I typed in the following in the demo
([http://guardian.github.io/scribe/](http://guardian.github.io/scribe/)):

Hello, _world_

 _Why italics?_

So the error here is that _after_ typing "world", I switched _off_ the italics
and hit line break/Enter. However, the italics-mode persisted into the next
line. This was the generated HTML:

    <p>Hello, <i>world</i></p><p><i>Why italics?</i></p>

As a programmer, I can appreciate why this might happen, and I know how to fix
it...but this is the kind of unexpected behavior that is the bane of the
layperson, so much so that with each new rich-text environment -- whether it
be Word, Google Docs, TinyMCE, etc -- they have to come up with a whole new
list of hacks to get around these quirks.

_edit: the rest of this is addressed to a general "you", not to "you, the
Guardian developers", as in, "why didn't you just do Markdown"...though if the
Guardian took the lead in that, I'd most definitely upvote that too ;)_

I think rich-text editors are fine for the _very_ layperson. But I think for
professional reporters, there needs to be a move toward the expectation that
they all learn Markdown. Note, I'm not saying that everyone needs to learn
HTML...but Markdown is basically the exact subset of text formatting a
professional online writer needs to communicate 99.9% of their reportage
material, with the rest being made up through plugins/shortcode/embed, as is
currently the case for most online CMSes.

Markdown can be written in any editor and is portable to a huge variety of
systems and services. More importantly, even without a specialized editor,
Markdown still has human-friendly structure. What else does a writer need?

Before you say: _oh but we can't expect our writers to learn code-like
things_...this is not true at all. As an intern at the Denver Post, I spent at
least a day learning the in-house editor, which was Windows-only, designed for
the print-publishing workflow, and had all manner of arcane key combinations
to add editing marks (again, for print, and not the web). Everyone was
expected to learn it, and everyone did fine.

But unlike Markdown, this in-house closed source coding system was...well,
shitty as most industry-specific codes are...and once in a while someone would
accidentally "un-hide" notes meant for an editor's eyes only, which would then
show up in print. What I love about Markdown is that you don't even have to
really know it to write what you need...hitting Enter creates a paragraph
break, both in your text editor and the platform to which you publish. How
much easier can you get?

~~~
OliverJAsh
The behaviour you describe around italics is completely native to the browser,
so it will at least be consistent across uses of `contentEditable`. I agree
that it is annoying, but there were much bigger fish to fry
([https://github.com/guardian/scribe/blob/87d3ed1a7f28d9fcdcc0...](https://github.com/guardian/scribe/blob/87d3ed1a7f28d9fcdcc0a54e0d168dbeac26e558/BROWSERINCONSISTENCIES.md)).

You make some very good points about Markdown, and we spent a while thinking
about whether to go down that road or not. Although there is a technical
barrier to access for Markdown, that’s not the reason we decided against it.
Maybe we should do another blog post to follow up with more details regarding
our decisions. (Or, I’ll come here with more details in just a bit.)

~~~
bronson
So, assuming unlimited developer time, you would want Scribe to correct this?

I'm just trying to get a feel of Scribe's scope... Does it want to fix all
inconsistencies, even pure UI irritants, or is it mostly about ensuring
correct content?

I'm tempted to write a plugin that "types" space-delete after every CR. That's
the quickest and nastiest fix I can think of...

~~~
OliverJAsh
Can you add an issue here
([https://github.com/guardian/scribe/issues](https://github.com/guardian/scribe/issues))
and we can talk solutions? I have a few in mind!

Generally speaking, it should patch behaviour to the point where it doesn’t
get in the way of the user and produces clean and semantic markup. We have
found that patching some of the smaller behaviour obscurities would not be
trivial. If it matters to the community though, we (the open source community)
can definitely get them patched!

------
noir_lord
This is quite simply stunning.

I've used ckeditor 4 recently and found that pretty impressive (it was a
system for a company of charity workers, not a hope in hell that markdown was
going to happen) and it was good enough.

This looks like something I'd use for my _own_ stuff.

------
RaRic
Does anyone know why Chrome uses &nbsp; (non-breaking-space) before and after
inline tags such as anchor and italics? This behavior causes lines to sometimes
break much sooner than needed.

------
spoiledtechie
Any links to see a demo?

~~~
sv123
[http://guardian.github.io/scribe/](http://guardian.github.io/scribe/)

