

Ask HN: I want to write my own CMS.  Point me to some reading? - Random_Person

My google-fu is lacking today and I am certain the HN community has a few bookmarks they can share...

I've had my website up since September. It is 100% hand generated and static. I like it that way. I would rather spend some time up front writing my own code than deal with updating and protecting dynamic solutions such as WordPress, Django, Flask, etc.

What I would like to do is code my own content generation system that I run locally. I don't want to use something like Dreamweaver; I simply want to be able to type up a post, run my script and have it spit out markup based on my templates. I've been doing all this work by hand and I recognize that it is slowly becoming daunting. I don't want to burn out on writing because the process is annoying.

I work in Python, but I'm willing to pick up a different language if there are libraries that are purpose-built for this. I've tried searching, but I'm coming up empty. I have very little [read: no] experience parsing text like this, and I'm looking to learn.

What are some good resources that can get me started?

Thanks in advance!
======
Lazare
Writing a CMS is fun and educational, but don't conflate Wordpress and Flask.
Wordpress is a CMS; Flask is a framework which you could use to write a CMS.
:)

More generally, figure out what you want to implement, and what you don't want
to implement. From the sound of it, you don't want to write your own web
server. What about URL routing? Do you want to parse your own HTTP requests,
or do you want that handled for you? Want to write your own template system,
or use an existing one? Write your own markup language and parser, or go with
an off-the-shelf solution?

If you're looking for a local tool that lets you "compile" some content into a
static deployable website, then you're looking for a static site generator
tool like Jekyll, Hyde, or Octopress.

If you want to build your own, you'll need to make some decisions. Pick a
language (for example, Python), a markup language (for example, Markdown), and
a template engine (for example Jinja 2). Now you just need to install some
libraries, and write a little glue script that will enumerate files in a
directory, run them through Markdown, pass them through a template, save the
results, and upload them (probably via some mix of fabric and git). That'll
get you started.
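Here is a minimal sketch of that glue script. It uses only the Python standard library so it runs anywhere. The `*.txt` layout, the `build()` function and the page template are hypothetical. `string.Template` stands in for Jinja 2, and `to_html()` only escapes text and wraps paragraphs, standing in for a real Markdown converter. With the third-party packages installed, `markdown.markdown(text)` and `jinja2.Template(source).render(...)` would slot into the same two places. The upload step (fabric/git) is left out.

```python
import html
import string
import tempfile
from pathlib import Path

# Hypothetical page template; string.Template is a stand-in for Jinja 2.
PAGE = string.Template(
    "<html><head><title>$title</title></head>\n<body>\n$body\n</body></html>\n"
)

def to_html(text):
    """Stand-in for Markdown: escape the text and wrap each blank-line-separated block in <p>."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{html.escape(b)}</p>" for b in blocks)

def build(src_dir, out_dir):
    """Render every *.txt file in src_dir to out_dir/<name>.html; return the written paths."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.txt")):  # assumption: one post per .txt file
        text = path.read_text(encoding="utf-8")
        page = PAGE.substitute(title=html.escape(path.stem), body=to_html(text))
        target = out / (path.stem + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written

if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        posts = Path(tmp, "posts")
        posts.mkdir()
        (posts / "hello.txt").write_text("First paragraph.\n\nSecond & last.", encoding="utf-8")
        for page in build(posts, Path(tmp, "site")):
            print(page.read_text(encoding="utf-8"))
```

Every page goes through the same function, so a template change is picked up on the next run simply by rebuilding everything. For a small personal site, that is cheap.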

~~~
Random_Person
Yeah, being so new to these technologies, I wasn't sure what to call things.
No, I don't want to write a web server. I don't want to code a complete CMS
that serves up dynamic pages.

I just want to auto generate static HTML pages from a selection of page
templates and a file of text. I want to be able to add links to my navigation
menus and have my solution iterate through all the pages and change things
accordingly. I don't want to have to copy/paste my text from my writing
software into my editor, enclose it in tags, and do all that menial stuff.
That's the stuff that I want to automate. I just have to learn how to parse...

~~~
Lazare
If you want an off-the-shelf solution, then you need two pieces - a markup
parser and a template package.

The parser could be Textile, Markdown, RST, or various parsers implementing
different wiki markups. There are python libraries to implement almost
anything you want. Or you could write your own.

The parser goes through your content and applies basic rules to translate it
into HTML. If you want a quick and dirty hack, just:

    
    
      1. enumerate files in "posts" directory
      2. sort by date edited
      3. open each file in turn
      4. read in contents and store as a string
      5. chunk string by splitting on "\n\n" (i.e., an empty line)
      6. for each chunk, append <p> to output string
      7. for each chunk, apply a regex to look for pairs of *s
      8. for each pair of *s, replace them with <b> and </b>
      9. append the resulting chunk and a </p>
      10. repeat 6-9 for each chunk
      11. repeat 3-10 for each file until done
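Here is a Python sketch of those steps under a few assumptions. Posts are plain UTF-8 files in one directory, "date edited" means the file's modification time, and the caller passes in the author. The function names and the `author`/`date` keys are hypothetical. Like the recipe, the sketch does no HTML escaping, so a stray `<` or `&` in a post goes straight into the output. It also only handles simple, non-nested `*pairs*`.

```python
import re
from datetime import date
from pathlib import Path

# Step 7: a pair of asterisks with at least one non-asterisk character between them.
BOLD = re.compile(r"\*([^*]+)\*")

def convert(text):
    """Steps 5-10: split on "\\n\\n", wrap each chunk in <p>...</p>, turn *x* into <b>x</b>."""
    paragraphs = []
    for chunk in text.split("\n\n"):
        chunk = chunk.strip()
        if chunk:  # skip the empty chunks left behind by extra blank lines
            paragraphs.append("<p>" + BOLD.sub(r"<b>\1</b>", chunk) + "</p>")
    return "\n".join(paragraphs)

def load_posts(posts_dir, author):
    """Steps 1-4 and 11: read every file in posts_dir, least recently edited first."""
    files = sorted((p for p in Path(posts_dir).iterdir() if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    posts = []
    for path in files:
        mtime = path.stat().st_mtime
        # The resulting dict is what gets handed to the template system.
        posts.append({
            "body": convert(path.read_text(encoding="utf-8")),
            "author": author,
            "date": date.fromtimestamp(mtime).isoformat(),
        })
    return posts
```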
    

The end result of all that should probably be something like a dict, with a
single key "body" containing your post with <p> and <b> tags. Add author and
date keys, and you're ready to hand it off to your template system.

The problem comes if you want to do anything more than paragraphs and bold
text, like, for example, a hyperlink, or an unordered list. All of a sudden
you will learn why using regexps to parse any sort of markup is a terrible
idea. It's like trying to fix a car engine with a toothbrush; even if you can
make it work, you really shouldn't. There's a lot of info on writing parsers
with Python out there, and a bunch of parser modules you can use[1], but as
far as I know, the short of it is that it's really quite hard. My advice:
either do some research on parsers and be prepared to devote a lot of time to
it, or find one someone's already written. (Either way, don't actually use
regexps to try to make your own markup parser on the cheap.)
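For a sense of what "writing your own parser" means without regexps, here is a minimal hand-written scanner. This is a different technique from the grammar-driven PLY approach in [1]. It walks the text one character at a time and recognises two hypothetical inline rules, `*bold*` and `[label](url)`, recursing into bold text and link labels. Anything that does not complete a construct is emitted as escaped literal text. It is a sketch, not a Markdown implementation: there are no nested brackets, no backslash escapes and no error reporting.

```python
import html

def parse_inline(src):
    """Scan src left to right, emitting HTML for *bold* and [label](url); escape everything else."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "*":
            end = src.find("*", i + 1)
            if end > i + 1:  # a non-empty *...* span
                out.append("<b>" + parse_inline(src[i + 1:end]) + "</b>")
                i = end + 1
                continue
        elif ch == "[":
            close = src.find("]", i + 1)
            if close != -1 and src.startswith("(", close + 1):
                paren = src.find(")", close + 2)
                if paren != -1:
                    label = parse_inline(src[i + 1:close])
                    url = html.escape(src[close + 2:paren], quote=True)
                    out.append(f'<a href="{url}">{label}</a>')
                    i = paren + 1
                    continue
        # Not the start of a complete construct: emit this character as literal text.
        out.append(html.escape(ch))
        i += 1
    return "".join(out)
```

Unmatched markers fall back to literal text, so `2 * 3 < 7` comes out as `2 * 3 &lt; 7` instead of breaking the page. That kind of context-dependent fallback is awkward to express as a single regexp.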

As for the template system, you could use Mustache, or Jinja 2, or if you're
okay with going beyond Python, any of a hundred others, depending on what you
want. I'm quite fond of HAML syntax, but most of the template systems I know
that speak HAML or a variant are written in JavaScript or Ruby, for example.

If you want to write your own, you'll face an easier task than for the parser
step. Again, you'll want to read in the template files, but the tokens you're
looking to replace are pretty simple. Ideally they'll be something like
"$$IDENTIFIER$$"; a regexp can easily find that, then you use the identifier
as a key into the dictionary you got from your parser, and Bob's your aunty.

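Here is a minimal sketch of that token replacement. It assumes identifiers are made of letters, digits and underscores. The `render()` name and the example keys are hypothetical. A callback passed to `re.sub` looks each identifier up in the dictionary. A missing key raises `KeyError`, so a typo in a template fails loudly instead of leaving a literal token in the output.

```python
import re

# A token is $$NAME$$, where NAME is letters, digits and underscores (not starting with a digit).
TOKEN = re.compile(r"\$\$([A-Za-z_][A-Za-z0-9_]*)\$\$")

def render(template, context):
    """Replace every $$NAME$$ in template with str(context[NAME])."""
    # The callback receives each match; a missing NAME raises KeyError on purpose.
    return TOKEN.sub(lambda m: str(context[m.group(1)]), template)

if __name__ == "__main__":
    post = {"body": "<p>Hi</p>", "author": "me", "date": "2012-01-05"}
    print(render("<article>$$body$$<footer>$$author$$, $$date$$</footer></article>", post))
```

`re.sub` scans the template only once. Values such as a post body are therefore inserted verbatim and never re-scanned for tokens.
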
The remaining trick is stuff like navigation. A naive implementation will just
process each file one at a time, which makes things like an archive page which
lists every post you've made in each month tricky. You can get around this by
reprocessing every page every "compile", saving some metadata (like dates and
titles) and then generating your navigation, indexes, and archives. If you're
even smarter you can try and cache stuff so you don't need to recompile stuff
that hasn't changed; this will require persisting metadata locally, presumably
on the file system.
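Here is a sketch of that second pass. It assumes the first pass produced one dict per post with hypothetical `title`, `date` (a `datetime.date`) and `url` keys. Only the collected metadata is needed, so the archive page can be regenerated without re-parsing any post bodies.

```python
import html
from collections import defaultdict
from datetime import date

def build_archive(posts):
    """Group post metadata by (year, month) and render an HTML archive, newest month first."""
    by_month = defaultdict(list)
    for post in posts:
        by_month[(post["date"].year, post["date"].month)].append(post)
    lines = []
    for year, month in sorted(by_month, reverse=True):
        lines.append(f"<h2>{year}-{month:02d}</h2>")
        lines.append("<ul>")
        # Within a month, list posts in date order.
        for post in sorted(by_month[(year, month)], key=lambda p: p["date"]):
            lines.append(f'<li><a href="{html.escape(post["url"])}">{html.escape(post["title"])}</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_archive([
        {"title": "Hello", "date": date(2012, 1, 5), "url": "hello.html"},
        {"title": "Launch", "date": date(2011, 9, 1), "url": "launch.html"},
    ]))
```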

At some point you'll probably be tempted to drop in an SQLite database to
store the metadata; this is pretty easy, but you might want to just run a full-on
CMS locally and cache the results to a file system. Django has the Static
Generator project[2], and Flask has something very similar. These basically
turn Django (or Flask) into a tool that will "auto generate static HTML pages
from a selection of page templates and a [database] of text", which is more or
less what you want.
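Here is a minimal sketch of the SQLite route, using Python's built-in `sqlite3` module. It also covers the caching idea from the previous paragraph: a page is rebuilt only when its source file's modification time is newer than the one recorded at its last build. The table layout and function names are hypothetical.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the metadata store; pass a file path to persist it between runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " source TEXT PRIMARY KEY, mtime REAL NOT NULL, title TEXT, date TEXT)"
    )
    return db

def needs_rebuild(db, source, mtime):
    """True if source has never been built or was modified after the stored mtime."""
    row = db.execute("SELECT mtime FROM pages WHERE source = ?", (source,)).fetchone()
    return row is None or row[0] < mtime

def record(db, source, mtime, title, date):
    """Insert or update a page's metadata after it has been built."""
    db.execute(
        "INSERT OR REPLACE INTO pages (source, mtime, title, date) VALUES (?, ?, ?, ?)",
        (source, mtime, title, date),
    )
    db.commit()

def all_pages(db):
    """Metadata for every page, newest first (ISO date strings sort correctly as text)."""
    return db.execute("SELECT source, title, date FROM pages ORDER BY date DESC").fetchall()
```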

(Of course, the static generation route is never going to make comments
workable. This may or may not be an issue for you, depending on how much you
hate Disqus.)

[1]: For example, PLY: <http://www.dabeaz.com/ply/> If you look at that and go
"ah, this will be easy" then knock yourself out. Most people find this stuff
pretty dry and difficult though. :)

[2]: <https://github.com/luckythetourist/staticgenerator>

~~~
Random_Person
WOW. Thank you SO much.

This is exactly what I needed to read to get headed in the right direction. I
was so lost in the idea of frameworks and whatnot that I got turned around
quickly.

I have never done regex. I am baffled by it actually and didn't know if I
wanted to attempt that. I have a few ideas in the works for how to make this
work. I just need to learn how to parse and substitute based on a set of
rules. I love learning more than doing, so this might turn out to be an
awesome project for me.

------
kls
I actually built something similar in Clojure for a client. If it were not
for the client I would share it with you, but the basics were pretty easy.
I used HTML data attributes as my templating instructions, so I would add an
attribute like <div data-cms-template="header">. I then used a parser to look
for those attributes and replace them with the content of the other template
files specified in a config.xml file. It was simple and it worked well. I
really wish the big CMSs would switch to using data attributes instead of each
using some kind of proprietary custom tag. It would be really nice if they
standardized the attribute names so that at least the basic templates were
cross-CMS compatible.
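Here is a rough Python sketch of the data-attribute approach, built on the standard library's `html.parser`. It is not kls's code, which was Clojure and read its mapping from config.xml. Here the mapping is a plain dict. "Replace" is interpreted as keeping the marked element and swapping its children for the named partial, which is an assumption. The class and function names are hypothetical. Everything outside marked elements is copied through essentially as written.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class TemplateExpander(HTMLParser):
    """Copy HTML through, but replace the children of any element carrying
    data-cms-template="NAME" with partials[NAME]."""

    def __init__(self, partials):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as entities, not decoded text
        self.partials = partials
        self.out = []
        self.depth = 0  # > 0 while inside an element whose children are being replaced

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID:
                self.depth += 1
            return
        self.out.append(self.get_starttag_text())  # the start tag exactly as it appeared
        name = dict(attrs).get("data-cms-template")
        if name is not None and tag not in VOID:
            self.out.append(self.partials[name])
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth:
                return  # still inside the replaced content
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.depth:
            self.out.append(f"<!{decl}>")

def expand_templates(page, partials):
    expander = TemplateExpander(partials)
    expander.feed(page)
    expander.close()
    return "".join(expander.out)
```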

~~~
Random_Person
That's actually really close to what I want to do... I just need to learn how
to parse correctly and replace.

I have 4 basic page layouts, but only 2 that I regularly generate content for.
The header, footer, nav bar, sidebars, and major layout DIVs on these pages
are all constant. I'd like to be able to take a text document of paragraphs
and run it through a parser that would apply <p> tags to the paragraphs and
format the image links properly based on some sort of predefined replacement
markup.

This is where I fall down. I need to figure out how to parse, and I'm not
finding any links to learn from. I can read lines and stick opening and closing
<p> tags on newlines, but parsing stuff inline is where I'm stuck.

------
johnmurch
I would take a look at how WordPress, Pyro (<http://www.pyrocms.com/>), and
others (<http://storageroomapp.com/>) built out their systems: how they store
content, how they push/pull content (JSON? XML?), their backend DB (Mongo,
MySQL, etc.), and their front end (themes, and a simple template language like
WordPress's that lets you build a blog or a job board in minutes).

Good luck, and keep us posted!

