What is a Web Framework? (jeffknupp.com)
147 points by mgrouchy on Mar 4, 2014 | 75 comments

I love stuff like this that talks about web basics in precise detail. Too much of the stuff out there for beginners glosses over the basics of the web. In the midst of receiving tons of info about HTML & CSS & Javascript & whatever scripting language, beginning devs have trouble knowing at all times what level of the stack they are operating at and what the execution context is. Can I write Javascript here? Can I write Ruby/Python? Does it have to be interpolated? This stuff is seemingly simple, but it is actually so complicated when you're thrown into the ocean that beginners have trouble keeping track of it.

And it really doesn't help when more experienced devs, or non-web programmers shame people about web dev being dead simple because it's not. It's a different kind of programming, it may even be easier than writing C and managing memory explicitly and not having garbage collection. But if it is easier, it's only easier once you've mastered it and have a really strong grasp of what is happening at all times. Until you get there, it's easy to get lost in the endless list of technologies you must be familiar with to write even a simple web application. And I know this because I see it in beginners that I have taught and/or worked with.

I recently decided to write some tutorials[1] (maybe eventually a book) that assume zero knowledge and work their way up. I'd like to eventually go all the way up through SQL, basic httpd administration, etc. It's a long row to hoe but I think it may be valuable.

[1] http://trevorhunsaker.com/

I would strongly recommend reviewing your 101 WRT what you're calling tags and clarifying that tags represent elements that are interpreted and rendered by the browser. (The browser does not render tags.)

Also, the distinction of semantics and styling should be cleared up a little. Less emphasis on how elements are styled by default, and more on what they represent.

I like the approach of assuming no knowledge, but this requires extra care to introduce concepts without using improper or misleading terminology. As a beginner, there's not much worse than hearing simplifications and thinking they're the whole story.

Good luck with continuing your series!

Thanks for the feedback! I'll make sure to review for improper simplifications.

Both your site and the original post are great starts.

Anyone reading or writing these sorts of fundamental web programming tutorials should look at Philip Greenspun's books Philip and Alex's Guide to Web Publishing and Software Engineering for Internet Applications, both available free online (The latter is a stripped down version of the former, written for goal-oriented MIT students who don't want anecdotes or pretty pictures). They are around a decade old, so they aren't a good source for copy-paste-ready code, but the ideas in the books have aged quite well (from simple things like using "abstract URLs"--URLs without extensions like .aspx and .php, to focusing on the website's data model and user interaction instead of the programming language).

The most interesting part of these books is that they explored creating web applications for a purpose, not as an end in themselves. They were written for the sort of people who evaluated their past year not on what their manager wrote in some HR form and the raise they got, but on what they were able to accomplish, like creating http://scorecard.goodguide.com/. (What's funny is that in Greenspun's ITConversations interview (around 4:20 http://web.archive.org/web/20130729213414id_/http://itc.conv...), he said that the parts of Phil and Alex's Guide that aged were not the technology, but the rampant idealism in the book that web applications would unleash a new age of enlightenment, like the academics who had believed television would be used to broadcast Harvard lectures around the world).

Here are the chapters on HTML: http://philip.greenspun.com/panda/html http://philip.greenspun.com/seia/html

And the closest to describing HTTP and frameworks: http://philip.greenspun.com/panda/server-programming http://philip.greenspun.com/seia/basics

(It just occurred to me that Philip's interview above is one day short of being 10 years old, and his assessment might be worth revisiting. The internet still contains volumes of crap, but the internet has only grown as a teaching and community building tool, with sites like Meetup and Facebook organizing learners and teachers, Khan Academy and Udacity doing organized classes, and StackOverflow collaboratively solving specific problems).

Thanks for the recommendations, I'll be sure to check them out.

I disagree with the sentiment of this blog post, because it implies (unless I missed something) that you need to use some complicated web framework to make the internet go.

You don't, and I've recently been encountering a LOT of very confused would-be devs who get stuck behind some tangled mess of django installation.

The "simplest" web application could look like this:


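     import cgi

     # Parse the query string / form data that the web server passes to the script
     form = cgi.FieldStorage()
     name = form.getvalue("name", None)

     # A blank line must separate the headers from the body;
     # print supplies the second newline
     print("Content-Type: text/plain\n")
     print("Your name is %s" % name)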

Or maybe you want to get fancy:

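     import MySQLdb

     # Connection details are placeholders; substitute your own
     conn = MySQLdb.connect(host="localhost", user="someuser", passwd="somepass", db="somedb")
     c = conn.cursor()
     c.execute("select something from sometable")
     results = c.fetchall()

     print("Content-Type: text/plain\n")

     # hey look, CSV data!
     for line in results:
          print(",".join(str(col) for col in line))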

etc. etc.

I encountered a dev the other day (I guess I'm a shitty dev, then?) who asked me how my application handled URLs.


"No, how does it handle URLs, though"

"Apache. WTF are you talking about?"

"No, like if you go to example.com/foo/bar/, how does it know what to send me?"


I swear half the devs I meet lately are more concerned with javascript frameworks and rewriting their own webserver than they are about actually serving content to users.

Imagine a baker who was more concerned with building ovens than baking bread.

Keep in mind what your goals are when you're working: are you looking to write a javascript framework, or are you looking to write a web application? Then evaluate what path you want to take from "not having an application" to "shipping a product".

Sometimes (probably many times), you don't need to make it as complicated as you're making it, and apache will work just fine.

Sure. But then you hit a case where your URI scheme doesn't map 1:1 to the part of the filesystem your web server is looking at. Sometimes you can just hang ?id=foo off the end of the URI and be done with it. Sometimes you're doing something more complex, and an application-level URI router suddenly looks useful, precisely because it makes it easy to map arbitrarily and in detail between the things your application knows about and the URIs your client interacts with.

It's a bit much, too, to say "All frameworks are too complicated to be worth the effort because Django!" Even Django's partisans acknowledge that it's on the very high end of the complexity curve; indeed, it's the only framework I know of which requires more up front from the developer than Rails. I'm lately more and more of the opinion that, in almost all cases, most of that up-front investment is wasted; you can get 90% of the benefit from a microframework such as Ruby's Sinatra or Python's Flask, either of which can be mastered from a standing start inside a weekend, and save the suffering involved in Rails or Django for when, if ever, you actually need that last ten percent.

Basically, you want to choose the right tool for the job. If you're putting in two screws, just grab your trusty Apache-brand screwdriver and go to it. If you're putting in forty, you're better off breaking out the Flask power drill and a screwdriver bit; sure, you can put in forty screws with a hand driver, but you won't do your wrists any favors in the process. And if you're putting up a frame for a two-story house, you're really going to need something like Django's powder-actuated, magazine-fed nail gun, but for anything smaller, it'd be overkill.

The purist approach is nice, and we've all been there, but eventually you'll want to pool those db connections for speed, to comprehensively handle security, and to do things with URLs that Apache does not do, like dynamically registering and unregistering paths that have no concrete corresponding script file.

Of course you can attempt to add those things yourself, but the best code you'll often write is that code you don't write at all because someone already did. Don't reinvent square wheels.
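For anyone wondering what "pooling" means in practice, here is a rough sketch in Python 3. It is only an illustration: the `ConnectionPool` class is hypothetical, and sqlite3 stands in for whatever database driver you actually use. The idea is just that connections are opened once and lent out per request rather than opened and closed every time.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical pool: keep a fixed number of open connections and lend them out."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection be used from another thread
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1 + 1").fetchone()[0])
```

A real pool would also need to deal with broken connections and timeouts, which is part of why people reach for an existing one.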

Or you could keep your markup out of the database and db connections become a non-issue (granted, I'm talking about the 99.9% of websites that are content-centric as opposed to web apps).

We were doing all of those things in the 1990s.

For many "developers" these days the job consists of trawling the Internet looking for a "framework" that matches their project, then tweaking it.

I agree with the sentiment that the framework "disease" has gotten a little out of hand (it seems like we have a new one every few weeks now...). However, these same people would come up with a dozen ways of routing URLs to handler functions if they weren't given some sort of standard solution to use.

Basically, frameworks are a way to work on a team without everyone reinventing solutions to problems that are almost beneath notice when trying to get dynamic content on the web.

The reason we don't have single CGI scripts for each url, and instead route everything through a single bootup script is to centralize things like session management, database initialization, memcache, configuration... etc. Things that you could do on every page, but that would be too cumbersome, repetitive, and generally don't change often.
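A minimal sketch of that single-entry-point idea, written as a Python 3 WSGI callable. The `CONFIG` dict, the page functions, and the paths are all made up for illustration; the point is only that shared setup happens once and every request goes through one function.

```python
# Shared setup runs once, when the process starts, instead of once per page.
CONFIG = {"site_name": "example"}  # hypothetical configuration


def home(environ):
    return "Welcome to %s" % CONFIG["site_name"]


def about(environ):
    return "About %s" % CONFIG["site_name"]


# Hypothetical mapping from request path to page function.
PAGES = {"/": home, "/about": about}


def application(environ, start_response):
    """WSGI entry point: every request passes through here."""
    handler = PAGES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = handler(environ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` should be enough.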

When working with django, or zend, or cake, or whatever the framework of the day is, I've often just wanted to go back to the simplicity of plain WSGI. But then I see the mess that people still make when everything is done for them and realize that would just create a new hell.

It takes discipline to program at a lower level, and understanding. Both of those qualities are in short supply.

> It takes discipline to program at a lower level, and understanding.

Why don't we train developers for this instead of constantly telling them they need to learn flash-in-the-pan-framework.js?

Well, because when you want to get stuff done you really don't want to invent the wheel again. Why not reuse the work someone else has done before?

This is not to say that understanding the web at a low level isn't useful; it certainly is. However, in my opinion it is better to build, for example, a simple MVC framework as a learning experience and use something battle-tested in production.

Reuse is good. But my fear is that people learn the intricacies of Rails over SOLID Ruby. The basics are worth practicing over and over again -- the benefits accumulate much like compound interest. Whereas there's no guarantee of Rails being around in ten years.

There's a certain sort of intellectual deference that is given to these frameworks (much of it has to do with how readily people accept things said by famous/rich people). For example, is MVC really the best way to write traditional webapps?

We need more iconoclasts.

You don't separate the headers from the body, producing a potential vulnerability for header injection or response splitting attacks. Both are severe security problems and both are easily avoided by using a web framework.

While it may not appear to be the case, developing secure web applications is rather complicated, that's why we have frameworks and that's why these frameworks can be somewhat complex. That doesn't make not using them the simpler solution.

> While it may not appear to be the case, developing secure web applications is rather complicated, that's why we have frameworks and that's why these frameworks can be somewhat complex. That doesn't make not using them the simpler solution.

I disagree. The more complexity you introduce, the more code is needed, the greater the chance of bugs, and the greater the chance of those bugs not being discovered sooner.

> You don't separate the headers from the body

Looks like it does to me. The content-type is printed with a \n and then Python implicitly adds a second newline. Or did you mean something else?

Hm, you're right. Response splitting and with that header injection should still be possible though, I think.

In any case having to manually make sure to print newlines in the right places and escape user input in headers correctly is insane.
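To make the "newlines in the right places" problem concrete, here is roughly what you end up writing by hand in Python 3. `build_response` is a hypothetical helper, not part of any library, and it only covers one check: refusing header names or values that contain line breaks, which is what response splitting relies on.

```python
def build_response(headers, body):
    """Build a CGI-style response, refusing headers that contain line breaks."""
    lines = []
    for name, value in headers:
        if any(ch in name or ch in value for ch in ("\r", "\n")):
            raise ValueError("line break in header %r" % name)
        lines.append("%s: %s" % (name, value))
    # A blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_response([("Content-Type", "text/plain")], "hello"))
```

A framework does this kind of checking (and a good deal more) for every header you set, which is the argument being made above.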

> I swear half the devs I meet lately are more concerned with javascript frameworks and rewriting their own webserver than they are about actually serving content to users.

But a good URL router allows for code separation, and to a lesser extent, separation of concerns, which directly speaks to the ability to deliver, uhhh, baked goods.

If you're just building static HTML files, then sure, your URL routing is, and probably should be handled by Apache, because each page is its own destination. In modern web development though, you have an application, and that application would likely get big and unruly if you didn't make some efforts to modularize the application into URL addressable resources.

"Handling URLs" then, is as much a part of baking a cake as sifting the flour, or measuring the baking soda.

"If you're just building static HTML files, then sure, your URL routing is, and probably should be handled by Apache, because each page is its own destination. In modern web development though, you have an application"

This is part of the problem, though: most websites should just be collections of static files, but copious kool-aid swallowing has led to heavyweight frameworks being the default architecture.

I suppose, but that's a highly variable statement to make. Undoubtedly, there are many websites that could be reduced to static equivalents, but as I deal in web applications, that's kind of a tough pill to swallow.

With apps, even if I build a single-page app, I have to build an API to power it, and that API needs to know the difference between "customers" and "customers/customer_id", and the easiest logical way to manage that is through URL routing.
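As a rough illustration of the "customers" versus "customers/customer_id" case, a tiny regex-based router in Python 3 might look like the sketch below. The handler names and return values are invented for the example; real routers add HTTP methods, type conversion, and so on.

```python
import re


# Hypothetical handlers; the names and return values are illustrative only.
def list_customers():
    return "all customers"


def show_customer(customer_id):
    return "customer %s" % customer_id


# Each route pairs a compiled pattern with the handler it should reach.
ROUTES = [
    (re.compile(r"^/customers/?$"), list_customers),
    (re.compile(r"^/customers/(?P<customer_id>[^/]+)/?$"), show_customer),
]


def dispatch(path):
    """Return the handler's result for the first matching route, else None."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    return None


print(dispatch("/customers"))
print(dispatch("/customers/42"))
```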

Even in static apps, URL routing is kind of a burden, unless you never link to internal pages and never grow your content beyond what you can keep in your head.

The goal was not to convince anyone that they needed to use a web framework. Rather, I hoped to explain to novices exactly what a web framework is and what problems it solves.

Couldn't resist!!! :P

Apache? Python? Why!?

Nginx maybe?

  http {
    upstream database {
        postgres_server dbname=test user=test password=test;
    }

    server {
        location / {
            postgres_pass   database;
            postgres_query  "SELECT * FROM cats";
            rds_json on;
        }
    }
  }

I'm actually using it... (and planning to use for much deeper things tho).



edit: reformatted code block

Because software engineering should happen with programming languages plus service engines, the service engine should not be the total replacement for all.

Should not be the total replacement for all because?

Also, you could use Lua with nginx to increase the flexibility (if the lack of it was what you meant).

OpenResty[0] is a web application framework that consists of nginx and a bunch of plugins, including Lua scripting.

[0] http://openresty.org/

Why not, _if_ you are operating at such scale that the extra performance outweighs any additional development/maintenance cost/complexity?

I think this is what Taobao uses/used: https://github.com/alibaba/tengine

"Imagine a baker who was more concerned with building ovens than baking bread."

I'd imagine that at some point, it's hard to make enough bread without considering the ovens.

You're right.

But my point is that sometimes people get caught up in premature optimization. The baker analogy, within the context of what I'm saying, is about a baker who never bakes any bread, because they spend all of their time iterating on ovens, and not bread.

Having some super fancy bread-making-machine is totally vital...eventually, or if you're doing something very cutting-edge, or out of the ordinary.

But not when you're writing a personal blog.

Fair enough points; there is indeed a difference between being more concerned about the ovens than the bread and all.

That said, I have worked with/for a lot of bakers who have literally given no time thinking about the oven (I've done a lot of cheap PHP freelancing), and occasionally ruminate on finding a better oven and the choice that led to my current situation in the kitchen.

There are a fair number of folks out there who bake plenty of bread, but who could do so in more sanitary conditions with easier to reproduce recipes if they were open to some different methods.

I agree that a personal blog is probably not the right point for that kind of methodology, but there are also a lot of folks who need to learn the difference between yeast and random fungi that they found by googling "free wordpress themes".

Good point. The metaphor is much stronger than the grandparent realized: a great baker should spend some time to find the best oven, install it in their bakery, then (hopefully) never think about ovens again.

> then (hopefully) never think about ovens again.

If "baker" denotes an individual whose job is to bake bread in a small bakery, then perhaps.

But we're more like industrial chefs, hired to "make" bread at scale (e.g. for Tesco or Walmart). As software engineers, our roles may be much more about combining, improving, or making our own ingredients, machinery, and processes.

Then, when it comes to baking the bread, we're able to produce much more with less effort, or even hand off the final production to others.

yes, but then you hire a specialist at building ovens and tailor it to your bread.

Thank you! But I'm slipping into things I don't understand, and maybe you could illuminate. I assume that for both your examples, we are using, say Apache, to listen on port 80 and route requests to files. So how do you get apache to not just deliver the file as-is, "#!/usr/bin/python" and all? Does Apache typically know to execute and return the result?

Or if my questions aren't making sense, that would be good to know as well.

You can configure Apache to treat some files as CGI executables [1], which get invoked by their file path, receive request details through environment variables and the request body on their standard in, and print HTTP responses on their standard out. It's how most of "Web 1.0" was built, and it supports streaming in a way that most MVC frameworks don't.

[1] http://en.wikipedia.org/wiki/Common_Gateway_Interface
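To show there is no magic involved, here is a sketch of a CGI script in Python 3 that reads the request straight from the environment. `QUERY_STRING` is the variable the CGI convention uses for the part of the URL after the `?`; the `render` function and the "stranger" default are just names I picked for the example.

```python
import os
import sys
from urllib.parse import parse_qs


def render(environ):
    """Build a plain-text CGI response from the request's environment variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["stranger"])[0]
    # Headers first, then a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\nYour name is %s\n" % name


if __name__ == "__main__":
    # The web server sets the environment and captures whatever is written here.
    sys.stdout.write(render(os.environ))
```

Whatever the script writes to standard out is what the server sends back to the browser.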

Sometimes if you haven't got things configured properly, you do return the script as text rather than the output of the script.

The old school default was to put scripts in the cgi-bin folder (wherever that had been configured to be) and make them executable. The server would then run the script and return its output.

Sorry, but I'll have to disagree. Frameworks are great. Yep, sometimes a simple one will be enough, but you'll still need one.

Yep, it's quite simple to do CGI by hand... But you'll want all your pages to look the same, so you'll need to add some template functionality to your code. Also, you'll certainly need to handle data, so add some data validation there. You'll also benefit from some better abstraction over your database, and connection pooling... and while we are talking about finite resources, you'll want to limit the number of threads Apache launches.

There is certainly more. I certainly won't remember all the troubles of reinventing that wheel, as I haven't done it since the '90s.
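The "template functionality" step is usually the first thing people bolt on. A bare-bones version in Python 3, using only the standard library's `string.Template`, could look like this. The layout and the `render_page` name are hypothetical, and the sketch assumes `content` is already safe HTML.

```python
import html
from string import Template

# One shared layout, so every page gets the same surrounding markup.
LAYOUT = Template(
    "<html><head><title>$title</title></head><body>$content</body></html>"
)


def render_page(title, content):
    """Wrap already-safe HTML content in the shared layout."""
    return LAYOUT.substitute(title=html.escape(title), content=content)


print(render_page("Home", "<p>Hi</p>"))
```

Once you add loops, includes, and automatic escaping, you have more or less written a template engine, which is the point being made above.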

This. Even when your needs are relatively simple there is a lot of work in building a web app from scratch.

I've been on the other side of an exchange like that. I'm not a web developer, so "Apache" doesn't mean anything to me. I wanted to know "What does Apache do, and how does it interact with the code you write?" but they didn't seem to understand what I was asking.

I found the article informative. I can understand what their 'simplest' example does knowing only what sockets are. However, I have no idea what cgi is. Your example is magic to me. The things the example is supposed to teach are hidden behind its abstraction.

CGI lets any server-side script "print" to the browser. (Apache "httpd" is a modular "do everything" web server, which you can customize and fine-tune for performance -- but some say it has a steep learning curve.)

How do you handle deployments when your URL handling is in Apache? Do you deploy the Apache config at the same time as your code?

Seems messy to say the least.

Way back in my Apache/CGI days the file system was the url handling. No need to touch httpd.conf, just ftp the files..

These days, with Mojolicious behind nginx, it's much cleaner. (But it is good to have used a less abstract setup.)

There's no reason to touch the Apache config to handle URLs. .htaccess files do the job just fine and they're located in the directory structure of your application. Or am I totally misunderstanding what you're talking about?

We put all of our web server config in the web server config - overrides like .htaccess need to be compiled/interpreted/whatever at the beginning of each request so they can add unnecessary latency.

Deployments are still easy. Either symlink the web server config, or (in our case) deploy an RPM which copies the config to the right location. Then it's just a matter of a graceful restart.

A framework is a weasel word for a Massively Coupled System. The trappings of orthogonality and modularity are given lip service, while actually creating some of the most anti-modular footprints in human systems design history.

This has all happened before. MFC used to be the way to structure your Win32 applications. Later on, Microsoft leaked a small library known as WTL that provided a lot of the UI niceties of MFC without a gigantic runtime DLL. More importantly, it didn't specify as much of an architecture. It became very popular; I'd attribute a big part of it as feeling non-monolithic.

The biggest disservice that Industry does to working class programmers is when it tells them that all of these 'old' practices of modularity/coupling are outdated/can't possibly work/too hard to learn/too academic/require writing too much code. They free developers to work faster and better, rather than shackle them to fashionable technology, keeping them in a perpetual state of engineering amateurism.

Worse, Industry has the gall to proclaim each small step as progress. It's all hype and bullshit, including your favorite framework.

I generally agree. But I think there exist good "frameworks" out there. I'd cite Bottle[1] as one. And while it's still new, I think Martini[2] has the beginnings of something great. Both of them heavily rely on principles of composition.

[1] - http://bottlepy.org/

[2] - https://github.com/codegangsta/martini

I agree, I think there is a movement afoot (if not named) that new frameworks will be truly decoupled -- being designed by war-worn veterans. The first frameworks were all massively coupled bells and whistles. The whole point of RoR is a pseudo-OO skin over a massively coupled Active Record system that can't really be extended without hula hoops. And Rails gurus love hula hoops.

One problem: users of open source love oodles of features, no matter how useless. Witness the sheer number of people who admit the primary way they choose a library is the last commit date and features list.

So if you write a solid library with just the right number of features, then it might get less consideration because you're not always duct-taping over poor design choices.

This is what happens when we glorify Internet Time. It becomes more about shipping and less about quality.

This is very true. It is a real issue in the open source world. http://github.com/codegangsta/martini tries to pride itself on minimalism. Thankfully the modularity of the project allows me to tell people to add features via other packages and repositories. Even though the product is solid it is difficult to communicate that the project is still active without having so many commits.

This is one of the reasons that I find the Golang package management philosophy refreshing in theory. "Master should never break" really prevents feature creep from coming in and promotes the use of solid packages that aren't always being actively worked on. Of course there are some major drawbacks wrt lack of versioning in Go, but I think the philosophy there overall is a very good thing for open source development.

Incredibly, people seem receptive to overly ambitious feature-creep-laden libraries even if they're completely half-baked. It's like they'd rather debug someone else's code than write it properly in the first place. IMO, there are few things more painful than when someone's library just doesn't work at all. The 'shiny' factor of communities usually indicates a lack of respect for good engineering. I much prefer communities that take coupling seriously; the only one I've found so far seems to be Clojure.

Please continue to push Golang away from fashion-oriented 'engineering.' I hope you take marketing seriously; it seems very possible for someone to create the next ultra-coupled-hack-of-a-Go-framework to rile everyone up and consequently forget all the lessons of minimalism.

Well, there's already Revel for that :)

I don't use the last commit date itself; I use how many issues, pull requests, and mailing list posts there are, how recent, whether the library author is replying, and how many others are using the library in their projects. For any reasonably sized library, there should at least be some of this happening. I don't want to take on the full development effort of the library by myself if I run into problems with it.

I disagree. I tried learning Catalyst for Perl a few years back. You could swap almost any component in it, and the tutorial explained how to. Being new to the whole concept of a web framework, I couldn't actually work out what the framework was doing, as it just seemed like a collection of libraries for various tasks. I learned Django after that, and it all made a lot more sense. You can swap parts out, but it makes more sense, especially in the beginning, to use the sensible defaults provided. They are probably more coupled than they could be, but it works well, and the parts play nicely.

JavaScript frameworks seem to be another matter.

1. Let's review what we need for most web applications, at minimum.

-- Some way to parse URLs coming into the system and route to the correct code that handles the response to that request.
-- Some way to marshal and unmarshal form, request parameters, and more recently JSON.
-- Some way to return HTML, after we have done what we need to considering the inputs we received.
-- Some way to handle cookies. We probably also want a convenience layer so that we can have some sort of session to make authentication and authorization easier.
-- Some way to interface with a more fixed storage, usually some form of a database. It would be nice to have a set of convenience methods to handle prepared statements.
-- It would be nice to have some sort of way to escape text going to and from the fixed storage to help prevent XSS attacks.
-- It would be nice to have a way to conveniently handle CSRF attacks.
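Two of those items (prepared statements and escaping against XSS) are small enough to sketch in a few lines of Python 3. This is only an illustration: sqlite3 stands in for a real database, and the `comments` table, the sample input, and `render_comments` are all made up.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

# Placeholders keep user input out of the SQL text itself.
user_input = "<script>alert('hi')</script>"
conn.execute(
    "INSERT INTO comments (author, body) VALUES (?, ?)", ("mallory", user_input)
)


def render_comments(connection):
    """Return the stored comments as HTML, escaping every value on output."""
    rows = connection.execute("SELECT author, body FROM comments").fetchall()
    # Escaping on output means stored markup is displayed as text, not run.
    return "".join(
        "<p>%s: %s</p>" % (html.escape(author), html.escape(body))
        for author, body in rows
    )


print(render_comments(conn))
```

Doing this consistently everywhere, for every query and every output, is the part a framework takes off your hands.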

Now, we could have a library for each one, or maybe 20. But that means every new project, we're making 7 (at a minimum) libraries, evaluating them for security, keeping on top of security updates for 7 projects, and learning 7 (or more) fundamental libraries each time we come on to another team because someone made different choices. On top of that, we've written glue code on top of this all to make our lives livable. The current state of code ownership means that we probably can't take this from employer to employer, so we'll have to write it all over again, or learn someone else's glue code with its own idiosyncrasies.

All for things that matter quite a bit, but I'd prefer a single good implementation over having to search for the 7 best. Further, what happens when one of these projects goes dormant? It's easy to rip everything out if you've written your code modularly, but what about that junior programmer's code from before you were there?

And are we supposed to thrust this all on a junior programmer who is just starting out? That's how PHP happened. ;) Not every programmer is gifted with a good sense of architecture, and most frameworks at least enforce a Not-Terrible architecture. When taking over someone else's code, this can be a very good thing.

Frameworks, as massively coupled as they are, have some distinct advantages for getting things done in 95% of systems. If you're running a system for which the defaults don't work, a framework isn't right for you. That's fine. Depending on how far out you are from the opinions, you can either cobble together your own solution or write it from scratch. But for most use cases, there are more advantages to that massively coupled system than disadvantages.

Cool post. I would be interested in a follow-up post about how web frameworks handle user registration, authentication, and sessions.

and forms, file uploads, etc.

I don't know if this qualifies as a framework as it is really minimal: http://jflask.net

In short: java, inspired by flask, uses the http server included in the JRE. Perfect for embedding a small webapp in a java program (jar size < 20k, no external deps). Probably not suited for big scale apps.

Edit: homepage

I wasn't able to open the article, it was hanging on a call to disqus.com and I have that IP banned in my firewall (iptables):


Sorry, seems like Disqus was serving up spammy ads along with the comments. I've disabled the ads.

No worries, I want to read this after work, thanks for posting the information. I'll edit/update with any feedback.

The first two chapters of this online book cover similar ground, but for PHP and the symfony framework:


Part that took me the most time to understand is the different way to have an execution context for dynamic pages ( cgi, wsgi, fastcgi, servlets). I think that should definitely be a part of a "what is a web framework".

I thought it was a good article, there are a lot of different programmers out there, not all do web programming. This provides necessary coverage of the core web stuff you need to know if you want to get into it.

Interesting discussion. Any thoughts on Spring and Spring MVC? Could Spring be considered monolithic as well?

Thanks, you helped me a lot :) Does anyone know a similarly straightforward article on TCP/IP?

Beej's guide to network programming is great http://beej.us/guide/bgnet/ - it's pretty deep, but it's very clearly written.

It has been heartening to read so many of the comments here pointing how absurd web development has become with the endlessly more rigid frameworks prescribing their world view onto the developer, effectively reducing them to module install muppets, rather than empowered developers.

It's incredibly stifling once you get experience under your belt. FWIW, I think Rails has a great design aesthetic, and yet it feels like a straightjacket once I get near it. I want to just write Ruby.

meh, no websockets?

I would think that topic worthy of it's own blog post rather than a footnote or a distraction.

"its own ..."

Read all of the post.

I was expecting some enlightenment at the end or something useful for me, but nothing. This stuff is mandatory for each and every programmer who writes web-apps and I wouldn't call them web-programmers if they wouldn't know these basics.

Well I think the purpose was to teach people who are not web programmers. Or at least those who are new to web programming.
