Hacker News | thecodemonkey's comments

Hacks like these are scary, and unfortunately they happen way too often.

The fundamental problem here is that it's a "cat and mouse" game. As soon as your WP install has been patched, it's just a matter of time before a new vulnerability is discovered.

Shameless plug: We built a WordPress hosting platform[1] that protects your entire website by only making a static copy of the site publicly available.

[1] https://spudpress.com/

Why are they scary? The fact that the wordpress code base can be exploited in ways the developers didn't anticipate is not scary, it is stupid. If there was ever a code base that needed rewriting, I give you wordpress.

Wordpress is a glorious example of a wrong product gaining popularity for all the right reasons. I mean, it's bad. It's extremely, extremely, extremely bad. You look at the code and you see an army of monkeys happily shitting line after line of spaghetti; you open up the database only to see hundreds of tables painfully obviously created by people who might have heard the phrase "database normalization", but never looked it up; you try to make it do anything other than blogging (e.g. add another language), and you end up in horrible pain.

But you open up the admin and suddenly you understand why it's popular. Its design is sleek. The end user will never know the horror that lurks underneath it. It has myriads of plugins written by worse hacks than the WP core team, but the user will never know that, for he has no knowledge. It has seas of free templates (some of which contain dubious php code), but hey, they are free and they look good. And damn, it's popular, and that many people just can't be wrong. After all, if you say it's bad and should be nuked from orbit just to be sure, it's you who's wrong - because you haven't done any better, and that is sufficient grounds for calling you incompetent...

TL;DR: exploits in wordpress? I would have told you so years ago.

> Hundreds of tables?

Not picking sides on this whole thing - but cutting the sensationalism on this comment a bit.

WordPress's default database structure isn't complex at all. If you have an abnormally large number of tables, that's probably because you're using a bunch of plugins to do simple, easy things - your own self-created hell that you kind of deserve for being a plugin-crazy, lazy developer.

There are basically these main tables to care about:

- `wp_posts`: for posts...

- `wp_postmeta`: for post meta...

- `wp_options`: for global defaults...

- `wp_terms`: for taxonomies...

- `wp_users`: for users...

- Some others / relationships / comments / etc...

Pretty simple - nothing complex going on at all. The biggest problem used to be that there wasn't a meta table for `wp_terms`. The "standard" was to actually save taxonomy/term meta in the `wp_options` table with its own key-value reference. Terribly non-normalized, terribly inefficient at scale.

My beef has never been too many tables but the lack of tables. There definitely needed to be a `wp_termmeta` table. Fortunately, it was added in WordPress 4.4. For perspective though, I think that was only a month ago.
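To make the trade-off concrete, here is a rough sketch of the two approaches being compared, using SQLite stand-ins for the WordPress tables (the option name `term_42_color` and the color value are illustrative, not real WordPress keys):

```python
# Sketch of the key-value meta-table pattern discussed above,
# using SQLite stand-ins for the WordPress tables.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The old workaround: term meta crammed into wp_options as ad-hoc keys,
# with the term id string-mangled into the option name.
cur.execute("CREATE TABLE wp_options (option_name TEXT PRIMARY KEY, option_value TEXT)")
cur.execute("INSERT INTO wp_options VALUES ('term_42_color', 'blue')")

# The wp_termmeta approach (added in WordPress 4.4): a proper meta table
# keyed by term_id, so lookups don't depend on parsing option names.
cur.execute("""CREATE TABLE wp_termmeta (
    meta_id INTEGER PRIMARY KEY,
    term_id INTEGER,
    meta_key TEXT,
    meta_value TEXT)""")
cur.execute("INSERT INTO wp_termmeta (term_id, meta_key, meta_value) "
            "VALUES (42, 'color', 'blue')")

# With a real meta table you can query (and index) by term_id directly:
row = cur.execute(
    "SELECT meta_value FROM wp_termmeta WHERE term_id = 42 AND meta_key = 'color'"
).fetchone()
print(row[0])  # blue
```

The `wp_options` version forces either a full scan or string surgery on option names; the meta table makes the term id a first-class, indexable column.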

Meta tables are a horrible design pattern. Yes, let's serialize our data and shove it all into one column.

If you're still a little concerned with licensing and copyrights, I would recommend taking a look at www.graphicstock.com - you just pay a flat monthly or yearly fee and you can download as much as you want.

Disclaimer: I work for the company behind GraphicStock. Oh, and we're hiring!


Static websites are great of course, and the article outlines that nicely.

The problem that I've been facing personally is that most of the time, the people who are updating and maintaining content on marketing websites are not developers.

So having to use the command line for static site generators such as Jekyll or Middleman is just not a good experience. Don't get me wrong, I LOVE static site generators in general and I'm using both Middleman and Sculpin myself.

Other people have seen this problem too, and some user-friendly-ish static site generators have started to surface. But I want to chime in with a solution as well, since I believe that the best way to adopt static site generators is to use the tools that content creators and maintainers already love and are familiar with - e.g. WordPress, which now has a whopping 25% marketshare[1].

I built SpudPress[2] with a friend. It is a hosted static site generator for WordPress. We automatically generate a static version of your WordPress site and host it on a super fast CDN. You don't have to worry about any of the edge cases of generating a static copy; we take care of all that automatically for you.

[1] http://ma.tt/2015/11/seventy-five-to-go/ [2] https://spudpress.com


Movable Type, one of the original blogging platforms, worked (and works) exactly like that, though it also supports dynamic publishing. For example, Jeff Atwood's blog has always been statically generated: http://blog.codinghorror.com/coding-horror-movable-type-sinc...


That's really cool! I like that. It's nice to see that a platform like Movable Type has had this feature right off the bat for such a long time.

The main reason that we decided to build SpudPress this way, is that you can more or less take your existing WordPress site and instantly make it static.

In contrast to Jeff's Movable Type blog, we're taking full advantage of the static pages to host the entire site on a CDN and handle asset cache invalidation automatically.


So is Daring Fireball.



It's really interesting that they decided to retain the full history! It also gives away some information about the initial contributors and the timeframe of when they started.


I love it. I've never taken a compiler class so going through and reading all the commits since day one is really fascinating. It's easy to see exactly how the language and structure of the code base came to be. Super cool in my book.




Here are some visual aids that better show how this project has evolved. They also show how active things are today.



Anyone willing to make a Gource video?





Hey, I'm one of the founders. We built SpudPress because we felt that there was room for a different static-site approach.

Most other static site generators are awesome, but they require you to use cumbersome command line tools. SpudPress uses WordPress (which 25% of all websites are already running[1]) and turns it into a pain-free static site generator.

We take care of all the edge cases and just generate and host a static site for you so you can take advantage of all the benefits (high scalability, affordable hosting, no more server-side security vulnerability worries, etc.)

[1] http://ma.tt/2015/11/seventy-five-to-go/


Advantages over `wget -mkEpnp` ?


Great question! What we offer is basically that - with a whole bunch more. We take care of all the edge cases, trigger syncs automatically, and invalidate the CDN cache. Not to mention taking care of the whole deployment and hosting pipeline.
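For context: `wget -mkEpnp` mirrors a site (`-m`), converts links for local browsing (`-k`), adjusts file extensions to `.html` (`-E`), fetches page requisites like CSS and images (`-p`), and never ascends to the parent directory (`-np`). One of the "edge cases" beyond that is rewriting absolute links to the origin host so the snapshot works when served from a CDN. A minimal sketch, assuming a hypothetical origin URL (a real sync also has to handle `srcset`, CSS `url()` references, and more):

```python
# Rewrite absolute links to the origin host into relative ones,
# so the static snapshot works regardless of where it is served from.
import re

ORIGIN = "http://example-wp-site.com"  # hypothetical origin site

def relativize(html, origin=ORIGIN):
    # Naive regex rewrite: replace "http://origin/path" with "/path",
    # and a bare origin URL with "/".
    return re.sub(re.escape(origin) + r"(/[^\"']*)?",
                  lambda m: m.group(1) or "/", html)

page = '<a href="http://example-wp-site.com/about">About</a>'
print(relativize(page))  # <a href="/about">About</a>
```

A regex is obviously fragile against real-world markup; it is only meant to show the shape of the problem a hosted service automates.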


I can highly recommend https://basecamp.com - it just works, and you can even invite clients to the projects for transparency.

Worked with this in multiple agency settings in the past.


We launched our side project as a Show HN almost 2 years ago and it was a great success. Could not have asked for a better way to kick it off and the feedback was super valuable.

Obviously a lot has happened with our product since launch, but this is the original thread: https://news.ycombinator.com/item?id=7095228

This is our side project: http://geocod.io


We're very close to launching a product that serves this exact purpose. [1]

Our primary motivation was being able to build a static website as easy as using WordPress - so we figured, why not turn WordPress into a static site generator?

As notacoward mentioned, there are a few gotchas, so we decided to launch it as a SaaS in order to abstract those things away and make it fully hosted on a CDN out of the box. We still give you full access to the WP install though, so no vendor lock-in or shackles for the customer.

Happy to give anyone a demo if you're interested. You can contact me at mathias AT dotsqua.re

[1] http://spudpress.com


1) My side project is http://geocod.io

2) It started out because I needed to geocode a large set of addresses for another side project. I ended up building a geocoder and realized that others might find it useful too.

3) Making geocoding affordable to the masses and having built something to be proud of. Continuously learning new things because of it too.


That's a really good idea! I've had to deal with API limits countless times and it sucks.

For some projects I had to geocode with multiple providers. Bing seems to be better than Google at geocoding in some parts of the country. I especially noticed it in the Midwest / Illinois.


I don't quite understand why you would use a full-blown browser like phantomjs for crawling (I've seen a lot of projects recently taking this approach, so this critique is not directly towards Apifier).

Yes, I get that in some specific circumstances it would be nice to be able to execute the JavaScript on the page but think about the trade-off here.

In the vast majority of cases a simple HTTP GET request with a DOM parser is all you need -- actually not a single one of the examples on the Apifier homepage has any need for phantomjs.

Wouldn't it be much much cheaper, simpler and faster to ditch phantomjs? Or is there something I'm missing here?
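The "simple HTTP GET request with a DOM parser" approach can be done with nothing but the standard library. A minimal sketch: the inline HTML snippet stands in for a fetched page (in practice `urllib.request.urlopen(url).read()` would do the GET), and the `h2.title` selector is an assumption about the target markup:

```python
# Scrape headings from a page with a plain DOM parser - no headless
# browser, no JavaScript execution.
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Collects the text of every <h2 class="title"> element."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

html = ('<h2 class="title">First post</h2><p>body</p>'
        '<h2 class="title">Second post</h2>')
parser = TitleCollector()
parser.feed(html)
print(parser.titles)  # ['First post', 'Second post']
```

Compared to spinning up phantomjs per page, this runs in milliseconds and has no browser process to manage, which is the trade-off the comment above is pointing at.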


You're right that most of the time you don't need to use JavaScript.

But look at Google Groups for example - there is an infinite scroll to get all the topics, posts are also loaded dynamically, so you have to wait some time to get them.

In the SFO flights example you have to deal with pagination also using JavaScript.

We wanted to build a powerful tool which can crawl and scrape almost any website out there. It's slower, but you can use a bunch of our nodes to do it in parallel.


I agree that the Google Groups example is much simpler when using PhantomJS, but I would argue that it would be an outlier.

The SFO flights example is actually heavily over-engineered. From quickly glancing over the XHR tab in Chrome's Network tools, it was pretty obvious that all of the data is actually located in this very nice JSON blob: http://www.flysfo.com/flightprocessing/fullFlightData.txt

(I assume that the SF Flight Info was just meant as an example for the platform and as such the fact that it's already a JSON blob was just ignored for the sake of the example)
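In other words, once you spot the XHR endpoint you can skip the browser entirely and just parse the JSON. A sketch, where the sample blob and its field names (`flight`, `status`) are assumptions standing in for the real flysfo.com response:

```python
# Consume a JSON XHR endpoint directly instead of scripting a
# headless browser against the rendered page.
import json
from urllib.request import urlopen  # would fetch the real blob by URL

def parse_flights(blob):
    """Extract (flight, status) pairs from a JSON array of flight records."""
    data = json.loads(blob)
    return [(f["flight"], f["status"]) for f in data]

# Sample stands in for urlopen(url).read().decode()
sample = '[{"flight": "UA 863", "status": "On Time"}]'
print(parse_flights(sample))  # [('UA 863', 'On Time')]
```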


I've written some projects which use phantomjs; the primary motivation for me has been the desire to look at the web in general, rather than specific sites I'm scraping data off, and having the ability to see what their javascript does.


It's OK when you only have to crawl one or two websites; sure, manually analyzing the js and writing minimal DOM parsing routines would do.

But how about hundreds or thousands of websites to crawl? Or would you prefer to just use phantomjs and write static extraction rules?


If you limit to just that, then there's no benefit over 10 lines of Ruby and Nokogiri.



