Given that almost all blogs run on a handful of services (e.g. WordPress), it probably wouldn't be hard to make custom scrapers that capture everything.
I'm looking to create an easy way for bloggers to publish their content for ebook readers without having to lock themselves into DRM schemes like the Kindle publishing platform, so this is targeted at the bloggers themselves primarily.
In WordPress, at least, this is an easy configuration change. In Blogger, you can just add ?max-results=500 (or whatever) to the feed URL to get as many entries as you want.
produces a helluva lot more than 20 entries. :-)
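To illustrate the Blogger trick mentioned above, here's a small sketch in Python; the host name and entry cap are just example values (Blogger's standard feed path is /feeds/posts/default):

```python
import urllib.parse

def blogger_feed_url(blog_host, max_results=500):
    """Build a Blogger Atom feed URL requesting up to max_results entries."""
    # Blogger caps each response at the max-results query parameter,
    # so raising it returns far more than the default ~20 entries.
    query = urllib.parse.urlencode({"max-results": max_results})
    return f"https://{blog_host}/feeds/posts/default?{query}"

print(blogger_feed_url("example.blogspot.com"))
# https://example.blogspot.com/feeds/posts/default?max-results=500
```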
If not, why did you balk at creating an account with a centralized provider that lets you log into multiple websites with the same account, claiming you don't want "Yet Another Damn Account"?
Hell, I logged in just because it supported Persona and I already have an account there (and love it).
Good for you. I would have tried it if I could have logged in with Twitter or my WordPress account.
I'd love some feedback, and it's still pretty experimental, so there will be some bugs, but I've got it working for most full-text RSS feeds.
Currently, the interface is very simple: you give it the URL of your blog, RSS, or Atom feed, and it will give you a link you can use to share it. This "share link" contains downloads for ePub and Mobi files, and the downloads will always stay up to date with the latest content (I use Superfeedr to poll the feeds on the backend).
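As a rough sketch of the first step in a pipeline like that (pulling entries out of a submitted Atom feed before converting them into ebook chapters), using only the standard library and a made-up feed:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def parse_atom_entries(xml_text):
    """Return (title, html_body) pairs for each entry in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="(untitled)")
        body = entry.findtext(ATOM_NS + "content", default="")
        entries.append((title, body))
    return entries

sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry><title>Post one</title><content type="html">&lt;p&gt;Hello&lt;/p&gt;</content></entry>
</feed>"""
print(parse_atom_entries(sample))
# [('Post one', '<p>Hello</p>')]
```

Each (title, body) pair would then become one chapter in the generated ePub/Mobi, with the Superfeedr polling triggering a rebuild whenever new entries appear.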
Let me know if you have any questions.
What is your business model with respect to the book after it is created? Is it a percentage of sales, or a one-time fee? Your terms document wasn't very clear :-)
And what is your position on someone using the URL/RSS of a blog they haven't written? Let's say someone puts the RSS link for Ars Technica in there, sort of an Instapaper on steroids, and then downloads the ePub. How does that work?
I'm still working on the details, but I want to create an easy way for someone to sell their blog's content as an ebook subscription that works on any eReader and isn't locked into a single platform (I'm working on adding PDF output and more formats soon!). There are two possible models: a monthly fee, or a percentage of sales. I think I can do both -- a percentage of sales for people who are just starting out, and a monthly fee for heavy users who want a volume discount on their sales. I apologize if this is a bit vague, but I'm still trying to figure out the most fair and sustainable way to do pricing.
I'm also working on the terms; right now I just prohibit using this for anything illegal, which would include redistributing content you don't own or that isn't under a Creative Commons license. This is a very nuanced issue: I think it is perfectly fine for someone to use my service for a feed like Ars Technica as long as they don't redistribute the ebooks that I generate for them. In other words, it's fine if you use my service as an "RSS Aggregator for eReaders" if that's your intended use case, but it's not fine if you post a Mobi file of Ars Technica's content to The Pirate Bay or something like that.
Let me know if you have any questions :-)
Forbidding such use doesn't prevent it, though.
If you're going to profit from this in some way make sure you don't accidentally set yourself up as a prime target for a lawsuit based on contributory infringement.
This means that before you get into the money stream you have to make 100% sure that the content you profit from was acquired legally.
I'm assuming that right now, I can just say, "this is an RSS aggregator like Google Reader, and it's up to users to comply with copyright laws," but it would definitely be tricky once I begin charging.
In general, would some kind of automated verification like Google Apps (DNS records or sending an email to the owner of the domain name) suffice?
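A sketch of how that DNS-record verification could work: the service issues a one-time token that the blog owner publishes as a TXT record, then checks for it. The actual TXT lookup would need a resolver library and is out of scope here; all names below are hypothetical, and here the fetched records are passed in as a list:

```python
import secrets

def issue_verification_token(domain):
    """Generate a one-time token the domain owner must publish in a TXT record."""
    # The domain argument would be stored alongside the token server-side.
    return f"ebookglue-verify={secrets.token_hex(16)}"

def is_domain_verified(expected_token, txt_records):
    """Check the TXT records fetched for the domain for the issued token."""
    return any(record.strip() == expected_token for record in txt_records)

token = issue_verification_token("example.com")
print(is_domain_verified(token, [token, "v=spf1 -all"]))  # True
```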
Anyway, thanks for the help!
EDIT: The nature of my app is very similar to Stitcher Radio (http://stitcher.com/). They seem to have a really detailed signup page for podcasters (http://stitcher.com/contentProviders.php). That seems to be the best way to protect against infringement.
Let me know how it goes. I'm working on an in-house authentication system, but Mozilla Persona was the quickest way to get up and running, and I've noticed it's not as stable as I would have liked.
I have the same problem as OP: my Atom feed has both <summary> and <content>, but I only get the summary with ebookglue.
I tried other RSS readers, and they get the full content fine. Here's my feed for your reference:
Hope this helps.
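For what it's worth, the fix being requested (prefer the full <content> element and fall back to <summary> only when content is absent) can be sketched in a few lines; the feed snippet here is made up:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entry_body(entry):
    """Prefer the full <content> element; fall back to <summary>."""
    content = entry.findtext(ATOM_NS + "content")
    if content:
        return content
    return entry.findtext(ATOM_NS + "summary", default="")

sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <summary>Short teaser</summary>
    <content type="html">Full post body</content>
  </entry>
</feed>"""
entry = ET.fromstring(sample).find(ATOM_NS + "entry")
print(entry_body(entry))  # Full post body
```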
EDIT: It appears that FeedBurner sometimes improperly links embedded images. I've fixed the issue on my end, though images may not appear consistently.
How well does it handle image files?
Are there any icons or logos that I can use, and do I get a link or something to make it easy for my site's readers to grab the files?
EDIT: I can't log in for some reason and I guess I can't get a password reset?
It fetches all images and inserts them into the ebook, so images should be handled well. I haven't figured out a good way to handle video embeds yet, but I'm looking for a good way to fetch standardized thumbnails in place of videos.
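A minimal sketch of the first half of that image step (collecting every <img> source from a post's HTML so the files can be downloaded and bundled into the ePub), using only the standard library; the markup here is just an example:

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect the src of every <img> so the images can be fetched and bundled."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

collector = ImageCollector()
collector.feed('<p>Hi <img src="https://example.com/a.png"> bye</p>')
print(collector.sources)  # ['https://example.com/a.png']
```

The second half would download each URL and rewrite the src attributes to point at the bundled copies inside the ebook.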
I tried to add my "blog" to it:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Apache/2.2.22 (Amazon) Server at www.ebookglue.com Port 80
I'll look into it again in a few days and try it then. There are some bugs here, but it seems like it's worth checking out.
Seriously, I'll look at it in a few days. Do I have to create an RSS feed for the blog? I haven't done that yet, but I'm planning to soon, so once I have it I'll try again.
EDIT: This is a "coming soon" feature, but I've spoken with the folks at Readability and I'm going to be implementing their content parsing for sites without RSS or Atom feeds. There isn't a definite timeline on this, but just want to send some love to the Readability guys for being awesome in general.
Thanks for the feedback.
The index is fine, so there might be something missing or an encoding mistake somewhere in the body part.
It's a known issue with Persona and third party cookies:
Unfortunately, this is a situation where a lot of people on HN probably disable third party cookies (though perhaps a larger percentage of HN readers already have Persona accounts?). Anyway, I'm looking into rolling out my own authentication with email/password soon.
I'm still working out the optimal solution, but thanks for the feedback!
It is far better to deal with this sort of thing at the "transaction" level, e.g. use a captcha or similar for the second and subsequent conversions from a particular IP address.
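A minimal in-memory sketch of that gating rule, assuming a single process and hypothetical names (a real deployment would persist the counts and expire them over time):

```python
from collections import Counter

FREE_CONVERSIONS_PER_IP = 1
conversion_counts = Counter()

def needs_captcha(ip):
    """Require a captcha for the second and subsequent conversions from one IP."""
    conversion_counts[ip] += 1
    return conversion_counts[ip] > FREE_CONVERSIONS_PER_IP

print(needs_captcha("203.0.113.5"))  # False: first conversion is free
print(needs_captcha("203.0.113.5"))  # True: captcha from here on
```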