Show HN: Turn Your Blog Into a Downloadable Ebook (ebookglue.com)
36 points by shantanubala on Dec 25, 2012 | 39 comments



The problem with all of these services doing Blog RSS -> ebook is that the RSS feed typically only has the last 20 items and doesn't include comments, hence the value is relatively limited.

Given that almost all blogs run on a handful of services (e.g. WordPress), it probably wouldn't be hard to make custom scrapers that capture everything.


Yup! That's the next step that I'm working on right now :-) I've gotten in touch with the folks at Readability, and will be implementing their content parser soon.

I'm looking to create an easy way for bloggers to publish their content for ebook readers without having to lock themselves into DRM schemes like the Kindle publishing platform, so this is targeted at the bloggers themselves primarily.


"RSS feed typically only has the last 20 items"

In WordPress, at least, this is an easy configuration change. In Blogger, you can just add ?max-results=500 (or whatever) to the feed URI to get as many as you want.
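A minimal Python sketch of the Blogger tweak, assuming the feed accepts the max-results query parameter as described above. The helper name `with_max_results` and the example.blogspot.com address are hypothetical, used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_results(feed_url: str, max_results: int = 500) -> str:
    """Return feed_url with its max-results query parameter set (hypothetical helper)."""
    parts = urlsplit(feed_url)
    # Keep every existing parameter except a previous max-results value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "max-results"
    ]
    query.append(("max-results", str(max_results)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # example.blogspot.com is a placeholder, not a real blog.
    print(with_max_results("http://example.blogspot.com/feeds/posts/default"))
```

Replacing any existing max-results value, rather than appending a second one, avoids sending the server two conflicting limits.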


Adding this to the ebookglue URL yields the same result, only 20 entries.


It must be some limitation in the ebookglue code. For example:

http://steve-yegge.blogspot.com/feeds/posts/default?max-resu...

produces a helluva lot more than 20 entries. :-)


I was going to try it, stopped at the Persona (who?) log in. I'm not going to create Yet Another Damn Account I have to remember the password to. Then I read here that it's based on RSS, and that kills it for me altogether. BTW, I've tried other blog->book services and they usually choke to death on my blogs. This one probably would too. Next!


Did you expect that you would be able to use it without an account? That's not really a reasonable expectation in this day and age.

If not, why did you balk at creating an account with a centralized provider that lets you log into multiple websites with the same account, claiming you don't want "Yet Another Damn Account"?

Hell, I logged in just because it supported Persona and I already have an account there (and love it).


>>>I already have an account there

Good for you. I would have tried it if I could have logged in with Twitter or my WordPress account.


Hi! I made this as a side project to play around with Elastic Beanstalk and scalable ebook conversions, and it's now evolved into something a little bigger.

I'd love some feedback, and it's still pretty experimental, so there will be some bugs, but I've got it working for most full-text RSS feeds.

Currently, the interface is very simple: you give it the URL of your blog, RSS, or Atom feed, and it will give you a link you can use to share it. This "share link" contains downloads for ePub and Mobi files, and the downloads will always stay up to date with the latest content (I use Superfeedr to poll the feeds on the backend).

Let me know if you have any questions.


I think it is a cool idea. I suggest you negotiate with a blog owner and create example files from their blog that potential users could see.

What is your business model with respect to the book after it is created? Is it a 'percentage of sales' or a 'one-time fee'? Your terms document wasn't very clear :-)

And what is your position on someone using the URL/RSS of a blog they haven't written? Let's say someone puts the RSS link for Ars Technica in there, sort of an Instapaper on steroids, and then downloads the ePub. How does that work?


Thanks! I'm happy to hear you like it.

I'm still working on the details, but I want to create an easy way for someone to sell their blog's content as an ebook subscription that works on any eReader and isn't locked into a single platform (I'm working on adding PDF output and some more formats soon!). There are two models that are possible: monthly fee, or percentage of sales. I think I can do both -- percentage of sales for people who are just starting out, and a monthly fee for heavy users who want a volume discount on their sales. I apologize if this is a bit vague, but I'm still trying to figure out what will be the most fair and sustainable way to do pricing.

I'm also working on the terms; right now I just prohibit using this for anything illegal, which would include redistributing content you don't own or that isn't under a Creative Commons license. This is a very nuanced issue: I think it is perfectly fine for someone to use my service for a feed like Ars Technica as long as they don't redistribute the ebooks that I generate for them. In other words, it's fine if you use my service as an "RSS Aggregator for eReaders" if that's your intended use case, but it's not fine if you post a Mobi file of Ars Technica's content to The Pirate Bay or something like that.

Let me know if you have any questions :-)


> right now I just prohibit using this for anything illegal

Forbidding such use doesn't limit it though.

If you're going to profit from this in some way, make sure you don't accidentally set yourself up as a prime target for a lawsuit based on contributory infringement.

This means that before you get into the money stream you have to make 100% sure that the content you profit from was acquired legally.


I'll definitely be getting some legal advice before setting up any kind of paid service, but do you have any specific advice or guidelines to follow?

I'm assuming that right now, I can just say, "this is an RSS aggregator like Google Reader, and it's up to users to comply with copyright laws," but it would definitely be tricky once I begin charging.

In general, would some kind of automated verification like Google Apps (DNS records or sending an email to the owner of the domain name) suffice?

Anyway, thanks for the help!

EDIT: The nature of my app is very similar to Stitcher Radio (http://stitcher.com/). They seem to have a really detailed signup page for podcasters (http://stitcher.com/contentProviders.php). That seems to be the best way to protect against infringement.


I've tried using it, but with limited success. First of all, I had a problem logging in, probably because I block third-party cookies in Firefox. Then I switched to Safari and got a bit further. I did get logged out when I added a blog URL, though, for some reason. Finally, the generated file contains only a short intro of each post, and no images. According to my WordPress settings, the feed is supposed to contain the full posts.

https://www.ebookglue.com/share/ludvig-och-leija-eiman


Thanks for the feedback! I checked your feed, and it looks like your feed only contains the post summaries. Could you perhaps check your settings again? You can also try directly entering the URL for your feed if you're using a service like Feedburner.

Let me know how it goes, and I'm working on getting an in-house authentication system, but Mozilla Persona was the quickest way to get up and running. I've noticed that it's not as stable as I would have liked it to be.


Hi.

I have the same problem as the OP: my Atom feed has both <summary> and <content>, but I only get the summary with ebookglue.

I tried another RSS reader, and it gets the full content fine. Here's my feed for your reference:

https://www.ebookglue.com/share/bertzzie-com-atom-feed http://bertzzie.com/atom

Hope this helps.


Thanks for the help! I'm looking into it, as I think there may be some issues with the extraction of different content types in an atom feed. I think I have a way to fix it, though.
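For readers hitting the same thing, here is a rough Python sketch of one way to prefer <content> over <summary> when an Atom entry carries both. It is not ebookglue's actual code; the function name `entry_bodies` and the sample feed are made up for illustration, and it assumes the full post is in type="html" or type="text" content (for type="xhtml" it only recovers the text without markup).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace

# Made-up sample feed: one entry with full content, one with a summary only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Full post</title>
    <summary>Short intro</summary>
    <content type="html">&lt;p&gt;Whole article&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Summary only</title>
    <summary>Just a teaser</summary>
  </entry>
</feed>"""


def element_text(element) -> str:
    """Concatenate all text inside an element; empty string if it is missing."""
    if element is None:
        return ""
    return "".join(element.itertext()).strip()


def entry_bodies(atom_xml: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs, using <content> first and <summary> as fallback."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        body = element_text(entry.find(ATOM + "content"))
        if not body:
            # No usable <content>: fall back to the summary.
            body = element_text(entry.find(ATOM + "summary"))
        results.append((title, body))
    return results


if __name__ == "__main__":
    for title, body in entry_bodies(SAMPLE):
        print(title, "->", body)
```

A converter that reads <summary> first, or that ignores <content> when its type attribute is unexpected, would produce exactly the "intro only" ebooks reported in this thread.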


I got an internal server error when I tried to generate it for my blog: https://www.ebookglue.com/share/rajeeshcv-com


Same with my blog (http://jwillmer.de/blog/). I'm using Feedburner for RSS; maybe there are some parsing errors with that service?


Thanks for trying it out! I'll take a look into it.

EDIT: It appears that Feedburner sometimes improperly links embedded images. I've fixed the issue on my end, though images may not appear consistently.


Checked it again now, without changing any settings, and now it seems to include the full posts.


I think this is an awesome idea and I'm willing to test it.

How well does it handle image files?

Are there any icons or logos that I can use, and do I get a link or something to make it easy for my site's readers to grab the files?

EDIT: I can't log in for some reason and I guess I can't get a password reset?


It uses Mozilla Persona -- were you able to sign in using your Persona/BrowserID account? (https://login.persona.org/) If you don't already have a Persona account, you should be able to create one using the login dialog itself. If you're still having trouble, let me know.

It fetches all images and inserts them accordingly, so images should be handled appropriately. I haven't figured out a good way to handle video embeds, but I'm looking for a good way to fetch standardized thumbnails in place of videos.


Ah, just close the window, wait 10 minutes, and come back.

I tried to add my "blog" to it:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Apache/2.2.22 (Amazon) Server at www.ebookglue.com Port 80

I'll look into it in another few days and try it again. There are some bugs here, but it seems like it's worth checking out.


Thanks! I'll take a look. Try adding http:// to the url. I'll fix that in a little bit and make sure it fails more gracefully :-)
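One way to fail more gracefully on a URL entered without a scheme is to normalize it before fetching. This is only a sketch under assumptions: the helper name `normalize_feed_url` is hypothetical, and defaulting to http:// is a guess at the intended behavior, not a description of how the site works.

```python
from urllib.parse import urlsplit


def normalize_feed_url(raw: str) -> str:
    """Add a default http:// scheme if missing and reject anything that is not http(s)."""
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty URL")
    if "://" not in candidate:
        # Assumption: a bare "example.com/feed" means an http URL.
        candidate = "http://" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable http(s) URL: {raw!r}")
    return candidate


if __name__ == "__main__":
    print(normalize_feed_url("example.com/feed"))
```

Raising ValueError lets the web layer show a friendly validation message instead of surfacing an internal server error.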


Oops! The URL you entered doesn't look like a valid RSS or Atom feed.

Seriously, I'll look at it in a few days. Do I have to create an RSS feed for the blog? I haven't done that yet, but I am planning to do so soon, so after I have that, I'll attempt it again.


Ah yes. I probably should change the marketing language to reflect that an RSS or Atom feed is required. Sorry about the confusion :-)

EDIT: This is a "coming soon" feature, but I've spoken with the folks at Readability and I'm going to be implementing their content parsing for sites without RSS or Atom feeds. There isn't a definite timeline on this, but just want to send some love to the Readability guys for being awesome in general.


With videos, how about putting the video URL through a shortener (maybe only if it's really long) and displaying it in plain text? You usually cannot sum up a video with an image anyway, and often enough the thumbnails are kinda useless to begin with. So I would consider a thumbnail a "nice to have" for sure, but title and link much more important. Maybe date and duration as well.
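Something like this Python sketch is what I have in mind for the plain-text placeholder. The function name `video_placeholder` and the output format are just one possible choice, and URL shortening is left out because it would depend on an external service.

```python
from typing import Optional


def video_placeholder(title: str, url: str, duration_seconds: Optional[int] = None) -> str:
    """Build a one-line text stand-in for an embedded video (hypothetical format)."""
    line = f"[Video] {title}: {url}"
    if duration_seconds is not None:
        minutes, seconds = divmod(duration_seconds, 60)
        line += f" ({minutes}:{seconds:02d})"
    return line


if __name__ == "__main__":
    # The URL below is a placeholder, not a real video.
    print(video_placeholder("Demo", "http://example.com/v/1", 125))
```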


That's a good idea! I'll look into implementing it.

Thanks for the feedback.


It is an interesting service. I tried to ebook-ify my (Japanese) blog but noticed that the body content is all garbled. https://www.ebookglue.com/share/Qiu-Yuan-saibouzurabopurogur...

The index is fine, so there might be a missing or incorrect encoding declaration around the body part.


Thanks for the feedback! I looked at the resulting ebooks, and it looks fine, but it may be a character encoding issue that is specific to your ePub or Mobipocket viewer or reader -- which device or software are you using? I'll look into fixing it.


The site is down. Unfortunately the "yellow" status of AWS elastic load balancers in my region has affected this as well -- I'm working to get it up and running in a different region, but I'm also waiting on a response from Amazon.


I tried logging in. Persona sent me a confirmation email; when I clicked the link, it redirected to ebookglue, which alerted 'login attempt failed'. You might want to look into that.


I'm sorry about the trouble with Persona -- many people have been reporting issues, and it's unfortunately not been as stable as I would have wanted it to be. I'm looking into whether it's my own implementation (though I followed all the recommended best practices quite closely), or an issue with Persona itself.

EDIT: It's a known issue with Persona and third party cookies: https://github.com/mozilla/browserid/issues/1352

Unfortunately, this is a situation where a lot of people on HN probably disable third party cookies (though perhaps a larger percentage of HN readers already have Persona accounts?). Anyway, I'm looking into rolling out my own authentication with email/password soon.


I'd like to try it without logging in at all. The first question everyone will have is if it works on their own site or a site they are familiar with and to see the results. Having to create accounts/login just impedes that process.


I understand -- there's a bit of a problem when trying to prevent abuse, though, and relying on the Persona identity provider makes it easier to prevent abusive behavior upfront, especially since the conversions themselves consume a lot of resources.

I'm still working out the optimal solution, but thanks for the feedback!


In what way does Persona prevent abuse? All you validate is email receipt which can be trivially fudged using mailinator.

It is far better to deal with this sort of thing at the "transaction" level - eg use a captcha or similar for the second and subsequent conversions from a particular IP address.
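As a minimal sketch of that transaction-level idea in Python: count conversions per IP address and ask for a captcha once the free allowance is used up. The class name `ConversionGate` is hypothetical, the counts live in memory only, and the captcha check itself is left out; a real deployment would need shared storage and expiry.

```python
class ConversionGate:
    """Track conversions per IP and decide when a captcha should be required."""

    def __init__(self, free_conversions: int = 1):
        self.free_conversions = free_conversions
        self._counts = {}  # ip address -> number of conversions so far

    def needs_captcha(self, ip: str) -> bool:
        """True once this IP has used up its captcha-free conversions."""
        return self._counts.get(ip, 0) >= self.free_conversions

    def record_conversion(self, ip: str) -> None:
        """Call after a conversion has actually been started for this IP."""
        self._counts[ip] = self._counts.get(ip, 0) + 1


if __name__ == "__main__":
    gate = ConversionGate()
    ip = "203.0.113.5"  # documentation-range address used as a placeholder
    print(gate.needs_captcha(ip))  # first conversion: no captcha
    gate.record_conversion(ip)
    print(gate.needs_captcha(ip))  # second and later conversions: captcha
```

This lets a first-time visitor try one conversion without an account while still putting a cost on repeated, resource-heavy requests.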


It's just a rough barrier before I set up something a little better, though I already have logging at the transaction level set up. Even using Mailinator requires a little bit of work, so I figured Persona was a good starting point -- I'll look into adding a "try it out" area on the home page that doesn't require you to sign up though.


Internal server errors.



