

Ask HN: How to package this algorithm? - alexdong

I have worked out a content extraction algorithm. Given any web page, it extracts the content while throwing away the ads, navigation bar, header, footer, etc.

Here is one way to package this algorithm into a useful website. Users will add URLs via the website or import them from specified del.icio.us tags. The site will extract the contents from these links as an RSS feed. Users could subscribe to this feed in their mobile news reader. We're thinking of making this a subscription-based service at $3.99/mo. The target audience is people who download mobile feed readers.

Benefits:
1. Users can read 'pure content' on their phones: no ads, just the content.
2. The content, with all related images, can be pre-downloaded into the feed reader, so whenever you have a minute or two, you can just pull out your phone and read.

Questions:
1. What do you think of this idea? Good? Bad? Dumb?
2. Is $3.99/mo priced right?

Thanks a lot,
Alex
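The feed step in this plan is ordinary RSS 2.0: one `<item>` per saved URL, with the extracted HTML placed in the item's `<description>`. A minimal sketch using only Python's standard library; `ExtractedPage` and `build_rss` are hypothetical names, and the extraction itself is assumed to have already run.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable, NamedTuple


class ExtractedPage(NamedTuple):
    """One saved link after extraction (hypothetical structure)."""
    url: str
    title: str
    content_html: str      # output of the content extractor (assumed)
    fetched_at: datetime   # timezone-aware timestamp


def build_rss(pages: Iterable[ExtractedPage], feed_title: str, feed_link: str) -> str:
    """Serialize extracted pages as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Extracted article content"
    for page in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page.title
        ET.SubElement(item, "link").text = page.url
        ET.SubElement(item, "guid").text = page.url
        # RFC 822-style date, e.g. "Mon, 08 Dec 2008 00:00:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(page.fetched_at)
        # ElementTree escapes the HTML; feed readers unescape it for display.
        ET.SubElement(item, "description").text = page.content_html
    return ET.tostring(rss, encoding="unicode")
```

Images would still need to be fetched by the reader (or referenced from the hosting site), which is where the pre-download benefit depends on the client.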
======
fizx
"Wrapper induction" (Do a Google Scholar search for it) has been around for a
while (Kushmerick, 1997). It's used by search engines, etc., quite
successfully, but I don't know of anyone who is able to make a living selling
it standalone.

Independent of that, I don't think it's enough of a value-add for me to pay
for. Plus, you're running into copyright concerns left and right.

~~~
alexdong
Well, our first customer, the one who licenses our core algorithm, saw their
processing efficiency boosted by 300%+ just by using our engine as a
pre-processor.

I'm aware of the existing "Wrapper induction" papers. But those approaches
still keep lots of internal links, most of which come from navigation bars,
etc., probably to help the indexer use the recursive PageRank algorithm to
determine the weight of the current page.

I am in the process of building a quick demo page for you guys to see the
effect.

Thanks for commenting.

------
jfornear
Regarding subscription pricing, if you are going to charge and you want to
keep the price under $10/mo, charge _at least_ $5.99/mo because a) you can get
away with it, and b) $3.99/mo could make your product seem cheap. IMO $5.99 is
close to the cost of a fast food meal, which most people pay for without
hesitating.

I don't know how much thought you put into the $3.99/mo, but there is a lot of
research out there that supports pricing higher for no reason other than to
create the perception of value. Interesting topic to look into, either way.

~~~
alexdong
Interesting. Will definitely try an A/B test on the pricing to see whether
that matters.

One question: between $5.99/mo and $30/yr, which one would you recommend?
(This requires intense computing power, so I'll definitely work out the
numbers.)

------
answerly
You will need to be very careful about which sites you allow users to extract
content from, as many larger sites have ToS that prohibit this type of
activity. You could be opening yourself up to some nasty DMCA takedown
notices, depending on how you plan to display/redistribute the scraped
content.

------
m_eiman
Instapaper.com does this, more or less. Their basic service is free, but they
(he actually, unless he's expanded lately) sell an iPhone app with additional
usability features and offline reading.

It's a really useful thing, but probably a bit shady in the legal area with
ToSes.

~~~
alexdong
Instapaper.com does only the 'read later' part, without the extraction
algorithm we're offering.

Say you want to read an NYTimes article on your iPhone. Sure, you've got the
nice zoom-and-pan. But wouldn't it be nice if you could read any webpage in
your favorite news reader with only the content: no ads, no header, no footer,
no navigation bar? Only the content.

~~~
m_eiman
I do that all the time with instapaper? Have a look at their "text only"
version of the articles.

~~~
alexdong
@m_eiman: Instapaper.com does offer a similar service, but it's not good enough.

Besides some small headaches, the text version does not include images. Even
worse is the lack of support for other encodings. Try this one in your
Instapaper and you'll understand what I mean:
<http://news.sina.com.cn/w/2008-12-08/170516806326.shtml>
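Supporting other encodings mostly means honoring the charset a page declares (in the HTTP `Content-Type` header or a `<meta>` tag) instead of assuming UTF-8. A minimal standard-library sketch; `decode_page` is a hypothetical helper, and mapping a declared GB2312 to Python's `gbk` codec (a superset of GB2312) is one defensive choice, not a description of how Instapaper or Alex's engine behaves.

```python
import re
from typing import Optional

# Finds charset=... inside a <meta> tag in the raw (undecoded) bytes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)

# GBK is a superset of GB2312, so decoding declared-GB2312 pages as GBK
# tolerates characters that fall outside the older standard.
_ALIASES = {"gb2312": "gbk"}


def decode_page(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode HTML bytes using the declared charset, falling back to UTF-8."""
    charset = header_charset
    if charset is None:
        # The <meta> declaration normally appears near the top of the page.
        match = _META_CHARSET.search(raw[:4096])
        if match:
            charset = match.group(1).decode("ascii")
    charset = (charset or "utf-8").lower()
    charset = _ALIASES.get(charset, charset)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 instead of failing.
        return raw.decode("utf-8", errors="replace")
```

The header charset, when present, takes priority over the `<meta>` tag here; real-world pages sometimes disagree between the two, so that precedence is itself a judgment call.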

That being said, you do raise a good point: how much should we charge for the
consumer version? Maybe it should be free, while the API costs some
money.

~~~
m_eiman
Speaking for myself and not for the general population, I'd say that you'll
probably need to go freemium. A free basic service with for-pay extra
features.

Maybe you could do what Instapaper is doing and charge for mobile software and
not the service? Maybe let basic users store five or so pages, and let paying
users keep as many as they want? I tend to put quite a bunch of pages in my
instapaper queue before reading them all at once.
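The tiered-storage idea could be as simple as a reading queue that caps free accounts at a handful of saved pages and leaves paying accounts uncapped. A hypothetical sketch: `ReadingQueue`, `QueueFullError`, and the limit of five (taken from the "five or so pages" suggestion) are illustrative, not part of any existing service.

```python
from typing import List

FREE_PAGE_LIMIT = 5  # hypothetical cap, from "five or so pages"


class QueueFullError(Exception):
    """Raised when a free account tries to save more pages than its cap."""


class ReadingQueue:
    """Hypothetical read-later queue with a page cap for free accounts."""

    def __init__(self, paid: bool = False) -> None:
        self.paid = paid
        self.pages: List[str] = []

    def add(self, url: str) -> None:
        # Paying users have no cap; free users stop at FREE_PAGE_LIMIT.
        if not self.paid and len(self.pages) >= FREE_PAGE_LIMIT:
            raise QueueFullError(
                f"free accounts can save at most {FREE_PAGE_LIMIT} pages"
            )
        self.pages.append(url)

    def pop_next(self) -> str:
        """Remove and return the oldest saved page (raises IndexError if empty)."""
        return self.pages.pop(0)
```

Reading a page frees a slot, so a free user who reads as they go never hits the cap, while someone who batches up a big queue (as described above) is the natural upgrade candidate.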

I guess you should spend a day or two to brainstorm a long list of features
and let intended users rank them and tag them as "nice" vs "must have". Put
the must haves in the free version, or most of them, and the nice ones in the
paid category?

I have no idea what the usual way of dividing features between versions is,
but putting the must haves in the free version should help build user base.
And when you've started using it, those nice features might start looking
tastier. On the other hand, if all your needs are taken care of in the free
version you're not as inclined to spend. Better ask someone who knows this
stuff instead :)

$4 per month is more than I'd be willing to pay, and I consider myself fairly
payment friendly - but I'm more in favour of one time payments than
subscriptions.

~~~
alexdong
m_eiman, yes, yes, this is a very good idea. We're in the middle of putting up
a site so that people like you can play with it.

Stay tuned.

------
RobGR
I don't know if this would make money as a web service, but I think you are
right to try it -- it could work.

Personally, as a developer, I would be more interested in licensing the
algorithm or code from you. In my case, the potential use would be in a
closed-source desktop application. Essentially it would be a filter
for a local file search engine, which is part of a larger product. Would you
be interested in this? It would not be an exclusive arrangement, so you could
continue to pursue other revenue methods.

Also, I suggest that you test the algorithm to see whether the amount of
content stripped and tossed is a good identifier of spam emails; if the
algorithm is fast enough, it may be useful for that.
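The spam idea boils down to one number per message: the fraction of text the extractor throws away. A hypothetical sketch, where `extract` stands in for Alex's (undescribed) engine and the 0.8 threshold is an arbitrary placeholder; whether this signal separates spam at all is exactly what the suggested test would have to establish on labeled mail.

```python
from typing import Callable


def stripped_ratio(original: str, extracted: str) -> float:
    """Fraction of characters removed by the content extractor (0.0 to 1.0)."""
    if not original:
        return 0.0
    kept = min(len(extracted), len(original))
    return 1.0 - kept / len(original)


def looks_like_spam(
    message: str,
    extract: Callable[[str], str],
    threshold: float = 0.8,
) -> bool:
    """Flag a message when the extractor discards more than `threshold` of it.

    `extract` is a stand-in for the real extraction engine (assumed), and
    0.8 is an untuned placeholder, not a measured value.
    """
    return stripped_ratio(message, extract(message)) > threshold
```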

------
lacker
If you copy other people's content and host it from your servers, that's
copyright infringement. It doesn't matter if you call it an RSS feed or if you
just copy it exactly.

What you could do is offer people software that downloads web pages but strips
the ads. This could be a neat feature of an RSS reader, or perhaps a plugin to
another RSS reader, but I doubt you will be able to sell this on its own.
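To make "software that downloads web pages but strips the ads" concrete, here is one common and simple heuristic: split the page into text blocks, drop obvious chrome (`nav`, `header`, `footer`, scripts), and keep only blocks that are long and not mostly link text. This is *not* Alex's algorithm, which the thread never describes; it is a minimal link-density sketch with a hypothetical `extract_main_text` function and arbitrary thresholds.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags whose text is never treated as article content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Tags that start or end a text block.
BLOCK_TAGS = {"p", "div", "li", "td", "section", "article", "h1", "h2", "h3"}


class _BlockCollector(HTMLParser):
    """Splits a page into text blocks, tracking how much of each is link text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, int]] = []  # (text, chars inside <a>)
        self._text: List[str] = []
        self._link_chars = 0
        self._skip_depth = 0
        self._link_depth = 0

    def _flush(self) -> None:
        # Collapse whitespace and store the finished block, if non-empty.
        text = " ".join(" ".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside nav/header/footer/script: discard
        self._text.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())

    def close(self) -> None:
        super().close()
        self._flush()  # emit any trailing block


def extract_main_text(html: str, min_chars: int = 40,
                      max_link_density: float = 0.3) -> str:
    """Keep long text blocks whose share of link text is below the threshold."""
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    kept = [
        text for text, link_chars in parser.blocks
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

Ad blocks and link lists tend to be short or link-heavy, so they fall below one of the two thresholds; real pages defeat this often enough that production extractors usually combine several signals.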

~~~
alexdong
Hmm, this makes sense.

Since the algorithm is the core, I would also consider wrapping it as a web
service and charging for API usage, or licensing the technology.

The problem is, I don't have much experience doing that. How can I find
the target audience? Also, since the algorithm is quite CPU intensive, how
should I price the service? Add a markup to the average cost of the computing
resources?
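The cost-plus approach in that last question (average compute cost per request, plus a markup) is a one-line formula. A sketch with made-up inputs: the CPU-seconds per page, the hourly server price, and the markup below are placeholders to be replaced with measured numbers, not figures from the thread.

```python
from decimal import ROUND_UP, Decimal


def price_per_1000_calls(cpu_seconds_per_call: Decimal,
                         server_cost_per_hour: Decimal,
                         markup: Decimal) -> Decimal:
    """Cost-plus price for 1,000 API calls, rounded up to the next cent.

    cost per call = CPU seconds used * (server cost per hour / 3600)
    price         = cost per call * 1000 * (1 + markup)
    A markup of 2 means 200%, i.e. the price is three times the cost.
    """
    cost_per_call = cpu_seconds_per_call * server_cost_per_hour / 3600
    price = cost_per_call * 1000 * (1 + markup)
    return price.quantize(Decimal("0.01"), rounding=ROUND_UP)


# Placeholder inputs: 0.5 CPU-seconds per page, $0.36/hour, 200% markup.
print(price_per_1000_calls(Decimal("0.5"), Decimal("0.36"), Decimal("2")))  # 0.15
```

Pure cost-plus sets a floor rather than a price; what API customers will pay depends on the value to them (e.g. the pre-processing savings mentioned earlier in the thread), which may be far above that floor.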

I'm trying to put up a demo page so that you guys can try it out. Stay tuned.

------
andrewljohnson
Sounds like you're going to be violating some Terms of Service that users of
software like AdBlock skirt because they're just private citizens.

Verdict: if your software is good, you'll get your pants sued off. But, I
assume since you didn't post a link to a demo, you're a business guy with an
idea and not much code. So I doubt you'll have many users, much less legal
problems!

~~~
alexdong
Well, this is an algorithm I wrote for a vertical search engine. I retain the
rights to reuse it in other products I'll build.

I'm definitely not a lawyer. As for the ToS violation concerns you raised,
since I'm not redistributing the content, could I still be sued? The user just
saves some excerpts for later reading. No?

Alex

PS. 'I assume since you didn't post a link to a demo, you're a business guy
with an idea and not much code.' Would you call Joel Spolsky a business guy?
:-)

~~~
lacker
From your description it sounds like you _are_ redistributing the content:

    The site will extract the contents from these links as an RSS feed. Users could subscribe to this feed in their mobile news reader.

Having your website turn another person's website into an rss feed _is_
redistributing the content.

~~~
thomaspaine
Isn't this exactly what dapper.net does? I've used it myself to create feeds
for websites that are stuck in 1996 and haven't moved on to RSS yet.

~~~
lacker
I'm not very familiar with dapper.net, but it seems to be saying that content
owners can use dapper to distribute their content, rather than allowing random
people to strip ads out of anything. For example, when I search for [colbert]
on dapper.net, it shows little icons saying that permission of the copyright
owner is needed.

------
markessien
The hardest sell to make is a monthly subscription service. It's easier to
sell yearly or one-time.

