

Ask HN: I may have been spun. Is there anything I can do about it? - spokey

So in the shadier parts of the SEO world there's this concept called "spinning" where one takes a single article and programmatically creates hundreds of small variations of it, largely by substituting words and phrases with synonymous terms. (This is meant to work around Google's duplicate content detection algorithm.) It isn't hard for a person to notice the similarities between the documents, but it is subtle enough (or computationally difficult enough) that search engines seem to be fooled.

I think that one of my competitors has taken some article content from my site, "spun" it (possibly by hand) and reposted it on one of those post-your-content-with-backlinks sites you sometimes find in search results. (Except they posted it with backlinks to their site, of course.)

If this was a direct, verbatim copy I'd contact the hosting site to notify them of the copyright violation.

But that's not what this is; it's more like a section-by-section, sentence-by-sentence paraphrasing of my content, with some other slight modifications (which is why I think it may have been done by hand, but I'm not familiar enough with spinning to know what's normal and what's not). I'm pretty sure that my content was the source document for this, since there are some unusual phrases that carried over to the new document, and the structure of many of the sections, paragraphs and sentences of the new document is identical to some on my site.

Both the site on which this content was posted and the site to which it is linking are legitimate businesses that I assume would respond to something like a DMCA take-down notice.

I'm well aware of the degree, nature and much of the mechanics of plagiarism on the web, and while I'm annoyed (possibly flattered) that a competitor seems to have more or less plagiarized content from my site, my bigger concern is that this is the tip of the iceberg and I'm about to see dozens of close copies of my content floating around the web.

Is there anything I can or should do about this?
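
For anyone unfamiliar with the idea, here is a minimal sketch of synonym-substitution spinning as described at the top of this post. It is a toy under stated assumptions, not how any particular spinning tool works: the `SYNONYMS` table, the `spin` function, and the sample sentence are all made up for illustration, and real tools presumably use far larger phrase databases.

```python
import itertools

# Hypothetical synonym table (illustrative only, not from any real tool).
# Each word maps to the list of terms it may be swapped for, itself included.
SYNONYMS = {
    "rain": ["rain", "precipitation"],
    "falls": ["falls", "occurs"],
    "mostly": ["mostly", "mainly"],
    "plain": ["plain", "lowlands"],
}

def spin(sentence):
    """Return every variant made by swapping words for listed synonyms."""
    words = sentence.split()
    # Words without an entry in the table are kept unchanged.
    choices = [SYNONYMS.get(w.lower(), [w]) for w in words]
    # The Cartesian product enumerates one variant per combination of choices.
    return [" ".join(combo) for combo in itertools.product(*choices)]

variants = spin("the rain in Spain falls mostly on the plain")
for v in variants[:3]:
    print(v)
print(len(variants), "variants from one sentence")
```

With only four substitutable words and two options each, one sentence already yields 16 variants, which is why a whole article can plausibly turn into hundreds.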
======
roel_v
A derived work is just as much protected by copyright as the original is. If
you translate a book, you cannot distribute that translation without the
original author's consent. So the circumstance that they didn't republish your
article verbatim, but in a modified way, is of little relevance. So you can do
the same things you would if it was a verbatim copy.

------
Sujan
All the people saying it is copyright violation are probably right.

But, I don't think it's worth the effort and money (perhaps you will have to
get a lawyer...). If anything, just contact the site where they posted it and
report the copyright violation. Describe what you think happened and wait to
see what (and if) they answer. Most of the time, they won't bother and will
just remove the text.

Much better use of your time is to write another, even better text for your
website.

~~~
olefoo
All the people saying it is a copyright violation are in fact wrong.

Copyright protects a specific expression of an idea or group of ideas. It does
not address the use of a copyrighted work as a template for generating other
copyrighted works that use alternate expressions.

If you create a copyrighted work that consists of the phrase "The rain in
Spain falls mostly on the plain." and someone comes along and very carefully
creates a work of parallel intent that includes the phrase "In Spain,
precipitation occurs mostly on the lowland prairies."; they are NOT violating
your copyright. What they are doing may be sleazy, dishonest and lazy, but it
isn't legally actionable.

This is actually a fairly deep topic: our legal system is constrained to
working with tangible expressions and cannot identify the similarity between
two expressions of an idea that share the same semantic structure (meaning)
and yet have completely different linguistic surface (text).

Now if you can show that their works were mechanically derived from yours,
that may be a different kettle of fish.

~~~
roel_v
Really? So how do you reconcile that with
e.g. <http://en.wikipedia.org/wiki/Derivative_work> or the 'without prejudice'
clause in art. 2.3 of the Berne Convention?

~~~
olefoo
A mere claim that something is a derivative work doesn't make it so; you'd
have to show priority, connection and derivation. All of which are trickier
than you'd think. If a company sends out a press release and two bloggers
write stories about that company that are substantively similar but not word
for word identical and one of them accused the other of copying, that would be
a difficult case to prove.

Generally speaking a derivative work implies the overt appropriation of
elements of the original; for instance if you write a novel about the rich
inner life of Ernst Stavro Blofeld, that would be derivative of Ian Fleming's
body of work. If however, you write a novel about a spy with a sex addiction
problem who works for the Dutch secret service and is blond and you very
carefully avoid duplicating any precise plot points from any of the James Bond
novels, it may be derivative in the sense that it's obvious you were emulating
Ian Fleming, but it would not legally be a derived work.

~~~
roel_v
> A mere claim that something is a derivative work doesn't make it so

Obviously. But there is no way to tell from the OP's story. The way the OP
presented the facts, the interpretation is that it's a derivative work.
Someone took his text and shuffled it around just enough to make it not word
for word identical. How the dice will roll in court is always tricky to say,
and impossible in a case where you only have a general description of the
facts, and only from one party at that.

The point is that a derivative work is protected just like the original. The
evidence is a whole other matter, one that nobody, in this particular case,
can judge from the information given.

------
rlpb
> If this was a direct, verbatim copy I'd contact the hosting site to notify
> them of the copyright violation.

From your description it is a derivative work. This is still a copyright
violation. I can't comment on what you should do, though.

------
URSpider94
I think there may be some confusion here. If the person truly and skillfully
paraphrased your writing, to the point that there are no duplicate sentences
between the two documents, it would be hard to argue that it is a derivative
work. They have effectively created an entirely new document espousing all of
the same ideas as yours, and on which you would not have any copyright claims.

~~~
spokey
> They have effectively created an entirely new document espousing all of the
> same ideas as yours,

IMO, that describes the situation well

> and on which you would not have any copyright claims.

That is also my understanding, unless I could somehow prove that they violated
my copyright in the construction of this work. For instance, if one could
prove that my copyrighted content was fed into a spinning algorithm to
generate this new document, I think you could argue that it is in fact a
derivative work.

But it's kind of an academic question anyway, as it would be very difficult to
prove that in the first place, and likely not worth the time and trouble if
you did. It may even require new legal precedent to win that sort of case, but
I'm not that familiar with the entire scope of copyright law.

Just as an intellectual exercise, suppose I fed one of the Harry Potter books
into a spinning algorithm to come up with a book about Larry Kotter, a pupil
at the Cowpimple Academy of Sorcery. If my new book is just substituting
synonymous terms in the original work, I'm pretty sure I'd lose a copyright
claim. But how different would it need to be to become legal? To become
undetectable? Suppose I was synthesizing multiple works for my spun article?
How sophisticated would my spinning algorithm need to be before it really was
creating new works?

~~~
roel_v
First, you need to stop thinking in terms of 'algorithms' and 'spinning'. It's
irrelevant and confuses the discussion. I understand how tempting it is to
frame new concepts (law) in terms of things you understand (computers) but
it's very dangerous territory. It's also (this is not to bash you personally,
just a general remark) why 95% of legal chatter amongst technical people on
the internet is meaningless to lawyers ('meaningless' as in 'gibberish', 'flux
capacitor'-style nonsense) - because the fundamental assumptions and reasoning
methods are so vastly different that it's almost impossible to reconcile the
two in the limited (in terms of expressiveness) environment of forum posts.
FWIW I've been a programmer for over 10 years and hope to finish my law degree
in a few months, so I have some inside experience on the two worlds, and in
how the two interact (or rather, seem not to be able to interact).

Furthermore the question whether something is derivative does not depend on
the form of representation, or the amount of similarity (on the textual level)
between two works. If I write a theater play based on a book, I may not use a
single sentence from the book; it's still a derivative work.

'Derivative' is casuistic. It's meaningless to argue about 'levels of
similarity', in a legal context. The question is whether the second author
based himself on the artistic values that the original author put in the work.
The originality of a work is not necessarily in the wording, it's in the
creative effort that went into the work. So if someone blends two works into
one (without permission), he's violating two authors' copyrights.

~~~
dejb
> First, you need to stop thinking in terms of 'algorithms' and 'spinning'.
> It's irrelevant and confuses the discussion.

This is HN, not a court of law. It is entirely appropriate to use technical
language here. I think if you understood what 'spinning' was you'd see that it
was relevant to the legal situation even if these exact words may never be
used in a legal action (although the very pre-existence of concrete spinning
programs would actually be relevant as evidence that a deliberate process of
copying had occurred).

> why 95% of legal chatter amongst technical people on the internet is
> meaningless to lawyers

I'd argue that a similar percentage of lawyers' technical discussions are
meaningless to the reality of technology. As we move forward I'll wager that
the lawyers will be the ones needing to do a greater percentage of the
adjusting if they are to remain relevant.

~~~
roel_v
> It is entirely appropriate to use technical language here.

Of course. I'm not telling anyone what to do. What I meant was, that to
understand the legal issues, the explanations in these terms are irrelevant
and obfuscate the thought processes that lead to the legal understanding of
the situation. I understand perfectly what spinning is. But 'algorithmic
spinning' _is_ irrelevant in this context. It's not about the level or nature
of mechanical changes. One shouldn't think about legal issues in quantitative
ways. It's orthogonal to the system.

> As we move forward I'll wager that the lawyers will be the ones needing to
> do a greater percentage of the adjusting if they are to remain relevant.

Thanks for making my point for me. It's this technocratically warped world
view that lies at the basis of the bigger part of misunderstandings about the
legal field and how it relates to technology. The technical details are seldom
relevant in most technology-related legal issues, or at least they matter in a
very different way than technologically oriented people assume.

~~~
dejb
> One shouldn't think about legal issues in quantitative ways.

What about evidence? Presumably the copyright 'deriver' isn't just going to
honestly detail how they stole the work. Surely an important part of the
evidence is going to be what process was used to derive the work? If it can be
established that an existing software package would produce an identical
derivation surely that would be an important part of the evidence. Without
understanding the processes of transformation (i.e. the spinning process) it
may not be possible to accurately gauge the likelihood of the work having been
derived versus a coincidence. This all seems to be firmly in the realm of
quantities to me.
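
As a rough sketch of what "gauging the likelihood" could look like in practice, the following compares two texts word by word, once as written and once after mapping known synonyms to a single canonical term. Everything here is an assumption for illustration: the `CANONICAL` table, the helper names, and the sample sentences are hypothetical, and a score like this is at most one input to an argument, not proof of derivation.

```python
from difflib import SequenceMatcher

# Hypothetical synonym-to-canonical-word table (illustrative only).
CANONICAL = {
    "precipitation": "rain",
    "occurs": "falls",
    "mainly": "mostly",
    "lowlands": "plain",
}

def tokens(text):
    """Lowercase the text, strip basic punctuation, and split into words."""
    return [w.strip(".,;:!?\"'").lower() for w in text.split()]

def canonical(text):
    """Replace each word with its canonical synonym where one is known."""
    return [CANONICAL.get(w, w) for w in tokens(text)]

def similarity(a, b, normalize=False):
    """Word-level similarity ratio in [0, 1], respecting word order."""
    prep = canonical if normalize else tokens
    return SequenceMatcher(None, prep(a), prep(b)).ratio()

original = "The rain in Spain falls mostly on the plain."
suspect = "The precipitation in Spain occurs mainly on the lowlands."

raw = similarity(original, suspect)
normalized = similarity(original, suspect, normalize=True)
print(f"raw: {raw:.2f}, after synonym normalization: {normalized:.2f}")
```

For this pair the normalized score is 1.0 while the raw score is much lower, which is the pattern one would expect if a text had been run through plain synonym substitution. Two independently written texts on the same topic would presumably not line up word for word after normalization.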

> It's this technocratically warped world view that lies at the basis of the
> bigger part of misunderstandings about the legal field and how it relates to
> technology.

I think the 'technocratically warped world view' is based on the notion that
the function of the legal system is ultimately to serve society - not the
other way around. Given that technological change is the major driver of
social change it will necessarily also drive changes in the legal system.
Situations where ancient laws and ignorant judges determine cases represent
failures of the legal system and the more it fails the less relevant and
powerful it must become for society to flourish. Inevitably it will be the
legal system that needs to 'warp' itself towards the technocratic one rather
than the other way around. Of course what I'm talking about is a medium/long
term trend. Anyone expecting this to happen in any individual legal case would
be making a big mistake.

------
mbyrne
Did you invent the term "spinning" or are you copying it from someone else?
Your explanation of it is very similar to some other text I have read, with
just some different paraphrasing...

Big Picture: You don't have a copyright claim. (I am basing this _definitive_
verdict on your complete failure to supply us with any facts.) If this "issue"
has taken more than 30 seconds of your time, I predict mediocrity or failure
for your business. You can't afford to waste your time on it. Seriously.

------
davidw
As an amateur onlooker, it'd be useful to see both pages for comparison,
although I can't say I know anything about 'spinning'. I'm curious what the
result looks like, though.

------
hcho
What's the damage in monetary terms? Is it worth spending your time and money
to get the spun article taken down?

~~~
spokey
I don't know exactly. What's a tiny bit of link juice worth? What's a tiny
edge in Google ranking worth?

If it is simply sending an email, then, sure I'll spend two minutes doing
that. If I need to engage a lawyer, no it's probably not worth that.

But again, my bigger concern is that if this content is now part of a spinning
database then it's not one spun article, it's hundreds.

This hasn't happened yet, so maybe this is just borrowing trouble I don't yet
have, but even as an academic legal question, suppose this sort of
sophisticated plagiarism happens to your content on a broad scale: can
anything be done about it?

~~~
hcho
I doubt a single email would work. The content farm would actually hold on to the
said article, hoping that the issue escalates to a public flame war and they
get links from the blogosphere. Bad publicity is still publicity for those
operations.

------
CatalystFactory
Without reading your article and the "spun" article it would be hard to tell
whether or not your copyright has been infringed. Could you post the links?

Perhaps, open up a channel of communication with the "legitimate businesses"
that run the site instead of sending them a DMCA notice.

For more on spinning, I found this article helpful:
<http://www.plagiarismtoday.com/2009/06/16/spinning-spamming-and-twitter/>

~~~
spokey
I'd rather not promote this content by linking to it.

Thank you for that link, it's a great overview.

------
speleding
A lot of my competitors are using spin articles too (although I don't think
they are using my content as the basis). The main problem seems to be that it
appears to work, I can see them rise in the organic search results.

I wish there were some way to report this to Google, it would improve the user
experience and save everybody a lot of work. Until then it's very tempting to
join in on it.

------
skinnymuch
Going to reiterate what some people have said here already. Chances are you
can't do anything. This is the internet after all. There's a small chance the
person will take down the content just from a polite email from you. Assuming
this won't work the only other course of action is spending $$. This route
will be a money sink for you so the only reason to do this is ego or you just
have money to burn.

------
paolomaffei
You pretty much can't do anything. Welcome to the internet. Just get over it.

By the way, you can also translate from English to another language and then
back to English. Now just fix the broken grammar and you might have what you
called a spun article.

------
sgmurphy
I found this short video from a lawyer to be very helpful in understanding the
different options you have when pursuing copyright violation prosecution:
<http://houchinlaw.com/?p=660>

------
ohashi
spin your content and use it for links first.

~~~
nailer
Please don't. Same content from multiple domain names pointing back to the
original is against Google ToS, and you might lose your PageRank unless you're
Jason Calacanis in which case it's fine.

~~~
spokey
Believe me, I won't, but note that the "same content from multiple domains"
problem is precisely what spinning sets out to solve. It's not the same
content, certainly not at a superficial level.

~~~
SpoonMeiser
s/solve/cause/

------
hotmind
A lot of spinning going on here. Your competitor is spinning your content for
his gains (not cool), and you are spinning your wheels wondering what you
should do.

There is no better way to put a competitor on the defensive than by
innovating. So innovate. Stop wasting your energy on this guy and do something
awesome that will clearly differentiate your business from his.

This competitor isn't worth worrying about if he's doing weaksauce tactics
like stealing your content.

------
eof
I thought someone dosed you with LSD and you were looking for the antidote.

------
blueberry
I don't know about copyright laws and I am not saying copyright is unimportant
but from your post, I don't think your site got hurt in any way. Aside from
giving your competitors a free lunch, why do you care? Especially if your
site is the first one when people search for the title or keywords of your
article. If you are still #1 on Google search and did not have a drop in the #
of visits, you might as well thank the competitors. Otherwise, I agree it's
unfair.

