

Ask HN: how would you graph news comments and/or website crosslinking? - anigbrowl

I've been wondering for a while if there's a way to analyze and graph comment spam, on blogs or particularly on news websites.<p>Not the 'great deals on shoes http...' variety, but more astro-turfing, where a small group of people who are fanatical about a certain issue leave comments on any relevant news story/discussion forum they find. Often they simply paste the same comment over and over (even on multiple stories at the same site) until they get banned, or they might post the same generic comment on lots of different news sites but change it up each day so their contributions on any individual site look varied and unique.<p>The task is quite similar to plagiarism detection or lexical analysis, but I'm not aware of good tools for doing that on a really wide scale. I'm looking for similarities both across the universe of news websites and blogs, and over time, especially in response to specific news events where an issue yields a lot of coverage and the astro-turfers go into overtime.<p>In a related task, there are a relatively small number of websites focused on an issue, and there would be a great deal of overlap between them: they would be constantly citing each other and so on, partly to ensure that when a mainstream news outlet covers 'their' issue one or other of them is sure to be quoted as an authoritative source. Of course, there would also be a great deal of outward linking to various news stories from articles/forums on these subject-specific websites.<p>This is a bit like recursively graphing blog trackbacks/pingbacks, or retweets, or citations in academic papers or legal judgments. Unfortunately most of the analytic tools I've looked at either do geomaps or just graph statistical aggregates as bars. What I'm really interested in is looking at the patterns of propagation - both to see how they evolve, and to identify major vectors leading back to sources.
This seems like something best handled by De Bruijn or directed graphs, but my knowledge of graph theory and related tools is pretty limited.<p>I'm interested in the content of the subject material rather than product development, so tools/services > APIs/frameworks > theory/research - although I'd be grateful for any and all suggestions. Thanks for your interest.
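
To make the "same comment pasted with small changes" part of the question concrete, here is a minimal sketch using word shingles and Jaccard similarity, one common approach to near-duplicate text detection. It is only an illustration under stated assumptions: the comment IDs, the sample texts, the shingle size `k=3` and the `0.5` threshold are all made up, and the pairwise comparison would not scale to "the universe of news websites" without something like MinHash/LSH on top.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short comments become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(comments, threshold=0.5, k=3):
    """comments maps an ID to comment text; returns (id1, id2, score) for similar pairs."""
    sets = {cid: shingles(text, k) for cid, text in comments.items()}
    pairs = []
    # Naive O(n^2) comparison of every pair; fine for a sketch, not for web scale.
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical sample data: two lightly edited copies and one unrelated comment.
comments = {
    "siteA/1": "The new policy is a disaster for small farmers everywhere, wake up people!",
    "siteB/7": "The new policy is a disaster for small farmers everywhere, so wake up people",
    "siteC/3": "I thought the article was fairly balanced, to be honest.",
}

pairs = near_duplicates(comments)
for a, b, score in pairs:
    print(f"{a} ~ {b}: {score:.2f}")
```

Each pair that clears the threshold can be treated as an edge between two comments (or between the two sites they appeared on), which gives a graph of who is repeating what and where.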
======
anigbrowl
PS - A more limited but equally handy thing would be suggestions for tools to
graph individual websites. For example, imagine visualizing HN or the comments
on an individual HN thread in tree (or other?) form without the text, though
allowing access to the text via clicking on individual nodes. I'm interested
in studying the structure and temporal development of the thread in
particular.
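
As a rough illustration of the thread-structure idea, here is a minimal sketch that builds a reply tree from `(comment_id, parent_id, timestamp)` records and prints it as an indented outline without the comment text. The record layout and the sample thread are assumptions made for the example; they are not HN's actual data format.

```python
from collections import defaultdict


def build_tree(comments):
    """Group comments by parent. comments: iterable of (comment_id, parent_id, timestamp).

    parent_id is None for top-level comments. Children are sorted by timestamp
    so the outline reflects the order in which replies arrived.
    """
    children = defaultdict(list)
    for cid, parent, ts in comments:
        children[parent].append((ts, cid))
    for kids in children.values():
        kids.sort()
    return children


def outline(children, root=None, depth=0):
    """Return the thread as indented lines: structure and timing only, no text."""
    lines = []
    for ts, cid in children.get(root, []):
        lines.append(f"{'  ' * depth}{cid} (t={ts})")
        lines.extend(outline(children, cid, depth + 1))
    return lines


# Hypothetical thread: IDs and timestamps (minutes since submission) are invented.
thread = [
    (1, None, 0),
    (2, 1, 5),
    (3, 1, 9),
    (4, 2, 12),
    (5, None, 20),
]

tree = build_tree(thread)
for line in outline(tree):
    print(line)
```

The same `children` mapping could be handed to a graph-drawing tool, with each node keeping its ID so the text can be looked up on click.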

