
Did you consider tokenizing the inputs and comparing those? Based on Levenshtein distance alone you're basically saying that "++" and "!=" are twice as important as "=" or "*", which doesn't seem right to me.
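To make the point concrete, here is a minimal Python sketch comparing character-level and token-level Levenshtein distance on the same pair of snippets. The tokenizer regex is a hypothetical stand-in for a real C lexer and only knows a handful of multi-character operators.

```python
import re

# Hypothetical tokenizer for C-like code: identifiers, integers, a few
# two-character operators, then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|==|!=|<=|>=|&&|\|\||\S")

def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)

def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

print(levenshtein("x++;", "x;"))                      # 2: "++" costs two character edits
print(levenshtein(tokenize("x++;"), tokenize("x;")))  # 1: "++" is a single token
```

On tokens, deleting `++` costs the same single edit as deleting `=` or `*`, which is the weighting the question is asking about.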

Next question: How are you picking (x,y) coordinates for the graph? You've explained how you determine the connectivity, but the positioning is a bit unclear -- edges with the same score often have quite different lengths.



I did consider tokenising the inputs, and probably will. The only reason I haven't yet is that character-level distance was a no-brainer for getting something working, just to see if it produced anything useful.

I'm using neato for the layout. Graph layout is hard, and in some cases unsolved. I'm using this for rough visualisation, then I'll write code to find true clusters.


I'm using neato for the layout.

Ok, so you're using all the pairwise distances for computing the layout, even though you're only showing the tree edges on the graph?


No, I'm generating a tree.

Put every node in its own component. Find the shortest edge that joins two components, emit that edge, and merge the components. Lather, rinse, repeat.
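The procedure described is Kruskal's algorithm for a minimum spanning tree. A minimal Python sketch, assuming `distance` is any symmetric function on snippets (for instance an edit distance); the name `kruskal_tree` is hypothetical. It computes every pairwise distance up front and uses a union-find structure to track which component each node is in.

```python
from itertools import combinations

def kruskal_tree(nodes, distance):
    """Return the minimum spanning tree as a list of (a, b, distance) edges."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}  # every node starts in its own component

    def find(n):
        # Follow parent links to the component's root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # All candidate edges, shortest first; sort on distance only so the
    # nodes themselves never need to be comparable.
    edges = sorted(
        ((a, b, distance(a, b)) for a, b in combinations(nodes, 2)),
        key=lambda e: e[2],
    )
    tree = []
    for a, b, d in edges:
        ra, rb = find(a), find(b)
        if ra != rb:            # edge joins two different components
            parent[ra] = rb     # merge them
            tree.append((a, b, d))
            if len(tree) == len(nodes) - 1:
                break           # everything is now one component
    return tree
```

Computing all pairs means O(n²) calls to `distance`, which for edit distance on code snippets is likely to dominate the running time.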

Also, braces penalise you twice. Two pieces of code that are identical except that one includes a pair of braces and the other omits them are at distance 2. There is some reason to say they should be at distance 0. Normalising to fully parenthesised code, and then ignoring the close (or open) brace, would fix that.
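A minimal sketch of the "ignore the close brace" half of that fix, assuming the code has already been normalised to fully braced form so that dropping `}` loses no structure. The helper names are hypothetical, and the distance is plain character-level Levenshtein. It shows a brace pair costing one edit instead of two.

```python
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def ignore_closing_braces(code: str) -> str:
    # Assumes fully braced input: every '{' has a '}', so the '}' is redundant.
    return code.replace("}", "")

braced, bare = "{x++;}", "x++;"
print(edit_distance(braced, bare))  # 2: the pair is charged twice
print(edit_distance(ignore_closing_braces(braced), ignore_closing_braces(bare)))  # 1
```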





