
Esh – Statistical Similarity of Binaries - mrswag
http://www.binsim.com/
======
gravypod
I was thinking of creating a system like this to find duplicate code in a
programming project.

This way, programmers would know how to correctly abstract all of their code.
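Below is a minimal sketch of the simplest source-level version of this idea. It is not Esh's method, since Esh targets binaries. The sketch normalizes each line, groups identical windows of `k` consecutive lines, and reports every window that occurs more than once. `normalize` and `find_duplicate_windows` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: drop spaces and tabs so that formatting differences
// do not hide otherwise identical lines.
std::string normalize(const std::string& line) {
    std::string out;
    for (char c : line) {
        if (c != ' ' && c != '\t') out += c;
    }
    return out;
}

// Hypothetical detector: groups every window of `k` consecutive normalized
// lines and returns the 0-based start indices of each window seen more than once.
std::vector<std::vector<std::size_t>> find_duplicate_windows(
        const std::vector<std::string>& lines, std::size_t k) {
    if (k == 0 || k > lines.size()) return {};
    std::map<std::vector<std::string>, std::vector<std::size_t>> seen;
    for (std::size_t i = 0; i + k <= lines.size(); ++i) {
        std::vector<std::string> window;
        for (std::size_t j = i; j < i + k; ++j) {
            window.push_back(normalize(lines[j]));
        }
        seen[window].push_back(i);
    }
    std::vector<std::vector<std::size_t>> duplicates;
    for (const auto& entry : seen) {
        if (entry.second.size() > 1) duplicates.push_back(entry.second);
    }
    return duplicates;
}
```

Windows may overlap. Real clone detectors usually compare tokens or syntax trees rather than raw lines, so treat this as a toy starting point.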

~~~
Waterluvian
I bet an analysis like that will find a ton of code that shouldn't be
abstracted. Maybe at some compilation level more optimization can be done. But
perfectly optimized code is likely much less maintainable. Nevertheless, it
would be very revealing and educational.

~~~
gravypod
Is there any reason why something shouldn't be abstracted?

At least in a functional or procedural realm, I don't think that this would be
a detriment.

~~~
Ace17
Structural equivalence doesn't necessarily mean there's a missed
abstraction:

struct Point { int x, y; };

struct Fraction { int num, denom; };

Would it be desirable to replace both above definitions by the following
"abstraction"?

struct StructWithTwoIntegers { int first, second; };
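The sketch below shows what the two separate definitions buy. C++ treats the two structs as unrelated types even though they have the same shape, and it rejects one where the other is expected. `to_double` is a hypothetical fraction-only function added for illustration.

```cpp
#include <cassert>
#include <type_traits>

// The two structs from the comment, with the trailing semicolons C++ requires.
struct Point { int x, y; };
struct Fraction { int num, denom; };

// Identical shape, unrelated types: neither can stand in for the other.
static_assert(!std::is_same<Point, Fraction>::value, "distinct types");
static_assert(!std::is_convertible<Point, Fraction>::value, "no implicit conversion");

// Hypothetical function that only makes sense for fractions. Calling it with a
// Point is a compile-time error, a check a shared "StructWithTwoIntegers"
// type would give up.
double to_double(Fraction f) {
    return static_cast<double>(f.num) / f.denom;
}
```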

~~~
gravypod
I'd just call it a union, since it's a union of two integers.

The names of the variables should tell you what the data means.

The data structure should tell you how it's organized in memory.

Edit: You can also abstract the data type out of the union. You can have
unions of other types as well, so why have 30+ union types? You can also have
unions of more than a single pair.

Also, rather than having the data structure represent the meaning, I feel that
variable-based meaning is the best way to go, as it will lead to clearer names.

    
    
       union_t<int> location_coordinate = {
          .x = 0,  // C++20 designated initializers
          .y = 0
       };
    

This would be accessed by:

    
    
       location_coordinate.x; // == x
       location_coordinate.y; // == y
    

Rather than

    
    
       location_t position = ...;
    

I personally like longer variable names, as they make it explicit exactly how
the data is expected to behave.

~~~
Ace17
You just moved the problem.

Suppose you have two "add" functions: one for fractions, one for points:

    
    
      Fraction add(Fraction a, Fraction b)
      {
        Fraction result;
        result.num = a.num * b.denom + b.num * a.denom;
        result.denom = a.denom * b.denom;
        return result;
      }
    
      Point add(Point a, Point b)
      {
        Point result;
        result.x = a.x + b.x;
        result.y = a.y + b.y;
        return result;
      }
    
    

How would you write this in C++?

Obviously the code below won't do:

    
    
      union_t<int> add_fractions(union_t<int> a, union_t<int> b);
      union_t<int> add_points(union_t<int> a, union_t<int> b);
    

Because you have now lost type safety (i.e. you can add fractions to points).
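The loss of type safety can be shown concretely. The sketch below assumes `union_t` is a plain struct template with members `x` and `y`, which is not defined in the thread. Fraction addition reads `x` as the numerator and `y` as the denominator. `mixed_up` is a hypothetical function showing a mistake the compiler cannot catch.

```cpp
#include <cassert>

// Hypothetical generic pair standing in for union_t from the thread.
template <typename T>
struct union_t { T x, y; };

// Component-wise addition for points.
union_t<int> add_points(union_t<int> a, union_t<int> b) {
    return {a.x + b.x, a.y + b.y};
}

// Fraction addition, reading x as the numerator and y as the denominator.
union_t<int> add_fractions(union_t<int> a, union_t<int> b) {
    return {a.x * b.y + b.x * a.y, a.y * b.y};
}

// Both arguments have the same type, so passing fractions to add_points
// compiles and silently does the wrong thing.
union_t<int> mixed_up(union_t<int> one_half, union_t<int> one_third) {
    return add_points(one_half, one_third);
}
```

With `{1, 2}` and `{1, 3}`, `add_fractions` gives 5/6. `mixed_up` gives 2/5 and produces no warning.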
To get back strong type safety, you could write:

    
    
      struct Fraction : public union_t<int> {};
      struct Point : public union_t<int> {};
    

At this point, you're mostly back to square one, except that Fraction users
must use members named "x" and "y"... (and I'm not even talking about
operator overloading).
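The following sketch shows the derived-struct version, again assuming a hypothetical `union_t` template with members `x` and `y`. Deriving does make `Fraction` and `Point` distinct again. However, fraction code now has to talk about `x` and `y` instead of `num` and `denom`, and each type still needs its own `operator+`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical generic pair standing in for union_t from the thread.
template <typename T>
struct union_t { T x, y; };

// Deriving restores distinct types...
struct Fraction : union_t<int> {};
struct Point : union_t<int> {};
static_assert(!std::is_convertible<Point, Fraction>::value, "types are distinct again");

// ...but fraction code now reads x/y instead of num/denom, and every
// type still needs its own operator+.
Fraction operator+(Fraction a, Fraction b) {
    Fraction r{};
    r.x = a.x * b.y + b.x * a.y;  // numerator
    r.y = a.y * b.y;              // denominator
    return r;
}

Point operator+(Point a, Point b) {
    Point r{};
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}
```

Writing `fraction + point` fails to compile, because neither overload accepts that mix.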

(By the way, "union" is an unfortunate choice of words, as the language
already defines it to mean something completely different. Maybe "pair" is
more appropriate?)
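On the naming point, the C++ standard library already provides a generic two-member type under that name: `std::pair` from `<utility>`. Its members are called `first` and `second`, exactly like `StructWithTwoIntegers` above. `add_componentwise` is a hypothetical helper, not part of the standard library.

```cpp
#include <cassert>
#include <utility>

// std::pair<int, int> already is the "StructWithTwoIntegers" from earlier in
// the thread: two public data members named first and second.
using IntPair = std::pair<int, int>;

// Hypothetical helper: component-wise addition works for any pair, but it
// only matches point addition; fractions still need their own add.
IntPair add_componentwise(IntPair a, IntPair b) {
    return {a.first + b.first, a.second + b.second};
}
```

`std::pair` defines comparison operators but no arithmetic ones, so the operator-overloading question from the thread remains open even with the standard type.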

