
Introducing the ~=~ operator - JoelJacobson
Is there any function or operator in any language that can, without bias and
without any constants or subjectivity, return a boolean TRUE/FALSE telling
whether two different noisy sets of integers of the same size are equal?

Examples of desired output:

[0] ~=~ [0] -> TRUE

[0] ~=~ [1] -> FALSE

[84765,193] ~=~ [84765,193] -> TRUE

[84765,193] ~=~ [84765,32] -> FALSE

[1047072,1047216,1047441,1047521,1047682,59102,59361,59583,59818] ~=~
[1047085,1047276,1047471,1047754,1047938,59128,59364,59732,59945] -> TRUE

[1047072,1047216,1047441,1047521,1047682,59102,59361,59583,59818] ~=~
[1047085,1047276,1017471,1047754,1047938,59128,59364,59732,59945] -> FALSE

Rules:

* The operator is not allowed to know anything about the nature of the
integers it receives as input.

* The operator is not allowed to know anything about the world; it must be
immutable and without any constants at all.

* The operator is not allowed to disagree with humans about which sets of
integers are to be considered equal. (Naturally, a stupid human will sometimes
make a mistake, so in case of disagreement, keep doubling the group of humans;
if the disagreement continues forever, the operator is useless and incorrect.)

I have "discovered" an algorithm which always returns the same boolean value
as a sufficiently intelligent human would return, but I'm sure this algorithm
must already exist since it's so obvious.

Hopefully you hackers can tell me what this operator or algorithm is called so
I don't have to spend the time to implement it in assembly language, since for
now I only have a reference implementation in a higher-level language.
======
gus_massa
This is in Racket, but it has an arbitrary subjective threshold of 1%: (Online
version:
[http://pasterack.org/pastes/94321](http://pasterack.org/pastes/94321) )

    
    
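      #lang racket
      
      ;; #t when both lists have the same length and every pair of
      ;; elements differs by at most 1% of the pair's mean magnitude.
      (define (~=~? x y)
        (and (= (length x) (length y))
             (for/and ([vx (in-list x)]
                       [vy (in-list y)])
               (define avg (/ (+ (abs vx) (abs vy)) 2))
               (define diff (abs (- vx vy)))
               ;; <= rather than < so that '(0) ~=~ '(0) holds (avg = 0)
               (<= diff (/ avg 100)))))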
      
      (~=~? '(1047072 1047216 1047441 1047521 1047682 59102 59361 59583 59818)
            '(1047085 1047276 1047471 1047754 1047938 59128 59364 59732 59945)) ; ==> #t
      
      (~=~? '(1047072 1047216 1047441 1047521 1047682 59102 59361 59583 59818)
            '(1047085 1047276 1017471 1047754 1047938 59128 59364 59732 59945)) ; ==> #f

~~~
JoelJacobson
Then it's disqualified. 1% is just an arbitrary constant, and whether it's
suitable depends on the input. The operator must not have any constants,
otherwise it won't work in the general case.
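
To make the objection concrete, here is a minimal Python sketch (not from the thread; `approx_equal` and its `rel_tol` parameter are hypothetical names) of the same relative-threshold test with the threshold exposed as a parameter. The two lists are taken from an example given elsewhere in this thread as "equalish". They fail the test at 1% and pass it at 5%, so the verdict depends on a constant that has to be picked with knowledge of the data.

```python
from fractions import Fraction


def approx_equal(xs, ys, rel_tol):
    """Element-wise test: |x - y| <= rel_tol * mean(|x|, |y|) for every pair."""
    if len(xs) != len(ys):
        return False
    for x, y in zip(xs, ys):
        avg = Fraction(abs(x) + abs(y), 2)  # exact arithmetic, no float rounding
        if abs(x - y) > rel_tol * avg:
            return False
    return True


a = [110, 105, 120]
b = [113, 107, 121]

print(approx_equal(a, b, Fraction(1, 100)))  # False: 3 > 1% of 111.5
print(approx_equal(a, b, Fraction(5, 100)))  # True
```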

~~~
nardii
Then how would you define what is "suitable" in the general case?

~~~
JoelJacobson
That depends on the input, and that's what the operator must be able to tell,
unbiased and without any pre-defined constants.

------
JoelJacobson
I've thought a bit more and made a small but important change in how the
"equalish" operator works:

The operator should take three (not two as suggested before) sets of integers
of equal sample size as input.

The first two sets are already known to be equalish TRUE, as defined by either
a human or a previous test of a smaller subset of the integer sets. The third
set is the new sample you want to compare.

Example output:

equalish('{0}','{0}','{0}') -> TRUE

equalish('{0}','{0}','{1}') -> FALSE

equalish('{110,105,120}','{113,107,121}','{110,106,120}') -> TRUE

equalish('{1047072,1047216,1047441,1047521,1047682,59102,59361,59583,59818}','{1047085,1047276,1047471,1047754,1047938,59128,59364,59732,59945}','{1047085,1047276,1017471,1047754,1047938,59128,59364,59732,59945}')
-> TRUE

~~~
SummerSnow
Which of the first or second inputs are we comparing the third to? If both,
then I don't think your operator makes sense, since that would suggest your
~=~ operator is transitive. Since all inputs are comparable, I assume, that
would mean your operator is an equivalence relation.

Due to transitivity:

A ~=~ B ~=~ C and

C ~=~ E ~=~ F implies that

A ~=~ F which would not necessarily be the case.

For instance:

Let X be a large set with many elements. Then a "reasonably intelligent human"
might suggest that:

(X `union` {1}) ~=~ (X `union` {2})

And also:

(X `union` {2}) ~=~ (X `union` {3})

...... (X `union` {999999999999}) ~=~ (X `union` {1000000000000 })

which implies

(X `union` {1}) ~=~ (X `union` {1000000000000 })

Even if this happens to hold here, you would expect it to fail for some large
enough number. So your operator cannot be transitive, but then what use is
the knowledge that input #1 is approximately equal to input #2?

But since you already "discovered" the algorithm, why not just show it, or the
pseudocode, so that we can discover its true name?
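
The chain argument above can be sketched in Python. This is an illustration only: `close` and its tolerance of 1 are hypothetical stand-ins for whatever pairwise test the operator applies, and single integers stand in for the one element that differs between the sets.

```python
def close(a, b, tol=1):
    """Hypothetical pairwise 'approximately equal' test with a fixed tolerance."""
    return abs(a - b) <= tol


# Each value is close to its neighbour in the chain 1, 2, ..., 1000.
chain = list(range(1, 1001))
neighbours_close = all(close(a, b) for a, b in zip(chain, chain[1:]))

# Yet the two ends of the chain are not close to each other,
# so the relation is not transitive.
ends_close = close(chain[0], chain[-1])

print(neighbours_close)  # True
print(ends_close)        # False
```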

------
SFjulie1
You can build this operator out of other operators:

cardinality of (A intersection B) == cardinality of A (or cardinality of B,
since A ~=~ B implies card(A) = card(B)). This is your test re-expressed
using math.

But basically the formula I gave is the shortest non-factorable form of this
operator in ordinary set theory.
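
The cardinality test described above can be written as a minimal Python sketch using built-in sets; `same_elements` is a hypothetical name chosen for illustration.

```python
def same_elements(a, b):
    """Exact test: card(A intersection B) == card(A) == card(B)."""
    set_a, set_b = set(a), set(b)
    return len(set_a) == len(set_b) and len(set_a & set_b) == len(set_a)


print(same_elements([84765, 193], [193, 84765]))  # True
print(same_elements([84765, 193], [84765, 32]))   # False
```

Note that this is exact set equality: it agrees with the first four examples in the original post but rejects the noisy pair that the post lists as TRUE.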

And there you have a problem: every truth is biased by a corpus of prejudice
called theory, and by physical nature. Because of the nature of computers, you
cannot represent every number. There will always be a number that saturates
your computer's resources. That limit is called MAXINT.

The old MAXINT was the size of a register. With big integers it is the size of
your allocatable memory (physical + virtual); on a distributed grid it is the
union of the grid's resources. And since the universe is resource-bound, there
will always be a limit to the integers you can represent. So computers will
always fall short of the resources needed to represent ALL numbers.

So the absolute operator you call for does not and cannot exist as long as
computers are resource-bound. There is a map problem here: the map cannot
completely draw the territory. The playground of physics is bounded; math
abstractions/measures are not.

And your way of interpreting the data can be wrong: what tells you the input
numbers are not modulo something? For example, in a % 2 universe, 2 == 0 [2].

Good software is high-context software. There is no such thing as immutable
truth, and monads are still the elucubration of a monk who ended up being
burnt by the church for blasphemy. That is not to say he was wrong in saying
the earth is moving (in fact, the solar system), but to say that ignoring
context can literally burn you.

PS: Noise is defined by the observer. It measures the log of the ratio of
non-relevant choices over relevant choices (entropy), and entropy varies
according to what the observer considers relevant in the data.

What is relevant to someone (random numbers in cryptography) may be irrelevant
to others.

------
rcxdude
Define 'noisy' and then what you mean by two sets being 'equal', given that
definition.

------
qzkz
I don't get it. The last two examples in your post compare the exact same
sets, yet the result of the operator is TRUE in the first example, and FALSE
in the second? Or is this a joke that's gone over my head?

~~~
Spikefu
the second sequence in the first example starts: 1047085,1047276,1047471

in the second example it starts: 1047085,1047276,1017471

The 3rd element is different.

------
barrucadu
Why is the last example false? I'm not sure what you mean by equal in this
context, I guess.

------
nabla9
Common Lisp: equalp

~~~
JoelJacobson
No. That's not the one. "Two objects are equalp if they are equal; if they are
characters and satisfy char-equal, which ignores alphabetic case and certain
other attributes of characters; if they are numbers and have the same
numerical value, even if they are of different types; or if they have
components that are all equalp."

