
A theory of the learnable - mpweiher
https://blog.acolyer.org/2018/01/31/a-theory-of-the-learnable/
======
NougatRillettes
Very interesting topic indeed!

Two other researchers and I published a paper[1] using Valiant's
Probably Approximately Correct (PAC) learning to learn gene regulatory
networks, which may be of interest if you want to dig deeper!

If anyone has questions on the topic, feel free to ask, I'll keep an eye on
the thread.

[1] A. Carcano, F. Fages, and S. Soliman, “Probably Approximately Correct
Learning of Regulatory Networks from Time-Series Data,” presented at
CMSB’17 - 15th International Conference on Computational Methods for Systems
Biology, Lecture Notes in Computer Science, 2017, pp. 74–90.
[https://hal.archives-ouvertes.fr/hal-01519826v2](https://hal.archives-ouvertes.fr/hal-01519826v2)

~~~
jj12345
Thanks for the paper link! I've been digging into PAC learning so I'm taking
anything I can get.

On the paper, I was hoping you could help me understand a few things:

- It seems that the main finding is that any k-CNF form, like Thomas' Boolean
Regulatory Network, can be expressed by PAC learning bounds. In section 4 of
the abstract [1], you mentioned that "when the dimension increases... the PAC
learning algorithm can leverage available prior knowledge...". Are you
referring to the time dimension adding more clauses to the k-CNF?

- I'm having trouble reconciling the PAC term "h" with "model confidence" in
section 5.2. Is this allowed because the PAC learning "delta" (probability)
[2] parameter is dropped for the k-CNF adaptation?

- In this concrete case, is the learning portion just the mapping of the
stochastic traces to outputs (i.e. lookups)? I'm missing some understanding
of how such a mapping handles stochasticity.

You'll have to forgive me, as I'm still trying to understand the paper. It's
incredibly interesting to me, so thanks for writing it!

[1] Abstract -
[http://mlsb.cc/2017/abstracts/MLSB_2017_paper_11.pdf](http://mlsb.cc/2017/abstracts/MLSB_2017_paper_11.pdf)

[2] PAC Learning -
[https://en.wikipedia.org/wiki/Probably_approximately_correct...](https://en.wikipedia.org/wiki/Probably_approximately_correct_learning)

~~~
NougatRillettes
Hi! I'll try to answer your questions as best I can:

Unfortunately the final version of the paper is not the one that is on HAL --
that is an older one, and I must concede that, being new to this whole
publishing world, I don't know exactly how you can get access to it. I'll try
to sort it out and get back to you. In the meantime, you can refer to the HAL
version.

> In section 4 of the abstract [1], you mentioned that "when the dimension increases... the PAC learning algorithm can leverage available prior knowledge...". Are you referring to the time dimension adding more clauses to the k-CNF?

I think this sentence is actually referring to what is presented in section
5.3 of the final (and hal version) paper.

> I'm having trouble reconciling the PAC term "h" with "model confidence" in section 5.2. Is this allowed because the PAC learning "delta" (probability) [2] parameter is dropped for the k-CNF adaptation?

I think there are two things here.

First, the "delta" you are referring to is indeed taken to be equal to the
"epsilon", and both are what we call "h" in section 2.1.

Second, the idea of the discussion in section 5.2 is the following: say you
fix a number A of initial states and simulate B steps for each of these
states. You will have a given number of (de)activation samples for each
(de)activation function. Then, across all 2n (de)activation functions, take
the minimum number of samples you got, and call it Lmin. Then, using the
results from section 2.2 with S = (2n)^(k+1), you can find h so that
Lmin = 2h(S + log(h)). This h will be your "model confidence".
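As a rough sketch of that last step (in Python, with made-up numbers -- n, k, and Lmin here are purely illustrative, and I'm assuming the natural log): since 2h(S + log(h)) is increasing in h, you can solve for h by bisection:

```python
import math

def model_confidence(l_min, n, k):
    """Solve Lmin = 2*h*(S + log(h)) for h, with S = (2n)^(k+1).

    h is the single confidence parameter standing in for both
    1/epsilon and 1/delta; log is assumed to be the natural log.
    """
    S = (2 * n) ** (k + 1)
    # f(h) = 2*h*(S + log(h)) is increasing for h >= 1,
    # so grow an upper bracket, then bisect.
    lo, hi = 1.0, 1.0
    while 2 * hi * (S + math.log(hi)) < l_min:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 2 * mid * (S + math.log(mid)) < l_min:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical example: a 5-gene network (n=5), 2-CNF (de)activation
# functions (k=2), and Lmin = 10^6 samples for the worst-observed
# function, giving S = (2*5)^3 = 1000 and h a bit under 500.
h = model_confidence(1_000_000, n=5, k=2)
```

The larger Lmin is relative to S, the larger the h (i.e. the tighter the error and confidence guarantees) you can claim.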

However, if instead of a single "h" we had one "delta" and one "epsilon"
(Wikipedia notations), I guess we would not get a single value but only a
relation between the two (i.e. fixing one would fix the other).

I'm afraid I don't get your last point.

I'm new to HN so I hope this answer will be somewhat correctly formatted.

~~~
jj12345
Thanks, these are great responses! Also, I apologize for not fixing my own
formatting sooner, which makes the questions almost impossible to read. You
answered my last question about stochastic traces by explaining the
de/activation function relationships with Lmin.

Thanks again!

------
JohnStrange
Interesting blog entry, but I was a bit disappointed not to hear about
Solomonoff induction, as it is often taken to determine quite rigidly what is
in principle learnable by induction. What's the connection between the
approach mentioned in the blog post and Solomonoff induction?

~~~
yters
Isn't Solomonoff induction optimal if the target is learnable?

The connection to Kolmogorov complexity (KC) is interesting. It implies a
perfect learner must either have infinite KC or be able to increase its own
KC, neither of which an algorithm can do.

