
Cross Entropy - pandeykartikey
https://pandeykartikey.github.io/machine/learning/basics/2018/05/22/cross-entropy.html
======
patall
What I always wondered about is that this generally assumes that classes are
assigned with an equal (human-level) error probability. While that is
certainly the case in heavily curated example datasets, many real-world
scenarios only come with a carefully labeled positive set, while the negative
set is often drawn randomly from the background. Is there anything on how this
can be taken into account (besides weighting, obviously)?
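
To be clear, by weighting I mean the usual class-weighted cross-entropy,
roughly along these lines (a minimal NumPy sketch; the weights and names are
purely illustrative, not from the post):

    import numpy as np

    def weighted_binary_cross_entropy(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
        # y_true: 0/1 labels, p_pred: predicted P(y = 1)
        # w_pos up-weights the carefully labeled positives relative to
        # "negatives" that are really just random background draws
        p = np.clip(p_pred, eps, 1 - eps)                 # avoid log(0)
        pos = w_pos * y_true * np.log(p)                  # positive-class term
        neg = w_neg * (1 - y_true) * np.log(1 - p)        # negative-class term
        return -np.mean(pos + neg)

    # e.g. positives weighted 5x because the negative set is only a background sample
    loss = weighted_binary_cross_entropy(
        np.array([1, 0, 0, 1]), np.array([0.9, 0.2, 0.4, 0.6]), w_pos=5.0)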

~~~
pandeykartikey
I too have been thinking about this problem, but I have yet to come across a
viable solution.

------
banachtarski
"Cross-entropy is defined as the difference between the following two
probability distributions"

Huh? No, this is a mathematically imprecise statement (and not correct either).
Most explanations refer to information theory, where perfect knowledge of the
desired probability distribution leads to a perfect allocation of bits in a
binary encoding. The cross-entropy is the expected number of bits when this
allocation is done using the _incorrect_ distribution; the goal is obviously
to minimize this, which is why it is suitable for use as a loss function.
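
To make that reading concrete (a hand-rolled numerical sketch, not from the
article): if the data actually follow p but the code lengths are chosen as
though they followed q, the expected message length is the cross-entropy,
which is never shorter than the entropy of p:

    import numpy as np

    p = np.array([0.7, 0.2, 0.1])   # true distribution over three symbols
    q = np.array([0.4, 0.4, 0.2])   # distribution the code was built for

    entropy       = -np.sum(p * np.log2(p))   # expected bits with the optimal code for p
    cross_entropy = -np.sum(p * np.log2(q))   # expected bits when code lengths follow q
    overhead      = cross_entropy - entropy   # this gap is KL(p || q) >= 0

    print(entropy, cross_entropy, overhead)   # cross_entropy >= entropy, always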

~~~
tziki
> The cross-entropy is the expected number of bits when this allocation is
> done using the incorrect distribution

Is there any source that derives and/or explains this in more depth? I've
been trying to develop an intuition for it, but haven't come across a good
explanation.

~~~
banachtarski
The other reply mentioning "Kullback-Leibler divergence" (aka KL divergence)
is what you need to understand, as this is the fundamental concept. Minimizing
this quantity is equivalent to minimizing the given "cross-entropy loss"
expression. More generally, to understand where this comes from, you'll want
to read about information theory.
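
Spelled out (a standard identity, not specific to the blog post): the
cross-entropy splits into the entropy of the target distribution plus the KL
divergence, and since the entropy term does not depend on the model q,
minimizing one is the same as minimizing the other:

    H(p, q) = -\sum_x p(x) \log q(x)
            = \underbrace{-\sum_x p(x) \log p(x)}_{H(p),\ \text{fixed by the labels}}
              + \underbrace{\sum_x p(x) \log \frac{p(x)}{q(x)}}_{D_{\mathrm{KL}}(p \,\|\, q)\ \ge\ 0}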

------
pixelpoet
Aside: I'm always surprised how few people notice that e.g. "cos" is rendered
differently from "\cos" in TeX; for a discipline largely characterised by
attention to detail, remarkably few programmers seem to pick up on it.
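
Concretely (a minimal TeX snippet):

    $cos(x)$    % typeset as the product of three italic variables c, o, s
    $\cos(x)$   % typeset upright as an operator name, with correct spacing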

~~~
thanatropism
I'm always surprised when people use ^T for transpose. Use ^\top, people.
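
i.e. (illustrative TeX):

    $A^T$      % an italic letter T in the superscript
    $A^\top$   % the \top symbol, an upright tack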

------
dang
Please do not put "Show HN" on blog posts. This is in the rules:
[https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html).

------
jdonaldson
Why link to the Disqus thread?

------
vedanshbhartia
Nice read! Looking forward to more blogs.

