https://arxiv.org/abs/2402.08919
Core Idea
Conceptual similarity between images can be quantified by the complexity of the natural language descriptions required to distinguish the images.
| Description | Conceptual Similarity |
| --- | --- |
| Shorter descriptions that apply well to both images | High |
| Longer descriptions that fit only one image well and not the other | Low |
Optimization Problem
Original Optimization Problem
The optimization problem is written as:
$$ h^\ast=\arg\min_{h\in H}l(x\vert h)\quad \text{s.t.} \quad I(h)\le C $$
where:
- $h$ is a possible description for an image $x$.
- $l(x\vert h)=-\log p(x\vert h)$ is the loss function.
- $p(x\vert h)$ is the likelihood of $x$ given $h$, i.e., the probability that $x$ is explained by $h$.
- A higher likelihood means $h$ is a good description of $x$; a lower likelihood means it is a poor one.
- $I(h)=-\log p_{code}(h)$ is the complexity (or coding length) of $h$, and $C$ is an upper bound on the complexity.
- Simpler descriptions are more probable under the code, so a higher $p_{code}(h)$ indicates a simpler $h$; $I(h)$ expresses this inverse relationship.
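On a finite hypothesis space, the constrained problem can be solved by exhaustive search. A minimal sketch, assuming the likelihoods and coding prior are given as arrays (all names below are hypothetical, not from the paper):

```python
import numpy as np

def best_description(log_lik, log_p_code, C):
    """Exhaustive solver for h* = argmin_h l(x|h)  s.t.  I(h) <= C
    over a finite hypothesis space.

    log_lik[i]    = log p(x | h_i)   (how well description i fits image x)
    log_p_code[i] = log p_code(h_i)  (coding prior; high for simple h)
    Returns the index of the best feasible description, or None.
    """
    loss = -np.asarray(log_lik)           # l(x|h) = -log p(x|h)
    complexity = -np.asarray(log_p_code)  # I(h) = -log p_code(h)
    feasible = np.where(complexity <= C)[0]
    if feasible.size == 0:
        return None                       # no description is simple enough
    return int(feasible[np.argmin(loss[feasible])])

# Toy example: three candidate descriptions of one image.
log_lik = np.log([0.2, 0.7, 0.9])      # h_2 fits the image best ...
log_p_code = np.log([0.5, 0.3, 0.05])  # ... but is also the most complex
print(best_description(log_lik, log_p_code, C=2.0))  # prints 1
```

With budget $C=2.0$, the best-fitting description $h_2$ has complexity $-\log 0.05\approx3.0$ and is infeasible, so the search settles for the simpler $h_1$; raising $C$ above $3.0$ would select $h_2$.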
Stochastic Relaxation
Since $H$ is discrete, the optimization problem cannot be solved with gradient descent. Instead of directly searching for a single description $h$, we optimize over a probability distribution $q(h)$ on descriptions, and the loss $l(x\vert h)$ is replaced by its average under $q$, written $E_q[l(x\vert h)]$.
Now the optimization problem is modified as:
$$ q^\ast(h)=\arg\min_{q\in \mathcal Q}E_q[l(x\vert h)]\quad \text{s.t.} \quad KL(q(h)\Vert p_{code}(h))\le C $$
Note that the relaxed constraint $KL(q(h)\Vert p_{code}(h))\le C$ replaces the strict constraint $-\log p_{code}(h)\le C$:
- The KL divergence of $q$ from $p_{code}$ measures how far $q$ deviates from $p_{code}$.
- By making $q(h)$ similar to $p_{code}(h)$, we ensure that the probability distribution $q(h)$ does not place too much weight on complex hypotheses.
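One standard way to handle such a constraint (a sketch, not necessarily the paper's exact algorithm) is through the Lagrangian: for a multiplier $\beta>0$, minimizing $E_q[l]+\beta\,KL(q\Vert p_{code})$ over all distributions has the closed form $q^\ast(h)\propto p_{code}(h)\,e^{-l(x\vert h)/\beta}$, and sweeping $\beta$ traces out solutions for different budgets $C$:

```python
import numpy as np

def relaxed_q(loss, p_code, beta):
    """Minimizer of E_q[loss] + beta * KL(q || p_code) over distributions q.
    Closed form: q(h) proportional to p_code(h) * exp(-loss(h) / beta)."""
    w = p_code * np.exp(-loss / beta)
    return w / w.sum()

def kl(q, p):
    """KL(q || p) for discrete distributions with p > 0 wherever q > 0."""
    m = q > 0
    return float(np.sum(q[m] * np.log(q[m] / p[m])))

loss = np.array([1.6, 0.36, 0.1])    # l(x|h) for three toy descriptions
p_code = np.array([0.5, 0.3, 0.2])   # coding prior: simple h is likely

for beta in (10.0, 1.0, 0.1):
    q = relaxed_q(loss, p_code, beta)
    # Large beta: q hugs p_code (small KL, i.e. small budget C).
    # Small beta: q concentrates on low-loss descriptions (large KL).
    print(f"beta={beta}: q={q.round(3)}, KL={kl(q, p_code):.3f}")
```

Each $\beta$ corresponds to the budget $C=KL(q^\ast\Vert p_{code})$ at which the constraint is tight, so sweeping $\beta$ is equivalent to sweeping $C$.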
Conceptual Distance Function
For two images $x_1$ and $x_2$, the conceptual distance is:
$$ d=\frac{\Delta_{2\to1}+\Delta_{1\to2}}{2} $$
where:
$$ \Delta_{2\to1}:=E_{q_2}[l(x_1\vert h)]-E_{q_1}[l(x_1\vert h)] \\ \Delta_{1\to2}:=E_{q_1}[l(x_2\vert h)]-E_{q_2}[l(x_2\vert h)] $$
- $E_{q}[l(x\vert h)]$ is the average loss when the description distribution $q$ is used to describe image $x$; a small value indicates that the descriptions fit the image well.
Example:
If $q_1$ and $q_2$ are exchanged, so that each image is described with the distribution fitted to the other image, the expected loss increases; hence both $\Delta$ terms, and therefore $d$, take positive values.
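The distance can be sketched numerically, taking each $\Delta$ as the excess loss incurred by describing an image with the other image's description distribution (all arrays below are toy values, not from the paper):

```python
import numpy as np

def expected_loss(q, loss):
    """E_q[l(x|h)]: average description loss for an image under q."""
    return float(q @ loss)

def conceptual_distance(q1, q2, loss_x1, loss_x2):
    """d = (Delta_{2->1} + Delta_{1->2}) / 2, with each Delta the excess
    loss from describing an image with the other image's distribution."""
    delta_21 = expected_loss(q2, loss_x1) - expected_loss(q1, loss_x1)
    delta_12 = expected_loss(q1, loss_x2) - expected_loss(q2, loss_x2)
    return (delta_21 + delta_12) / 2

# q1 favors descriptions that fit x1; q2 favors those that fit x2.
q1 = np.array([0.8, 0.1, 0.1])
q2 = np.array([0.1, 0.1, 0.8])
loss_x1 = np.array([0.2, 1.0, 2.0])   # l(x1|h) = -log p(x1|h)
loss_x2 = np.array([2.0, 1.0, 0.2])   # l(x2|h) = -log p(x2|h)
print(conceptual_distance(q1, q2, loss_x1, loss_x2))  # positive: dissimilar
```

Note that if the two images had identical description distributions ($q_1=q_2$), both $\Delta$ terms would vanish and $d=0$.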
AUC
To summarize the conceptual distance between two images, the paper suggests computing the area under the curve (AUC) of the distance $d$ as a function of the complexity budget $C$ (since $q_1$ and $q_2$ depend on $C$, so does $d$).
The AUC provides a single number that reflects the overall difference between the two images across all levels of description complexity.
| AUC | Interpretation |
| --- | --- |
| Small | The images are conceptually similar and can be described with similar descriptions, even as the complexity increases. |
| Large | The images are conceptually distant, and more detailed descriptions are needed to distinguish between them. |
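A sketch of the AUC summary via the trapezoidal rule, assuming $d(C)$ has already been evaluated on a grid of budgets (the grid and distance values below are made up for illustration):

```python
import numpy as np

# Hypothetical distances d(C) evaluated at increasing complexity budgets C.
C_grid = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
d_vals = np.array([1.3, 1.1, 0.8, 0.4, 0.1])  # d shrinks as budgets grow

# Single-number summary: area under the d(C) curve (trapezoidal rule).
auc = float(np.sum((d_vals[1:] + d_vals[:-1]) / 2 * np.diff(C_grid)))
print(round(auc, 6))  # prints 3.75
```

Comparing this score across image pairs ranks them by overall conceptual distance: a pair whose $d(C)$ curve stays high at every budget accumulates a larger area than a pair whose curve drops quickly.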