Given:

- A random variable \(z\) with PDF \(\pi(z)\).
- A transformation \(x=f(z)\).

We have the inverse transformation \(z=f^{-1}(x)\). The goal is to find the PDF of the new random variable \(x\), denoted \(p(x)\).

The CDF of \(z\), \(F_Z(z)\), is \(\mathbb{P}(Z \leq z)\).

The CDF of \(x\), \(F_X(x)\), is \(\mathbb{P}(X \leq x)\).

Using the transformation \(x=f(z)\), we can write the CDF of \(x\) in terms of \(z\):

\[F_X(x) = \mathbb{P}(X \leq x) = \mathbb{P}(f(Z) \leq x) = \mathbb{P}(Z \leq f^{-1}(x)) = F_Z(f^{-1}(x)) ,\]

where the third equality assumes \(f\) is monotonically increasing (the general case is recovered below by taking an absolute value). Differentiate \(F_X(x)\) with respect to \(x\) to obtain the PDF \(p(x)\):

\[p(x) = \frac{dF_X(x)}{dx} = \frac{dF_Z(f^{-1}(x))}{dx} .\]

Using the chain rule for differentiation:

\[\frac{dF_Z(f^{-1}(x))}{dx} = \frac{dF_Z(f^{-1}(x))}{d(f^{-1}(x))} \cdot \frac{d(f^{-1}(x))}{dx} .\]

The derivative of the CDF \(F_Z(z)\) with respect to \(z\) is the PDF \(\pi(z)\):

\[\frac{dF_Z(z)}{dz} = \pi(z) .\]

Thus:

\[p(x) = \pi(f^{-1}(x)) \cdot \frac{d(f^{-1}(x))}{dx} .\]

Alternatively, using \(z = f^{-1}(x)\) and the inverse function theorem:

\[\frac{d(f^{-1}(x))}{dx} = \frac{1}{\left.\frac{df(z)}{dz}\right|_{z=f^{-1}(x)}} .\]

Therefore, taking the absolute value of the derivative so that the result also covers monotonically decreasing \(f\):

\[p(x) = \pi(f^{-1}(x)) \left| \frac{d}{dx} f^{-1}(x) \right| .\]

**Multivariate Case**

Given:

- A random vector \(\mathbf{z}\) with joint PDF \(\pi(\mathbf{z})\).
- A transformation \(\mathbf{x} = f(\mathbf{z})\), with inverse \(\mathbf{z} = f^{-1}(\mathbf{x})\). We want the joint PDF \(p(\mathbf{x})\) of the new random vector \(\mathbf{x}\).

**1. Jacobian Determinant:**

The transformation \(\mathbf{x} = f(\mathbf{z})\) involves multiple variables, so we need to consider the Jacobian matrix of partial derivatives:

\[J = \frac{\partial(x_1,x_2,\ldots,x_n)}{\partial(z_1,z_2,\ldots,z_n)} = \begin{pmatrix} \frac{\partial x_1}{\partial z_1} & \frac{\partial x_1}{\partial z_2} & \cdots & \frac{\partial x_1}{\partial z_n} \\ \frac{\partial x_2}{\partial z_1} & \frac{\partial x_2}{\partial z_2} & \cdots & \frac{\partial x_2}{\partial z_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial z_1} & \frac{\partial x_n}{\partial z_2} & \cdots & \frac{\partial x_n}{\partial z_n} \\ \end{pmatrix} .\]

**2. Change of Variables Formula:**

The joint PDF \(p(\mathbf{x})\) is given by:

\[p(\mathbf{x}) = \pi(\mathbf{z}) \cdot |\det(J)|^{-1} \text{ where } \mathbf{z} = f^{-1}(\mathbf{x}).\]

Here, \(|\det(J)|\) is the absolute value of the determinant of the Jacobian matrix; it enters inverted because \(J\) is the Jacobian of the forward map \(\mathbf{z} \mapsto \mathbf{x}\). Equivalently, \(p(\mathbf{x}) = \pi(f^{-1}(\mathbf{x})) \cdot \left|\det\left(\frac{\partial \mathbf{z}}{\partial \mathbf{x}}\right)\right|\).
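As a quick sanity check, the one-dimensional version of the change-of-variables formula can be verified by Monte Carlo. This is a minimal sketch assuming \(z \sim \text{Uniform}(0,1)\) and \(f(z) = z^2\) (both chosen purely for illustration), for which \(f^{-1}(x) = \sqrt{x}\) and \(p(x) = \frac{1}{2\sqrt{x}}\):

```python
import math
import random

random.seed(0)

def p_x(x):
    # p(x) = pi(f^{-1}(x)) * |d f^{-1}(x)/dx| with pi the Uniform(0,1) density:
    # p(x) = 1 * 1 / (2 * sqrt(x))
    return 1.0 / (2.0 * math.sqrt(x))

# Compare the empirical CDF of x = z^2 with the CDF implied by p(x).
n, t = 200_000, 0.25
samples = [random.random() ** 2 for _ in range(n)]
empirical = sum(x <= t for x in samples) / n
analytic = math.sqrt(t)  # closed form of the integral of p(x) from 0 to t
print(empirical, analytic)  # the two values should agree closely
```

With two hundred thousand samples the empirical and analytic probabilities typically agree to two or three decimal places.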

**Special Case: Linear Transformation**

If \(z \sim \mathcal{N}(\mu,\sigma^2)\) and the transformation is linear, such as \(x = az + b\), the resulting distribution is also normal, since the normal family is closed under affine transformations.

For the linear transformation \(x = az + b\):

\[x \sim \mathcal{N}(a\mu + b,(a\sigma)^2).\]

**Contrastive Learning**

Imagine you’re a detective trying to solve a puzzle. You’re given two sets of clues: one that helps you identify the culprit through evidence such as fingerprints and eyewitness accounts, and another that throws you off track with misleading information such as planted alibis and false leads. Contrastive learning operates on a similar principle, but instead of suspects and clues, it deals with data points and similarities.

In essence, contrastive learning involves presenting a model with pairs of data points and teaching it to differentiate between similar pairs (positive samples) and dissimilar pairs (negative samples). By optimizing the model to maximize the similarity between positive pairs and minimize the similarity between negative pairs, the model learns to capture meaningful representations of the data.

In contrastive learning, generating positive pairs is crucial for training the model effectively. Positive pairs consist of data points that are similar or belong to the same class. Creating positive pairs involves various strategies depending on the dataset and the task at hand.

One common method is to use data augmentation to create variations of the same data point. In computer vision, for example, positive pairs can be generated by applying random transformations such as rotation, cropping, flipping, or color jittering to an image; the augmented versions still depict the same object and therefore serve as positive examples. Another approach is to leverage domain-specific knowledge to identify similarities between data points. In natural language processing, for instance, positive pairs can be created from synonyms or paraphrases of the same sentence or phrase. Deep learning models such as NeRF can also be used for augmentation (rendering different views of an object), although this adds substantial computational cost.
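As a toy illustration of augmentation-based positive pairs, the following sketch applies two independent random crops and flips to the same "image" (a nested list here, purely for illustration; a real pipeline would use an image library):

```python
import random

def random_crop(img, size):
    # img is a 2-D list (H x W); return a random size x size crop.
    h, w = len(img), len(img[0])
    top, left = random.randrange(h - size + 1), random.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def hflip(img):
    # Mirror the image horizontally.
    return [row[::-1] for row in img]

def two_views(img, size=3):
    # A positive pair: two independent random augmentations of the same image.
    views = []
    for _ in range(2):
        v = random_crop(img, size)
        if random.random() < 0.5:
            v = hflip(v)
        views.append(v)
    return views

random.seed(0)
img = [[10 * r + c for c in range(5)] for r in range(5)]  # toy 5x5 "image"
view1, view2 = two_views(img)
```

Both views are derived from the same underlying image, so a contrastive model is trained to map them to nearby representations.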

Another crucial aspect of contrastive learning is negative sampling, which involves selecting dissimilar pairs of data points to serve as negative examples during training. Negative sampling plays a pivotal role in guiding the model to focus on relevant features and discard irrelevant ones.

The most basic approach to negative sampling is to randomly select data points from the dataset that are simply not the anchor point. This is particularly effective in datasets with diverse classes, where the likelihood of sampling an instance from the same class as the anchor is low. Another method is hard negative mining (or generation), in which challenging examples that lie close to the positive samples are identified, or generated according to given rules, and used as negative examples. By focusing on challenging examples, the model is forced to learn more robust representations and improves its performance on difficult tasks.
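Both strategies can be sketched in a few lines. This is an illustrative sketch only (the helper names and the similarity-matrix layout are assumptions, not a reference implementation):

```python
import random

def sample_negatives(labels, anchor_idx, k):
    # Random negatives: indices that are not the anchor and, when class
    # labels are available, not in the anchor's class.
    candidates = [i for i, y in enumerate(labels)
                  if i != anchor_idx and y != labels[anchor_idx]]
    return random.sample(candidates, k)

def hard_negatives(sims, anchor_idx, candidates, k):
    # Hard negative mining: keep the k candidates most similar to the
    # anchor (the most confusable examples), given a similarity matrix.
    ranked = sorted(candidates, key=lambda i: sims[anchor_idx][i], reverse=True)
    return ranked[:k]

random.seed(0)
labels = ["cat", "cat", "dog", "car", "dog"]
negs = sample_negatives(labels, anchor_idx=0, k=2)
```

In fully unsupervised settings no labels are available, and the usual fallback is to treat every other item in the batch as a negative.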

The contrastive loss is written as:

\[\mathcal{L}_{N} = \mathbb{E}_{\mathbf{X}} \left[ -\log \frac{e^{f_{\text{sim}}(\mathbf{x}_{i}, \mathbf{x}_{j})/\tau}}{\sum_{k=1}^{N} \mathbb{I}_{[k \neq i]} \, e^{f_{\text{sim}}(\mathbf{x}_{i}, \mathbf{x}_{k})/\tau}} \right] ,\]

where:

- \(\mathcal{L}_{N}\) is the overall contrastive loss.
- \(N\) is the number of samples.
- \(\mathbb{E}_{\mathbf{X}}\) is the expectation operator over the batch of data \(\mathbf{X}\).
- \(\mathbf{x}_{i}\) is an anchor sample.
- \(\mathbf{x}_{j}\) is a positive sample (similar to the anchor).
- \(\mathbf{x}_{k}\) is a negative sample (dissimilar to the anchor).
- \(\tau\) is the temperature parameter to scale the logits.
- \(f_{\text{sim}}(\cdot, \cdot)\) is a similarity metric (e.g., cosine similarity or negative Euclidean distance).
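The loss above can be sketched in plain Python for a single anchor, using cosine similarity and including the positive pair in the denominator, as in the SimCLR formulation (a minimal sketch, not an optimized batch implementation):

```python
import math

def cosine_sim(a, b):
    # f_sim: cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # -log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_k exp(sim(a,n_k)/tau)) )
    pos = math.exp(cosine_sim(anchor, positive) / tau)
    neg = sum(math.exp(cosine_sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# The loss is small when the positive is close to the anchor and the
# negatives are far away, and large in the opposite situation.
easy = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
hard = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Lowering the temperature \(\tau\) sharpens the softmax, making the loss more sensitive to the hardest negatives.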

Here’s a curated list of papers on contrastive learning:

**SimCLR: A Simple Framework for Contrastive Learning of Visual Representations**

Authors: Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

Venue: ICML 2020

**MoCo: Momentum Contrast for Unsupervised Visual Representation Learning**

Authors: Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick

Venue: CVPR 2020

**SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments**

Authors: Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

Venue: NeurIPS 2020

**MoCo v2: Improved Baselines with Momentum Contrastive Learning**

Authors: Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Venue: arXiv preprint

**BYOL: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning**

Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

Venue: NeurIPS 2020

**Barlow Twins: Self-Supervised Learning via Redundancy Reduction**

Authors: Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny

Venue: ICML 2021

**CLIP: Contrastive Language-Image Pre-training**

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Venue: ICML 2021

**SimCSE: Simple Contrastive Learning of Sentence Embeddings**

Authors: Tianyu Gao, Xingcheng Yao, Danqi Chen

Venue: EMNLP 2021

**Notation and LaTeX Tips**

**Matrix \(\mathbf{X}\)**: Matrices are represented by bold uppercase letters, such as `$\mathbf{X}$`, in LaTeX notation. Matrices are fundamental mathematical objects used to represent linear transformations and solve systems of linear equations.

**Vector \(\mathbf{x}\)**: Vectors are represented by bold lowercase letters, such as `$\mathbf{x}$`, in LaTeX notation. Vectors are commonly used to represent quantities with magnitude and direction, appearing in fields like physics, engineering, and machine learning.

**Scalar \(x\)**: Scalars are represented by regular (non-bold) letters, such as `$x$`, in LaTeX notation. Scalars are single numerical values representing quantities like mass, distance, or temperature. Unlike vectors and matrices, scalars have magnitude but no direction.

**Display-math expressions**: LaTeX doesn’t officially support `$$` (it is a plain-TeX construct). Use the `amsmath` package together with `\[` … `\]` instead. Check for more details in: post 1 and post 2.
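For example, a minimal sketch:

```latex
% In the preamble:
\usepackage{amsmath}

% In the body, prefer \[ ... \] over $$ ... $$ for display math:
\[
  p(x) = \pi\bigl(f^{-1}(x)\bigr) \left| \frac{d}{dx} f^{-1}(x) \right|
\]
```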

**Commas (`,`) for Horizontal Concatenation**: In mathematical notation, commas separate vectors that are concatenated horizontally, aligning them side by side to form a row vector or a matrix.

**Semicolons (`;`) for Vertical Concatenation**: Similarly, semicolons separate vectors that are concatenated vertically, stacking them on top of each other to form a column vector or a matrix.
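For example, with column vectors \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^{2}\) (a small sketch of the two notations):

```latex
% Horizontal concatenation: columns side by side give a 2 x 2 matrix
$[\mathbf{a}, \mathbf{b}] \in \mathbb{R}^{2 \times 2}$

% Vertical concatenation: stacking gives a 4 x 1 vector
$[\mathbf{a}; \mathbf{b}] \in \mathbb{R}^{4}$
```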

**Comma spacing \(1{,}000\)**: In LaTeX math mode, a comma inside a numerical value can be typeset correctly by enclosing it in curly braces, as in `$1{,}000$`. This suppresses the small space LaTeX normally inserts after a comma and ensures consistent, visually appealing spacing.

**Spacing `~`**: In LaTeX, the tilde `~` creates a non-breaking space, which prevents a line break at that point and keeps the elements it joins on the same line. It is commonly used between words or symbols that should not be separated, for example `~\cite{}`.

In LaTeX, the default text and math font is Computer Modern (Latin Modern in many modern distributions), while many paper templates switch the text font to Times. When including figures in a document, it’s important to match the font choice and size used in the paper.

**Function \(f(\cdot)\)**: In LaTeX, when denoting a function, the convention is to write the function name in italics, as `$f(\cdot)$`.