Road to RKHS
Overview
A Reproducing Kernel Hilbert Space (RKHS) is closely related to the kernel trick in machine learning and plays a significant role in representation learning. In this post, I will build up the mathematical definition of an RKHS step by step.
Metric Space
First, we define the notion of a metric space. A metric space is defined as follows:
Let $S$ be a set, and suppose $d$ is a function defined for all pairs $(x, y)$ of elements from $S$ satisfying:
- Nonnegativity: $d(x, x) = 0$, $\forall x \in S$ and $d(x, y) > 0$ for distinct $x, y \in S$
- Symmetry: $d(x, y) = d(y, x)$, $\forall x, y \in S$
- Triangle Inequality: $d(x, z) \leq d(x, y) + d(y, z)$, $\forall x, y, z \in S$
Such a function $d$ is called a distance function or a metric on $S$. A metric space is a set $S$ together with a metric $d$ on it, often written as the pair $(S, d)$. To summarize, a metric space is a set whose elements' pairwise distances can be measured by the metric $d$. This gives $S$ a topological structure, which enables us to study notions of continuity and convergence.
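To make this concrete, here is a minimal Python sketch (NumPy is purely my choice for illustration) that spot-checks the three axioms for the Euclidean distance on a few random points; a spot check, of course, not a proof:

```python
import numpy as np

# Spot-check the three metric axioms for the Euclidean distance on R^4.
def d(x, y):
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4))   # three random points in R^4

assert d(x, x) == 0 and d(x, y) > 0                  # nonnegativity
assert np.isclose(d(x, y), d(y, x))                  # symmetry
assert d(x, z) <= d(x, y) + d(y, z)                  # triangle inequality
```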
Linear Space
Now we introduce the notion of a linear space. A set $S$ of elements $x, y, z, \dots$ is a linear space if the following conditions are satisfied (a quick numerical sanity check follows the list):
- For any two elements $x, y \in S$ there is a uniquely defined a third element $z = x + y$, called their sum, such that:
- $x + y = y + x$
- $x + (y+z) = (x+y) + z$
- there exists an element $0$ having the property that $x + 0 = x$ $\forall x \in S$, and
- for every $x \in S$ there exists an element $-x$ such that $x + (-x) = 0$
- For an arbitrary number $\alpha$ and element $x \in S$ there is defined an element $\alpha x$ such that:
- $\alpha(\beta x) = (\alpha \beta)x$
- $1\cdot x = x$
- The operations of addition and scalar multiplication are related in the following way:
- $(\alpha + \beta)x = \alpha x + \beta x$
- $\alpha(x + y) = \alpha x + \alpha y$
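As promised, a quick sanity check: $\mathcal{R}^3$ with componentwise addition and scalar multiplication is the prototypical linear space, and the axioms above can be verified numerically (again a spot check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 3))      # two random vectors in R^3
alpha, beta = 2.0, -0.5

# Distributivity of scalar multiplication over scalar and vector addition.
assert np.allclose((alpha + beta) * x, alpha * x + beta * x)
assert np.allclose(alpha * (x + y), alpha * x + alpha * y)
# Associativity of scalar multiplication and the identity scalar.
assert np.allclose(alpha * (beta * x), (alpha * beta) * x)
assert np.allclose(1.0 * x, x)
```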
Normed Space
With the definition of linear space, we now define the normed space: A linear space $S$ is said to be normed if to each element $x \in S$ there is made to correspond a nonnegative number $\lVert x \rVert$ which is called the norm of $x$ and such that:
- $\lVert x \rVert = 0$ iff $x = 0$
- $\lVert \alpha x \rVert = \lvert \alpha \rvert \lVert x \rVert$
- $\lVert x + y \rVert \leq \lVert x \rVert + \lVert y \rVert$
It’s easy to see that a normed space is also a metric space with the distance metric $d(x, y) = \lVert x - y \rVert$.
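A short sketch of this fact: the Euclidean norm satisfies the three axioms above, and $d(x, y) = \lVert x - y \rVert$ then inherits the metric axioms from it:

```python
import numpy as np

def norm(x):
    return np.linalg.norm(x)            # the Euclidean norm on R^n

def d(x, y):
    return norm(x - y)                  # the metric induced by the norm

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 3))
alpha = -3.0

assert norm(np.zeros(3)) == 0 and norm(x) > 0             # definiteness
assert np.isclose(norm(alpha * x), abs(alpha) * norm(x))  # homogeneity
assert norm(x + y) <= norm(x) + norm(y)                   # triangle inequality
assert np.isclose(d(x, y), d(y, x))                       # d is symmetric
```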
Banach Space
Equipped with the notions of linear and normed spaces, we are ready to define a Banach space. But first, we need to define what it means for a space to be complete, and before that, what a Cauchy (fundamental) sequence and a convergent sequence are.
Cauchy Sequence
We call a sequence $\{x_n\}$ of points of a metric space $S$ a Cauchy (fundamental) sequence if it satisfies the Cauchy criterion: for an arbitrary $\epsilon > 0$ there exists an $N_{\epsilon}$ such that $d(x_{n'}, x_{n''}) < \epsilon$, $\forall n', n'' \geq N_{\epsilon}$.
Convergent Sequence
We call a sequence $\{x_n\}$ of points of a metric space $S$ a convergent sequence if there exists an $x \in S$ such that for an arbitrary $\epsilon > 0$ there exists a number $N_{\epsilon}$ with $d(x_n, x) < \epsilon$, $\forall n \geq N_{\epsilon}$. We say that $x_n$ converges to $x$ and write $\lim_{n\to \infty} x_n = x$.
Complete Space
Now we can define a complete metric space: a metric space $(S, d)$ is said to be complete if every Cauchy sequence in $S$ converges to some element of $S$. In other words, every Cauchy sequence in $S$ is also a convergent sequence in $S$.
With all these definitions in place, we can directly define a Banach space: a Banach space is a complete normed space. That is, a Banach space contains the limits of all its Cauchy sequences.
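A classical way to see why completeness matters: Newton's iteration for $\sqrt{2}$, run entirely in exact rational arithmetic, produces a Cauchy sequence of rationals whose limit $\sqrt{2}$ is irrational. So the rationals with $d(x, y) = \lvert x - y \rvert$ are not complete, while $\mathcal{R}$ is. A minimal sketch:

```python
from fractions import Fraction

# Newton iteration for sqrt(2) stays inside the rationals, yet its limit
# sqrt(2) is irrational: the sequence is Cauchy in Q but has no limit in Q.
x = Fraction(2)
for _ in range(6):
    x = (x + 2 / x) / 2
    print(float(x))   # approaches 1.41421356..., which lies outside Q
```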
Hilbert Space
Now we are ready to evolve from a Banach space to a Hilbert space. A Hilbert space is a Banach space whose norm is induced by an additional function called an inner product.
Inner Product
An inner product $(\cdot, \cdot)$ is a real-valued function of pairs of vectors of a linear space satisfying the following conditions:
- $(f, g) = (g, f)$
- $(f_1+f_2, g) = (f_1, g) + (f_2, g)$
- $(\lambda f, g) = \lambda(f, g)$
- $(f, f) > 0$ if $f \neq 0$
A vector space with an inner product is called an inner product space. An inner product space is naturally a normed space, because a norm can be induced by the inner product: $\lVert x \rVert = (x, x)^{\frac{1}{2}}$.
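A quick numerical illustration: with the standard dot product on $\mathcal{R}^3$, the induced norm agrees with the Euclidean norm, and the Cauchy-Schwarz inequality $\lvert (x, y) \rvert \leq \lVert x \rVert \lVert y \rVert$ holds:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 3))

def inner(a, b):
    return a @ b                        # the standard inner product on R^3

def norm(a):
    return np.sqrt(inner(a, a))         # the norm induced by the inner product

assert np.isclose(norm(x), np.linalg.norm(x))      # agrees with the usual norm
assert abs(inner(x, y)) <= norm(x) * norm(y)       # Cauchy-Schwarz
```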
It’s also obvious that normed spaces, Banach spaces and Hilbert spaces are all metric spaces.
Reproducing Kernel Hilbert Space
Now we are ready to define the Reproducing Kernel Hilbert Space (RKHS). Let $\mathcal{H}$ be a Hilbert space of functions mapping from some non-empty set $\mathcal{X}$ to $\mathcal{R}$ (we write this as $\mathcal{H} \subset \mathcal{R}^{\mathcal{X}}$). An interesting property of an RKHS is that if two functions $f, g \in \mathcal{H}$ are close in the norm of $\mathcal{H}$, then $f(x)$ and $g(x)$ are close $\forall x \in \mathcal{X}$.
Evaluation Functional
Let $\mathcal{H}$ be a Hilbert space of functions $f: \mathcal{X} \to \mathcal{R}$ defined on a non-empty set $\mathcal{X}$. For a fixed $x \in \mathcal{X}$, the map $\delta_x: \mathcal{H} \to \mathcal{R}$, $\delta_x: f \mapsto f(x)$ is called the (Dirac) evaluation functional at $x$.
Evaluation functionals are always linear: for $f, g \in \mathcal{H}$ and $\alpha, \beta \in \mathcal{R}$, $\delta_x(\alpha f + \beta g) = (\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x) = \alpha \delta_x(f) + \beta \delta_x(g)$. It turns out that for a Hilbert space to be a Reproducing Kernel Hilbert Space, the evaluation functionals also have to be continuous:
RKHS
A Hilbert space of functions $f: \mathcal{X} \to \mathcal{R}$, defined on a non-empty set $\mathcal{X}$, is said to be a Reproducing Kernel Hilbert Space (RKHS) if the evaluation functional $\delta_x$ is continuous $\forall x \in \mathcal{X}$.
As mentioned above, an RKHS has a nice property: if two functions converge in the RKHS norm, then they converge pointwise, i.e., at every point of $\mathcal{X}$. Mathematically: if $\lim_{n \to \infty} \lVert f_n - f \rVert = 0$, then $\lim_{n \to \infty}f_n(x) = f(x)$, $\forall x \in \mathcal{X}$. This follows because a continuous linear functional is bounded, so $\lvert f_n(x) - f(x) \rvert = \lvert \delta_x(f_n - f) \rvert \leq \lVert \delta_x \rVert \, \lVert f_n - f \rVert \to 0$.
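A minimal sketch of this property, using (as an assumption for illustration) the simplest RKHS I can think of: linear functions $f_w(y) = (w, y)$ on $\mathcal{R}^2$ with inner product $(f_w, f_v) = (w, v)$. There $\lVert f_n - f \rVert$ equals the Euclidean distance between weight vectors, and norm convergence visibly forces pointwise convergence:

```python
import numpy as np

# A finite-dimensional RKHS: functions f_w(x) = <w, x> on R^2, with
# inner product (f_w, f_v) = <w, v>, so ||f_n - f|| = ||w_n - w||.
# Since |delta_x(f_w)| = |<w, x>| <= ||w|| ||x||, evaluation is continuous.
def f(w, x):
    return w @ x

w_limit = np.array([1.0, -2.0])
x = np.array([0.5, 3.0])

for n in (1, 10, 100, 1000):
    w_n = w_limit + 1.0 / n          # ||f_n - f|| = ||w_n - w_limit|| -> 0
    gap = abs(f(w_n, x) - f(w_limit, x))
    print(f"n={n:5d}  |f_n(x) - f(x)| = {gap:.5f}")   # -> 0 as well
```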
Reproducing Kernels
As you might have noticed, our definition of an RKHS doesn’t mention kernels. What is the relation between kernels and RKHS? First, let’s define the reproducing kernel: let $\mathcal{H}$ be a Hilbert space of $\mathcal{R}$-valued functions defined on a non-empty set $\mathcal{X}$. A function $k: \mathcal{X} \times \mathcal{X} \to \mathcal{R}$ is called a reproducing kernel of $\mathcal{H}$ if it satisfies:
- $\forall x \in \mathcal{X}$, $k(\cdot, x) \in \mathcal{H}$
- $\forall x \in \mathcal{X}$, $\forall f \in \mathcal{H}$, $(f, k(\cdot, x)) = f(x)$ (the reproducing property).
In particular, for any $x, y \in \mathcal{X}$, $k(x, y) = (k(\cdot, x), k(\cdot, y))$.
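The reproducing property can be checked concretely in the same finite-dimensional space of linear functions used above (again an illustrative assumption): the linear kernel $k(x, y) = (x, y)$ has $k(\cdot, x) = f_x$, so $(f_w, k(\cdot, x)) = (w, x) = f_w(x)$:

```python
import numpy as np

# Reproducing property in the RKHS of linear functions:
# H = { f_w : f_w(y) = <w, y> } with (f_w, f_v) := <w, v>.
# The linear kernel k(x, y) = <x, y> gives k(., x) = f_x, so
# (f_w, k(., x)) = <w, x> = f_w(x): the kernel "reproduces" evaluation.
rng = np.random.default_rng(0)
w, x = rng.normal(size=(2, 3))

def evaluate(weight, point):          # f_weight(point)
    return weight @ point

def inner(weight_f, weight_g):        # (f, g) in H, via the weight vectors
    return weight_f @ weight_g

# k(., x) is the function with weight vector x, hence:
assert np.isclose(inner(w, x), evaluate(w, x))   # (f, k(., x)) = f(x)
```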
It turns out, the relationship between reproducing kernel and RKHS is the following:
- If a reproducing kernel exists for a Hilbert space $\mathcal{H}$, the reproducing kernel is unique.
- $\mathcal{H}$ is an RKHS iff $\mathcal{H}$ has a reproducing kernel.
Also, reproducing kernels are positive definite functions.
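Positive definiteness means that for any finite set of points the Gram matrix $K_{ij} = k(x_i, x_j)$ is positive semidefinite. A quick check with the Gaussian (RBF) kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / 2)$, which I pick purely as an example:

```python
import numpy as np

# The Gram matrix of a positive definite kernel is positive semidefinite,
# i.e. all its eigenvalues are nonnegative.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                              # five points in R^2

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2)                                # Gaussian Gram matrix

eigvals = np.linalg.eigvalsh(K)       # K is symmetric, so use eigvalsh
assert np.all(eigvals >= -1e-10)      # nonnegative up to floating-point error
```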
Now, we are ready to formally define a kernel as a function that can be represented via an inner product:
Kernel
Let $\mathcal{X}$ be a non-empty set. A function $k: \mathcal{X} \times \mathcal{X} \to \mathcal{R}$ is called a kernel if there exists a real Hilbert space $\mathcal{H}$ and a map $\phi: \mathcal{X} \to \mathcal{H}$ such that $\forall x, y \in \mathcal{X}$: $k(x, y) = (\phi(x), \phi(y))$.
Such a map $\phi: \mathcal{X} \to \mathcal{H}$ is referred to as the feature map, and the space $\mathcal{H}$ as the feature space.
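This is the kernel trick in miniature. As a sketch (the polynomial kernel is my choice of example), the kernel $k(x, y) = (x, y)^2$ on $\mathcal{R}^2$ corresponds to the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ into the feature space $\mathcal{R}^3$:

```python
import numpy as np

# k(x, y) = (<x, y>)^2 on R^2 equals the inner product of explicit
# feature vectors phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) in R^3.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 2))

assert np.isclose((x @ y) ** 2, phi(x) @ phi(y))
```

In practice one evaluates $k(x, y)$ directly and never materializes $\phi$, which is exactly what makes kernels useful when the feature space is high- or infinite-dimensional.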