In the previous post I introduced the concept and properties of the Reproducing Kernel Hilbert Space (RKHS). In this post, I will show how to construct an RKHS from a kernel function and build learning machines that can learn invariances encoded in the form of bounded linear functionals on the constructed RKHS.
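As a quick sketch of the standard construction (assuming a positive definite kernel $k$ on a set $\mathcal{X}$; notation mine, not the post's): start from finite linear combinations of kernel sections and equip them with the inner product induced by $k$,

$$\mathcal{H}_0 = \Big\{ \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i) \;:\; n \in \mathbb{N},\ \alpha_i \in \mathbb{R},\ x_i \in \mathcal{X} \Big\}, \qquad \Big\langle \sum_i \alpha_i k(\cdot, x_i), \sum_j \beta_j k(\cdot, y_j) \Big\rangle = \sum_{i,j} \alpha_i \beta_j \, k(x_i, y_j).$$

The completion of $\mathcal{H}_0$ under this inner product is the RKHS associated with $k$.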
The Reproducing Kernel Hilbert Space (RKHS) is related to the kernel trick in machine learning and plays a significant role in representation learning. In this post, I will summarize the mathematical definition of the RKHS.
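In short (a compressed statement of the definition, with notation of my choosing): an RKHS is a Hilbert space $\mathcal{H}$ of functions $f: \mathcal{X} \to \mathbb{R}$ on which every evaluation functional is bounded, or equivalently one that admits a reproducing kernel:

$$\delta_x(f) = f(x) \text{ is bounded for every } x \in \mathcal{X} \quad \Longleftrightarrow \quad \exists\, k(\cdot, x) \in \mathcal{H} \text{ such that } \langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x) \ \ \forall f \in \mathcal{H}.$$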
An autoencoder is a type of neural network that learns a lower-dimensional latent representation of the input data in an unsupervised manner. The learning task is simple: given an input image, the network tries to reconstruct that image at the output. The loss is measured by the distance between the two images, e.g., MSE loss.
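For concreteness, here is a minimal sketch in PyTorch (an assumed framework; the layer sizes, latent dimension, and flattened 28×28 input are illustrative choices of mine, not taken from the post) of a fully connected autoencoder trained with MSE reconstruction loss:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=28 * 28, latent_dim=32):
        super().__init__()
        # Encoder: input -> lower-dimensional latent representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: latent representation -> reconstructed input
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # latent code
        return self.decoder(z)   # reconstruction

model = AutoEncoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 28 * 28)          # dummy batch of flattened "images"
optimizer.zero_grad()
loss = criterion(model(x), x)        # reconstruction error against the input itself
loss.backward()
optimizer.step()
```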
The support vector machine (SVM) is, in my opinion, the best machine learning algorithm. It generalizes well with less risk of overfitting, it scales well to high-dimensional data, the kernel trick makes it possible to efficiently lift the feature space to higher dimensions, and the optimization problem is usually a quadratic program that is efficient to solve and has a global minimum. However, SVMs have a few drawbacks. The biggest one is that they don't scale well to large datasets, and the inference time depends on the size of the training set. For a training set of size $m$, where $x \in \mathbb{R}^d$, the time complexity to compute the Gram matrix is $\mathcal{O}(m^2 d)$ and the time complexity for inference is $\mathcal{O}(md)$. These costs scale poorly with large training sets.
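The following NumPy sketch (my own toy illustration; `rbf_kernel` is a local helper here, not a library API) shows where the two costs come from: the $m \times m$ Gram matrix at training time and the sum over training points at inference time.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise squared Euclidean distances: each of the |X| * |Y| entries costs O(d).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

m, d = 1000, 50
X_train = np.random.randn(m, d)
y_train = np.random.choice([-1.0, 1.0], size=m)

# Training-time Gram matrix: m x m kernel evaluations, each O(d) -> O(m^2 d).
K = rbf_kernel(X_train, X_train)

# Inference on a single point: the decision function sums over (up to) all m
# training points, each kernel evaluation costing O(d) -> O(m d) per prediction.
alpha = np.random.rand(m)     # stand-in for learned dual coefficients
b = 0.0
x_new = np.random.randn(1, d)
decision = (alpha * y_train) @ rbf_kernel(X_train, x_new).ravel() + b
```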
In kernel SVMs, most kernel functions rely on some distance metric. A commonly used distance metric is the Euclidean distance. However, in fields like pattern recognition, the Euclidean distance is not robust to simple transformations that don't change the class label: rotation, translation, and scaling. For example:
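Here is a toy sketch of the problem (a synthetic 16×16 binary image of my own, not data from the post): translating the image by two pixels leaves its class unchanged, but the Euclidean distance, and hence an RBF kernel value built on it, changes noticeably.

```python
import numpy as np

img = np.zeros((16, 16))
img[4:12, 6:10] = 1.0                      # a crude vertical bar as a stand-in "digit"
shifted = np.roll(img, shift=2, axis=1)    # translate 2 pixels to the right, same class

def rbf(a, b, gamma=0.1):
    return np.exp(-gamma * np.sum((a - b) ** 2))

print(rbf(img.ravel(), img.ravel()))       # 1.0 for identical images
print(rbf(img.ravel(), shifted.ravel()))   # noticeably smaller, despite the same class
```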
Creating macOS application bundles that appear in your Launchpad and can be docked is extremely easy. Once you have finished coding your Python script, change the .py extension to .command and make it executable:
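A minimal sketch, assuming a hypothetical script named `hello.py` renamed to `hello.command` and made executable with `chmod +x hello.command`: the first line should be a Python shebang, since Terminal runs `.command` files through the shell.

```python
#!/usr/bin/env python3
# hello.command: hypothetical example of a renamed Python script that Terminal
# can run on double-click. The shebang above tells the shell to hand the file
# to the Python interpreter; without it, the file would be treated as a shell
# script and fail.
print("Hello from a double-clickable script!")
input("Press Enter to close this window...")
```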