Curse of Dimensionality

Introduction

The "Curse of Dimensionality" is a complex phenomenon that affects the performance of machine learning and data mining algorithms when faced with high-dimensional datasets. This concept is essential for understanding the challenges faced by data scientists and machine learning engineers when working with large datasets.

What is Dimensionality?

The dimensionality of a dataset refers to the number of features or variables that compose it. For example, in a two-dimensional dataset, each data point is defined by two variables (or dimensions). As the number of dimensions increases, the volume of the space in which the data resides grows exponentially, and that space quickly becomes vast and complex.

The Curse of Dimensionality

The Curse of Dimensionality describes the negative phenomena that occur when the number of dimensions in a dataset becomes very high. These phenomena include increased data sparsity, distances between data points that grow larger and become less discriminative, and increased complexity of the models needed to process the data effectively.
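As a concrete illustration, here is a minimal sketch (assuming NumPy; the sample size and dimensions are arbitrary) of how distances lose contrast as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    # Sample 500 points uniformly in the unit hypercube [0, 1]^d.
    points = rng.random((500, d))
    # Euclidean distances from the first point to all the others.
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  nearest={dists.min():6.2f}  "
          f"farthest={dists.max():6.2f}  relative contrast={contrast:.2f}")
```

As the dimension grows, the nearest and farthest points end up at nearly the same distance, which is one reason distance-based methods degrade in high dimensions.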

Consequences and Challenges

The Curse of Dimensionality presents several important consequences and challenges for practical applications in data science:

It can lead to increased computation time required for model training.

It may require much larger datasets to achieve reliable modeling performance.

It can result in poor model generalization, due to overfitting or underfitting.

Understanding these challenges is crucial for developing effective solutions and data preprocessing techniques to overcome the Curse of Dimensionality and improve the performance of machine learning algorithms.

Interactive Demo

Below, you will find an interactive demonstration to visualize concepts related to the Curse of Dimensionality. Here's an explanation of how it works:

1. Number of Dimensions:

The number of dimensions of the hypercube is set using the "Dimension" slider; the hypercube therefore has 2^dim vertices. Note that, for display-performance reasons, only a limited number of the outermost vertices are sampled, so the number of samples will be << 2^dim. See the Resources section for more details.

2. Number of Samples:

The number of samples displayed is set by entering it in the "Number of Samples" field, which determines how many vertex samples are generated in the region between the circumscribed and inscribed circles (starting with the outermost vertices). Note that the number of samples must be << 2^dim.

3. Vector w:

The vector w, from which the projection plane is constructed via the Gram-Schmidt process, can be selected in the "Choose the Vector W" section. See the Resources section for more details.

4. Convex Hull:

The "Convex Hull" option allows you to choose whether or not to draw the convex hull of the vertices of the cube projected onto the 2D plane.

5. Histogram of Norms:

A histogram is also generated to display the distribution of the norms of the vertices as well as those of random points. You can adjust the number of bars in the histogram using the "Number of Bars" slider.

Note: If the number of dimensions is too high, generation may take a very long time or never complete, due to the complexity of the algorithm.

Resources

Hypercube Projection

Random Points Generation

We randomly sample \( N \) points \( \mathbf{x}_i \) inside the hypercube \( C = [-1, 1]^D \): \( x_{ij} = 2 \times \text{rand}() - 1 \) (since \( \text{rand}() \in [0, 1] \)), for \( i = 1, \ldots, N \) and \( j = 1, \ldots, D \), so that each dimension of \( C \) contributes one random coordinate of the point \( \mathbf{x}_i \).
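A minimal sketch of this sampling step (assuming NumPy; the function name and sample sizes are illustrative, not the demo's actual implementation):

```python
import numpy as np

def sample_points(n, d, seed=0):
    """Sample n points uniformly inside the hypercube C = [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    # rand() lies in [0, 1); rescale each coordinate to [-1, 1).
    return 2.0 * rng.random((n, d)) - 1.0

X = sample_points(n=1000, d=10)  # shape (N, D): one row per point x_i
```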

Plane Construction

We define the plane by its normal vector \( \mathbf{w} \) (by default chosen randomly: \( w_j = \text{rand}() \) for \( j = 1, \ldots, D \)). We then construct the projection plane using the Gram-Schmidt process (an algorithm from linear algebra that produces an orthonormal basis from a set of linearly independent vectors), which yields an orthonormal basis \( (\mathbf{e}_1, \mathbf{e}_2) \) of the plane.

We thus obtain the coordinates of each point \( \mathbf{y}_i \), the projection of \( \mathbf{x}_i \) onto the plane \( (\mathbf{e}_1, \mathbf{e}_2) \):

\[ y_{i1} = \langle \mathbf{x}_i, \mathbf{e}_1 \rangle, \qquad y_{i2} = \langle \mathbf{x}_i, \mathbf{e}_2 \rangle \]
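The text above does not spell out the Gram-Schmidt step, so the following is only a plausible sketch (assuming NumPy, and assuming that \( \mathbf{w} \) is normalized into \( \mathbf{e}_1 \) while an auxiliary random vector \( \mathbf{v} \), introduced here purely for illustration, is orthogonalized against it to give \( \mathbf{e}_2 \); this is not necessarily the demo's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10

w = rng.random(D)   # vector w, random by default as in the demo
v = rng.random(D)   # assumption: auxiliary random vector for Gram-Schmidt

# Gram-Schmidt: e1 along w, e2 = component of v orthogonal to e1, both unit-norm.
e1 = w / np.linalg.norm(w)
v_orth = v - np.dot(v, e1) * e1
e2 = v_orth / np.linalg.norm(v_orth)

# Project every random point x_i onto the plane spanned by (e1, e2):
# y_i1 = <x_i, e1>, y_i2 = <x_i, e2>.
X = 2.0 * rng.random((1000, D)) - 1.0
Y = np.column_stack((X @ e1, X @ e2))  # shape (N, 2)
```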

Cube Corners

A corner of the hypercube is a vector \( \mathbf{a}_i = [\alpha_1, \ldots, \alpha_D] \) with \( \alpha_j \in \{-1, 1\} \). We can project this corner onto the plane:

\[ \mathbf{z}_i = \begin{bmatrix} z_{i1} = \langle \mathbf{a}_i, \mathbf{e}_1 \rangle \\ z_{i2} = \langle \mathbf{a}_i, \mathbf{e}_2 \rangle \end{bmatrix} \]

The number of corners of the \( D \)-dimensional hypercube is \( 2^D \).
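As a hedged sketch (assuming NumPy and itertools; enumerating all \( 2^D \) corners is only practical for small \( D \), which is why the demo samples corners instead), the corners and their 2D projections could be generated like this:

```python
import itertools
import numpy as np

D = 6                                    # keep D small: there are 2**D corners
corners = np.array(list(itertools.product((-1.0, 1.0), repeat=D)))
print(corners.shape)                     # (2**D, D) -> (64, 6)

# Assumed orthonormal basis (e1, e2) of the projection plane, built as above.
rng = np.random.default_rng(0)
w = rng.random(D)
e1 = w / np.linalg.norm(w)
v = rng.random(D)
e2 = v - (v @ e1) * e1
e2 /= np.linalg.norm(e2)

# z_i = (<a_i, e1>, <a_i, e2>) for every corner a_i.
Z = np.column_stack((corners @ e1, corners @ e2))
```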

Inscribed and Circumscribed Spheres

Since, by construction, the origin \( O \) lies in the plane \( (P) \), the inscribed sphere projects onto the circle of radius 1, and the corners of the hypercube lie well outside this circle. Furthermore, the norm of any corner \( \mathbf{a}_i \) is \( \| \mathbf{a}_i \| = \sqrt{\textstyle\sum_j \alpha_j^2} = \sqrt{D} \) (since each \( \alpha_j = \pm 1 \)), so the circumscribed sphere projects onto the circle of radius \( \sqrt{D} \).

Projection Coordinates Statistics

We plot the histogram of the projection coordinates \( y_{i1} \) and \( y_{i2} \) defined in the "Plane Construction" section.

The expected outcome is a Gaussian-shaped distribution whose mean \( \bar{y}_i \) increases with \( D \), and whose ratio \( \frac{\text{Var}(y_i)}{\bar{y}_i} \) tends towards 0.
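As a minimal numerical sketch (assuming, as in the histogram description in the demo section above, that the statistic of interest is the norm of the random points, and assuming NumPy; the sample sizes are arbitrary), one can check that the mean grows with \( D \) while the variance-to-mean ratio shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    X = 2.0 * rng.random((5000, d)) - 1.0       # points uniform in [-1, 1]^d
    norms = np.linalg.norm(X, axis=1)
    # The mean norm grows with d (roughly like sqrt(d / 3)) while its variance
    # stays bounded, so Var(norm) / mean(norm) tends to 0.
    print(f"d={d:5d}  mean={norms.mean():6.2f}  "
          f"var/mean={norms.var() / norms.mean():.4f}")
```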