Algorithms
Posted: 2016-03-03 , Modified: 2016-03-03
Tags: LDC
(Taken from lecture 11 of algorithms)
Distance being preserved up to \(\ep\) is the same as saying that \[|\an{Mv_i,Mv_j} -\an{v_i,v_j}|\le \ep \ve{v_i}\ve{v_j}.\] This is good for vectors with large \(L^1/L^2\) ratio, and not good for sparse vectors.
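As a quick numerical check (my sketch, not from the lecture, assuming \(M\) is a Gaussian random projection scaled by \(1/\sqrt{k}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 200                              # original and reduced dimension
M = rng.normal(size=(k, d)) / np.sqrt(k)      # scaled Gaussian projection

v1, v2 = rng.normal(size=d), rng.normal(size=d)

# the additive error should be on the order of ||v1|| ||v2|| / sqrt(k)
err = abs(np.dot(M @ v1, M @ v2) - np.dot(v1, v2))
print(err / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```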
Curse of dimensionality. Many algorithms that rely on finding a closest neighbor (nearest neighbor search, minimum spanning tree, point location, etc.) have running times that depend on exp(d).
In machine learning and statistics the term sometimes refers to the fact that available data is too sparse in high dimensions: it takes exp(d) data points (say, points on the sphere) to ensure that each new sample is close to an existing sample.
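A rough volume count (my sketch, not from the lecture) behind the exp(d) scaling: if every point of the unit ball must lie within \(\ep\) of some sample, the samples' \(\ep\)-balls cover the unit ball, so
\[\#\{\text{samples}\} \;\ge\; \frac{\mathrm{vol}(B^d)}{\mathrm{vol}(\ep B^d)} \;=\; \Bigl(\frac{1}{\ep}\Bigr)^d \;=\; e^{d\log(1/\ep)}.\]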
Blessing of dimensionality. This refers to the fact that many phenomena become much clearer and easier to think about in high dimensions because one can use simple rules of thumb (e.g., Chernoff bounds, measure concentration) which don’t hold in low dimensions.
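For example, the norm of a random Gaussian vector concentrates tightly around \(\sqrt{d}\); a minimal sketch (my illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    X = rng.normal(size=(1000, d))                  # 1000 random Gaussian vectors
    norms = np.linalg.norm(X, axis=1) / np.sqrt(d)  # rescaled norms
    # the relative spread shrinks like 1/sqrt(d): concentration of measure
    print(d, round(norms.mean(), 3), round(norms.std(), 4))
```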
Suppose we wish to hash high-dimensional vectors so that nearby vectors tend to hash into the same bucket. To do this, we can apply a random projection into, say, the cube in 5 dimensions. We discretise the cube into smaller cubes of side \(\ep\). Then there are \(1/\ep^5\) smaller cubes; these can be the buckets.
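A minimal sketch of this bucketing (my illustration; the projection matrix, \(\ep=0.1\), and the helper `bucket` are assumed choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, eps = 1000, 5, 0.1
M = rng.normal(size=(k, d)) / np.sqrt(k)        # random projection to 5 dimensions

def bucket(v):
    """Hash v to the eps-sized grid cell its projection lands in."""
    return tuple(np.floor(M @ v / eps).astype(int))

x = rng.normal(size=d)
x /= np.linalg.norm(x)                          # a point on the unit sphere
y = x + 1e-3 * rng.normal(size=d) / np.sqrt(d)  # a very nearby point
print(bucket(x), bucket(x) == bucket(y))        # nearby points usually share a bucket
```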
Can reduce dimension while preserving the margin!
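A quick illustration (my sketch, not from the lecture): project the data and a unit separating direction \(w\) by the same Gaussian \(M\); the inner products \(\an{w,x_i}\), and hence the margin, are only perturbed by roughly \(1/\sqrt{k}\).

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 2000, 200, 100
M = rng.normal(size=(k, d)) / np.sqrt(k)        # shared random projection

w = rng.normal(size=d); w /= np.linalg.norm(w)  # unit separating direction
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm data points

margins = X @ w                                 # original margins <w, x_i>
proj_margins = (X @ M.T) @ (M @ w)              # margins after projection
# worst additive distortion over the n points is roughly sqrt(log n)/sqrt(k)
print(np.max(np.abs(margins - proj_margins)))
```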