(New page: ---- =Questions and Comments= * Tian's review: It's a great slecture, the explanation is in detail and the logic between the steps is quite clear. Also, the MATLAB example provides me w...)
 
 
(16 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
<center><font size= 4>
 +
Questions and Comments for:
 +
'''[[Curse_of_Dimensionality|Curse of Dimensionality]]'''
 +
</font size>
  
 +
A [https://www.projectrhea.org/learning/slectures.php slecture] by Haonan Yu
 +
 +
</center>
 +
----
 +
Please leave me comment below if you have any questions, if you notice any errors or if you would like to discuss a topic further.
 
----
 
----
  
 
=Questions and Comments=
 
=Questions and Comments=
 +
Review by Soonam Lee:
 +
# This slecture mainly talks about curse of dimensionality using easy metaphor. Assume that our goal is finding the needle in N-D haystack. If the haystack lies in 1-D, simple search followed by one axis or basis is enough. However, as number of dimension increases, the number of basis increases and it produces exponentially expensive search problem. The author explains this concept with various pictures. Moreover, this slecture describes sparsity of samples that is from increasing dimension. With the Gaussian reconstruction using different number of samples, the author conveys the idea efficiently. Lastly, three ways are introduced to break this curse of dimensionality such as feature extraction method, dimensionality reduction techniques, and kernel method. Firstly, feature extraction method is the one way that selects and extracts meaningful features from given redundant information. On the other hands, dimensionality reduction technique gave us the way to project from the high dimensional space to low dimensional space. In this case, how to choose the project axes are very important tasks. PCA uses the largest variance axes as a projection axes whereas LDA decides these axes based on the best separation across class. Apart from these two methods, kernel methods maps data to much higher dimensions but the data should be explained with high dimension. 
 +
# Overall, the contents are very simple and intuitive. More specifically, the metaphor which describes the curse of dimensionality is easy to understand. The author uses several pictures and it helps even beginner to understand this concept. Also, the sparsity caused by increasing dimension is also clear because the slecture explains the concept with pictures. Lastly, dimensional reduction technique is also easily understood. 
 +
# I have some suggestions to make this slecture more abundant. First, the author uses the word overfitting which is the lack of ability to reliably estimate and generalize. However, overfitting is not coming from lack of training data. Perhaps, underfitting is the appropriate word that have bad estimation performance due to sparse samples. The definition of overfitting is such that it fits very well for training data, but it doesn't fit well for testing data. It means since it captures unnecessary details such as noise from training data, it did not fit well in testing data. Second, since PCA and LDA are not difficult idea, he can put another pictures to explain them easier. The last figures can help understanding dimensionality reduction technique roughly, but did not provide information about PCA or LDA itself. Since we have several nice slecture to explain this, I will link them as below.
 +
*'''[[PCA|Principal Component Analysis (PCA)]]'''
  
* Tian's review: It's a great slecture, the explanation is in detail and the logic between the steps is quite clear. Also, the MATLAB example provides me with a certain level of intuition of how the whitening process works. This slecture greatly helps me to understand the transformation between different Gaussian distributions, in both mathematical sense and practical sense. 
+
----
 
+
Answer to the comments: First, thanks for the reviews.
* Tian's comment 1: As we can see, the whited data does not have exactly zero mean and an identity covariance matrix, there are some tiny errors, is there a way that we can characterize how good the whiting process is from those errors? I guess the error should be linear to the difference between the largest and smallest eigenvalues of the covariance matrix. This could be a further topic that readers want to explore.
+
 
+
* Tian's comment 2: The real-world data that we process is usually not exactly multivariate Gaussian distribution, if the original data was not Gaussian distribution, or even very far away from a Gaussian distribution, can we still use this technique and how good the result would be? This could be an interesting question that readers could further explore.
+
  
* Questions and comments
+
It is '''overfitting''' instead of '''underfitting'''. Underfitting is because that the model is too simple to capture the trend of the data. For example, you try to fit points sampled from a curve with a straight line. Underfitting cannot be solved by adding into training samples because in the end you will still get a straight line. In this case, overfitting is the correct term because the quantity of the data usually does not match the complexity of the model (in high dimensionality). However, this overfitting will be solved by adding into more training points. About PCA, I do have a link in the original slecture. I also have a link on LDA (to wikipedia).
  
 
----
 
----
 +
Back to '''[[Curse_of_Dimensionality|Curse of Dimensionality]]'''

Latest revision as of 09:44, 30 April 2014

Questions and Comments for: Curse of Dimensionality

A slecture by Haonan Yu


Please leave me comment below if you have any questions, if you notice any errors or if you would like to discuss a topic further.


Questions and Comments

Review by Soonam Lee:

  1. This slecture mainly talks about curse of dimensionality using easy metaphor. Assume that our goal is finding the needle in N-D haystack. If the haystack lies in 1-D, simple search followed by one axis or basis is enough. However, as number of dimension increases, the number of basis increases and it produces exponentially expensive search problem. The author explains this concept with various pictures. Moreover, this slecture describes sparsity of samples that is from increasing dimension. With the Gaussian reconstruction using different number of samples, the author conveys the idea efficiently. Lastly, three ways are introduced to break this curse of dimensionality such as feature extraction method, dimensionality reduction techniques, and kernel method. Firstly, feature extraction method is the one way that selects and extracts meaningful features from given redundant information. On the other hands, dimensionality reduction technique gave us the way to project from the high dimensional space to low dimensional space. In this case, how to choose the project axes are very important tasks. PCA uses the largest variance axes as a projection axes whereas LDA decides these axes based on the best separation across class. Apart from these two methods, kernel methods maps data to much higher dimensions but the data should be explained with high dimension.
  2. Overall, the contents are very simple and intuitive. More specifically, the metaphor which describes the curse of dimensionality is easy to understand. The author uses several pictures and it helps even beginner to understand this concept. Also, the sparsity caused by increasing dimension is also clear because the slecture explains the concept with pictures. Lastly, dimensional reduction technique is also easily understood.
  3. I have some suggestions to make this slecture more abundant. First, the author uses the word overfitting which is the lack of ability to reliably estimate and generalize. However, overfitting is not coming from lack of training data. Perhaps, underfitting is the appropriate word that have bad estimation performance due to sparse samples. The definition of overfitting is such that it fits very well for training data, but it doesn't fit well for testing data. It means since it captures unnecessary details such as noise from training data, it did not fit well in testing data. Second, since PCA and LDA are not difficult idea, he can put another pictures to explain them easier. The last figures can help understanding dimensionality reduction technique roughly, but did not provide information about PCA or LDA itself. Since we have several nice slecture to explain this, I will link them as below.

Answer to the comments: First, thanks for the reviews.

It is overfitting instead of underfitting. Underfitting is because that the model is too simple to capture the trend of the data. For example, you try to fit points sampled from a curve with a straight line. Underfitting cannot be solved by adding into training samples because in the end you will still get a straight line. In this case, overfitting is the correct term because the quantity of the data usually does not match the complexity of the model (in high dimensionality). However, this overfitting will be solved by adding into more training points. About PCA, I do have a link in the original slecture. I also have a link on LDA (to wikipedia).


Back to Curse of Dimensionality

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett