Latest revision as of 19:55, 3 May 2014

Questions and Comments for: Derivation_of_Bayes_rule_In_Chinese

A slecture by Weibao Wang

If you have any questions or suggestions, please leave a comment below.

Questions and Suggestions

This slecture is under review by Lu Zhang.

Summary: Your slecture is excellent. It gives a good and fairly complete review of Bayes' rule. It is also very well organized: first the definition (what Bayes' theorem is), second the derivation (how to prove it is correct), then a detailed example (why it is useful), and finally an introduction to the Bayesian classifier (further applications).

LZ Comment1: I like your example for Bayes' rule. It is a simple, typical, real-world problem of computing a posterior probability. It might also be interesting to give an example of Bayesian classification in the last section, so that people who have not seen it before get an idea of how the maximum posterior probability can be used for classification. Readers generally like to see a method applied to a real-world situation.
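
As a rough illustration of the kind of example suggested in Comment1 (this is not taken from the slecture), a maximum a posteriori decision for two classes could be sketched as follows. The class names, the priors, and the likelihood values are all made-up assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical two-class problem; every number below is an assumed,
# illustrative value and does not come from the slecture.
priors = {"spam": 0.4, "ham": 0.6}       # P(class)
likelihoods = {"spam": 0.7, "ham": 0.1}  # P(observation | class)


def posterior(priors, likelihoods):
    """Return P(class | observation) for each class using Bayes' rule."""
    # Evidence P(observation), expanded with the law of total probability.
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}


def map_classify(priors, likelihoods):
    """Pick the class with the largest posterior probability."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)


post = posterior(priors, likelihoods)
label = map_classify(priors, likelihoods)
print(post, label)
```

Because the evidence term is the same for every class, the decision depends only on the product of prior and likelihood; the normalization is kept here so that the posteriors can be read as probabilities.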

LZ Comment2: Your derivation of Bayes' rule is already very clear. However, it might be easier for students to understand and remember if you explained it with some figures (such as Venn diagrams) and listed the axioms of probability required for the derivation.
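
For reference, the axioms and the derivation that Comment2 refers to could be laid out roughly as below. This is a standard textbook sketch and not necessarily the form used in the slecture; the event names A, B and the partition A_1, ..., A_n are notation assumed here.

```latex
% Axioms of probability (Kolmogorov), for a sample space \Omega:
%   1. P(A) \ge 0 for every event A
%   2. P(\Omega) = 1
%   3. P(A \cup B) = P(A) + P(B) whenever A \cap B = \emptyset
%
% Definition of conditional probability, for P(A) > 0 and P(B) > 0:
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, &
P(B \mid A) &= \frac{P(A \cap B)}{P(A)} .
\end{align}
% Solving the second equation for P(A \cap B) and substituting into the first:
\begin{align}
P(A \cap B) &= P(B \mid A)\, P(A), \\
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)} .
\end{align}
% If A_1, \dots, A_n partition \Omega, additivity (axiom 3) gives the
% law of total probability for the denominator:
\begin{align}
P(B) &= \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i) .
\end{align}
```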

Back to Derivation_of_Bayes_rule_In_Chinese

@@ Line 1: / Line 1: @@
 <center><font size= 4>
 Questions and Comments for:
-'''[[Curse_of_Dimensionality|Curse of Dimensionality]]'''
+'''[[Derivation_of_Bayes_rule_In_Chinese]]'''
 </font size>
-A [https://www.projectrhea.org/learning/slectures.php slecture] by Haonan Yu
+A [https://www.projectrhea.org/learning/slectures.php slecture] by Weibao Wang
 </center>
 ----
-Please leave me comment below if you have any questions, if you notice any errors or if you would like to discuss a topic further.
+If you have any questions or suggestions, please leave a comment below.
 ----
-=Questions and Comments=
+=Questions and Suggestions=
-Review by Soonam Lee:
-# This slecture mainly talks about curse of dimensionality using easy metaphor. Assume that our goal is finding the needle in N-D haystack. If the haystack lies in 1-D, simple search followed by one axis or basis is enough. However, as number of dimension increases, the number of basis increases and it produces exponentially expensive search problem. The author explains this concept with various pictures. Moreover, this slecture describes sparsity of samples that is from increasing dimension. With the Gaussian reconstruction using different number of samples, the author conveys the idea efficiently. Lastly, three ways are introduced to break this curse of dimensionality such as feature extraction method, dimensionality reduction techniques, and kernel method. Firstly, feature extraction method is the one way that selects and extracts meaningful features from given redundant information. On the other hands, dimensionality reduction technique gave us the way to project from the high dimensional space to low dimensional space. In this case, how to choose the project axes are very important tasks. PCA uses the largest variance axes as a projection axes whereas LDA decides these axes based on the best separation across class. Apart from these two methods, kernel methods maps data to much higher dimensions but the data should be explained with high dimension.
+This slecture is under review by Lu Zhang.
-# Overall, the contents are very simple and intuitive. More specifically, the metaphor which describes the curse of dimensionality is easy to understand. The author uses several pictures and it helps even beginner to understand this concept. Also, the sparsity caused by increasing dimension is also clear because the slecture explains the concept with pictures. Lastly, dimensional reduction technique is also easily understood.
-# I have some suggestions to make this slecture more abundant. First, the author uses the word overfitting which is the lack of ability to reliably estimate and generalize. However, overfitting is not coming from lack of training data. Perhaps, underfitting is the appropriate word that have bad estimation performance due to sparse samples. The definition of overfitting is such that it fits very well for training data, but it doesn't fit well for testing data. It means since it captures unnecessary details such as noise from training data, it did not fit well in testing data. Second, since PCA and LDA are not difficult idea, he can put another pictures to explain them easier. The last figures can help understanding dimensionality reduction technique roughly, but did not provide information about PCA or LDA itself. Since we have several nice slecture to explain this, I will link them as below.
+Summary: Your slecture is excellent. It gives a good and fairly complete review of Bayes' rule. It is also very well organized: first the definition (what Bayes' theorem is), second the derivation (how to prove it is correct), then a detailed example (why it is useful), and finally an introduction to the Bayesian classifier (further applications).
-*'''[[PCA|Principal Component Analysis (PCA)]]'''
+LZ Comment1: I like your example for Bayes' rule. It is a simple, typical, real-world problem of computing a posterior probability. It might also be interesting to give an example of Bayesian classification in the last section, so that people who have not seen it before get an idea of how the maximum posterior probability can be used for classification. Readers generally like to see a method applied to a real-world situation.
+LZ Comment2: Your derivation of Bayes' rule is already very clear. However, it might be easier for students to understand and remember if you explained it with some figures (such as Venn diagrams) and listed the axioms of probability required for the derivation.
 ----
-Answer to the comments: First, thanks for the reviews.
-It is '''overfitting''' instead of '''underfitting'''. Underfitting is because that the model is too simple to capture the trend of the data. For example, you try to fit points sampled from a curve with a straight line. Underfitting cannot be solved by adding into training samples because in the end you will still get a straight line. In this case, overfitting is the correct term because the quantity of the data usually does not match the complexity of the model (in high dimensionality). However, this overfitting will be solved by adding into more training points. About PCA, I do have a link in the original slecture. I also have a link on LDA (to wikipedia).
 ----
-Back to '''[[Curse_of_Dimensionality|Curse of Dimensionality]]'''
+Back to '''[[Derivation_of_Bayes_rule_In_Chinese]]'''

Difference between revisions of "Derivation of Bayes rule In Chinese review" - Rhea
