Questions and Comments

This slecture is reviewed by Tao Jiang

This slecture is well written and well organized. Visualizing the decision boundary of the SVM is also a good way to show how an SVM works, for example how it maximizes the margin and how the choice of kernel function influences the result.

Below are some suggestions for improvement.

Some typos:

In the Background of Linear Classification part, some wiki syntax is not rendered as formulas correctly.
In the Support Vector Machine part, the formula for the discriminant function contains an extra “)”.
In the Effect of Kernel Parameters on SVM part, "kernel" is misspelled as "kernal".
Also, the formulas do not fit well into the surrounding text.

Support Vector Machine:

"Support vector machines are an example of a linear two-class classifier." Actually, SVMs can also handle multiclass classification problems, and frequently with high accuracy. Two common approaches extend the SVM to multiple classes: the one-versus-the-rest approach and the one-versus-one approach. Related material can be found in Bishop’s book Pattern Recognition and Machine Learning, Section 7.1.3, Multiclass SVMs.
Supplement: most of the coefficients turn out to be zero. A nonzero coefficient means that the corresponding vector lies on or inside the margin, or is misclassified. Such a vector is called a support vector, and only the support vectors influence the final separating hyperplane.
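
The two comments above can be illustrated with a short sketch. It assumes scikit-learn is available and uses synthetic blob data. The data, the linear kernel, and C=1.0 are my own choices for illustration, not the settings used in the slecture.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Part 1: multiclass classification built from binary SVMs.
# Synthetic 4-class data (an assumption made for this sketch).
X, y = make_blobs(n_samples=200, centers=4, random_state=0)
ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)
n_ovr = len(ovr.estimators_)  # one binary SVM per class
n_ovo = len(ovo.estimators_)  # one binary SVM per pair of classes
print("one-versus-the-rest models:", n_ovr)
print("one-versus-one models:", n_ovo)

# Part 2: only the support vectors have nonzero coefficients.
Xb, yb = make_blobs(n_samples=200, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1.0).fit(Xb, yb)
n_sv = len(svm.support_)  # indices of training points with nonzero coefficient
print("support vectors:", n_sv, "of", len(Xb), "training points")

# Refit using the support vectors alone and compare predictions on all points.
# If only support vectors matter, the two models should agree (up to solver tolerance).
svm_sv = SVC(kernel="linear", C=1.0).fit(Xb[svm.support_], yb[svm.support_])
agreement = (svm.predict(Xb) == svm_sv.predict(Xb)).mean()
print("prediction agreement after dropping non-support vectors:", agreement)
```

With K classes, one-versus-the-rest trains K binary models and one-versus-one trains K(K-1)/2, which is why the two counts differ for K = 4.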

Effect of Kernel Functions on SVM:

For the misclassification rate of each kernel, which parameters did you choose? Specifically, what is the penalty value C for each model, and what is the value of gamma for the Gaussian kernel? These parameters can significantly influence the classification accuracy. Suggestion: use cross-validation and grid search to choose suitable parameters.
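
As a minimal sketch of this suggestion, assuming scikit-learn: the two-moons data set, the grid values for C and gamma, and the 5-fold split below are placeholders I picked for illustration and would need to be adapted to the author's data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data (hypothetical stand-in for the slecture's data set).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate values for the penalty C and the Gaussian kernel width gamma.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation over every (C, gamma) pair, using training data only.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_accuracy = search.best_score_            # mean CV accuracy of the best pair
test_accuracy = search.score(X_test, y_test)  # accuracy on data never used for tuning
print("best parameters:", best_params)
print("cross-validated accuracy:", cv_accuracy)
print("held-out test accuracy:", test_accuracy)
```

Reporting the selected C and gamma next to each misclassification rate would make the kernel comparison reproducible.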

Effect of Kernel Parameters on SVM:

It would be better if the author explained how the data were simulated. For example, if the data are mostly linearly separable (with some outliers), then a linear kernel would be the better choice, even if its misclassification rate is higher than that of the Gaussian kernel.
Performance should be measured by prediction on new data, or at least by cross-validation, not by the misclassification rate on the training set. The more complex the model, the higher the accuracy it can achieve on the training set, which is the phenomenon known as overfitting. By the way, overfitting is not measured by how complex the model is, but by how much the model fits the error (noise) in the data.
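
The gap between training accuracy and cross-validated accuracy can be checked with a sketch like the following. It assumes scikit-learn, and the noisy two-moons data, C=10, and the gamma values are hypothetical choices made only to show the effect.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Noisy synthetic data (an assumption; not the slecture's data).
X, y = make_moons(n_samples=200, noise=0.35, random_state=0)

# Three models of increasing flexibility.
models = {
    "linear": SVC(kernel="linear", C=10),
    "rbf gamma=1": SVC(kernel="rbf", C=10, gamma=1),
    "rbf gamma=1000": SVC(kernel="rbf", C=10, gamma=1000),
}

results = {}
for name, model in models.items():
    train_acc = model.fit(X, y).score(X, y)             # accuracy on the training set
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validated accuracy
    results[name] = (train_acc, cv_acc)
    print(f"{name}: train={train_acc:.3f}, cv={cv_acc:.3f}")
```

A very narrow Gaussian kernel can fit the training points almost perfectly while its cross-validated accuracy drops, which is why the training-set misclassification rate alone is not a reliable basis for comparing kernels.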

SVMAndApplications - Rhea
