## Bayes' Theorem and Naive Bayes Classifier
### Bayes' Theorem
Bayes' theorem provides a way to update the probability estimate for a hypothesis as more evidence or information becomes available.
#### Formula
For events $A$ and $B$:
$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$
where:
- $P(A|B)$ is the posterior probability of $A$ given $B$.
- $P(B|A)$ is the likelihood of $B$ given $A$.
- $P(A)$ is the prior probability of $A$.
- $P(B)$ is the marginal probability of $B$ (the evidence).
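As a quick numerical illustration (with made-up numbers): suppose a test detects a condition with 99% sensitivity, has a 5% false-positive rate, and the condition has a 1% prior. Bayes' theorem then gives the probability of actually having the condition after a positive test:
```python
# Worked Bayes' theorem example (illustrative numbers only)
# A = "has condition", B = "test is positive"
p_a = 0.01              # prior P(A)
p_b_given_a = 0.99      # likelihood P(B|A): sensitivity
p_b_given_not_a = 0.05  # false-positive rate P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior via Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")  # about 0.167
```
Despite the positive result, the posterior is only about 17%, because the low prior dominates.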
### Naive Bayes Classifier
Naive Bayes is a probabilistic classifier based on Bayes' theorem, with the "naive" assumption that features are conditionally independent given the class.
#### Formula
For a given class $C$ and features $x_1, x_2, \ldots, x_n$:
$P(C|x_1, x_2, \ldots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i|C)$
#### Python Implementation
```python
from sklearn.naive_bayes import GaussianNB
# Toy dataset: two features, two well-separated classes
X = [[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
# Fit a Gaussian Naive Bayes classifier and predict on the training data
model = GaussianNB()
model.fit(X, y)
y_pred = model.predict(X)
# Results
print("Predicted labels:", y_pred)
```
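To connect `GaussianNB` back to the formula above, the sketch below recomputes $P(C) \prod_{i} P(x_i|C)$ by hand from per-class Gaussian densities for an arbitrary test point, and compares the normalized result with `predict_proba`. (scikit-learn adds a tiny `var_smoothing` term to the variances, so minute numerical differences are possible.)
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNB().fit(X, y)

def gaussian_pdf(x, mean, var):
    """Univariate normal density, evaluated element-wise."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Manual posterior for one point: P(C) * prod_i N(x_i; mean_iC, var_iC)
x_new = np.array([2, 2])
joint = []
for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    joint.append(prior * np.prod(gaussian_pdf(x_new, Xc.mean(axis=0), Xc.var(axis=0))))
posterior = np.array(joint) / sum(joint)

print("Manual posterior:", posterior)
print("predict_proba   :", model.predict_proba([x_new])[0])
```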
---
## Support Vector Machines (SVM)
Support Vector Machines (SVM) are supervised learning models used for classification and regression. They aim to find the hyperplane that best separates the data into classes.
### Mathematical Formulation
For a binary classification problem, the SVM algorithm finds the optimal hyperplane:
$w \cdot x - b = 0$
where $w$ is the weight vector and $b$ is the bias. The goal is to maximize the margin, defined as:
$\text{Margin} = \frac{2}{\|w\|}$
### Python Implementation
```python
from sklearn import svm
# Toy dataset: two features, two linearly separable classes
X = [[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
# Fit a linear SVM classifier and predict on the training data
model = svm.SVC(kernel='linear')
model.fit(X, y)
y_pred = model.predict(X)
# Results
print("Predicted labels:", y_pred)
```
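With a linear kernel, the fitted hyperplane is exposed through `coef_` and `intercept_`, so the margin formula above can be checked directly on the trained model:
```python
import numpy as np
from sklearn import svm

X = [[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
model = svm.SVC(kernel='linear').fit(X, y)

# Note: scikit-learn's convention is w.x + b = 0, so intercept_
# corresponds to -b in the formulation above
w = model.coef_[0]
b = model.intercept_[0]
margin = 2 / np.linalg.norm(w)

print("w =", w, " intercept =", b)
print("Margin =", margin)
print("Support vectors:\n", model.support_vectors_)
```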
---
## Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a classification method that projects data onto a lower-dimensional space to maximize class separability.
### Mathematical Formulation
LDA finds the linear combinations of features that best separate two or more classes. The objective is to maximize the ratio of between-class variance to within-class variance, so that the projected classes are as far apart and as compact as possible.
#### Formula
The LDA maximizes the following objective function:
$J(w) = \frac{w^T S_B w}{w^T S_W w}$
where:
- $S_B$ is the between-class scatter matrix.
- $S_W$ is the within-class scatter matrix.
### Python Implementation
```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Toy dataset: two features, two well-separated classes
X = [[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
# Fit an LDA classifier and predict on the training data
model = LinearDiscriminantAnalysis()
model.fit(X, y)
y_pred = model.predict(X)
# Results
print("Predicted labels:", y_pred)
```
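To make the objective $J(w)$ concrete, the sketch below builds $S_W$ and $S_B$ from the same toy data and takes the leading eigenvector of $S_W^{-1} S_B$, which maximizes $J(w)$. This is a from-scratch illustration of the criterion, not how scikit-learn's default `svd` solver works internally:
```python
import numpy as np

X = np.array([[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

overall_mean = X.mean(axis=0)
S_W = np.zeros((2, 2))  # within-class scatter
S_B = np.zeros((2, 2))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

# Direction maximizing J(w): leading eigenvector of S_W^{-1} S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real

print("Discriminant direction w:", w)
print("J(w) =", (w @ S_B @ w) / (w @ S_W @ w))
```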
---
## Quadratic Discriminant Analysis (QDA)
Quadratic Discriminant Analysis (QDA) is a classification method similar to LDA but allows for quadratic decision boundaries.
### Mathematical Formulation
QDA assumes that each class follows a Gaussian distribution, but unlike LDA, it does not assume that the covariance matrices of the classes are identical.
#### Formula
For a given class $C$ and feature vector $x$:
$P(C|x) \propto P(C) \, |\Sigma_C|^{-1/2} \exp \left( -\frac{1}{2} (x - \mu_C)^T \Sigma_C^{-1} (x - \mu_C) \right)$
where:
- $\mu_C$ is the mean vector of class $C$.
- $\Sigma_C$ is the covariance matrix of class $C$.
Unlike in LDA, the normalization factor $|\Sigma_C|^{-1/2}$ cannot be absorbed into the proportionality constant, because $\Sigma_C$ differs from class to class.
### Python Implementation
```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
# Toy dataset: three samples per class so each class covariance is invertible
X = [[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
# Fit a QDA classifier and predict on the training data
model = QuadraticDiscriminantAnalysis()
model.fit(X, y)
y_pred = model.predict(X)
# Results
print("Predicted labels:", y_pred)
```
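The class scores behind `predict` can be recomputed by hand: taking the log of the expression above gives $\log P(C) - \frac{1}{2} \log |\Sigma_C| - \frac{1}{2} (x - \mu_C)^T \Sigma_C^{-1} (x - \mu_C)$, and the predicted class is the one with the largest score. A minimal sketch on the same toy data, for an arbitrary test point:
```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X = np.array([[1, 2], [2, 1], [2, 3], [4, 5], [5, 4], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
model = QuadraticDiscriminantAnalysis().fit(X, y)

# Manual quadratic discriminant score per class for one point
x_new = np.array([3.0, 3.0])
scores = []
for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    mu = Xc.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)  # unbiased estimate, matching scikit-learn
    diff = x_new - mu
    scores.append(np.log(prior)
                  - 0.5 * np.log(np.linalg.det(cov))
                  - 0.5 * diff @ np.linalg.inv(cov) @ diff)

print("Manual prediction:", int(np.argmax(scores)))
print("QDA prediction   :", model.predict([x_new])[0])
```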
## Conclusion
### Summary of Advanced Regressors and Classifiers
- **Bayes' Theorem and Naive Bayes Classifier:** Probabilistic models based on Bayes' theorem, assuming conditional independence between features given the class.
- **Support Vector Machines (SVM):** Models that find the optimal hyperplane to separate data into classes.
- **Linear Discriminant Analysis (LDA):** Projects data onto a lower-dimensional space to maximize class separability using linear decision boundaries.
- **Quadratic Discriminant Analysis (QDA):** Similar to LDA but allows for quadratic decision boundaries by considering different covariance matrices for each class.
Continue: [[07-Unsupervised Learning Algorithms]]