
Concepts in Machine Learning - CST 383 KTU Minor Notes - Dr Binu V P

About Me
Syllabus
Previous Year Question Papers - Concepts in Machine Learning CST 383

Module 1: Overview of Machine Learning
- Bayesian Formulation
- Maximum a Posteriori (MAP), a Bayesian Method / Maximum Likelihood Estimation (MLE)

Module 2: Supervised Learning
- Supervised Learning, Regression
- Perceptron
- Naive Bayes Classifier
- Decision Trees - ID3

Module 3: Introduction to Neural Networks
- Neural Networks and Activation Functions
- Multi Layer Neural Networks, Back Propagation
- Back-propagation Example
- Activation Functions
- Implementation of a Two Layer XOR Network with Sigmoid Activation
- Application of Neural Networks
- Support Vector Machines (SVM)

Module 4: Unsupervised Learning
- Similarity Measures
- Representative-Based Clustering (K-means and Expectation-Maximization Algorithms)
- Hierarchical Clustering - Agglomerative Clustering (AHC)
- Dimensionality Reduction - Principal Component Analysis (PCA)
- Factor Analysis
- Linear Discriminant Analysis (LDA)

Module 5: Classification Assessment
- Cross Validation

Syllabus - Concepts in Machine Learning CST 383 KTU

Module 1 (Overview of Machine Learning)
Machine learning paradigms - supervised, semi-supervised, unsupervised, and reinforcement learning. Basics of parameter estimation - maximum likelihood estimation (MLE) and maximum a posteriori estimation (MAP). Introduction to the Bayesian formulation.

Module 2 (Supervised Learning)
Regression - linear regression with one variable, linear regression with multiple variables, solution using the gradient descent algorithm and the matrix method, basic idea of overfitting in regression. Linear methods for classification - logistic regression, Perceptron, Naive Bayes, decision tree algorithm ID3.

Module 3 (Neural Networks (NN) and Support Vector Machines (SVM))
NN - multilayer feed-forward network, activation functions (Sigmoid, ReLU, Tanh), backpropagation algorithm. SVM - introduction, maximum margin classification, mathematics behind maximum margin classification, maximum margin linear separators, soft margin SVM classifier, non-linear SVM, kernels for learning non-linear functions.

Dimensionality Reduction - PCA

Dimensionality Reduction with Principal Component Analysis

Working directly with high-dimensional data, such as images, comes with some difficulties: it is hard to analyze, interpretation is difficult, visualization is nearly impossible, and (from a practical point of view) storage of the data vectors can be expensive. However, high-dimensional data often has properties that we can exploit. For example, high-dimensional data is often overcomplete, i.e., many dimensions are redundant and can be explained by a combination of other dimensions. Furthermore, dimensions in high-dimensional data are often correlated, so that the data possesses an intrinsic lower-dimensional structure. Dimensionality reduction exploits this structure and correlation, allowing us to work with a more compact representation of the data, ideally without losing information. We can think of dimensionality reduction as a compression technique, similar to JPEG or MP3, which are compression algorithms for images and music.
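To make this compression view concrete, here is a minimal NumPy sketch (an illustration, not taken from the notes) that projects data onto its top-k principal components and reconstructs an approximation. The toy dataset, the function name pca_compress, and the choice k=2 are illustrative assumptions.

```python
import numpy as np

def pca_compress(X, k):
    """Project an n x d data matrix X onto its top-k principal components."""
    mu = X.mean(axis=0)                   # center the data
    Xc = X - mu
    C = np.cov(Xc, rowvar=False)          # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]     # sort eigenvalues descending
    W = eigvecs[:, order[:k]]             # d x k projection matrix
    Z = Xc @ W                            # compressed n x k codes
    X_hat = Z @ W.T + mu                  # approximate reconstruction
    return Z, X_hat

# Toy data (illustrative): 200 points in 5-D with an intrinsic 2-D structure
rng = np.random.default_rng(0)
Z_true = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 5))
X = Z_true @ A + 0.05 * rng.normal(size=(200, 5))

Z, X_hat = pca_compress(X, k=2)
print("reconstruction error:", np.mean((X - X_hat) ** 2))
```

Here the codes Z play the role of the compressed file: we store 2 numbers per point instead of 5, and the reconstruction error stays small because the data's intrinsic dimensionality is 2.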

Classification Assessment

Evaluation of a machine learning model is crucial to measure its performance. Numerous metrics are used in evaluating a machine learning model, and selecting the most suitable ones is important for fine-tuning a model based on its performance. We should know the methods used to assess the performance of classifiers and to compare multiple classifiers.

CLASSIFICATION PERFORMANCE MEASURES

Let $D$ be the testing set comprising $n$ points in a $d$-dimensional space, let $\{c_1, c_2, \ldots, c_k\}$ denote the set of $k$ class labels, and let $M$ be a classifier. For $x_i \in D$, let $y_i$ denote its true class, and let $\hat{y}_i = M(x_i)$ denote its predicted class.

Classification Accuracy and its Limitations

Classification accuracy is the ratio of correct predictions to total predictions made:

$$\text{classification accuracy} = \frac{\text{correct predictions}}{\text{total predictions}}$$

It is often presented as a percentage by multiplying the result by 100:

$$\text{classification accuracy} = \left(\frac{\text{correct predictions}}{\text{total predictions}}\right) \times 100$$
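The following small sketch (an illustration, not from the notes) uses the notation above: it computes accuracy as the fraction of points with $\hat{y}_i = y_i$, and shows the main limitation of accuracy on an imbalanced test set. The class counts and the trivial majority-class classifier are assumed for the example.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of test points whose predicted class matches the true class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred)

# Illustrative imbalanced test set: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A useless "majority" classifier that always predicts class 0
y_pred = np.zeros(100, dtype=int)

print(accuracy(y_true, y_pred))  # 0.95 -- high score, yet no positive is detected
```

The 95% accuracy here is misleading: the classifier never detects the rare class, which is why accuracy alone is not a sufficient measure on imbalanced data.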

Similarity Measures

In data science, a similarity measure is a way of measuring how related or close data samples are to each other, while a dissimilarity measure tells how distinct the data objects are. These measures are often used in clustering, where similar data samples are grouped into one cluster and dissimilar samples into different ones. They are also used in classification (e.g., KNN), where data objects are labeled based on the similarity of their features, and in identifying outliers that are dissimilar to the other data samples (e.g., anomaly detection). In data mining applications such as clustering, outlier analysis, and nearest-neighbor classification, we need ways to assess how alike or unalike objects are in comparison to one another. For example, a store may want to search for clusters of customer objects, resulting in groups of customers with similar characteristics (e.g., similar income, area of residence, and age). Such information can then be used for marketing.
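As a small illustration (not from the notes), the sketch below computes two common measures: Euclidean distance, a dissimilarity measure where smaller values mean more similar objects, and cosine similarity, where values near 1 mean the vectors point in the same direction. The example vectors a and b are assumptions made for the demonstration.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance: a dissimilarity measure (smaller = more similar)."""
    return np.sqrt(np.sum((x - y) ** 2))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y: 1 = same direction, 0 = orthogonal."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Illustrative vectors: b points in the same direction as a but is twice as long
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])

print(euclidean(a, b))           # 5.0 -- the points are not close in distance
print(cosine_similarity(a, b))   # 1.0 -- but identical in direction
```

Note that the two measures can disagree: a and b are far apart in Euclidean terms yet maximally similar by cosine, which is why the appropriate measure depends on the application (e.g., cosine similarity for text vectors, Euclidean distance for spatial data).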