Publications by Alexander Ng
Data 622 Homework 2: Palmer Penguins LDA, QDA and Naive Bayes
Introduction This assignment analyzes the Palmer Penguins dataset using linear (LDA) and quadratic (QDA) discriminant analysis and a naive Bayes classifier. The dataset contains the physical measurements and traits of penguins collected over a 3-year period (2007-2009) from three closely related species: Gentoo, Chinstrap and Adelie in the Palmer Arc...
Data 622 Homework 3: Initial EDA and Notes
Introduction This assignment analyzes two problems: first, the Palmer Penguin dataset, using the KNN model to predict species; second, a Loan Approval dataset, using Random Forest and Gradient Boosting methods. 1 Penguins and the KNN algorithm Following the data cleaning approach taken in prior assignments, we exclude observations where the featur...
Data 622 HW3 V2 KNN + Tree + RF
Introduction This document discusses analyses of two datasets, the Palmer Penguin dataset and a Loan Approvals dataset, prepared by Group 6. We divide the document into five parts and adopt two key principles in undertaking this analysis. First, the group has developed a system of checks and balances in preparing each model’s output. A primary and...
Data 622 Discussion Week 10 - Is Math Necessary to use SVM?
Data 622 Discussion Week 10 - Is Math Necessary for SVM? Alexander Ng 04/08/2021 Support Vector Machines While this course DATA 622 is mostly practical, we should remember that the algorithms and code work because of a deep theory developed in the 1990s. In some cases, the theory makes algorithms practical. Support vector machines have a deep and beaut...
Data 622 Discussion 11 - Minimum Spanning Trees in Finance
Data 622 Discussion Week 11 - Unsupervised Learning with Minimum Spanning Trees in Finance Alexander Ng 04/13/2021 Introduction One unsupervised learning method used in machine learning is minimum spanning trees (MST). In the financial industry, practitioners have attempted to apply these methods in various applications. This discussion will de...
Municipal Crime in Residential Subdivision - ANG Regression
\(\color{red}{\text{Notes for Group 6}}\) 0.1 Group and Final Project Deadlines Reports Final Drafts: \(\color{red}{\text{May 16, 2021 Sunday 11:00pm EST}}\) Report Merge: May 17, 2021 Monday EST Youtube Recording Prepared and Uploaded: May 19, 2021 Wed Project Due Date: \(\color{red}{\text{Submitted by May 20, 2021 Thursday 11:59pm EST}}\) Pl...
Data 622 HW4 Rough Draft MFA and EDA
Introduction This document discusses analyses of a mental health data set. Section 1 conducts exploratory data analysis and data wrangling on the dataset. We obtain a dataset with imputed values and omitted rows and columns. The author is Alexander Ng. Section 2 contains an analysis using a multiple factor analysis (MFA) which is a generalization...
Municipal Crime in Residential Subdivision Draft 1 Clean Data
\(\color{red}{\text{Notes for Group 6}}\) 0.1 Group and Final Project Deadlines Reports Final Drafts: \(\color{red}{\text{May 16, 2021 Sunday 11:00pm EST}}\) Report Merge: May 17, 2021 Monday EST Youtube Recording Prepared and Uploaded: May 19, 2021 Wed Project Due Date: \(\color{red}{\text{Submitted by May 20, 2021 Thursday 11:59pm EST}}\) Pl...
Predicting Crime Hotspots with Cubist and Random Forests
1 Introduction Crime analysis has made extensive and growing use of quantitative methods and machine learning (ML) techniques over the last two decades (Kounadi et al., 2020; Santos, 2017). One area of application is crime hotspot identification. A crime hotspot may be defined as “an area that has a greater than average number of criminal or disord...
Crime Data EDA - Raleigh NC
1 Introduction This module extracts and describes the Raleigh Crime Incident data to be used for Model Evaluation. Crime incident data is organized by crime_category. Each observation has date, time, crime category and subcategory information, address, and geocoding information. We find that the crime_category of ASSAULT is the most frequent and ...