Publications by Jayesh Gokhale
NOAA Storm Analysis
Analysis of NOAA Storm Data to identify the types of events with the greatest impact on population health and on the economy of the United States. Synopsis: What is meant by "harmful"? On exploring the dataset, we found that FATALITIES and INJURIES are the two variables that capture the term "harmful". Now, the variable BGN_DATE is one variabl...
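The snippet treats FATALITIES and INJURIES as the measures of harm. Below is a minimal base-R sketch of how these two variables could be totalled per event type and ranked. Several parts are assumptions, not taken from the analysis:
- The column name EVTYPE.
- The made-up toy rows.
- The equal 1:1 weighting of fatalities and injuries.

```r
# Made-up rows standing in for the NOAA storm data (not real figures).
# EVTYPE is assumed to be the column holding the event type.
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(5, 1, 2, 10),
  INJURIES   = c(40, 3, 10, 0),
  stringsAsFactors = FALSE
)

# Sum both health-impact variables within each event type
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm, FUN = sum)

# Assumed combined score: fatalities and injuries weighted equally
harm$TOTAL <- harm$FATALITIES + harm$INJURIES

# Rank event types from most to least harmful
harm <- harm[order(harm$TOTAL, decreasing = TRUE), ]
print(harm)
```

With these toy rows, TORNADO ranks first with 7 fatalities and 50 injuries, a total of 57.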
Text Prediction Model Slide Deck
Text Prediction for DS Capstone. Jayesh Gokhale, 5th June 2021. Technologies used: R, quanteda, data.table, Shiny. Dictionary used: qdapDictionaries::GradyAugmented. Corpus Data Cleaning: Issues with Corpus Data. Non-English words: removed based on the dictionary. Numbers and dates: removed every character that is not a letter. Special and Unicode Characters: ...
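The two cleaning rules in the slide deck can be approximated in a few lines of base R:
- Replace every non-letter character.
- Drop words not found in a dictionary.

This is a sketch only. The tiny `dictionary` vector is a hypothetical stand-in for qdapDictionaries::GradyAugmented. The helper `clean_text` is illustrative and is not the author's quanteda-based pipeline.

```r
# Hypothetical mini-dictionary standing in for qdapDictionaries::GradyAugmented
dictionary <- c("the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog")

# Hypothetical helper: lower-case, strip non-letters, keep dictionary words only
clean_text <- function(text, dict) {
  text <- tolower(text)
  # Replace every character that is not a lowercase letter or a space with a space
  text <- gsub("[^a-z ]", " ", text)
  # Split on runs of whitespace and drop any empty tokens
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[words != ""]
  # Keep only words that appear in the dictionary
  words[words %in% dict]
}

cleaned <- clean_text("The quick brown fox jumped 3 times over the lazy dog!!! #2021", dictionary)
print(cleaned)
```

The digits and punctuation disappear in the first step. "jumped" and "times" are dropped in the second step because they are missing from the toy dictionary.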
Text Prediction Model Validation
Model Validation: Let us validate our model on 1000 rows of randomly sampled data, using both the Interpolation and Kneser-Ney Smoothing methods. Validating a text prediction model is tricky. Here is the approach: our Shiny App makes 5 ranked predictions for the next word for each algorithm. If the target word is found in any of the...
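The validation description is cut off. If it counts a test case as a hit whenever the target word appears among the 5 ranked predictions, the resulting score is a top-k accuracy. Below is a minimal sketch under that assumption. The function name `top_k_accuracy` and the toy predictions are hypothetical.

```r
# Hypothetical helper: share of test cases whose target word appears
# among the first k ranked predictions
top_k_accuracy <- function(predictions, targets, k = 5) {
  hits <- vapply(seq_along(targets), function(i) {
    targets[i] %in% head(predictions[[i]], k)
  }, logical(1))
  mean(hits)
}

# Made-up ranked predictions: one character vector of 5 guesses per test case
preds <- list(
  c("the", "a", "to", "of", "and"),
  c("day", "time", "year", "week", "night"),
  c("you", "me", "him", "her", "them")
)
actual <- c("of", "morning", "you")

top_k_accuracy(preds, actual)         # 2 of 3 targets found in the top 5
top_k_accuracy(preds, actual, k = 1)  # only "you" is ranked first
```

Running the same function once per algorithm gives directly comparable hit rates for the Interpolation and Kneser-Ney predictions.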
US Vaccinations Progress
5/14/2021. US Vaccinations Progress. We are trying to map the progress of vaccinations across the United States as of 13th May 2021. The data is collected from Kaggle as two datasets: lat/long data and vaccinations data. These datasets are then merged to draw the maps. Data Exploration: ## date location State lat lng t...
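Merging the lat/long data with the vaccinations data is a join on the state name. Below is a minimal base-R `merge()` sketch, with several assumptions that are not the author's code:
- The join keys `State` and `location` are assumed.
- The column `pct_vaccinated` is hypothetical.
- All values are placeholders.

```r
# Illustrative stand-in for the lat/long dataset (approximate coordinates)
latlong <- data.frame(
  State = c("Texas", "Ohio", "Utah"),
  lat   = c(31.5, 40.3, 39.3),
  lng   = c(-99.3, -82.8, -111.7),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for the vaccinations dataset (placeholder values)
vacc <- data.frame(
  location       = c("Ohio", "Texas", "Utah"),
  pct_vaccinated = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Inner join on the state name; the key column is named differently in each table
merged <- merge(latlong, vacc, by.x = "State", by.y = "location")
print(merged)
```

`merge()` keeps only states present in both tables and sorts the result by the key. Any state spelled differently in the two sources would therefore silently drop out.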
GDP Growth Rate in Plotly (G7 Members)
5/14/2021. Objective: Let us analyze the growth rates of the G7 member economies from 1960 to around 2020 (as per data availability). The G7 members are Canada, France, Germany, Italy, Japan, the UK and the US. The data is taken from Kaggle: GDP annual growth for each country (1960 - 2020). gdp.growth <- read.csv("GDP_annual_growth.csv") g7 <- c("CAN","FRA","DEU","ITA...
Stroke Prediction Dataset Analysis
Stroke Prediction Analysis. Jayesh Gokhale, 5/16/2021. Data: dataset taken from Kaggle (posted by fedesoriano). Categorical variables: gender, ever_married, hypertension, heart_disease, work_type, residence_status, smoking status. Numeric variables: age, avg_glucose_level, bmi. Target variable: stroke (binary: whether the patient had a stroke). Application: Quick “Trial an...
Data Science Capstone - Initial Analysis
Initial Analysis of the Capstone Project (Data Science Specialization). Problem Statement: We have to build a text prediction system based on a provided text corpus. While corpora are available in multiple languages, we shall work in English only. There are three sources, viz. blogs, news and Twitter. Approach: After doing some reading, ...
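A text prediction system of this kind is usually built on n-gram counts from the corpus. The sketch below shows the simplest case in base R: bigram counts with maximum-likelihood ranking and no smoothing. It is not the Interpolation or Kneser-Ney models named in the validation notes. The description of the author's approach is cut off, so this is not presented as that approach. The functions `build_bigrams` and `predict_next` and the toy corpus are hypothetical.

```r
# Hypothetical helper: count bigrams (adjacent word pairs) in a vector of sentences
build_bigrams <- function(sentences) {
  pairs <- lapply(sentences, function(s) {
    w <- strsplit(tolower(s), "\\s+")[[1]]
    if (length(w) < 2) return(NULL)  # no pairs in a one-word sentence
    data.frame(w1 = head(w, -1), w2 = tail(w, -1), stringsAsFactors = FALSE)
  })
  bigrams <- do.call(rbind, pairs)  # rbind skips the NULL entries
  bigrams$n <- 1
  # Sum the 1s to get a count per (w1, w2) pair
  aggregate(n ~ w1 + w2, data = bigrams, FUN = sum)
}

# Hypothetical helper: the k most frequent words that followed `word`,
# ties broken alphabetically; returns character(0) for unseen words
predict_next <- function(counts, word, k = 3) {
  cand <- counts[counts$w1 == tolower(word), ]
  cand <- cand[order(-cand$n, cand$w2), ]
  head(as.character(cand$w2), k)
}

corpus <- c("I love data science", "I love R", "I like data", "we love data")
counts <- build_bigrams(corpus)
predict_next(counts, "love")  # "data" (seen twice), then "r"
predict_next(counts, "I")     # "love" (seen twice), then "like"
```

A smoothed model such as Kneser-Ney would also assign probability to unseen continuations. This raw-count version cannot do that.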