Publications by Umer Farooq
Logistic regression part 1
DATA 606 Data Project Proposal. Umer Farooq, 2023-04-09. library(tidyverse) library(psych) Data Preparation: The data set I found was already in tabular form, stored in a GitHub repository. It required some wrangling and transformation, which I will include in the final project report. Over here I...
3108 sym 4 img 1 tbl
Sentiment Analysis
Introduction: In this file we will work on sentiment analysis. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral according t...
6160 sym R (5785 sym/43 pcs) 12 img 4 tbl
Inference for numerical data
Getting Started Load packages In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. library(tidyverse) library(openintro) library(infer) The data Ev...
8919 sym 5 img
Working with NYT API
Overview: This session covers how to connect to the New York Times (NYT) API through RStudio and load the data into a data frame. First we have to create an API key. To do that, we sign up with NYT using the link: https://developer.nytimes.com/accounts/create; after signing up, we can create the API key using ...
1779 sym R (1057 sym/7 pcs) 1 img 1 tbl
Most Important Skill of Data Scientist
Overview: In this project we worked collaboratively to identify some of the most in-demand data scientist skills in the market. We found a large data set on Kaggle which lists the important skills and responses from people working in the field. The data set contained 44 variables spread across 296 columns with ...
7333 sym R (16973 sym/32 pcs) 21 img 1 tbl
Inference on Proportion
Getting Started Load packages In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. library(tidyverse) library(openintro) library(infer) The data Yo...
9737 sym Python (8161 sym/37 pcs) 5 img
Sampling Distributions
In this lab, you will investigate the ways in which the statistics from a random sample of data can serve as point estimates for population parameters. We’re interested in formulating a sampling distribution of our estimate in order to learn about the properties of the estimate, such as its distribution. Setting a seed: We will take some ran...
13676 sym 5 img
Confidence Intervals
If you have access to data on an entire population, say the opinion of every adult in the United States on whether or not they think climate change is affecting their local community, it’s straightforward to answer questions like, “What percent of US adults think climate change is affecting their local community?”. Similarly, if you had d...
14006 sym 2 img 1 tbl
Working with HTML, XML and JSON
Assignment week 7: Instructions: Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separa...
3964 sym R (3284 sym/13 pcs) 3 tbl
Distributions
In this lab, you’ll investigate the probability distribution that is most central to statistics: the normal distribution. If you are confident that your data are nearly normal, that opens the door to many powerful statistical methods. Here we’ll use the graphical tools of R to assess the normality of our data and also learn how to generate ...
13188 sym 18 img