Publications by Rashad Long

NYT Web API

24.03.2024

Author: Rashad Long. Introduction: This project aims to construct a data frame containing the current New York Times Best Sellers List for the ‘Combined Print & E-Book Fiction’ category. The data will be retrieved by leveraging the New York Times Books API. Imported httr2 and jsonlite: library(httr2); library(jsonlite); library(dpl...

1545 sym Python (7782 sym/7 pcs)

Super Bowl Data

24.03.2024

Author: Rashad Long. Import Libraries: library(tidyverse); library(rvest); library(gtools); library(stringi). Scrape Data: Use read_html to scrape the data from the website. url <- "https://www.espn.com/nfl/superbowl/history/winners"; webpage <- read_html(url). Extract Data: Use the html_nodes and html_table to extract HTML table e...

511 sym

Inference for categorical data

23.03.2024

Getting Started. Load packages: In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages: library(tidyverse); library(openintro); library(infer). The data: Yo...

10381 sym Python (4742 sym/25 pcs) 3 img

Project 3 - Data Science Skills

18.03.2024

Introduction: This project aims to establish a quantitative assessment of the relative value of specific skills for data science professionals. We will achieve this by analyzing data extracted from job postings on relevant job boards. The analysis will focus on two key aspects of data scientist job postings: advertised salary and the frequency o...

1912 sym R (8445 sym/15 pcs) 4 img 1 tbl

Working with XML and JSON in R

10.03.2024

Author: Rashad Long. Load required packages: library(rvest) (for working with HTML), library(xml2) (for working with XML), library(jsonlite) (for working with JSON), library(httr) (for working with HTTP), library(tidyverse), library(XML). Read HTML Table, using rvest: df_html <- rvest::read_html("h...

288 sym

Foundations for statistical inference - Confidence intervals

10.03.2024

If you have access to data on an entire population, say the opinion of every adult in the United States on whether or not they think climate change is affecting their local community, it’s straightforward to answer questions like, “What percent of US adults think climate change is affecting their local community?”. Similarly, if you had d...

12086 sym Python (13979 sym/30 pcs) 7 img 1 tbl

Foundations for statistical inference - Sampling distributions

09.03.2024

In this lab, you will investigate the ways in which the statistics from a random sample of data can serve as point estimates for population parameters. We’re interested in formulating a sampling distribution of our estimate in order to learn about the properties of the estimate, such as its distribution. Setting a seed: We will take some ran...

13024 sym 6 img

Fast Food Distribution

04.03.2024

In this lab, you’ll investigate the probability distribution that is most central to statistics: the normal distribution. If you are confident that your data are nearly normal, that opens the door to many powerful statistical methods. Here we’ll use the graphical tools of R to assess the normality of our data and also learn how to generate ...

11374 sym Python (7393 sym/23 pcs) 10 img

FIFA Player Data

03.03.2024

DATA-607 Project 2. IS 607 – Project 2. The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to: (1) Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that...

3311 sym R (13348 sym/14 pcs) 1 img

Language Diversity Dataset

03.03.2024

Authors: Rashad Long, Biyag Dukuray. Original Dataset. Data Source: The Language Diversity Dataset, obtained from the untidydata repository. Justification for Reshaping: The dataset is currently in an “untidy” format, meaning it is not in a wide format suitable for extensive analysis. Reshaping the data into a wide form...

1791 sym Python (4291 sym/11 pcs) 1 img 1 tbl