Publications by Shefali C.

Airbnb Data Cleaning Report

22.06.2024

AirBnB Data Cleaning. Contents: Introduction; Summary of cleaning steps; Notes on reading the dataset; Why has the col_types argument been used in read_csv()?; Each cleaning step in detail: 1. Remove duplicate rows, 2. Check missing values, 3. Clean column names, 4. Convert columns ...

9852 sym Python (12001 sym/58 pcs) 4 tbl

Data Cleaning- Building Permits Data

11.04.2024

San Francisco Building Permits Data Cleaning. Shefali C., 2024-04-10. The dataset is taken from Kaggle. It covers all types of structural permits from January 1, 2013 to February 25, 2018. The data includes details on application/permit numbers, job addresses, supervisorial districts, and the current status of the applications. ...

4292 sym Python (20389 sym/66 pcs) 1 tbl

Diamonds Dataset- Detailed Descriptive Analysis

21.02.2024

library(tidyverse)
# to visualize missing values
library(visdat)
library(patchwork)
# to build treemap
library(treemap)
# to create text grobs
library(grid)
# to stop scientific notation on axes
library(scales)
# to build marginal distribution plots
library(ggExtra)
This notebook contains a list of the most popular data distribution charts used ...

9896 sym 21 img

ggplot2: usage of "expand" in scale_(x|y)_* functions

19.02.2024

This notebook explains the expand argument of the scale_(x|y)_continuous() and scale_(x|y)_discrete() functions. In both, expand controls the padding between the plotted data and the x and y axes.
library(tidyverse)
# to use font of choice
library(showtext)
# to use emoji
library(emo)
# to stitch some graphs together
library(patchwork)
I like...

3973 sym 8 img

World Bank Data Visualization

05.10.2023

This notebook contains visualizations of a few World Bank indicators for the top 6 economies of the world. It also contains a few plots of global emissions data (\(CO_2\) and greenhouse gases).
library(tidyverse)
library(ggthemes)
library(lubridate)
library(plotly)
library(patchwork)
library(gghighlight)
# read csv file
all_data <- read_c...

5005 sym Python (24002 sym/40 pcs) 6 img

Data Cleaning- Audible Dataset

05.10.2023

Introduction. This notebook answers some of the most common “How to ...” questions that pop up during data cleaning. I’ve also cleaned this dataset in Python; you may check out the Python version on Kaggle. I hope both versions of this work help beginners understand the corresponding functions in Python and R. The Audible Dataset used...

9579 sym Python (21235 sym/72 pcs)