Publications by AbdulMajedRaja RS
How to Automate EDA with DataExplorer in R
EDA (Exploratory Data Analysis) is one of the key steps in any Data Science project. The better the EDA, the better the feature engineering can be. From modelling to communication, EDA has many hidden benefits that are rarely emphasised when teaching Data Science to beginners. The Problem That said, ...
4111 sym R (2475 sym/15 pcs) 20 img
3 tidyverse tricks for most commonly used Excel Features
In this post, we’re simply going to see 3 tricks that could help improve your tooling using {tidyverse}. Create a difference variable between the current value and the next value This is also known as lead and lag. Especially in a time series dataset, this variable becomes very important in feature engineering. In Excel, this is simply done ...
2208 sym R (1748 sym/3 pcs)
How to do Topic Extraction from Customer Reviews in R
Topic Extraction is an integral part of IE (Information Extraction) from a corpus of text, used to understand the key things the corpus is talking about. While this can be achieved naively using unigrams and bigrams, this post looks at a more intelligent way of doing it, with an algorithm called RAKE. Udpipe udpipe is ...
3437 sym R (6167 sym/10 pcs) 2 img
Combining the power of R and Python with reticulate
R + Py In the world of R vs Python fights, this is a simple (one could also call it naive) attempt to show how we can combine the power of Python with R and create a new superpower. Like this one, if you have watched The Incredibles before! About this Dataset This dataset contains a bunch of tweets that came with the tag #JustDoIt after Nike r...
1485 sym R (1477 sym/5 pcs) 6 img
How to scrape Zomato Restaurants Data in R
Zomato is a popular restaurant listing website in India (similar to Yelp), and people are always interested in seeing how to download or scrape Zomato restaurant data for Data Science and visualizations. In this post, we’ll learn how to scrape / download Zomato Restaurants (Buffets) data using R. Also, I hope this post will serve as a basic web...
2821 sym R (1274 sym/8 pcs) 4 img
How to do Tamil Text Analysis & NLP in R
udpipe is a beautiful R package for Text Analytics and NLP that also helps in Topic Extraction. While most Text Analytics resources online are only about English, this post picks a different language, Tamil, and fortunately udpipe has a Tamil Language Model. Loading library(udpipe) Tamil Text Below is part extracted from a Tamil Movie Review...
2240 sym R (6792 sym/7 pcs) 4 img
Regex Problem? Here’s an R package that will write Regex for you
REGEX is that thing that scares everyone almost all the time. Hence, finding an alternative is always very helpful and peaceful too. Here’s a nice R package that helps us do REGEX without knowing REGEX. REGEX This is the REGEX pattern to test the validity of a URL: ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ A typical regular expression contains ...
3597 sym R (749 sym/7 pcs)
Hindi and Other Languages in India based on 2001 census
India is the world’s largest democracy and, as it goes, also a highly diverse place. This is my attempt to see how “Hindi” and other languages are spoken in India. In this post, we’ll see how to collect data for this puzzle directly from Wikipedia, and how to visualize it so that the insight stands out. Data Wikipe...
1961 sym R (7099 sym/4 pcs) 4 img
Functional Programming + Iterative Web Scraping in R
Web Scraping in R Web scraping needs no introduction among data enthusiasts. It’s one of the most viable and most essential ways of collecting data when the data itself isn’t available. Knowing web scraping comes in very handy when you are short of data, in need of macroeconomic indicators, or simply have no data available for a particular pr...
4439 sym R (627 sym/2 pcs) 6 img
Handling Missing Values in R using tidyr
In this post, we’ll see 3 functions from tidyr that are useful for handling Missing Values (NAs) in a dataset. Please note: this post isn’t going to be about Missing Value Imputation. tidyr According to the documentation of tidyr, The goal of tidyr is to help you create tidy data. Tidy data is data where: + Every column is variable. + Ever...
2155 sym R (1973 sym/7 pcs)