Publications by Shaya Engelman
Document
library(tidyverse) ## Warning: package 'ggplot2' was built under R version 4.3.3 ## Warning: package 'stringr' was built under R version 4.3.2 ## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ── ## ✔ dplyr 1.1.3 ✔ readr 2.1.4 ## ✔ forcats 1...
25305 sym R (140256 sym/203 pcs) 8 img
Document
Title: Intriguing World of Bimodal Distributions: Peaks and Valleys in Data Introduction: In the realm of statistics, we often encounter distributions that follow a single peak, neatly symmetrical and bell-shaped. However, there exists a phenomenon known as the bimodal distribution, where the data exhibits not one, but two distinct peaks. These...
3798 sym R (834 sym/7 pcs) 1 img
Document
Title: Balancing Act: Tackling Imbalanced Target Variables in Logistic Regression with R In the realm of predictive modeling, logistic regression serves as a powerful tool for analyzing the relationship between independent variables and a binary outcome. However, when faced with imbalanced target variables, where one class is significantly unde...
3167 sym R (5219 sym/20 pcs) 1 img
Document
Title: Unleashing the Power of Transformations in R for Enhanced Data Analysis In the realm of data analysis, transforming variables can often unlock valuable insights and improve model performance. One powerful technique in this regard is power transformations. In this blog post, we’ll explore what power transformations are, why they’re us...
2562 sym 1 img
Document
Title: Unveiling Genetic Associations: Using the Chi-Squared Test for Gene Identification Introduction: In the era of genomics and personalized medicine, understanding the role of genes in health and disease is paramount. Genetic association studies play a crucial role in identifying genes that may be linked to various traits or diseases. Among...
3978 sym
Document
Title: Deciphering Missing Data Patterns: MCAR, MAR, and MNAR Introduction: In the realm of data analysis, missing data is a common challenge that researchers and analysts encounter. The way missing data is handled can significantly impact the validity and reliability of study findings. To effectively deal with missing data, it’s crucial to u...
4077 sym
Week 14 Discussion
In Exercises 21 – 24, write out the first 5 terms of the Binomial series with the given k-value. 21. \(k = \frac{1}{2}\): The Binomial series with \(k = \frac{1}{2}\) expands as: \[ (1 + x)^{\frac{1}{2}} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \frac{5}{128}x^4 + \cdots \] So, the first five terms are: \[ 1 + \frac{1}{2}x - \...
1216 sym
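For reference, the coefficients quoted in the Week 14 preview above can be checked against the general Binomial series. This is a short worked sketch, not necessarily the steps taken in the full post:
\[ (1 + x)^{k} = 1 + kx + \frac{k(k-1)}{2!}x^2 + \frac{k(k-1)(k-2)}{3!}x^3 + \frac{k(k-1)(k-2)(k-3)}{4!}x^4 + \cdots \]
With \(k = \frac{1}{2}\):
\[ \frac{k(k-1)}{2!} = \frac{\frac{1}{2}\cdot\left(-\frac{1}{2}\right)}{2} = -\frac{1}{8}, \qquad \frac{k(k-1)(k-2)}{3!} = \frac{\frac{1}{2}\cdot\left(-\frac{1}{2}\right)\cdot\left(-\frac{3}{2}\right)}{6} = \frac{1}{16}, \qquad \frac{k(k-1)(k-2)(k-3)}{4!} = \frac{\frac{1}{2}\cdot\left(-\frac{1}{2}\right)\cdot\left(-\frac{3}{2}\right)\cdot\left(-\frac{5}{2}\right)}{24} = -\frac{5}{128} \]
These agree term by term with the expansion shown in the preview.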
Document
Use differentials to approximate propagated error. 31. A set of plastic spheres is to be made with a diameter of 1 cm. If the manufacturing process is accurate to 1 mm, what is the propagated error in volume of the spheres? The volume \(V\) of a sphere is given by the formula: \[V = \frac{4}{3}\pi r^3\] The differential \(dV\) of the volume \(V...
767 sym
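The preview above is cut off before the calculation, so the following is only a sketch of how the differential could be evaluated, assuming the 1 mm tolerance applies to the diameter (so \(dr = \pm 0.05\) cm at \(r = 0.5\) cm). The full post may read the tolerance differently.
\[ dV = 4\pi r^2 \, dr = 4\pi (0.5)^2 (\pm 0.05) = \pm 0.05\pi \approx \pm 0.157 \text{ cm}^3 \]
If the tolerance were instead taken on the radius (\(dr = \pm 0.1\) cm), the same formula would give \(dV = \pm 0.1\pi \approx \pm 0.314 \text{ cm}^3\).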
Document
Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not? # Load libraries library(tidyverse) ## ...
2743 sym R (4109 sym/17 pcs) 1 img
Discussion 11
I used this data for a project in the past. For that project, I ended up using a polynomial model to get the best results. First, though, I tried using a linear model of the log-transformed data points. I didn’t end up using it because it didn’t fit. I will illustrate some of the thought process there. library(tidyverse) ## Warning: package '...
786 sym R (1885 sym/7 pcs) 1 img