Publications by Joel Predix

Data Visualization Project

04.12.2022

#BIOPICS #Create data frame using dplyr that shows year, count of males and females in biopics for that year df <- biopics %>% select(year_release, subject_sex) %>% group_by(year_release) %>% filter(year_release >= 1915) %>% summarise(female_count = sum(subject_sex == 'Female'), male_count = sum(subject_sex == 'Male'))%>% arrange(year_r...

1730 sym Python (2548 sym/3 pcs) 1 img

Portfolio: Working with VCF files 3: Removing samples with many NAs using for loops

06.12.2022

Learning objectives Review the problem of missing data in SNP datasets Introduce the concept of researcher degrees of freedom Review how to locate NAs in R using is.na() and which() Outline the use of for() loops in R to carry out repetitive tasks. Review the use of regular expressions to clean text data. Introduction The data we are using to p...

11098 sym R (10162 sym/94 pcs) 3 img

Mean imputation of missing data in R

06.12.2022

Learning objectives All of this material will appear on the exam. Take notes on the workflow, functions, and concepts. Main objectives By the end of this lesson you will know how to … Identify all of the missing values in a column of a dataframe or vector Replace all the NAs in a column with a new value, such as the mean. Know how a for() lo...

7433 sym 2 img
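
The objectives in the entry above (find the NAs in a column, then replace them with the column mean inside a for() loop) can be illustrated with a minimal sketch. The data frame and its column names (snp1, snp2) are made up for illustration and are not taken from the lesson.

```r
# Toy data frame with missing values; names and values are hypothetical.
df <- data.frame(snp1 = c(0, 1, NA, 2),
                 snp2 = c(NA, 1, 1, NA))

# Loop over the columns one at a time.
for (i in seq_len(ncol(df))) {
  # Row indices of the missing values in column i
  na_rows <- which(is.na(df[, i]))

  # Mean of the observed (non-missing) values in column i
  col_mean <- mean(df[, i], na.rm = TRUE)

  # Overwrite the NAs with the column mean (no-op if there are none)
  df[na_rows, i] <- col_mean
}

df
```

Using which(is.na(...)) keeps the row indices available for inspection before anything is overwritten, which is helpful when checking how much of each column was imputed.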

PCA Analysis Case Study - Bird Species Morphology

06.12.2022

Learning objectives All of this material will appear on the exam. Take notes on the workflow, functions, and concepts. Main objectives Work through a full analysis of a dataset with PCA Understand the connection between scree plots and the amount of variation explained by each PC Learn how to make a scree plot in terms of explained variation an...

6500 sym 4 img

Worked Example: PCA on SNPs data from a vcf file Part 1 - Data Preparation

08.12.2022

Introduction In this worked example you will replicate a PCA on a published dataset. The example is split into 2 Parts: Part 1: Data Preparation (this file) Part 2: Data analysis with PCA In this Data Preparation phase, you will do the following things: Load the SNP genotypes in .vcf format (vcfR::read.vcfR()) Extract the genotypes into an R-c...

3413 sym R (7345 sym/31 pcs) 1 img

Worked Example: PCA on SNPs data from a vcf file Part 2 - Data Analysis

09.12.2022

Introduction The example is split into 2 Parts: Part 1: Data Preparation Part 2: Data analysis with PCA (this file) Part 1 must be completed first to create a file, SNPs_cleaned.csv, that has been completely prepared for analysis. Now in Part 2, you will analyze the data with PCA. The steps here will be: Center the data (scale()) Run a PCA ana...

2820 sym R (2165 sym/22 pcs) 4 img
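
The first steps named in the entry above (center the data with scale(), then run a PCA) might look roughly like the sketch below. It rests on assumptions: the matrix is simulated rather than read from SNPs_cleaned.csv, and prcomp() stands in for whichever PCA function the worked example actually uses, since the preview is cut off before naming it.

```r
set.seed(1)

# Simulated genotype-like matrix: 20 samples x 5 SNPs coded 0, 1, 2.
# This is placeholder data, not the published dataset.
snps <- matrix(sample(0:2, 100, replace = TRUE), nrow = 20, ncol = 5)

# Center each column on its mean; scale = FALSE leaves variances unchanged.
snps_centered <- scale(snps, center = TRUE, scale = FALSE)

# Run the PCA. The data are already centered, so center = FALSE here.
pca <- prcomp(snps_centered, center = FALSE)

# Percent of total variance explained by each principal component,
# i.e. the bar heights of a scree plot.
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 1)
```

The percentages sum to 100 and are in decreasing order, so the first few values show how much of the variation the leading components capture.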

Final Report: Analysis of 1000 Genomes Data with PCA

14.12.2022

Introduction This report summarizes the workflow and results of an analysis of SNPs from the 1000 Genomes Project. Data preparation Obtaining and loading data Single Nucleotide Polymorphism (SNP) data in VCF format were obtained from the 1000 Genomes Project. SNPs were downloaded using the Ensembl Data Slicer from chromosome 10 betwe...

7041 sym R (5988 sym/25 pcs) 7 img

Data Preparation: Analysis of 1000 Genomes Data with PCA

14.12.2022

Data Preparation Preliminaries Load the vcfR and other packages with library() library(vcfR) library(vegan) library(ggplot2) library(ggpubr) Make sure that the working directory is set to the location of the SNP file setwd("~/BIOSC_1540/BIOSC1540Project") getwd() ## [1] "/Users/joelpredix/BIOSC_1540/BIOSC1540Project" list.files(pattern = "vc...

1733 sym R (3080 sym/22 pcs)