Publications by Arhat Pradhan
Portfolio: Working with VCF files 2: Removing invariant columns
Learning objectives: This lesson introduces the concept of invariant columns and why they should be removed. It also provides a function to remove them. All of this material will appear on the exam. Take notes on the workflow, functions, and concepts. Main objectives: By the end of this lesson you will understand what can lead to a column of ...
Portfolio: Working with VCF files 3: Removing samples with many NAs
Learning objectives: Review the problem of missing data in SNP datasets. Introduce the concept of researcher degrees of freedom. Review how to locate NAs in R using is.na() and which(). Outline the use of for() loops in R to carry out repetitive tasks. Review the use of regular expressions to clean text data. Introduction: The data we are using t...
Portfolio: PCA Analysis Case Study - Bird Species Morphology
Learning objectives: All of this material will appear on the exam. Take notes on the workflow, functions, and concepts. Main objectives: Work through a full analysis of a dataset with PCA. Understand the connection between scree plots and the amount of variation explained by each PC. Learn how to make a scree plot in terms of explained variation ...
Working with VCF files 4: Imputation of missing data
Learning objectives: All of this material will appear on the exam. Take notes on the workflow, functions, and concepts. Main objectives: By the end of this lesson you will know how to: Identify all of the missing values in a column of a dataframe or vector. Replace all the NAs in a column with a new value, such as the mean. Know how a for()...
PCA on SNPs data from a vcf file Part 1 - Data Preparation
Introduction: In this worked example you will replicate a PCA on a published dataset. The example is split into 2 parts: Part 1: Data Preparation (this file); Part 2: Data analysis with PCA. In this Data Preparation phase, you will do the following things: Load the SNP genotypes in .vcf format (vcfR::read.vcfR()). Extract the genotypes into an R...
PCA on SNPs data from a vcf file Part 2 - Data Analysis
Introduction: The example is split into 2 parts: Part 1: Data Preparation; Part 2: Data analysis with PCA (this file). Part 1 must be completed first to create a file, SNPs_cleaned.csv, that has been completely prepared for analysis. Now in Part 2, you will analyze the data with PCA. The steps here will be: Center the data (scale()). Run a PCA ...
Final Report File
Introduction: This report summarizes the workflow and results of an analysis of SNPs from the 1000 Genomes Project. Data preparation: Obtaining and loading data: Single nucleotide polymorphism (SNP) data in VCF format were obtained from the 1000 Genomes Project. SNPs were downloaded using the Ensembl Data Slicer from chromosome 5 betw...
Final RMD File
R Markdown: This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks with...