Publications by Hersh Parikh
Portfolio: Downloading DNA sequences as FASTA files in R
This is a modification of “DNA Sequence Statistics” from Avril Coghlan’s A Little Book of R for Bioinformatics. Most of the text and code was originally written by Dr. Coghlan and distributed under the Creative Commons 3.0 license. NOTE: There is some redundancy in this current draft that needs to be eliminated. Functions library() help...
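The sequence statistics this entry covers (base counts and GC content) can be sketched in base R. This is a minimal illustration, not Dr. Coghlan's code. It assumes the sequence is already a single character string, and the helper names (`as_bases`, `base_counts`, `gc_content`) and the toy sequence are hypothetical. seqinr's own functions may treat ambiguous bases differently.

```r
# Minimal base-R sketch of two DNA sequence statistics.
# Helper names and the toy sequence are hypothetical, for illustration only.

# Split a DNA string into a vector of single lowercase bases
as_bases <- function(dna) {
  strsplit(tolower(dna), split = "")[[1]]
}

# Count a, c, g, t; bases absent from the sequence get 0,
# and other characters (e.g. n) are left out of the table
base_counts <- function(dna) {
  table(factor(as_bases(dna), levels = c("a", "c", "g", "t")))
}

# Fraction of characters that are g or c
# (every character, including any n, counts toward the denominator)
gc_content <- function(dna) {
  bases <- as_bases(dna)
  mean(bases %in% c("g", "c"))
}

toy_seq <- "ATGCGCATTA"
base_counts(toy_seq)  # a = 3, c = 2, g = 2, t = 3
gc_content(toy_seq)   # 0.4
```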
Using Dot Plots in R to Investigate Sequence Repeats
In this exercise we’ll look at a sequence with known tandem repeats. We’ll load the data, explore it in R, then use the dotPlot() function to make various dot plots and see how changing the settings of dotPlot() helps make repeat patterns stand out. Add the necessary code to make this script functional. Preliminaries: Load packages library(seqinr...
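The effect of a window size and a match threshold on a dot plot can be sketched in base R. This is not seqinr's dotPlot() implementation. It is a hypothetical stand-in (`dot_matrix`, `plot_dots`) that uses the same idea. A cell is marked when the window starting at position i of one sequence and the window starting at position j of the other agree at `nmatch` or more positions. The window always advances one position at a time.

```r
# Base-R sketch of a windowed dot plot (hypothetical helpers, not seqinr).
# Cell [i, j] is TRUE when the windows of length `wsize` starting at
# position i of seq1 and position j of seq2 share >= `nmatch` identical positions.
dot_matrix <- function(seq1, seq2, wsize = 1, nmatch = 1) {
  s1 <- strsplit(toupper(seq1), split = "")[[1]]
  s2 <- strsplit(toupper(seq2), split = "")[[1]]
  stopifnot(wsize >= 1, nmatch <= wsize,
            length(s1) >= wsize, length(s2) >= wsize)
  n1 <- length(s1) - wsize + 1   # number of windows in seq1 (step of 1)
  n2 <- length(s2) - wsize + 1   # number of windows in seq2 (step of 1)
  m <- matrix(FALSE, nrow = n1, ncol = n2)
  offsets <- 0:(wsize - 1)
  for (i in seq_len(n1)) {
    w1 <- s1[i + offsets]
    for (j in seq_len(n2)) {
      m[i, j] <- sum(w1 == s2[j + offsets]) >= nmatch
    }
  }
  m
}

# Draw one square per matching window pair
plot_dots <- function(m, ...) {
  hits <- which(m, arr.ind = TRUE)
  plot(hits[, "row"], hits[, "col"], pch = 15, cex = 0.6,
       xlim = c(1, nrow(m)), ylim = c(1, ncol(m)),
       xlab = "window start in seq1", ylab = "window start in seq2", ...)
}

# Self-comparison of a toy sequence with a tandem repeat (hypothetical data)
rep_seq <- "ACGTACGTACGTTTGCA"
plot_dots(dot_matrix(rep_seq, rep_seq, wsize = 1, nmatch = 1), main = "wsize = 1")
plot_dots(dot_matrix(rep_seq, rep_seq, wsize = 4, nmatch = 4), main = "wsize = 4")
```

With a window of 1, every shared letter produces a dot, which buries the signal. A longer window with a strict threshold removes most chance matches, so the repeat copies stand out as lines parallel to the main diagonal.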
Introduction to Dot Plots in R
Sequence dotplots in R. By: Avril Coghlan. Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0). NOTE: I’ve added some new material that is rather terse and lacks explication. Good sources of more info: https://omicstutorials.com/interpreting-dot-plot-bioinformatics-with-an-example/ http://r...
Investigating Your Focal Gene for the Presence of Repeats
Change the XXXXX in the title to your gene name. Change the names and text appropriately to reflect your gene / protein. Add the necessary code to make this script functional. Download the PROTEIN sequence of your gene. Adapting the code below, make 2 grids of 4 plots (8 plots total) exploring different settings for window size and the match thre...
Adjusting Dot Plot Settings in R to Investigate Sequence Repeats in Shroom
Preliminaries: Load packages library(seqinr) library(rentrez) library(compbio4all) library(Biostrings)...
Portfolio: Pie graphs part 1
Introduction: These data were collected by Alice B. Popejoy and Stephanie M. Fullerton using the genome-wide association studies (GWAS) Catalog, the most comprehensive, accessible summary of human genetic association research. The process was repeated in 2016 because of the shrinking percentage of European participa...
Portfolio: Pairwise Alignment
Global protein alignments in R. By: Avril Coghlan. Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0). Preliminaries: library(compbio4all) library(Biostrings) Download sequences: As we did in the previous lesson on dotplots, we’ll look at two sequences. # Download ## sequence 1: NP_950252....
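Global alignment asks for the best-scoring alignment of two sequences over their entire lengths. The Needleman-Wunsch dynamic program below is a minimal base-R sketch of that idea. It is not how Biostrings::pairwiseAlignment() is configured in the entry. Instead of a substitution matrix such as BLOSUM50 and affine gap penalties, it assumes a simple match/mismatch score and a linear gap penalty. The function name `nw_score` and its default scores are hypothetical.

```r
# Needleman-Wunsch global alignment score (sketch with simple scoring).
# Assumed scheme: +1 match, -1 mismatch, -2 per gap position (linear gaps).
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, split = "")[[1]]
  y <- strsplit(b, split = "")[[1]]
  n <- length(x)
  m <- length(y)
  # dp[i + 1, j + 1] = best score for aligning x[1..i] with y[1..j]
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)
  dp[, 1] <- gap * (0:n)   # prefix of x aligned entirely against gaps
  dp[1, ] <- gap * (0:m)   # prefix of y aligned entirely against gaps
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      dp[i + 1, j + 1] <- max(dp[i, j] + s,        # pair x[i] with y[j]
                              dp[i, j + 1] + gap,  # x[i] against a gap
                              dp[i + 1, j] + gap)  # y[j] against a gap
    }
  }
  dp[n + 1, m + 1]   # score of the best alignment of the full sequences
}

nw_score("GATTACA", "GATTACA")  # 7: seven matches, no gaps
nw_score("ACGT", "AGT")         # 1: ACGT vs A-GT, three matches and one gap
```

Because the last cell covers both full sequences, end gaps are penalized like any other gap, which is what makes the alignment global rather than local.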
Portfolio: Testing Google Sheets Access
The goal of this exercise is to make you familiar with how to download data from Google Sheets and to briefly review some key R functions and coding concepts. We’ll do the following: download a list of RefSeq accessions from a Google sheet; remove the NAs using na.omit(); remove all but one isoform using duplicated(). Packages...
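The two cleaning steps named in this entry, na.omit() and duplicated(), can be shown on a small table. This is a sketch under stated assumptions. The data frame, its column names (`gene`, `accession`), and the accession labels are hypothetical placeholders, not the exercise's Google Sheet.

```r
# Hypothetical table: one row per record; some genes appear more than
# once (isoforms) and one row is missing its accession.
genes <- data.frame(
  gene      = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneD"),
  accession = c("acc_A1", "acc_A2", NA, "acc_C1", "acc_C2", "acc_D1"),
  stringsAsFactors = FALSE
)

# Step 1: drop every row that contains an NA (here, the geneB row)
genes_complete <- na.omit(genes)

# Step 2: duplicated() is TRUE for each repeat of a value already seen
# earlier in the vector, so negating it keeps the first isoform of each gene
genes_one_isoform <- genes_complete[!duplicated(genes_complete$gene), ]

genes_one_isoform
```

Which isoform survives depends on row order, since duplicated() keeps the first occurrence. Sorting the table first (or using `fromLast = TRUE`) changes which copy is kept.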
Portfolio: Downloading, Cleaning, and Aligning Data
The goal of this exercise is to make you familiar with how to download data from Google Sheets and to briefly review some key R functions and coding concepts. We’ll do the following: download a list of RefSeq accessions from a Google sheet; remove the NAs; remove all but one isoform using duplicated(). Packages ## Google sheets...
Portfolio: Predicting Amino Acid Chemistry Using Regression Models
Key vocab: proteinogenic amino acids; regression model / line of best fit; pI; confidence intervals (CI); confidence ellipse; correlation coefficient; Selenocysteine and Pyrrolysine; re-coding stop codons; y = m*x + b; slope; intercept. Key functions / packages: ggpubr, pander, lm(), coef(), cor(), round(). Predict pI for Selenocysteine and Pyrrolysine Amino...
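The base-R functions listed for this entry (lm(), coef(), cor(), round()) fit together as follows. This minimal sketch uses synthetic data, not amino acid measurements. It draws points around a known line, y = 2x + 1, so you can check that the fitted intercept (b) and slope (m) land near those values. It then uses predict() to estimate y for x values outside the data, analogous to predicting pI for an amino acid not in the training set.

```r
# Synthetic data: the line y = 2x + 1 plus random noise (not real pI data)
set.seed(1)
x <- 1:20
y <- 2 * x + 1 + rnorm(length(x), sd = 0.5)
dat <- data.frame(x = x, y = y)

# Fit the regression model / line of best fit
fit <- lm(y ~ x, data = dat)

# coef() returns the intercept b and the slope m of y = m*x + b
round(coef(fit), 2)

# Pearson correlation coefficient between x and y
round(cor(dat$x, dat$y), 3)

# 95% confidence intervals (CI) for the intercept and slope
confint(fit)

# Predict y for new x values from the fitted line
predict(fit, newdata = data.frame(x = c(25, 30)))
```

The exact printed numbers depend on the random draw, so only the approximate slope and intercept should be expected to match the generating line.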