Publications by Hersh Parikh

Portfolio: Downloading DNA sequences as FASTA files in R

17.10.2021

This is a modification of “DNA Sequence Statistics” from Avril Coghlan’s A little book of R for bioinformatics. Most of the text and code were originally written by Dr. Coghlan and distributed under the Creative Commons 3.0 license. NOTE: There is some redundancy in this current draft that needs to be eliminated. Functions library() help...

13509 sym R (2553 sym/43 pcs)

Using Dot Plots in R to Investigate Sequence Repeats

28.10.2021

In this exercise we’ll look at a sequence with known tandem repeats. We’ll load the data, explore it in R, then use the dotPlot() function to make various dotplots to see how changing the settings for dotPlot() helps make repeat patterns stand out. Add the necessary code to make this script functional. Preliminaries Load packages library(seqinr...

1534 sym R (6837 sym/57 pcs) 12 img

Introduction to Dot Plots in R

28.10.2021

Sequence dotplots in R By: Avril Coghlan. Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0). NOTE: I’ve added some new material that is rather terse and lacks explication. Good sources of more info: https://omicstutorials.com/interpreting-dot-plot-bioinformatics-with-an-example/ http://r...

4087 sym R (1770 sym/15 pcs) 8 img

Investigating Your Focal Gene for the Presence of Repeats

28.10.2021

Change the XXXXX of the title to your gene name. Change the names and text appropriately to reflect your gene / protein. Add the necessary code to make this script functional. Download the PROTEIN sequence of your gene. Adapting the code below, make 2 grids of 4 plots (8 plots total) exploring different settings for window size and the match thre...

1608 sym R (6064 sym/35 pcs) 5 img

Adjusting Dot Plot Setting in R to Investigate Sequence Repeats in Shroom

28.10.2021

Preliminaries Load packages library(seqinr) library(rentrez) library(compbio4all) library(Biostrings) ## Loading required package: BiocGenerics ## Loading required package: parallel ## ## Attaching package: 'BiocGenerics' ## The following objects are masked from 'package:parallel': ## ## clusterApply, clusterApplyLB, clusterCall, clusterEv...

1122 sym R (6095 sym/38 pcs) 5 img

Portfolio: Pie graphs part 1

31.10.2021

Introduction This data was collected by Alice B. Popejoy and Stephanie M. Fullerton. It was collected using the genome-wide association studies (GWAS) catalog. This catalog is the most comprehensive, accessible summary of human genetic association research. The process was repeated in 2016 because of the shrinking percentage of European participa...

1622 sym R (1130 sym/3 pcs) 1 img

Portfolio: Pairwise Alignment

13.11.2021

Global protein alignments in R By: Avril Coghlan. Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0). Preliminaries library(compbio4all) library(Biostrings) Download sequences As we did in the previous lesson on dotplots, we’ll look at two sequences. # Download ## sequence 1: NP_950252....

11028 sym R (9266 sym/65 pcs)

Portfolio: Testing Google Sheets Access

13.11.2021

The goal of this exercise is to make you familiar with how to download data from Google Sheets and to briefly review some key R functions and coding concepts. We’ll do the following things: download a list of RefSeq accessions from a Google sheet; remove the NAs using na.omit(); select out all but one isoform using duplicated(). Packages...

865 sym R (2755 sym/25 pcs)

Portfolio: Downloading, Cleaning, and Aligning Data

13.11.2021

The goal of this exercise is to make you familiar with how to download data from Google Sheets and to briefly review some key R functions and coding concepts. We’ll do the following things: download a list of RefSeq accessions from a Google sheet; remove the NAs; select out all but one isoform using duplicated(). Packages ## Google sheets...

3808 sym R (17519 sym/122 pcs) 1 img

Portfolio - Predicting Amino Acid Chemistry Using Regression Models

10.12.2021

Key vocab proteinogenic amino acids regression model / line of best fit pI confidence intervals (CI) confidence ellipse correlation coefficient Selenocysteine and Pyrrolysine re-coding stop codons y = m*x + b slope intercept Key functions / packages ggpubr pander lm() coef() cor() round() Predict pI for Selenocysteine and Pyrrolysine Amino...

5645 sym R (3622 sym/25 pcs) 1 img 5 tbl