Publications by Benjamin Solomon
Final Portfolio BIOSC 1540
Introduction: The OCA2 melanosomal transmembrane protein, formerly known as the P protein, is linked to melanin pigment production in mammals and may help determine skin and eye color. Specifically, the encoded protein is thought to aid the transport of tyrosine, a small molecule that is a precursor of melanin. In this workflow, OCA2 will be...
4500 sym R (32377 sym/81 pcs) 6 img 11 tbl
Predicting amino acid chemistry
Key vocab: proteinogenic amino acids; regression model / line of best fit; pI; confidence intervals (CI); confidence ellipse; correlation coefficient; selenocysteine and pyrrolysine; re-coding stop codons; y = m*x + b; slope; intercept. Key functions / packages: ggpubr, pander, lm(), coef(), cor(), round(). Predict pI for Selenocysteine and Pyrrolysine Amino...
5648 sym R (3140 sym/21 pcs) 1 img 4 tbl
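The vocabulary above (line of best fit, slope, intercept, correlation coefficient) can be illustrated with a minimal sketch using the listed functions lm(), coef(), cor() and round(). The data here are hypothetical, noise-free numbers chosen so the answer is obvious; they are not amino acid measurements from the original write-up.

```r
# Hypothetical, noise-free data (NOT real amino acid measurements):
# y is built from y = m*x + b with m = 2 and b = 1.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 1

# Fit the regression model / line of best fit
fit <- lm(y ~ x)

# coef() returns a named vector: the intercept (b) and the slope (m)
b <- round(coef(fit)[["(Intercept)"]], 2)
m <- round(coef(fit)[["x"]], 2)

# cor() gives the correlation coefficient between x and y
r <- round(cor(x, y), 2)

# Predict y for a new x value (x = 6) from the fitted line
y_new <- m * 6 + b
```

With real data the points would scatter around the line, so the correlation coefficient would fall below 1 and a confidence interval around the prediction would become meaningful.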
Testing ggplot2 and ggpubr
ggpubr - allometric data. Allometric data: a classic case of regression, using logs and a non-linear model too. library(compbio4all). Vocab: wrapper; ggplot2; ggpubr; $ operator; smoother; continuous data; categorical data. Learning objectives: Know what a wrapper is. Know the relationship between ggplot2 and ggpubr. Be able to run code that makes graphs wit...
4282 sym R (2504 sym/35 pcs) 9 img
Testing Google Sheets Access
The goal of this exercise is to make you familiar with how to download data from Google Sheets and to briefly review some key R functions and coding concepts. We’ll do the following: download a list of RefSeq accessions from a Google Sheet; remove the NAs using na.omit(); select out all but one isoform using duplicated(). Packages...
867 sym R (2919 sym/27 pcs)
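The two cleaning steps named above, na.omit() and duplicated(), can be sketched on a small stand-in table. The data frame, its column names and its values are hypothetical placeholders, and the Google Sheets download itself is left out because the teaser does not show which package or sheet it uses.

```r
# Hypothetical table standing in for the downloaded sheet;
# gene names and accessions are placeholders, not real RefSeq records.
refseq <- data.frame(
  gene      = c("geneA", "geneA", "geneB", NA, "geneC"),
  accession = c("ACC_1", "ACC_2", "ACC_3", NA, "ACC_4"),
  stringsAsFactors = FALSE
)

# Drop every row that contains an NA
refseq_clean <- na.omit(refseq)

# duplicated() flags the second and later occurrences of each gene,
# so negating it keeps only the first isoform listed per gene
refseq_one <- refseq_clean[!duplicated(refseq_clean$gene), ]
```

Keeping the first row per gene is only one possible rule; which isoform counts as the one to keep depends on how the sheet is ordered.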
Genomics is Failing on Diversity Diagram
Introduction: The following code is a recreation of the diagram in Alice B. Popejoy and Stephanie M. Fullerton’s article, “Genomics is failing on diversity”. Popejoy and Fullerton collected the data by retrieving the sample descriptions in the GWAS Catalog that refer to location or group and compiling them in one place. This process ...
910 sym R (1000 sym/3 pcs) 1 img
Oca2 Repeat Analysis
Change the OCA4 of the title to your gene name. Change the names and text appropriately to reflect your gene / protein. Add the necessary code to make this script functional. Download the PROTEIN sequence of your gene. Adapting the code below, make 2 grids of 4 plots (8 plots total) exploring different settings for window size and the match thres...
1598 sym R (6090 sym/37 pcs) 5 img
Dot Plot Introduction
Sequence dotplots in R By: Avril Coghlan. Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0). NOTE: I’ve added some new material that is rather terse and lacks explication. Good sources of more info: https://omicstutorials.com/interpreting-dot-plot-bioinformatics-with-an-example/ http://r...
4891 sym R (1621 sym/16 pcs) 11 img
Dot Plot Settings and Manipulations
Edited by Benjamin Solomon. In this exercise we’ll look at a sequence with known tandem repeats. We’ll load the data, explore it in R, then use the dotPlot() function to make various dotplots to see how changing its settings helps make repeat patterns stand out. Add the necessary code to make this script functional. Preliminaries ...
1530 sym R (7699 sym/64 pcs) 12 img
Testing OneDrive and File Location
Default working directory Open up this .Rmd file in RStudio. In the code chunk below type getwd() and run it. “wd” means “working directory”, or where R will currently save files if you tell it to save anything. # Get the current working directory getwd() ## [1] "C:/Users/benza/Documents/Education/21 Fall Semester/Computational Biology/A...
1844 sym R (607 sym/4 pcs)
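As a small follow-on to getwd(), the sketch below shows how a file path inside the working directory might be built and checked with base R. The file name is hypothetical and nothing is written to disk.

```r
# Where R currently reads and writes files by default
wd <- getwd()

# Build a path inside the working directory; "example_output.csv" is a
# hypothetical file name used only for illustration
out_path <- file.path(wd, "example_output.csv")

# Check whether that file already exists, without creating it
already_there <- file.exists(out_path)
```

Using file.path() instead of pasting strings avoids doubled or missing separators when the same script is moved between folders.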
CompBio Portfolio 1
Assignment: Your assignment is to use your notes from class - along with help from classmates, UTAs, and me - to turn this script into a fleshed-out description of what is going on. This is a substantial project - we’ll work on it in steps over the rest of the unit. We are currently focused on the overall process and will cover the details over...
6552 sym R (13550 sym/58 pcs) 1 img