Publications by dgrapov
Multivariate Data Analysis Work Flow
Here is an example of a data analysis work flow supported in imDEV. This network visualization was made using CmapTools. To leave a comment for the author, please follow the link and comment on their blog: imDEV » R.
515 sym 4 img
Discriminating Between Iris Species
The Iris data set is famous for its use in comparing unsupervised classifiers. The goal is to use information about flower characteristics to accurately classify the 3 species of Iris. We can look at scatter plots of the 4 variables in the data set and see that no single variable or bivariate combination can achieve this. One approach t...
2045 sym 10 img
Excel + Cytoscape + R = ExCytR
My new project is coming along nicely and should be released in early 2013. It builds on the structures developed in imDEV to link Excel, Cytoscape and R using RExcel, RCytoscape, and CytoscapeRPC. This trio can be used to rapidly generate beautiful and informative network representations of data. Here is an example of an undirected Gaussian...
1327 sym 4 img
ExCytR Concept
The concept is to make a GUI that provides both static and dynamic links between data and its network representations. Static access will involve making networks based on data and metadata stored in some table or spreadsheet. Dynamic control will provide interactive access to network construction and annotation properties. Together, these will pr...
2154 sym 6 img
Anaerobic Stress in Seeds – A Chemical Similarity Network Story
The chemical similarity network or CSN is a great tool for organizing biological data based on known biochemistry or chemical structural similarity. Here is an example CSN for visualizing metabolomic changes (measured via GC/TOF) due to anaerobic stress in germinating seeds. In this network edges are formed for chemical similarity scores > 75...
2040 sym R (635 sym/1 pcs) 8 img
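The edge rule described in the excerpt above (connect two metabolites when their similarity score exceeds a cutoff) can be sketched in a few lines. This is only an illustration in Python, not the author's R code: the function name build_edges, the node labels, the scores, and the 0-to-1 scale are all hypothetical, and because the cutoff in the excerpt is truncated, the threshold used here is a placeholder.

```python
def build_edges(scores, threshold):
    """Return sorted (node_a, node_b, score) tuples for pairs scoring above threshold.

    scores maps an unordered pair of node names, given as a tuple, to a
    similarity score. Both the scale and the threshold are assumptions.
    """
    edges = []
    for (a, b), score in scores.items():
        if score > threshold:
            # Store each undirected edge with its endpoints in sorted order.
            edges.append((min(a, b), max(a, b), score))
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical pairwise similarity scores between three placeholder nodes.
    example_scores = {("A", "B"): 0.9, ("C", "A"): 0.4, ("B", "C"): 0.8}
    for edge in build_edges(example_scores, threshold=0.7):
        print(edge)
```

The resulting edge list is the kind of table that could then be handed to a network tool such as Cytoscape for layout and annotation.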
Power Calculations – relationship between test power, effect size and sample size
I was interested in modeling the relationship between power and sample size, while holding the significance level constant (p = 0.05), for the common two-sample t-test. Luckily, R has great support for power analysis, and I found the function I was looking for in the package pwr. To calculate the power for the two-sample t-test at different...
2170 sym R (2205 sym/3 pcs) 4 img
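The relationship between power, effect size, and sample size described above can be sketched without any packages. The example below is a rough Python approximation, not the pwr function used in the post: it replaces the t distribution with the normal distribution, so it slightly overestimates power at small sample sizes. The function name approx_power_two_sample and the grid of effect sizes and sample sizes are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with equal group sizes.

    d is the standardized effect size (difference in means / common SD).
    Uses a normal approximation rather than the noncentral t distribution.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = d * sqrt(n_per_group / 2)      # mean of the test statistic under H1
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical grid: small, medium, and large effects at several sample sizes.
    for d in (0.2, 0.5, 0.8):
        row = [round(approx_power_two_sample(d, n), 3) for n in (10, 20, 50, 100)]
        print(f"d={d}: {row}")
```

Holding alpha fixed, power rises with both effect size and sample size, which is the shape of the curves the post sets out to model.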
Data analysis approaches to modeling changes in primary metabolism
View this document on Scribd
423 sym 2 img
PCA to PLS modeling analysis strategy for WIDE DATA
Working with wide data is already hard enough; add row outliers to this and things can get murky fast. Here is an example of an analysis of a wide data set, 24 rows x 84 columns, using imDEV, written in R, to calculate and visualize a principal components analysis (PCA) on this data set. We find that 7 components capture >80% of the variance i...
4589 sym 14 img
Evaluation of Orthogonal Signal Correction for PLS modeling (OSC-PLS and OPLS)
Partial least squares projection to latent structures, or PLS, is one of my favorite modeling algorithms. PLS is an optimal algorithm for predictive modeling using wide data, or data with rows << variables. While there is a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological data w...
5197 sym 14 img
Tutorial- Building Biological Networks
I love networks! Nothing is better for visualizing complex multivariate relationships, be they social, virtual or biological. I recently gave a hands-on network building tutorial using R and Cytoscape to build large biological networks. In these networks nodes represent metabolites and edges can be many things, but I specifically focused on biochemi...
1320 sym R (1560 sym/2 pcs) 8 img