Publications by altuna

Principal Component Analysis: Which variables contribute most to principal components?

23.11.2010

Principal component analysis (PCA) is a mathematical transformation of possibly correlated variables into a number of uncorrelated variables called principal components. The components resulting from this transformation are defined in such a way that the first principal component has the highest variance and accounts for as much of the variabilit...
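As a rough illustration of the question in the title, here is a minimal sketch using base R's prcomp. The built-in USArrests data set is only a stand-in (an assumption, not the data from the post), and the squared loadings are just one common way to read off variable contributions.

```r
# Minimal sketch: which variables contribute most to each principal component?
# USArrests is a stand-in data set (an assumption, not the post's data).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Each column of the rotation matrix has unit length, so the squared
# loadings of a component sum to 1 and can be read as contributions.
contrib <- pca$rotation^2

# Variables ranked by their contribution to the first component
sort(contrib[, "PC1"], decreasing = TRUE)
```

Variables at the top of the sorted vector are the ones that weigh most heavily on PC1.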

sqldf and grouping rows in R

08.02.2011

In R, you can treat tables (or data.frames, as they are called in R) as SQL tables. That means you can query them as you would query a database with SQL commands. This is particularly useful 1) if you know SQL, hahah :) 2) if you have large tables with millions of rows. In R, querying a database will be much faster than iterating through the r...
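A minimal sketch of the idea, assuming the sqldf package is installed. The data.frame and its column names are made up for illustration.

```r
library(sqldf)  # assumes the sqldf package is installed

# Toy data.frame (hypothetical data, for illustration only)
df <- data.frame(grp   = c("a", "a", "b", "b", "b"),
                 score = c(1, 3, 2, 4, 6))

# Query the data.frame as if it were an SQL table:
# one row per group, with the group mean and the number of rows
res <- sqldf("SELECT grp, AVG(score) AS mean_score, COUNT(*) AS n
              FROM df
              GROUP BY grp
              ORDER BY grp")
res
```

The data.frame name is used directly as the table name in the query, so no explicit database setup is needed.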

command line options in R: "optparse" package

08.02.2011

C/Python-style option parsing is now available in R with the “optparse” package. Check the documentation here and see below to see how it works:

scripts $ Rscript filter.transcripts.by.ncRNA.R -h
usage:  usage: filter.transcripts.by.ncRNA.R [options]
options:
    -i INPUTFILE, --inputfile=INPUTFILE
        a BED1...
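For reference, a minimal sketch of how such a parser can be set up, assuming the optparse package is installed. Only the -i/--inputfile option comes from the help output above; the --verbose flag, the help strings, and the file name are hypothetical.

```r
library(optparse)  # assumes the optparse package is installed

option_list <- list(
  # -i/--inputfile mirrors the option in the help output; help text is made up
  make_option(c("-i", "--inputfile"), type = "character", default = NULL,
              help = "input file", metavar = "INPUTFILE"),
  # hypothetical extra flag, for illustration only
  make_option(c("-v", "--verbose"), action = "store_true", default = FALSE,
              help = "print extra output")
)
parser <- OptionParser(usage = "%prog [options]", option_list = option_list)

# In a real script, parse_args(parser) reads the actual command line;
# here the arguments are passed explicitly so the sketch runs anywhere.
opt <- parse_args(parser, args = c("-i", "example.bed", "--verbose"))
opt$inputfile
opt$verbose
```

Running the script with -h prints a usage message built from the help strings.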

Utilizing multiple cores in R

08.02.2011

There are a couple of options in R if you want to utilize multiple cores on your machine. These days my favorite is the doMC package, which depends on the foreach and multicore packages. In the section below, the square root of each number is calculated in parallel. Check the vignette for a more complicated example. In practice, if you need to itera...
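A minimal sketch of the square-root example, assuming doMC is installed and you are on a Unix-like system. The number of cores and the input range are arbitrary choices for illustration.

```r
library(doMC)  # attaches foreach as well; assumes doMC is installed (Unix-like systems)

registerDoMC(cores = 2)  # the core count is an assumption; adjust to your machine

# Square root of each number, computed in parallel and combined into one vector
roots <- foreach(i = 1:10, .combine = c) %dopar% sqrt(i)
roots
```

For a task this small the parallel overhead outweighs any gain; the pattern pays off when each iteration does real work.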

Access all UCSC wiggle tracks from R and your terminal

21.02.2011

The rtracklayer package allows you to access most of the UCSC wiggle tracks from R. However, there is another way which might be more practical in situations where you need to summarize the wig track scores over a given set of genomic coordinates. Although you can get similar information from rtracklayer, you will need to do the summary statistics for...

Calling BEDtools from R

22.02.2011

The BEDtools suite provides command-line functionality for genomic-coordinate-based operations, such as overlapping BED files or getting the coverage of a BED file over a genome (similar, though not exactly the same, functionality in R is provided by the IRanges package in Bioconductor). If you have the BEDtools suite installed and it is in your path, you...

Tips on installing R extension for Rapidminer on Mac OS X

09.03.2011

Rapidminer is a cool toy to play with machine-learning/data-mining algorithms and it can interface with R. However, it was a bit problematic for me to get the R extension working properly on Mac OS X Leopard for R 2.11. Here is what works for me at the moment:
1) get rapidminer (obviously 🙂)
2) install rJava and JavaGD in R through install.pa...

Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table?

17.03.2011

Which one of the sqldf, plyr, doBy and aggregate functions/packages would be faster for applying functions on groups of rows? I was wondering about this earlier in this post. It seems sqldf would be the fastest according to a post in the manipulatr mailing list. Well, here is the ranking from the fastest to the slowest: (check the link for ...

Fast(ish) extraction of exon locations from a BED12 file using data.table

20.03.2011

Here is a fast R function to extract exon locations from a BED12 file. Note that fast is a relative term: the function below is fast enough for me, but may not be fast enough for others 🙂 Anyway, a BED12 file typically has locations of genomic features (those features are usually genes or transcripts if the format is BED12). 11th and ...
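To make the BED12 layout concrete, here is a minimal base-R sketch (not the data.table function from the post). It relies on the standard BED12 convention that column 11 holds the comma-separated block sizes and column 12 the block starts relative to the feature start. The helper name bed12_to_exons and the toy record are hypothetical.

```r
# Hypothetical helper, base R only (the post itself uses data.table).
# bed: data.frame with the 12 standard BED12 columns, in order.
bed12_to_exons <- function(bed) {
  sizes  <- strsplit(as.character(bed[[11]]), ",")  # blockSizes
  starts <- strsplit(as.character(bed[[12]]), ",")  # blockStarts, relative to column 2
  idx <- rep(seq_len(nrow(bed)), lengths(sizes))    # one row index per exon
  ex_start <- bed[[2]][idx] + as.numeric(unlist(starts))
  data.frame(chrom  = bed[[1]][idx],
             start  = ex_start,
             end    = ex_start + as.numeric(unlist(sizes)),
             name   = bed[[4]][idx],
             strand = bed[[6]][idx],
             stringsAsFactors = FALSE)
}

# Toy BED12 record with three blocks (made-up coordinates)
bed <- data.frame(chrom = "chr1", start = 1000, end = 5000, name = "tx1",
                  score = 0, strand = "+", thickStart = 1200, thickEnd = 4800,
                  rgb = "0", blockCount = 3,
                  blockSizes = "100,200,300,", blockStarts = "0,1500,3700,",
                  stringsAsFactors = FALSE)
exons <- bed12_to_exons(bed)
exons
```

The trailing commas that BED12 files often carry are harmless here, because strsplit drops an empty field at the end of a string.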

knitr: nice alternative for Sweave

17.12.2011

I recently discovered knitr for dynamic report generation in R. It seems like a very powerful alternative to Sweave. In particular, I am interested in its PNG graphics device support (it supports more than 20 graphics devices) and R code formatting. Check it out at: http://yihui.github.com/knitr/
