Publications by Marcin Kosiński
Answers to FAQ about SparkR for R users
Many people keep asking me whether I have tried SparkR, whether it is worth using, whether it is sexy, or WHAT it is at all. I felt that creating a list of frequently asked questions (FAQ) on the topic of "WHAT is Spark/SparkR?" would help many R Scientists understand this Big Data Buzz-tool. I have gathered information from the documentation and some code from stac...
7153 sym R (1284 sym/14 pcs) 2 img
Improve your shiny dashboard with Disqus panel
Getting user feedback is always a pleasant moment. In the world of open source we mostly create tools and applications for people, and we love to hear that someone thinks our (generally pet) project is useful. Often this moment is nicer than any paycheck. But how can an author enable easy contact with him as the author of the project or a...
3076 sym R (1000 sym/1 pcs) 17 img
RTCGA factory of R packages – Quick Guide
Yesterday the new version of R, R 3.3.0 (codename Supposedly Educational), was released. This enabled Bioconductor (yes, not all packages are distributed on CRAN) to release its new version, 3.3. This means that all packages held on Bioconductor that were under rapid and vivid development have been moved to stable-release ve...
3892 sym R (1412 sym/8 pcs) 7 img
R 3.3.0 is another motivation for Docker
Have you ever encountered R package versioning issues when one application required different versions of its dependencies than another? Have you ever got stuck with your project because of the wrong pre-installed software versions on the machine on which you were supposed to run your code? Or maybe you had heavy adventures with installing R software on a new mach...
6422 sym R (489 sym/5 pcs) 3 img
Survival plots have never been so informative
Hadley Wickham’s ggplot2 version 2.0 revolution at the end of 2015 triggered many crashes in dependent R packages, which finally led to the removal of a few packages from The Comprehensive R Archive Network. It turned out that the survMisc package was removed from CRAN on the 27th of January 2016, and the R world remained helpless in the struggle with the elegant...
4542 sym R (1961 sym/7 pcs) 14 img
R Hero saves Backup City with archivist and GitHub
Have you ever suffered because you could not reproduce graphs, tables or analysis results in R? Have you ever been bothered by not being able to share R objects (e.g., plots or final analysis models) within your reports, posters or articles? Or maybe you simply have too many objects that you can’t manage to store in a convenient ...
4735 sym R (4126 sym/22 pcs) 9 img 1 tbl
Venn Diagram Comparison of Boruta, FSelectorRcpp and GLMnet Algorithms
Feature selection is the process of extracting valuable features that have a significant influence on the dependent variable. This is still an active field of research and machine wandering. In this post I compare a few feature selection algorithms: traditional GLM with regularization, the computationally demanding Boruta, and an entropy-based filter from FSelect...
6259 sym R (2660 sym/13 pcs) 6 img
LDAvis Show Case on R-Bloggers
Text mining is a new challenge for machine wandering practitioners. The increased interest in text mining is caused by the growing number of internet users and by the rapid growth of internet data, 80% of which is said to be text. Extracting information from articles, news, posts and comments has become a desirable skill but what is eve...
5413 sym R (4764 sym/8 pcs) 2 img
BioC 2016 Conference Overview and Few Ways of Downloading TCGA Data
A few weeks ago I had the great pleasure of attending the BioC 2016: Where Software and Biology Connect conference at Stanford, where I learned a lot! It wouldn’t have been possible without the scholarship that I received from Bioconductor (the organizers), which I deeply appreciate. It was an excellent place for software developers, statisticians and biol...
7483 sym R (357 sym/2 pcs) 2 img
Extending sparklyr to Compute Cost for K-means on YARN Cluster with Spark ML Library
Machine and statistical learning wizards are becoming more eager to perform analysis with the Spark ML library whenever possible. It’s trendy, posh, spicy and gives the feeling of doing state-of-the-art machine learning and being up to date with the newest computational trends. It is even sexier and more powerful when computations can be perf...
5650 sym R (1832 sym/11 pcs) 4 img