Publications by David Smith

Data Analysis for Life Sciences

20.07.2017

Rafael Irizarry from the Harvard T.H. Chan School of Public Health has presented a number of courses on R and Biostatistics on EdX, and he has recently also provided an index of all of the course modules as YouTube videos with supplemental materials. The EdX courses, which you can take for free, are linked below, or you can simply follow the series of YouTube...

IEEE Spectrum 2017 Top Programming Languages

21.07.2017

IEEE Spectrum has published its fourth annual ranking of top programming languages, and the R language is again featured in the Top 10. This year R ranks at #6, down a spot from its 2016 ranking, though its IEEE score (derived from search, social media, and job listing trends) is tied with that of the #5 language, C#. Python has taken the #1...

Analyzing GitHub pull requests with Neural Embeddings, in R

24.07.2017

At the useR!2017 conference earlier this month, my colleague Ali Zaidi gave a presentation on using Neural Embeddings to analyze GitHub pull request comments (processed using the tidy text framework). The data analysis was done using R and distributed on Spark, and the resulting neural network was trained using the Microsoft Cognitive Toolkit. You ...

SQL Server 2017 release candidate now available

25.07.2017

SQL Server 2017, the next major release of the SQL Server database, has been available as a community preview for around 8 months, but now the first full-featured release candidate is available for public preview. For those looking to do data science with data in SQL Server, there are a number of new features compared to SQL Server 2016 which mig...

Introducing Joyplots

26.07.2017

This is a joyplot: a series of histograms, density plots or time series for a number of data segments, all aligned to the same horizontal scale and presented with a slight overlap. Embedded tweet: "Peak time for sports and leisure #dataviz. About time for a joyplot; might do a write-up on them. #rstats code at https://t.co/Q2AgW068Wa" ...

The R6 Class System

27.07.2017

R is an object-oriented language with several object-orientation systems. There's the original (and still widely used) S3 class system based on the “class” attribute. There's the somewhat stricter, signature-based S4 class system. There are reference classes (also called R5), which provide R objects with multiple references without duplicati...

Learn parallel programming in R with these exercises for "foreach"

28.07.2017

The foreach package provides a simple looping construct for R: the foreach function, which you may be familiar with from other languages like JavaScript or C#. It's basically a function-based version of a "for" loop. But what makes foreach useful isn't iteration: it's the way it makes it easy to run those iterations in parallel, and save time on ...

How to use H2O with R on HDInsight

31.07.2017

H2O.ai is an open-source AI platform that provides a number of machine-learning algorithms that run on the Spark distributed computing framework. Azure HDInsight is Microsoft's fully managed Apache Hadoop platform in the cloud, which makes it easy to spin up and manage Azure clusters of any size. It's also easy to run H2O on HDInsight: H2O AI...

A modern database interface for R

01.08.2017

At the useR! conference last month, Jim Hester gave a talk about two packages that provide a modern database interface for R. Those packages are the odbc package (developed by Jim and other members of the RStudio team), and the DBI package (developed by Kirill Müller with support from the R Consortium). To communicate with databases, a common ...

Applications in energy, retail and shipping

02.08.2017

The Solutions section of the Cortana Intelligence Gallery provides more than two dozen working examples of applying machine learning, data science and artificial intelligence to real-world problems. Each solution provides sample data, scripts for model training and evaluation, and reporting of predictions. You can deploy a complete stack in Azur...