Publications by Kun Ren

Welcome to my homepage

22.01.2014

I have finally set up my blog, hosted on GitHub Pages. It is quite amazing that a website can be created entirely for free. After some trial and error, I have settled on a stable workflow for publishing my work, introductory materials, and random thoughts online. My posts will relate to data science, statistical programming, quantitat...

1130 sym

R: Getting Started

23.01.2014

R rocks in both academia and industry nowadays. A rapidly increasing number of researchers choose R as one of their productivity tools for data analysis and data visualization. This is partly because the software is completely free and open source, but also because the community behind the scenes, which contributes nearly 5000 packages, remains growi...

4134 sym

Useful packages for Sublime Text

24.01.2014

Sublime Text is an extremely powerful text editor. I currently use Sublime Text 3 and quite enjoy its simplicity and extensibility. In this post, I would like to introduce some of my favorite packages that boost my productivity. Package Control: Sublime Text is by default equipped with a package manager, Package Control. If you need to extend...

4689 sym R (459 sym/1 pcs)

R: Essentials

25.01.2014

It is quite easy to get started with R. The very first step is to download R from the official website and install it. I suggest installing both the 32-bit and 64-bit versions for greater compatibility if you are running a 64-bit operating system. For typical statistical programming, if your dataset is not huge, it does not matter which one...

3771 sym

Difference between assignment operators in R

27.01.2014

For R beginners, the first operator they use is probably the assignment operator <-. Google's R Style Guide recommends <- rather than =, even though the equal sign is also allowed in R and does exactly the same thing when we assign a value to a variable. However, you might find it inconvenient because you need to type two characters to repres...

3879 sym R (473 sym/9 pcs)

Introduction to parallel computing in R

31.01.2014

For R beginners, the for loop is an elementary flow-control construct that simplifies repeatedly calling functions with different parameters. A possible block of code looks like this: run <- function(i) { return((i+1)/(i^2+1)) } for(i in 1:100) { run(i) } In this code, we first define a function that calculates something, and then run the function fro...

6549 sym R (688 sym/11 pcs)
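
The loop in the excerpt above discards each result, so a short sketch may help show how the same calls can be collected and then spread over several workers. This is only an illustration under assumptions: it uses the `parallel` package that ships with R, a two-worker cluster chosen arbitrarily, and variable names (`results`, `results_par`, `cl`) that do not come from the post. The post itself may use a different approach.

```r
library(parallel)

# Same function as in the excerpt: a cheap calculation on one input
run <- function(i) {
  (i + 1) / (i^2 + 1)
}

# Sequential version: preallocate, then fill inside the for loop
results <- numeric(100)
for (i in 1:100) {
  results[i] <- run(i)
}

# Parallel version (assumption: 2 local worker processes are acceptable)
cl <- makeCluster(2)
results_par <- unlist(parLapply(cl, 1:100, run))
stopCluster(cl)

# Both approaches should produce the same values
identical_results <- isTRUE(all.equal(results, results_par))
```

For a function this cheap, the overhead of starting workers is likely to outweigh any gain; parallel execution tends to pay off only when each call is expensive.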

Use SQL to operate R data frames

06.02.2014

In both research and applications, we need to manipulate data frames by selecting desired columns, filtering records, transforming and aggregating data. R provides built-in functions for data frame manipulation. Suppose df is the data frame we are dealing with. We use df[1:100,] to select the first 100 rows, df[,c("price","volume")] to select pri...

4678 sym R (780 sym/9 pcs)
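
The two built-in selections named in the excerpt above can be tried on a small data frame. The sketch below is hedged: the contents of `df` are invented purely for illustration (the `symbol` column and all values are hypothetical), and only the two indexing expressions come from the excerpt.

```r
# Hypothetical data frame with 200 rows; values are made up for illustration
df <- data.frame(
  symbol = rep(c("A01", "A02"), each = 100),
  price  = seq(10, by = 0.5, length.out = 200),
  volume = rep(c(100, 200), times = 100),
  stringsAsFactors = FALSE
)

# Select the first 100 rows, keeping all columns
first_100 <- df[1:100, ]

# Select only the price and volume columns, keeping all rows
price_volume <- df[, c("price", "volume")]
```

Selecting several columns by name returns a data frame, whereas selecting a single column this way returns a plain vector unless `drop = FALSE` is passed.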

A principle for writing robust R programs

14.02.2014

Writing R code can be very easy. It depends on how much you want to achieve with your code and what features you want your code to support. To test a random thought that needs some statistical evidence, you only need to casually import data, transform the data lightly into the necessary form, perform some statistical tests, and see the conclusion...

5357 sym R (1187 sym/6 pcs)

Extract information from texts with regular expressions in R

19.02.2014

People love dealing with well-structured data. It takes much less effort than working with disorganized raw text. In economic and financial research, we typically download data from open-access websites or authentication-required databases. These sources may provide data in multiple formats. For example, almost all databases are able to provide...

7410 sym R (3119 sym/16 pcs)

Reshape R data frame from long to wide format

13.03.2014

Oftentimes, we obtain a long or a wide table from a certain data source, and it may be the only format we can get. For example, some financial databases provide daily tick data for all stocks in a financial market. The data table may be arranged in a long format like this: Code Date Open High Low Close 1 A01 2014-03-10 10.0 13.0 9.0 1...

3081 sym R (574 sym/5 pcs)
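
As a rough illustration of the long-to-wide conversion described in the excerpt above, the sketch below uses base R's `reshape()`. This is an assumption about tooling, not necessarily the method the post uses, and the table contents are hypothetical values in the same Code/Date/Close layout rather than the post's data.

```r
# Hypothetical long-format table: one row per (Code, Date) pair
long <- data.frame(
  Code  = c("A01", "A01", "A02", "A02"),
  Date  = as.Date(c("2014-03-10", "2014-03-11", "2014-03-10", "2014-03-11")),
  Close = c(10.5, 11.0, 20.0, 19.5),
  stringsAsFactors = FALSE
)

# Wide format: one row per Date, one Close column per stock code
wide <- reshape(long, idvar = "Date", timevar = "Code", direction = "wide")

# Resulting column names follow the pattern <value>.<code>
wide_names <- names(wide)
```

With this layout, each stock's closing prices sit in their own column (`Close.A01`, `Close.A02`), which is convenient for comparing series side by side.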