Publications by Michael Kao
Units and metadata
Handling metadata is not natural in R, or in any traditional rectangular data storage system. There are several tricks and packages which attempt to solve this problem, with Hmisc using the attribute feature and the IRanges package having its own DataFrame class. Hmisc allows one to store metadata such as units, label and ...
1763 sym
Imputation by mean?
Today, I was briefed that when computing regional aggregates such as those defined by the M49 country standard of the United Nations (http://unstats.un.org/unsd/methods/m49/m49regin.htm), I should use the regional mean to replace missing values. I was sceptical about this approach based on the little knowledge I had about missing va...
2827 sym
Preferential attachment for network
I am currently taking the Networked Life course on Coursera.org offered by Professor Michael Kearns from the University of Pennsylvania. I have taken several courses, including machine learning and natural language processing, since the platform was launched late last year. I have to give my biggest praise to Andrew Ng and his team ...
2949 sym
Network of trade
This week, I got my hands on some agricultural trade data. Trade data are typically extremely dirty, so treat them with care when you get your hands on them. Lab-standard equipment is required. So I decided to look at how countries trade by plotting the network (the data are confidential, so I will not disclose the country nor the commodit...
2832 sym 2 img
My Facebook social network
I got very excited about making a network diagram of my Facebook network using Gephi (https://gephi.org/) and submitted my first assignment for the Social Network Analysis course on https://www.coursera.org/. It’s the middle of the night, so I will keep the post short and update more details tomorrow. This is my Facebook social network, a...
1582 sym 2 img
Peculiar behaviour of the sum function
The sum function in R is a special one in contrast to other summary statistics functions such as mean and median. The first distinction is that it is a primitive function whereas the others are not (although you can call mean using .Internal). This causes many inconsistencies and unexpected behaviours. (1) Inconsistency in argument. For exa...
2795 sym
Maize trade Part I: Generate the network diagram
It has been several months since my last post, partly because my laptop was lost and several deadlines were approaching. Fortunately, I will be returning to Taiwan and getting a new laptop within a week, and will be updating regularly again. This post will provide a brief peek at the trade network which will be presented in the...
1513 sym R (930 sym/1 pcs) 2 img
Maize trade Part II: Comparison and analysis
My last post about the maize network, although interesting, was not very informative. What we are going to do today is to contrast the maize network with the wine trade network. The reason we chose wine will become clear after the network and the analysis. Let's first have a look at the trade network of wine, again onl...
3152 sym Python (1147 sym/1 pcs) 2 img
A package for agricultural statistics: FAOSTAT
After 8 years of using R, today I finally became a contributor to the community and released my first package, FAOSTAT. The package is designed to provide users with direct access to the FAOSTAT database via R and to support the open data and methodology philosophy used in the Statistical Yearbook of the Food and Agriculture Organization....
528 sym R (125 sym/3 pcs)
Relearn boxplot and label the outliers
Despite the fact that the box plot is used almost everywhere and taught in undergraduate statistics classes, I recently had to re-learn it in order to know how to label the outliers. This Stack Overflow post was where I found how the outliers and whiskers of the Tukey box plots are defined in R and ggplot2: In ggplot2, what do the end ...
1207 sym R (1850 sym/1 pcs) 2 img