Publications by Gregory Kanevsky
Map of the Windows Fonts Registered with R
If you have already found the extrafont package, then you probably know how to load and use Windows fonts in R visualizations. But just in case, everything needed to get started with extrafont is found here and is summarized below for using fonts in Windows for on-screen or bitmap output: One thing to add is a summary of all Windows fonts registered in R. This...
1132 sym 2 img
Running similar but independent jobs in parallel on Aster with R
It is no surprise that Teradata Aster runs each SQL, SQL-MR, and SQL-GR command in parallel on many clusters with distributed data. But when faced with the task of running many similar but independent jobs, one has to do extra work to parallelize them in Aster. When running a SQL script, each command has to wait for the previous one to finish. This ma...
5609 sym 2 img
Correlation Primer with Aster and R
Calculating correlations is often the starting point before more advanced analytical steps take place. Big data (long data) always presents computational challenges of both scale and distributed nature. In turn, these may be aggravated by the presence of a large number of features (wide data). But the challenges do not stop here, as complex relationships in...
1928 sym 1 tbl
MapReduce in Two Modern Paintings
Two years ago we had a rare family outing to the Dallas Museum of Art (my son is a teenager and he’s into sports, after all). It had an excellent exhibition of modern art, and the DMA allowed taking pictures. Two hours and a dozen pictures later my weekend was over, but thanks to Google Photos I just stumbled upon those pictures again. Suddenly, I re...
2151 sym 4 img 2 tbl
Logarithmic Scale Explained with U.S. Trade Balance
Skewed data prevail in real life. Unless you observe trivial or near-constant processes, data is skewed one way or another due to outliers, long tails, errors or something else. Such effects create problems in visualizations when a few data elements are much larger than the rest. Consider U.S. 2016 merchandise trade partner balances data set wher...
3138 sym 12 img 1 tbl
The Role of Small Data and Vacation Recap Example
Wikipedia defines small data as data ‘small’ enough for human comprehension, but then it goes further by qualifying it as data in a volume and format that makes it accessible, informative and actionable. I am not certain the latter is always true: a smaller footprint doesn’t automatically qualify data as informative and actionable without more work. In my b...
3667 sym 4 img 2 tbl
Dallas Animal Services: Shelter Intake Types vs. Outcomes Analysis
Thanks to Dallas OpenData, anyone has access to the city animal shelter records. If you lost or found a pet, it could be that he or she spent some time in a shelter – I personally took lost dogs there. It’s unfortunate, but every year tens of thousands of animals find their way to shelters, with a significant fraction never finding their way out. C...
6313 sym 10 img 2 tbl
Surviving Shelter: Analysis of Time Spent and Outcome in Dallas Animal Shelters
In the previous post we discovered Dallas Animal Services data sources (available on Dallas Open Data) and successfully analyzed how animals get admitted to and discharged from the city shelters. We loaded actual shelter records and looked at the types of admittance, the different outcomes, and their relationships. In this post we continue this analysis ...
10238 sym 34 img
Finally, You Can Plot H2O Decision Trees in R
Creating and plotting decision trees (like the one below) for the models created in H2O will be the main objective of this post: Figure 1. Decision Tree Visualization in R. Decision Trees with H2O. With release 3.22.0.1, H2O-3 (a.k.a. open source H2O or simply H2O) added to its family of tree-based algorithms (which already included DRF, GBM, and XGBoost) supp...
11306 sym 10 img 5 tbl
How H2O propels data scientists ahead of itself: enhancing Driverless AI with advanced options, recipes and visualizations
H2O engineers continually innovate and implement the latest techniques by following and adopting the latest research, working on cutting-edge use cases, and participating in and winning machine learning competitions like Kaggle. But thanks to the explosion of AI research and applications, even the most advanced automated machine learning platforms like H2O.ai Drive...
12647 sym R (975 sym/5 pcs) 16 img 2 tbl