Publications by tomaztsql
Sharing thoughts on satRdays R Conference, Budapest 2016 #satRdays
The first satRdays event, held in Budapest on September 03, 2016, is completed. This one-day, community-driven regional event at very affordable prices, good for networking and for getting the latest from the R community, is over. And it was a blast! Great time, nice atmosphere, lots of interesting people, and where there is good energy, there is a will to learn ...
2961 sym R (492 sym/1 pcs) 34 img
Size of XDF files using RevoScaleR package
It came to my attention that the size of an XDF (external data frame) file can change drastically based on the compute context and environment. While testing the output of a dataset I was working on in SQL Server Management Studio, I was simultaneously testing R code in RTVS or RStudio, and I noticed file growth. The following stored procedure will d...
2051 sym R (2926 sym/5 pcs) 10 img
FileTable and storing graphs from Microsoft R Server
FileTable has been around for quite some time now, and it is useful for storing files, documents, pictures and binary files in a designated SQL Server table – FileTable. The best part of FileTable is the fact that one can access it from Windows or other applications as if the files were stored on the file system (because they are) and not making any ot...
2807 sym R (4076 sym/6 pcs) 16 img
Comparing performance on dplyr package, RevoScaleR package and T-SQL on simple data manipulation tasks
I have long wanted to test simple data manipulation tasks and compare the execution time, ease of writing the code and simplicity between T-SQL and R packages for data manipulation. A couple of packages I will mention for data manipulation are plyr, dplyr and data.table, and I will compare the execution time, simplicity and ease of writing with general T-SQL ...
4516 sym R (15136 sym/7 pcs) 14 img
Performance comparison between kmeans and RevoScaleR rxKmeans
In my previous blog post, I focused on data manipulation tasks with the RevoScaleR package in comparison to other data manipulation packages, and in the end the conclusions were obvious: RevoScaleR cannot (without the help of dplyrXdf) do piping (or chaining), storing temporary results takes time and, on top of that, data manipulation can be done e...
3066 sym R (1786 sym/8 pcs) 10 img 1 tbl
Association Rules on WideWorldImporters and SQL Server R Services
Association rules are very handy for analyzing retail data, and the WWI database has a really neat set of invoices that can be used to make a primer. Starting with the following T-SQL query: USE WideWorldImportersDW; GO ;WITH PRODUCT AS ( SELECT [Stock Item Key] ,[WWI Stock Item ID] ,[Stock Item] ,LEFT([Stock Item], 8) AS L8DESC ,ROW_NUMBER(...
3306 sym R (5887 sym/7 pcs) 8 img
Detecting outliers and fraud with R and SQL Server on my bank account data – Part 1
Detecting outliers and fraudulent behaviour (transactions, purchases, events, actions, triggers, etc.) takes a large amount of experience and a statistical/mathematical background. One of the samples Microsoft provided with the release of the new SQL Server 2016 used the simple logic of Benford’s law. This law works great with naturally occurring numb...
2836 sym R (1137 sym/3 pcs) 8 img
R graphs and tables in Power BI Desktop
Power BI Desktop enables users to use the R script visual for adding custom visualizations generated with the R language – regardless of the R package used. Before using the R script visual, you will need to enable it by setting the path to the R engine on your client in the global options. Once this is done, you will be able to enhance your Power BI reports using R vis...
3774 sym R (467 sym/3 pcs) 20 img
Using R sp_execute_external_script with JSON
JSON became part of SQL Server in the same version as R. Both were highly anticipated by the community. SQL Server has very powerful statements for converting to and from JSON when storing data in, or reading it from, the SQL Server engine (FOR JSON, JSON_VALUE, etc.). And since it is gaining popularity for data exchange, I was curious to give...
1928 sym R (1359 sym/6 pcs) 8 img
Clustering executed SQL Server queries using R as tool for
When query execution performance analysis is to be done, there are many ways to find which queries might cause unwanted load or a stall on the server. To encourage the DBA community to start taking advantage of the R language and the world of data science, I have created a demo to show how statistics on numerous queries can be stored for l...
2655 sym R (6189 sym/5 pcs) 10 img