Publications by Data Science notes

Locality Sensitive Hashing in R

01.01.2015

Introduction In the next series of posts I will try to explain the basic concepts of the Locality Sensitive Hashing technique. Note that I will try to follow a general functional programming style, so I will use R’s higher-order functions instead of the traditional *apply family of functions (I suppose this will make the post more readable for non-R users). Also I...

6057 sym R (3180 sym/20 pcs) 3 tbl

Analyzing texts with text2vec package

08.11.2015

Over the last few weeks I have been actively working on text2vec (formerly tmlite), an R package that provides tools for fast text vectorization and state-of-the-art word embeddings. This project is an experiment for me: what can a single person do in a particular area? After these hard weeks, I believe one person can do a lot. There are a lot of changes from ...

8523 sym R (5698 sym/49 pcs) 8 img

GloVe vs word2vec revisited.

30.11.2015

Today I will start publishing a series of posts about experiments on English Wikipedia. As I said before, text2vec is inspired by gensim, a well-designed and quite efficient Python library for topic modeling and related NLP tasks. I also found Radim’s posts very useful, in which he evaluated some algorithms on an English Wikipedia dump. This d...

12522 sym R (4425 sym/12 pcs) 12 img

text2vec GloVe implementation details

08.01.2016

Before reading this post, I highly recommend reading the original GloVe paper and Jon Gauthier’s post, which provides a detailed explanation of a Python implementation. That post helped me a lot with the C++ implementation. Word embedding After Tomas Mikolov et al. released the word2vec tool, there was a boom of articles about word vector representations. One of t...

7998 sym R (5074 sym/8 pcs)

text2vec 0.3

16.03.2016

Today I’m pleased to announce a preview of the new version of text2vec. It is located in the 0.3 development branch, but very soon (probably in about a week) it will be merged into master. To reproduce the examples below, please install text2vec@0.3 from GitHub: devtools::install_github('dselivanov/text2vec@0.3') Also I’m waiting for fe...

6687 sym R (5017 sym/32 pcs)

text2vec 0.4

06.10.2016

Introducing text2vec 0.4 Today I’m pleased to announce a new major release of text2vec, text2vec 0.4, which is already on CRAN. For those readers who are not familiar with text2vec, it is an R package that provides an efficient framework with a concise API for text analysis and natural language processing. With this release I also launched p...

11931 sym R (7015 sym/52 pcs) 8 img