Publications by Vik Paruchuri

NBA Playoff Predictions Update 2 and Results (3-1)

03.06.2012

This is my second follow-up to my previous two posts, which were about predicting NBA games with an algorithm, and my first update to the algorithm. The algorithm’s record is now 3-1, as it correctly predicted Boston and Oklahoma City as winners of their past games. Upcoming things to do: Sadly, I have been a bit busy, and I have not b...

1202 sym 2 img

NBA Playoff Predictions Update 3 (4-2)

05.06.2012

This is my third update to my original post on predicting the NBA playoffs with an algorithm. Here are updates 1 and 2. The algorithm correctly predicted a Boston win, but missed on the Spurs/Thunder game, so it is currently 4-2. Haven’t had any time to update yet, so I will only be able to give you predictions for the next games,...

817 sym 2 img

NBA Playoff Predictions Update 4 (5-3)

07.06.2012

This is update 4 to my original post about predicting the NBA playoffs with R. With the Thunder beating the Spurs and the Heat losing to the Celtics, the algorithm went 1-1 on predictions, making it 5-3 so far. Making some improvements: I have been posting for some time about incorporating more data into the models, and I finally got ...

3491 sym R (96 sym/1 pcs) 4 img

NBA Playoffs Update 5 (5-4)

09.06.2012

This is the sixth post in my series on predicting the NBA playoffs with an algorithm. After the Boston loss in their last game, the algorithm is now 5-4 in the playoffs. Hopefully it is correct tonight! Open Sourcing the Code: I have had a couple of requests to open source the code, which I had planned to do at the end of this series ...

1273 sym 2 img

Finding word use patterns in Wikileaks cables

12.06.2012

6/18: A follow-up to this post is now available here. Recent Discoveries: When I was a diplomat, I was always interested in the Wikileaks cables and what could be done with them. Unfortunately, I never got a chance to look at the site in depth, due to security policies. Now that the ex- is firmly prepended to diplomat in my resume, I ...

6515 sym R (5234 sym/18 pcs) 10 img

Tracking US Sentiments Over Time In Wikileaks

18.06.2012

Introduction: I recently posted about using the Wikileaks cable corpus to find word use patterns, both over time and in secret cables vs unclassified cables. I received a lot of good suggestions for further topics to pursue with the corpus, and probably the most interesting was the idea to do sentiment analysis over time on a variety of named ent...

8460 sym R (917 sym/2 pcs) 16 img

How Many Data Scientists Are There?

09.08.2012

I’ve seen a lot of articles lately about “Big Data” and the looming “talent gap.” This article from the Wall Street Journal is a good example. It cites a McKinsey estimate that we will need 1.5 million more managers and analysts who are conversant with “big data.” Of course, some of ...

12988 sym 14 img

My talk at Boston Python

25.06.2013

I just gave a talk at Boston Python about natural language processing in general, and about edX ease and discern in particular. You can find the presentation source here, and the web version of it here. There is a video of it here. Nelle Varoquaux and Michael Selik also gave interesting talks in the same video; I recommend checking them out. Related ...

745 sym

Natural language processing tutorial

25.06.2013

Introduction: This will serve as an introduction to natural language processing. I adapted it from slides for a recent talk at Boston Python. We will go from tokenization to feature extraction to creating a model using a machine learning algorithm. The goal is to provide a reasonable baseline on top of which more complex natural language process...

11835 sym 18 img

Figuring out which Simpsons character is speaking

17.07.2013

You probably have a favorite Simpsons character. Maybe you hope to someday block out the sun, Mr. Burns style, maybe you enjoy Homer’s skill in averting meltdowns, or maybe you identify with Lisa’s struggles for acceptance. Through its characters, the Simpsons made a huge impact on a generation, and although the show is still running, my be...

6521 sym R (95 sym/3 pcs) 10 img