Publications by Vik Paruchuri
NBA Playoff Predictions Update 2 and Results (3-1)
This is my second follow-up to my previous two posts, which were about predicting NBA games with an algorithm, and my first update to the algorithm. The algorithm’s record is now 3-1, as it correctly predicted Boston and Oklahoma City as winners of their past games. Upcoming things to do: Sadly, I have been a bit busy, and I have not b...
1202 sym 2 img
NBA Playoff Predictions Update 3 (4-2)
This is my third update to my original post on predicting the NBA playoffs with an algorithm. Here are updates 1 and 2. The algorithm correctly predicted a Boston win, but missed on the Spurs/Thunder game, so it is currently 4-2. Haven’t had any time to update yet, so I will only be able to give you predictions for the next games,...
817 sym 2 img
NBA Playoff Predictions Update 4 (5-3)
This is update 4 to my original post about predicting the NBA playoffs with R. With the Thunder beating the Spurs and the Heat losing to the Celtics, the algorithm went 1-1 on predictions, making it 5-3 so far. Making some improvements: I have been posting for some time about incorporating more data into the models, and I finally got ...
3491 sym R (96 sym/1 pcs) 4 img
NBA Playoffs Update 5 (5-4)
This is the sixth post in my series on predicting the NBA playoffs with an algorithm. After the Boston loss in their last game, the algorithm is now 5-4 in the playoffs. Hopefully it is correct tonight! Open Sourcing the Code: I have had a couple of requests to open source the code, which I had planned to do at the end of this series ...
1273 sym 2 img
Finding word use patterns in Wikileaks cables
6/18: A follow-up to this post is now available here. Recent Discoveries: When I was a diplomat, I was always interested in the Wikileaks cables and what could be done with them. Unfortunately, I never got a chance to look at the site in depth, due to security policies. Now that the ex- is firmly prepended to diplomat in my resume, I ...
6515 sym R (5234 sym/18 pcs) 10 img
Tracking US Sentiments Over Time In Wikileaks
Introduction: I recently posted about using the Wikileaks cable corpus to find word use patterns, both over time, and in secret cables vs unclassified cables. I received a lot of good suggestions for further topics to pursue with the corpus, and probably the most interesting was the idea to do sentiment analysis over time on a variety of named ent...
8460 sym R (917 sym/2 pcs) 16 img
How Many Data Scientists Are There?
I’ve seen a lot of articles lately about “Big Data” and the looming “talent gap.” This article from the Wall Street Journal is a good example. It cites a McKinsey estimate that states that we will need 1.5 million more managers and analysts who are conversant with “big data.” Of course, some of ...
12988 sym 14 img
My talk at Boston Python
I just gave a talk at Boston Python about natural language processing in general, and edX ease and discern in particular. You can find the presentation source here, and the web version of it here. There is a video of it here. Nelle Varoquaux and Michael Selik also gave interesting talks in the same video; I recommend checking them out. Related ...
745 sym
Natural language processing tutorial
Introduction: This will serve as an introduction to natural language processing. I adapted it from slides for a recent talk at Boston Python. We will go from tokenization to feature extraction to creating a model using a machine learning algorithm. The goal is to provide a reasonable baseline on top of which more complex natural language process...
11835 sym 18 img
Figuring out which Simpsons character is speaking
You probably have a favorite Simpsons character. Maybe you hope to someday block out the sun, Mr. Burns style, maybe you enjoy Homer’s skill in averting meltdowns, or maybe you identify with Lisa’s struggles for acceptance. Through its characters, the Simpsons made a huge impact on a generation, and although the show is still running, my be...
6521 sym R (95 sym/3 pcs) 10 img