Publications by Andrew Treadway
Running R Code in Parallel
Background: Running R code in parallel can substantially speed up performance. Parallelization lets you run multiple processes in your code simultaneously, rather than iterating over a list one element at a time or running a single process at a time. Thankfully, running R code in parallel is relatively simple using t...
5405 sym R (2184 sym/6 pcs) 4 img
Vectorize Fuzzy Matching
One of the best things about R is its ability to vectorize code. This allows you to run code much faster than you would with a for or while loop. In this post, we’re going to show you how to use vectorization to speed up fuzzy matching. First, we’ll cover a little background. If you’re familiar with vectorization a...
5177 sym R (1339 sym/7 pcs) 2 img
Underrated R Functions
I wanted to write a post about a couple of handy functions in R that don’t always get the recognition they deserve. This article will talk about a few functions that form part of R’s core functional programming capabilities. R has thousands of functions, so this is just a short list, and I’ll probably write other articles like this in the...
6298 sym R (1558 sym/13 pcs) 2 img
Timing Python Processes
Several different packages can be used to time Python processes. One of the most common approaches uses the standard library package, time, which we’ll demonstrate with an example. Another package that is very useful for timing a process, and particularly for telling you how far along a process has come, is tqdm. As we’ll...
3603 sym Python (1380 sym/4 pcs) 6 img
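As a rough illustration of the standard-library approach mentioned in the "Timing Python Processes" summary above, here is a minimal sketch, not the code from the post itself. The helper `time_call` and the sample workload `sum_of_squares` are hypothetical names introduced only for this example, and `time.perf_counter` is one reasonable choice of clock from the `time` module.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not taken from the original post.
    """
    start = time.perf_counter()   # high-resolution clock suited to measuring intervals
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sum_of_squares(n):
    """Sample workload (an assumption): sum i*i for i in 0..n-1."""
    return sum(i * i for i in range(n))


result, elapsed = time_call(sum_of_squares, 100_000)
print(f"result={result}, elapsed={elapsed:.4f}s")
```

The elapsed time will vary from run to run and machine to machine, so treat a single measurement as a rough indication only.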
Coding with the Yahoo_fin Package
The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. Official documentation is available, but the post below will provide a few more in-depth examples. All of the functions in yahoo_fin are containe...
4334 sym Python (1525 sym/13 pcs) 2 img
ICA on Images with Python
What is Independent Component Analysis (ICA)? If you’re already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables to a new set of components; it does so such that the statistical ...
3874 sym Python (475 sym/5 pcs) 16 img
R: How to create, delete, move, and more with files
Though Python is usually preferred over R for system administration tasks, R is actually quite useful in this regard. In this post we’re going to talk about using R to create, delete, move, and obtain information on files. How to get and change the current working directory: Before working with files, it’s usually a good idea to first ...
6431 sym R (2275 sym/24 pcs) 2 img
How to download image files with RoboBrowser
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you’re looking to work with images, or want to build a tr...
4359 sym Python (4912 sym/15 pcs) 4 img
How to get live stock prices with Python
In a previous post, I gave an introduction to the yahoo_fin package. The latest version of the package includes new functionality allowing you to scrape live, real-time stock prices from Yahoo Finance. In this article, we’ll go through a couple of ways of getting real-time data from Yahoo Finance for stocks, as well as how to pull cryptocu...
2567 sym Python (594 sym/5 pcs) 8 img
Getting data from PDFs the easy way with R
Earlier this year, a new package called tabulizer was released for R, which allows you to automatically pull out tables and text from PDFs. Note that this package only works if the PDF’s text is highlightable (that is, if it’s typed); it won’t work for scanned-in PDFs or image files converted to PDFs. If you don’t have tabulizer installed, ju...
3013 sym R (781 sym/8 pcs) 2 img