Publications by RTextTools: a machine learning library for text classification - Blog
RStudio and RTextTools: A Perfect Pairing
The development team has spent the past six months creating the best possible experience for RTextTools users. A few months into development, we heard about a new IDE called RStudio, which has one of the cleanest interfaces to R we’ve seen. It integrates many R tools (graphing, file management, workspace management, tabbed source editor, and mo...
Preparing RTextTools Beta Release for Catania 2011
Right now our development team is busy preparing a conference release of RTextTools for The 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we’ve had thus far is memory consumption with very large datasets. In the past week we’ve pushed out a slew of updates that al...
Reduce Memory Use for Large Datasets
One key limiting factor for automated text classification is memory consumption. As you accumulate more news articles, bills, and legal opinions, the term-document matrices used to represent the data grow quickly. RTextTools provides two algorithms, support vector machines and maximum entropy, that can handle large datasets with very little memor...
Drafting the Documentation for RTextTools
In preparation for The 4th Annual Conference of the Comparative Policy Agendas Project in Catania, Sicily, our development team has been busy drafting the documentation for RTextTools. In addition to standard documentation of functions, we want to provide quick-start guides, sample datasets, example scripts, and Amazon EC2 instructions to make i...
Maximum Entropy Now Supported for Windows
After several weeks trying to find the source of a bug in the maximum entropy library when compiling on Windows, Dirk Eddelbuettel pointed me in the right direction to resolve the issue. Although it required a rewrite of the library using the new Rcpp API, maximum entropy now installs on Windows machines when Rtools is installed. This is significa...
RTextTools now 100% Java-free!
When we first wrote RTextTools, we opted to use RWeka for boosting and bagging algorithms for lack of a better alternative. We’ve discovered that this leads to all sorts of ugly rJava installation issues across platforms and prevents our users from getting started quickly. Recently, we’ve stumbled upon two excellent non-Java alternatives: Log...
Binary Installation Now Available
The biggest complaint we heard about the installation process was that Xcode (account required) and Rtools were required on Mac OS X and Windows, respectively. Today we released universal binaries (PPC/i386/x86_64) for Mac OS X 10.5+ as well as binaries (i386/x86_64) for Windows. This addition will significantly reduce the amount of time it takes to install RText...
Next Steps: Drafting the R Help Files
With RTextTools now released and feedback coming in, the development team is starting work on the help documentation for the library. Currently, you cannot access help files about the library or its functions from within R. However, we do offer a draft of a quick start guide in PDF format under the Documentation section of the web...
RTextTools Improvements Underway
Since RTextTools’ unveiling at the 2011 CAP Conference in Catania, the development team has been busy working on refinements to the package. This includes a number of changes to simplify the API, improve analytics, decrease memory use, and increase functionality. We’ve added support for another low-memory algorithm (GLMNET) in addition to the...
RTextTools v1.1 Released
A major upgrade of RTextTools has been released, including many optimizations, UI changes, and features based on feedback from the 2011 CAP Conference in Catania. Changes include the addition of a new low-memory algorithm GLMNET, full user documentation, simplification of the user interface, bundled datasets, better analytics for both virgin and ...