Publications by RTextTools: a machine learning library for text classification - Blog

RStudio and RTextTools: A Perfect Pairing

15.04.2011

The development team has spent the past six months creating the best possible experience for RTextTools users. A few months into development, we heard about a new IDE called RStudio, which has one of the cleanest interfaces to R we’ve seen. It integrates many R tools (graphing, file management, workspace management, tabbed source editor, and mo...

Preparing RTextTools Beta Release for Catania 2011

23.05.2011

Right now our development team is busy preparing a conference release of RTextTools for The 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we’ve had thus far is memory consumption with very large datasets. In the past week we’ve pushed out a slew of updates that al...

Reduce Memory Use for Large Datasets

01.06.2011

One key limiting factor for automated text classification is memory consumption. As you accumulate more news articles, bills, and legal opinions, the term-document matrices used to represent the data grow quickly. RTextTools provides two algorithms, support vector machines and maximum entropy, that can handle large datasets with very little memor...

Drafting the Documentation for RTextTools

07.06.2011

In preparation for The 4th Annual Conference of the Comparative Policy Agendas Project in Catania, Sicily, our development team has been busy drafting the documentation for RTextTools. In addition to standard documentation of functions, we want to provide quick-start guides, sample datasets, example scripts, and Amazon EC2 instructions to make i...

Maximum Entropy Now Supported for Windows

11.06.2011

After several weeks of trying to find the source of a bug in the maximum entropy library when compiling on Windows, Dirk Eddelbuettel pointed me in the right direction to resolve the issue. Although it required a rewrite of the library using the new Rcpp API, maximum entropy now installs on Windows machines when Rtools is installed. This is significa...

RTextTools now 100% Java-free!

16.06.2011

When we first wrote RTextTools, we opted to use RWeka for boosting and bagging algorithms for lack of a better alternative. We’ve discovered that this leads to all sorts of ugly rJava installation issues across platforms and prevents our users from getting started quickly. Recently, we’ve stumbled upon two excellent non-Java alternatives: Log...

Binary Installation Now Available

18.06.2011

The biggest complaint we heard about the installation process was that Xcode (account required) and Rtools were required on Mac OS X and Windows, respectively. Today we released universal binaries (PPC/i386/x86_64) for Mac OS X 10.5+ as well as binaries (i386/x86_64) for Windows. This addition will significantly reduce the amount of time it takes to install RText...

Next Steps: Drafting the R Help Files

26.06.2011

With RTextTools now released and the feedback rolling in, the development team is starting work on the help documentation for the library. Currently, you cannot access help files about the library or its functions from within R. However, we do offer a draft of a quick start guide in PDF format under the Documentation section of the web...

RTextTools Improvements Underway

12.07.2011

Since RTextTools’ unveiling at the 2011 CAP Conference in Catania, the development team has been busy working on refinements to the package. This includes a number of changes to simplify the API, improve analytics, decrease memory use, and increase functionality. We’ve added support for another low-memory algorithm (GLMNET) in addition to the...

RTextTools v1.1 Released

03.08.2011

A major upgrade of RTextTools has been released, including many optimizations, UI changes, and features based on feedback from the 2011 CAP Conference in Catania. Changes include the addition of a new low-memory algorithm GLMNET, full user documentation, simplification of the user interface, bundled datasets, better analytics for both virgin and ...