Fast, Functional Text Mining: Rosie Pattern Language

April 27th 2019 09:00 - 09:45
Summit 1
Regular expressions have limitations, some well known and some subtle. RPL replaces regex for mining unstructured text, is easy to write and maintain, and has a fast match engine. Under the covers, RPL expressions are combinators; these pure functions compile to instructions for a “matching virtual machine”.

Regular expressions are everywhere, including in the inner loops of most data mining code. But they don’t scale! Almost every implementation uses backtracking, which can take exponential time in the worst case and can stall mining of big data, where input anomalies are likely. And building collections of regex is fraught, because they don’t compose. Perhaps most importantly, regex don’t scale to teams of people, because they are famously hard to read, understand, and maintain.
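To make the backtracking problem concrete, here is a minimal sketch of a toy backtracking matcher hard-wired to the pattern (a+)+b. The function name count_steps and the step-counting scheme are illustrative assumptions, not taken from any real regex engine, and production engines add optimizations this model ignores.

```python
def count_steps(text):
    """Count the attempts a naive backtracking matcher makes for (a+)+b.

    Toy model only (hypothetical helper, not a real engine): each way of
    splitting a run of 'a' characters into groups is tried in turn.
    """
    steps = 0

    def match_groups(pos):
        nonlocal steps
        # a+ is greedy: find the longest run of 'a' starting at pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Try the longest group first, then back off one character at a time.
        for stop in range(end, pos, -1):
            steps += 1
            # After one a+ group, either another group follows or a 'b'.
            if match_groups(stop):
                return True
            if stop < len(text) and text[stop] == "b":
                return True
        return False

    return match_groups(0), steps


if __name__ == "__main__":
    for n in (5, 10, 15):
        matched, steps = count_steps("a" * n)
        print(n, matched, steps)
```

On an input of n letters "a" with no trailing "b", this model makes 2**n - 1 attempts before reporting failure, while the matching input "aaab" succeeds on the first attempt. A single anomalous line in a large data set is enough to trigger the slow path.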

The Rosie Pattern Language (RPL) addresses all of these scale challenges: big data is processed in time linear in the input size; packages of composable patterns are easily shared; and it has a readable syntax, with named patterns, flexible whitespace, and comments, like a programming language.
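The idea of patterns as composable combinators can be sketched in a few lines of Python. This is not RPL syntax and not Rosie's implementation: the helper names (lit, char_range, seq, choice, plus) and the ipv4 pattern are hypothetical, and the closures are interpreted directly rather than compiled to virtual machine instructions. Each combinator is a pure function that returns a matcher taking (text, pos) and returning the end position or None.

```python
def lit(s):
    """Match the literal string s."""
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def char_range(lo, hi):
    """Match one character between lo and hi, inclusive."""
    def match(text, pos):
        if pos < len(text) and lo <= text[pos] <= hi:
            return pos + 1
        return None
    return match


def seq(*parts):
    """Match each part in order; fail if any part fails."""
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*alternatives):
    """Ordered choice: the first alternative that matches wins."""
    def match(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return match


def plus(p):
    """One or more repetitions of p, never giving back what it consumed."""
    def match(text, pos):
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return match


# Named patterns built from smaller named patterns.
digit = char_range("0", "9")
number = plus(digit)
dot = lit(".")
ipv4 = seq(number, dot, number, dot, number, dot, number)
scheme = choice(lit("https"), lit("http"))

if __name__ == "__main__":
    print(ipv4("10.0.0.1", 0))
    print(scheme("http://example", 0))
```

Because every pattern is an ordinary value, a shared module of named patterns such as number or ipv4 can be imported and reused inside larger patterns, which is the kind of composition that is awkward with regex strings.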
Jamie Jennings
Jamie received her Ph.D. in Computer Science from Cornell University, and has held positions in academia and industry. As a Senior Technical Staff Member at IBM, she led the creation of an international technical standard for over-the-air update of cell phone software, a standard in wide use by the...