SlightlyLoony

Some people think of regular expressions almost as a religion (The One Way), but more people think of them as something to be avoided if possible. Some people find them elegant and intuitive, but more people find them ugly, incomprehensible, and head-exploding. I know accomplished developers who will do almost anything to avoid using regular expressions, and I know non-developers who use incredibly complex regular expressions with the same ease that I drink my morning tea. Personally, I think of them as a useful tool for many needs when processing text — and well worth making the effort to understand them well.

Though I've received more requests to blog about regular expressions than about any other topic, I've avoided it until now, mainly because it's a very large and detailed topic. My favorite book on the subject (Mastering Regular Expressions) is almost 500 pages long. But some recent experiences have convinced me that I might actually be able to do something useful with a few blog posts on the subject. This is the first one.

I'm going to concentrate more on why one might want to use regular expressions, with just enough introduction to the details to (hopefully) whet your interest — then you can go study them in detail on your own 🙂

The first challenge with regular expressions is knowing when they are actually an appropriate tool for the problem at hand. In general, regular expressions excel at finding text when you don't know exactly what text you're looking for, but you do know something about its pattern. Here's a simple example: suppose you want to see if a piece of text contains a social security number (aka SSN, the closest thing Americans have to an ID number). You don't know which SSN might be in the text; you just know that SSNs are conventionally written like this: 123-45-6789. Without using regular expressions, you'd have to write code something like this:


var test = 'I have written my social security number (123-45-6789) in here, like this.';
var result = getSSNwithoutRegex(test);
gs.log(result);

// Scans the text with a small state machine and returns the first SSN it recognizes, or null.
// States G1, G2, G3 read the 3-, 2-, and 4-digit groups; D1 and D2 expect a dash.
function getSSNwithoutRegex(text) {
    var state, count, start;
    clear();
    for (var i = 0; i < text.length; i++) {
        var c = text.charAt(i);

        // classify the current character
        var t = 'other';
        if ((c >= '0') && (c <= '9'))
            t = 'digit';
        else if (c == '-')
            t = 'dash';

        switch (state) {
            case 'G1':  // first group: three digits
                if (t == 'digit') {
                    count++;
                    if (count == 1)
                        start = i;  // remember where the candidate SSN begins
                    if (count == 3)
                        changeTo('D1');
                } else
                    clear();
                break;
            case 'G2':  // second group: two digits
                if (t == 'digit') {
                    count++;
                    if (count == 2)
                        changeTo('D2');
                } else
                    clear();
                break;
            case 'G3':  // third group: four digits, not followed by another digit
                if (t == 'digit') {
                    count++;
                    if (count == 4) {
                        if ((i == (text.length - 1)) ||
                            (text.charAt(i + 1) < '0') ||
                            (text.charAt(i + 1) > '9'))
                            return text.substring(start, i + 1);
                        else
                            clear();
                    }
                } else
                    clear();
                break;
            case 'D1':  // dash after the first group
                if (t == 'dash')
                    changeTo('G2');
                else
                    clear();
                break;
            case 'D2':  // dash after the second group
                if (t == 'dash')
                    changeTo('G3');
                else
                    clear();
                break;
        }
    }
    return null;

    // start over, looking for the first group again
    function clear() {
        changeTo('G1');
    }

    // enter a new state and reset the digit counter
    function changeTo(newState) {
        state = newState;
        count = 0;
    }
}

That function finds SSNs just fine, and there's nothing fancy about it at all. It wasn't difficult to write and test. It will probably only take you a few minutes to figure out how it works. So what's wrong with it? Why do I think a regular expression might be a better tool for this problem? Here's why:

var test = 'I have written my social security number (123-45-6789) in here, like this.';
var result = getSSNwithRegex(test);
gs.log(result);

function getSSNwithRegex(text) {
    var parser = /(^|[^0-9])([0-9]{3}-[0-9]{2}-[0-9]{4})($|[^0-9])/m;
    var ans = parser.exec(text);
    // ans[2] is the second parenthesized group: the SSN itself, without its neighbors
    return (ans == null) ? null : ans[2];
}

It's just a tad shorter and simpler, eh?

Simpler, that is, if the regular expression itself doesn't make your head explode.

This example demonstrates the why for regular expressions. There are many occasions when a regular expression can make code dramatically simpler and easier to write (and often the difference is even more dramatic than this example). They're not magic; they're just a tool — but a tool that takes some effort to learn how to use.
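If you'd like to poke at the pattern before the next post, here is a minimal sketch that runs the same regular expression against a handful of inputs. The helper name findSSN and the sample strings are my own inventions for illustration, and I'm assuming a plain JavaScript environment with console.log rather than gs.log, so treat it as a quick experiment rather than production code.

```javascript
// Hypothetical helper: same pattern as getSSNwithRegex, renamed for this sketch.
function findSSN(text) {
    var parser = /(^|[^0-9])([0-9]{3}-[0-9]{2}-[0-9]{4})($|[^0-9])/m;
    var ans = parser.exec(text);
    return (ans == null) ? null : ans[2];
}

// Made-up sample inputs: two that contain an SSN, three that should not match.
var samples = [
    'My SSN is 123-45-6789.',   // SSN surrounded by non-digits
    '123-45-6789',              // SSN is the entire text
    'Call 5123-45-6789 now',    // extra digit in front
    'Order 123-45-67890',       // extra digit behind
    'No number here'            // nothing SSN-shaped at all
];

var results = samples.map(findSSN);

for (var i = 0; i < samples.length; i++) {
    console.log(JSON.stringify(samples[i]) + ' -> ' + results[i]);
}
```

The first two samples should come back as 123-45-6789 and the other three as null, because the pattern insists that the character on each side of the number (if there is one) is not a digit.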

In the next post, I'll show you how this regular expression works...
