RegExp API Addendum

Thanks to Zachary Carter and Steve Leviathan for pointing out that the issue identified in my previous article has already gotten some attention:

I'm relieved that I wasn't the only one scratching his head over this issue.

Like others, I hope this gets standardized in the next version of ECMAScript.

The solution seems simple enough, but I have one nitpick: it mixes the compiled regular expression and the state of an in-progress match into a single RegExp object.

That is, whether a match is exact or searching shouldn't be a property of the RegExp itself -- it's a property of the match in progress. Likewise, .lastIndex is a property of a match, not a RegExp.

Referring to Python's API again, the use of the match() or search() methods specifies what type of match to do, and pos is a parameter to each of the methods. Thus the same regexp object can be used in multiple matches at the same time.

This might seem like an academic distinction in the browser, but JavaScript is used in many different contexts now. APIs should be designed with concurrency in mind.

For example, say you are writing a concurrent web scraper. You may want to use a RegExp to match the content of multiple HTTP responses. However, with the current API, you'll be forced to create a RegExp instance per response -- otherwise .lastIndex values from different requests will stomp on each other. Using the proposed parameters will let you keep state properly divided by request, while sharing the underlying RegExp instance.
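To make the problem concrete, here is a minimal sketch of two interleaved matches sharing one RegExp. The pattern, the variable names, and the response strings are made up for illustration; only the behavior of .exec() and .lastIndex on a global RegExp is standard.

```javascript
// One global RegExp shared by two (hypothetical) HTTP responses.
const linkRegex = /href="([^"]*)"/g;

const responseA = '<a href="/a1"></a><a href="/a2"></a>';
const responseB = '<a href="/b1"></a>';

// Start scanning response A. This leaves linkRegex.lastIndex just past
// the first match in A.
const firstA = linkRegex.exec(responseA);
const indexAfterA = linkRegex.lastIndex;

// Work on response B is interleaved before A is finished. The match
// starts from A's lastIndex instead of 0, so B's only link is skipped.
const firstB = linkRegex.exec(responseB);

console.log(firstA[1]);    // the first link in A
console.log(indexAfterA);  // non-zero: state left behind by A
console.log(firstB);       // null, although B does contain a link
```

The failed match on B also resets .lastIndex to 0, so the next call for response A would start over from the beginning and return its first link again.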

Luckily, there's a simple and elegant solution that preserves compatibility while making this distinction. I propose adding two optional positional arguments to .exec() and .test():

RegExp.exec(str, pos, flags)

Example call:

operatorRegex.exec('var a = 5+3;', 9, 'y');

Both methods would then determine the start position as follows:

  1. First try the parameter pos to find the start position.
  2. If pos is unspecified, use the .lastIndex member.
  3. If both are undefined, use 0.

Likewise, for the match type:

  1. Look for y in the flags parameter.
  2. Then look for it in the RegExp object itself.
  3. If not found, then do the normal search for the RegExp.

The only valid flags parameters are the ones that don't change the compiled representation of the RegExp (just y for now).
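The two lookup orders above can be emulated today with a standalone function. This is only a sketch of the proposed semantics: execAt is a hypothetical helper, not part of any engine, and the definition of operatorRegex is an assumption, since the example call does not show it. The emulation copies the RegExp on every call, which a native implementation would not need to do.

```javascript
// Hypothetical helper emulating the proposed exec(str, pos, flags).
function execAt(regex, str, pos, flags) {
  // Only flags that leave the compiled pattern unchanged are valid
  // (just 'y' for now).
  if (flags !== undefined && flags !== '' && flags !== 'y') {
    throw new SyntaxError('unsupported match flags: ' + flags);
  }

  // Start position: the pos parameter, then .lastIndex, then 0.
  let start = pos;
  if (start === undefined) start = regex.lastIndex;
  if (start === undefined) start = 0;

  // Match type: the flags parameter, then the RegExp's own sticky flag,
  // otherwise a normal search.
  const sticky = flags === 'y' || regex.sticky;

  // Emulation detail (an assumption of this sketch): work on a copy so
  // the shared RegExp is never mutated. 'g' makes the copy honor
  // lastIndex when searching; 'y' anchors the match at lastIndex.
  const baseFlags = regex.flags.replace(/[gy]/g, '');
  const copy = new RegExp(regex.source, baseFlags + (sticky ? 'y' : 'g'));
  copy.lastIndex = start;
  return copy.exec(str);
}

// Assumed definition: matches a single arithmetic operator.
const operatorRegex = /[-+*\/]/;
const source = 'var a = 5+3;';

const anchored = execAt(operatorRegex, source, 9, 'y');  // '+' sits at index 9
const missed = execAt(operatorRegex, source, 10, 'y');   // '3' is not an operator
const searched = execAt(operatorRegex, source, 0);       // plain search from 0
```

Because each call carries its own position and match type, operatorRegex.lastIndex is never touched, and the same instance can serve any number of in-progress matches.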

This solution is easy to implement and preserves compatibility with both earlier ECMAScript versions and Mozilla's extensions in Firefox 3.