About the Site

This weblog is edited and run by members of reallyenglish, a company offering a total English learning solution based in London, Beijing, Shanghai and Tokyo. Visit our corporate site to learn more about what we do.

Notes are posted by members from various cultural and geographical backgrounds, and the topics range from education, business and international communication to software development, internet culture, and more.

Staff

Masatomo Nakano (http://twitter.com/masatomon), simonl, davida, jeremyw, Go Kameda, gavin b, No name, tomoyukis

 

Excluding certain matches in String.replace (JavaScript RegEx)


The other day I wanted to dynamically emphasize all the numbers in some HTML text with JavaScript. At first this seemed simple enough, even for my RegEx-dummy brain. I just wrote something like the following:

(JavaScript)
function emphasizeNumbers(){
  var elmText = document.getElementById("text");
  elmText.innerHTML = elmText.innerHTML.replace(/(\d)/g, "<em>$1</em>");
  return false;
}

(HTML)
<p id="text">You have finished 3 lessons. Your score was 80/100. This is end of Stage 1. Please proceed to Stage 2. (By the way my number is 012-345-6789).</p>

So far, so simple. However, when I looked at the result, I noticed that we did not want to emphasize the numbers preceded by the word "Stage", such as the "1" in "Stage 1" and the "2" in "Stage 2". And I did not know how to exclude only these from the RegEx matches to be replaced.

Without thinking too much about it, I first tried the following:

function emphasizeNumbers(){
  var elmText = document.getElementById("text");
  elmText.innerHTML = elmText.innerHTML.replace(/[^(Stage )](\d)/g, "<em>$1</em>");
  return false;
}

What I meant here was "match any digit that is preceded by anything other than the string 'Stage '" (while also trying to keep the grouped characters 'Stage ' out of the back-references). However, the result was miserable. Honestly, I still cannot explain exactly what happened here. Anyway, I understood that grouping with parentheses ("()") inside the square brackets of a character class ("[]") does not work (does it?).
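For what it is worth, here is a small sketch of what that pattern appears to do. Inside square brackets, parentheses are ordinary characters, so [^(Stage )] means "one character that is not (, S, t, a, g, e, a space or )". That one character is consumed as part of the match and therefore disappears in the replacement. The helper name brokenEmphasize is made up for this illustration, and it works on a plain string instead of innerHTML.

```javascript
// Hypothetical helper for illustration: the second attempt applied to a plain string.
function brokenEmphasize(text) {
  // [^(Stage )] is a character class: any ONE character except
  // "(", "S", "t", "a", "g", "e", " " or ")". It is not a negated group.
  return text.replace(/[^(Stage )](\d)/g, "<em>$1</em>");
}

// The space before the digit is in the excluded set, so nothing matches.
console.log(brokenEmphasize("finished 3 lessons")); // finished 3 lessons

// "8" is consumed as the "not in the set" character, so only "0" survives.
console.log(brokenEmphasize("80")); // <em>0</em>
```

So a digit only gets emphasized when the character right before it is outside that set, and that preceding character is thrown away, which would explain the mangled output.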

After googling for a while, I found that there is a RegEx feature called "negative lookbehind". It took me about an hour to even vaguely understand the concept, but it seemed to be the answer. So I tried the following, just mimicking the tutorial page:

function emphasizeNumbers(){
  var elmText = document.getElementById("text");
  elmText.innerHTML = elmText.innerHTML.replace(/(?<!Stage )(\d)/g, "<em>$1</em>");
  return false;
}

It did not work at all and ended in a JavaScript error. Soon after that, I found that I had missed one important line on the tutorial page:

Finally, flavors like JavaScript, Ruby and Tcl do not support lookbehind at all, even though they do support lookahead.
OK, great: (negative) lookbehind is NOT supported in current JavaScript RegEx (as of January 2010, JavaScript 1.8.1).
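A note for readers on newer engines: lookbehind assertions were later added to the language in ECMAScript 2018, so in an environment that implements them the idea from the tutorial can work. Below is a minimal sketch, assuming such an engine. The function name emphasizeWithLookbehind is made up here, it takes a plain string rather than reading innerHTML, and it matches whole numbers (\d+) rather than single digits.

```javascript
// Hypothetical helper; assumes an engine with ES2018 lookbehind support.
function emphasizeWithLookbehind(text) {
  // (?<!Stage\s) : the number must not directly follow "Stage" plus one whitespace.
  // (?<!\d)      : do not start in the middle of a number; without this,
  //                the "2" in "Stage 12" would still be matched.
  return text.replace(/(?<!Stage\s)(?<!\d)(\d+)/g, "<em>$1</em>");
}

console.log(emphasizeWithLookbehind("Your score was 80/100. This is end of Stage 1."));
// Your score was <em>80</em>/<em>100</em>. This is end of Stage 1.
```

In older engines the same pattern is a syntax error, so this is only an option when the target browsers are known.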

In the end, I solved the problem by doing the following:

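function emphasizeNumbers(){
  var elmText = document.getElementById("text");
  // Optionally capture a leading "Stage " so that such numbers can be recognized and left alone.
  // \d+ takes the whole number, so the "2" in "Stage 12" is not emphasized on its own.
  elmText.innerHTML = elmText.innerHTML.replace(/(Stage\s)?(\d+)/g, function(str, prefix, digits){
    // prefix is empty or undefined when the number is not preceded by "Stage".
    if(prefix) return str;
    else return "<em>" + digits + "</em>";
  });
  return false;
}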

This finally worked. I had not known that the String.replace method accepts a function, instead of a replacement string, as its second parameter.
String.replace (MDC Core JavaScript 1.5 reference) - Specifying a function as a parameter
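To make the callback form a little more concrete: replace calls the function once per match, passing the whole match first, then one argument per capture group, then the offset of the match and the original string. The sketch below only records what it receives. describeMatches is a made-up name for this illustration, and the pattern is just an example.

```javascript
// Hypothetical helper: collect what the replace callback receives for each match.
function describeMatches(text) {
  var seen = [];
  text.replace(/(Stage\s)?(\d+)/g, function (match, prefix, digits, offset) {
    // prefix is undefined in current engines when the optional group did not participate.
    seen.push({ match: match, prefix: prefix, digits: digits, offset: offset });
    return match; // the return value is unused here; only "seen" matters
  });
  return seen;
}

console.log(describeMatches("Stage 1 has 3 lessons"));
```

For this input the callback runs twice: once for "Stage 1" (with the prefix group set) and once for "3" (with the prefix group not set), which is the distinction the solution above relies on.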

However, I'm not sure if this is the best solution. Is there a better and easier way to do the same?
