Wildcard Matching

After solving this leetcode problem using dynamic programming, I had a look at how others solved it, and this approach caught my eyes. The C++ version is surprisingly neat, possibly due to utilizing the fact NULL is at the end of the char array.

// reprinted from above link
bool isMatch(const char *s, const char *p) {
    const char* star=NULL;
    const char* ss=s;
    while (*s){
        //advancing both pointers when (both characters match) or ('?' found in pattern)
        //note that *p will not advance beyond its length
        if ((*p=='?')||(*p==*s)){s++;p++;continue;}

        // * found in pattern, track index of *, only advancing pattern pointer
        if (*p=='*'){star=p++; ss=s;continue;}

        //current characters didn't match, last pattern pointer was *, current pattern pointer is not *
        //only advancing pattern pointer
        if (star){ p = star+1; s=++ss;continue;}

       //current pattern pointer is not star, last patter pointer was not *
       //characters do not match
        return false;
    }

   //check for remaining characters in pattern
    while (*p=='*'){p++;}

    return !*p;
}

The gist of the algorithm is to match empty string with * firstly, and backtrack so that * match one more char every time the matching fails later on, basically depth-first search if we see how state graph is traversed.

Since I don’t know much about C++, I tried to re-implement it in Java, here is what ended up with. I had to say this Java code is rather ugly, for the program is littered with arbitrary +1, which makes it hard to assess whether the program is correct or not.

static boolean isMatch(String str, String pattern) {
  int i = 0;
  int j = 0;

  int bt_i = 0;
  int bt_j = 0;

  int m = str.length();
  int n = pattern.length();
  while (i < m) {
    char s = str.charAt(i);

    if (j < n) {
      char p = pattern.charAt(j);

      if (p == '*') {
        // match empty string, but leave backtrack here
        // potentially overwrite previous star, but it's fine
        bt_i = i + 1;
        bt_j = j + 1;
        j++;
        continue;
      }

      if (s == p || p == '?') {
        i++;
        j++;
        continue;
      }
    }

    if (bt_i != 0) {
      // backtrack to previous star
      i = bt_i;
      j = bt_j;

      bt_i++;
      continue;
    }

    return false;
  }

  while (j < n) {
    if (pattern.charAt(j) != '*') {
      return false;
    }
    j++;
  }

  return true;
}

Considering that Haskell is another programming language I can write helloworld in, I rewrote (sort of) the same logic in it, but the result was truly astonishing.

is_match :: String -> String -> Bool
is_match str@(s:ss) pattern@(p:ps)
  | p == '*' = is_match str ps || is_match ss pattern
  | s == p || p == '?' = is_match ss ps
  | otherwise = False
is_match [] pattern = all (=='*') pattern
is_match (s:ss) [] = False

Amazed by the resulting Haskell program, I started to ponder on this transcending difference. In the process, I realized that the two versions are actually not equivalent. If we draw the state graph, we could see that the two versions don’t cover the same state graph. In order to make the two programs the same, I had to change the Java program to the following. (Modification is marked with DELTA.)

static boolean isMatch(String str, String pattern) {
  int i = 0;
  int j = 0;

  int bt_i = 0;
  int bt_j = 0;

  int m = str.length();
  int n = pattern.length();
  while (i < m) {
    char s = str.charAt(i);

    if (j < n) {
      char p = pattern.charAt(j);

      if (p == '*') {
        // match empty string, but leave backtrack here
        // potentially overwrite previous star, but it's fine
        bt_i = i + 1;
        bt_j = j; // DELTA
        j++;
        continue;
      }

      if (s == p || p == '?') {
        i++;
        j++;
        continue;
      }
    }

    if (bt_i != 0) {
      // backtrack to previous star
      i = bt_i;
      j = bt_j;

      bt_i = 0; // DELTA
      continue;
    }

    return false;
  }

  while (j < n) {
    if (pattern.charAt(j) != '*') {
      return false;
    }
    j++;
  }

  return true;
}

The modified Java program visited more states, so does the Haskell version, compared with my original Java program. The complexity of three implementations is all length(str) * length(pattern), but with different constant factor. On the other hand, the Haskell version is too abstract to reveal any of this low level details. Therefore, the conclusion of this small adventure is rather mundane, abstraction and optimization are often conflicting with each other.