Boyer Moore Algorithm
The Boyer-Moore Algorithm is a highly efficient string pattern searching algorithm that was developed by Robert S. Boyer and J Strother Moore in 1977. It is considered one of the most effective algorithms for finding a specific substring pattern within a larger text string. The key feature of this algorithm is that it processes the characters in the pattern and the text in a right-to-left order, allowing it to skip several characters of the text while searching, leading to faster search times. The algorithm is particularly well-suited for searching large text corpora, as it can perform the search operation much faster than other traditional pattern matching algorithms, such as the naïve approach or the Knuth-Morris-Pratt algorithm.
The Boyer-Moore algorithm consists of two main components: the bad character heuristic and the good suffix heuristic. The bad character heuristic helps the algorithm to skip portions of the text by comparing the current character in the text with the pattern. If the character is not present in the pattern, the algorithm can safely skip the entire pattern length, significantly reducing the search time. On the other hand, the good suffix heuristic allows the algorithm to skip a portion of the text if a partial match is found but the characters do not align exactly. By combining these two heuristics, the Boyer-Moore algorithm can quickly search through large text strings, making it a popular choice for various applications such as text editors, search engines, and DNA sequence alignment.
package org.gs.search
import scala.annotation.tailrec
import scala.math.max
/** Search text for pattern match using bad character rule part of Boyer-More algorithm
*
* @constructor creates a new BoyerMoore
* @param pattern character array
* @param R character alphabet
* @see [[https://algs4.cs.princeton.edu/53substring/KMP.java.html]]
* @author Scala translation by Gary Struthers from Java by Robert Sedgewick and Kevin Wayne.
*/
class BoyerMoore(pattern: Array[Char], R: Int = 256) {
private val M = pattern.length
private val right = new Array[Int](R)
pattern foreach (j => right(j) = j)
/** returns text offset of match, text length for no match */
def search(text: Array[Char]): Int = {
val N = text.length
@tailrec
def loop(i: Int): Int = {
@tailrec
def findSkip(j: Int): Int = {
if (j < 0) 0 else if (pattern(j) != text(i + j)) max(1, j - right(text(i + j))) else findSkip(j - 1)
}
if (i <= N - M) {
val skip = findSkip(M - 1)
if (skip == 0) i else loop(i + skip)
} else N
}
loop(0)
}
}