KMP Algorithm

The KMP (Knuth-Morris-Pratt) Algorithm is a highly efficient string-matching algorithm that was developed jointly by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977. It is primarily used for searching a given pattern within a given text, and it is one of the earliest linear-time string-matching algorithms. The key feature of the KMP algorithm is that it avoids unnecessary comparisons between the pattern and the text by leveraging the information about previously matched characters. This is achieved by preprocessing the pattern to build a longest proper prefix array (also known as the failure function or the prefix function), which helps in determining the next possible starting point for the pattern search in case a mismatch occurs. The KMP algorithm works by iterating through the text one character at a time, attempting to match the pattern to a substring of the text. When a mismatch occurs between the pattern and the text, the algorithm refers to the precomputed prefix array to determine the next possible starting point for the pattern search. This allows the algorithm to skip over sections of the text that have already been determined to not match the pattern, reducing the number of comparisons needed and speeding up the search process. As a result, the KMP algorithm has a worst-case time complexity of O(n + m), where n is the length of the text, and m is the length of the pattern. This makes the KMP algorithm highly efficient in real-world applications, such as searching for specific sequences in DNA, text search in document editors, and string manipulation in programming languages.
/*
  */
package org.gs.search

import scala.annotation.tailrec

/** Search text for a pattern using Knuth-Morris-Pratt algorithm
  *
  * @constructor creates a new KMP
  * @param pattern character array
  * @param R character alphabet
  * @see [[https://algs4.cs.princeton.edu/53substring/KMP.java.html]]
  * @author Scala translation by Gary Struthers from Java by Robert Sedgewick and Kevin Wayne.
  */
class KMP(pattern: Array[Char], R: Int = 256) {
  private val M = pattern.length
  private val dfa = Array.ofDim[Int](R, M)
  dfa(pattern(0))(0) = 1
  private var X = 0

  for {
    j <- 1 until M
    c <- 0 until R
  } {
    dfa(c)(j) = dfa(c)(X)
    dfa(pattern(j))(j) = j + 1
    X = dfa(pattern(j))(X)
  }

  /** returns text offset of match, text length for no match */
  def search(text: Array[Char]): Int = {
    val M = pattern.length
    val N = text.length

    @tailrec
    def loop(i: Int, j: Int): (Int, Int) = if(i < N && j < M) loop(i + 1, dfa(text(i))(j)) else (i, j)

    val tuple = loop(0, 0)
    if(tuple._2 == M) tuple._1 -M else N
  }
}

LANGUAGE:

DARK MODE: