Kofax

Fuzzy Matching in Script

QAID # 17762 Published

Question / Problem:

Fuzzy Matching in Script

Answer / Solution:

The fuzzy matching algorithm is not available in the scripting environment.

Kofax Transformation Modules (KTM) uses a modified Levenshtein distance to perform the matching. The Levenshtein distance is the minimum number of single-character changes (deletions, insertions, and substitutions) needed to convert one string into another.

Below is a WinWrap Basic implementation of fuzzy matching based on the standard Levenshtein distance. You can adapt it to your needs, for example by converting characters before the comparison (such as to a single case) or by telling it which characters to ignore.
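The customization mentioned above is not part of the listing in this article. As a rough sketch only, it could be done in a pre-processing step that is applied to both strings before they are passed to FuzzyMatch. The helper name NormalizeForMatch and its ignoreChars parameter are hypothetical and are not part of KTM or WinWrap:

```vb
' Hypothetical helper, not part of KTM: upper-cases the text and drops
' every character listed in ignoreChars before the strings are compared.
Public Function NormalizeForMatch(ByVal s As String, ByVal ignoreChars As String) As String

   Dim i As Integer
   Dim c As String
   Dim result As String

   result = ""
   For i = 1 To Len(s)
      c = UCase(Mid(s, i, 1))   ' character conversion: compare in upper case
      ' InStr returns 0 when c is not one of the ignored characters
      If InStr(ignoreChars, c) = 0 Then result = result & c
   Next

   NormalizeForMatch = result

End Function
```

Because each character is upper-cased before the lookup, any letters in ignoreChars would have to be given in upper case. With this helper, NormalizeForMatch("INV-2024/001", "-/ ") and NormalizeForMatch("inv 2024 001", "-/ ") both give "INV2024001", so FuzzyMatch on the two normalized strings returns 1.0.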

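The function FuzzyMatch(a As String, b As String) As Single returns the match as a fraction between 0.0 and 1.0 (0.0 if the two strings are completely different, 1.0 if they are identical). As written, it also returns 0.0 when both strings are empty. The score is 1 minus the Levenshtein distance divided by the length of the longer string: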

Public Function FuzzyMatch(ByVal a As String, ByVal b As String) As Single

   Dim length As Integer
   If len(a)>len(b) Then length = len(a) Else length = len(b)
   If length = 0 Then FuzzyMatch = 0: Exit Function

   Dim dist As Integer
   dist = LevenshteinDistance(a, b)
   FuzzyMatch = CSng(1.0 - (dist / length))

End Function

Public Function LevenshteinDistance(a As String, b As String) As Integer

   Dim i, j, cost, d, min1, min2, min3

   ' Avoid calculations where there are empty words
   If Len(a) = 0 Then LevenshteinDistance = Len(b): Exit Function
   If Len(b) = 0 Then LevenshteinDistance = Len(a): Exit Function

   ' Array initialization: d(i, j) holds the distance between the first
   ' i characters of a and the first j characters of b
   ReDim d(Len(a), Len(b))

   For i = 0 To Len(a)
      d(i, 0) = i
   Next
   For j = 0 To Len(b)
      d(0, j) = j
   Next

   ' Actual calculation
   For i = 1 To Len(a)
      For j = 1 To Len(b)

         If Mid(a, i, 1) = Mid(b, j, 1) Then
            cost = 0   ' cost of perfect match
         Else
            cost = 1   ' cost of substitution
         End If

         ' Since min() function is not a part of WinWrap, we'll "emulate" it below
         min1 = d(i - 1, j) + 1          ' cost of deletion
         min2 = d(i, j - 1) + 1          ' cost of insertion
         min3 = d(i - 1, j - 1) + cost   ' cost of substitution or match

         If min1 <= min2 And min1 <= min3 Then
            d(i, j) = min1
         ElseIf min2 <= min1 And min2 <= min3 Then
            d(i, j) = min2
         Else
            d(i, j) = min3
         End If

      Next
   Next

   LevenshteinDistance = d(Len(a), Len(b))

End Function
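
As a quick check of the two functions above, the following sketch calls them on a few sample strings. The routine name DemoFuzzyMatch and the sample strings are made up for illustration, and Debug.Print is assumed to write to the script editor's output window:

```vb
' Illustrative only: DemoFuzzyMatch is a hypothetical test routine.
Public Sub DemoFuzzyMatch()

   ' "kitten" -> "sitting" needs 3 changes:
   ' substitute k with s, substitute e with i, insert g at the end
   Debug.Print LevenshteinDistance("kitten", "sitting")   ' distance 3

   ' 3 changes over the longer length (7): 1 - 3/7, roughly 0.57
   Debug.Print FuzzyMatch("kitten", "sitting")

   ' Identical strings score 1
   Debug.Print FuzzyMatch("Kofax", "Kofax")

   ' Strings of equal length with no character in common score 0
   Debug.Print FuzzyMatch("abc", "xyz")

End Sub
```

The character comparison uses the = operator on single characters, so whether "A" and "a" count as a match depends on the script's string comparison setting. If case should never matter, converting both strings with UCase before calling FuzzyMatch is one simple option.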

Applies to:

Product   Version   Category
AXPRO     5.5       Project Builder