Kofax

Scripting - Fuzzy Matching in Script

Article # 3035634

Issue

How can a fuzzy matching score be calculated in script?

 

Solution

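The built-in fuzzy matching algorithm is not available in the scripting environment, so it has to be implemented in script.

KTM uses a modified Levenshtein distance to perform the matching. The Levenshtein distance is the minimum number of single-character changes (deletions, insertions and substitutions) needed to convert one string into another. For example, converting "kitten" into "sitting" takes three changes: two substitutions and one insertion.

Below is a WinWrap Basic implementation of fuzzy matching based on the standard Levenshtein distance. You can customize its behavior to your needs, for example by converting characters before the comparison or by specifying which characters to ignore.

The function FuzzyMatch(a As String, b As String) As Single returns a match score between 0.0 and 1.0: 1.0 if the two strings are identical and 0.0 if they are completely different. The score is 1 minus the Levenshtein distance divided by the length of the longer string. Note that two empty strings return 0.0.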

Public Function FuzzyMatch(ByVal a As String, ByVal b As String) As Single
   ' The score is normalized by the length of the longer string
   Dim length As Integer
   If Len(a) > Len(b) Then length = Len(a) Else length = Len(b)
   ' Two empty strings score 0.0; this check also avoids division by zero
   If length = 0 Then FuzzyMatch = 0: Exit Function

   Dim dist As Integer
   dist = LevenshteinDistance(a, b)
   FuzzyMatch = CSng(1.0 - (dist / length))
End Function

Public Function LevenshteinDistance(a As String, b As String) As Integer
   Dim i As Integer, j As Integer, cost As Integer
   Dim min1 As Integer, min2 As Integer, min3 As Integer
   Dim d() As Integer

   ' Avoid calculations where there are empty words
   If Len(a) = 0 Then LevenshteinDistance = Len(b): Exit Function
   If Len(b) = 0 Then LevenshteinDistance = Len(a): Exit Function

   ' Array initialization: d(i, j) is the distance between the
   ' first i characters of a and the first j characters of b
   ReDim d(Len(a), Len(b))

   For i = 0 To Len(a)
      d(i, 0) = i
   Next
   For j = 0 To Len(b)
      d(0, j) = j
   Next

   ' Actual calculation
   For i = 1 To Len(a)
      For j = 1 To Len(b)

         If Mid(a, i, 1) = Mid(b, j, 1) Then
            cost = 0   ' cost of perfect match
         Else
            cost = 1   ' cost of substitution
         End If
         ' WinWrap has no built-in Min() function, so it is emulated below
         min1 = d(i - 1, j) + 1          ' cost of deletion
         min2 = d(i, j - 1) + 1          ' cost of insertion
         min3 = d(i - 1, j - 1) + cost   ' cost of substitution or match

         If min1 <= min2 And min1 <= min3 Then
            d(i, j) = min1
         ElseIf min2 <= min1 And min2 <= min3 Then
            d(i, j) = min2
         Else
            d(i, j) = min3
         End If
      Next
   Next
   LevenshteinDistance = d(Len(a), Len(b))
End Function
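
The customization mentioned above (character conversion and ignored characters) could be sketched as a small wrapper around FuzzyMatch. This is only an illustration: NormalizeForMatch and FuzzyMatchNormalized are hypothetical names that are not part of KTM, and the ignore list " -./" is an assumption to adjust to your data.

```vb
' Hypothetical helper (not part of KTM): removes every character that
' appears in ignoreChars and converts the remaining ones to upper case.
Public Function NormalizeForMatch(ByVal s As String, ByVal ignoreChars As String) As String
   Dim i As Integer
   Dim ch As String
   Dim result As String

   result = ""
   For i = 1 To Len(s)
      ch = Mid(s, i, 1)
      ' Keep the character only if it is not in the ignore list
      If InStr(ignoreChars, ch) = 0 Then result = result & UCase(ch)
   Next
   NormalizeForMatch = result
End Function

' Hypothetical wrapper: normalizes both strings, then scores them with
' the FuzzyMatch function from this article. The ignore list is an assumption.
Public Function FuzzyMatchNormalized(ByVal a As String, ByVal b As String) As Single
   FuzzyMatchNormalized = FuzzyMatch(NormalizeForMatch(a, " -./"), NormalizeForMatch(b, " -./"))
End Function
```

With this sketch, FuzzyMatchNormalized("AB-1234", "ab 1234") returns 1.0 because both strings normalize to "AB1234", whereas FuzzyMatch on the raw strings returns about 0.57 (three substitutions over a length of seven).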

QAID # 17762 Published

Level of Complexity 

Moderate

 

Applies to  

Product: Kofax Transformation Modules
Version: all

 

 
