Scripting - Fuzzy Matching in Script
Article # 3035634
Issue
How can a fuzzy matching calculation be performed in script?
Solution
The fuzzy matching algorithm is not available in the scripting environment.
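KTM uses a modified Levenshtein Distance to perform the matching. The Levenshtein Distance counts how many single-character changes (deletions, insertions and substitutions) are needed to convert one string into another. For example, converting "kitten" into "sitting" takes three changes: two substitutions and one insertion.
Below is a WinWrap Basic implementation of fuzzy matching based on the standard Levenshtein Distance. You can customize its behaviour to your needs, for example by converting characters before the comparison or by telling it which characters to ignore.
The function FuzzyMatch(a As String, b As String) As Single returns a match score between 0.0 and 1.0: 0.0 if the two strings are completely different and 1.0 if they are identical. The score is 1 minus the distance divided by the length of the longer string. If both strings are empty, the function returns 0.0.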
Public Function FuzzyMatch(ByVal a As String, ByVal b As String) As Single
   Dim length As Integer
   ' Normalize by the length of the longer string
   If Len(a) > Len(b) Then length = Len(a) Else length = Len(b)
   If length = 0 Then FuzzyMatch = 0: Exit Function
   Dim dist As Integer
   dist = LevenshteinDistance(a, b)
   FuzzyMatch = CSng(1.0 - (dist / length))
End Function

Public Function LevenshteinDistance(a As String, b As String) As Integer
   Dim i, j, cost, d, min1, min2, min3
   ' Avoid calculations where there are empty words
   If Len(a) = 0 Then LevenshteinDistance = Len(b): Exit Function
   If Len(b) = 0 Then LevenshteinDistance = Len(a): Exit Function
   ' Array initialization
   ReDim d(Len(a), Len(b))
   For i = 0 To Len(a)
      d(i, 0) = i
   Next
   For j = 0 To Len(b)
      d(0, j) = j
   Next
   ' Actual calculation
   For i = 1 To Len(a)
      For j = 1 To Len(b)
         If Mid(a, i, 1) = Mid(b, j, 1) Then
            cost = 0 ' cost of perfect match
         Else
            cost = 1 ' cost of substitution
         End If
         ' Since min() function is not a part of WinWrap, we'll "emulate" it below
         min1 = ( d( i - 1, j ) + 1 ) ' cost of deletion
         min2 = ( d( i, j - 1 ) + 1 ) ' cost of insertion
         min3 = ( d( i - 1, j - 1 ) + cost ) ' cost of substitution or match
         If min1 <= min2 And min1 <= min3 Then
            d(i, j) = min1
         ElseIf min2 <= min1 And min2 <= min3 Then
            d(i, j) = min2
         Else
            d(i, j) = min3
         End If
      Next
   Next
   LevenshteinDistance = d(Len(a), Len(b))
End Function
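The customization mentioned above (character conversion and ignored characters) could be sketched as follows. This is only a minimal illustration, not KTM's own behaviour: NormalizeForMatch, FuzzyMatchNormalized and DemoFuzzyMatch are hypothetical names introduced here, the ignore list " -./" is an arbitrary assumption, and the code assumes the FuzzyMatch function from this article is present in the same script.

```vb
' Hypothetical helper (not part of KTM): upper-cases the string and drops
' every character that appears in ignoreChars.
Public Function NormalizeForMatch(ByVal s As String, ByVal ignoreChars As String) As String
   Dim i As Integer
   Dim ch As String
   Dim result As String
   result = ""
   s = UCase(s) ' character conversion: make the comparison case-insensitive
   For i = 1 To Len(s)
      ch = Mid(s, i, 1)
      ' Keep the character only if it is not in the ignore list
      If InStr(ignoreChars, ch) = 0 Then result = result & ch
   Next
   NormalizeForMatch = result
End Function

' Hypothetical wrapper: normalizes both strings, then calls FuzzyMatch.
' The ignore list (space, hyphen, dot, slash) is an assumption; adjust as needed.
Public Function FuzzyMatchNormalized(ByVal a As String, ByVal b As String) As Single
   FuzzyMatchNormalized = FuzzyMatch(NormalizeForMatch(a, " -./"), NormalizeForMatch(b, " -./"))
End Function

' Hypothetical demo routine showing both calls.
Public Sub DemoFuzzyMatch()
   ' Distance 3, longer string has 7 characters: 1 - 3/7
   Debug.Print FuzzyMatch("kitten", "sitting")
   ' Both strings normalize to "INV2041", so the distance is 0
   Debug.Print FuzzyMatchNormalized("INV-2041", "inv 2041")
End Sub
```

With these inputs the first call should give a score of approximately 0.571, and the second should give 1.0 because both strings are identical after normalization. Calling FuzzyMatch directly on "INV-2041" and "inv 2041" would score lower, since the case and separator differences each count as substitutions.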
QAID # 17762 Published
Level of Complexity
Moderate
Applies to
Product | Version | Build | Environment | Hardware |
---|---|---|---|---|
Kofax Transformation Modules | all | | | |