Unicode Database Without Byte Order Mark
Question / Problem:
Is it possible to use a Unicode database file that has no Byte Order Mark (BOM)?
Answer / Solution:
Kofax Transformation supports Unicode for database files. However, these database files MUST contain the Unicode Byte Order Mark (BOM) at the start of the file; otherwise, KTM parses the file according to the local default code page.
If the database file does not start with a Unicode BOM (for example, because the ERP system that generates the file cannot write one), a script can add the correct BOM to the file.
The sample script below reads the first 3 bytes of the database file to determine whether the file is UTF-8, UTF-16 little-endian, or UTF-16 big-endian. If no BOM is present, it guesses the encoding from the position of zero bytes and falls back to UTF-8. It then writes the correct BOM to the start of the database file so that KTM parses the file correctly, and compiles the database.
Note: The "Automatically update from input file" option in the Database Settings must be disabled; otherwise, the script will exit as soon as it runs.
    Private Sub Batch_Opened(ByVal ServiceNumber As Long)
       refreshUnicodeDatabase(Project.Databases(0))
    End Sub

    Private Sub refreshUnicodeDatabase(db As CscDatabase)
       'If the database is already up-to-date do nothing
       If FileDateTime(db.DatabasePath) > FileDateTime(db.ImportFilename) Then Exit Sub
       Dim temp As String
       temp = Environ("TEMP") & "\" & "db.txt"
       FileCopy db.ImportFilename, temp 'make a copy of the database, so we can edit the original
       Dim header() As Byte
       ReDim header(2)
       Open temp For Binary Access Read As #1
       Open db.ImportFilename For Binary Access Write As #2
       'read the first 3 bytes of the file and try to work out the Unicode encoding.
       'This supports only UTF-8 and UTF-16
       'This doesn't support UTF-7, UTF-32 and other encodings
       Get #1, , header
       Dim encoding As String
       If header(0) = &hEF And header(1) = &hBB And header(2) = &hBF Then
          encoding = "utf8"
       ElseIf header(0) = &hFF And header(1) = &hFE Then
          encoding = "unicode"
       ElseIf header(0) = &hFE And header(1) = &hFF Then
          encoding = "bigendian"
       ElseIf header(0) <> 0 And header(1) = 0 And header(2) <> 0 Then 'Guess
          encoding = "unicode"
          ReDim Preserve header(4)
          header(4) = header(2)
          header(3) = header(1)
          header(2) = header(0)
       ElseIf header(0) = 0 And header(1) <> 0 And header(2) = 0 Then 'Guess
          encoding = "bigendian"
          ReDim Preserve header(4)
          header(4) = header(2)
          header(3) = header(1)
          header(2) = header(0)
       Else
          encoding = "utf8" 'Guess
          ReDim Preserve header(5)
          header(5) = header(2)
          header(4) = header(1)
          header(3) = header(0)
       End If
       Select Case encoding
          Case "utf8"
             header(0) = &hEF
             header(1) = &hBB
             header(2) = &hBF
          Case "unicode"
             header(0) = &hFF
             header(1) = &hFE
             'leave 3rd byte alone
          Case "bigendian"
             header(0) = &hFE
             header(1) = &hFF
             'leave 3rd byte alone
       End Select
       Put #2, , header 'Write header to database file
       Dim buffer() As Byte
       Dim buffersize As Long
       While Seek(1) <= LOF(1) 'copy rest of file
          buffersize = LOF(1) - Seek(1)
          If buffersize > 2047 Then buffersize = 2047
          ReDim buffer(buffersize)
          Get #1, , buffer
          Put #2, , buffer
       Wend
       Close #1
       Close #2
       Kill temp 'delete the copy
       db.ImportDatabase(True)
    End Sub
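The BOM check that refreshUnicodeDatabase performs can also be run on its own, without modifying the file, to see how a given import file will be treated. The following is a minimal sketch that assumes the same WinWrap Basic file statements used in the script in this article and a file of at least 3 bytes. The function name DetectBom is hypothetical and is not part of the KTM object model.

```vb
'Hypothetical helper (not a KTM function): reports which BOM, if any, a file starts with.
'Returns "utf8", "unicode" (UTF-16 little-endian), "bigendian" (UTF-16 big-endian),
'or an empty string when none of these BOMs is present.
'Assumes the file contains at least 3 bytes.
Private Function DetectBom(ByVal path As String) As String
   Dim header() As Byte
   Dim f As Integer
   ReDim header(2)              '3 bytes: indexes 0 to 2
   f = FreeFile                 'use the next free file number
   Open path For Binary Access Read As #f
   Get #f, , header             'read the first 3 bytes only
   Close #f
   If header(0) = &hEF And header(1) = &hBB And header(2) = &hBF Then
      DetectBom = "utf8"
   ElseIf header(0) = &hFF And header(1) = &hFE Then
      DetectBom = "unicode"
   ElseIf header(0) = &hFE And header(1) = &hFF Then
      DetectBom = "bigendian"
   Else
      DetectBom = ""            'no BOM found
   End If
End Function
```

Calling DetectBom(db.ImportFilename) before the database is refreshed indicates whether the sample script will rely on one of its 'Guess branches: an empty result means the file has no BOM and its encoding will be inferred from the first bytes.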
Applies to:
Product | Version | Category |
---|---|---|
KTM | 6.3 | Scripting |
KTM | 6.2 | Scripting |