FMatch #
Description #
Fuzzy text matching engine for entity screening or person screening against denial lists such as BIS Entity List.
Strictly speaking, this is only a text matching engine, not a complete screening system against denial lists.
For practical applications, you may have to join its results with the denial lists yourself.
It is intended to be used as a subsystem through a local web API or a Dart API.
Features #
Fuzzy term matching using Levenshtein distance.
Matching of divided query terms against a single list term.
Fuzzy query matching that respects term similarity, term order, and term importance (IDF).
Perfect matching mode, which disables fuzzy matching to reduce false positives in some cases.
Accepting Latin characters, Chinese characters, Katakana characters, and others.
Canonicalization of traditional and simplified Chinese characters, and others.
This makes matching insensitive to character simplification.
Canonicalization of spelling variants of legal entity types such as "Limited" and "Ltd.".
This makes matching insensitive to spelling variants of legal entity types.
White queries to avoid screening your own company and the resulting false positives.
Result cache for faster responses.
Single queries accepted by the web server for interactive UIs.
Bulk queries accepted and processed in parallel by the web server for batch applications.
Text normalization API for larger enclosing systems that join results with the denial lists.
And others.
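To make the first feature concrete, here is a minimal Dart sketch of Levenshtein distance between two terms, together with a length-normalized similarity. It is the textbook dynamic-programming formulation, not necessarily how FMatch implements or scores it. The function names `levenshtein` and `termSimilarity` and the normalization by the longer term are assumptions made for illustration.

```dart
import 'dart:math' as math;

/// Textbook Levenshtein distance (insertions, deletions, substitutions),
/// computed with two rolling rows. Compares UTF-16 code units.
int levenshtein(String a, String b) {
  if (a.isEmpty) return b.length;
  if (b.isEmpty) return a.length;
  var prev = List<int>.generate(b.length + 1, (j) => j);
  var curr = List<int>.filled(b.length + 1, 0);
  for (var i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      curr[j] = math.min(
        math.min(prev[j] + 1, curr[j - 1] + 1),
        prev[j - 1] + cost,
      );
    }
    // Swap the rows so that `prev` always holds the last completed row.
    final tmp = prev;
    prev = curr;
    curr = tmp;
  }
  return prev[b.length];
}

/// Hypothetical similarity in [0, 1]; 1.0 means the terms are identical.
double termSimilarity(String a, String b) {
  final longest = math.max(a.length, b.length);
  if (longest == 0) return 1.0;
  return 1.0 - levenshtein(a, b) / longest;
}

void main() {
  print(levenshtein('ABC', 'ABMC')); // prints 1 (one insertion)
  print(termSimilarity('ABC', 'ABMC')); // prints 0.75
}
```

A perfect matching mode would correspond to accepting only a distance of 0 in this sketch.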
Usage #
Fetch the public denial lists (optional) #
dart bin/fetch_public_lists.dart
This fetches lists from the US Government's Consolidated Screening List and the Japanese METI's Foreign Users List.
Compile the local web server #
dart compile exe -v bin/wserver.dart -o bin/wserver
Note: JIT mode does not work for some reason. See dart-lang/sdk#50082.
Start the local web server #
bin/wserver
Send a query and receive the result #
$ http -b --unsorted ':4049?q=abc'
{
    "serverId": 2,
    "start": "2022-11-10T12:21:22.736901Z",
    "durationInMilliseconds": 16,
    "inputString": "abc",
    "rawQuery": "ABC",
    "cachedResult": {
        "cachedQuery": {
            "letType": "na",
            "terms": [
                "ABC"
            ],
            "perfectMatching": false
        },
        "queryScore": 0.8325604366063432,
        "queryFallenBack": false,
        "matchedEntiries": [
            {
                "entry": "ABC LLC",
                "score": 0.8325604366063432
            },
            {
                "entry": "ABMC THAI SOUTH SUDAN CONSTRUCTION",
                "score": 0.6244203274547574
            },
            {
                "entry": "ABMC THAI-SOUTH SUDAN CONSTRUCTION COMPANY LIMITED",
                "score": 0.6244203274547574
            }
        ]
    },
    "message": ""
}
Do not forget to percent-encode the query.
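When the query is built in code rather than typed by hand, the encoding step can be sketched in Dart as follows. `Uri.encodeComponent` is part of `dart:core`. The helper name `buildQueryUri` is hypothetical, and the default host and port are assumptions taken from the `:4049` examples in this README.

```dart
/// Builds the URL for a single query.
/// Host and port defaults are assumptions based on the `:4049` examples.
Uri buildQueryUri(String query, {String host = 'localhost', int port = 4049}) {
  // Uri.encodeComponent percent-encodes characters such as spaces,
  // double quotes, and '&' that would otherwise break the query string.
  final encoded = Uri.encodeComponent(query);
  return Uri.parse('http://$host:$port/?q=$encoded');
}

void main() {
  print(buildQueryUri('abc'));
  // A quoted query (perfect matching) needs its double quotes encoded.
  print(buildQueryUri('"abc"'));
  print(buildQueryUri('A&B Trading Co.'));
}
```

Without encoding, the `&` in the last example would be read as a parameter separator and the query would be cut short.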
Perfect matching #
Enclose the whole query in double quotes.
$ http -b --unsorted ':4049?q="abc"'
{
    "serverId": 1,
    "start": "2022-11-11T00:39:16.506794Z",
    "durationInMilliseconds": 2,
    "inputString": "\"abc\"",
    "rawQuery": "ABC",
    "cachedResult": {
        "cachedQuery": {
            "letType": "na",
            "terms": [
                "ABC"
            ],
            "perfectMatching": true
        },
        "queryScore": 1.0,
        "queryFallenBack": false,
        "matchedEntiries": [
            {
                "entry": "ABC LLC",
                "score": 1.0
            }
        ]
    },
    "message": "Cached result"
}
Post queries in JSON and receive the results #
$ http -b --unsorted :4049 'Content-type:application/json; charset=utf-8' '[]=abc' '[]="def"'
[
    {
        "serverId": 2,
        "start": "2022-11-11T00:40:34.971122Z",
        "durationInMilliseconds": 2,
        "inputString": "abc",
        "rawQuery": "ABC",
        "cachedResult": {
            "cachedQuery": {
                "letType": "na",
                "terms": [
                    "ABC"
                ],
                "perfectMatching": false
            },
            "queryScore": 0.8325604366063432,
            "queryFallenBack": false,
            "matchedEntiries": [
                {
                    "entry": "ABC LLC",
                    "score": 0.8325604366063432
                },
                {
                    "entry": "ABMC THAI SOUTH SUDAN CONSTRUCTION",
                    "score": 0.6244203274547574
                },
                {
                    "entry": "ABMC THAI-SOUTH SUDAN CONSTRUCTION COMPANY LIMITED",
                    "score": 0.6244203274547574
                }
            ]
        },
        "message": "Cached result"
    },
    {
        "serverId": 0,
        "start": "2022-11-11T00:40:34.970496Z",
        "durationInMilliseconds": 2,
        "inputString": "\"def\"",
        "rawQuery": "DEF",
        "cachedResult": {
            "cachedQuery": {
                "letType": "na",
                "terms": [
                    "DEF"
                ],
                "perfectMatching": true
            },
            "queryScore": 1.0,
            "queryFallenBack": false,
            "matchedEntiries": [
                {
                    "entry": "SAZEMANE SANAYE DEF",
                    "score": 1.0
                }
            ]
        },
        "message": ""
    }
]
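The same bulk request can be sketched from Dart with only `dart:convert` and `dart:io`. The request body is a JSON array of query strings, which is what the httpie arguments `'[]=abc' '[]="def"'` produce. The helper names `buildBulkBody` and `postQueries` are hypothetical, and the base URL is an assumption taken from the `:4049` examples.

```dart
import 'dart:convert';
import 'dart:io';

/// Encodes the queries as a JSON array of strings.
String buildBulkBody(List<String> queries) => jsonEncode(queries);

/// Posts the queries and returns the decoded list of results.
/// The default base URL is an assumption based on the `:4049` examples.
Future<List<dynamic>> postQueries(
  List<String> queries, {
  String baseUrl = 'http://localhost:4049/',
}) async {
  final client = HttpClient();
  try {
    final request = await client.postUrl(Uri.parse(baseUrl));
    // Set the content type before writing so the body is sent as UTF-8.
    request.headers.contentType =
        ContentType('application', 'json', charset: 'utf-8');
    request.write(buildBulkBody(queries));
    final response = await request.close();
    final body = await utf8.decoder.bind(response).join();
    return jsonDecode(body) as List<dynamic>;
  } finally {
    client.close();
  }
}

Future<void> main() async {
  // Requires a running bin/wserver. The quoted "def" asks for perfect matching.
  final results = await postQueries(['abc', '"def"']);
  for (final r in results) {
    print('${r['inputString']}: ${r['cachedResult']['queryScore']}');
  }
}
```

Because the body is JSON rather than a URL, the queries here do not need percent-encoding; `jsonEncode` takes care of escaping the double quotes.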
Run the sample batch #
$ ls batch
queries.csv
$ dart bin/batchwb.dart -i batch/queries.csv
...
$ ls batch
queries.csv
queries_results.csv
Refresh the server #
http :4049/restart
This makes the server reload the database, reread the configurations and the settings, and purge the result cache.
This is useful when the denial lists are updated or the configurations or settings are modified.
Get normalized text as a key of join with the denial lists #
$ http -b ':4049/normalize?q=abc'
"ABC"
This is useful when preparing larger enclosing systems that join results with the denial lists.
Note that results from this subsystem are normalized in the same way.
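As a rough illustration of such a join, the Dart sketch below looks up each matched entry of a query response in an index keyed by normalized name. The `DenialRecord` class, the `joinWithDenialLists` function, and the sample record are all hypothetical; only the response shape (`cachedResult`, `matchedEntiries`, `entry`) is taken from the examples in this README.

```dart
import 'dart:convert';

/// Hypothetical denial-list record kept by the enclosing system.
class DenialRecord {
  final String listName;
  final String originalName;
  const DenialRecord(this.listName, this.originalName);
}

/// Joins the matched entries of one query result with denial-list records,
/// using the normalized name as the join key.
List<DenialRecord> joinWithDenialLists(
  Map<String, dynamic> result,
  Map<String, List<DenialRecord>> recordsByNormalizedName,
) {
  final matched = result['cachedResult']['matchedEntiries'] as List<dynamic>;
  return [
    for (final m in matched)
      // Entries without a record in the index are skipped.
      ...?recordsByNormalizedName[m['entry'] as String],
  ];
}

void main() {
  // The keys would come from the /normalize endpoint, one call per list name.
  // This record is made up for illustration.
  final index = {
    'ABC LLC': [const DenialRecord('Example List', 'ABC, L.L.C.')],
  };
  // Trimmed copy of a query response.
  final result = jsonDecode('''
    {"cachedResult": {"matchedEntiries": [
      {"entry": "ABC LLC", "score": 0.8325604366063432},
      {"entry": "ABMC THAI SOUTH SUDAN CONSTRUCTION", "score": 0.6244203274547574}
    ]}}
  ''') as Map<String, dynamic>;
  for (final r in joinWithDenialLists(result, index)) {
    print('${r.listName}: ${r.originalName}');
  }
}
```

Keeping a list of records per key allows several denial-list rows that normalize to the same text to be returned together.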
License #
Published under AGPL-3.0 or later. See the LICENSE file.
If you need a different license, contact me.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.