text_parser

Creator: coderz1093


Description:

text parser

A Dart package for parsing text flexibly according to preset or custom regular expression patterns.
Usage #
Using the preset matchers (URL / email address / phone number) #
The package has the following preset matchers.

EmailMatcher
UrlMatcher
UrlLikeMatcher
TelMatcher

Below is an example using three of the preset matchers (all except UrlLikeMatcher).
import 'package:text_parser/text_parser.dart';

Future<void> main() async {
  const text = 'abc https://example.com/sample.jpg. def\n'
      'john.doe@example.com +1-012-3456-7890';

  final parser = TextParser(
    matchers: const [
      EmailMatcher(),
      UrlMatcher(),
      TelMatcher(),
    ],
  );
  final elements = await parser.parse(text);
  elements.forEach(print);
}
Output:
TextElement(matcherType: TextMatcher, matcherIndex null, offset: 0, text: abc , groups: [])
TextElement(matcherType: UrlMatcher, matcherIndex 1, offset: 4, text: https://example.com/sample.jpg, groups: [])
TextElement(matcherType: TextMatcher, matcherIndex null, offset: 34, text: . def\n, groups: [])
TextElement(matcherType: EmailMatcher, matcherIndex 0, offset: 40, text: john.doe@example.com, groups: [])
TextElement(matcherType: TextMatcher, matcherIndex null, offset: 60, text: , groups: [])
TextElement(matcherType: TelMatcher, matcherIndex 2, offset: 61, text: +1-012-3456-7890, groups: [])
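Each element's offset is the index of its first character in the input string. The plain-Dart check below does not use the package. It recomputes the offsets shown above by locating each piece of the text in turn. The helper offsetsOf is hypothetical and exists only for this illustration.

```dart
// A plain-Dart check of the offsets above; it does not use text_parser.
const text = 'abc https://example.com/sample.jpg. def\n'
    'john.doe@example.com +1-012-3456-7890';

/// Hypothetical helper: returns the offset of each piece, searching `text`
/// from the end of the previous piece. An element's offset is simply the
/// index of its first character in the input.
List<int> offsetsOf(List<String> pieces) {
  final offsets = <int>[];
  var from = 0;
  for (final piece in pieces) {
    final offset = text.indexOf(piece, from);
    offsets.add(offset);
    from = offset + piece.length;
  }
  return offsets;
}

void main() {
  final pieces = [
    'abc ',
    'https://example.com/sample.jpg',
    '. def\n',
    'john.doe@example.com',
    ' ',
    '+1-012-3456-7890',
  ];
  print(offsetsOf(pieces)); // [0, 4, 34, 40, 60, 61]
}
```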
The regular expression patterns of these matchers are not very strict. If one does not suit
your use case, overwrite it with a stricter pattern, as described in "Overwriting the
pattern of an existing matcher" later in this document.
UrlMatcher vs UrlLikeMatcher
UrlMatcher does not match URLs that do not start with "http" (e.g. example.com or
//example.com). If you want those to be matched too, use UrlLikeMatcher instead.
matcherType and matcherIndex
matcherType in a TextElement object is the type of the matcher that produced
the element. matcherIndex is the index of that matcher in the list passed to the
matchers argument of TextParser; it is null for elements of unmatched text.
Extracting only matching text elements
By default, the result of parse() contains all elements, including those whose
matcherType is TextMatcher. These are the parts of the string that did not match
any pattern. To exclude them, set onlyMatches to true when calling parse().
final elements = await parser.parse(text, onlyMatches: true);
elements.forEach(print);
Output:
TextElement(matcherType: UrlMatcher, matcherIndex 1, offset: 4, text: https://example.com/sample.jpg, groups: [])
TextElement(matcherType: EmailMatcher, matcherIndex 0, offset: 40, text: john.doe@example.com, groups: [])
TextElement(matcherType: TelMatcher, matcherIndex 2, offset: 61, text: +1-012-3456-7890, groups: [])
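The following minimal plain-Dart sketch makes the role of TextMatcher elements concrete. It is not the package's implementation. It splits a string into matched and unmatched segments for a single RegExp and drops the unmatched ones when onlyMatches is true. The Segment class and segment function are hypothetical names chosen for this sketch.

```dart
/// Hypothetical stand-in for a parsed element: either a match or the
/// unmatched text between matches.
class Segment {
  const Segment(this.offset, this.text, {required this.matched});
  final int offset;
  final String text;
  final bool matched;

  @override
  String toString() => '${matched ? "match" : "text"}@$offset: $text';
}

/// Splits [text] into matched and unmatched segments for [pattern].
/// When [onlyMatches] is true, the unmatched segments are left out.
List<Segment> segment(String text, RegExp pattern, {bool onlyMatches = false}) {
  final result = <Segment>[];
  var last = 0;
  for (final m in pattern.allMatches(text)) {
    // Text between the previous match and this one did not match.
    if (m.start > last && !onlyMatches) {
      result.add(Segment(last, text.substring(last, m.start), matched: false));
    }
    result.add(Segment(m.start, m[0]!, matched: true));
    last = m.end;
  }
  // Trailing text after the last match.
  if (last < text.length && !onlyMatches) {
    result.add(Segment(last, text.substring(last), matched: false));
  }
  return result;
}

void main() {
  final digits = RegExp(r'\d+');
  print(segment('a1b22c', digits));
  print(segment('a1b22c', digits, onlyMatches: true));
}
```

Joining the texts of all segments reproduces the input. The unmatched segments are therefore needed whenever you rebuild or render the whole text, and can be dropped when you only care about the matches.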
Extracting text elements of a particular matcher type
final telElements = elements.whereMatcherType<TelMatcher>().toList();
Or, the classic way:
final telElements = elements.where((elm) => elm.matcherType == TelMatcher).toList();
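The sketch below shows how a generic filter like whereMatcherType<T>() can be written by comparing Type objects, and that it is equivalent to the where() form above. It assumes simplified, hypothetical stand-ins for the matcher and element types: TelMatcher, EmailMatcher and Element here are plain classes, not the package's.

```dart
// Hypothetical stand-ins for the package's matcher and element types.
class TelMatcher {}

class EmailMatcher {}

class Element {
  const Element(this.matcherType, this.text);
  final Type matcherType;
  final String text;
}

extension WhereMatcherTypeSketch on Iterable<Element> {
  /// Keeps the elements whose matcherType is exactly [T].
  Iterable<Element> whereMatcherTypeSketch<T>() =>
      where((e) => e.matcherType == T);
}

void main() {
  const elements = [
    Element(EmailMatcher, 'john.doe@example.com'),
    Element(TelMatcher, '+1-012-3456-7890'),
  ];
  // Both approaches keep only the TelMatcher element.
  final viaExtension = elements.whereMatcherTypeSketch<TelMatcher>().toList();
  final viaWhere = elements.where((e) => e.matcherType == TelMatcher).toList();
  print(viaExtension.single.text);
  print(viaWhere.single.text);
}
```

Because this sketch compares types for equality, it would not keep elements produced by a subclass of TelMatcher. Whether the package's whereMatcherType behaves the same way is not stated here.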
Conflict between matchers
If multiple matchers match at the same position in the text, the one that comes first in
the matchers list takes precedence.
final parser = TextParser(matchers: const [UrlLikeMatcher(), EmailMatcher()]);
final elements = await parser.parse('foo.bar@example.com');
In this example, UrlLikeMatcher matches foo.bar and EmailMatcher matches
foo.bar@example.com at the same position, but UrlLikeMatcher wins because it comes
before EmailMatcher in the matchers list.
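One way to get first-listed-wins behaviour is to join all patterns into a single alternation, with one named group per matcher. At each position the RegExp engine tries the alternatives from left to right and stops at the first one that matches. The plain-Dart sketch below demonstrates this. It is an assumption about the technique, not a description of how text_parser is implemented. The two patterns are simplified, hypothetical stand-ins for UrlLikeMatcher and EmailMatcher, and Hit and firstWins are hypothetical names.

```dart
/// Hypothetical match record: which pattern matched, where, and what.
class Hit {
  const Hit(this.index, this.offset, this.text);
  final int index;
  final int offset;
  final String text;

  @override
  String toString() => 'Hit(index: $index, offset: $offset, text: $text)';
}

// Simplified, hypothetical patterns; not the package's real ones.
const domainLike = r'[\w-]+(?:\.[\w-]+)+';
const emailLike = r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+';

/// Combines [patterns] into one alternation with a named group per pattern,
/// so earlier patterns win when several could match at the same position.
List<Hit> firstWins(String text, List<String> patterns) {
  final combined = RegExp([
    for (var i = 0; i < patterns.length; i++) '(?<m$i>${patterns[i]})',
  ].join('|'));
  return [
    for (final m in combined.allMatches(text))
      for (var i = 0; i < patterns.length; i++)
        // Only the alternative that actually matched has a non-null group.
        if (m.namedGroup('m$i') != null) Hit(i, m.start, m[0]!),
  ];
}

void main() {
  print(firstWins('foo.bar@example.com', [domainLike, emailLike]));
  print(firstWins('foo.bar@example.com', [emailLike, domainLike]));
}
```

With the domain-like pattern first, the result starts with foo.bar. With the email-like pattern first, the whole address is a single match. Precedence only applies among matches that start at the same position. A match that starts earlier in the text is found first regardless of list order.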
Overwriting the pattern of an existing matcher #
For example, to parse only URLs and phone numbers, but treat only a sequence of eleven
digits after "tel:" as a phone number:
final parser = TextParser(
  matchers: const [
    UrlMatcher(),
    TelMatcher(r'(?<=tel:)\d{11}'),
  ],
);
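The lookbehind (?<=tel:) requires "tel:" immediately before the digits without including it in the match, so the element's text is only the number. Below is a plain-Dart check of that pattern. telNumbers is a hypothetical helper.

```dart
/// Hypothetical helper: returns every run of eleven digits that directly
/// follows "tel:", using the same pattern as the TelMatcher override above.
List<String> telNumbers(String text) => RegExp(r'(?<=tel:)\d{11}')
    .allMatches(text)
    .map((m) => m[0]!)
    .toList();

void main() {
  // Only the first number has the "tel:" prefix and exactly eleven digits.
  print(telNumbers('tel:09012345678, 09012345678, tel:0901234'));
  // [09012345678]
}
```

\d{11} would also match the first eleven digits of a longer run of digits. Appending a negative lookahead, as in (?<=tel:)\d{11}(?!\d), rules that out.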
Using a custom pattern #
You can create a matcher with a custom pattern either with PatternMatcher
or by extending TextMatcher.
PatternMatcher
const boldMatcher = PatternMatcher(r'\*\*(.+?)\*\*');
final parser = TextParser(matchers: [boldMatcher]);
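In this pattern the parentheses capture the text between the asterisks, and the lazy .+? makes each match end at the nearest closing **. A plain-Dart sketch of what the pattern captures follows. boldSpans is a hypothetical helper.

```dart
/// Hypothetical helper: returns the captured group of the boldMatcher
/// pattern for each match in [text].
List<String> boldSpans(String text) => RegExp(r'\*\*(.+?)\*\*')
    .allMatches(text)
    .map((m) => m[1]!)
    .toList();

void main() {
  print(boldSpans('Some **bold** and **more bold** text.'));
  // [bold, more bold]
}
```

With a greedy .+ instead, the two spans in this example would merge into a single match running from the first ** to the last.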
Custom matcher class
It is also possible to create a matcher class by extending TextMatcher.
Below is an example of a matcher that parses HTML <a> tags, capturing the href value
and the link text as groups.
class ATagMatcher extends TextMatcher {
  const ATagMatcher()
      : super(
          r'\<a\s(?:.+?\s)*?href="(.+?)".*?\>'
          r'\s*(.+?)\s*'
          r'\</a\>',
        );
}
const text = '''
<a class="foo" href="https://example.com/">
Content inside tags
</a>
''';

final parser = TextParser(
  matchers: const [ATagMatcher()],
  dotAll: true,
);
final elements = await parser.parse(text, onlyMatches: true);
print(elements.first.groups);
Output:
[https://example.com/, Content inside tags]
ExactMatcher #
ExactMatcher escapes RegExp's reserved characters so that they are treated as
literal characters. The parser then extracts the substrings that exactly match
any of the strings in the passed list.
final parser = TextParser(
  matchers: [
    // 'e.g.' matches only 'e.g.', not 'edge' or 'eggs'.
    ExactMatcher(['e.g.', 'i.e.']),
  ],
);
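The same idea can be sketched in plain Dart with RegExp.escape, which backslash-escapes RegExp's reserved characters. exactPattern is a hypothetical helper. Sorting longer strings first is a choice made in this sketch, so that a string that is a prefix of another cannot prevent the longer one from matching. ExactMatcher does not necessarily do this.

```dart
/// Hypothetical helper: builds a RegExp that matches any of [strings]
/// literally. RegExp.escape turns '.' into '\.', so 'e.g.' no longer
/// matches arbitrary characters.
RegExp exactPattern(Iterable<String> strings) {
  // Longer strings first, so 'ab' is preferred over its prefix 'a'.
  final sorted = strings.toList()
    ..sort((a, b) => b.length.compareTo(a.length));
  return RegExp(sorted.map(RegExp.escape).join('|'));
}

void main() {
  final re = exactPattern(['e.g.', 'i.e.']);
  const text = 'edge, eggs, e.g. and i.e.';
  print(re.allMatches(text).map((m) => m[0]).toList()); // [e.g., i.e.]
}
```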
Groups #
Each TextElement in a parse result has a groups property: a list of the strings
captured by each set of parentheses ( ) in the pattern.
Below is an example of a pattern that matches a Markdown style link.
r'\[(.+?)\]\((.*?)\)'
This pattern has two sets of parentheses: (.+?) in \[(.+?)\] and (.*?) in \((.*?)\).
When it matches [foo](bar), the first set of parentheses captures "foo" and the second
set captures "bar", so groups is ['foo', 'bar'].
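Running the same pattern directly with Dart's RegExp shows where the groups come from. Group 0 is the whole match, and groups 1 to groupCount correspond to the parentheses in order. captureGroups is a hypothetical helper for this illustration.

```dart
/// Hypothetical helper: returns groups 1..groupCount of the first match of
/// [pattern] in [text], or an empty list if there is no match.
List<String?> captureGroups(String pattern, String text) {
  final m = RegExp(pattern).firstMatch(text);
  if (m == null) return const <String?>[];
  // Group 0 (the whole match) is skipped; only the parentheses are returned.
  return m.groups([for (var i = 1; i <= m.groupCount; i++) i]);
}

void main() {
  print(captureGroups(r'\[(.+?)\]\((.*?)\)', '[foo](bar)')); // [foo, bar]
}
```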
Tip:
If you do not want a set of parentheses to be captured as a group, add ?: after the opening
parenthesis, i.e. write (?:pattern) instead of (pattern).
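Here is a quick plain-Dart comparison. Making the month group non-capturing removes it from the captured groups, while the pattern still matches the same text. groupsOf is a hypothetical helper.

```dart
/// Hypothetical helper: returns groups 1..groupCount of the first match.
List<String?> groupsOf(String pattern, String text) {
  final m = RegExp(pattern).firstMatch(text)!;
  return m.groups([for (var i = 1; i <= m.groupCount; i++) i]);
}

void main() {
  // Three capturing groups: year, month and day.
  print(groupsOf(r'(\d{4})-(\d{2})-(\d{2})', '2020-01-23')); // [2020, 01, 23]
  // (?:...) still groups the month for matching but does not capture it.
  print(groupsOf(r'(\d{4})-(?:\d{2})-(\d{2})', '2020-01-23')); // [2020, 23]
}
```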
Named groups
Named groups are captured too, but their names are lost in the resulting groups list.
Below is an example of a single pattern that contains both unnamed and named
capturing groups.
final parser = TextParser(
  matchers: const [PatternMatcher(r'(?<year>\d{4})-(\d{2})-(?<day>\d{2})')],
);
final elements = await parser.parse('2020-01-23');
print(elements.first);
Output:
TextElement(matcherType: PatternMatcher, matcherIndex 0, offset: 0, text: 2020-01-23, groups: [2020, 01, 23])
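Named groups are numbered together with unnamed ones from left to right, which is consistent with the order of the groups in the output above. If you need the names, you can run the pattern with Dart's RegExp directly. The resulting RegExpMatch exposes them through namedGroup() and groupNames. This is plain Dart, not a feature of the package, and parseDate is a hypothetical helper.

```dart
/// Hypothetical helper: matches a date with two named groups and one
/// unnamed group.
RegExpMatch parseDate(String s) =>
    RegExp(r'(?<year>\d{4})-(\d{2})-(?<day>\d{2})').firstMatch(s)!;

void main() {
  final m = parseDate('2020-01-23');
  // All three groups are numbered, named or not.
  print(m.groups([1, 2, 3])); // [2020, 01, 23]
  // Named groups can also be looked up by name.
  print(m.namedGroup('year')); // 2020
  print(m.namedGroup('day')); // 23
  print(m.groupNames.toList()..sort()); // [day, year]
}
```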
RegExp options #
How a regular expression is treated can be configured in the TextParser constructor.

multiLine
caseSensitive
unicode
dotAll

These options are passed to RegExp internally; refer to the RegExp documentation
for details.
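The following plain-Dart sketch shows the effect of three of these options on RegExp itself (unicode is not demonstrated). countMatches is a hypothetical helper.

```dart
/// Hypothetical helper: counts matches of [source] in [text] with the given
/// RegExp options.
int countMatches(
  String source,
  String text, {
  bool multiLine = false,
  bool caseSensitive = true,
  bool dotAll = false,
}) =>
    RegExp(source,
            multiLine: multiLine, caseSensitive: caseSensitive, dotAll: dotAll)
        .allMatches(text)
        .length;

void main() {
  const text = 'Foo\nfoo\nbar';
  // Without multiLine, ^ matches only at the very start of the input.
  print(countMatches(r'^foo', text, caseSensitive: false)); // 1
  // With multiLine, ^ also matches right after each line break.
  print(countMatches(r'^foo', text, multiLine: true, caseSensitive: false)); // 2
  // caseSensitive is true by default, so "Foo" does not match.
  print(countMatches(r'^foo', text, multiLine: true)); // 1
  // Without dotAll, '.' does not match a line terminator.
  print(countMatches(r'o.f', text)); // 0
  print(countMatches(r'o.f', text, dotAll: true)); // 1
}
```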
Limitations #

This package uses regular expressions, so parsing speed depends on the
performance of RegExp in Dart. Parsing longer text with multiple complex
patterns takes more time.
On the web, parsing is always executed in the main thread because Flutter Web does
not support dart:isolate.
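On native platforms, CPU-heavy RegExp work can be moved off the calling isolate. The sketch below shows the general idea with Isolate.run (available since Dart 2.19, not on the web). It does not describe how the package uses isolates internally, and matchesInIsolate is a hypothetical helper.

```dart
import 'dart:isolate';

/// Hypothetical helper: finds all matches of [pattern] in [text] on a
/// separate isolate, so a long search does not block the calling isolate.
Future<List<String>> matchesInIsolate(String text, String pattern) {
  // The closure captures only Strings, which can be sent between isolates.
  return Isolate.run(
    () => RegExp(pattern).allMatches(text).map((m) => m[0]!).toList(),
  );
}

Future<void> main() async {
  final words = await matchesInIsolate('one two three', r'\w+');
  print(words); // [one, two, three]
}
```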

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
