Beautiful Soup Dart

A Dart-native package inspired by the Python library Beautiful Soup 4. It provides easy ways to navigate, search, and
modify the HTML tree.

Usage
A simple usage example:
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// 1. Parse a document string
BeautifulSoup bs = BeautifulSoup(html_doc_string);
// Use BeautifulSoup.fragment(html_doc_string) to parse only a fragment of HTML

// 2. Navigate quickly to any element
bs.body!.a!; // navigate by tag name; use outerHtml or toString() to get the outer HTML
bs.find('p', class_: 'story'); // first "p" element whose "class" attribute is "story"
bs.findAll('a', attrs: {'class': true}); // all "a" elements that define a "class" attribute, whatever its value
bs.find('', selector: '#link1'); // find by custom CSS selector (other parameters are ignored)
bs.find('*', id: 'link1'); // any element with id "link1"
bs.find('*', regex: r'^b'); // any element whose tag name starts with "b", for example: body, b, ...
bs.find('p', string: r'^Article #\d*'); // "p" element whose text starts with "Article #[number]"
bs.find('a', attrs: {'href': 'http://example.com/elsie'}); // find by "href" attribute

// 3. Perform any other action on the navigated element
Bs4Element bs4 = bs.body!.p!; // navigate by tag name
bs4.name; // get the tag name
bs4.string; // get the text
bs4.toString(); // get the String representation of this element, same as outerHtml
bs4.innerHtml; // get the HTML inside the element
bs4.className; // get the class attribute value
bs4['class']; // get the class attribute value
bs4['class'] = 'board'; // change the class attribute value to 'board'
bs4.children; // get the element's child elements
bs4.replaceWith(otherBs4Element); // replace with another element
// ... and many more
Check the test folder for more examples.
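The usage snippet above leaves html_doc_string undefined, so a self-contained variant may be easier to try out. The sketch below is only an illustration: the sample HTML and the name htmlDoc are made up for this example, and it assumes that find() returns null when nothing matches (suggested by the null assertions used above). It uses only calls that appear in the usage snippet and the table of contents.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, used only for illustration.
const htmlDoc = '''
<html>
  <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    </p>
  </body>
</html>
''';

void main() {
  final bs = BeautifulSoup(htmlDoc);

  // Assumption: find() yields null when there is no match, so check first.
  final story = bs.find('p', class_: 'story');
  if (story == null) {
    print('no matching paragraph');
    return;
  }
  print(story.name);

  // Print the text and the href attribute of every link that has a class.
  for (final link in bs.findAll('a', attrs: {'class': true})) {
    print('${link.text} -> ${link['href']}');
  }

  // Look up a single element by id, regardless of its tag.
  final first = bs.find('*', id: 'link1');
  print(first?['href']);
}
```

Checking the result of find() before using it avoids the runtime failure that a `!` assertion would cause on a document that lacks the expected element.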
Table of Contents
The unlinked titles are not yet implemented.

Navigating the tree

Going down

Navigating using tag names
.contents and .children
.descendants
.string
.strings and .strippedStrings

Going up

.parent
.parents

Going sideways

.nextSibling and .previousSibling
.nextSiblings and .previousSiblings

Going back and forth

.nextElement and .previousElement - return the next/previous Bs4Element
.nextElements and .previousElements
.nextParsed and .previousParsed - return the next/previous parsed Node of any kind (doc comments, tags, text); to get its data as a String, use node.data
.nextParsedAll and .previousParsedAll

Searching the tree

findFirstAny() - returns the topmost (first) element of the parse tree, whatever its tag
findAll()
find()
findParents() and findParent()
findNextSiblings() and findNextSibling()
findPreviousSiblings() and findPreviousSibling()
findAllNextElements() and findNextElement()
findAllPreviousElements() and findPreviousElement()
findNextParsedAll() and findNextParsed()
findPreviousParsedAll() and findPreviousParsed()

Modifying the tree

Changing tag names and attributes
Modifying .string
append()
extend()
newTag()
insert()
insertBefore() and insertAfter()
clear()
extract()
decompose()
replaceWith()
wrap()
unwrap()
smooth()

Output

prettify() - partial support
.text and getText()

Other methods of the Element class from the html package can be accessed via bs4element.element.

Features and bugs
Please file feature requests and bugs at the issue tracker or feel
free to raise a PR.