EssenceParser 1.0.0
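.NET CLI:
dotnet add package EssenceParser --version 1.0.0
Package Manager Console:
NuGet\Install-Package EssenceParser -Version 1.0.0
PackageReference (project file):
<PackageReference Include="EssenceParser" Version="1.0.0" />
Central Package Management (version in Directory.Packages.props, reference in the project file):
<PackageVersion Include="EssenceParser" Version="1.0.0" />
<PackageReference Include="EssenceParser" />
Paket CLI:
paket add EssenceParser --version 1.0.0
Script and interactive:
#r "nuget: EssenceParser, 1.0.0"
File-based apps:
#:package EssenceParser@1.0.0
Cake (as an addin or as a tool):
#addin nuget:?package=EssenceParser&version=1.0.0
#tool nuget:?package=EssenceParser&version=1.0.0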
EssenceParser
EssenceParser is a small C# library for parsing, traversing, and transforming HTML documents into structured, strongly-typed content trees.
It provides an object-oriented API for working with HTML as a tree of ContentNode objects, enabling transformation, filtering, serialization, and semantic analysis of HTML content.
Features
- Convert HTML to a tree structure of semantic content nodes (ContentNodeTree)
- Traverse and manipulate nodes via predicates or transformation functions
- Serialize nodes back to HTML or plain text
- Configure parsing behavior through ParserOptions
- Clone, filter, and transform HTML content using LINQ-like operations
Overview
The main class for parsing is EssenceHtmlParser. It contains the key methods for working with HTML:
- ReadFromString() - Loads HTML content from a string for subsequent parsing.
- ReadFromFileAsync() - Asynchronously loads HTML content from a file for parsing.
- ParseAsync() - Parses the previously loaded HTML into a ContentNodeTree structure.
Parsing produces a ContentNodeTree, a hierarchical tree of content nodes, typically parsed from an HTML document. It acts as the root container for a set of top-level ContentNode instances.
The ContentNodeTree class, like ContentNode, provides a number of methods for working with an object-oriented HTML tree:
- MaxDepth() - Calculates the maximum depth across all root nodes in the tree.
- GetNodeCount() - Computes the total number of nodes in the tree, including all descendants.
- FindNodes() - Searches all nodes in the tree and retains only those that match the provided predicate. This operation modifies the tree in place by flattening it to only matching nodes.
- Replace() - Applies a transformation to each root node using the specified function.
- Purge() - Removes all nodes from the tree that match the specified predicate.
- Clone() - Creates a deep copy of the tree, including all nodes and their attributes.
- ToPlainText() - Extracts and concatenates plain text from all nodes in the tree. HTML tags and structure are omitted.
- ToHtmlString() - Serializes the tree into an HTML-formatted string. This reflects the structure and content of the nodes, including indentation.
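To make the semantics of these operations concrete, here is a minimal, self-contained sketch of how depth, node count, purging, and in-place flattening could behave on a small tree. It does not use EssenceParser itself: SketchNode and SketchTree are hypothetical stand-ins for ContentNode and ContentNodeTree. The depth convention (a lone root has depth 1) and the choice to detach children from matched nodes during flattening are assumptions, not documented behavior of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Models <div><p>hi</p><script>x()</script></div> followed by <p>bye</p>.
var tree = new SketchTree(
    new SketchNode("div", "",
        new SketchNode("p", "hi"),
        new SketchNode("script", "x()")),
    new SketchNode("p", "bye"));

Console.WriteLine(tree.MaxDepth());      // 2
Console.WriteLine(tree.GetNodeCount());  // 4

tree.Purge(n => n.Tag == "script");      // drop the script node
Console.WriteLine(tree.GetNodeCount());  // 3

tree.FindNodes(n => n.Tag == "p");       // flatten to the two <p> nodes
Console.WriteLine(tree.Roots.Count);     // 2
Console.WriteLine(tree.MaxDepth());      // 1

// Hypothetical stand-in for ContentNode.
sealed class SketchNode
{
    public string Tag { get; }
    public string Text { get; }
    public List<SketchNode> Children { get; }

    public SketchNode(string tag, string text, params SketchNode[] children)
    {
        Tag = tag;
        Text = text;
        Children = children.ToList();
    }
}

// Hypothetical stand-in for ContentNodeTree.
sealed class SketchTree
{
    public List<SketchNode> Roots { get; private set; }

    public SketchTree(params SketchNode[] roots) => Roots = roots.ToList();

    // Assumed convention: an empty tree has depth 0, a lone root has depth 1.
    public int MaxDepth() => Roots.Count == 0 ? 0 : Roots.Max(r => DepthOf(r));

    public int GetNodeCount() => Roots.Sum(r => CountNodes(r));

    // Removes matching nodes (and their subtrees) at every level.
    public void Purge(Func<SketchNode, bool> predicate) => RemoveMatching(Roots, predicate);

    // Replaces the roots with a flat list of matching nodes, in document order.
    public void FindNodes(Func<SketchNode, bool> predicate)
    {
        var matches = new List<SketchNode>();
        foreach (var root in Roots) Collect(root, predicate, matches);
        foreach (var match in matches) match.Children.Clear(); // assumed: matches become leaves
        Roots = matches;
    }

    static int DepthOf(SketchNode n) =>
        1 + (n.Children.Count == 0 ? 0 : n.Children.Max(c => DepthOf(c)));

    static int CountNodes(SketchNode n) => 1 + n.Children.Sum(c => CountNodes(c));

    static void RemoveMatching(List<SketchNode> nodes, Func<SketchNode, bool> predicate)
    {
        nodes.RemoveAll(n => predicate(n));
        foreach (var n in nodes) RemoveMatching(n.Children, predicate);
    }

    static void Collect(SketchNode n, Func<SketchNode, bool> predicate, List<SketchNode> matches)
    {
        if (predicate(n)) matches.Add(n);
        foreach (var c in n.Children) Collect(c, predicate, matches);
    }
}
```

The point to take away is that an in-place FindNodes discards everything that does not match, so any later call on the same tree sees only the flattened result.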
Usage
Create an instance of EssenceHtmlParser and specify the options in the constructor:
var parser = new EssenceHtmlParser(new ParsingOptions());
Load your HTML from a file or pass a string directly:
string html = "<html><body><p>Hello World</p></body></html>";
parser.ReadFromString(html);
// OR
await parser.ReadFromFileAsync("your-path-to-file.html");
Now you can parse your HTML into a ContentNodeTree:
var tree = await parser.ParseAsync();
You can then perform operations on the resulting ContentNodeTree:
var scriptNodes = tree.FindNodes(n => n.Tag == HtmlTag.Script);
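Because FindNodes() is described above as modifying the tree in place, it may be worth cloning the tree before filtering when the full document is still needed afterwards. The following sketch strings the steps above together under several assumptions: the package is referenced by the project, its types live in a namespace named EssenceParser, Clone() returns a ContentNodeTree, and Purge() accepts the same kind of predicate as FindNodes(). None of these signatures is confirmed by this page, so treat the snippet as an outline to check against the package rather than as reference usage.

```csharp
using System;
using EssenceParser; // assumed namespace; adjust to the package's actual one

var parser = new EssenceHtmlParser(new ParsingOptions());
parser.ReadFromString(
    "<html><body><p>Hello World</p><script>track();</script></body></html>");

var tree = await parser.ParseAsync();

// Filter a copy so the original tree keeps its full structure.
// Assumes Clone() returns a tree exposing the same methods.
var scriptsOnly = tree.Clone();
scriptsOnly.FindNodes(n => n.Tag == HtmlTag.Script);
Console.WriteLine(scriptsOnly.GetNodeCount());

// Remove script nodes from the original, then extract its text.
// Assumes Purge() takes a predicate shaped like the one FindNodes() takes.
tree.Purge(n => n.Tag == HtmlTag.Script);
Console.WriteLine(tree.ToPlainText());
```

Cloning costs a full copy of the tree, so for large documents it may be cheaper to parse once per operation or to order the operations so that the destructive one runs last.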
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
net9.0
- AngleSharp (>= 0.17.1)
- AngleSharp.Css (>= 0.17.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.