EssenceParser 1.0.0

The owner has unlisted this package. This could mean that the package is deprecated, has security vulnerabilities or shouldn't be used anymore.
.NET CLI:
dotnet add package EssenceParser --version 1.0.0

Package Manager Console in Visual Studio (this command uses the NuGet module's version of Install-Package):
NuGet\Install-Package EssenceParser -Version 1.0.0

PackageReference (for projects that support PackageReference, copy this XML node into the project file to reference the package):
<PackageReference Include="EssenceParser" Version="1.0.0" />

Central Package Management (for projects that support CPM, copy this XML node into the solution Directory.Packages.props file to version the package):
<PackageVersion Include="EssenceParser" Version="1.0.0" />

Then reference the package, without a version, in the project file:
<PackageReference Include="EssenceParser" />

Paket CLI:
paket add EssenceParser --version 1.0.0

F# Interactive and Polyglot Notebooks (copy this #r directive into the interactive tool or the source code of the script):
#r "nuget: EssenceParser, 1.0.0"

C# file-based apps, starting in .NET 10 preview 4 (copy this #:package directive into a .cs file before any lines of code):
#:package EssenceParser@1.0.0

Install as a Cake Addin:
#addin nuget:?package=EssenceParser&version=1.0.0

Install as a Cake Tool:
#tool nuget:?package=EssenceParser&version=1.0.0

EssenceParser

EssenceParser is a small C# library for parsing, traversing, and transforming HTML documents into structured, strongly-typed content trees. It provides an object-oriented API to work with HTML as a tree of ContentNode objects, enabling transformation, filtering, serialization, and semantic analysis of HTML content.

Features

  • Convert HTML into a ContentNodeTree, a tree structure of semantic content nodes
  • Traverse and manipulate nodes via predicates or transformation functions
  • Serialize nodes back to HTML or plain text
  • Configure parsing behavior through ParserOptions
  • Clone, filter, and transform HTML content using LINQ-like operations

Overview

The main class for parsing is EssenceHtmlParser. It contains key methods for working with HTML:

  1. ReadFromString() - Loads HTML content from a string for subsequent parsing.
  2. ReadFromFileAsync() - Asynchronously loads HTML content from a file for parsing.
  3. ParseAsync() - Parses the previously loaded HTML into a ContentNodeTree structure.

Parsing produces a ContentNodeTree: a hierarchical tree of content nodes, typically parsed from an HTML document, that acts as the root container for a set of top-level ContentNode instances. Like ContentNode, the ContentNodeTree class provides a number of methods for working with the object-oriented HTML tree:

  1. MaxDepth() - Calculates the maximum depth across all root nodes in the tree.
  2. GetNodeCount() - Computes the total number of nodes in the tree, including all descendants.
  3. FindNodes() - Searches all nodes in the tree and retains only those that match the provided predicate. This operation modifies the tree in place by flattening it to only matching nodes.
  4. Replace() - Applies a transformation to each root node using the specified function.
  5. Purge() - Removes all nodes from the tree that match the specified predicate.
  6. Clone() - Creates a deep copy of the tree, including all nodes and their attributes.
  7. ToPlainText() - Extracts and concatenates plain text from all nodes in the tree. HTML tags and structure are omitted.
  8. ToHtmlString() - Serializes the tree into an HTML-formatted string. This reflects the structure and content of the nodes, including indentation.
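
The in-place behaviour of FindNodes() is easy to overlook, so here is a minimal sketch of the semantics described in the list above. SketchNode and SketchTree are hypothetical stand-in types written for this illustration; they are not EssenceParser's classes, and the real signatures and return types may differ. The sketch also assumes that matched nodes are kept without their children, which the description above does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in types, NOT EssenceParser's real classes.
var tree = new SketchTree(new List<SketchNode>
{
    new SketchNode("div",
        new SketchNode("p"),
        new SketchNode("script")),
    new SketchNode("script"),
});

Console.WriteLine(tree.GetNodeCount()); // 4 nodes: div, p, script, script
Console.WriteLine(tree.MaxDepth());     // 2 levels: div -> p

var backup = tree.Clone();              // deep copy taken before the destructive call
tree.FindNodes(n => n.Tag == "script"); // flattens the tree in place

Console.WriteLine(tree.GetNodeCount());   // 2: only the script nodes remain
Console.WriteLine(backup.GetNodeCount()); // 4: the clone is unaffected

sealed class SketchNode
{
    public string Tag { get; }
    public List<SketchNode> Children { get; }

    public SketchNode(string tag, params SketchNode[] children)
    {
        Tag = tag;
        Children = children.ToList();
    }

    // This node followed by all of its descendants.
    public IEnumerable<SketchNode> SelfAndDescendants() =>
        Children.SelectMany(c => c.SelfAndDescendants()).Prepend(this);

    // A leaf has depth 1; each level of children adds one.
    public int Depth() =>
        1 + (Children.Count == 0 ? 0 : Children.Max(c => c.Depth()));

    public SketchNode Clone() =>
        new SketchNode(Tag, Children.Select(c => c.Clone()).ToArray());
}

sealed class SketchTree
{
    public List<SketchNode> Roots { get; private set; }

    public SketchTree(List<SketchNode> roots) => Roots = roots;

    public int GetNodeCount() => Roots.Sum(r => r.SelfAndDescendants().Count());

    public int MaxDepth() => Roots.Count == 0 ? 0 : Roots.Max(r => r.Depth());

    // Assumed behaviour: every matching node becomes a childless root.
    public void FindNodes(Func<SketchNode, bool> predicate) =>
        Roots = Roots.SelectMany(r => r.SelfAndDescendants())
                     .Where(predicate)
                     .Select(n => new SketchNode(n.Tag))
                     .ToList();

    public SketchTree Clone() =>
        new SketchTree(Roots.Select(r => r.Clone()).ToList());
}
```

The point of the sketch is the ordering: Clone() is called before FindNodes(), because afterwards the original tree no longer contains the non-matching nodes.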

Usage

Create an instance of EssenceHtmlParser and specify the options in the constructor:

var parser = new EssenceHtmlParser(new ParsingOptions());

Load your HTML from a file or pass a string directly:

string html = "<html><body><p>Hello World</p></body></html>";
parser.ReadFromString(html);
// OR
await parser.ReadFromFileAsync("your-path-to-file.html");

Now parse the loaded HTML into a ContentNodeTree:

var tree = await parser.ParseAsync();

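You can then perform operations on the resulting ContentNodeTree, for example keeping only the script nodes (FindNodes modifies the tree in place, so Clone() it first if the full tree is still needed):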

var scriptNodes = tree.FindNodes(n => n.Tag == HtmlTag.Script);
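
Putting the steps together, a typical pipeline might strip script nodes and then extract the text. The following is a hedged sketch, not verified against the package: the type and method names come from this README, but the `EssenceParser` namespace, the shape of the predicate passed to Purge(), and the return types are assumptions.

```csharp
using System;
using EssenceParser; // assumed namespace; the README does not state it

var parser = new EssenceHtmlParser(new ParsingOptions());
parser.ReadFromString(
    "<html><body><script>track();</script><p>Hello World</p></body></html>");
var tree = await parser.ParseAsync();

// Keep an untouched copy, because Purge below modifies the tree.
var original = tree.Clone();

// Assumed: Purge accepts the same kind of predicate as FindNodes.
tree.Purge(n => n.Tag == HtmlTag.Script);

// Text of the document without the script content.
Console.WriteLine(tree.ToPlainText());

// The clone still serializes the full document, scripts included.
Console.WriteLine(original.ToHtmlString());
```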

Target frameworks

Compatible (included in package): net9.0.
Computed as compatible: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

