EvalSharp 0.1.0-alpha

Install via your preferred package manager:

- .NET CLI: dotnet add package EvalSharp --version 0.1.0-alpha
- Package Manager: NuGet\Install-Package EvalSharp -Version 0.1.0-alpha
- PackageReference: <PackageReference Include="EvalSharp" Version="0.1.0-alpha" />
- Central Package Management: <PackageVersion Include="EvalSharp" Version="0.1.0-alpha" /> plus <PackageReference Include="EvalSharp" />
- Paket CLI: paket add EvalSharp --version 0.1.0-alpha
- Script & Interactive: #r "nuget: EvalSharp, 0.1.0-alpha"
- File-based apps: #:package EvalSharp@0.1.0-alpha
- Cake addin: #addin nuget:?package=EvalSharp&version=0.1.0-alpha&prerelease
- Cake tool: #tool nuget:?package=EvalSharp&version=0.1.0-alpha&prerelease
EvalSharp 🧠
LLM Evaluation for .NET Developers — No Python Required
EvalSharp brings the power of reliable LLM evaluation directly to your C# projects. Inspired by DeepEval, but designed for the .NET ecosystem, EvalSharp lets you measure LLM outputs with confidence using familiar C# tools and patterns.
🔥 Key Features
- Fully Native .NET API — Designed for C# developers; no Python dependencies.
- Out-of-the-box Metrics — Evaluate Answer Relevancy, Contextual Recall, GEval, and more.
- LLM-as-a-Judge — Supports OpenAI, Azure OpenAI, and custom chat clients.
- Easy Customization — Build your own metrics tailored to your use case.
⚡ Quick Start
- Install EvalSharp
dotnet add package EvalSharp
- Create an Evaluator
// TType stands in for your own test-case type; it just needs fields for the data you want to evaluate.
var cases = new[]
{
    new TType
    {
        UserInput = "Please summarize the article on climate change impacts.",
        LLMOutput = "The article talks about how technology is advancing rapidly."
    }
};

// ChatClient.GetInstance() supplies the chat client used as the LLM judge
// (see the sketch after this Quick Start).
var evaluator = Evaluator.FromData(
    ChatClient.GetInstance(),
    cases,
    c => new MetricEvaluationContext
    {
        InitialInput = c.UserInput,
        ActualOutput = c.LLMOutput
    }
);
- Add Metrics
evaluator.AddAnswerRelevancy(includeReason: true);
- Evaluate Your LLM Output
var result = await evaluator.RunAsync();
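The snippets above call ChatClient.GetInstance(), a helper that supplies the chat client EvalSharp uses as the LLM judge. As a minimal sketch (the helper itself is not part of EvalSharp, and it assumes Evaluator.FromData and the metric constructors accept the IChatClient abstraction from Microsoft.Extensions.AI, which EvalSharp depends on), it could be implemented with the OpenAI adapter like this:

using System;
using Microsoft.Extensions.AI;

// Hypothetical helper, not part of EvalSharp: returns the IChatClient used as the LLM judge.
public static class ChatClient
{
    public static IChatClient GetInstance()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
                     ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");

        // AsIChatClient() is the adapter from Microsoft.Extensions.AI.OpenAI
        // (named AsChatClient in some earlier preview versions).
        return new OpenAI.Chat.ChatClient("gpt-4o-mini", apiKey).AsIChatClient();
    }
}

Azure OpenAI or any other IChatClient implementation can be swapped in the same way.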
✅ Unit Testing with EvalTest.AssertAsync
In addition to evaluating datasets with the Evaluator, EvalSharp makes it easy to include LLM evaluation in your unit tests. The EvalTest.AssertAsync method allows you to assert results for a single test with one or more metrics.
Example: Asserting Multiple Metrics in a Unit Test
using EvalSharp;
using EvalSharp.Models;
using EvalSharp.Scoring;
using Xunit;
using Xunit.Abstractions;

public class MyEvalTests
{
    private readonly ITestOutputHelper _testOutputHelper;

    public MyEvalTests(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
    }

    [Fact]
    public async Task SingleTest_MultipleMetrics()
    {
        var testData = new EvaluatorTestData
        {
            InitialInput = "Summarize the meeting.",
            ActualOutput = "The meeting summary is provided below..."
        };

        var relConfig = new AnswerRelevancyMetricConfiguration
        {
            IncludeReason = true,
            Threshold = 0.9
        };

        var gevalConfig = new GEvalMetricConfiguration
        {
            Threshold = 0.5,
            Criteria = "Does the output correctly explain concepts, events, or processes based on the input prompt?"
        };

        var metrics = new List<Metric>
        {
            new AnswerRelevancyMetric(ChatClient.GetInstance(), relConfig),
            new GEvalMetric(ChatClient.GetInstance(), gevalConfig)
        };

        await EvalTest.AssertAsync(testData, metrics, _testOutputHelper.WriteLine);
    }
}
✅ Supports multiple metrics in a single call
✅ Output results to your preferred sink (e.g., Console, Xunit test output)
✅ Ideal for lightweight, targeted LLM evaluation in CI/CD pipelines
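As noted above, the sink is simply a delegate that receives each line of output, so outside of xUnit you could, for example, print results to the console instead (assuming the parameter accepts any compatible string-writing delegate):

// Same assertion as before, writing metric results to the console instead of the xUnit output helper.
await EvalTest.AssertAsync(testData, metrics, Console.WriteLine);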
🛠 Metrics Included
✅ Answer Relevancy — Is the LLM's response relevant to the input?
✅ Bias — Checks for content biases.
✅ Contextual Precision — Measures if output precisely reflects provided context.
✅ Contextual Recall — Assesses how much of the relevant context was included in the output.
✅ Faithfulness — Evaluates factual correctness and grounding of the output.
✅ GEval — Enforces structure, logical flow, and coverage expectations.
✅ Hallucination — Detects whether the LLM generated unsupported or fabricated content.
✅ Match — Compares expected and actual output for equality or similarity.
✅ Prompt Alignment — Ensures output follows the intent and structure of the prompt.
✅ Summarization — Scores the quality and accuracy of generated summaries.
✅ Task Completion — Measures whether the LLM's output fulfills the requested task.
✅ Tool Correctness — Evaluates whether tool-augmented LLM responses are correct.
💡 Why EvalSharp?
- No need to switch to Python for LLM evaluation
- Designed with .NET 8 in mind
- Beautiful, easy-to-digest outputs
- Ideal for both RAG and general LLM application testing
- Easy to extend with your own custom metrics
🚧 Future Roadmap
We're just getting started. Here's what's coming soon to EvalSharp:
- Additional Built-in Metrics (e.g., DAG, RAGAS, Contextual Relevancy, Toxicity, JSON Correctness)
- Data Synthesizer
- Token Usage / Cost Calculation
- Additional Scorers (Rouge, Truth Identification, etc.)
- Expanded Examples and Tutorials
- Conversational Metrics
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
Portions of this project include content adapted from deepeval, which is licensed under the Apache License 2.0. See the NOTICE file for attribution.
Acknowledgements
Aviron Software would like to give special thanks to the team at DeepEval. Their original metrics and prompts were the catalyst for this project.
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net8.0 is compatible. net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows were computed. |
Dependencies (net8.0)
- CsvHelper (>= 33.1.0)
- Microsoft.Extensions.AI (>= 9.5.0)
- Microsoft.Extensions.AI.OpenAI (>= 9.5.0-preview.1.25265.7)
- Spectre.Console (>= 0.49.1)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last Updated |
---|---|---|
0.1.0-alpha | 115 | 6/30/2025 |