NlpToolkit-DataGenerator
1.0.4
dotnet add package NlpToolkit-DataGenerator --version 1.0.4
NuGet\Install-Package NlpToolkit-DataGenerator -Version 1.0.4
<PackageReference Include="NlpToolkit-DataGenerator" Version="1.0.4" />
paket add NlpToolkit-DataGenerator --version 1.0.4
#r "nuget: NlpToolkit-DataGenerator, 1.0.4"
// Install NlpToolkit-DataGenerator as a Cake Addin
#addin nuget:?package=NlpToolkit-DataGenerator&version=1.0.4
// Install NlpToolkit-DataGenerator as a Cake Tool
#tool nuget:?package=NlpToolkit-DataGenerator&version=1.0.4
For Developers
You can also see the Java, Python, Cython, or C++ repositories.
Requirements
- C# Editor
- Git
Git
Install the latest version of Git.
Download Code
To work on the code, create a fork from the GitHub page. Clone your fork to your local machine with Git, or run the line below on Ubuntu:
git clone <your-fork-git-link>
A directory called DataGenerator-CS will be created. Alternatively, clone the original repository to explore the code:
git clone https://github.com/starlangsoftware/DataGenerator-CS.git
Open project with Rider IDE
To import projects from Git with version control:
Open Rider IDE, select Get From Version Control.
In the Import window, click the URL tab and paste the GitHub URL.
Click Open as Project.
Result: The imported project is listed in the Project Explorer view and files are loaded.
Compile
From IDE
After downloading and opening the project, select the Build Solution option from the Build menu. Once compilation finishes, you can run DataGenerator-CS.
Detailed Description
AnnotatedDataSetGenerator
To create a DataSet, first construct an AnnotatedDataSetGenerator.
AnnotatedDataSetGenerator(string directory, string pattern, InstanceGenerator instanceGenerator)
Then create the DataSet by calling the Generate method.
DataSet Generate()
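The two signatures above suggest that the data set generator walks over annotated sentences and delegates each word to the instance generator. The sketch below illustrates that division of labor under stated assumptions: every type in it (`SketchSentence`, `SketchInstance`, `ISketchInstanceGenerator`, `CapitalLabelGenerator`, `SketchDataSetGenerator`) is a hypothetical stand-in, not part of the library, and it works on in-memory sentences instead of a directory and file pattern. The library's actual traversal may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's Sentence type.
public class SketchSentence
{
    public List<string> Words { get; }
    public SketchSentence(params string[] words) { Words = new List<string>(words); }
}

// Hypothetical stand-in for the library's Instance type.
public class SketchInstance
{
    public string Word { get; }
    public string Label { get; }
    public SketchInstance(string word, string label) { Word = word; Label = label; }
}

// Mirrors the shape of GenerateInstanceFromSentence(Sentence, int) from the README.
public interface ISketchInstanceGenerator
{
    SketchInstance GenerateInstanceFromSentence(SketchSentence sentence, int wordIndex);
}

// Toy generator (assumption): labels a word by its first letter's case.
// A real generator would read the annotation layers of the word instead.
public class CapitalLabelGenerator : ISketchInstanceGenerator
{
    public SketchInstance GenerateInstanceFromSentence(SketchSentence sentence, int wordIndex)
    {
        string word = sentence.Words[wordIndex];
        string label = word.Length > 0 && char.IsUpper(word[0]) ? "CAPITAL" : "LOWER";
        return new SketchInstance(word, label);
    }
}

// Assumed control flow: one instance per word of every sentence.
public class SketchDataSetGenerator
{
    private readonly List<SketchSentence> _sentences;
    private readonly ISketchInstanceGenerator _instanceGenerator;

    public SketchDataSetGenerator(List<SketchSentence> sentences, ISketchInstanceGenerator instanceGenerator)
    {
        _sentences = sentences;
        _instanceGenerator = instanceGenerator;
    }

    public List<SketchInstance> Generate()
    {
        var dataSet = new List<SketchInstance>();
        foreach (var sentence in _sentences)
        {
            for (int i = 0; i < sentence.Words.Count; i++)
            {
                dataSet.Add(_instanceGenerator.GenerateInstanceFromSentence(sentence, i));
            }
        }
        return dataSet;
    }
}

public static class Program
{
    public static void Main()
    {
        var sentences = new List<SketchSentence>
        {
            new SketchSentence("Türk", "Hava", "Yolları", "açıkladı", ".")
        };
        var generator = new SketchDataSetGenerator(sentences, new CapitalLabelGenerator());
        foreach (var instance in generator.Generate())
        {
            Console.WriteLine(instance.Word + "\t" + instance.Label);
        }
    }
}
```

Keeping the per-word logic behind an interface is what would let the same generator serve NER, shallow parsing, or word sense disambiguation by swapping only the instance generator.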
InstanceGenerator
Data generators need instance generators. These are classes that create an Instance from a single word.
Instance GenerateInstanceFromSentence(Sentence sentence, int wordIndex)
- For the NER problem: the NerInstanceGenerator, FeaturedNerInstanceGenerator, and VectorizedNerInstanceGenerator classes
- For the ShallowParse problem: the ShallowParseInstanceGenerator, FeaturedShallowParseInstanceGenerator, and VectorizedShallowParseInstanceGenerator classes
- For the WSD problem: the SemanticInstanceGenerator, FeaturedSemanticInstanceGenerator, and VectorizedSemanticInstanceGenerator classes
- For the Morphological Disambiguation problem: the FeaturedDisambiguationInstanceGenerator class
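To make the idea of "one instance per word" concrete, here is a minimal sketch of an instance generator that turns an annotated word into a feature row with root, POS tag, and capital-case attributes plus a class label. All types here (`AnnotatedWordSketch`, `FeatureRowSketch`, `NerRowSketch`) are hypothetical and only imitate the `GenerateInstanceFromSentence(sentence, wordIndex)` shape. The feature set of the library's own generators is not specified here and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical annotated word: surface form plus already-annotated layers.
public class AnnotatedWordSketch
{
    public string Surface { get; }
    public string Root { get; }
    public string Pos { get; }
    public string Tag { get; }

    public AnnotatedWordSketch(string surface, string root, string pos, string tag)
    {
        Surface = surface;
        Root = root;
        Pos = pos;
        Tag = tag;
    }
}

// Hypothetical instance: a list of attribute values and one class label.
public class FeatureRowSketch
{
    public List<string> Attributes { get; } = new List<string>();
    public string ClassLabel { get; }

    public FeatureRowSketch(string classLabel) { ClassLabel = classLabel; }

    public override string ToString()
    {
        return string.Join(" | ", Attributes) + " | " + ClassLabel;
    }
}

public static class NerRowSketch
{
    // Builds one row from the word at wordIndex. The whole sentence is passed in,
    // so a richer generator could also look at neighboring words.
    public static FeatureRowSketch GenerateInstanceFromSentence(
        IReadOnlyList<AnnotatedWordSketch> sentence, int wordIndex)
    {
        AnnotatedWordSketch word = sentence[wordIndex];
        var row = new FeatureRowSketch(word.Tag);
        row.Attributes.Add(word.Root);
        row.Attributes.Add(word.Pos);
        // Capital feature: true when the surface form starts with an uppercase letter.
        bool capital = word.Surface.Length > 0 && char.IsUpper(word.Surface[0]);
        row.Attributes.Add(capital.ToString());
        return row;
    }
}

public static class Program
{
    public static void Main()
    {
        var sentence = new List<AnnotatedWordSketch>
        {
            new AnnotatedWordSketch("Türk", "Türk", "Noun", "ORGANIZATION"),
            new AnnotatedWordSketch("bu", "bu", "Pronoun", "NONE"),
            new AnnotatedWordSketch("90", "90", "Number", "MONEY")
        };
        for (int i = 0; i < sentence.Count; i++)
        {
            FeatureRowSketch row = NerRowSketch.GenerateInstanceFromSentence(sentence, i);
            Console.WriteLine(sentence[i].Surface + " | " + row);
        }
    }
}
```

Taking the sentence and an index, not just a single word, leaves room for context features such as the previous word's POS tag without changing the method signature.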
Example Generated DataSet
Word Sense Disambiguation Task
The following table shows a sample text annotated with sense labels and three possible features: the root form of the word, the part-of-speech (POS) tag of the word, and a boolean feature indicating whether the word is capitalized.
Word | Root | Pos | Capital | ... | Tag |
---|---|---|---|---|---|
Yüzündeki | yüz | Noun | True | ... | yüz<sup>3</sup> |
ketçap | ketçap | Noun | False | ... | ketçap<sup>1</sup> |
lekesi | leke | Noun | False | ... | leke<sup>2</sup> |
yüzdükten | yüz | Verb | False | ... | yüz<sup>2</sup> |
sonra | sonra | PCAbl | False | ... | sonra<sup>1</sup> |
çıkmış | çık | Verb | False | ... | çık<sup>10</sup> |
. | . | Punctuation | False | ... | .<sup>1</sup> |
Named Entity Recognition Task
The following table shows a sample text annotated with named entity tags and three possible features: the root form of the word, the part-of-speech (POS) tag of the word, and a boolean feature indicating whether the word is capitalized.
Word | Root | Pos | Capital | ... | Tag |
---|---|---|---|---|---|
Türk | Türk | Noun | True | ... | ORGANIZATION |
Hava | Hava | Noun | True | ... | ORGANIZATION |
Yolları | Yol | Noun | True | ... | ORGANIZATION |
bu | bu | Pronoun | False | ... | NONE |
Pazartesi'den | Pazartesi | Noun | True | ... | TIME |
itibaren | itibaren | Adverb | False | ... | NONE |
İstanbul | İstanbul | Noun | True | ... | LOCATION |
Ankara | Ankara | Noun | True | ... | LOCATION |
güzergahı | güzergah | Noun | False | ... | NONE |
için | için | Adverb | False | ... | NONE |
indirimli | indirimli | Adjective | False | ... | NONE |
satışlarını | sat | Noun | False | ... | NONE |
90 | 90 | Number | False | ... | MONEY |
TL'den | TL | Noun | True | ... | MONEY |
başlatacağını | başlat | Noun | False | ... | NONE |
açıkladı | açıkla | Verb | False | ... | NONE |
. | . | Punctuation | False | ... | NONE |
Shallow Parse Task
The following table shows a sample text annotated with chunk labels and three possible features: the root form of the word, the part-of-speech (POS) tag of the word, and a boolean feature indicating whether the word is capitalized.
Word | Root | Pos | Capital | ... | Tag |
---|---|---|---|---|---|
Türk | Türk | Noun | True | ... | ÖZNE |
Hava | Hava | Noun | True | ... | ÖZNE |
Yolları | yol | Noun | True | ... | ÖZNE |
Salı | Salı | Noun | True | ... | ZARF TÜMLECİ |
günü | gün | Noun | False | ... | ZARF TÜMLECİ |
yeni | yeni | Adjective | False | ... | NESNE |
indirimli | indirimli | Adjective | False | ... | NESNE |
fiyatlarını | fiyat | Noun | False | ... | NESNE |
açıkladı | açıkla | Verb | False | ... | YÜKLEM |
. | . | Punctuation | False | ... | HİÇBİRİ |
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.1 is compatible. |
- .NETCoreApp 3.1
  - NlpToolkit-AnnotatedTree (>= 1.0.8)
  - NlpToolkit-Classification (>= 1.0.6)