jieba.NET is the .NET (C#) port of jieba, the Chinese word segmentation library.
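The current version, 0.42.2, is based on jieba 0.42 and provides the **same** features and interfaces as jieba, but it does not support jieba's newer paddle mode. For background on how jieba works, see the materials referenced in the wiki.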
KeywordProcessor
jieba.NET also includes a KeywordProcessor, modeled on FlashText, for quickly extracting a predefined set of keywords from text.
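If you have segmentation-related requirements or run into problems during development, please open an Issue. I see u :)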
jieba.NET currently supports net40, net45, and netstandard2.0. You can reference the project manually or add it through NuGet:
PM> Install-Package jieba.NET
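After installation, a Resources directory can be found under packages\jieba.NET. It holds the dictionaries and other data files jieba.NET needs at runtime. The simplest setup is to copy the entire Resources directory into the directory containing your assembly, in which case jieba.NET uses its built-in default configuration. To keep these files somewhere else, add the following setting to app.config or web.config: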
<appSettings>
<add key="JiebaConfigFileDir" value="C:\jiebanet\config" />
</appSettings>
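The path can be absolute or relative. A relative path is interpreted by jieba.NET as relative to the current application's BaseDirectory.
The config directory can also be set in code: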
JiebaNet.Segmenter.ConfigManager.ConfigFileBaseDir = @"C:\jiebanet\config";
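jieba.NET resolves a relative config path against the application's BaseDirectory. The small sketch below illustrates that rule. `ConfigPathSketch.ResolveConfigDir` is a hypothetical helper, not a jieba.NET API; it only mirrors the stated behavior (absolute paths used as-is, relative paths combined with a base directory).

```csharp
using System;
using System.IO;

public static class ConfigPathSketch
{
    // Hypothetical helper (not part of jieba.NET): absolute paths are returned
    // unchanged, relative paths are resolved against the given base directory.
    public static string ResolveConfigDir(string configured, string baseDirectory)
    {
        if (string.IsNullOrWhiteSpace(configured))
            throw new ArgumentException("config dir must not be empty", nameof(configured));

        return Path.IsPathRooted(configured)
            ? configured
            : Path.GetFullPath(Path.Combine(baseDirectory, configured));
    }
}
```

In an application, the base directory would typically be `AppDomain.CurrentDomain.BaseDirectory`.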
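JiebaSegmenter.Cut accepts three parameters: text is the string to segment, cutAll specifies whether to use full mode, and hmm specifies whether to use the HMM model. It returns an IEnumerable<string>.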
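JiebaSegmenter.CutForSearch accepts two parameters: text is the string to segment, and hmm specifies whether to use the HMM model. It segments in search engine mode, which produces finer-grained results suited to building search indexes, and returns an IEnumerable<string>.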
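using System;
using JiebaNet.Segmenter;
var segmenter = new JiebaSegmenter();
var segments = segmenter.Cut("我来到北京清华大学", cutAll: true);
Console.WriteLine("[Full mode]: {0}", string.Join("/ ", segments));
segments = segmenter.Cut("我来到北京清华大学"); // precise mode (the default)
Console.WriteLine("[Precise mode]: {0}", string.Join("/ ", segments));
segments = segmenter.Cut("他来到了网易杭研大厦"); // precise mode by default, with the HMM model enabled
Console.WriteLine("[New word recognition]: {0}", string.Join("/ ", segments));
segments = segmenter.CutForSearch("小明硕士毕业于中国科学院计算所，后在日本京都大学深造"); // search engine mode
Console.WriteLine("[Search engine mode]: {0}", string.Join("/ ", segments));
segments = segmenter.Cut("结过婚的和尚未结过婚的");
Console.WriteLine("[Ambiguity resolution]: {0}", string.Join("/ ", segments));
Output:
[Full mode]: 我/ 来到/ 北京/ 清华/ 清华大学/ 华大/ 大学
[Precise mode]: 我/ 来到/ 北京/ 清华大学
[New word recognition]: 他/ 来到/ 了/ 网易/ 杭研/ 大厦
[Search engine mode]: 小明/ 硕士/ 毕业/ 于/ 中国/ 科学/ 学院/ 科学院/ 中国科学院/ 计算/ 计算所/ ，/ 后/ 在/ 日本/ 京都/ 大学/ 日本京都大学/ 深造
[Ambiguity resolution]: 结过婚/ 的/ 和/ 尚未/ 结过婚/ 的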
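Full mode lists every dictionary word found in the sentence instead of choosing one segmentation. The sketch below is a simplified version of the full-mode loop in Python jieba: it builds a DAG of possible word end positions for each start index, then walks it, emitting multi-character candidates and falling back to a single character only where no earlier word already covers that position. `FullModeSketch` and its plain `HashSet` dictionary are illustrative assumptions; jieba.NET's own implementation works on its prefix dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FullModeSketch
{
    // dag[k] lists every end index j such that sentence[k..j] is a dictionary word;
    // if there is none, it falls back to the single character [k].
    public static List<List<int>> BuildDag(string sentence, ISet<string> dict, int maxWordLength)
    {
        var dag = new List<List<int>>(sentence.Length);
        for (int k = 0; k < sentence.Length; k++)
        {
            var ends = new List<int>();
            for (int j = k; j < sentence.Length && j - k < maxWordLength; j++)
            {
                if (dict.Contains(sentence.Substring(k, j - k + 1)))
                    ends.Add(j);
            }
            if (ends.Count == 0)
                ends.Add(k);
            dag.Add(ends);
        }
        return dag;
    }

    public static IEnumerable<string> CutAll(string sentence, ISet<string> dict)
    {
        int maxLen = dict.Count == 0 ? 1 : dict.Max(w => w.Length);
        var dag = BuildDag(sentence, dict, maxLen);
        int lastEnd = -1; // end index of the last emitted word
        for (int k = 0; k < dag.Count; k++)
        {
            var ends = dag[k];
            if (ends.Count == 1 && k > lastEnd)
            {
                // Only one candidate and the position is not yet covered: emit it.
                yield return sentence.Substring(k, ends[0] - k + 1);
                lastEnd = ends[0];
            }
            else
            {
                // Emit every multi-character word starting here.
                foreach (var j in ends)
                {
                    if (j > k)
                    {
                        yield return sentence.Substring(k, j - k + 1);
                        lastEnd = j;
                    }
                }
            }
        }
    }
}
```

Usage: `FullModeSketch.CutAll("我来到北京清华大学", new HashSet<string> { "我", "来到", "北京", "清华", "清华大学", "华大", "大学" })` yields 我, 来到, 北京, 清华, 清华大学, 华大, 大学.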
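Custom dictionaries: you can load your own dictionary to include words that are missing from jieba's lexicon. jieba can recognize new words on its own, but adding them explicitly gives higher accuracy.
Usage: JiebaSegmenter.LoadUserDict("user_dict_file_path")
The format is the same as the main dictionary: one entry per line, consisting of the word, its frequency (optional), and its part-of-speech tag (optional), separated by spaces. If the frequency is omitted, an automatically computed frequency that ensures the word can be segmented out is used. For example:
创新办 3 i
云计算 5
凱特琳 nz
机器学习 3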
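Each user-dictionary line holds a word, an optional frequency, and an optional part-of-speech tag, separated by whitespace (the same convention as Python jieba). The sketch below parses one such line. `UserDictLineSketch.Parse` is hypothetical; it only illustrates the line format, not how jieba.NET actually reads the file.

```csharp
using System;

public static class UserDictLineSketch
{
    // Parses "word [freq] [tag]"; missing parts come back as null.
    public static (string Word, int? Freq, string Tag) Parse(string line)
    {
        // A null separator array splits on any whitespace.
        var parts = line.Trim().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
            throw new FormatException("empty dictionary line");

        string word = parts[0];
        int? freq = null;
        string tag = null;
        int next = 1;

        // An optional integer frequency comes right after the word...
        if (next < parts.Length && int.TryParse(parts[next], out var f))
        {
            freq = f;
            next++;
        }
        // ...followed by an optional part-of-speech tag.
        if (next < parts.Length)
            tag = parts[next];

        return (word, freq, tag);
    }
}
```

A line such as `凱特琳 nz` yields a tag without a frequency, which is why the frequency check comes before the tag.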
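`JiebaSegmenter.AddWord(word, freq=0, tag=null)` adds a new word or adjusts the frequency of an existing one. If `freq` is not a positive integer, an automatically computed frequency is used, which ensures the word can be segmented out.
`JiebaSegmenter.DeleteWord(word)` removes a word so that it is no longer segmented out.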
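One way to understand an "automatically computed frequency" is Python jieba's `suggest_freq`: it multiplies the probabilities of the pieces the word is currently split into and picks the smallest integer frequency whose probability exceeds that product, never lowering the word's existing frequency. This makes the whole-word path score higher than that split. The sketch assumes jieba.NET follows the same rule; `SuggestFreqSketch` is a hypothetical name, and the caller supplies the piece frequencies.

```csharp
using System;
using System.Collections.Generic;

public static class SuggestFreqSketch
{
    // total:       sum of all word frequencies in the dictionary
    // pieceFreqs:  frequencies of the pieces the word is currently split into
    //              (use 1 for pieces missing from the dictionary)
    // currentFreq: the word's own current frequency (1 if absent)
    public static long SuggestFreq(long total, IEnumerable<long> pieceFreqs, long currentFreq)
    {
        double p = 1.0;
        foreach (var f in pieceFreqs)
            p *= f / (double)total; // probability of the current split

        // Smallest frequency whose probability beats the split, without lowering the existing one.
        return Math.Max((long)(p * total) + 1, currentFreq);
    }
}
```

For example, with a dictionary total of 1000 and a word currently split into two pieces of frequency 500 each, the sketch returns 251, because 1000 * 0.5 * 0.5 = 250.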
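Keyword extraction: `JiebaNet.Analyser.TfidfExtractor.ExtractTags(string text, int count = 20, IEnumerable<string> allowPos = null)` extracts keywords from the given text; `count` is the number of keywords to return, and `allowPos`, if given, restricts results to those parts of speech.
`JiebaNet.Analyser.TfidfExtractor.ExtractTagsWithWeight(string text, int count = 20, IEnumerable<string> allowPos = null)` does the same but also returns each keyword's weight.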
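TF-IDF weights a word by how often it appears in the text (TF) times how rare it is in a reference corpus (IDF). The sketch below is modeled on Python jieba's `extract_tags`: it counts each word of at least two characters, divides by the total count, multiplies by the word's IDF (or a median IDF for unknown words), and sorts descending. `TfidfSketch` is hypothetical; it omits stop words and POS filtering, and the IDF table is supplied by the caller rather than loaded from jieba.NET's resources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TfidfSketch
{
    // words: the segmented text; idf: word -> IDF; medianIdf: fallback for unknown words.
    public static List<(string Word, double Weight)> ExtractTagsWithWeight(
        IEnumerable<string> words, IReadOnlyDictionary<string, double> idf, double medianIdf, int count = 20)
    {
        var freq = new Dictionary<string, double>();
        foreach (var w in words)
        {
            if (w.Trim().Length < 2) continue; // skip single characters and whitespace
            freq[w] = freq.TryGetValue(w, out var c) ? c + 1 : 1;
        }

        double total = freq.Values.Sum();
        return freq
            .Select(kv => (Word: kv.Key,
                           Weight: kv.Value / total * (idf.TryGetValue(kv.Key, out var v) ? v : medianIdf)))
            .OrderByDescending(t => t.Weight)
            .Take(count)
            .ToList();
    }
}
```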
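TextRank-based keyword extraction is also available: `JiebaNet.Analyser.TextRankExtractor` has the same interface as `TfidfExtractor`. Note that by default `TextRankExtractor` extracts only nouns and verbs.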
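TextRank ranks words by running a PageRank-style iteration over a word co-occurrence graph. The sketch follows the shape of Python jieba's TextRank (undirected edges between words that appear within a sliding window of 5, damping factor 0.85, 10 in-place iterations) but omits its final score normalization and POS filtering. `TextRankSketch` is hypothetical and expects words that are already segmented and filtered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextRankSketch
{
    public static List<(string Word, double Score)> Rank(
        IReadOnlyList<string> words, int window = 5, double damping = 0.85, int iterations = 10)
    {
        // Undirected graph: edge weight = number of co-occurrences within the window.
        var graph = new Dictionary<string, Dictionary<string, double>>();
        void AddEdge(string a, string b)
        {
            if (!graph.TryGetValue(a, out var nbrs)) graph[a] = nbrs = new Dictionary<string, double>();
            nbrs[b] = nbrs.TryGetValue(b, out var w) ? w + 1 : 1;
        }
        for (int i = 0; i < words.Count; i++)
            for (int j = i + 1; j < i + window && j < words.Count; j++)
            {
                AddEdge(words[i], words[j]);
                AddEdge(words[j], words[i]);
            }

        var outSum = graph.ToDictionary(kv => kv.Key, kv => kv.Value.Values.Sum());
        var score = graph.Keys.ToDictionary(k => k, _ => 1.0 / Math.Max(graph.Count, 1));
        var keys = graph.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList();

        // Weighted PageRank, updating scores in place.
        for (int it = 0; it < iterations; it++)
            foreach (var n in keys)
            {
                double s = graph[n].Sum(e => e.Value / outSum[e.Key] * score[e.Key]);
                score[n] = (1 - damping) + damping * s;
            }

        return score.OrderByDescending(kv => kv.Value)
                    .Select(kv => (Word: kv.Key, Score: kv.Value))
                    .ToList();
    }
}
```

A word that co-occurs with many different words ends up with the highest score; for example, in the sequence a, b, a, c, a, d with a window of 2, "a" ranks first.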
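Part-of-speech tagging:
using System;
using System.Linq;
using JiebaNet.Segmenter.PosSeg;
var posSeg = new PosSegmenter();
var s = "一团硕大无朋的高能离子云，在遥远而神秘的太空中迅疾地飘移";
var tokens = posSeg.Cut(s);
Console.WriteLine(string.Join(" ", tokens.Select(token => string.Format("{0}/{1}", token.Word, token.Flag))));
Output:
一团/m 硕大无朋/i 的/uj 高能/n 离子/n 云/ns ，/x 在/p 遥远/a 而/c 神秘/a 的/uj 太空/n 中/f 迅疾/z 地/uv 飘移/v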
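The flags are jieba's part-of-speech tags (for example `n` for nouns, `v` for verbs, `a` for adjectives). A common follow-up is to keep only certain tags. The sketch below does that over a hypothetical `TaggedWord` type standing in for the tokens returned by `PosSegmenter.Cut`, whose `Word` and `Flag` properties the demo reads. Matching by prefix (so `n` also keeps `ns` and `nz`) is a design choice of the sketch, not jieba.NET behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a tagged token.
public readonly struct TaggedWord
{
    public TaggedWord(string word, string flag) { Word = word; Flag = flag; }
    public string Word { get; }
    public string Flag { get; }
}

public static class PosFilterSketch
{
    // Keep words whose flag starts with any of the given prefixes.
    public static List<string> WordsWithFlagPrefix(IEnumerable<TaggedWord> tokens, params string[] prefixes) =>
        tokens.Where(t => prefixes.Any(p => t.Flag.StartsWith(p, StringComparison.Ordinal)))
              .Select(t => t.Word)
              .ToList();
}
```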
Tokenize returns each word together with its start and end positions in the original text. Default mode:
var segmenter = new JiebaSegmenter();
var s = "永和服装饰品有限公司";
var tokens = segmenter.Tokenize(s);
foreach (var token in tokens)
{
Console.WriteLine("word {0,-12} start: {1,-3} end: {2,-3}", token.Word, token.StartIndex, token.EndIndex);
}
Output:
word 永和           start: 0   end: 2
word 服装           start: 2   end: 4
word 饰品           start: 4   end: 6
word 有限公司         start: 6   end: 10
Search mode:
var segmenter = new JiebaSegmenter();
var s = "永和服装饰品有限公司";
var tokens = segmenter.Tokenize(s, TokenizerMode.Search);
foreach (var token in tokens)
{
Console.WriteLine("word {0,-12} start: {1,-3} end: {2,-3}", token.Word, token.StartIndex, token.EndIndex);
}
Output:
word 永和           start: 0   end: 2
word 服装           start: 2   end: 4
word 饰品           start: 4   end: 6
word 有限           start: 6   end: 8
word 公司           start: 8   end: 10
word 有限公司         start: 6   end: 10
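Both outputs follow from the segmentation itself: offsets accumulate word by word, and search mode additionally emits dictionary 2-grams and 3-grams found inside longer words, before the word itself. The sketch below reproduces that logic on top of a given precise-mode segmentation, modeled on Python jieba's `tokenize(mode="search")`. `SearchTokenizeSketch` and its plain `HashSet` dictionary are hypothetical, and offsets are counted in UTF-16 `char`s.

```csharp
using System.Collections.Generic;

public static class SearchTokenizeSketch
{
    public static IEnumerable<(string Word, int Start, int End)> Tokenize(
        IEnumerable<string> segments, ISet<string> dict)
    {
        int start = 0; // offset of the current segment in the original text
        foreach (var w in segments)
        {
            if (w.Length > 2)
                for (int i = 0; i + 2 <= w.Length; i++)
                {
                    var gram2 = w.Substring(i, 2);
                    if (dict.Contains(gram2)) yield return (gram2, start + i, start + i + 2);
                }
            if (w.Length > 3)
                for (int i = 0; i + 3 <= w.Length; i++)
                {
                    var gram3 = w.Substring(i, 3);
                    if (dict.Contains(gram3)) yield return (gram3, start + i, start + i + 3);
                }
            yield return (w, start, start + w.Length); // the whole segment comes last
            start += w.Length;
        }
    }
}
```

For example, with segments 永和, 服装, 饰品, 有限公司 and a dictionary containing 有限 and 公司, it yields 永和 0-2, 服装 2-4, 饰品 4-6, 有限 6-8, 公司 8-10, 有限公司 6-10.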
Parallel segmentation is available through `JiebaSegmenter.CutInParallel()` and `JiebaSegmenter.CutForSearchInParallel()`, and parallel POS tagging through `PosSegmenter.CutInParallel()`.
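How jieba.NET parallelizes these calls internally is not described here. As a general illustration, the sketch below segments many independent texts concurrently with PLINQ while keeping results in input order. `ParallelCutSketch` is hypothetical, and the `cut` delegate stands in for any per-text segmentation function; it assumes that function is safe to call from several threads, which should be checked for a real segmenter instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelCutSketch
{
    public static List<List<string>> CutLinesInParallel(
        IEnumerable<string> lines, Func<string, IEnumerable<string>> cut)
    {
        return lines.AsParallel()
                    .AsOrdered()                        // keep results in input order
                    .Select(line => cut(line).ToList()) // segment each text independently
                    .ToList();
    }
}
```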
The jiebaForLuceneNet project provides a simple integration with Lucene.NET; see jiebaForLuceneNet for more information.
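Command-line segmentation works much as in jieba: building the Segmenter.Cli project produces jiebanet.exe, whose options and sample usages are: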
-f --file the file name (required).
-d --delimiter the delimiter between tokens, default: / .
-a --cut-all use cut_all mode.
-n --no-hmm don't use HMM.
-p --pos enable POS tagging.
-v --version show version info.
-h --help show help details.
sample usages:
$ jiebanet -f input.txt > output.txt
$ jiebanet -d "|" -f input.txt > output.txt
$ jiebanet -p -f input.txt > output.txt
Counter
The Counter class, modeled on Python's Counter, can be used to count word frequencies:
var s = "algorithm";
var seg = new JiebaSegmenter();
var freqs = new Counter<string>(seg.Cut(s));
foreach (var pair in freqs.MostCommon(5))
{
Console.WriteLine($"{pair.Key}: {pair.Value}");
}
: 4
: 3
: 3
: 3
: 3
`Counter` can also be updated with `Add`, `Subtract`, and `Union`; `MostCommon` returns the most frequent items.
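`MostCommon` comes down to counting occurrences and sorting by count. The sketch below is a minimal, hypothetical `MiniCounter<T>` in the spirit of Python's `collections.Counter`. It is not jieba.NET's `Counter<T>`, it only covers counting, lookup, and `MostCommon`, and the order among equal counts is not guaranteed.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class MiniCounter<T>
{
    private readonly Dictionary<T, int> _counts = new Dictionary<T, int>();

    public MiniCounter(IEnumerable<T> items) => Add(items);

    // Increment the count of every item in the sequence.
    public void Add(IEnumerable<T> items)
    {
        foreach (var item in items)
            _counts[item] = _counts.TryGetValue(item, out var c) ? c + 1 : 1;
    }

    // Missing keys count as zero, as in Python's Counter.
    public int this[T key] => _counts.TryGetValue(key, out var c) ? c : 0;

    // The n most frequent items, highest count first.
    public List<KeyValuePair<T, int>> MostCommon(int n) =>
        _counts.OrderByDescending(kv => kv.Value).Take(n).ToList();
}
```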
KeywordProcessor
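`KeywordProcessor` is separate from the `KeywordExtractor`-based extractors: instead of scoring the words produced by jieba's segmentation, it looks for a predefined set of keywords in the text. Usage: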
var kp = new KeywordProcessor();
kp.AddKeywords(new []{".NET Core", "Java", "C", " tree", "CET-4", " "});
var keywords = kp.ExtractKeywords("cet-4c.NET core JavaScript tree");
// keywords now contains:
// new List<string> { "CET-4", "C", ".NET Core", " ", " tree"}
// Pass `raw: true` to get the keywords as they appear in the original text
keywords = kp.ExtractKeywords("cet-4c.NET core JavaScript tree", raw: true);
// keywords now contains:
// new List<string> { "cet-4", "c", ".NET core", " ", " tree"}
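FlashText matches keywords with a character trie in a single pass over the text, instead of running one search per keyword. The sketch below illustrates that idea, assuming case-insensitive matching, longest match wins, and word boundaries defined by ASCII letters, digits, and underscore, so "Java" does not match inside "JavaScript". `FlashTextSketch` is hypothetical and is not jieba.NET's `KeywordProcessor`. How the real class treats CJK text is not covered here.

```csharp
using System.Collections.Generic;

public sealed class FlashTextSketch
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public string Keyword; // canonical keyword if one ends at this node
    }

    private readonly Node _root = new Node();

    public void AddKeyword(string keyword)
    {
        var node = _root;
        foreach (var ch in keyword)
        {
            var key = char.ToLowerInvariant(ch); // store lowercased for case-insensitive matching
            if (!node.Next.TryGetValue(key, out var child))
            {
                child = new Node();
                node.Next[key] = child;
            }
            node = child;
        }
        node.Keyword = keyword;
    }

    public List<string> ExtractKeywords(string text, bool raw = false)
    {
        var found = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            int bestEnd = -1;
            string bestKeyword = null;
            if (IsBoundary(text, i))
            {
                var node = _root;
                for (int j = i; j < text.Length; j++)
                {
                    if (!node.Next.TryGetValue(char.ToLowerInvariant(text[j]), out node))
                        break;
                    // Remember the longest keyword that also ends on a boundary.
                    if (node.Keyword != null && IsBoundary(text, j + 1))
                    {
                        bestEnd = j + 1;
                        bestKeyword = node.Keyword;
                    }
                }
            }
            if (bestEnd > i)
            {
                found.Add(raw ? text.Substring(i, bestEnd - i) : bestKeyword);
                i = bestEnd; // continue after the match
            }
            else
            {
                i++;
            }
        }
        return found;
    }

    private static bool IsWordChar(char c) => c < 128 && (char.IsLetterOrDigit(c) || c == '_');

    // A position is a boundary unless it sits between two word characters.
    private static bool IsBoundary(string text, int pos) =>
        pos == 0 || pos == text.Length || !IsWordChar(text[pos - 1]) || !IsWordChar(text[pos]);
}
```

With keywords ".NET Core", "Java", and "CET-4", the text "cet-4 and .NET core, not JavaScript" yields "CET-4" and ".NET Core", or "cet-4" and ".NET core" with `raw: true`.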