C# Pangu Word Segmenter (盘古分词)

Uploader: jaymezhang | Uploaded: 2023-09-11 06:09:00 | File size: 3.04MB | File type: ZIP
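1. Changed the dictionary format to speed up dictionary loading.
2. Added support for English technical terms: terms such as C++ and C# are segmented as whole words once they are added to the dictionary.
3. Added word-frequency judgment: when the segmenter cannot otherwise choose between candidates, it decides by word frequency.
4. Added a frequency-first option that dynamically determines segmentation granularity. Requires enabling FreqFirst.
5. Added statistics on the prefixes and suffixes of Chinese personal names, and the ability to locate personal names based on those statistics.
6. Added statistics on how often Chinese personal names and out-of-vocabulary (unregistered) words occur.
7. Added automatic dictionary updating: personal names and out-of-vocabulary words that exceed a threshold are inserted into the dictionary automatically. Requires enabling the AutoInsertUnknownWords switch and setting UnknownWordsThreshold. (Automatic insertion is not recommended; manual insertion is.)
8. Added periodic saving of the dictionary and statistics. Requires setting AutoSaveInterval.
9. Added the KTDictSeg.xml configuration file for configuring segmentation parameters.
10. Added support for Lucene.net by providing the KTDictSegAnalyzer analyzer.
11. Added dictionary management: dictionary entries can be added, deleted, and modified.
12. Dictionary management can batch-insert entries from the out-of-vocabulary words, which helps users manually pick suitable words to add to the dictionary (recommended).
13. Provides a simple news search example built with Lucene.net + KTDictSegAnalyzer + KTDictSeg; the project is named Demo.KTDictSegAnalyzer.
14. Changed all ArrayList usages to List<>.

Package contents:
src_V1.3.01 is the source code.
rel_V1.3.01 contains all executables and configuration files. The Data directory holds the dictionary, the stop-word list, and the personal-name prefix/suffix list I have compiled so far. The News directory holds the index that Lucene.net built for the news search example.
News.zip is the XML file to supply for the batch insertion shown in the figure above. It contains 30,000 outdated news articles crawled from Sina and China.com (中华网), about 20 million characters in total, and is provided for study.

Note: to import news.xml, the file must be placed in the same directory as Demo.KTDictSegAnalyzer.exe!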
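Items 2 and 3 above can be illustrated with a small, self-contained sketch. It is not KTDictSeg's code and does not use its API: the class name `FreqSegSketch`, the toy dictionary, the frequencies, and the rule "fewest words first, then highest total frequency" are all assumptions made for illustration. It only shows how a dictionary-driven segmenter might keep terms like C# and C++ whole, and fall back on word frequency when two segmentations are otherwise equally good.

```csharp
using System;
using System.Collections.Generic;

public static class FreqSegSketch
{
    // Toy dictionary (word -> frequency). The words and counts are made up
    // for illustration; this is NOT the format or content of Dict.dct.
    public static readonly Dictionary<string, int> Dict = new Dictionary<string, int>
    {
        { "C#", 50 }, { "C++", 40 }, { "研究", 300 }, { "研究生", 120 },
        { "生命", 200 }, { "命", 30 }, { "起源", 80 }, { "的", 1000 },
    };

    // Segments text by preferring the fewest words; ties are broken by the
    // higher sum of word frequencies. Unknown characters become single-char
    // tokens with frequency 0.
    public static List<string> Segment(string text, IDictionary<string, int> dict, int maxWordLen = 4)
    {
        int n = text.Length;
        var count = new int[n + 1];   // count[i]: fewest tokens covering text[i..]
        var score = new long[n + 1];  // score[i]: total frequency of that choice
        var step = new int[n + 1];    // step[i]: length of the first token at i

        // Work backwards so every suffix is solved before it is needed.
        for (int i = n - 1; i >= 0; i--)
        {
            int bestCount = int.MaxValue;
            long bestScore = -1;
            int bestLen = 1;
            for (int len = 1; len <= maxWordLen && i + len <= n; len++)
            {
                bool known = dict.TryGetValue(text.Substring(i, len), out int freq);
                if (!known && len > 1) continue; // multi-char tokens must be in the dictionary

                int c = 1 + count[i + len];
                long s = freq + score[i + len];  // freq is 0 for unknown single chars
                if (c < bestCount || (c == bestCount && s > bestScore))
                {
                    bestCount = c;
                    bestScore = s;
                    bestLen = len;
                }
            }
            count[i] = bestCount;
            score[i] = bestScore;
            step[i] = bestLen;
        }

        // Follow the recorded steps from the start to rebuild the tokens.
        var words = new List<string>();
        for (int i = 0; i < n; i += step[i])
        {
            words.Add(text.Substring(i, step[i]));
        }
        return words;
    }

    public static void Main()
    {
        // Both "研究生/命/的/起源" and "研究/生命/的/起源" have four words;
        // the second has the higher total frequency, so it is chosen.
        Console.WriteLine(string.Join("/", Segment("研究生命的起源", Dict))); // 研究/生命/的/起源
        Console.WriteLine(string.Join("/", Segment("C#和C++", Dict)));        // C#/和/C++
    }
}
```

In this sketch, removing "C#" or "C++" from the toy dictionary makes those terms fall apart into single characters, which mirrors the point of item 2: the terms are recognized only because they are dictionary entries.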

Resource details

C# 盘古分词 (235 files, 3.04MB)

Default.aspx  2.57KB
ResolveAssemblyReference.cache  12.96KB
ResolveAssemblyReference.cache  12.36KB
ResolveAssemblyReference.cache  9.45KB
ResolveAssemblyReference.cache  8.84KB
ResolveAssemblyReference.cache  4.31KB
ResolveAssemblyReference.cache  4.31KB
Web.Config  1.88KB
SimpleDictSeg.cs  56.71KB
MatchNameRule.cs  42.19KB
FormDemo.Designer.cs  24.42KB
ExtractWords.cs  24.24KB
FormMain.Designer.cs  24.06KB
CFile.cs  19.16KB
Pos.cs  18.69KB
FormMain.cs  12.93KB
CRegex.cs  12.12KB
AspNetPager.cs  10.52KB
PosBinRule.cs  10.47KB
Highlighter.cs  10.28KB
FormDemo.cs  10.15KB
CException.cs  9.69KB
Dfa.cs  8.92KB
FormUnknownWords.Designer.cs  8.79KB
Index.cs  8.40KB
Index.cs  8.20KB
CSerialization.cs  7.17KB
FormBatchInsert.Designer.cs  7.06KB
Dict.cs  6.86KB
FormFind.Designer.cs  6.75KB
SearchForm.cs  6.70KB
FormInsert.Designer.cs  6.45KB
DictManage.cs  5.63KB
SearchForm.Designer.cs  5.44KB
CssStyleCollection.cs  5.03KB
KTDictSegTokenizer.cs  4.77KB
PosCtrl.cs  4.70KB
SearchPage.cs  4.64KB
SearchPage.cs  4.63KB
FormTrafficPos.Designer.cs  4.50KB
FormUnknownWords.cs  4.32KB
PagerButton.cs  4.24KB
Default.aspx.cs  3.73KB
FormEncoder.Designer.cs  3.53KB
CStream.cs  3.20KB
Resources.Designer.cs  2.80KB
Resources.Designer.cs  2.78KB
Resources.Designer.cs  2.77KB
FormBatchInsert.Designer.cs  2.73KB
PosTraffic.cs  2.22KB
FormFind.cs  1.92KB
News.cs  1.71KB
News.cs  1.70KB
Fragment.cs  1.70KB
KTDictSegAnalyzer.cs  1.64KB
MergeNumRule.cs  1.63KB
CFileException.cs  1.61KB
FormBatchInsert.cs  1.61KB
SimpleHTMLFormatter.cs  1.57KB
AssemblyInfo.cs  1.41KB
AssemblyInfo.cs  1.40KB
PosCtrl.Designer.cs  1.33KB
AssemblyInfo.cs  1.31KB
AssemblyInfo.cs  1.31KB
AssemblyInfo.cs  1.30KB
AssemblyInfo.cs  1.30KB
PageNoButton.cs  1.23KB
AssemblyInfo.cs  1.17KB
AssemblyInfo.cs  1.16KB
AssemblyInfo.cs  1.15KB
FormTrafficPos.cs  1.15KB
Settings.Designer.cs  1.09KB
Settings.Designer.cs  1.08KB
Settings.Designer.cs  1.06KB
FormEncoder.cs  973B
FormInsert.cs  813B
IRule.cs  733B
FormBatchInsert.cs  706B
PrevPageButton.cs  666B
RecordCountButton.cs  662B
PageCountButton.cs  646B
NextPageButton.cs  640B
Program.cs  486B
Program.cs  472B
Program.cs  466B
Formatter.cs  207B
DictManage.csproj  5.38KB
Demo.KTDictSegAnalyzer.csproj  5.26KB
Demo.csproj  3.96KB
PosDisplayCtrl.csproj  2.74KB
KTDictSeg.HighLight.csproj  2.69KB
KTDictSegAnalyzer.csproj  2.63KB
KTDictSeg.csproj  2.45KB
AspDotNetPager.csproj  2.29KB
FTAlgorithm.csproj  2.19KB
Thumbs.db  5.50KB
Dict.dct  4.38MB
Name.dct  756.27KB
Lucene.Net2.3.1.dll  436.00KB
Lucene.Net.dll  308.00KB
......
(Too many files; not all are shown.)
