Text Classification Algorithm Based on the Hadoop Architecture

Uploader: cqyyjdw | Upload time: 2019-12-21 19:36:01 | File size: 3.9MB | File type: rar
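A Hadoop-based text classification system. It implements word segmentation and stop-word filtering (using IK), and uses the naive Bayes classification algorithm to train on and classify text. During testing, term-frequency feature selection was used as the feature-word selection algorithm, and classification accuracy reached 78%. The package also includes a chi-square feature selection algorithm (feature selection on the training set).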
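
The pipeline described above (feature selection followed by naive Bayes training and classification) can be sketched roughly as follows. This is not the code from the archive, which ships compiled Java classes and runs on Hadoop. It is a minimal single-process Python sketch under several assumptions: documents are already segmented into tokens (standing in for IK output with stop words removed), the classifier is multinomial naive Bayes with add-one smoothing (the description does not say which variant is used), and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical): pre-segmented token lists with class labels.
DOCS = [
    (["team", "match", "goal"], "sports"),
    (["match", "champion", "team"], "sports"),
    (["stock", "market", "rise"], "finance"),
    (["market", "bank", "stock"], "finance"),
]

def top_terms_by_frequency(docs, k):
    """Term-frequency feature selection: keep the k most frequent terms."""
    counts = Counter(term for tokens, _ in docs for term in tokens)
    return {term for term, _ in counts.most_common(k)}

def chi_square(docs, term, label):
    """Chi-square statistic for one (term, class) pair, from document counts."""
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def train_naive_bayes(docs, vocab):
    """Return {label: (log prior, {term: log likelihood})} with add-one smoothing."""
    class_docs = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model

def classify(model, tokens):
    """Pick the label with the highest log posterior; unseen terms are ignored."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)
    return max(model, key=score)

VOCAB = top_terms_by_frequency(DOCS, 4)
MODEL = train_naive_bayes(DOCS, VOCAB)

if __name__ == "__main__":
    print(classify(MODEL, ["team", "match"]))
    print(chi_square(DOCS, "team", "sports"))
```

In a Hadoop setting, the counting steps (term frequencies, per-class term counts, and the four chi-square cells) are the parts that would naturally map to MapReduce jobs; how the archive actually splits them is not stated in the description.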

Resource details

(25 files, 3.9MB) Text Classification Algorithm Based on the Hadoop Architecture

  • ext.dic (44B)
  • META-INF/
      • MANIFEST.MF (25B)
  • org/wltea/analyzer/
      • cfg/
          • DefaultConfig.class (2.94KB)
          • Configuration.class (422B)
      • dic/
          • DictSegment.class (5.05KB)
          • Hit.class (1.61KB)
          • main2012.dic (2.91MB)
          • quantifier.dic (1.78KB)
          • Dictionary.class (6.99KB)
      • solr/
          • IKTokenizerFactory.class (1.20KB)
      • core/
          • QuickSortSet.class (2.48KB)
          • ISegmenter.class (211B)
          • CharacterUtil.class (1.49KB)
          • LexemePath.class (3.93KB)
          • Lexeme.class (3.87KB)
          • IKSegmenter.class (3.30KB)
          • IKArbitrator.class (3.33KB)
          • CJKSegmenter.class (2.38KB)
          • CN_QuantifierSegmenter.class (4.22KB)
          • AnalyzeContext.class (6.08KB)
          • LetterSegmenter.class (3.39KB)
          • QuickSortSet$Cell.class (2.08KB)
      • lucene/
          • IKAnalyzer.class (1.39KB)
          • IKTokenizer.class (1.98KB)
      • sample/
          • IKAnalyzerDemo.class (5.79KB)

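Comments

  • tttk :
    CSDN's 60-second limit is really something... The algorithm is good and worth studying.
    2017-09-26
  • hbwzhsh :
    A decent resource. I'll study it first.
    2016-10-06
  • Richardo_SSS :
    It may well work, but it turned out not to be what I need. Still, it's decent.
    2015-06-28
  • 雨恨 :
    The program doesn't seem to run. A document describing how to set up the environment would help.
    2015-03-21
  • narcissusai :
    The program is incomplete. There isn't even a main, so it can't be run.
    2014-06-30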
