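Learning to Use XGBoost4J

XGBoost4J can train a rank (Learning to Rank) model from Java. Unlike most algorithms, the data here has a notion of groups, which can be set through the setGroup() method of DMatrix; its argument is an int array. The following again uses rank from the demo: package ml.dmlc.xgboost4j.java.example; import java.io.BufferedReader; import...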

May 19, 2016 Xgboost

Learning to Rank

Data format: 0 qid:10050 1:0.434944 2:0.000000 3:0.000000 4:1.000000 5:0.425455 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.423479 12:0.000000 13:0.000000 14:1.000000 15:0.412193 16:0.643...

Apr 22, 2016 Datascience

SolrCloud Leader Elect

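SolrCloud leader election picks the smallest seq, which is somewhat similar to Kafka seq(0). There are two main parts: the leader election for the cluster, and the leader election for each shard of a collection. Main class involved in the cluster leader election: OverseerElectionContext. Related nodes: /overseer_elect/election, which can be described as the leader candidate nodes, /...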

Oct 20, 2015 distributed

Elasticsearch Heap Size and Swap Settings

Heap: Sizing and Swapping Give Half Your Memory To Lucene Don’t Cross 30.5G Swapping Is the Death of Performance

Oct 19, 2015 Lucene

ScyllaDB: world's fastest NoSQL column store database (claims to be up to 10x faster)

ScyllaDB: world’s fastest NoSQL column store database (claims to be up to 10x faster) 1 000 000 transactions per second per server

Sep 26, 2015 distributed

All Things Cassandra

Consistency level (ConsistencyLevel): Configuring data consistency, official documentation, Apache Cassandra™ 2.0. ANY (0), ONE (1), TWO (2), THREE (3), QUORUM (4), ALL (5), LOCAL_QUORUM(6, t...

Sep 18, 2015 distributed

Custom Comma Tokenizer for Solr5

Following WhitespaceTokenizer, write a Tokenizer named CommaTokenizer that extends CharTokenizer: package com.xxx.yyy.zzz.analyzer; import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory; import org.apache.lucene.an...

Sep 18, 2015 distributed

libsvm And liblinear

The pom dependency for the Java version currently in use: <dependency> <groupId>tw.edu.ntu.csie</groupId> <artifactId>libsvm</artifactId> <version>3.17</version> </dependency> &l...

Sep 17, 2015 Datascience

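Solr5.x && SolrCloud

Compared with Solr4 and earlier versions, the biggest feature of Solr5 seems to be that it no longer depends on an external Servlet container. There is no need to deploy a war to Jetty or Tomcat anymore. Refreshing! It runs as a standalone server process and works more or less out of the box, one step closer to ES! Update the configuration via zkcli.sh. Lucene5.x-ReleaseNote Solr5.0-ReleaseNote Collections API: delete a collection,...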

Sep 17, 2015 distributed

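Elasticsearch Notes

Installation and configuration: version 2.3.3 is used here, and by default this version cannot be started directly as root. For details, see the code in Bootstrap.java: public static void initializeNatives(Path tmpFile, boolean mlockAll, boolean seccomp, boolean ctrlHandler) { final ESLogger logger = L...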

Sep 17, 2015 distributed
