
Learning to Rank

Data format

0 qid:10050 1:0.434944 2:0.000000 3:0.000000 4:1.000000 5:0.425455 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.423479 12:0.000000 13:0.000000 14:1.000000 15:0.412193 16:0.643254 17:0.142857 18:0.000000 19:0.500000 20:0.638407 21:1.000000 22:0.958198 23:0.946106 24:0.936481 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.733195 34:0.000000 35:0.000000 36:0.000000 37:0.979925 38:0.922568 39:0.924231 40:0.918143 41:0.666667 42:0.739726 43:0.000000 44:0.009367 45:0.250000 46:0.000000 #docid = GX264-67-3123911 inc = 1 prob = 0.189357
0 qid:10056 1:0.179567 2:0.000000 3:0.000000 4:0.000000 5:0.174455 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.002800 12:0.000000 13:0.000000 14:0.000000 15:0.002287 16:0.084188 17:1.000000 18:0.571429 19:0.000000 20:0.084561 21:0.017313 22:0.220225 23:0.306690 24:0.295506 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.018289 38:0.220830 39:0.306274 40:0.297002 41:0.000000 42:0.000000 43:0.000000 44:0.264231 45:0.473684 46:0.172414 #docid = GX000-03-7721182 inc = 1 prob = 0.0897452
2 qid:10056 1:0.114551 2:1.000000 3:0.666667 4:0.000000 5:0.143302 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.229690 12:1.000000 13:0.821326 14:0.000000 15:0.330058 16:0.012862 17:0.708333 18:0.571429 19:0.050000 20:0.013133 21:0.795735 22:0.842889 23:0.857526 24:0.842199 25:0.778357 26:0.905145 27:1.000000 28:0.870033 29:0.720071 30:0.673076 31:0.959066 32:0.621171 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.889891 38:0.914435 39:0.948256 40:0.910097 41:0.500000 42:0.178571 43:0.000000 44:1.000000 45:0.578947 46:0.655172 #docid = GX001-20-2991462 inc = 0.0971783377719955 prob = 0.978707
0 qid:10056 1:0.015480 2:0.000000 3:0.000000 4:0.000000 5:0.009346 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.043660 12:0.000000 13:0.000000 14:0.000000 15:0.044096 16:0.005016 17:0.291667 18:0.571429 19:0.100000 20:0.005135 21:0.272555 22:0.166655 23:0.080059 24:0.077839 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.269664 38:0.157425 39:0.074903 40:0.069359 41:0.500000 42:0.133929 43:0.000000 44:0.001745 45:0.052632 46:0.000000 #docid = GX002-91-6093726 inc = 1 prob = 0.0465332
1 qid:10056 1:0.065015 2:0.111111 3:0.333333 4:0.000000 5:0.065421 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.093772 12:0.113496 13:0.407389 14:0.000000 15:0.113168 16:0.007609 17:0.166667 18:0.571429 19:0.150000 20:0.007693 21:0.672292 22:0.784546 23:0.700592 24:0.809056 25:0.330412 26:0.000000 27:0.000000 28:0.000000 29:0.357165 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.694775 38:0.803481 39:0.725529 40:0.826979 41:1.000000 42:0.250000 43:0.000000 44:0.135091 45:0.105263 46:0.034483 #docid = GX004-37-11235977 inc = 0.0572482685995155 prob = 0.867248
2 qid:10056 1:0.133127 2:0.888889 3:1.000000 4:0.000000 5:0.161994 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.246180 12:0.809717 13:1.000000 14:0.000000 15:0.323971 16:0.011540 17:0.541667 18:0.428571 19:0.150000 20:0.011761 21:0.838357 22:0.897991 23:0.900076 24:0.897615 25:0.813411 26:1.000000 27:0.918560 28:0.915127 29:1.000000 30:1.000000 31:1.000000 32:1.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.900711 38:0.956808 39:0.971849 40:0.954784 41:1.000000 42:0.241071 43:0.000000 44:0.172420 45:1.000000 46:1.000000 #docid = GX012-07-6597432 inc = 1 prob = 0.983119
0 qid:10056 1:0.000000 2:0.000000 3:0.333333 4:0.333333 5:0.000000 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.028872 12:0.000000 13:0.413937 14:0.380640 15:0.051310 16:0.000000 17:0.000000 18:0.142857 19:0.250000 20:0.000000 21:0.344383 22:0.342126 23:0.065642 24:0.172238 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.575903 30:0.110355 31:0.037870 32:0.076987 33:0.530176 34:0.000000 35:0.000000 36:0.000000 37:0.467593 38:0.408779 39:0.174411 40:0.169594 41:1.000000 42:0.285714 43:0.000000 44:0.000000 45:0.000000 46:0.000000 #docid = GX106-64-0336730 inc = 1 prob = 0.0830865
1 qid:10056 1:0.052632 2:0.000000 3:0.000000 4:0.000000 5:0.046729 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.108311 12:0.000000 13:0.000000 14:0.000000 15:0.108686 16:0.004253 17:0.083333 18:0.714286 19:0.350000 20:0.004389 21:0.768296 22:0.878078 23:0.724629 24:0.905898 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.766041 38:0.870349 39:0.723306 40:0.898199 41:1.000000 42:0.303571 43:0.000000 44:0.252919 45:0.157895 46:0.000000 #docid = GX142-64-2472087 inc = 0.0506903970754654 prob = 0.392123
0 qid:10056 1:0.009288 2:0.000000 3:0.000000 4:1.000000 5:0.012461 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.057557 12:0.000000 13:0.000000 14:1.000000 15:0.088138 16:0.000864 17:0.000000 18:0.000000 19:1.000000 20:0.001101 21:0.776012 22:0.783169 23:0.554848 24:0.771985 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.781721 34:0.930813 35:1.000000 36:0.898506 37:0.895931 38:0.877806 39:0.672227 40:0.808451 41:1.000000 42:1.000000 43:0.000000 44:0.000000 45:0.000000 46:0.000000 #docid = GX161-43-7991020 inc = 1 prob = 0.505019
1 qid:10056 1:0.055728 2:0.000000 3:0.000000 4:0.000000 5:0.049844 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.206377 12:0.000000 13:0.000000 14:0.000000 15:0.206964 16:0.002423 17:0.000000 18:0.714286 19:0.550000 20:0.002593 21:1.000000 22:0.936061 23:0.858554 24:0.859561 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:1.000000 38:0.921210 39:0.856579 40:0.845357 41:1.000000 42:0.598214 43:0.000000 44:0.000000 45:0.000000 46:0.000000 #docid = GX169-00-10359237 inc = 1 prob = 0.269895
1 qid:10056 1:0.160991 2:0.000000 3:0.000000 4:0.000000 5:0.155763 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.431296 12:0.000000 13:0.000000 14:0.000000 15:0.431691 16:0.008032 17:0.000000 18:0.714286 19:0.550000 20:0.008202 21:0.947859 22:1.000000 23:1.000000 24:1.000000 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.951447 38:1.000000 39:1.000000 40:1.000000 41:1.000000 42:0.607143 43:0.000000 44:0.000000 45:0.000000 46:0.000000 #docid = GX207-04-13100446 inc = 1 prob = 0.779311
0 qid:10056 1:0.009288 2:0.000000 3:0.000000 4:0.000000 5:0.003115 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.036442 12:0.000000 13:0.000000 14:0.000000 15:0.036946 16:0.002203 17:0.000000 18:1.000000 19:0.200000 20:0.002288 21:0.530088 22:0.583607 23:0.442096 24:0.748607 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.520742 38:0.563572 39:0.439313 40:0.729926 41:0.500000 42:0.151786 43:0.000000 44:0.000000 45:0.000000 46:0.379310 #docid = GX239-68-0758843 inc = 0.023928833150798 prob = 0.286562

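The first column is the label, which gives the relevance of the document to the query (0, 1 or 2 in the sample above; a higher value means more relevant).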
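As a rough illustration of the format, the sketch below splits one line into its label, query id, feature map and trailing comment. It assumes the layout seen in the sample (label, then `qid:<id>`, then `index:value` pairs, then a `#` comment); the function name `parse_letor_line` is hypothetical, and the sample string is shortened to three features for readability.

```python
def parse_letor_line(line):
    """Parse one LETOR-style line into (label, qid, features, comment)."""
    # Everything after '#' is a free-form comment (docid, inc, prob in the sample).
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = int(tokens[0])              # first column: relevance label
    qid = tokens[1].split(":", 1)[1]    # second column: "qid:<query id>"
    features = {}
    for token in tokens[2:]:            # remaining columns: "index:value"
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()


# Shortened version of one sample line (only the first three features kept).
sample = ("2 qid:10056 1:0.114551 2:1.000000 3:0.666667 "
          "#docid = GX001-20-2991462 inc = 0.0971783377719955 prob = 0.978707")
label, qid, features, comment = parse_letor_line(sample)
print(label, qid, features, comment)
```

Grouping the parsed rows by `qid` gives the per-query document lists that the ranking formulas operate on.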

For a document with label \(label(i)\) ranked at position \(i\), its DCG contribution is:

\[dcg(i)=\frac{2^{label(i)}-1}{\log_2(i+1)}\]
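A small sketch of this computation, assuming the commonly used gain \(2^{label}-1\) and a base-2 logarithmic discount with 1-based positions. The helper names `dcg_at` and `dcg` are made up for this example.

```python
import math


def dcg_at(label, position):
    """DCG contribution of one document at a 1-based position."""
    return (2 ** label - 1) / math.log2(position + 1)


def dcg(labels):
    """Total DCG of a ranked list of labels (best-ranked document first)."""
    return sum(dcg_at(label, i) for i, label in enumerate(labels, start=1))


# Labels 2, 0, 1 in ranked order: 3/log2(2) + 0/log2(3) + 1/log2(4) = 3.5
print(dcg([2, 0, 1]))
# Sorting by label (2, 1, 0) moves the relevant documents up and raises the DCG.
print(dcg([2, 1, 0]))
```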

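For documents i and j, let the scoring function be \(F(X)\), where \(X=w_1v_1+w_2v_2+w_3v_3+\dots+w_nv_n\). The larger \(F(X_i)-F(X_j)\) is, the more likely i is to rank ahead of j, so \(F(X_i)-F(X_j)\) can be used to express the probability that document i ranks ahead of document j.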

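However, a probability must lie in [0, 1], so we normalize the difference with the logistic (sigmoid) function used in logistic regression: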

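\[\begin{aligned}P_{ij}&=\frac{e^{\sigma(s_i-s_j)}}{1+e^{\sigma(s_i-s_j)}}\\&=\frac{1}{1+e^{-\sigma(s_i-s_j)}}\end{aligned}\]

Here \(s_i=F(x_i)\) is the score of document i and \(\sigma>0\) is a scale parameter that controls the steepness of the sigmoid; \(\sigma=1\) gives exactly the difference \(F(x_i)-F(x_j)\).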
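The normalization can be sketched in a few lines. This is an illustration only: `pairwise_probability` is a hypothetical helper, `sigma` defaults to 1, and the two branches are just a standard way to avoid overflow in `math.exp` for large score gaps.

```python
import math


def pairwise_probability(s_i, s_j, sigma=1.0):
    """Probability that document i ranks ahead of document j."""
    z = sigma * (s_i - s_j)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equal form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)


print(pairwise_probability(1.0, 1.0))   # equal scores: 0.5
print(pairwise_probability(2.0, 0.0))   # i scored higher: above 0.5
print(pairwise_probability(0.0, 2.0))   # i scored lower: below 0.5
```

Note that the two directions are complementary: the probability that i precedes j plus the probability that j precedes i is 1.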

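Define the true (target) probability as follows, where \(S_{ij}=1\) if document i is more relevant than j, \(S_{ij}=-1\) if it is less relevant, and \(S_{ij}=0\) if the two are equally relevant: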

\[\overline {P_{ij}}=\frac 1 2 {(1+S_{ij})}\]

The cross-entropy function, for a target \(y\) and a predicted probability \(a\), is

\[C=-\left[y\ln a+(1-y)\ln(1-a)\right]\]

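That is, taking the target probability \(\overline{P_{ij}}\) as \(y\) and the predicted probability \(P_{ij}\) as \(a\):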

\[C=-\overline{P_{ij}}\log P_{ij}-(1-\overline{P_{ij}})\log(1-P_{ij})\]

Here \(\log\) denotes the natural logarithm (\(\ln\)); choosing another base would only rescale \(C\) by a constant factor.

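Substituting \(\overline{P_{ij}}=\frac 1 2(1+S_{ij})\) and using \(\log(1-P_{ij})=\log P_{ij}-\sigma(s_i-s_j)\), this simplifies to \(C=\frac 1 2(1-S_{ij})\sigma(s_i-s_j)+\log{(1+e^{-\sigma(s_i-s_j)})}\)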
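As a sanity check on the simplification, the sketch below evaluates the cross-entropy form and the simplified form side by side. It assumes natural logarithms and a scale parameter `sigma` that defaults to 1; all function names are hypothetical.

```python
import math


def sigmoid_probability(s_i, s_j, sigma=1.0):
    """Predicted probability P_ij that i ranks ahead of j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def cross_entropy_cost(S_ij, s_i, s_j, sigma=1.0):
    """Cost written as cross-entropy between target and predicted probability."""
    p_bar = 0.5 * (1 + S_ij)                    # target probability
    p = sigmoid_probability(s_i, s_j, sigma)    # predicted probability
    return -p_bar * math.log(p) - (1 - p_bar) * math.log(1 - p)


def simplified_cost(S_ij, s_i, s_j, sigma=1.0):
    """Same cost after substituting P_ij and simplifying."""
    d = sigma * (s_i - s_j)
    return 0.5 * (1 - S_ij) * d + math.log1p(math.exp(-d))


# Compare both forms for each possible S_ij and a few score pairs.
for S_ij in (-1, 0, 1):
    for s_i, s_j in ((0.3, 1.2), (2.0, 2.0), (1.5, -0.5)):
        a = cross_entropy_cost(S_ij, s_i, s_j)
        b = simplified_cost(S_ij, s_i, s_j)
        print(S_ij, s_i, s_j, round(a, 6), round(b, 6))
```

The simplified form is also the one that is easier to evaluate stably, since it never takes the logarithm of a probability that may round to 0 or 1.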

This post is licensed under CC BY 4.0 by the author.