File: data/TEXTEN1.txt

Input size: 221098 (vocabulary has 9607 words)

Test data size: 20000 (vocabulary has 3159 words)
Heldout data size: 40000 (vocabulary has 3858 words)
Training data size: 161098 (vocabulary has 8076 words)
Word coverage (test data / training data): 75.8151313706869%

EM algorithm

Convergence epsilon is 0.01

Step 1: lambdas = 0.25 0.25 0.25 0.25
Step 2: lambdas = 0.135943240413787 0.260657210967233 0.390657428104089 0.212742120514891
Step 3: lambdas = 0.109828357142925 0.262573377825893 0.450944402203448 0.176653862827733
Step 4: lambdas = 0.102257791753817 0.264024130309261 0.477806729138784 0.155911348798138
Step 5: lambdas = 0.0997435920985783 0.264658870931708 0.49111584849892 0.144481688470794
Final step: lambdas = 0.0988538835424104 0.264775301355878 0.498263135151856 0.138107679949856
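
Note: the lambda sequence above is consistent with a standard EM re-estimation of linear-interpolation weights, starting from uniform values and stopping once every lambda changes by less than the convergence epsilon. The following is a minimal sketch under that assumption, not the code that produced this log. The name em_lambdas and the input layout (one tuple of component probabilities per heldout position) are hypothetical.

```python
def em_lambdas(component_probs, epsilon=0.01, max_steps=1000):
    """Estimate interpolation weights by EM (illustrative sketch).

    component_probs: list of tuples, one per heldout position; entry j is
    the probability that component model j assigns to the observed word.
    Assumes at least one component is positive at every position.
    """
    n = len(component_probs[0])
    lambdas = [1.0 / n] * n          # Step 1: uniform start
    history = [lambdas]
    for _ in range(max_steps):
        counts = [0.0] * n
        for probs in component_probs:
            # Interpolated probability of the observed word.
            mix = sum(l * p for l, p in zip(lambdas, probs))
            # E-step: fractional credit of each component for this word.
            for j in range(n):
                counts[j] += lambdas[j] * probs[j] / mix
        # M-step: normalize the expected counts into new weights.
        total = sum(counts)
        new_lambdas = [c / total for c in counts]
        history.append(new_lambdas)
        converged = max(abs(a - b) for a, b in zip(new_lambdas, lambdas)) < epsilon
        lambdas = new_lambdas
        if converged:
            break
    return lambdas, history


if __name__ == "__main__":
    # Toy input: the second component always explains the data better.
    toy = [(0.1, 0.9)] * 10
    final, steps = em_lambdas(toy)
    for i, lam in enumerate(steps, start=1):
        print("Step %d: lambdas = %s" % (i, " ".join(str(x) for x in lam)))
```

The weights stay normalized after every step because each heldout position contributes exactly one unit of expected count, split across the components.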

Cross entropy for tweaked lambdas:

l(0)       l(1)       l(2)       l(3)         entropy  
0.0988539  0.2647753  0.4982631  0.1381077    7.5591784
0.0889685  0.2382978  0.4484368  0.2242969    7.5728646
0.0790831  0.2118202  0.3986105  0.3104861    7.6171335
0.0691977  0.1853427  0.3487842  0.3966754    7.6877711
0.0593123  0.1588652  0.2989579  0.4828646    7.7858584
0.0494269  0.1323877  0.2491316  0.5690538    7.9165476
0.0395416  0.1059101  0.1993053  0.6552431    8.0905901
0.0296562  0.0794326  0.1494789  0.7414323    8.3296847
0.0197708  0.0529551  0.0996526  0.8276215    8.6838889
0.0098854  0.0264775  0.0498263  0.9138108    9.3144557
0.0049427  0.0132388  0.0249132  0.9569054    9.9604451
0.0009885  0.0026478  0.0049826  0.9913811    11.4816344
0.1004379  0.2690180  0.5062472  0.1242969    7.5606686
0.1020219  0.2732607  0.5142312  0.1104861    7.5634756
0.1036059  0.2775034  0.5222153  0.0966754    7.5677901
0.1051899  0.2817461  0.5301994  0.0828646    7.5738728
0.1067739  0.2859888  0.5381834  0.0690538    7.5820964
0.1083580  0.2902315  0.5461675  0.0552431    7.5930244
0.1099420  0.2944742  0.5541515  0.0414323    7.6075853
0.1115260  0.2987169  0.5621356  0.0276215    7.6275307
0.1131100  0.3029596  0.5701196  0.0138108    7.6571284
0.1146940  0.3072023  0.5781037  0.0000000    7.7241744
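
Note: the rows of the table above follow a regular pattern that can be read off the numbers. In the first group, l(3) is raised by 10%, 20%, ..., 90%, 95% and 99% of the distance between its EM value and 1. In the second group, l(3) is set to 90%, 80%, ..., 0% of its EM value. In both groups the remaining lambdas are rescaled proportionally so that the weights still sum to 1. The sketch below reproduces the lambda columns only (not the entropy column); the function names are hypothetical and this is not the code that produced the log.

```python
def tweak_last(lambdas, new_last):
    """Set the last weight to new_last and rescale the others to sum to 1."""
    old_last = lambdas[-1]
    scale = (1.0 - new_last) / (1.0 - old_last)
    return [l * scale for l in lambdas[:-1]] + [new_last]


def tweaked_sets(lambdas):
    """Return (boosted, discounted) lists of tweaked weight vectors."""
    last = lambdas[-1]
    # Add a fraction of the remaining distance to 1.
    boost_fractions = [k / 10 for k in range(1, 10)] + [0.95, 0.99]
    boosted = [tweak_last(lambdas, last + f * (1.0 - last))
               for f in boost_fractions]
    # Keep a fraction of the original value, down to zero.
    keep_fractions = [k / 10 for k in range(9, -1, -1)]
    discounted = [tweak_last(lambdas, last * f) for f in keep_fractions]
    return boosted, discounted


if __name__ == "__main__":
    # Final EM lambdas reported for data/TEXTEN1.txt.
    em = [0.0988538835424104, 0.264775301355878,
          0.498263135151856, 0.138107679949856]
    boosted, discounted = tweaked_sets(em)
    for row in [em] + boosted + discounted:
        print("  ".join("%.7f" % x for x in row))
```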

File: data/TEXTCZ1.txt

Input size: 222412 (vocabulary has 42826 words)

Test data size: 20000 (vocabulary has 7135 words)
Heldout data size: 40000 (vocabulary has 12862 words)
Training data size: 162412 (vocabulary has 35303 words)
Word coverage (test data / training data): 65.1716888577435%

EM algorithm

Convergence epsilon is 0.01

Step 1: lambdas = 0.25 0.25 0.25 0.25
Step 2: lambdas = 0.282828190990521 0.388350340124106 0.232445039354427 0.0963764295309468
Step 3: lambdas = 0.266068331282906 0.424789175396846 0.235623548055184 0.0735189452650641
Step 4: lambdas = 0.257854257446191 0.436757885785049 0.23903737632022 0.0663504804485399
Final step: lambdas = 0.254646780182521 0.440833209641139 0.240983457162891 0.0635365530134492

Cross entropy for tweaked lambdas:

l(0)       l(1)       l(2)       l(3)         entropy  
0.2546468  0.4408332  0.2409835  0.0635366    10.3940717
0.2291821  0.3967499  0.2168851  0.1571829    10.4072380
0.2037174  0.3526666  0.1927868  0.2508292    10.4758316
0.1782527  0.3085832  0.1686884  0.3444756    10.5809330
0.1527881  0.2644999  0.1445901  0.4381219    10.7209351
0.1273234  0.2204166  0.1204917  0.5317683    10.9013487
0.1018587  0.1763333  0.0963934  0.6254146    11.1352845
0.0763940  0.1322500  0.0722950  0.7190610    11.4497538
0.0509294  0.0881666  0.0481967  0.8127073    11.9072995
0.0254647  0.0440833  0.0240983  0.9063537    12.7094830
0.0127323  0.0220417  0.0120492  0.9531768    13.5236368
0.0025465  0.0044083  0.0024098  0.9906354    15.4302172
0.2563745  0.4438241  0.2426185  0.0571829    10.3968995
0.2581022  0.4468151  0.2442535  0.0508292    10.4005594
0.2598299  0.4498060  0.2458885  0.0444756    10.4052037
0.2615576  0.4527970  0.2475235  0.0381219    10.4110421
0.2632853  0.4557879  0.2491585  0.0317683    10.4183769
0.2650130  0.4587788  0.2507935  0.0254146    10.4276723
0.2667408  0.4617698  0.2524285  0.0190610    10.4397108
0.2684685  0.4647607  0.2540635  0.0127073    10.4560225
0.2701962  0.4677516  0.2556985  0.0063537    10.4805851
0.2719239  0.4707426  0.2573335  0.0000000    10.5519543