AtCoder Heuristic Contest 022に参加し、442位でした。

自己紹介

AtCoderはshogo314でやってます。現在のHeuristicレートは1279です。正直アルゴに比べて苦手意識があります。

やったこと（時系列）

環境整備

多くのテストケースで実行してくれるようにする。

サンプル

i番目の出口セルの位置についてPの値を10∗iに、出口セルが存在しない位置は0に設定する
各ワームホールについて、y=0,x=0で1回計測し、出口セルの中でPの値が計測値と最も近いものを推定結果として回答する

シードが0~99のときの平均
Score = 61006.45
Number of wrong answers = 70.54
Placement cost = 58216412.0
Measurement cost = 80050.0
Measurement count = 80.05

計測回数を増やす

とりあえず100回にする。

Score = 193196.03
Number of wrong answers = 54.96
Placement cost = 58216412.0
Measurement cost = 8005000.0
Measurement count = 8005.0

測定しないセルを均す

出口セルがないところについて始め0にする。
その場所と上下左右の5つの平均に変更する。 2の更新を収束するまで繰り返す。高々500回で収束した。
この方法は結構悩んで思いついた。

Score = 1043108.47
Number of wrong answers = 54.96
Placement cost = 4380140.38
Measurement cost = 8005000.0
Measurement count = 8005.0

シード0

出口セルの温度を並び替え

出口セルははじめ、ソートされているためそこそこ綺麗にならんでいるが、最適ではない。

(0,0)に近いセル程温度が小さくなるようにした。

Score = 1273992.77
Number of wrong answers = 54.87
Placement cost = 2189832.44
Measurement cost = 8005000.0
Measurement count = 8005.0

提出したときの得点が46665734。
https://atcoder.jp/contests/ahc022/submissions/44593554

分散ごとに調整

現在、計測コストの方が配置コストより大きいので、不要な計測を減らす。

S=1のとき

計測回数1回（シード0~99）
Score = 46539724.16
Number of wrong answers = 0.0
Placement cost = 2184762.94
Measurement cost = 80050.0
Measurement count = 80.05

十分余裕がありそうなので出口セルごとの10度の差を狭める

差	4	5	6	7	8	9	10
Score	58501072.9	111642068.92	94756777.73	83280013.78	67687172.44	55751381.55	46539724.16
Number of wrong answers	5.62	0.97	0.52	0.04	0.01	0.0	0.0

5を採用

Score = 111642068.92
Number of wrong answers = 0.97
Placement cost = 606090.1
Measurement cost = 80050.0
Measurement count = 80.05

S=4のとき

計測回数と差で3分探索
計測回数10、差7にする

Score = 46035178.9
Number of wrong answers = 0.69
Placement cost = 1112203.54
Measurement cost = 800500.0
Measurement count = 800.5

S=9のとき

計測回数30、差9にする

Score = 21359258.4
Number of wrong answers = 0.69
Placement cost = 1785155.8
Measurement cost = 2401500.0
Measurement count = 2401.5

S=16のとき

差はなるべく大きくなるよう1000/Nとし、計測回数は差が大きいほど小さくなるよう600/(1000/N)とした

Score = 11425781.84
Number of wrong answers = 1.33
Placement cost = 3018406.44
Measurement cost = 4068960.0
Measurement count = 4068.96

S=25のとき

差はなるべく大きくなるよう1000/Nとし、計測回数はmin(10000 / N, 1280 / (1000 / N))とした。

Score = 5487954.38
Number of wrong answers = 2.92
Placement cost = 3018406.44
Measurement cost = 8099700.0
Measurement count = 8099.7

S>=36のとき

差はなるべく大きくなるよう1000/Nとし、計測回数はなるべく大きくなるよう600/(1000/N)とした。

S = 36
Score = 2139835.28
Number of wrong answers = 9.35
Placement cost = 3018406.44
Measurement cost = 9961740.0
Measurement count = 9961.74

あわせる

シード0~99
Score = 6659829.24
Number of wrong answers = 53.56
Placement cost = 2855445.76
Measurement cost = 8864140.0
Measurement count = 8864.14

シードが大きいときはコストが大きいが正答率が上がり、シードが小さいときは正答率をあまり下げずにコストを大きく減らせた。
提出したときの得点が171125470。ここまで出来たら水perfはありそう。
https://atcoder.jp/contests/ahc022/submissions/44599076

出口セルの右側のセルも計測する

出口セルの部分だけを計測する戦略だとそろそろ限界が近い（SだけでなくLやNで場合わけして調節すれば、おそらくもう少しだけ上げられる）。

S=100にして調べる。
Score = 5161.73
Number of wrong answers = 45.89
Placement cost = 3018406.44
Measurement cost = 9961740.0
Measurement count = 9961.74

これの改善を目標にする。
最大100段階にわけるため、差が10しかなく、標準偏差が100もあると100回計測しても区別できなくなってしまうのが問題である。

出口セルの右側のセルにも意味のある数字を設定することにする。
温度を100段階にわけていたのを右側も含めて10×10段階にする。
右側に出口セルが既にあり設定出来ない場合は無視する。

S=100
シード0
以前

以後
Score = 871116.64
Number of wrong answers = 8.1
Placement cost = 29694576.56
Measurement cost = 8405250.0
Measurement count = 8005.0

よさげ
S>=49のときはこちらを採用することにする。

Score = 6862727.85
Number of wrong answers = 41.5
Placement cost = 23144140.46
Measurement cost = 7641190.0
Measurement count = 7338.94

提出したときの得点が184347800。
https://atcoder.jp/contests/ahc022/submissions/44603614
おそらくSが大きいときは不十分。

スコアの期待値を計算したい

正規分布のあたりの知識が欠落しているので実際に確かめながらやってみる。
正規分布から値をサンプリングする関数はモダンな言語なら大抵用意されている。

正規分布から値をサンプリング

コード

import numpy as np
import matplotlib.pyplot as plt
dataSize = np.int64(1e5)
sigma = np.float64(1)
noise = np.random.normal(0.0, sigma, dataSize)
plt.figure()
plt.hist(noise, bins=100, density=True, color="b")
x = np.arange(-5, 5, 0.01)
gauss = 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x*x/(2*sigma**2))
plt.plot(x, gauss, "r--")
plt.xlabel('Deviation')
plt.ylabel('Density')
plt.show()

確率密度関数が

であることが確認できる。

2個サンプリングして平均をとる

コード

import numpy as np
import matplotlib.pyplot as plt
dataSize = np.int64(1e5)
sigma = np.float64(1)
noise = np.random.normal(0.0, sigma, dataSize)
noise += np.random.normal(0.0, sigma, dataSize)
noise /= 2
plt.figure()
plt.hist(noise, bins=100, density=True, color="b")
x = np.arange(-5, 5, 0.01)
sigma /= np.sqrt(2)
gauss = 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x*x/(2*sigma**2))
plt.plot(x, gauss, "r--")
plt.xlabel('Deviation')
plt.ylabel('Density')
plt.show()

実は平均をとっても正規分布になる。標準偏差が1/sqrt(2)倍、つまり分散が1/2になっている。
一般化すると、n個の平均をとると、標準偏差は1/sqrt(n)になる。
100回計測して平均をとったとき、精度が上がったのはこれで理解できる。

正規分布の特定の範囲に入る確率

誤差がx以下になる確率を考える。実験してみる。

コード

import numpy as np
import matplotlib.pyplot as plt
import scipy
dataSize = np.int64(1e5)
sigma = np.float64(1)
noise = np.random.normal(0.0, sigma, dataSize)
noise.sort()
y = np.arange(dataSize)/dataSize
plt.figure()
plt.plot(noise, y, "b")
x = np.arange(-5, 5, 0.01)
erf = (1+scipy.special.erf(x/np.sqrt(2*sigma**2)))/2
plt.plot(x, erf, "r--")
plt.xlabel('Deviation')
plt.ylabel('ratio')
plt.show()

誤差がx以下になる確率、累積分布関数は

となる。erfは関数が用意されているので、詳しい説明はしない。

特定の確率でおこる事象がk回起こる確率

1%の確率で起こる事象が100回中k回起こる確率密度関数

コード

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
dataSize = np.int64(1e5)
sampling = [(np.random.rand(100) < 0.01).sum() for i in range(dataSize)]
x, y = zip(*sorted(Counter(sampling).items()))
y = np.array(y)/dataSize
plt.figure()
plt.plot(x, y, 'o-')
plt.xlabel('times')
plt.ylabel('Density')
plt.show()

これはポアソン分布という。

確率pで起こる事象がn回中ちょうどk回起こる確率は、

である。ここでp=λ/nとすると下のようになり、

n→∞とすると

となるのだが、これは使わない。

実際n=100、λ=1を入れてみると、

k=0のとき0.3660
k=1のとき0.3697
k=2のとき0.1849
k=3のとき0.0610
概ね正しそうであることが確認できる。

予測スコアを計算して改善

予測スコアを求めてみる

武器は揃った。 0~1000の範囲に入らない場合、無理矢理この範囲に収めるため正確ではないのだが影響は十分小さいと考え無視する。

出口セルを100回計測するコードでスコアを求めてみる。
使うのは分散で場合わけした直前のコードを少し改善したもの。具体的には温度をi∗10からi∗10+5に変更した。

各シード値でのスコア

seed	Score	Number of wrong answers	Placement cost	Measurement cost	Measurement count
0	90791	20	3098642	9500000	9500
1	29	58	1593386	6600000	6600
2	1	82	2636568	8900000	8900
3	13740360	0	1077830	6100000	6100
4	1	80	2219656	8100000	8100
5	1	89	3172916	9300000	9300
6	1	88	2420828	8900000	8900
7	306617	16	1980038	7100000	7100
8	1	86	2488344	8700000	8700
9	298	48	1283724	6100000	6100
10	8050058	0	2822272	9500000	9500
11	158	50	1664606	7300000	7300
12	2246726	7	1634262	7600000	7600
13	7878710	0	2992434	9600000	9600
14	281	48	1351002	6500000	6500
15	1	88	2745460	9300000	9300
16	1095	41	1813038	7800000	7800
17	29	56	3217714	9800000	9800
18	2917552	6	1585070	7300000	7300
19	8	64	1623344	6500000	6500
20	75	52	2718534	9400000	9400
21	40	57	1226772	6200000	6200
22	41	57	1171738	6200000	6200
23	6742754	1	2764590	9000000	9000
24	14	62	1048700	6200000	6200
25	8	64	1632254	6500000	6500
26	1	74	1989926	7400000	7400
27	1	86	3446052	9500000	9500
28	1	74	1775136	7600000	7600
29	3339282	5	1812890	7900000	7900
30	8126288	0	2605742	9600000	9600
31	6	65	1536048	7200000	7200
32	9	63	1913126	7000000	7000
33	22	60	1163680	6000000	6000
34	12540522	0	1374150	6500000	6500
35	42	57	1102706	6000000	6000
36	8251728	0	2918674	9100000	9100
37	1	85	3426578	8500000	8500
38	3	68	1874656	7100000	7100
39	1	73	1977426	7900000	7900
40	1	95	3440096	9800000	9800
41	10349996	0	2061840	7500000	7500
42	575523	14	1141832	6400000	6400
43	115872	19	2837540	9500000	9500
44	1	78	2108154	7900000	7900
45	1	86	2969928	8700000	8700
46	1	89	3435172	9200000	9200
47	1	76	2904854	8900000	8900
48	2	68	3488868	9900000	9900
49	1041387	11	1348552	6800000	6800
50	1	77	2191778	8500000	8500
51	27	58	1593368	7400000	7400
52	1	78	2022976	8000000	8000
53	8	64	1462500	7000000	7000
54	22	59	1694966	7100000	7100
55	8946113	0	2678040	8400000	8400
56	1949	39	1628258	6800000	6800
57	1	76	1944562	8200000	8200
58	1	73	1870680	7600000	7600
59	6	65	1481398	6800000	6800
60	1	94	3113594	9600000	9600
61	44507	23	3563216	9600000	9600
62	3	69	1498994	7100000	7100
63	3	67	2926834	9400000	9400
64	1	74	1755430	7500000	7500
65	1	86	2746564	9300000	9300
66	1	79	1990120	7900000	7900
67	2	70	1462828	7300000	7300
68	1	76	2547210	8300000	8300
69	1	80	2101772	8200000	8200
70	2	72	1833956	7600000	7600
71	11	61	2851206	9100000	9100
72	726	43	1779660	7500000	7500
73	1	97	3561452	9800000	9800
74	56591	24	1444866	6800000	6800
75	1	81	2070848	8100000	8100
76	8838478	0	2614166	8600000	8600
77	706	42	3063870	8900000	8900
78	1	80	2284820	8600000	8600
79	9994775	0	2005228	7900000	7900
80	1	76	2241496	8200000	8200
81	3	67	2788358	9500000	9500
82	9729016	0	2078532	8100000	8100
83	11	62	1971160	6900000	6900
84	13	62	1319316	6700000	6700
85	2	69	4028562	9500000	9500
86	2	72	1830988	7400000	7400
87	7	65	1516618	6700000	6700
88	8032289	0	2649752	9700000	9700
89	528062	13	2110828	8200000	8200
90	11	63	1160222	6300000	6300
91	1	74	1975500	7900000	7900
92	1	91	3127390	9500000	9500
93	1	98	2914300	9900000	9900
94	2	70	1673378	7000000	7000
95	1	80	3109064	9400000	9400
96	2	70	1585056	7200000	7200
97	1	83	2347484	8700000	8700
98	1	82	2552252	8400000	8400
99	344038	16	1581520	6500000	6500

Score = 1328337.49
Number of wrong answers = 54.86
Placement cost = 2189842.84
Measurement cost = 8005000.0
Measurement count = 8005.0

予測スコアを計算して出力する

各シード値での期待されるスコアと間違う数

seed	Score	Number of wrong answers	Expected Score	Expected Number of wrong answers
0	90791	20	277072	15
1	29	58	46	56
2	1	82	1	78
3	13740360	0	13740360	0
4	1	80	1	74
5	1	89	1	85
6	1	88	1	84
7	306617	16	935719	11
8	1	86	1	82
9	298	48	298	48
10	8050058	0	8050058	0
11	158	50	1833	39
12	2246726	7	5485169	3
13	7878710	0	7878710	0
14	281	48	351	47
15	1	88	1	82
16	1095	41	1095	41
17	29	56	70	52
18	2917552	6	5698342	3
19	8	64	19	60
20	75	52	117	50
21	40	57	50	56
22	41	57	98	53
23	6742754	1	8428442	0
24	14	62	27	59
25	8	64	15	61
26	1	74	2	70
27	1	86	1	78
28	1	74	3	69
29	3339282	5	5217628	3
30	8126288	0	8126288	0
31	6	65	22	59
32	9	63	22	59
33	22	60	42	57
34	12540522	0	12540522	0
35	42	57	65	55
36	8251728	0	8251728	0
37	1	85	1	79
38	3	68	6	65
39	1	73	3	68
40	1	95	1	93
41	10349996	0	10349996	0
42	575523	14	1405085	10
43	115872	19	282889	15
44	1	78	1	75
45	1	86	1	82
46	1	89	1	86
47	1	76	2	71
48	2	68	10	61
49	1041387	11	1041387	11
50	1	77	2	71
51	27	58	42	56
52	1	78	1	75
53	8	64	18	60
54	22	59	18	60
55	8946113	0	8946113	0
56	1949	39	3806	36
57	1	76	2	71
58	1	73	2	72
59	6	65	8	64
60	1	94	1	87
61	44507	23	265278	15
62	3	69	4	67
63	3	67	7	63
64	1	74	3	69
65	1	86	1	81
66	1	79	2	72
67	2	70	3	69
68	1	76	1	75
69	1	80	1	78
70	2	72	2	72
71	11	61	11	61
72	726	43	1418	40
73	1	97	1	92
74	56591	24	138160	20
75	1	81	1	73
76	8838478	0	8838478	0
77	706	42	232	47
78	1	80	1	77
79	9994775	0	9994775	0
80	1	76	4	67
81	3	67	20	58
82	9729016	0	9729016	0
83	11	62	102	52
84	13	62	30	58
85	2	69	2	69
86	2	72	2	70
87	7	65	10	63
88	8032289	0	8032289	0
89	528062	13	528062	13
90	11	63	40	57
91	1	74	2	70
92	1	91	1	87
93	1	98	1	88
94	2	70	5	66
95	1	80	1	77
96	2	70	3	68
97	1	83	1	80
98	1	82	1	79
99	344038	16	1312399	10

Expected Score = 1455039.86
Expected Number of wrong answers = 51.6
Expected Placement cost = 2189842.84
Expected Measurement cost = 8005000.0
Score = 1328337.49
Number of wrong answers = 54.86
Placement cost = 2189842.84
Measurement cost = 8005000.0
Measurement count = 8005.0

配置コストと計測コストは正確に計算できるが、間違う数はランダムである。今回は間違う数の平均値をもとにスコアを計算したため、正確には期待値ではない。

全体的に間違う数が少なく出てしまっている。そのわりにはスコアの平均値は1割程しか変わらない。これはスコアへの影響が大きい間違う数が0のときを正確に計算出来ているためだと考えられる。

予測スコアから計測回数などを調整する

計測回数と出口セルごとの温度の差の2つを変え、15パターンで予想スコアを計算し、最もよかったものを採用する。

Score = 6828002.87
Number of wrong answers = 53.9
Placement cost = 1888787.5
Measurement cost = 5612150.0
Measurement count = 5612.15

場合分けを使わずに、分散ごとに場合わけした場合と同じくらいのスコアを得られた。
提出したときの得点が217889721。
https://atcoder.jp/contests/ahc022/submissions/44638443
場合分けを詰め切れていなかっただけだろうが、出口セルを見るだけで、右側も見るようにしたものを超えることが出来た。

分散ごとに場合わけ

分散ごとに、計測回数と差の候補を変える。

Score = 7080872.04
Number of wrong answers = 53.5
Placement cost = 2844973.2
Measurement cost = 8782590.0
Measurement count = 8782.59

提出したときの得点が196923321。
https://atcoder.jp/contests/ahc022/submissions/44646344 むしろ下がってしまった。スコアの予測の精度が低いためだと考えられる。もう少し精度を上げる余地があるが、後回し。

抜本的な解決をはかる。

出口セルの右側のセルも計測するパート2

以前は右側がすでに決まっていた場合、右側の数字をそのままにしていた。しかし、これでは重なる場合があるので、2つのペアが重複なく選ぶようにする。

Score = 7187204.81
Number of wrong answers = 39.81
Placement cost = 15115268.0
Measurement cost = 5302670.0
Measurement count = 5107.82

提出したときの得点が246053584。
https://atcoder.jp/contests/ahc022/submissions/44681182
いい感じ

右側以外のセルも計測する

出口セルだけをみるケースのパターン数を減らし、右も見る、右と左も見る、右と左と下も見る、右と左と下と上も見るケースを追加した。

Score = 7450807.57
Number of wrong answers = 28.47
Placement cost = 26950566.86
Measurement cost = 7085074.0
Measurement count = 6699.85

提出したときの得点が230821137。
https://atcoder.jp/contests/ahc022/submissions/44683982
得点は下がってしまったが、相対スコアは上がった。

相対スコアを考えるために、テストケースごとに実際のスコアの最大値を求める

S=1、シード値0~299で計測回数と差を変えて最もスコアが高くなる物をしらべる LとNによって最適な計測回数と差はある程度決まると思っていたがそうでもない。

予想スコアがうまく働いていないと見て、とりあえず、S=1のとき(計測回数,差)=(2,5)に固定した方がいいか？

上下左右以外も見る

合計11カ所見る
Score = 7405218.39
Number of wrong answers = 24.16
Placement cost = 35002507.7
Measurement cost = 8271817.0
Measurement count = 7705.87
提出したときの得点が249240205。
https://atcoder.jp/contests/ahc022/submissions/44784661
Sが大きいときのスコアは大幅に改善したはずだが相対スコアはあまり上がらず。たぶん0.00001が0.02になった、みたいな差。この感じだとこれ以上頑張っても難しそう。時間もないのでここで断念。

感想

結構頑張ったけど、水perfが限界。やれることをやりきったわけではないが、やりきったとしても青perfは出せそうにないなあという気持ち。
ある程度機能する予想スコア計算が書けたのは嬉しい。

長期コンは疲れる。期間中あんまり精進が出来なかった。夏休みはまだまだあるので、精進するつもり。時間を取られすぎたから、AHC023は出ないかな。

shogo314のブログ

AtCoder Heuristic Contest 022 参加記