Rで「ガチャとは心の所作」 #TokyoSciPy #rstatsj

Tokyo.SciPy#3で id:AntiBayesian さんがLTされていた「ガチャとは心の所作」が非常に興味深かったのでRでもやってみました。
（もっとも、ustがなかったので私は発表資料を読んだだけですが…）

「ガチャとは心の所作」 d:id:AntiBayesian:20120318

Rで書いたらこんな感じになりました。

ガチャのシミュレーション関数はこんな感じ。

# probが各アイテムの出現確率
# nopがガチャを回す人数（既定値は10000人）
# iter.maxが一人が回す回数の上限（既定値は10000回）
gacha.sim <- function(prob, nop=10000, iter.max=10000){
  nop.result <- numeric(nop)
  items <- length(prob)
  result <- matrix(sample(1:items, nop*iter.max, replace=TRUE, prob=prob),
                   iter.max)
  for (i in 1:nop) {
    for (j in 1:iter.max) {
      if (length(unique(result[1:j, i]))==items){
        nop.result[i] <- j
        break
        }
      }
    }
  list("result" = nop.result, "prob"=prob)
  }

二重ループで効率悪いのですが、きっと id:a_bicky さんや裏RjpWikiの中の人がもっと良い方法を示唆してくれるものと思います。

シミュレーションの実行はこんな感じで。

# ここでは４つの出現確率パターンで試してみる。
result1 <- gacha.sim(c(  1,  1,  1,  1, 1, 1))
result2 <- gacha.sim(c(100, 50, 10, 10, 3, 1))
result3 <- gacha.sim(c(  5,  5,  3,  3, 2, 1))
result4 <- gacha.sim(c( 10,  3,  2,  2, 1, 1))

シミュレーションの結果をヒストグラムにしてみましょう。

library(MASS)
windows(12, 9)
par(mfrow=c(2, 2))

x <- result1
truehist(x$result, xlim=c(min(x$result), max(x$result)),
         xlab="コンプまでの回数", ylab="人数", prob=FALSE)
legend(30,600,
       legend=c("出現確率 [1, 1, 1, 1, 1, 1]",
                paste("平均", round(mean(x$result), 1)),
                paste("標準偏差", round(sd(x$result), 1)),
                paste("中央値", median(x$result)),
                paste("最大値", max(x$result))))
par(new=TRUE)
plot(density(x$result), xlim=c(min(x$result), max(x$result)), col="red",
     lwd=2, xlab="", ylab="", axes=FALSE, main="")

x <- result2
truehist(x$result, xlim=c(min(x$result), max(x$result)),
         xlab="コンプまでの回数", ylab="人数", prob=FALSE)
legend(500,800,
       legend=c("出現確率 [100, 50, 10, 10, 3, 1]",
                paste("平均", round(mean(x$result), 1)),
                paste("標準偏差", round(sd(x$result), 1)),
                paste("中央値", median(x$result)),
                paste("最大値", max(x$result))))
par(new=TRUE)
plot(density(x$result), xlim=c(min(x$result), max(x$result)), col="red",
     lwd=2, xlab="", ylab="", axes=FALSE, main="")

x <- result3
truehist(x$result, xlim=c(min(x$result), max(x$result)),
         xlab="コンプまでの回数", ylab="人数", prob=FALSE)
legend(75,600,
       legend=c("出現確率 [5, 5, 3, 3, 2, 1]",
                paste("平均", round(mean(x$result), 1)),
                paste("標準偏差", round(sd(x$result), 1)),
                paste("中央値", median(x$result)),
                paste("最大値", max(x$result))))
par(new=TRUE)
plot(density(x$result), xlim=c(min(x$result), max(x$result)), col="red",
     lwd=2, xlab="", ylab="", axes=FALSE, main="")

x <- result4
truehist(x$result, xlim=c(min(x$result), max(x$result)),
         xlab="コンプまでの回数", ylab="人数", prob=FALSE)
legend(75,1000,
       legend=c("出現確率 [10, 3, 2, 2, 1, 1]",
                paste("平均", round(mean(x$result), 1)),
                paste("標準偏差", round(sd(x$result), 1)),
                paste("中央値", median(x$result)),
                paste("最大値", max(x$result))))
par(new=TRUE)
plot(density(x$result), xlim=c(min(x$result), max(x$result)), col="red",
     lwd=2, xlab="", ylab="", axes=FALSE, main="")

描画ももっとスマートな方法がありそう。
軸の目盛もちゃんと合わせた方がいいよなぁ。

ちなみに実行速度は

system.time(result1 <- gacha.sim(c(  1,  1,  1,  1, 1, 1)))
# ユーザ   システム       経過  
#     4.94       0.33       5.27 

system.time(result2 <- gacha.sim(c(100, 50, 10, 10, 3, 1)))
# ユーザ   システム       経過  
#    31.11       0.20      31.47 

system.time(result3 <- gacha.sim(c(  5,  5,  3,  3, 2, 1)))
# ユーザ   システム       経過  
#    5.99       0.20       6.20 

system.time(result4 <- gacha.sim(c( 10,  3,  2,  2, 1, 1)))
# ユーザ   システム       経過  
#    6.85       0.22       7.07

ちょっと思ったこととして、我々がガキの頃１回20円で回してたガチャガチャは非復元抽出だったわけだけど、今のソーシャルゲームのガチャは復元抽出なんだよね。
そのあたりが感覚的にわかりにくくて文句言う人が出てきてるのかも。

なお、coupon collector's problemについてはこちらのびいだまブログさんの記事も参考になります。

【追記】
期待通りにビッキーさんがやってくれました！
d:id:a_bicky:20120318:1332083264
こちらに載っているgacha.sim2関数は爆速です！
一応、私の環境で試してみると、

library(rbenchmark)
p <- rep(1, 6); benchmark(gacha.sim(p), gacha.sim2(p), replications = 1)
#            test replications elapsed relative user.self sys.self user.child sys.child
# 1  gacha.sim(p)            1    6.27 2.125424      5.95     0.29         NA        NA
# 2 gacha.sim2(p)            1    2.95 1.000000      2.95     0.00         NA        NA

p <- c(100, 50, 10, 10, 3, 1); benchmark(gacha.sim(p), gacha.sim2(p), replications = 1)
#            test replications elapsed relative user.self sys.self user.child sys.child
# 1  gacha.sim(p)            1   44.72  17.0038     44.35     0.33         NA        NA
# 2 gacha.sim2(p)            1    2.63   1.0000      2.62     0.00         NA        NA

すげー。
３倍から22倍も早くなってる。