Expected number I will be on after drawing cards until I get an ace, 2, 3, and so forth

12

I am having trouble solving the following.

You draw cards from a standard 52-card deck without replacement until you get an ace. You draw from what is left until you get a 2. You continue with 3. What is the expected number you will be on after the whole deck runs out?

It was natural to let

$T_i = \text{first position of card whose value is }i$
$U_i = \text{last position of card whose value is }i$

So the problem basically amounts to figuring out the probability that you will be on $k$ when the deck runs out, namely:

$Pr(T_1<\cdots<T_k \cap U_{k+1} < T_k)$

I can see that

$Pr(T_1<\cdots<T_k) = 1/k! \\ \text{and} \\ Pr(U_{k+1} < T_k) = 1/70$

but could not get any further ...

— bill

1

What happens if you have already drawn all the $2$s by the time you draw your first ace?

— gung - Reinstate Monica

Does the "expected" number really mean the "most likely" number?

— whuber

This is an interesting problem, but I am not sure about the math you write after "the problem basically amounts to". In the first statement, did you mean to write $\cap$ instead of $\cup$? Even so, I am not sure the statement is correct. Consider a sequence beginning 2AAA2. We have $T_1=2, T_2=1$ and therefore $T_1 > T_2$, but if I understand your text description correctly, we can still pick the ace at the second position and then the 2 at the fifth position? And therefore $T_1 < T_2$ is not a necessary condition?

— TooTone

@TooTone Oh, I meant $\cap$ as you said, and you are right; $T_1 < T_2$ is not a necessary condition ...

— bill

@gung In that case, your deck will run out and you will still be on 2.

— bill

0

Following @gung's idea, I believe the expected value would be 5.84? And from my interpretation of the comments, I am assuming that "A" is a nearly impossible value (unless the last four cards in the deck are all aces). Here are the results of a 100,000-iteration Monte Carlo simulation

results
    2     3     4     5     6     7     8     9     J     K     Q     T 
 1406  7740 16309 21241 19998 15127  9393  4906   976   190   380  2334

and here is the R code, in case you would like to play with it..

# monte carlo card-drawing functions from here
# http://streaming.stat.iastate.edu/workshops/r-intro/lectures/5-Rprogramming.pdf

# create a straightforward deck of cards
create_deck <-
    function( ){
        suit <- c( "H" , "C" , "D" , "S" )
        rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )
        deck <- NULL
        for ( r in rank ) deck <- c( deck , paste( r , suit ) )
        deck
    }

# construct a function to shuffle everything
shuffle <- function( deck ){ sample( deck , length( deck ) ) }

# draw one card at a time
draw_cards <-
    function( deck , start , n = 1 ){
        cards <- NULL

        for ( i in start:( start + n - 1 ) ){
            if ( i <= length( deck ) ){
                cards <- c( cards , deck[ i ] )
            }
        }

        return( cards )
    }

# create an empty vector for your results
results <- NULL

# run your simulation this many times..
for ( i in seq( 100000 ) ){
    # create a new deck
    sdeck <- shuffle( create_deck() )

    d <- sdeck[ grep('A|2' , sdeck ) ]
    e <- identical( grep( "2" , d ) , 1:4 )

    # loop through ranks in this order
    rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )

    # start at this position
    card.position <- 0

    # start with a blank current.draw
    current.draw <- ""

    # start with a blank current rank
    this.rank <- NULL

    # start with the first rank
    rank.position <- 1

    # keep drawing until you find the rank you wanted
    while( card.position < 52 ){

        # increase the position by one every time
        card.position <- card.position + 1

        # store the current draw for testing next time
        current.draw <- draw_cards( sdeck , card.position )

        # if you draw the current rank, move to the next.
        if ( grepl( rank[ rank.position ] , current.draw ) ) rank.position <- rank.position + 1

        # if you have gone through every rank and are still not out of cards,
        # should it still be a king?  this assumes yes.
        if ( rank.position == length( rank ) ) break        

    }

    # store the rank for this iteration.
    this.rank <- rank[ rank.position ]

    # at the end of the iteration, store the result
    results <- c( results , this.rank )

}

# print the final results
table( results )

# make A, T, J, Q, K numerics
results[ results == 'A' ] <- 1
results[ results == 'T' ] <- 10
results[ results == 'J' ] <- 11
results[ results == 'Q' ] <- 12
results[ results == 'K' ] <- 13
results <- as.numeric( results )

# and here's your expected value after 100,000 simulations.
mean( results )

— Anthony Damico

Why is A impossible? Consider the sequence of 48 cards followed by AAAA, for example.

— TooTone

you are right .. it is one out of 270,725 - or with R code 1/prod( 48:1 / 52:5 )

— Anthony Damico

1

This answer is incorrect. Consider the count for "2": because this can result only when all the 2s are encountered before any of the 1s, its probability is one in $\binom{8}{4}=70$ and therefore its expectation in your simulation is $10^5/\binom{8}{4}\approx 1428.6$ with a standard error of $37.5$. Your output of $1660$ is over six standard errors too high, which is almost certainly erroneous. An accurate value for the mean (based on a different simulation with $10^6$ iterations) is $5.833\pm 0.004$.
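The arithmetic in this comment can be reproduced in a few lines of R. This is only a sketch: the variable names are illustrative, and it assumes the count under "2" is binomial with $10^5$ trials and success probability $1/\binom{8}{4}$.

```r
n_iter <- 1e5                      # iterations in the simulation being checked
p_two  <- 1 / choose(8, 4)         # chance that all four 2s precede all four aces

expected_count <- n_iter * p_two                     # expected tally under "2"
se_count <- sqrt(n_iter * p_two * (1 - p_two))       # binomial standard error
z <- (1660 - expected_count) / se_count              # reported count, in SE units

round(c(expected = expected_count, se = se_count, z = z), 2)
```

A value of `z` above six corresponds to the "over six standard errors" statement.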

— whuber

1

Unfortunately, your heavily documented code is several times longer and slower than it needs to be. I have demonstrated that its output is incorrect; although I wish I had the time to debug your code, I do not, and it is not my task to do so. My argument is this: you will still be working on "2" at the end if and only if all the "2"s precede all the "A"s. Among the $\binom{4+4}{4}=70$ equally likely ways of arranging the four "2"s and four "A"s, exactly one of them satisfies this criterion. Therefore your value in `results` under the heading "2" should be close to $10^5/70=1429$, but it is not.

— whuber

1

Even moderators cannot remove other people's votes :-). A chi-squared test now suggests your results agree with mine, but it would be good to know how you tested your simulation, because that would improve confidence in your answer. In fact, according to an edit you made to the first paragraph of your answer, both our results are now wrong: as I interpreted your question, it is never possible still to be working on an ace when all the cards are exhausted.

— whuber

7

For a simulation it is crucial to be correct as well as fast. Both these objectives suggest writing code that targets core capabilities of the programming environment and that is as short and simple as possible, because simplicity lends clarity and clarity promotes correctness. Here is my attempt to achieve both in R:

#
# Simulate one play with a deck of `n` distinct cards in `k` suits.
#
sim <- function(n=13, k=4) {
  deck <- sample(rep(1:n, k)) # Shuffle the deck
  deck <- c(deck, 1:n)        # Add sentinels to terminate the loop
  k <- 0                      # Count the cards searched for
  for (j in 1:n) {
    k <- k+1                          # Count this card
    deck <- deck[-(1:match(j, deck))] # Deal cards until `j` is found
    if (length(deck) < n) break       # Stop when sentinels are reached
  }
  return(k)                   # Return the number of cards searched
}

Applying this in a reproducible way can be done with the `replicate` function after setting the random number seed, as in

> set.seed(17);  system.time(d <- replicate(10^5, sim(13, 4)))
   user  system elapsed 
   5.46    0.00    5.46

That is slow, but fast enough to conduct fairly lengthy (and therefore precise) simulations repeatedly without waiting. There are several ways to exhibit the result. Let's begin with its mean:

> n <- length(d)
> mean(d)
[1] 5.83488

> sd(d) / sqrt(n)
[1] 0.005978956

The latter is the standard error: we expect the simulated mean to be within two or three SEs of the true value. That places the true expectation somewhere between $5.817$ and $5.853$.
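As a quick check of that interval, here is a small sketch that takes the mean and standard error printed above and forms the three-SE band (the variable names are illustrative):

```r
sim_mean <- 5.83488        # mean(d) from the run above
sim_se   <- 0.005978956    # sd(d) / sqrt(n) from the run above

# Three standard errors on either side of the simulated mean
interval <- sim_mean + c(-3, 3) * sim_se
round(interval, 3)
```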

We might also wish to see a tabulation of the frequencies (and their standard errors). The following code prettifies the tabulation a little:

u <- table(d)
u.se <- sqrt(u/n * (1-u/n)) / sqrt(n)
cards <- c("A", "2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K")
dimnames(u) <- list(sapply(dimnames(u), function(x) cards[as.integer(x)]))
print(rbind(frequency=u/n, SE=u.se), digits=2)

Here is the output:

                2       3      4      5      6      7       8       9       T       J       Q       K
frequency 0.01453 0.07795 0.1637 0.2104 0.1995 0.1509 0.09534 0.04995 0.02249 0.01009 0.00345 0.00173
SE        0.00038 0.00085 0.0012 0.0013 0.0013 0.0011 0.00093 0.00069 0.00047 0.00032 0.00019 0.00013

How can we know the simulation is correct? One way is to test it exhaustively for smaller problems. For that reason this code was written to attack a small generalization of the problem, replacing $13$ distinct cards with `n` and $4$ suits with `k`. However, for the testing it is important to be able to feed the code a deck in a predetermined order. Let's write a slightly different interface to the same algorithm:

draw <- function(deck) {
  n <- length(sentinels <- sort(unique(deck)))
  deck <- c(deck, sentinels)
  k <- 0
  for (j in sentinels) {
    k <- k+1
    deck <- deck[-(1:match(j, deck))]
    if (length(deck) < n) break
  }
  return(k)
}

(It is possible to use `draw` in place of `sim` everywhere, but the extra work done at the beginning of `draw` makes it twice as slow as `sim`.)

We can use this by applying it to every distinct shuffle of a given deck. Since the purpose here is just a few one-off tests, efficiency in generating those shuffles is unimportant. Here is a quick brute-force way:

n <- 4 # Distinct cards
k <- 2 # Number of suits
d <- expand.grid(lapply(1:(n*k), function(i) 1:n))
e <- apply(d, 1, function(x) var(tabulate(x))==0)
g <- apply(d, 1, function(x) length(unique(x))==n)
d <- d[e & g,]

Now `d` is a data frame whose rows contain all the shuffles. Apply `draw` to each row and count the results:

d$result <- apply(as.matrix(d), 1, draw)
(counts <- table(d$result))

The output (which we will use in a formal test momentarily) is

   2    3    4 
 420  784 1316

(The value of $420$ is easy to understand, by the way: we would still be working on card $2$ if and only if all the twos preceded all the aces. The chance of this happening (with two suits) is $1/\binom{2+2}{2} = 1/6$. Of the $2520$ distinct shuffles, $2520/6 = 420$ have this property.)
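The counts in this parenthetical can be checked directly. The sketch below assumes the same small deck ($4$ distinct values in $2$ suits); the variable names are illustrative.

```r
n_values <- 4   # distinct card values
n_suits  <- 2   # copies of each value

# Distinct shuffles when suits are ignored: 8! / (2!)^4
n_shuffles <- factorial(n_values * n_suits) / factorial(n_suits)^n_values

# Shuffles in which both twos precede both aces: one in choose(4, 2)
n_stuck_on_two <- n_shuffles / choose(2 * n_suits, n_suits)

c(shuffles = n_shuffles, stuck_on_two = n_stuck_on_two)
```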

We can test the output with a chi-squared test. To this end I apply `sim` $10,000$ times to this case of $n=4$ distinct cards in $k=2$ suits:

> set.seed(17)
> d.sim <- replicate(10^4, sim(n, k))
> print((rbind(table(d.sim) / length(d.sim), counts / dim(d)[1])), digits=3)

         2     3     4
[1,] 0.168 0.312 0.520
[2,] 0.167 0.311 0.522

> chisq.test(table(d.sim), p=counts / dim(d)[1])

    Chi-squared test for given probabilities

data:  table(d.sim) 
X-squared = 0.2129, df = 2, p-value = 0.899

Because $p$ is so high, we find no significant difference between what `sim` says and the values computed by exhaustive enumeration. Repeating this exercise for other (small) values of $n$ and $k$ produces comparable results, giving us ample reason to trust `sim` when applied to $n=13$ and $k=4$.

Finally, a two-sample chi-squared test will compare the output of `sim` to the output reported in another answer:

> y <- c(1660,8414,16973,21495,20021,14549,8957,4546,2087,828,313,109)
> chisq.test(cbind(u, y))

data:  cbind(u, y) 
X-squared = 142.2489, df = 11, p-value < 2.2e-16

The enormous chi-squared statistic produces a p-value that is essentially zero: without a doubt, `sim` disagrees with the other answer. There are two possible resolutions of the disagreement: one (or both!) of these answers is incorrect, or they implement different interpretations of the question. For instance, I have interpreted "after the deck runs out" to mean after observing the last card and, if allowable, updating the "number you will be on" before terminating the procedure. Conceivably that last step was not meant to be taken. Perhaps some such subtle difference of interpretation will explain the disagreement, at which point we can modify the question to make it clearer what is being asked.

— whuber

4

There is an exact answer (in the form of a matrix product, presented in point 4 below). A reasonably efficient algorithm to compute it exists, deriving from these observations:

1. A random shuffle of $N+k$ cards can be generated by randomly shuffling $N$ cards and then randomly interspersing the remaining $k$ cards within them.

2. By shuffling only the aces, and then (applying the first observation) interspersing the twos, then the threes, and so on, this problem can be viewed as a chain of thirteen steps.

3. We need to keep track of more than the value of the card we are seeking. When doing this, however, we do not need to account for the position of the mark relative to all the cards, but only its position relative to cards of equal or smaller value. Imagine placing a mark on the first ace, and then marking the first two found after it, and so on. (If at any stage the deck runs out without displaying the card we are currently seeking, we will leave all cards unmarked.) Let the "place" of each mark (when it exists) be the number of cards of equal or lower value that had been dealt when the mark was made (including the marked card itself). The places contain all the essential information.

4. The place after the $i^\text{th}$ mark is made is a random number. For a given deck, the sequence of these places forms a stochastic process. It is in fact a Markov process (with variable transition matrix). An exact answer can therefore be calculated from twelve matrix multiplications.
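The first two observations above can be illustrated with a short sketch. It is not part of the algorithm in this answer; `intersperse` and `shuffle_by_stages` are hypothetical helper names, and the sketch only shows how a deck can be built one denomination at a time by choosing the new cards' positions uniformly at random.

```r
# Insert `k` new cards of a given `value` among the cards of `old`,
# choosing their positions in the merged deck uniformly at random.
intersperse <- function(old, k, value) {
  total <- length(old) + k
  pos <- sort(sample(total, k))  # slots taken by the new cards
  merged <- integer(total)
  merged[pos] <- value
  merged[-pos] <- old            # old cards keep their relative order
  merged
}

# Build a shuffled deck one denomination at a time (hypothetical helper).
shuffle_by_stages <- function(counts) {
  deck <- integer(0)
  for (v in seq_along(counts)) deck <- intersperse(deck, counts[v], v)
  deck
}

set.seed(1)
deck <- shuffle_by_stages(rep(4, 13))
table(deck)  # four cards of each of the 13 values
```

Because each stage only decides where the new cards fall among the old ones, the relative order of the lower-valued cards never changes once it is set, which is what makes the stage-by-stage (Markov) description possible.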

Using these ideas, this machine obtains a value of $5.8325885529019965$ (computing in double-precision floating point) in $1/9$ second. This approximation of the exact value

$\frac{1982600579265894785026945331968939023522542569}{339917784579447928182134345929899510000000000}$

is accurate to all digits shown.

The remainder of this post provides details, presents a working implementation (in R), and concludes with some comments about the question and the efficiency of the solution.

Generating random shuffles of a deck

It is actually clearer conceptually and no more complicated mathematically to consider a "deck" (aka multiset) of $N = k_1+k_2+\cdots+k_m$ cards of which there are $k_1$ of the lowest denomination, $k_2$ of the next lowest, and so on. (The question as asked concerns the deck determined by the $13$ -vector $(4,4,\ldots,4)$ .)

A "random shuffle" of $N$ cards is one permutation taken uniformly and randomly from the $N! = N\times(N-1)\times\cdots\times 2\times 1$ permutations of the $N$ cards. These shuffles fall into groups of equivalent configurations because permuting the $k_1$ "aces" among themselves changes nothing, permuting the $k_2$ "twos" among themselves also changes nothing, and so on. Therefore each group of permutations that look identical when the suits of the cards are ignored contains $k_1!\times k_2!\times \cdots \times k_m!$ permutations. These groups, whose number is therefore given by the multinomial coefficient

$\binom{N}{k_1,k_2,\ldots,k_m} = \frac{N!}{k_1!k_2!\cdots k_m!},$

are called "combinations" of the deck.

The first $k_1$ cards can form only $k_1!/k_1! = 1$ combination. They leave $k_1+1$ "slots" between and around them in which the next $k_2$ cards can be placed. We could indicate this with a diagram where " $*$ " designates one of the $k_1$ cards and " $\_$ " designates a slot that can hold between $0$ and $k_2$ additional cards:

$\underbrace{\_*\_*\_\cdots\_*\_}_{k_1\text{ stars}}$

When $k_2$ additional cards are interspersed, the pattern of stars and new cards partitions the $k_1+k_2$ cards into two subsets. The number of distinct such subsets is $\binom{k_1+k_2}{k_1,k_2} = \frac{(k_1+k_2)!}{k_1!k_2!}$ .

Repeating this procedure with $k_3$ "threes," we find there are $\binom{(k_1+k_2)+k_3}{k_1+k_2,k_3}= \frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!}$ ways to intersperse them among the first $k_1+k_2$ cards. Therefore the total number of distinct ways to arrange the first $k_1+k_2+k_3$ cards in this manner equals

$1\times\frac{(k_1+k_2)!}{k_1!k_2!}\times\frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!} = \frac{(k_1+k_2+k_3)!}{k_1!k_2!k_3!}.$

After finishing the last $k_m$ cards and continuing to multiply these telescoping fractions, we find that the number of distinct combinations obtained equals the total number of combinations as previously counted, $\binom{N}{k_1,k_2,\ldots,k_m}$ . Therefore we have overlooked no combinations. That means this sequential process of shuffling the cards correctly captures the probabilities of each combination, assuming that at each stage each possible distinct way of interspersing the new cards among the old is taken with uniformly equal probability.
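The telescoping product can be checked numerically for the standard deck. This sketch multiplies the stage-by-stage binomial counts and compares the result with the multinomial coefficient; the variable names are illustrative, and the comparison uses `all.equal` because the values are large floating-point numbers.

```r
k <- rep(4, 13)  # four cards of each of thirteen values

# Ways to intersperse each new group among the cards already placed
staged <- prod(choose(cumsum(k), k))

# Multinomial coefficient N! / (k_1! k_2! ... k_m!)
direct <- factorial(sum(k)) / prod(factorial(k))

all.equal(staged, direct)
```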

The place process

Initially, there are $k_1$ aces and obviously the very first is marked. At later stages there are $n = k_1 + k_2 + \cdots + k_{j-1}$ cards, the place (if a marked card exists) equals $p$ (some value from $1$ through $n$ ), and we are about to intersperse $k=k_j$ cards around them. We can visualize this with a diagram like

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{\_*\_\cdots\_*\_}_{n-p\text{ stars}}$

where " $\odot$ " designates the currently marked symbol. Conditional on this value of the place $p$ , we wish to find the probability that the next place will equal $q$ (some value from $1$ through $n+k$ ; by the rules of the game, the next place must come after $p$ , whence $q\ge p+1$ ). If we can find how many ways there are to intersperse the $k$ new cards in the blanks so that the next place equals $q$ , then we can divide by the total number of ways to intersperse these cards (equal to $\binom{n+k}{k}$ , as we have seen) to obtain the transition probability that the place changes from $p$ to $q$ . (There will also be a transition probability for the place to disappear altogether when none of the new cards follow the marked card, but there is no need to compute this explicitly.)

Let's update the diagram to reflect this situation:

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{**\cdots*}_{s\text{ stars}}\ \vert\ \underbrace{\_*\_\cdots\_*\_}_{n-p-s\text{ stars}}$

The vertical bar " $\vert$ " shows where the first new card occurs after the marked card: no new cards may therefore appear between the $\odot$ and the $\vert$ (and therefore no slots are shown in that interval). We do not know how many stars there are in this interval, so I have just called it $s$ (which may be zero). The unknown $s$ will disappear once we find the relationship between it and $q$ .

Suppose, then, we intersperse $j$ new cards around the stars before the $\odot$ and then--independently of that--we intersperse the remaining $k-j-1$ new cards around the stars after the $\vert$ . There are

$\tau_{n,k}(s,p) = \binom{(p-1)+j}{j}\binom{(n-p-s) + (k-j)-1}{k-j-1}$

ways to do this. Notice, though--this is the trickiest part of the analysis--that the place of $\vert$ equals $p+s+j+1$ because

There are $p$ "old" cards at or before the mark.
There are $s$ old cards after the mark but before $\vert$ .
There are $j$ new cards before the mark.
There is the new card represented by $\vert$ itself.

Thus, $\tau_{n,k}(s,p)$ gives us information about the transition from place $p$ to place $q=p+s+j+1$ . When we track this information carefully for all possible values of $s$ , and sum over all these (disjoint) possibilities, we obtain the conditional probability of place $q$ following place $p$ ,

${\Pr}_{n,k}(q|p) = \left(\sum_j \binom{p-1+j}{j}\binom{n+k-q}{k-j-1}\right) / \binom{n+k}{k}$

where the sum starts at $j=\max(0, q-(n+1))$ and ends at $j=\min(k-1, q-(p+1))$ . (The variable length of this sum suggests there is unlikely to be a closed formula for it as a function of $n, k, q,$ and $p$ , except in special cases.)
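A small worked case may help make the formula concrete. The sketch below evaluates ${\Pr}_{n,k}(q|p)$ for the first transition in a standard deck (four aces already placed, mark on the first ace, four twos about to be interspersed). The function name `trans_prob` is illustrative and is not the implementation given later in this answer.

```r
# Transition probability from place p to place q when k new cards
# are interspersed among n old ones (formula as stated above).
trans_prob <- function(q, p, n, k) {
  j <- max(0, q - (n + 1)):min(k - 1, q - (p + 1))
  sum(choose(p - 1 + j, j) * choose(n + k - q, k - j - 1)) / choose(n + k, k)
}

n <- 4; k <- 4; p <- 1
row <- sapply((p + 1):(n + k), trans_prob, p = p, n = n, k = k)

sum(row)       # chance that some two follows the first ace
1 - sum(row)   # chance that the mark disappears
```

The leftover probability should equal $1/\binom{8}{4} = 1/70$, the chance that all four twos precede all four aces.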

The algorithm

Initially there is probability $1$ that the place will be $1$ and probability $0$ it will have any other possible value in $2, 3, \ldots, k_1$ . This can be represented by a vector $p_1 = (1, 0, \ldots, 0)$ .

After interspersing the next $k_2$ cards, the vector $p_1$ is updated to $p_2$ by multiplying it (on the left) by the transition matrix $(\Pr_{k_1,k_2}(q|p), 1\le p\le k_1, 1\le q\le k_1+k_2)$ . This is repeated until all $k_1+k_2+\cdots+k_m$ cards have been placed. At each stage $j$ , the sum of the entries in the probability vector $p_j$ is the chance that some card has been marked. Whatever remains to make the value equal to $1$ therefore is the chance that no card is left marked after step $j$ . The successive differences in these values therefore give us the probability that we could not find a card of type $j$ to mark: that is the probability distribution of the value of the card we were looking for when the deck runs out at the end of the game.

Implementation

The following R code implements the algorithm. It parallels the preceding discussion. First, calculation of the transition probabilities is performed by t.matrix (without normalization with the division by $\binom{n+k}{k}$ , making it easier to track the calculations when testing the code):

t.matrix <- function(q, p, n, k) {
  j <- max(0, q-(n+1)):min(k-1, q-(p+1))
  return (sum(choose(p-1+j,j) * choose(n+k-q, k-1-j)))
}

This is used by transition to update $p_{j-1}$ to $p_j$ . It calculates the transition matrix and performs the multiplication. It also takes care of computing the initial vector $p_1$ if the argument p is an empty vector:

#
# `p` is the place distribution: p[i] is the chance the place is `i`.
#
transition <- function(p, k) {
  n <- length(p)
  if (n==0) {
    q <- c(1, rep(0, k-1))
  } else {
    #
    # Construct the transition matrix.
    #
    t.mat <- matrix(0, nrow=n, ncol=(n+k))
    #dimnames(t.mat) <- list(p=1:n, q=1:(n+k))
    for (i in 1:n) {
      t.mat[i, ] <- c(rep(0, i), sapply((i+1):(n+k), 
                                        function(q) t.matrix(q, i, n, k)))
    }
    #
    # Normalize and apply the transition matrix.
    #
    q <- as.vector(p %*% t.mat / choose(n+k, k))
  }
  names(q) <- 1:(n+k)
  return (q)
}

We can now easily compute the non-mark probabilities at each stage for any deck:

#
# `k` is an array giving the numbers of each card in order;
# e.g., k = rep(4, 13) for a standard deck.
#
# NB: the *complements* of the p-vectors are output.
#
game <- function(k) {
  p <- numeric(0)
  q <- sapply(k, function(i) 1 - sum(p <<- transition(p, i)))
  names(q) <- names(k)
  return (q)
}

Here they are for the standard deck:

k <- rep(4, 13)
names(k) <- c("A", 2:9, "T", "J", "Q", "K")
(g <- game(k))

The output is

         A          2          3          4          5          6          7          8          9          T          J          Q          K 
0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

According to the rules, if a king was marked then we would not look for any further cards: this means the value of $0.9994461$ has to be increased to $1$ . Upon doing so, the differences give the distribution of the "number you will be on when the deck runs out":

> g[13] <- 1; diff(g)
          2           3           4           5           6           7           8           9           T           J           Q           K 
0.014285714 0.078037518 0.163626897 0.211916093 0.200325120 0.150026562 0.093388313 0.049854807 0.023333275 0.009731843 0.003663077 0.001810781

(Compare this to the output I report in a separate answer describing a Monte-Carlo simulation: they appear to be the same, up to expected amounts of random variation.)

The expected value is immediate:

> sum(diff(g) * 2:13)
[1] 5.832589

All told, this required only a dozen lines or so of executable code. I have checked it against hand calculations for small values of $k$ (up to $3$ ). Thus, if any discrepancy becomes apparent between the code and the preceding analysis of the problem, trust the code (because the analysis may have typographical errors).

Remarks

Relationships to other sequences

When there is one of each card, the distribution is a sequence of reciprocals of whole numbers:

> 1/diff(game(rep(1,10)))
[1]      2      3      8     30    144    840   5760  45360 403200

The value at place $i$ is $i! + (i-1)!$ (starting at place $i=1$ ). This is sequence A001048 in the Online Encyclopedia of Integer Sequences. Accordingly, we might hope for a closed formula for the decks with constant $k_i$ (the "suited" decks) that would generalize this sequence, which itself has some profound meanings. (For instance, it counts sizes of the largest conjugacy classes in permutation groups and is also related to trinomial coefficients.) (Unfortunately, the reciprocals in the generalization for $k\gt 1$ are not usually integers.)
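The stated formula is easy to compare with the output above. This sketch simply evaluates $i! + (i-1)!$ for $i = 1, \ldots, 9$ (the variable names are illustrative):

```r
i <- 1:9
a001048 <- factorial(i) + factorial(i - 1)
a001048
```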

The game as a stochastic process

Our analysis makes it clear that the initial $i$ coefficients of the vectors $p_j$ , $j\ge i$ , are constant. For example, let's track the output of game as it processes each group of cards:

> sapply(1:13, function(i) game(rep(4,i)))

[[1]]
[1] 0

[[2]]
[1] 0.00000000 0.01428571

[[3]]
[1] 0.00000000 0.01428571 0.09232323

[[4]]
[1] 0.00000000 0.01428571 0.09232323 0.25595013

...

[[13]]
 [1] 0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

For instance, the second value of the final vector (describing the results with a full deck of 52 cards) already appeared after the second group was processed (and equals $1/\binom{8}{4}=1/70$ ). Thus, if you want information only about the marks up through the $j^\text{th}$ card value, you only have to perform the calculation for a deck of $k_1+k_2+\cdots+k_j$ cards.

Because the chance of not marking a card of value $j$ is getting quickly close to $1$ as $j$ increases, after $13$ types of cards in four suits we have almost reached a limiting value for the expectation. Indeed, the limiting value is approximately $5.833355$ (computed for a deck of $4 \times 32$ cards, at which point double precision rounding error prevents going any further).

Timing

Looking at the algorithm applied to the $m$-vector $(k,k, \ldots, k)$, we see its timing should be proportional to $k^2$ and--using a crude upper bound--not any worse than proportional to $m^3$. By timing all calculations for $k=1$ through $7$ and $m=10$ through $30$, and analyzing only those taking relatively long times ($1/2$ second or longer), I estimate the computation time is approximately $O(k^2 m^{2.9})$, supporting this upper-bound assessment.

One use of these asymptotics is to project calculation times for larger problems. For instance, seeing that the case $k=4, m=30$ takes about $1.31$ seconds, we would estimate that the (very interesting) case $k=1, m=100$ would take about $1.31(1/4)^2(100/30)^{2.9}\approx 2.7$ seconds. (It actually takes $2.87$ seconds.)
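The projection is a one-line scaling calculation. The sketch below reproduces it from the figures quoted above; the variable names are illustrative, and the exponent $2.9$ is the empirical estimate from the timings, not a derived constant.

```r
base_time <- 1.31   # seconds for 4 suits and 30 card values
suit_scale  <- (1 / 4)^2        # quadratic scaling in the number of suits
value_scale <- (100 / 30)^2.9   # empirical scaling in the number of card values

projected <- base_time * suit_scale * value_scale
round(projected, 1)
```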

— whuber

0

Hacked a simple Monte Carlo in Perl and found approximately $5.8329$ .

#!/usr/bin/perl

use strict;

my @deck = (1..13) x 4;

my $N = 100000; # Monte Carlo iterations.

my $mean = 0;

for (my $i = 1; $i <= $N; $i++) {
    my @d = @deck;
    fisher_yates_shuffle(\@d);
    my $last = 0;
    foreach my $c (@d) {
        if ($c == $last + 1) { $last = $c }
    }
    $mean += ($last + 1) / $N;
}

print $mean, "\n";

sub fisher_yates_shuffle {
    my $array = shift;
    my $i = @$array;
    while (--$i) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

— Zen

Given the sharp discrepancy between this and all the previous answers, including two simulations and a theoretical (exact) one, I suspect you are interpreting the question in a different way. In the absence of any explanation on your part, we just have to take it as being wrong. (I suspect you may be counting one less, in which case your 4.8 should be compared to 5.83258...; but even then, your two significant digits of precision provide no additional insight into this problem.)

— whuber

1

Yep! There was an off-by-one mistake.

— Zen