One thing to keep in mind with the Kaplan-Meier survival curve is that it is basically descriptive and not inferential. It is just a function of the data, with an incredibly flexible model lying behind it. This is a strength because it means there are virtually no assumptions that can be broken, but a weakness because it is hard to generalise, and because it fits "noise" as well as "signal". If you want to make an inference, you basically have to introduce something unknown that you wish to know.
Now, one way to go about comparing the median survival times is to make the following assumptions:
- I have an estimate of the median survival time $t_i$ for each of the $i$ states, given by the Kaplan-Meier curve.
- I expect the true median survival time, $T_i$, to be equal to this estimate: $E(T_i|t_i)=t_i$.
- I am 100% certain that the true median survival time is positive: $\Pr(T_i>0)=1$.
Now, the "most conservative" way to use these assumptions is the principle of maximum entropy, so you get:
$$p(T_i|t_i)=K\exp(-\lambda T_i)$$
Where $K$ and $\lambda$ are chosen such that the PDF is normalised and the expected value is $t_i$. Now we have:
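$$1=\int_0^\infty p(T_i|t_i)\,dT_i=K\int_0^\infty \exp(-\lambda T_i)\,dT_i=K\left[-\frac{\exp(-\lambda T_i)}{\lambda}\right]_{T_i=0}^{T_i=\infty}=\frac{K}{\lambda}\implies K=\lambda$$
and now we have
$$E(T_i)=\frac{1}{\lambda}\implies \lambda=t_i^{-1}$$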
And so you have a set of probability distributions for each state.
$$p(T_i|t_i)=\frac{1}{t_i}\exp\left(-\frac{T_i}{t_i}\right)\quad (i=1,\dots,N)$$
Which give a joint probability distribution of:
$$p(T_1,T_2,\dots,T_N|t_1,t_2,\dots,t_N)=\prod_{i=1}^{N}\frac{1}{t_i}\exp\left(-\frac{T_i}{t_i}\right)$$
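As a quick sanity check on the derivation above, here is a minimal Python sketch that numerically confirms the per-state density integrates to one and has mean $t_i$, and that evaluates the joint log density. The value `t_hat = 4.0` and the helper names are purely illustrative assumptions, not anything taken from real data.

```python
import math

def maxent_density(T, t_hat):
    """Exponential density with mean t_hat: (1 / t_hat) * exp(-T / t_hat)."""
    return math.exp(-T / t_hat) / t_hat

def joint_log_density(T_values, t_hats):
    """Log of the product of independent per-state exponential densities."""
    return sum(-math.log(t) - T / t for T, t in zip(T_values, t_hats))

def trapezoid(f, a, b, n=100_000):
    """Plain trapezoidal rule on [a, b] with n equal steps."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

t_hat = 4.0          # hypothetical Kaplan-Meier median estimate for one state
upper = 40 * t_hat   # the tail beyond this point is of order exp(-40)

total_mass = trapezoid(lambda T: maxent_density(T, t_hat), 0.0, upper)
mean = trapezoid(lambda T: T * maxent_density(T, t_hat), 0.0, upper)

print("integral of density:", total_mass)
print("expected value:", mean)
```

Working on the log scale in `joint_log_density` is a design choice to avoid underflow when the number of states gets large.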
Suppose the hypothesis you want to test is $H_0: T_1=T_2=\dots=T_N=\bar{t}$, where $\bar{t}=\frac{1}{N}\sum_{i=1}^{N}t_i$ is the mean of the median survival times. The severe alternative hypothesis to test against is the "every state is a unique and beautiful snowflake" hypothesis $H_A: T_1=t_1,\dots,T_N=t_N$, because this is the most likely alternative, and thus represents the information lost in moving to the simpler hypothesis (a "minimax" test). The measure of the evidence against the simpler hypothesis is given by the odds ratio:
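$$O(H_A|H_0)=\frac{p(T_1=t_1,T_2=t_2,\dots,T_N=t_N|t_1,t_2,\dots,t_N)}{p(T_1=\bar{t},T_2=\bar{t},\dots,T_N=\bar{t}|t_1,t_2,\dots,t_N)}$$
$$=\frac{\left[\prod_{i=1}^{N}\frac{1}{t_i}\right]\exp\left(-\sum_{i=1}^{N}\frac{t_i}{t_i}\right)}{\left[\prod_{i=1}^{N}\frac{1}{t_i}\right]\exp\left(-\sum_{i=1}^{N}\frac{\bar{t}}{t_i}\right)}=\exp\left(N\left[\frac{\bar{t}}{t_{harm}}-1\right]\right)$$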
Where
$$t_{harm}=\left[\frac{1}{N}\sum_{i=1}^{N}t_i^{-1}\right]^{-1}\leq\bar{t}$$
is the harmonic mean. Note that the odds will always favour the perfect fit, but not by much if the median survival times are reasonably close. Further, this gives you a direct way to state the evidence of this particular hypothesis test:
assumptions 1-3 give maximum odds of $O(H_A|H_0):1$ against equal median survival times across all states
Combine this with a decision rule, loss function, utility function, etc. which says how advantageous it is to accept the simpler hypothesis, and you've got your conclusion!
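To make the odds formula concrete, here is a minimal sketch that computes $\exp(N[\bar{t}/t_{harm}-1])$ from a list of median estimates. The numbers in `medians` are made up for illustration, and the function name is my own.

```python
import math

def odds_against_equal_medians(t_hats):
    """Odds of H_A (each state has its own median) against H_0 (all equal).

    Implements exp(N * (t_bar / t_harm - 1)), where t_bar is the arithmetic
    mean and t_harm the harmonic mean of the estimated medians.
    """
    n = len(t_hats)
    t_bar = sum(t_hats) / n
    t_harm = n / sum(1.0 / t for t in t_hats)
    return math.exp(n * (t_bar / t_harm - 1.0))

# Hypothetical Kaplan-Meier median survival estimates, one per state.
medians = [10.0, 12.0, 11.0, 9.0]

odds = odds_against_equal_medians(medians)
print(f"odds against equal medians: {odds:.3f} : 1")
```

Because the arithmetic mean is never smaller than the harmonic mean, the result is always at least 1, and it equals 1 only when all the estimates coincide.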
There is no limit to the number of hypotheses you can test, and give similar odds for. Just change $H_0$ to specify a different set of possible "true values". You could do "significance testing" by choosing the hypothesis as:
$$H_{S,i}: T_i=t_i,\quad T_j=T=\bar{t}_{(i)}=\frac{1}{N-1}\sum_{j\neq i}t_j$$
So this hypothesis is verbally "state $i$ has a different median survival time, but all other states are the same". You can then redo the odds ratio calculation I did above, although you should be careful about what the alternative hypothesis is. Any one of the options below is "reasonable" in the sense that it might be a question you are interested in answering (and they will generally have different answers):
- my $H_A$ defined above - how much worse is $H_{S,i}$ compared to the perfect fit?
- my $H_0$ defined above - how much better is $H_{S,i}$ compared to the average fit?
- a different $H_{S,k}$ - how much is state $k$ "more different" compared to state $i$?
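The comparisons listed above can be sketched in a few lines of Python, reusing the product-of-exponentials model from earlier. The data, the choice of state index 3 as the "different" state, and all function names are assumptions made only for this example.

```python
import math

def log_density(T_values, t_hats):
    """Joint log density: sum over states of -log(t_i) - T_i / t_i."""
    return sum(-math.log(t) - T / t for T, t in zip(T_values, t_hats))

def single_state_hypothesis(t_hats, i):
    """Values under H_{S,i}: state i keeps its own estimate, while every
    other state is set to the mean of the remaining N - 1 estimates."""
    others = [t for j, t in enumerate(t_hats) if j != i]
    pooled = sum(others) / len(others)
    return [t if j == i else pooled for j, t in enumerate(t_hats)]

def log_odds(T_a, T_b, t_hats):
    """Log odds of hypothesis T_a against hypothesis T_b."""
    return log_density(T_a, t_hats) - log_density(T_b, t_hats)

# Hypothetical median estimates; the last state looks like an outlier.
medians = [10.0, 12.0, 11.0, 30.0]
n = len(medians)

H_A = list(medians)                          # perfect fit
H_0 = [sum(medians) / n] * n                 # all states share the mean
H_S = single_state_hypothesis(medians, 3)    # only state 3 differs

print("H_A vs H_S:", math.exp(log_odds(H_A, H_S, medians)))
print("H_S vs H_0:", math.exp(log_odds(H_S, H_0, medians)))
print("H_A vs H_0:", math.exp(log_odds(H_A, H_0, medians)))
```

Comparing two different single-state hypotheses, say state $k$ against state $i$, is the same call to `log_odds` with two outputs of `single_state_hypothesis`.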
Now one thing which has been overlooked here is correlations between states - this structure assumes that knowing the median survival time in one state tells you nothing about the median survival time in another state. While this may seem "bad", it is not too difficult to improve on, and the above calculations are good initial results which are easy to calculate.
Adding connections between states will change the probability models, and you will effectively see some "pooling" of the median survival times. One way to incorporate correlations into the analysis is to separate the true survival times into two components, a "common part" or "trend" and an "individual part":
$$T_i=T+U_i$$
And then constrain the individual part $U_i$ to have average zero over all units and unknown variance $\sigma^2$, to be integrated out using a prior describing what knowledge you have of the individual variability, prior to observing the data (or the Jeffreys prior if you know nothing, and a half-Cauchy prior if Jeffreys causes problems).