Research blog: October 2012

Monday, October 29, 2012

Pre-headlines

I'm now a fully signed-up contributor to the Before the Headlines team. The idea is to support journalists in getting the right science stories out there in the best way. That means nipping unsupported, fanciful claims in the bud and getting the messages across in a clear and accurate way, but without boringafying/boringising.

More thoughts on the expected excess LOS

I realised that the expected LOS at a given time $s$, denoted $E[T|s]$, is the same as the life expectancy I'm more used to in life tables. The most common choice there is $s=0$, the life expectancy at birth $E[T|s=0]$, which would be like stay expectancy at admission in the HCAI model.

Of course, life expectancy can be estimated from any age. Your life expectancy changes as you get older by the simple fact that you've already survived up to that point, so your life expectancy at 60 is probably more than your life expectancy at 0.
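To make the conditioning on survival explicit, here is the standard identity for a positive lifetime $T$, writing $S(u)=P(T>u)$ for the survival function (notation assumed here) and reading "life expectancy at age $s$" as the expected age at death given survival to $s$:

$$E[T|T>s] = s + \frac{1}{S(s)}\int_s^{\infty} S(u)\,du \ge E[T|T>0] = E[T].$$

So the expected age at death can only go up with $s$. The remaining expectancy, $E[T-s|T>s]$, can go either way.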

What we are interested in for the excess LOS measure is the average difference in LOS over all times in hospital. This is equivalent to the average life expectancy over all ages, i.e. if we don't know what age some person is, what would our best guess be of how long they'll live? Since a population is not evenly distributed over ages, it would make sense to weight the more likely ages more heavily and the less likely ages less so. [I think this is called the overall life expectancy and is closely related to the all-age-all-cause mortality.]
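A minimal sketch of this age-weighted average in Python. The lifetime distribution `pmf` is made up, and weighting each age $s$ in proportion to the survival probability $P(T>s)$ is an assumption (it is how ages would be distributed in a stationary population), not something fixed by the argument above.

```python
import numpy as np

def expected_lifetime_given_alive(pmf, s):
    """E[T | T > s] for a lifetime T taking values 1, 2, ..., len(pmf)."""
    times = np.arange(1, len(pmf) + 1)
    alive = times > s
    return float(np.sum(times[alive] * pmf[alive]) / np.sum(pmf[alive]))

def age_averaged_expectancy(pmf):
    """Average E[T | T > s] over ages s = 0, ..., len(pmf) - 1, weighted by P(T > s)."""
    ages = np.arange(len(pmf))
    # survival[s] = P(T > s): the assumed relative frequency of age s in the population
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(pmf)[:-1]))
    expectancies = np.array([expected_lifetime_given_alive(pmf, s) for s in ages])
    return float(np.average(expectancies, weights=survival))

# Hypothetical lifetime distribution: P(T=1), P(T=2), P(T=3)
pmf = np.array([0.2, 0.3, 0.5])
print(expected_lifetime_given_alive(pmf, 0))  # expectancy at "birth"
print(age_averaged_expectancy(pmf))           # expectancy averaged over ages
```

With these toy numbers the age-averaged figure (about 2.57) sits above the expectancy at birth (2.3), because the older ages that are still populated carry larger conditional expectancies.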

The excess LOS is similar but slightly different. In this case, we're looking at the difference in 2 holding times: the infected "life expectancy" and the uninfected "life expectancy". As in the life table context, we want to weight the times at which there are more individuals in the states of interest: not simply the alive state but the infected and non-infected states.

So we prefer times with high probability of being in state 0 and high probability of being in state 1. Put another way, we prefer times with low probability of being in state 2, the death/discharge state.

The probability of being in state 0 is simply the survival probability of not having left it before some time.

The probability of being in state 1 at $s$ is the probability of having jumped to it at some time before $s$ and not having left.

What might be easier is to think in terms of 1-P(being in state 2 at time $s$). Since this is a sink state, P(being in state 2 at time $s$) is simply the probability of having entered it at any time up to time $s$, i.e. a c.d.f.
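One way to write the three occupation probabilities down, assuming everyone starts in state 0 and the process is a Markov illness-death model with transition hazards $\alpha_{01}$, $\alpha_{02}$ and $\alpha_{12}$ (notation introduced here for illustration; nothing above fixes a particular model):

$$P(X_s=0) = \exp\left(-\int_0^s [\alpha_{01}(u)+\alpha_{02}(u)]\,du\right),$$

$$P(X_s=1) = \int_0^s P(X_u=0)\,\alpha_{01}(u)\exp\left(-\int_u^s \alpha_{12}(v)\,dv\right)du,$$

$$1-P(X_s=2) = P(X_s=0)+P(X_s=1) = P(T>s).$$

The last line is the shortcut: the weight on time $s$ only needs the c.d.f. of the total stay $T$.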

Tuesday, October 23, 2012

Hospital Length of Stay (LOS)

The main equation in this work is

$ f(s) = E[T|X_s=1] - E[T|X_s=0] $ (*).

This is the difference between the expected time between admission and discharge/death, $T$, given infected at time $s$ and given not infected at time $s$.

This means that those individuals who are not infected could be so in the future. This is a snapshot of the case-control split in the sample at time $s$, so it doesn't tell us about what will happen after this. Those in state 0 at time $s$ may become infected at some time in the future or they may not pass go and jump straight to the sink state.
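As a rough illustration of the snapshot idea, here is a sketch of a naive empirical version of (*) in Python. The patient records are invented, and the function name `excess_los_at` and the use of `np.inf` for "never infected" are conventions chosen here, not taken from any package.

```python
import numpy as np

def excess_los_at(s, infection_time, los):
    """Naive empirical f(s): mean LOS of those infected at s minus mean LOS of those not."""
    in_hospital = los > s                            # not yet in the sink state at time s
    infected = in_hospital & (infection_time <= s)   # in state 1 at time s
    uninfected = in_hospital & (infection_time > s)  # in state 0 at time s
    if not infected.any() or not uninfected.any():
        return np.nan                                # no comparison possible at this s
    return float(los[infected].mean() - los[uninfected].mean())

# Hypothetical records: day of infection (np.inf = never infected) and total LOS in days
infection_time = np.array([2.0, np.inf, 4.0, np.inf, np.inf])
los = np.array([10.0, 5.0, 12.0, 7.0, 3.0])

print(excess_los_at(3.0, infection_time, los))
print(excess_los_at(4.0, infection_time, los))
```

Note that the third patient, infected on day 4, counts as uninfected in the snapshot at $s=3$ and as infected at $s=4$, which is exactly the "could be so in the future" point.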

Intuitively, if we think about this in a latent time/counterfactual way then when patients are still in state 0, even if they do jump to state 1 (infected) later, they still would have jumped to the sink at a time after that.

By setting things up like this, we include the holding time in state 0 prior to either an infection or death/discharge as non-infected length of stay, and so avoid the bias in the infection LOS times that comes from not accounting for the two-way causality.

A weighting game

Eqn (*) is averaged over all $s$ to give an expected excess length of stay. The question is how to choose the weightings. It is suggested to weight more heavily the days when there are more infections, or when there are more jumps out of state 0, regardless of whether to infection or death/discharge. By weighting in this way, more emphasis is placed on the excess LOS on days when more happens. That is to say, when there are larger changes in the state populations and risk sets, the difference in LOS is more influential on the estimate. Intuitively, this makes sense, since otherwise we would count days when there is little or no change in the system.

For example, the times when transitions from 0 to 1 occur are the times before which the jumping individuals and those that remain in state 0 have been in state 0 together. That is, they have the same history (filtration) up to that time, say $s$, and then diverge at that time. So a comparison of the LOS between these two groups is a comparison accounting for the uninfected time too, i.e. it is time-dependent. Conversely, the times at which transitions from 0 to 2 occur involve individuals that do not have an associated other group who transition from 0 to 1. So at this time $s$ we are cleaning up the sample to remove individuals that aren't helpful in the comparison between the infected and non-infected groups.

This rationale is applied probabilistically over the continuous variable $s$, rather than at the discrete time points used above. The discrete approximation could be useful for checking though.

The excess LOS is a weighted mean estimate of the separate LOS for each $s$. If we think about this as a sample size problem, we would place more weight on the larger samples and less on the smaller sample sizes. In essence, this is really placing emphasis on the points that contain more information. In the LOS context this would correspond to placing more weight on the times at which there has just been a transition to state 1, the infected state. Obviously, the infected individuals are most likely to be in this state at the beginning of their holding time.

Beyersmann also includes the times at which there has been a transition from 0 to 2, the death/discharge state. This is a removal of individuals that no longer contribute to the case-control comparison. To me this is a less obvious thing to use in the weights.
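To see how such a weighting behaves on a small example, here is a discrete-time sketch in Python. It weights each day by the number of exits from state 0 on that day (both 0 to 1 and 0 to 2). The data are invented and this is a crude empirical stand-in for illustration, not Beyersmann's estimator.

```python
import numpy as np

def excess_los_at(s, infection_time, los):
    """Naive empirical f(s): mean LOS of those infected at s minus mean LOS of those not."""
    in_hospital = los > s
    infected = in_hospital & (infection_time <= s)
    uninfected = in_hospital & (infection_time > s)
    if not infected.any() or not uninfected.any():
        return np.nan
    return float(los[infected].mean() - los[uninfected].mean())

def weighted_excess_los(infection_time, los):
    """Average f(s) over days with exits from state 0, weighted by the number of exits."""
    # Time each patient leaves state 0: infection (0 -> 1) or discharge/death (0 -> 2)
    exit_time = np.minimum(infection_time, los)
    days = np.unique(exit_time)  # only days with at least one exit receive any weight
    f = np.array([excess_los_at(s, infection_time, los) for s in days])
    weights = np.array([(exit_time == s).sum() for s in days], dtype=float)
    usable = ~np.isnan(f)        # drop days where one of the two groups is empty
    return float(np.average(f[usable], weights=weights[usable]))

# Hypothetical records: day of infection (np.inf = never infected) and total LOS in days
infection_time = np.array([2.0, np.inf, 4.0, np.inf, np.inf])
los = np.array([10.0, 5.0, 12.0, 7.0, 3.0])
print(weighted_excess_los(infection_time, los))
```

Restricting the weights to 0 to 1 transitions only would mean replacing `exit_time` with `infection_time` (and dropping the `np.inf` entries), which makes it easy to compare the two choices.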

If we think about it for a countable set of uniformly spaced $s$, then the proportion of an interval $[0,T]$ made up of admission time and the proportion made up of infection time will determine the influence of $T$ on the non-infected and infected LOS respectively.

Now, if we position the times $s$ non-uniformly, so that they are closer together where there are more transitions and further apart where there are fewer transitions, then we will pick up more of the detail and fidelity of the process.

As the days progress the expected LOS for both infected and not infected will obviously increase. But the probability of having left state 0 will also increase as the population continues to diminish and is absorbed into state 2, since the survival function is monotonically decreasing.

Tuesday, October 9, 2012

Instantaneous measures

I've been reading about influence functions,

$$\left.\frac{d}{d\varepsilon}\, t\big((1-\varepsilon)F+\varepsilon I_{[y,\infty)}\big)\right|_{\varepsilon=0}$$.

These are used to quantify the influence a given data point has on a statistic $t$.
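A quick numerical check of the definition, sketched in Python for the simplest statistic, the mean, with $F$ taken to be the empirical distribution of a made-up sample. The helper names are hypothetical. For the mean the derivative should come out as $y$ minus the sample mean.

```python
import numpy as np

def weighted_mean(points, weights):
    """The statistic t(F) = mean of F, for a discrete F given by points and weights."""
    return float(np.sum(points * weights))

def influence_numeric(statistic, sample, y, eps=1e-6):
    """Finite-difference version of d/d(eps) t((1 - eps) F + eps * point mass at y) at eps = 0."""
    n = len(sample)
    points = np.append(sample, y)
    base_w = np.append(np.full(n, 1.0 / n), 0.0)         # F: the empirical distribution
    mix_w = np.append(np.full(n, (1.0 - eps) / n), eps)  # F contaminated with mass eps at y
    return (statistic(points, mix_w) - statistic(points, base_w)) / eps

sample = np.array([1.0, 2.0, 3.0, 6.0])  # hypothetical data with mean 3
print(influence_numeric(weighted_mean, sample, y=10.0))  # close to 10 - 3 = 7
```

Because the mean is linear in $F$, the finite difference matches the derivative up to rounding error. For a nonlinear statistic the result would depend on `eps`.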

I was thinking about this as an instantaneous rate in the same sense as a hazard function.

In the limiting notation we can see how they are both types of averages across an increment and then the increment is decreased to approach 0 from above to give the derivative in that direction at a point.
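Written out side by side, with $h$ the hazard of a generic event time $T$ and $IF$ a label for the influence function (both symbols are shorthand introduced here), the two limits are:

$$h(s) = \lim_{\Delta \downarrow 0} \frac{P(s \le T < s+\Delta | T \ge s)}{\Delta}, \qquad IF(y) = \lim_{\varepsilon \downarrow 0} \frac{t\big((1-\varepsilon)F+\varepsilon I_{[y,\infty)}\big) - t(F)}{\varepsilon}.$$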

So the influence function is an average difference in the statistic of interest between the 2 distributions (one being a mixture distribution). The hazard rate is an average probability of transitioning within the time interval.

The influence function is slightly different to the hazard rate because it includes a weighted sum of cdfs, with weights summing to 1, in the functional.