Monday, August 5, 2013

Increment-decrement life-tables

The increment-decrement life-table appears to be a birth-death life-table or more generally an open-population life-table.

This could be appropriate for the hospital-acquired infection length-of-stay calculations, because the risk set in the infected state changes over time as new patients become infected and infected patients die in hospital or are discharged alive.

So potentially we can calculate a life-table for the infected state and a separate life-table for the non-infected state, and then make a comparison.

This would require us to account for censoring.
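As a first pass, the Kaplan-Meier estimator from the survival package would handle the censoring for us. A minimal sketch, assuming an episodes data frame with los (days in hospital), discharged (1 = observed, 0 = censored) and infected columns (all names illustrative):

library(survival)

# One curve per state plays the role of the two life-tables;
# censored observations are handled automatically by the estimator.
fit <- survfit(Surv(los, discharged) ~ infected, data = episodes)
summary(fit)
plot(fit, lty = 1:2, xlab = "days in hospital", ylab = "P(still in hospital)")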

AFT and LoS

I realised the connection between the hospital length-of-stay estimation that I've been working on and the microlife work of David Spiegelhalter.

At the heart of his approach is the life-table. This gives the probability of reaching your next birthday: conditional on reaching age x, what is the chance that you don't die before you reach age x+1? Using a life-table it's possible to calculate an expected lifetime, i.e. the expected residual (remaining) life at each age.
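As a toy illustration (not Spiegelhalter's actual tables), the calculation from one-year survival probabilities might look like this, using made-up Gompertz-shaped values:

# px[x+1]: probability of surviving from age x to age x+1 (toy values).
ages <- 0:100
px <- exp(-1e-4 * exp(0.09 * ages))

# lx: probability of still being alive at each age, starting from 1 at age 0.
lx <- cumprod(c(1, px))

# Curtate expectation of remaining life at age x: sum of l(k) for k > x,
# divided by l(x). Adding ~0.5 gives the usual complete expectation.
ex <- (rev(cumsum(rev(lx))) - lx) / lx
ex[1]   # expected remaining lifetime at birth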

It's easy to see the connection with the expected LoS in hospital, conditional on still being in hospital on day x.

He uses an accelerated failure time (AFT) model, which is an alternative to the very commonly used Cox proportional hazards model. Whereas the Cox model gives the ratio of hazards for infected and non-infected patients, i.e. a change in rate on each day, the AFT model gives the difference in time between the two groups. That is, for a given percentile (e.g. the median), at what times does each group reach it? For example, when both samples reach 50% survival, the difference in time might be x days.

The problem with the AFT model is that you need to specify a failure-time distribution, unlike with the semi-parametric Cox model.
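A hedged sketch of both fits with the survival package (survreg() has no Gompertz option, so a Weibull is used purely for illustration; the episodes data frame and its columns are assumed as before):

library(survival)

# Cox model: exp(coef) is a hazard ratio for the infected group.
cox <- coxph(Surv(los, discharged) ~ infected, data = episodes)

# AFT model: exp(coef) multiplies event times, i.e. it stretches or
# shrinks the time at which any given survival percentile is reached.
aft <- survreg(Surv(los, discharged) ~ infected, data = episodes,
               dist = "weibull")

exp(coef(cox))   # change in rate
exp(coef(aft))   # change in time scale (the 'infected' entry)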

Spiegelhalter, when looking at mortality, uses the Gompertz distribution because after about age 30 the hazard is exponentially shaped, i.e. the log hazard is linear in age.

The hospital data is not so nicely behaved, unfortunately.

Using the Gompertz distribution, an approximation of the remaining lifetime can be derived in terms of hazards. Therefore, given a proportional-hazards coefficient, the difference in lifetime can be estimated.
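The algebra behind that estimate is neat: for a Gompertz hazard h(x) = a*exp(b*x), multiplying the hazard by a ratio r is exactly equivalent to shifting age by log(r)/b, since r*a*exp(b*x) = a*exp(b*(x + log(r)/b)). A quick numerical check with toy parameters (not fitted values):

a <- 1e-4; b <- 0.09; r <- 2    # toy Gompertz parameters and hazard ratio
x <- 40
r * a * exp(b * x)              # hazard doubled at age 40...
a * exp(b * (x + log(r) / b))   # ...matches the hazard of someone older
log(r) / b                      # effective age shift, ~7.7 years here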


Tuesday, April 23, 2013

30 day mystery

There's some interesting, but at the moment unexplained, behaviour for the premature children in my dataset. The top plot shows a weighting derived from the number of patients that leave the admission state, either to infected or to discharged/dead. Because there are so few infections and deaths, this is essentially the rate at which patients are discharged. It starts off high and tails off, which is the behaviour we would expect to see. However, at about 30 days, or one month, there is a sudden peak and then the tailing off resumes. This looks like an enforced discharging of premature patients at that time. It could be that the discharge is not to their homes but to a different hospital ward.

We can clearly see this 30-day mode in the plot of discharge times for all premature cases.

It turns out that for patients with the discharge destination description "The usual place of residence, including no fixed abode", the number discharged jumps from 42 to 122 at 30 days, and only gets back down to its previous level 53 days after admission.

So it's not that premature children are moved to another ward after this time, as I had suspected.


# Subset the episodes flagged as premature.
x <- mix.data[codesIndicators.mix$prem == TRUE, ]
View(x)

# Histogram of length of stay in days; the 30-day mode is clearly visible.
hist(as.numeric(x$dischargedate - x$admissiondate), breaks = 200, xlim = c(0, 100))

# Cross-tabulate discharge destination against length of stay.
table(x$disdestdescription, x$dischargedate - x$admissiondate)

This looks like a mixture distribution, so I investigated whether one subset of the codes I use to group the premature cases is responsible for the early behaviour and another subset for the later behaviour.
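A hedged sketch of that check, assuming a (hypothetical) primarydiagnosis column holding the grouping code for each episode:

# Compare discharge-time histograms for each code in the premature
# grouping, to see which codes drive the early and late modes.
x$los <- as.numeric(x$dischargedate - x$admissiondate)
for (code in unique(x$primarydiagnosis)) {
  hist(x$los[x$primarydiagnosis == code], breaks = 200, xlim = c(0, 100),
       main = paste("Discharge times for code", code))
}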

Monday, March 18, 2013

Partial Map


Whilst travelling in to work this morning and thinking about the day ahead, I was struck by a connection between two things: the London tube map and Cox regression. The tube map is designed to represent the order of stations on a line but dispenses with geographical distance in order to present the ordering information in the clearest way.

Cox regression using proportional hazards is calculated using what is known as a partial likelihood: partial because only information about the order of the events is used in the likelihood calculation, not the actual times of the events.
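For reference, the standard form of the partial likelihood makes this explicit. Writing $x_i$ for the covariates of the subject with the $i$-th event and $R(t_i)$ for the risk set (everyone still under observation just before $t_i$), only the ordering enters, through the risk sets:

$$L(\beta) = \prod_{i:\,\text{event at } t_i} \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}$$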

So, a partial likelihood is a bit like the London underground map. I suppose the map is an extension of this since there are multiple lines that cross one another but meet at the correct stations; a 2D partial likelihood perhaps?

Tuesday, March 12, 2013

Melt and join

Following my last post on rearranging and managing arrays, I've started to make use of the reshape package in R. The key functions are melt() and cast(). The melt() function rearranges an array into a long format where ids are duplicated for different covariate values. That means all of the entries for a given patient are repeated, with the information that was spread across multiple columns (one per variable) put into just two: a variable column and a value column. Once the data is in this form you can then go ahead and cast the data into whatever 'shape' you want.

In the example below, I wanted to flag patients according to a group membership determined by whether they had certain codes in one of their diagnosis fields. I used melt() to collapse the multiple diagnosis columns into variable (which diagnosis field) and value (code) column pairings; in this case, the diagnosis number is not important. Then I matched the codes to groups using a look-up table (like in Excel) and the join() function. I gather join() and merge() are similar, but join() is just the one I happened to use. I finally reconstructed the original array, now with the patient groupings.
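A minimal sketch of that workflow (the original example isn't shown here, so the data frame, codes and group names below are made up):

library(reshape)   # melt()
library(plyr)      # join()

# Toy data: one row per patient, diagnosis codes spread across columns.
patients <- data.frame(id = 1:3,
                       diag1 = c("P07", "A41", "J18"),
                       diag2 = c("J18", "P07", NA),
                       stringsAsFactors = FALSE)

# Look-up table mapping codes to groups (like an Excel VLOOKUP).
lookup <- data.frame(value = c("P07", "A41"),
                     group = c("premature", "sepsis"),
                     stringsAsFactors = FALSE)

# melt(): collapse the diagnosis columns into variable/value pairs.
long <- melt(patients, id.vars = "id")

# join(): attach the group membership to each code.
long <- join(long, lookup, by = "value")

# Flag each patient who has at least one code in the premature group.
patients$prem <- patients$id %in% long$id[long$group %in% "premature"]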

Wednesday, February 6, 2013

Matching and linking spells in hospital

I want to identify when two spells for a patient stop and start on the same date, and then treat these as a single spell in hospital rather than two separate ones. The intermediate discharge time is dropped and the observed hospital stay becomes the (longer) sum of both stays. The problem with how I have coded this at the moment is that for over 600,000 records, or episodes, it takes about a day to run. There are several loops in the script that I would like to avoid, but I can't think of a cleverer way to do this.
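One possible loop-free approach (a sketch, with assumed column names patientid, admissiondate and dischargedate): sort the episodes, flag where a new stay starts, and turn the flags into a stay identifier with cumsum().

# Order the episodes so consecutive rows for a patient are adjacent.
spells <- spells[order(spells$patientid, spells$admissiondate), ]
n <- nrow(spells)

# A row continues the previous stay if it is the same patient and its
# admission falls on the previous row's discharge date.
same.patient <- c(FALSE, spells$patientid[-1] == spells$patientid[-n])
contiguous   <- c(FALSE, spells$admissiondate[-1] == spells$dischargedate[-n])

# cumsum() over the "new stay" flags gives each linked stay a single id.
spells$stayid <- cumsum(!(same.patient & contiguous))

# Length of the combined stay: last discharge minus first admission.
los <- tapply(as.numeric(spells$dischargedate), spells$stayid, max) -
       tapply(as.numeric(spells$admissiondate), spells$stayid, min)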