Researchers fit models to travel data from many different sources. To standardize these sources, we designed a generalized data frame template that stores travel data in a long-form format compatible with the modeling and simulation tools in this package. The travel_data_sim object is a simulated example that illustrates the structure of the data: it contains location information and the observed number of trips between origin and destination locations, as well as trips within home locations. The travel_data_template object is an empty template that can be populated from scratch.

Because the long-form data structure is designed to accommodate different types of data, some columns may be left blank. For example, in a travel survey each row may represent an individual, whereas in call data records each row may represent the total trip count for an origin-destination pair.
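
As a rough illustration of the two row types, the sketch below builds two one-row data frames using only a handful of the template's columns. The values are invented and the column subset is chosen for brevity; the object names survey_row, cdr_row and both are hypothetical and not part of the package.

```r
# Toy rows using a subset of the template columns; all values are invented.
survey_row <- data.frame(
  indiv_id  = 1L,
  indiv_age = 34,
  orig_adm2 = "O",
  dest_adm2 = "G",
  trips     = 1,            # one trip reported by one individual
  stringsAsFactors = FALSE
)

cdr_row <- data.frame(
  indiv_id  = NA_integer_,  # aggregated data carry no individual information
  indiv_age = NA_real_,
  orig_adm2 = "O",
  dest_adm2 = "G",
  trips     = 79,           # total trips for this origin-destination pair
  stringsAsFactors = FALSE
)

# Both row types fit the same column layout; they are stacked here only
# to display them side by side.
both <- rbind(survey_row, cdr_row)
print(both)
```

The individual-level columns are simply NA in the aggregated row, which is what "left blank" means in practice.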

For spatial data, if your locations are resolved only down to administrative level 3, then levels 4 and 5 can be left blank and the functions will ignore them. Likewise, if all administrative units are in the same country, then the administrative level 0 columns (orig_adm0 and dest_adm0) can be left blank.
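
To see which columns have been left blank in a data set, one option is to list the columns in which every value is missing. The sketch below does this in base R on a toy data frame; the object names d and blank_cols are hypothetical, and only a few origin columns are included.

```r
# Toy data with admin levels filled only at levels 1 and 2 (values invented).
d <- data.frame(
  orig_adm0 = NA_character_,   # single country, so left blank
  orig_adm1 = c("B", "B"),
  orig_adm2 = c("O", "S"),
  orig_adm3 = NA_character_,   # no data below level 2
  stringsAsFactors = FALSE
)

# A column is blank when it has zero non-missing values.
blank_cols <- names(d)[colSums(!is.na(d)) == 0]
print(blank_cols)
```

The same expression can be applied to a full data frame built from the template.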

A simulated example

str(travel_data_sim)
#> 'data.frame':    46 obs. of  28 variables:
#>  $ date_start: Date, format: "2020-01-01" "2020-01-01" ...
#>  $ date_stop : Date, format: "2020-01-08" "2020-01-08" ...
#>  $ date_span : 'difftime' num  NA NA NA NA ...
#>   ..- attr(*, "units")= chr "days"
#>  $ indiv_id  : int  NA NA NA NA NA NA NA NA NA NA ...
#>  $ indiv_age : num  NA NA NA NA NA NA NA NA NA NA ...
#>  $ indiv_sex : logi  NA NA NA NA NA NA ...
#>  $ indiv_type: chr  NA NA NA NA ...
#>  $ orig_adm0 : chr  "A" "A" "A" "A" ...
#>  $ orig_adm1 : chr  "B" "B" "B" "B" ...
#>  $ orig_adm2 : chr  "O" "S" "N" "C" ...
#>  $ orig_adm3 : chr  NA NA NA NA ...
#>  $ orig_adm4 : chr  NA NA NA NA ...
#>  $ orig_adm5 : chr  NA NA NA NA ...
#>  $ orig_type : chr  "County" "County" "County" "County" ...
#>  $ orig_x    : num  -91.5 -89.4 -92.4 -89.1 -90.3 ...
#>  $ orig_y    : num  30.4 30.8 29.8 29.8 29.3 ...
#>  $ orig_pop  : num  6360 7515 2839 3961 609 ...
#>  $ dest_adm0 : chr  "A" "A" "A" "A" ...
#>  $ dest_adm1 : chr  "B" "B" "B" "B" ...
#>  $ dest_adm2 : chr  "G" "U" "L" "O" ...
#>  $ dest_adm3 : chr  NA NA NA NA ...
#>  $ dest_adm4 : chr  NA NA NA NA ...
#>  $ dest_adm5 : chr  NA NA NA NA ...
#>  $ dest_type : chr  "County" "County" "County" "County" ...
#>  $ dest_x    : num  -90.2 -89.2 -89.6 -86.4 -87.6 ...
#>  $ dest_y    : num  30.8 29.8 31.1 31.2 30.2 ...
#>  $ dest_pop  : num  4048 7355 9542 8603 7596 ...
#>  $ trips     : num  2 79 2 0 23 13 247 0 6 7 ...

Detailed variable descriptions

Variable    Class      Description
date_start  date       beginning of the time interval for the trip count
date_stop   date       end of the time interval for the trip count
date_span   difftime   time span in days
indiv_id    integer    unique individual identifier
indiv_age   numeric    age of participant
indiv_sex   logical    gender of participant
indiv_type  character  if individual participants belong to different groups
orig_adm0   character  name of highest administration level of origin location (Country)
orig_adm1   character  name of administration level 1 of origin location (e.g. Division, State)
orig_adm2   character  name of administration level 2 of origin location (e.g. District, County)
orig_adm3   character  name of administration level 3 of origin location (e.g. Sub-district, Province)
orig_adm4   character  name of administration level 4 of origin location (e.g. City, Municipality)
orig_adm5   character  name of administration level 5 of origin location (e.g. Town, Village, Community, Ward)
orig_type   character  administrative type for the origin location (e.g. sub-district, community vs town, or urban vs rural)
orig_x      numeric    longitude of origin location centroid in decimal degrees (centroid of smallest admin unit)
orig_y      numeric    latitude of origin location centroid in decimal degrees (centroid of smallest admin unit)
orig_pop    numeric    population size of lowest administrative unit for origin location
dest_adm0   character  name of highest administration level of destination location (Country)
dest_adm1   character  name of administration level 1 of destination location (e.g. Division, State)
dest_adm2   character  name of administration level 2 of destination location (e.g. District, County)
dest_adm3   character  name of administration level 3 of destination location (e.g. Sub-district, Province)
dest_adm4   character  name of administration level 4 of destination location (e.g. City, Municipality)
dest_adm5   character  name of administration level 5 of destination location (e.g. Town, Village, Community, Ward)
dest_type   character  administrative type for the destination location (e.g. sub-district, community vs town, or urban vs rural)
dest_x      numeric    longitude of destination location centroid in decimal degrees (centroid of smallest admin unit)
dest_y      numeric    latitude of destination location centroid in decimal degrees (centroid of smallest admin unit)
dest_pop    numeric    population size of lowest administrative unit for destination location
trips       numeric    total number of observed trips made from origin to destination during time span
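
When assembling data by hand, a simple guard against misspelled column names is to compare them with the variable names listed in the table above. The sketch below hard-codes those 28 names in a vector called expected_cols (a name chosen here for illustration); with the package loaded, names(travel_data_template) would presumably serve the same purpose. The my_data object and its misspelled column are invented.

```r
# Build the 28 column names from the variable table.
loc <- c(paste0("adm", 0:5), "type", "x", "y", "pop")
expected_cols <- c(
  "date_start", "date_stop", "date_span",
  "indiv_id", "indiv_age", "indiv_sex", "indiv_type",
  paste0("orig_", loc),
  paste0("dest_", loc),
  "trips"
)

# Hypothetical hand-built data with a misspelled column ('trip').
my_data <- data.frame(orig_adm2 = "O", dest_adm2 = "G", trip = 2,
                      stringsAsFactors = FALSE)

# Columns present in my_data that the template does not define.
unknown_cols <- setdiff(names(my_data), expected_cols)
print(unknown_cols)
```

An empty result from setdiff() would indicate that every column name matches the template.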

Populating a travel data template from scratch

This data template can be populated by starting with the travel_data_template object and adding rows. The code below starts by adding information on trips from an origin to a destination.

# Travel among some locations
trip <- travel_data_template

n <- 30 # number of origin-destination records
trip[seq_len(n),] <- NA # add one empty row per record

# Time span of travel data
trip$date_start <- as.Date("2020-01-01")
trip$date_stop <- trip$date_start + 7
trip$date_span <- difftime(trip$date_stop, trip$date_start, units='days')

# Origin and destination info: some counties within the same state
trip$orig_adm0 <- trip$dest_adm0 <- 'A' # Country
trip$orig_adm1 <- trip$dest_adm1 <- 'B' # State
trip$orig_adm2 <- sample(LETTERS, n, replace=TRUE)
trip$dest_adm2 <- sample(LETTERS, n, replace=TRUE)
trip$orig_type <- trip$dest_type <- 'County' # Type of admin unit for lowest admin level

# Some fake coordinates in decimal degrees
trip$orig_x <- rnorm(n, -90, 2)
trip$orig_y <- rnorm(n, 30, 1)
trip$dest_x <- rnorm(n, -90, 2)
trip$dest_y <- rnorm(n, 30, 1)

# Population sizes of the origins and destinations
trip$orig_pop <- rnbinom(n, size=5, mu=5000)
trip$dest_pop <- rnbinom(n, size=10, mu=10000)

trip$trips <- rnbinom(n, size=1, mu=100) # Number of reported trips

# Drop rows where origin and destination are the same county
trip <- trip[!(trip$orig_adm2 == trip$dest_adm2),]

In some cases it may be easier to fill in stays (the number of trips within the origin or home location) in a different data frame and then merge the two.

# Stays in home location
stay <- travel_data_template
origins <- unique(trip$orig_adm2) # all unique origin locations
stay[seq_along(origins),] <- NA # add one empty row per origin

# Time span of travel survey
stay$date_start <- trip$date_start[1]
stay$date_stop <- trip$date_stop[1]
stay$date_span <- difftime(stay$date_stop, stay$date_start, units='days')

stay$orig_adm0 <- stay$dest_adm0 <- 'A' # Country
stay$orig_adm1 <- stay$dest_adm1 <- 'B' # State
stay$orig_adm2 <- stay$dest_adm2 <- origins
stay$orig_type <- stay$dest_type <- 'County'

# Copy each origin's coordinates and population from its first matching trip row
for (i in seq_along(origins)) {
  sel <- which(trip$orig_adm2 == stay$orig_adm2[i])[1]
  stay$orig_x[i] <- stay$dest_x[i] <- trip$orig_x[sel]
  stay$orig_y[i] <- stay$dest_y[i] <- trip$orig_y[sel]
  stay$orig_pop[i] <- stay$dest_pop[i] <- trip$orig_pop[sel]
}

# Number of reported trips within home county
stay$trips <- rnbinom(length(origins), size=10, mu=1000)

# Combine trips and stays
suppressMessages(
  travel_data <- dplyr::full_join(trip, stay)
)

head(travel_data, n=3)
#>   date_start  date_stop date_span indiv_id indiv_age indiv_sex indiv_type
#> 1 2020-01-01 2020-01-08    7 days       NA        NA        NA       <NA>
#> 2 2020-01-01 2020-01-08    7 days       NA        NA        NA       <NA>
#> 3 2020-01-01 2020-01-08    7 days       NA        NA        NA       <NA>
#>   orig_adm0 orig_adm1 orig_adm2 orig_adm3 orig_adm4 orig_adm5 orig_type
#> 1         A         B         H      <NA>      <NA>      <NA>    County
#> 2         A         B         H      <NA>      <NA>      <NA>    County
#> 3         A         B         P      <NA>      <NA>      <NA>    County
#>      orig_x   orig_y orig_pop dest_adm0 dest_adm1 dest_adm2 dest_adm3 dest_adm4
#> 1 -88.60990 30.21513     3892         A         B         W      <NA>      <NA>
#> 2 -87.97855 29.39085     2787         A         B         D      <NA>      <NA>
#> 3 -89.88991 28.81821     4543         A         B         Q      <NA>      <NA>
#>   dest_adm5 dest_type    dest_x   dest_y dest_pop trips
#> 1      <NA>    County -89.78925 29.62802     9172    43
#> 2      <NA>    County -87.43692 29.95134     9833    41
#> 3      <NA>    County -89.75648 29.55376    10191   153
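
Once trips and stays share one long-form table, stays can be recognized as rows whose origin and destination match, and the table can be cross-tabulated into an origin-by-destination matrix. The sketch below shows one way to do this with base R's xtabs() on a small invented data frame; the names toy, is_stay, locs and M are hypothetical, and this is not necessarily how the package's own tools build such a matrix.

```r
# Toy long-form data using the same column names as above (values invented).
toy <- data.frame(
  orig_adm2 = c("H", "H", "P", "H", "P"),
  dest_adm2 = c("P", "W", "H", "H", "P"),
  trips     = c(43, 41, 153, 900, 1100),
  stringsAsFactors = FALSE
)

# Rows where origin equals destination are stays.
is_stay <- toy$orig_adm2 == toy$dest_adm2

# Give both columns the same factor levels so the matrix is square.
locs <- sort(unique(c(toy$orig_adm2, toy$dest_adm2)))
toy$orig_adm2 <- factor(toy$orig_adm2, levels = locs)
toy$dest_adm2 <- factor(toy$dest_adm2, levels = locs)

# Sum trips for each origin (rows) and destination (columns).
M <- xtabs(trips ~ orig_adm2 + dest_adm2, data = toy)
print(M)
```

Note that xtabs() fills unobserved origin-destination pairs with 0, so a zero in this matrix does not distinguish "no trips observed" from "no data"; those cells may need to be set to NA before modeling.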