Title: | Download and Analyze Crash Data |
---|---|
Description: | Download crash data from the National Highway Traffic Safety Administration and prepare it for research. |
Authors: | Steve Jackson [aut, cre] |
Maintainer: | Steve Jackson <[email protected]> |
License: | CC0 |
Version: | 1.2.0 |
Built: | 2025-03-02 03:36:39 UTC |
Source: | https://github.com/s87jackson/rfars |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
alcohol(df)
df |
The FARS or GESCRSS data object to be searched. |
(Internal) Append RDS files
appendRDS(object, file, wd)
object |
The object to save or append |
file |
The name of the file to be saved |
wd |
The directory to check |
(Internal) Label unlabelled values in imported SAS files
auto_label_unlabeled_values(lbl_vector, wd = wd, x = x, varname)
lbl_vector |
A vector with labels |
wd |
Working directory for files |
x |
NCSA table name (sas file name) |
varname |
Variable name or label |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
bicyclist(df)
df |
The FARS or GESCRSS data object to be searched. |
Compare counts generated by counts()
compare_counts( df, interval = c("year", "month")[1], what = c("crashes", "fatalities", "injuries", "people")[1], where = list(states = "all", region = c("all", "ne", "mw", "s", "w")[1], urb = c("all", "rural", "urban")[1]), who = c("all", "drivers", "passengers", "bicyclists", "pedestrians")[1], involved = NULL, what2 = what, where2 = where, who2 = who, involved2 = involved )
df |
The input FARS object. |
interval |
The interval in which to count: months or years. |
what |
What to count: crashes, fatalities, injuries, or people involved. |
where |
Where to count, a list with up to three elements: states ("all" by default), region ("all"), urb ("all") |
who |
The type of person to count: all (default), drivers, passengers, pedestrians, or bicyclists. |
involved |
Factors involved with the crash. Can be any of: distracted driver, police pursuit, motorcycle, pedalcyclist, bicyclist, pedestrian, pedbike, young driver, older driver, speeding, alcohol, drugs, hit and run, roadway departure, rollover, or large trucks. |
what2 |
Comparison point for 'what' (set to 'what' unless specified). |
where2 |
Comparison point for 'where' (set to 'where' unless specified). |
who2 |
Comparison point for 'who' (set to 'who' unless specified). |
involved2 |
Comparison point for 'involved' (set to 'involved' unless specified). |
A tibble of counts.
## Not run:
compare_counts(
  get_fars(years = 2020, states = "Virginia"),
  where = list(urb = "rural"),
  where2 = list(urb = "urban")
)
## End(Not run)
Use FARS or GES/CRSS data to generate commonly requested counts.
counts( df, what = c("crashes", "fatalities", "injuries", "people")[1], interval = c("year", "month")[1], where = list(states = "all", region = c("all", "ne", "mw", "s", "w")[1], urb = c("all", "rural", "urban")[1]), who = c("all", "drivers", "passengers", "bicyclists", "pedestrians")[1], involved = NULL, filterOnly = FALSE )
df |
The input data object (must be of class 'FARS' or 'GESCRSS' as is produced by get_fars() and get_gescrss()). |
what |
What to count: crashes (the default), fatalities, injuries, or people involved. |
interval |
The interval in which to count: months or years (the default). |
where |
Where to count. Must be a list with any of the elements: states (can be 'all', full or abbreviated state names, or FIPS codes), region ('all', 'ne', 'mw', 's', or 'w'; short for northeast, midwest, south, and west), urb ('all', 'rural', or 'urban'). Any un-specified elements are set to 'all' by default. |
who |
The type of person to count: 'all' (default), 'drivers', 'passengers', 'pedestrians', or 'bicyclists'. |
involved |
Factors involved with the crash. Can be any of: 'distracted driver', 'police pursuit', 'motorcycle', 'pedalcyclist', 'bicyclist', 'pedestrian', 'pedbike', 'young driver', 'older driver', 'speeding', 'alcohol', 'drugs', 'hit and run', 'roadway departure', 'rollover', or 'large trucks'. NULL by default. |
filterOnly |
Logical, whether to only filter data or reduce to counts (FALSE by default). |
Either a filtered tibble (filterOnly=TRUE) or a tibble of counts (filterOnly=FALSE). If filterOnly=TRUE, the tibble that is returned is the 'flat' tibble from the input FARS object, filtered according to other parameters.
If 'df' is a GESCRSS object, the counts returned are the sum of the appropriate weights.
## Not run:
counts(get_fars(years = 2019), where = list(states = "Virginia", urb = "rural"))
## End(Not run)
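The note on GESCRSS objects (counts are sums of weights rather than row counts) can be illustrated with a minimal base-R sketch. It does not call rfars; the toy data frame and its column names ('year', 'weight') are assumptions made for illustration only.

```r
# Toy sample records; `year` and `weight` are hypothetical column names.
crss_like <- data.frame(
  year   = c(2019, 2019, 2020, 2020, 2020),
  weight = c(120.5, 80.0, 95.25, 60.0, 44.75)
)

# Unweighted: number of sampled records per year.
sampled <- aggregate(weight ~ year, data = crss_like, FUN = length)
names(sampled)[2] <- "n_sampled"

# Weighted: sum of the weights per year, which is the estimate a
# sample-based source such as GES/CRSS is meant to produce.
estimated <- aggregate(weight ~ year, data = crss_like, FUN = sum)
names(estimated)[2] <- "n_estimated"

weighted_counts <- merge(sampled, estimated, by = "year")
print(weighted_counts)
```

The two columns differ because each sampled record stands in for many real-world cases; only the weighted column is comparable to a census-style source such as FARS.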
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
distracted_driver(df)
df |
The FARS or GESCRSS data object to be searched. |
Download files from NHTSA, unzip, and prepare them.
download_fars(years, dest_raw, dest_prepd, states)
years |
Years to be downloaded, in yyyy (character or numeric formats) |
dest_raw |
Directory to store raw CSV files |
dest_prepd |
Directory to store prepared CSV files |
states |
(Optional) Inherits from get_fars() |
Raw files are downloaded from NHTSA.
Returns nothing directly to the current environment. Various CSV files are stored either in a temporary directory or in 'dir' as specified by the user.
Download files from NHTSA, unzip, and prepare them.
download_gescrss(years, dest_raw, dest_prepd, regions)
years |
Years to be downloaded, in yyyy (character or numeric formats) |
dest_raw |
Directory to store raw CSV files |
dest_prepd |
Directory to store prepared CSV files |
regions |
(Optional) Inherits from get_gescrss() |
Raw files are downloaded directly from NHTSA.
Returns nothing directly to the current environment. Various CSV files are stored either in a temporary directory or in 'dir' as specified by the user.
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
driver_age(df, age_min, age_max)
df |
The FARS or GESCRSS data object to be searched. |
age_min |
Lower bound on driver age (inclusive). |
age_max |
Upper bound on driver age (inclusive). |
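As a minimal sketch of what inclusive bounds mean here, the following base-R snippet filters a toy table of drivers. It is not the internal implementation; the data frame and the 'age' column are assumptions for illustration.

```r
# Toy driver records; `age` is a hypothetical column name.
drivers_toy <- data.frame(id = 1:5, age = c(15, 16, 20, 21, 70))

age_min <- 16
age_max <- 20

# Inclusive on both ends: ages equal to either bound are kept.
young <- drivers_toy[drivers_toy$age >= age_min & drivers_toy$age <= age_max, ]
print(young)
```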
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
drugs(df)
df |
The FARS or GESCRSS data object to be searched. |
A table describing each FARS variable name, value, and corresponding value label.
fars_codebook
A data frame with 132,454 rows and 8 variables:
source |
The source of the data (either FARS or GES/CRSS) |
year |
Years of the data element definition. |
file |
The data file that contains the given variable. |
name_ncsa |
The original name of the data element. |
name_rfars |
The modified data element name used in rfars |
label |
The label of the data element itself (not its constituent values). |
value |
The original value of the data element. |
value_label |
The de-coded value label. |
This codebook serves as a useful reference for researchers using FARS data. The 'source' variable is intended to help combine with the gescrss_codebook. Data elements are relatively stable but are occasionally discontinued, created anew, or modified. The 'year' variable helps indicate the availability of data elements, and differentiates between different definitions over time. Users should always check for discontinuities when tabulating cases.
The 'file' variable indicates the file in which the given data element originally appeared. Here, files refers to the SAS files downloaded from NHTSA. Most data elements stayed in their original file. Those that did not were moved to the multi_ files. For example, 'weather' originates from the 'accident' file, but appears in the multi_acc data object created by rfars.
The 'name_ncsa' variable describes the data element's name as assigned by NCSA (the organization within NHTSA that manages the database). To maximize compatibility between years and ease of use for programming, 'name_rfars' provides a cleaned naming convention (via janitor::clean_names()). Both names are provided here to help users find the corresponding entry in the Analytical User’s Manual but only the latter are used in the data produced by get_fars().
Each data element has a 'label', a more human-readable version of the element names. For example, the label for 'road_fnc' is 'Roadway Function Class'. These are not definitions but may provide enough information to help users conduct their analysis. Consult the Analytical User’s Manual for definitions and further details.
Each data element has multiple 'value'-'value_label' pairs: 'value' represents the original, non-human-readable value (usually a number), and 'value_label' represents the corresponding text value. For example, for 'road_fnc', 1 (the 'value') corresponds to 'Rural-Principal Arterial-Interstate' (the 'value_label'), 2 corresponds to 'Rural-Principal Arterial-Other', etc.
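The 'value'-'value_label' pairing can be sketched with a small lookup in base R. The toy codebook below copies only the two 'road_fnc' pairs quoted above; it is a stand-in for fars_codebook, and storing 'value' as a number is an assumption.

```r
# Toy stand-in for the codebook, using the two pairs quoted in the text.
codebook_toy <- data.frame(
  name_rfars  = c("road_fnc", "road_fnc"),
  value       = c(1, 2),
  value_label = c("Rural-Principal Arterial-Interstate",
                  "Rural-Principal Arterial-Other"),
  stringsAsFactors = FALSE
)

# Raw coded values as they might appear in a source file (3 is deliberately
# absent from the toy codebook).
raw_values <- c(2, 1, 1, 3)

# Restrict to one data element, then look up each raw value.
road_fnc_rows <- codebook_toy[codebook_toy$name_rfars == "road_fnc", ]
decoded <- road_fnc_rows$value_label[match(raw_values, road_fnc_rows$value)]
print(decoded)
```

Values with no matching codebook row come back as NA, which is one quick way to spot discontinued or redefined codes when combining years.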
"gescrss_codebook"
A dataset providing different ways to refer to states and counties.
geo_relations
A data frame with 3,142 rows and 6 variables:
2-digit FIPS code indicating a state
3-digit FIPS code indicating a county within a state
6-digit FIPS code indicating a tract within a county
2-character, capitalized state abbreviation
fully spelled and case-sensitive state name
abbreviated county name (usually minus the word 'County')
fully spelled and case-sensitive county name
fully spelled out and case-sensitive NHTSA region and constituent states
abbreviated NHTSA region (ne, mw, s, w)
https://www.census.gov/geographies/reference-files/2015/demo/popest/2015-fips.html
A table describing each GESCRSS variable name, value, and corresponding value label.
gescrss_codebook
A data frame with 85,907 rows and 8 variables:
source |
The source of the data (either FARS or GESCRSS) |
year |
Years of the data element definition. |
file |
The data file that contains the given variable. |
name_ncsa |
The original name of the data element. |
name_rfars |
The modified data element name used in rfars |
label |
The label of the data element itself (not its constituent values). |
value |
The original value of the data element. |
value_label |
The de-coded value label. |
This codebook serves as a useful reference for researchers using GES/CRSS data. The 'source' variable is intended to help combine with the fars_codebook. Data elements are relatively stable but are occasionally discontinued, created anew, or modified. The 'year' variable helps indicate the availability of data elements, and differentiates between different definitions over time. Users should always check for discontinuities when tabulating cases.
The 'file' variable indicates the file in which the given data element originally appeared. Here, files refers to the SAS files downloaded from NHTSA. Most data elements stayed in their original file. Those that did not were moved to the multi_ files. For example, 'weather' originates from the 'accident' file, but appears in the multi_acc data object created by rfars.
The 'name_ncsa' variable describes the data element's name as assigned by NCSA (the organization within NHTSA that manages the database). To maximize compatibility between years and ease of use for programming, 'name_rfars' provides a cleaned naming convention (via janitor::clean_names()). Both names are provided here to help users find the corresponding entry in the CRSS User Manual but only the latter are used in the data produced by get_gescrss().
Each data element has a 'label', a more human-readable version of the element names. For example, the label for 'harm_ev' is 'First Harmful Event'. These are not definitions but may provide enough information to help users conduct their analysis. Consult the CRSS User Manual for definitions and further details.
Each data element has multiple 'value'-'value_label' pairs: 'value' represents the original, non-human-readable value (usually a number), and 'value_label' represents the corresponding text value. For example, for 'harm_ev', 1 (the 'value') corresponds to 'Rollover/Overturn' (the 'value_label'), 2 corresponds to 'Fire/Explosion', etc.
"fars_codebook"
Bring FARS data into the current environment, whether by downloading it anew or by using pre-existing files.
get_fars( years = 2011:2022, states = NULL, dir = NULL, proceed = FALSE, cache = NULL )
years |
Years to be downloaded, in yyyy (character or numeric formats), currently limited to 2011-2022 (the default). |
states |
States to keep. Leave as NULL (the default) to keep all states. Can be specified as full state name (e.g. "Virginia"), abbreviation ("VA"), or FIPS code (51). |
dir |
Directory in which to search for or save a 'FARS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory. |
proceed |
Logical, whether or not to proceed with downloading files without asking for user permission (defaults to FALSE, thus asking permission) |
cache |
The name of an RDS file to save or use. If the specified file (e.g., 'myFARS.rds') exists in 'dir' it will be returned; if not, an RDS file of this name will be saved in 'dir' for quick use in subsequent calls. |
This function downloads raw data from NHTSA. If no directory (dir) is specified, SAS files are downloaded into a tempdir(), where they are also prepared, combined, and then brought into the current environment. If you specify a directory (dir), the function will look there for a 'FARS data' folder. If not found, it will be created and populated with raw and prepared SAS and RDS files. If the directory is found, the function makes sure all requested years are present and asks permission to download any missing years.
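The 'cache' behavior described in the arguments (return the RDS file if it exists in 'dir', otherwise build and save it) follows a common load-or-build pattern. The sketch below shows that pattern in base R; `load_or_build()` is a hypothetical helper, not an rfars function, and the real implementation may differ.

```r
# Hypothetical helper, not part of rfars: return a cached RDS file if it
# exists, otherwise build the object, save it, and return it.
load_or_build <- function(dir, cache, build) {
  path <- file.path(dir, cache)
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Use a throwaway directory so the sketch leaves no files behind in the project.
demo_dir <- tempfile("fars_cache_demo")
dir.create(demo_dir)

# First call builds and saves; second call must read the saved file,
# because its build function would raise an error if it were invoked.
first  <- load_or_build(demo_dir, "myFARS.rds",
                        function() list(flat = data.frame(id = 1:3)))
second <- load_or_build(demo_dir, "myFARS.rds",
                        function() stop("should not rebuild"))
```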
The object returned is a list with class 'FARS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Flat files are wide-formatted and presented at the person level. All crashes involve at least one motor vehicle, each of which may contain one or multiple people. These are the three entities of crash data. The flat files therefore repeat some data elements across multiple rows. Please conduct your analysis with your entity in mind.
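To make the point about entities concrete, here is a minimal base-R sketch on a toy person-level table. The column names ('id', 'veh_no', 'per_no', 'weather') are assumptions for illustration and may not match the flat tibble exactly.

```r
# Toy person-level table: one row per person, crash-level fields repeated.
flat_toy <- data.frame(
  id      = c(1, 1, 1, 2, 2),   # crash identifier
  veh_no  = c(1, 1, 2, 1, 1),   # vehicle within crash
  per_no  = c(1, 2, 1, 1, 2),   # person within vehicle
  weather = c("Rain", "Rain", "Rain", "Clear", "Clear"),
  stringsAsFactors = FALSE
)

n_people   <- nrow(flat_toy)
n_vehicles <- nrow(unique(flat_toy[, c("id", "veh_no")]))
n_crashes  <- length(unique(flat_toy$id))

# Tabulating a crash-level field on person rows counts people, not crashes.
by_person <- table(flat_toy$weather)

# Reduce to one row per crash before tabulating crash-level fields.
crash_level <- unique(flat_toy[, c("id", "weather")])
by_crash <- table(crash_level$weather)
```

Here the person-level table reports three "Rain" rows even though only one crash occurred in rain, which is the kind of overcount the paragraph above warns about.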
Some data elements can include multiple values for any data level (e.g., multiple weather conditions corresponding to the crash, or multiple crash factors related to vehicle or person). These elements have been collected in the yyyy_multi_[acc/veh/per].rds files in long format. These files contain crash, vehicle, and person identifiers, and two variables labelled 'name' and 'value'. These correspond to variable names from the raw data files and the corresponding values, respectively.
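The long 'name'/'value' layout can be sketched with a toy table in base R. The identifiers and value labels below are made up for illustration; only the 'name' and 'value' column names come from the description above.

```r
# Toy long-format table in the spirit of multi_acc: one row per
# (crash, data element, value) combination.
multi_acc_toy <- data.frame(
  id    = c(1, 1, 2),
  year  = c(2021, 2021, 2021),
  name  = c("weather", "weather", "weather"),
  value = c("Rain", "Fog, Smog, Smoke", "Clear"),
  stringsAsFactors = FALSE
)

# Find crashes where any recorded weather value is "Rain".
is_rain  <- multi_acc_toy$name == "weather" & multi_acc_toy$value == "Rain"
rain_ids <- unique(multi_acc_toy$id[is_rain])

# Collapse multiple values per crash into one string for display.
values_per_crash <- aggregate(value ~ id, data = multi_acc_toy,
                              FUN = paste, collapse = "; ")
print(values_per_crash)
```

Filtering on the long table and then joining the resulting ids back to the flat table keeps crashes with several values from being duplicated.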
The events tibble provides a sequence of events for all vehicles involved in the crash. See Crash Sequences vignette for an example.
Finally, the codebook tibble serves as a searchable codebook for all files of any given year.
Please review the FARS Analytical User's Manual.
A FARS data object (list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook), described below.
## Not run:
myFARS <- get_fars(years = 2021, states = "VA")
## End(Not run)
Bring GES/CRSS data into the current environment, whether by downloading it anew or by using pre-existing files.
get_gescrss( years = 2011:2022, regions = c("mw", "ne", "s", "w"), dir = NULL, proceed = FALSE, cache = NULL )
years |
Years to be downloaded, in yyyy (character or numeric formats), currently limited to 2011-2022 (the default). |
regions |
(Optional) Regions to keep: mw=midwest, ne=northeast, s=south, w=west. |
dir |
Directory in which to search for or save a 'GESCRSS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory. |
proceed |
Logical, whether or not to proceed with downloading files without asking for user permission (defaults to FALSE, thus asking permission) |
cache |
The name of an RDS file to save or use. If the specified file (e.g., 'myGESCRSS.rds') exists in 'dir' it will be returned; if not, an RDS file of this name will be saved in 'dir' for quick use in subsequent calls. |
This function downloads raw data from the GES and CRSS crash databases. If no directory (dir) is specified, raw CSV files are downloaded into a tempdir(), where they are also prepared, combined, and then brought into the current environment. If you specify a directory (dir), the function will look there for a 'GESCRSS data' folder. If not found, it will be created and populated with raw and prepared SAS and RDS files. If the directory is found, the function makes sure all requested years are present and asks permission to download any missing years.
The object returned is a list with class 'GESCRSS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Flat files are wide-formatted and presented at the person level. All crashes involve at least one motor vehicle, each of which may contain one or multiple people. These are the three entities of crash data. The flat files therefore repeat some data elements across multiple rows. Please conduct your analysis with your entity in mind.
Some data elements can include multiple values for any data level (e.g., multiple weather conditions corresponding to the crash, or multiple crash factors related to vehicle or person). These elements have been collected in the yyyy_multi_[acc/veh/per].rds files in long format. These files contain crash, vehicle, and person identifiers, and two variables labelled 'name' and 'value'. These correspond to variable names from the raw data files and the corresponding values, respectively.
The events tibble provides a sequence of events for all vehicles involved in the crash. See Crash Sequences vignette for an example.
The codebook tibble serves as a searchable codebook for all files of any given year.
Please review the CRSS Analytical User's Manual.
Regions are as follows:
mw = Midwest = OH, IN, IL, MI, WI, MN, ND, SD, NE, IA, MO, KS
ne = Northeast = PA, NJ, NY, NH, VT, RI, MA, ME, CT
s = South = MD, DE, DC, WV, VA, KY, TN, NC, SC, GA, FL, AL, MS, LA, AR, OK, TX
w = West = MT, ID, WA, OR, CA, NV, NM, AZ, UT, CO, WY, AK, HI
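The region definitions above can be turned into a state-to-region lookup. The following base-R sketch copies the four lists verbatim; the object names (`regions`, `state_to_region`) are hypothetical and not part of rfars.

```r
# Region membership copied from the list above.
regions <- list(
  mw = c("OH", "IN", "IL", "MI", "WI", "MN", "ND", "SD", "NE", "IA", "MO", "KS"),
  ne = c("PA", "NJ", "NY", "NH", "VT", "RI", "MA", "ME", "CT"),
  s  = c("MD", "DE", "DC", "WV", "VA", "KY", "TN", "NC", "SC", "GA", "FL",
         "AL", "MS", "LA", "AR", "OK", "TX"),
  w  = c("MT", "ID", "WA", "OR", "CA", "NV", "NM", "AZ", "UT", "CO", "WY",
         "AK", "HI")
)

# stack() turns the list into a two-column data frame (values, ind).
region_lookup <- stack(regions)

# Named character vector: state abbreviation -> region code.
state_to_region <- setNames(as.character(region_lookup$ind),
                            region_lookup$values)

print(state_to_region[c("VA", "CA", "NY", "OH")])
```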
A GESCRSS data object (a list with six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook).
## Not run:
myGESCRSS <- get_gescrss(years = 2021, regions = "s")
## End(Not run)
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
hit_and_run(df)
df |
The FARS or GESCRSS data object to be searched. |
An internal function that imports the multi_ files
import_multi(filename, where)
filename |
The filename (e.g. "multi_acc.csv") to be imported |
where |
The directory to search within |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
large_trucks(df)
df |
The FARS or GESCRSS data object to be searched. |
(Internal) Make id and year numeric
make_all_numeric(df)
df |
The input dataframe |
(Internal) Generate an ID variable
make_id(df)
df |
The dataframe from which to make the id |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
motorcycle(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedalcyclist(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedbike(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedestrian(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
police_pursuit(df)
df |
The FARS or GESCRSS data object to be searched. |
Prepare downloaded FARS files for use
prep_fars(y, wd, rawfiles, prepared_dir, states)
y |
year, to be passed from |
wd |
working directory, , to be passed from |
rawfiles |
dataframe translating filenames into standard terms,
to be passed from |
prepared_dir |
the location where prepared files will be saved,
to be passed from |
states |
(Optional) Inherits from get_fars() |
Produces six files: yyyy_flat.rds, yyyy_multi_acc.rds, yyyy_multi_veh.rds, yyyy_multi_per.rds, yyyy_events.rds, and codebook.rds
Prepare downloaded GES/CRSS files for use
prep_gescrss(y, wd, rawfiles, prepared_dir, regions)
y |
year, to be passed from |
wd |
working directory, , to be passed from |
rawfiles |
dataframe translating filenames into standard terms,
to be passed from |
prepared_dir |
the location where prepared files will be saved,
to be passed from |
regions |
(Optional) Inherits from get_gescrss() |
Produces six files: yyyy_flat.rds, yyyy_multi_acc.rds, yyyy_multi_veh.rds, yyyy_multi_per.rds, yyyy_events.rds, and codebook.rds
(Internal) Takes care of basic SAS file reading
read_basic_sas( x, wd, rawfiles, catfile = paste0(wd, "formats.sas7bcat"), imps = NULL, omits = NULL )
x |
The cleaned name of the data table (SAS7BDAT). |
wd |
The working directory for these files |
rawfiles |
The data frame connecting raw filenames to cleaned ones. |
catfile |
The location of the sas7bcat file |
imps |
A named list to be passed to use_imp(). Each item's name represents the non-imputed variable name; the item itself represents the related imputed variable. |
omits |
Character vector of columns to omit |
read_basic_sas_nocat
(Internal) Takes care of basic SAS file reading when the bcat file creates an issue
read_basic_sas_nocat(x, wd, rawfiles, imps = NULL, omits = NULL)
x |
The cleaned name of the data table (SAS7BDAT). |
wd |
The working directory for these files |
rawfiles |
The data frame connecting raw filenames to cleaned ones. |
imps |
A named list to be passed to use_imp(). Each item's name represents the non-imputed variable name; the item itself represents the related imputed variable. |
omits |
Character vector of columns to omit |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
road_depart(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
rollover(df)
df |
The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
speeding(df)
df |
The FARS or GESCRSS data object to be searched. |
Compile multiple years of prepared FARS data.
use_fars(dir, prepared_dir, cache)
dir |
Inherits from get_fars(). |
prepared_dir |
Inherits from get_fars(). |
cache |
Inherits from get_fars(). |
Returns an object of class 'FARS' which is a list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Compile multiple years of prepared GESCRSS data.
use_gescrss(dir, prepared_dir, cache)
dir |
Inherits from get_gescrss(). |
prepared_dir |
Inherits from get_gescrss(). |
cache |
Inherits from get_gescrss(). |
Returns an object of class 'GESCRSS' which is a list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
An internal function that uses imputed variables (present in many GES/CRSS tables)
use_imp(df, original, imputed, show = FALSE)
df |
The input data frame. |
original |
The original, non-imputed variable. |
imputed |
The imputed variable (often with an _im suffix). |
show |
Logical (FALSE by default); whether to show differences between original and imputed values. |
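The general idea of preferring an imputed variable over its original can be sketched in base R. `swap_in_imputed()` below is a hypothetical stand-in that mirrors the argument names of use_imp(); whether the real function overwrites the original column and drops the imputed one is an assumption, as is the use of 999 as an "unknown" code in the toy data.

```r
# Hypothetical stand-in for the idea behind use_imp(); the internal
# function may behave differently.
swap_in_imputed <- function(df, original, imputed, show = FALSE) {
  if (show) {
    # Print only the rows where the imputed value differs from the original.
    changed <- df[[original]] != df[[imputed]]
    print(df[changed, c(original, imputed)])
  }
  df[[original]] <- df[[imputed]]  # keep the imputed values under the original name
  df[[imputed]]  <- NULL           # drop the now-redundant imputed column
  df
}

# Toy data: 999 plays the role of an "unknown" code that imputation replaced.
toy <- data.frame(
  age    = c(34, 999, 51),
  age_im = c(34, 42, 51)
)

result <- swap_in_imputed(toy, "age", "age_im", show = TRUE)
```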
(Internal) Validate user-provided list of states
validate_states(states)
states |
States specified in get_fars, prep_fars, or counts |
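Since states may be given as a full name, an abbreviation, or a FIPS code, validation amounts to resolving each input against a lookup table. The base-R sketch below uses a three-row toy table and a hypothetical `normalize_state()` helper; it is not the internal validate_states() logic.

```r
# Toy lookup with three states; a full table would cover all of them.
fips_toy <- data.frame(
  fips = c(24, 37, 51),
  abbr = c("MD", "NC", "VA"),
  name = c("Maryland", "North Carolina", "Virginia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: accept a name, abbreviation, or FIPS code and
# return the full state name, or stop with an error if nothing matches.
normalize_state <- function(x) {
  x <- as.character(x)
  hit <- which(fips_toy$name == x |
               fips_toy$abbr == x |
               as.character(fips_toy$fips) == x)
  if (length(hit) != 1) stop("Unrecognized state: ", x)
  fips_toy$name[hit]
}

# The three accepted spellings of the same state resolve to one name.
normalized <- vapply(list("Virginia", "VA", 51), normalize_state, character(1))
print(normalized)
```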