| Title: | Download and Analyze Crash Data |
|---|---|
| Description: | Easily Download Analysis-Ready Crash Data from the U.S. National Highway Traffic Safety Administration. |
| Authors: | Steve Jackson [aut, cre] (ORCID: <https://orcid.org/0000-0002-3337-7846>) |
| Maintainer: | Steve Jackson <[email protected]> |
| License: | CC0 |
| Version: | 2.0.4 |
| Built: | 2026-06-06 06:15:37 UTC |
| Source: | https://github.com/s87jackson/rfars |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
alcohol(df)
| df | The FARS or GESCRSS data object to be searched. |
Pre-computed annual crash counts from FARS (fatal crashes) and CRSS (general crash estimates) databases for 2015-2024, broken down by various risk factors and vulnerable road user categories.
annual_counts
A tibble with 340 rows and 9 variables:
Year (2015-2024)
Month, if included in interval, as the three-letter abbreviation and an ordered factor (Jan=1, Feb=2, etc.)
Count unit - currently only "crashes"
Geographic scope - "all" for national-level data
Regional scope - "all" for national-level data
Urban/rural classification - "all" for combined data
Person type - "all" for all person types
Risk factor or crash type. Options include:
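"any" - All crashes (general counts)
"each" - Each factor listed below, separately
"alcohol" - Alcohol-involved crashes
"bicyclist" - Crashes involving bicyclists
"distracted driver" - Distracted driving crashes
"drugs" - Drug-involved crashes
"hit and run" - Hit-and-run crashes
"large trucks" - Large truck-involved crashes
"motorcycle" - Motorcycle crashes
"older driver" - Crashes involving older drivers
"pedalcyclist" - Crashes involving pedalcyclists
"pedbike" - Pedestrian and bicyclist crashes combined
"pedestrian" - Pedestrian crashes
"police pursuit" - Police pursuit-related crashes
"roadway departure" - Roadway departure crashes
"rollover" - Rollover crashes
"speeding" - Speed-related crashes
"young driver" - Crashes involving young drivers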
Count of crashes. FARS counts represent actual fatal crashes; CRSS counts represent weighted estimates of all crashes
This dataset provides quick access to national-level annual crash counts without needing to download and process the full datasets. It combines data from two NHTSA databases:
FARS - Fatal crashes (actual counts)
CRSS - General crashes (weighted estimates)
The data can be reproduced by calling counts() on downloaded FARS and CRSS data with involved = "any" and involved = "each".
See also: counts(), for generating custom counts from downloaded data.
## Not run:
# View total crashes over time by data source
library(dplyr)
library(ggplot2)

annual_counts %>%
  filter(involved == "any") %>%
  ggplot(aes(x = year, y = n, fill = source)) +
  geom_col(position = "dodge") +
  labs(title = "Annual Crash Counts by Data Source", x = "Year", y = "Number of Crashes")

# Compare risk factor trends in fatal crashes
annual_counts %>%
  filter(source == "FARS", involved %in% c("alcohol", "speeding", "distracted driver")) %>%
  ggplot(aes(x = year, y = n, color = involved)) +
  geom_line() +
  labs(title = "Fatal Crash Trends by Risk Factor", x = "Year", y = "Fatal Crashes")
## End(Not run)
(Internal) Append RDS files
appendRDS(object, file, wd)
| object | The object to save or append |
| file | The name of the file to be saved |
| wd | The directory to check |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
bicyclist(df)
| df | The FARS or GESCRSS data object to be searched. |
Test if internet connection is available by attempting to reach a reliable host. This function is used to gracefully handle cases where internet resources are not available.
check_internet_connection()
Logical indicating whether internet connection is available
Compare counts generated by counts()
compare_counts(
  df,
  interval = c("year", "month")[1],
  what = c("crashes", "fatalities", "injuries", "people")[1],
  where = list(states = "all", region = c("all", "ne", "mw", "s", "w")[1], urb = c("all", "rural", "urban")[1]),
  who = c("all", "drivers", "passengers", "bicyclists", "pedestrians")[1],
  involved = NULL,
  what2 = what,
  where2 = where,
  who2 = who,
  involved2 = involved
)
| df | The input FARS object. |
| interval | The interval in which to count: months or years. |
| what | What to count: crashes, fatalities, injuries, or people involved. |
| where | Where to count, a list with up to three elements: states ("all" by default), region ("all"), urb ("all"). |
| who | The type of person to count: all (default), drivers, passengers, pedestrians, or bicyclists. |
| involved | Factors involved with the crash. Can be any of: distracted driver, police pursuit, motorcycle, pedalcyclist, bicyclist, pedestrian, pedbike, young driver, older driver, speeding, alcohol, drugs, hit and run, roadway departure, rollover, or large trucks. |
| what2 | Comparison point for 'what' (set to 'what' unless specified). |
| where2 | Comparison point for 'where' (set to 'where' unless specified). |
| who2 | Comparison point for 'who' (set to 'who' unless specified). |
| involved2 | Comparison point for 'involved' (set to 'involved' unless specified). |
A tibble of counts.
## Not run:
compare_counts(
  get_fars(years = 2020, states = "Virginia"),
  where = list(urb = "rural"),
  where2 = list(urb = "urban")
)
## End(Not run)
Use FARS or GES/CRSS data to generate commonly requested counts.
counts(
  df,
  what = c("crashes", "fatalities", "injuries", "people")[1],
  interval = c("year", "month")[1],
  where = list(states = "all", region = c("all", "ne", "mw", "s", "w")[1], urb = c("all", "rural", "urban")[1]),
  who = c("all", "drivers", "passengers", "bicyclists", "pedestrians")[1],
  involved = c("any", "each", "alcohol", "bicyclist", "distracted driver", "drugs", "hit and run", "large trucks", "motorcycle", "older driver", "pedalcyclist", "pedbike", "pedestrian", "police pursuit", "roadway departure", "rollover", "speeding", "young driver")[1],
  filterOnly = FALSE
)
| df | The input data object (must be of class 'FARS' or 'GESCRSS', as produced by get_fars() and get_gescrss()). |
| what | What to count: crashes (the default), fatalities, injuries, or people involved. |
| interval | The interval in which to count: months or years (the default). |
| where | Where to count. Must be a list with any of the elements: states (can be 'all', full or abbreviated state names, or FIPS codes), region ('all', 'ne', 'mw', 's', or 'w'; short for northeast, midwest, south, and west), urb ('all', 'rural', or 'urban'). Any unspecified elements are set to 'all' by default. |
| who | The type of person to count: 'all' (default), 'drivers', 'passengers', 'pedestrians', or 'bicyclists'. |
| involved | Factors involved with the crash: 'any' (the default, produces general counts), 'each' (produces separate counts for each factor), 'distracted driver', 'police pursuit', 'motorcycle', 'pedalcyclist', 'bicyclist', 'pedestrian', 'pedbike', 'young driver', 'older driver', 'speeding', 'alcohol', 'drugs', 'hit and run', 'roadway departure', 'rollover', or 'large trucks'. |
| filterOnly | Logical, whether to only filter data or reduce to counts (FALSE by default). |
Either a filtered tibble (filterOnly=TRUE) or a tibble of counts (filterOnly=FALSE). If filterOnly=TRUE, the tibble that is returned is the 'flat' tibble from the input FARS object, filtered according to other parameters.
If 'df' is a GESCRSS object, the counts returned are the sum of the appropriate weights.
## Not run:
counts(get_fars(years = 2019), where = list(states = "Virginia", urb = "rural"))
## End(Not run)
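The difference between a plain count and a weighted estimate can be sketched with a toy table. This is only an illustration in base R, not the package's implementation: the data frame and its column names (year, weight) are hypothetical, and the weights are made-up numbers.

```r
# Toy crash-level table; names and values are hypothetical.
crashes <- data.frame(
  year   = c(2020, 2020, 2021, 2021, 2021),
  weight = c(120.5, 80.0, 60.25, 200.0, 39.75)
)

# Census-style count: each row is one crash.
n_unweighted <- table(crashes$year)

# Sample-style estimate: sum the weights within each year.
n_weighted <- tapply(crashes$weight, crashes$year, sum)

n_unweighted
n_weighted
```

With sample data such as GES/CRSS, the weighted sum is the figure to report; the row count only says how many sampled records contributed to it.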
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
distracted_driver(df)
| df | The FARS or GESCRSS data object to be searched. |
Download files from NHTSA, unzip, and prepare them.
download_fars(years, dest_raw, dest_prepd, states)
| years | Years to be downloaded, in yyyy (character or numeric formats) |
| dest_raw | Directory to store raw CSV files |
| dest_prepd | Directory to store prepared CSV files |
| states | (Optional) Inherits from get_fars() |
Raw files are downloaded from NHTSA.
Returns nothing to the current environment. CSV files are stored either in a temporary directory or in the directory (dir) specified by the user.
Download files from NHTSA, unzip, and prepare them.
download_gescrss(years, dest_raw, dest_prepd, regions)
| years | Years to be downloaded, in yyyy (character or numeric formats) |
| dest_raw | Directory to store raw CSV files |
| dest_prepd | Directory to store prepared CSV files |
| regions | (Optional) Inherits from get_gescrss() |
Raw files are downloaded directly from NHTSA.
Returns nothing to the current environment. CSV files are stored either in a temporary directory or in the directory (dir) specified by the user.
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
driver_age(df, age_min, age_max)
| df | The FARS or GESCRSS data object to be searched. |
| age_min | Lower bound on driver age (inclusive). |
| age_max | Upper bound on driver age (inclusive). |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
drugs(df)
| df | The FARS or GESCRSS data object to be searched. |
A table describing each FARS variable name, value, and corresponding value label.
fars_codebook
A data frame with 15,951 rows and 19 variables:
The source of the data (either FARS or GES/CRSS).
The data file that contains the given variable.
The original name of the data element.
The modified data element name used in rfars
The label of the data element itself (not its constituent values).
The data element's definition, pulled from the Analytical User Manual.
Additional information on the data element, pulled from the Analytical User Manual.
The original value of the data element.
The de-coded value label.
Indicator: 1 if valid for 2015, NA otherwise.
Indicator: 1 if valid for 2016, NA otherwise.
Indicator: 1 if valid for 2017, NA otherwise.
Indicator: 1 if valid for 2018, NA otherwise.
Indicator: 1 if valid for 2019, NA otherwise.
Indicator: 1 if valid for 2020, NA otherwise.
Indicator: 1 if valid for 2021, NA otherwise.
Indicator: 1 if valid for 2022, NA otherwise.
Indicator: 1 if valid for 2023, NA otherwise.
Indicator: 1 if valid for 2024, NA otherwise.
This codebook serves as a useful reference for researchers using FARS data. The 'source' variable is intended to help combine with the gescrss_codebook. Data elements are relatively stable but are occasionally discontinued, created anew, or modified. The 'year' variable helps indicate the availability of data elements, and differentiates between different definitions over time. Users should always check for discontinuities when tabulating cases.
The 'file' variable indicates the file in which the given data element originally appeared. Here, files refers to the SAS files downloaded from NHTSA. Most data elements stayed in their original file. Those that did not were moved to the multi_ files. For example, 'weather' originates from the 'accident' file, but appears in the multi_acc data object created by rfars.
The 'name_ncsa' variable describes the data element's name as assigned by NCSA (the organization within NHTSA that manages the database). To maximize compatibility between years and ease of use for programming, 'name_rfars' provides a cleaned naming convention (via janitor::clean_names()).
Each data element has a 'label', a more human-readable version of the element names. For example, the label for 'road_fnc' is 'Roadway Function Class'. These are not definitions but may provide enough information to help users conduct their analysis. Consult the Analytical User’s Manual for definitions and further details.
'Definition' and 'Additional Information' were extracted from the Analytical User’s Manual.
Each data element has multiple 'value'-'value_label' pairs: 'value' represents the original, non-human-readable value (usually a number), and 'value_label' represents the corresponding text value. For example, for 'road_fnc', 1 (the 'value') corresponds to 'Rural-Principal Arterial-Interstate' (the 'value_label'), 2 corresponds to 'Rural-Principal Arterial-Other', etc.
Codebooks are automatically generated by extracting SAS format catalogs (.sas7bcat files) and VALUE statements from .sas files during data processing, then consolidating variable names, labels, and value-label mappings across all years into searchable reference tables. Source files are published by NHTSA.
See also: gescrss_codebook
head(rfars::fars_codebook)
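The 'value'-'value_label' pairing described above amounts to a lookup table. The following is a minimal sketch in base R using a two-row toy codebook built from the 'road_fnc' example; the `raw` table and its `id` column are hypothetical, and the real codebook has many more columns and rows.

```r
# Toy codebook with the columns named in the text above.
codebook <- data.frame(
  name_rfars  = c("road_fnc", "road_fnc"),
  value       = c(1, 2),
  value_label = c("Rural-Principal Arterial-Interstate",
                  "Rural-Principal Arterial-Other"),
  stringsAsFactors = FALSE
)

# Hypothetical coded data to be labelled.
raw <- data.frame(id = 1:3, road_fnc = c(2, 1, 2))

# Restrict the codebook to one data element, then match codes to labels.
lookup <- codebook[codebook$name_rfars == "road_fnc", ]
raw$road_fnc_label <- lookup$value_label[match(raw$road_fnc, lookup$value)]

raw
```

Codes with no codebook entry come back as NA from match(), which makes gaps easy to spot when a data element changes between years.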
A dataset providing different ways to refer to states and counties.
geo_relations
A data frame with 3,142 rows and 6 variables:
2-digit FIPS code indicating a state
3-digit FIPS code indicating a county within a state
6-digit FIPS code indicating a tract within a county
2-character, capitalized state abbreviation
fully spelled and case-sensitive state name
abbreviated county name (usually minus the word 'County')
fully spelled and case-sensitive county name
fully spelled out and case-sensitive NHTSA region and constituent states
abbreviated NHTSA region (ne, mw, s, w)
https://www.census.gov/geographies/reference-files/2015/demo/popest/2015-fips.html
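FIPS codes are fixed-width, so leading zeros matter when state and county codes are combined or joined to other data. Here is a minimal sketch in base R, assuming the codes are held as integers; how geo_relations actually stores them is not stated here, and the county code below is a hypothetical value chosen for illustration.

```r
state_fips  <- 51L  # Virginia, as given in the get_fars() documentation
county_fips <- 59L  # hypothetical county code

# Pad to 2 and 3 digits, then concatenate into a 5-digit county identifier.
county_id <- sprintf("%02d%03d", state_fips, county_fips)

# A single-digit state code needs its leading zero restored.
padded_state <- sprintf("%02d", 6L)

county_id
padded_state
```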
A table describing each GESCRSS variable name, value, and corresponding value label.
gescrss_codebook
A data frame with 34,662 rows and 8 variables:
The source of the data (either FARS or GESCRSS).
The data file that contains the given variable.
The original name of the data element.
The modified data element name used in rfars
The label of the data element itself (not its constituent values).
The data element's definition, pulled from the Analytical User Manual
Additional information on the data element, pulled from the Analytical User Manual.
The original value of the data element.
The de-coded value label.
Indicator: 1 if valid for 2015, NA otherwise.
Indicator: 1 if valid for 2016, NA otherwise.
Indicator: 1 if valid for 2017, NA otherwise.
Indicator: 1 if valid for 2018, NA otherwise.
Indicator: 1 if valid for 2019, NA otherwise.
Indicator: 1 if valid for 2020, NA otherwise.
Indicator: 1 if valid for 2021, NA otherwise.
Indicator: 1 if valid for 2022, NA otherwise.
Indicator: 1 if valid for 2023, NA otherwise.
Indicator: 1 if valid for 2024, NA otherwise.
This codebook serves as a useful reference for researchers using GES/CRSS data. The 'source' variable is intended to help combine with the fars_codebook. Data elements are relatively stable but are occasionally discontinued, created anew, or modified. The 'year' variable helps indicate the availability of data elements, and differentiates between different definitions over time. Users should always check for discontinuities when tabulating cases.
The 'file' variable indicates the file in which the given data element originally appeared. Here, files refers to the SAS files downloaded from NHTSA. Most data elements stayed in their original file. Those that did not were moved to the multi_ files. For example, 'weather' originates from the 'accident' file, but appears in the multi_acc data object created by rfars.
The 'name_ncsa' variable describes the data element's name as assigned by NCSA (the organization within NHTSA that manages the database). To maximize compatibility between years and ease of use for programming, 'name_rfars' provides a cleaned naming convention (via janitor::clean_names()).
Each data element has a 'label', a more human-readable version of the element names. For example, the label for 'harm_ev' is 'First Harmful Event'. These are not definitions but may provide enough information to help users conduct their analysis. Consult the CRSS User Manual for definitions and further details.
'Definition' and 'Additional Information' were extracted from the Analytical User’s Manual.
Each data element has multiple 'value'-'value_label' pairs: 'value' represents the original, non-human-readable value (usually a number), and 'value_label' represents the corresponding text value. For example, for 'harm_ev', 1 (the 'value') corresponds to 'Rollover/Overturn' (the 'value_label'), 2 corresponds to 'Fire/Explosion', etc.
Codebooks are automatically generated by extracting SAS format catalogs (.sas7bcat files) and VALUE statements from .sas files during data processing, then consolidating variable names, labels, and value-label mappings across all years into searchable reference tables. Source files are published by NHTSA.
See also: fars_codebook
head(rfars::gescrss_codebook)
Bring FARS data into the current environment, whether by downloading it anew or by using pre-existing files.
get_fars(
  years = 2015:2024,
  states = NULL,
  source = c("zenodo", "nhtsa")[1],
  proceed = FALSE,
  dir = NULL,
  cache = NULL
)
| years | Years to be downloaded, in yyyy (character or numeric formats, defaults to last 10 years). |
| states | States to keep. Leave as NULL (the default) to keep all states. Can be specified as full state name (e.g. "Virginia"), abbreviation ("VA"), or FIPS code (51). |
| source | The source of the data: 'zenodo' (the default) pulls the prepared dataset from Zenodo, 'nhtsa' pulls the raw files from NHTSA's FTP site and prepares them on your machine. 'zenodo' is much faster and provides the same dataset produced by using source='nhtsa' but is limited to the most recent 10 years of data. |
| proceed | Logical, whether or not to proceed with downloading files without asking for user permission (defaults to FALSE, thus asking permission) |
| dir | Directory in which to search for or save a 'FARS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory. Required if cache is specified. |
| cache | The name of an RDS file to save or use (e.g., 'myFARS.rds'). If the file exists in 'dir', it will be returned directly. If not, data will be downloaded and an RDS file of this name will be saved in 'dir'. Requires 'dir' to be specified. |
This function provides the FARS database for the specified years and states. By default, it pulls from a Zenodo repository for speed and memory efficiency. It can also pull the raw files from NHTSA and process them in memory, or use an RDS file saved on your machine.
If source = 'nhtsa' and no directory (dir) is specified, SAS files are downloaded into a tempdir(), where they are also prepared, combined, and then brought into the current environment. If you specify a directory (dir), the function will look there for a 'FARS data' folder. If not found, it will be created and populated with raw and prepared SAS and RDS files, otherwise the function makes sure all requested years are present and asks permission to download any missing years.
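The dir and cache behavior described above follows a common read-or-build pattern. The sketch below shows that pattern in base R under stated assumptions: load_or_build() and build_fn() are hypothetical helpers written for this illustration and are not part of rfars, and a small data frame stands in for the downloaded data.

```r
# Return the cached object if the RDS file exists; otherwise build and save it.
load_or_build <- function(dir, cache, build) {
  path <- file.path(dir, cache)
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Demonstrate with a temporary directory and a stand-in for the slow step.
demo_dir <- tempfile("cache_demo")
dir.create(demo_dir)

calls <- 0
build_fn <- function() {
  calls <<- calls + 1  # track how often the slow step actually runs
  data.frame(x = 1:3)
}

first  <- load_or_build(demo_dir, "demo.rds", build_fn)  # builds and saves
second <- load_or_build(demo_dir, "demo.rds", build_fn)  # reads from disk
calls
```

The second call never invokes build_fn(), which is the saving that specifying both dir and cache is meant to provide on repeated runs.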
The object returned is a list with class 'FARS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Flat files are wide-formatted and presented at the person level. All crashes involve at least one motor vehicle, each of which may contain one or multiple people. These are the three entities of crash data. The flat files therefore repeat some data elements across multiple rows. Please conduct your analysis with your entity in mind.
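To illustrate why the entity matters, here is a minimal sketch in base R using a toy person-level table. The column names id, veh_no, and per_no are hypothetical stand-ins for crash, vehicle, and person identifiers and are not taken from the package.

```r
# Toy person-level table: one row per person (column names are hypothetical).
flat <- data.frame(
  id     = c(1, 1, 1, 2, 2),  # crash identifier
  veh_no = c(1, 1, 2, 1, 1),  # vehicle within crash
  per_no = c(1, 2, 1, 1, 2)   # person within vehicle
)

n_people   <- nrow(flat)                               # every row is a person
n_vehicles <- nrow(unique(flat[, c("id", "veh_no")]))  # distinct crash-vehicle pairs
n_crashes  <- length(unique(flat$id))                  # distinct crashes

c(people = n_people, vehicles = n_vehicles, crashes = n_crashes)
```

Tabulating a crash-level variable directly on such a table would count each crash once per person involved, so deduplicating to the intended entity first avoids inflated totals.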
Some data elements can include multiple values for any data level (e.g., multiple weather conditions corresponding to the crash, or multiple crash factors related to vehicle or person). These elements have been collected in the yyyy_multi_[acc/veh/per].rds files in long format. These files contain crash, vehicle, and person identifiers, and two variables labelled name and value. These correspond to variable names from the raw data files and the corresponding values, respectively.
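Working with the long name/value layout can be sketched as follows. This is an illustration in base R on a toy table: the identifier column name (id) and the weather values are hypothetical, and real multi_ tibbles carry additional identifier columns.

```r
# Toy long-format table shaped like the multi_ tibbles described above.
multi_acc <- data.frame(
  id    = c(1, 1, 2, 3),
  name  = c("weather", "weather", "weather", "weather"),
  value = c("Rain", "Fog", "Clear", "Rain"),
  stringsAsFactors = FALSE
)

# Crashes with at least one "Rain" entry; unique() avoids double counting
# a crash that has several matching rows.
is_rain  <- multi_acc$name == "weather" & multi_acc$value == "Rain"
rain_ids <- unique(multi_acc$id[is_rain])

# Collapse to one string per crash for display alongside crash-level data.
weather   <- multi_acc[multi_acc$name == "weather", ]
collapsed <- tapply(weather$value, weather$id, paste, collapse = "; ")

rain_ids
collapsed
```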
The events tibble provides a sequence of events for all vehicles involved in the crash. See Crash Sequences vignette for an example.
Finally, the codebook tibble serves as a searchable codebook for all files of any given year.
Please review the FARS Analytical User's Manual
A FARS data object (list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook), described above.
## Not run:
# Use defaults to get 10 years of national data
myFARS <- get_fars()

# Get a single year of data
myFARS <- get_fars(2023)

# Get data for one state
myFARS <- get_fars(states = "VA")
## End(Not run)
Bring GES/CRSS data into the current environment, whether by downloading it anew or by using pre-existing files.
get_gescrss(
  years = 2015:2024,
  regions = c("mw", "ne", "s", "w"),
  source = c("zenodo", "nhtsa")[1],
  proceed = FALSE,
  dir = NULL,
  cache = NULL
)
| years | Years to be downloaded, in yyyy (character or numeric formats, defaults to last 10 years). |
| regions | (Optional) Regions to keep: mw=midwest, ne=northeast, s=south, w=west. |
| source | The source of the data: 'zenodo' (the default) pulls the prepared dataset from Zenodo, 'nhtsa' pulls the raw files from NHTSA's FTP site and prepares them on your machine. 'zenodo' is much faster and provides the same dataset produced by using source='nhtsa'. |
| proceed | Logical, whether or not to proceed with downloading files without asking for user permission (defaults to FALSE, thus asking permission) |
| dir | Directory in which to search for or save a 'GESCRSS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory. Required if cache is specified. |
| cache | The name of an RDS file to save or use (e.g., 'myCRSS.rds'). If the file exists in 'dir', it will be returned directly. If not, data will be downloaded and an RDS file of this name will be saved in 'dir'. Requires 'dir' to be specified. |
This function provides the GES/CRSS database for the specified years and regions. By default, it pulls from a Zenodo repository for speed and memory efficiency. It can also pull the raw files from NHTSA and process them in memory, or use an RDS file saved on your machine.
If source = 'nhtsa' and no directory (dir) is specified, SAS files are downloaded into a tempdir(), where they are also prepared, combined, and then brought into the current environment. If you specify a directory (dir), the function will look there for a 'GESCRSS data' folder. If not found, it will be created and populated with raw and prepared SAS and RDS files, otherwise the function makes sure all requested years are present and asks permission to download any missing years.
The object returned is a list with class 'GESCRSS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Flat files are wide-formatted and presented at the person level. All crashes involve at least one motor vehicle, each of which may contain one or multiple people. These are the three entities of crash data. The flat files therefore repeat some data elements across multiple rows. Please conduct your analysis with your entity in mind.
Some data elements can include multiple values for any data level (e.g., multiple weather conditions corresponding to the crash, or multiple crash factors related to vehicle or person). These elements have been collected in the yyyy_multi_[acc/veh/per].rds files in long format. These files contain crash, vehicle, and person identifiers, and two variables labelled name and value. These correspond to variable names from the raw data files and the corresponding values, respectively.
The events tibble provides a sequence of events for all vehicles involved in the crash. See Crash Sequences vignette for an example.
The codebook tibble serves as a searchable codebook for all files of any given year.
Please review the CRSS Analytical User's Manual
Regions are as follows:
mw = Midwest = OH, IN, IL, MI, WI, MN, ND, SD, NE, IA, MO, KS
ne = Northeast = PA, NJ, NY, NH, VT, RI, MA, ME, CT
s = South = MD, DE, DC, WV, VA, KY, TN, NC, SC, GA, FL, AL, MS, LA, AR, OK, TX
w = West = MT, ID, WA, OR, CA, NV, NM, AZ, UT, CO, WY, AK, HI
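The region definitions above can be turned into a state-to-region lookup. This is a minimal sketch in base R built only from the list above; the object names (regions, region_of) are chosen for illustration and are not objects exported by the package.

```r
# Region membership, copied from the definitions above.
regions <- list(
  mw = c("OH", "IN", "IL", "MI", "WI", "MN", "ND", "SD", "NE", "IA", "MO", "KS"),
  ne = c("PA", "NJ", "NY", "NH", "VT", "RI", "MA", "ME", "CT"),
  s  = c("MD", "DE", "DC", "WV", "VA", "KY", "TN", "NC", "SC", "GA", "FL",
         "AL", "MS", "LA", "AR", "OK", "TX"),
  w  = c("MT", "ID", "WA", "OR", "CA", "NV", "NM", "AZ", "UT", "CO", "WY",
         "AK", "HI")
)

# Invert the list: a named character vector from state abbreviation to region.
region_of <- setNames(
  rep(names(regions), lengths(regions)),
  unlist(regions, use.names = FALSE)
)

region_of[c("VA", "CA", "NE")]
```

Note that the lookup is case-sensitive: "NE" is the state of Nebraska (in the mw region), while "ne" is the northeast region code.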
A GESCRSS data object (a list with six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook).
## Not run:
# Use defaults to get 10 years of national data
myCRSS <- get_gescrss()

# Get a single year of data
myCRSS <- get_gescrss(2023)

# Get data for one region
myCRSS <- get_gescrss(regions = "s")
## End(Not run)
(Internal) Check SAS attributes
get_sas_attrs(data)
| data | An object produced by haven::read_sas() |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
hit_and_run(df)
| df | The FARS or GESCRSS data object to be searched. |
An internal function that imports the multi_ files
import_multi(filename, where)
| filename | The filename (e.g. "multi_acc.csv") to be imported |
| where | The directory to search within |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
large_trucks(df)
| df | The FARS or GESCRSS data object to be searched. |
(Internal) Make id and year numeric
make_all_numeric(df)
| df | The input dataframe |
(Internal) Generate an ID variable
make_id(df)
| df | The dataframe from which to make the id |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
motorcycle(df)
| df | The FARS or GESCRSS data object to be searched. |
(Internal) Parse formats.sas instead of using a .sas7bcat file
parse_sas_format(file_path)
| file_path | The path of the formats.sas file |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedalcyclist(df)
| df | The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedbike(df)
| df | The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
pedestrian(df)
| df | The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
police_pursuit(df)
| df | The FARS or GESCRSS data object to be searched. |
Prepare downloaded FARS files for use
prep_fars(y, wd, rawfiles, prepared_dir, states)
| y | year, to be passed from |
| wd | working directory, to be passed from |
| rawfiles | dataframe translating filenames into standard terms, to be passed from |
| prepared_dir | the location where prepared files will be saved, to be passed from |
| states | (Optional) Inherits from get_fars() |
Produces six files: yyyy_flat.rds, yyyy_multi_acc.rds, yyyy_multi_veh.rds, yyyy_multi_per.rds, yyyy_events.rds, and codebook.rds
Prepare downloaded GES/CRSS files for use
prep_gescrss(y, wd, rawfiles, prepared_dir, regions)
| y | year, to be passed from |
| wd | working directory, to be passed from |
| rawfiles | dataframe translating filenames into standard terms, to be passed from |
| prepared_dir | the location where prepared files will be saved, to be passed from |
| regions | (Optional) Inherits from get_gescrss() |
Produces six files: yyyy_flat.rds, yyyy_multi_acc.rds, yyyy_multi_veh.rds, yyyy_multi_per.rds, yyyy_events.rds, and codebook.rds
(Internal) Takes care of basic SAS file reading
read_basic_sas(x, wd, rawfiles, catfile, imps = NULL, omits = NULL)
| x | The cleaned name of the data table (SAS7BDAT). |
| wd | The working directory for these files |
| rawfiles | The data frame connecting raw filenames to cleaned ones. |
| catfile | The location of the sas7bcat file |
| imps | A named list to be passed to use_imp(). Each item's name represents the non-imputed variable name; the item itself represents the related imputed variable. |
| omits | Character vector of columns to omit |
See also: read_basic_sas_nocat
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
road_depart(df)
| df | The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
rollover(df)
| df | The FARS or GESCRSS data object to be searched. |
These internal functions take the FARS object created by use_fars and look for various cases, such as distracted or drowsy drivers.
speeding(df)
| df | The FARS or GESCRSS data object to be searched. |
Compile multiple years of prepared FARS data.
use_fars(dir, prepared_dir, cache)
| dir | Inherits from get_fars(). |
| prepared_dir | Inherits from get_fars(). |
| cache | Inherits from get_fars(). |
Returns an object of class 'FARS' which is a list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Compile multiple years of prepared GESCRSS data.
use_gescrss(dir, prepared_dir, cache)
| dir | Inherits from get_gescrss(). |
| prepared_dir | Inherits from get_gescrss(). |
| cache | Inherits from get_gescrss(). |
Returns an object of class 'GESCRSS' which is a list of six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
An internal function that uses imputed variables (present in many GES/CRSS tables)
use_imp(df, original, imputed, show = FALSE)
| df | The input data frame. |
| original | The original, non-imputed variable. |
| imputed | The imputed variable (often with an _im suffix). |
| show | Logical (FALSE by default), whether to show differences between original and imputed values. |
(Internal) Validate user-provided list of states
validate_states(states)
| states | States specified in get_fars, prep_fars, or counts |