heart_data_pipeline

public
main

AI Architecture

Trigger

User executes R Markdown document with specified data release folder

1
Processing (R)

Import SAF datasets

Loads multiple SRTR SAF files, stored in SAS format, from the specified release folder. These include cand_thor, stathist_thor, tx_hr, the justification forms, and the risk stratification data.

read_sas, list.files
haven, base
heart_data_pipeline.Rmd
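
A minimal sketch of how the import step could look. The helper name `import_saf` and the `.sas7bdat` file extension are assumptions for illustration and are not taken from the pipeline itself.

```r
library(haven)

# Hypothetical helper: read every SAS dataset in a release folder
# into a named list, one element per file.
import_saf <- function(release_dir) {
  paths <- list.files(release_dir, pattern = "\\.sas7bdat$", full.names = TRUE)
  # Name each element after its file, e.g. "cand_thor"
  names(paths) <- tools::file_path_sans_ext(basename(paths))
  lapply(paths, read_sas)
}
```

With this shape, an individual dataset would be reached by name, for example `import_saf(folder)$cand_thor`.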
2
Processing (R)

Convert date-time variables

Converts Change_Dt variables from date-time format to simple date format for consistency across datasets.

as.Date, mutate
dplyr, lubridate
heart_data_pipeline.Rmd
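
As an illustration, the conversion might be written as follows. The toy data frame is invented; only the `Change_Dt` name comes from the description above.

```r
library(dplyr)

# Toy status-history rows with a date-time Change_Dt (values are invented)
stathist <- tibble(
  PX_ID = c(1, 2),
  Change_Dt = as.POSIXct(c("2019-01-05 14:30:00", "2019-02-10 08:00:00"), tz = "UTC")
)

# Drop the time-of-day component so Change_Dt compares cleanly with plain dates
stathist_dates <- stathist %>%
  mutate(Change_Dt = as.Date(Change_Dt))
```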
3
Processing (R)

Remove invalid JustId entries

Filters out entries from risk stratification data that lack a valid JustId identifier.

filter, is.na
dplyr
heart_data_pipeline.Rmd
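
A sketch of this filter on invented data. The `value` column is a placeholder; `JustId` is the identifier named above.

```r
library(dplyr)

# Toy risk stratification rows; the second one has no JustId
risk_strat <- tibble(
  JustId = c(101, NA, 103),
  value  = c("a", "b", "c")
)

# Keep only rows that carry a JustId
risk_strat_valid <- risk_strat %>%
  filter(!is.na(JustId))
```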
4
Processing (R)

Deduplicate justification forms

Filters justification forms to keep only one form per candidate per day, retaining the form submitted last when multiple exist for the same date.

group_by, filter, max
dplyr
heart_data_pipeline.Rmd
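
One way to express this rule is sketched below. The column names `submitted_at` and `form_date` are hypothetical stand-ins for whatever timestamp the justification forms actually carry.

```r
library(dplyr)

# Toy forms: candidate 1 submitted two forms on the same day
forms <- tibble(
  PX_ID = c(1, 1, 2),
  submitted_at = as.POSIXct(
    c("2019-03-01 09:00:00", "2019-03-01 16:00:00", "2019-03-01 10:00:00"),
    tz = "UTC"
  )
)

# Keep the latest submission per candidate per calendar day
forms_dedup <- forms %>%
  mutate(form_date = as.Date(submitted_at)) %>%
  group_by(PX_ID, form_date) %>%
  filter(submitted_at == max(submitted_at)) %>%
  ungroup()
```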
5
Processing (R)

Join datasets using PX_ID

Performs multiple joins to establish PX_ID as the common linking variable across all imported datasets including cand_thor, stathist_thor, tx_hr, and justification forms.

left_join, inner_join, full_join
dplyr
heart_data_pipeline.Rmd
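
The description does not say which keys the intermediate joins use. The sketch below assumes, purely for illustration, that justification forms are keyed by `JustId` and that the status history maps each `JustId` to a `PX_ID`.

```r
library(dplyr)

# Assumed lookup: status history links JustId to PX_ID
stathist <- tibble(PX_ID = c(1, 2), JustId = c(101, 102))

# Toy justification forms keyed by JustId only
just_forms <- tibble(JustId = c(101, 102, 103), HemoSbp = c(95, 110, 100))

# Attach PX_ID to each form; forms without a matching candidate are dropped
just_with_px <- just_forms %>%
  inner_join(stathist, by = "JustId")
```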
6
Processing (R)

Remove irrelevant variables

Removes columns deemed irrelevant: those ending with 'St', 'Type', or 'Perf', and those starting with 'CandHist', 'SenData', or 'CurrTher'.

select, starts_with, ends_with
dplyr
heart_data_pipeline.Rmd
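
A sketch of the column removal using tidyselect helpers. The column names in the toy data are invented to match each pattern, and `ignore.case = FALSE` is an assumption here, since these helpers match case-insensitively by default.

```r
library(dplyr)

# Toy one-row data frame with one column per removal pattern
df <- tibble(
  PX_ID = 1, StatusSt = "x", CandHistFoo = 1, SenDataBar = 2,
  CurrTherBaz = 3, DeviceType = "a", ExercisePerf = 4, Age = 50
)

# Drop columns by suffix and prefix, matching case exactly
trimmed <- df %>%
  select(
    -ends_with(c("St", "Type", "Perf"), ignore.case = FALSE),
    -starts_with(c("CandHist", "SenData", "CurrTher"), ignore.case = FALSE)
  )
```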
7
Processing (R)

Apply listing date filter

Excludes candidates listed prior to the specified filter_date (default January 1, 2010).

filter, mdy
dplyr, lubridate
heart_data_pipeline.Rmd
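
The date filter could be sketched like this. The listing-date column name `CAN_LISTING_DT` is assumed, and the string format passed to `mdy` is a guess at how `filter_date` is supplied.

```r
library(dplyr)
library(lubridate)

# Default cutoff described above: January 1, 2010
filter_date <- mdy("01/01/2010")

# Toy candidates (dates are invented)
cand <- tibble(
  PX_ID = 1:3,
  CAN_LISTING_DT = as.Date(c("2009-12-31", "2010-01-01", "2015-06-15"))
)

# Keep candidates listed on or after the cutoff
cand_recent <- cand %>%
  filter(CAN_LISTING_DT >= filter_date)
```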
8
Parallel
Processing (R)

Apply age filter

Excludes candidates with age less than 18 at the time of listing.

filter
dplyr
heart_data_pipeline.Rmd
Processing (R)

Apply organ type filter

Excludes lung-only transplant recipients and optionally excludes multi-organ recipients based on include_multi_organ_recipients parameter.

filter
dplyr
heart_data_pipeline.Rmd
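
The two filters in this step might look like the sketch below. The column names `CAN_AGE_AT_LISTING` and `WL_ORG`, the organ codes, and the treatment of "HL" as a multi-organ listing are all assumptions; only `include_multi_organ_recipients` is named in the description.

```r
library(dplyr)

include_multi_organ_recipients <- FALSE

# Toy candidates with assumed organ codes: HR = heart, LU = lung, HL = heart-lung
cand <- tibble(
  PX_ID = 1:4,
  CAN_AGE_AT_LISTING = c(45, 16, 60, 52),
  WL_ORG = c("HR", "HR", "LU", "HL")
)

cand_filtered <- cand %>%
  filter(CAN_AGE_AT_LISTING >= 18) %>%   # adults only
  filter(WL_ORG != "LU") %>%             # drop lung-only
  filter(include_multi_organ_recipients | WL_ORG == "HR")  # optional multi-organ exclusion
```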
9
Processing (R)

Remove high-missingness variables

Removes variables from cand_thor and tx_hr that exceed the missingness_threshold percentage of missing values.

select, where, is.na, mean
dplyr
heart_data_pipeline.Rmd
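
A sketch of threshold-based column removal. It assumes `missingness_threshold` is expressed on a 0 to 100 scale; if the pipeline uses a proportion instead, the multiplication by 100 would be dropped.

```r
library(dplyr)

missingness_threshold <- 50  # assumed to be a percentage

# Toy data: one column is 75% missing
df <- tibble(
  PX_ID = 1:4,
  mostly_missing = c(NA, NA, NA, 1),
  complete = c(1, 2, 3, 4)
)

# Keep columns whose share of NA values does not exceed the threshold
kept <- df %>%
  select(where(~ mean(is.na(.x)) * 100 <= missingness_threshold))
```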
10
Processing (R)

Pivot cand_thor and stathist_thor to long form

Transforms cand_thor and stathist_thor datasets from wide to long form by pivoting all date columns, creating unique_date and unique_event variables for each date-related event.

pivot_longer, ends_with
tidyr
heart_data_pipeline.Rmd
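
The wide-to-long pivot could be sketched as follows. The `_DT` suffix and the two date column names are assumptions about how date columns are identified; `unique_event` and `unique_date` are the names given above.

```r
library(dplyr)
library(tidyr)

# Toy candidate rows with two date columns
cand <- tibble(
  PX_ID = c(1, 2),
  CAN_LISTING_DT = as.Date(c("2019-01-01", "2019-02-01")),
  CAN_REM_DT = as.Date(c("2019-06-01", NA))
)

# One row per candidate per dated event; events with no date are dropped
cand_long <- cand %>%
  pivot_longer(
    ends_with("_DT"),
    names_to = "unique_event",
    values_to = "unique_date",
    values_drop_na = TRUE
  )
```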
11
Processing (R)

Pivot back to wide and full_join

Pivots the long-form cand_thor and stathist_thor back to wide form, then performs a full_join using PX_ID, unique_date, and unique_event to create the full_list dataset.

pivot_wider, full_join
tidyr, dplyr
heart_data_pipeline.Rmd
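
The sketch below illustrates only the three-key `full_join`; the exact `pivot_wider` call is not recoverable from the description and is omitted. Event names and the `status` column are invented for the example.

```r
library(dplyr)

# Toy event tables that share PX_ID, unique_date, and unique_event
cand_events <- tibble(
  PX_ID = c(1, 1),
  unique_date = as.Date(c("2019-01-01", "2019-06-01")),
  unique_event = c("CAN_LISTING_DT", "CAN_REM_DT"),
  CAN_AGE_AT_LISTING = c(45, 45)
)
stat_events <- tibble(
  PX_ID = c(1, 1),
  unique_date = as.Date(c("2019-01-01", "2019-03-15")),
  unique_event = c("CAN_LISTING_DT", "CANHX_BEGIN_DT"),
  status = c("2", "1")
)

# A full join keeps events that appear in either table
full_list <- full_join(
  cand_events, stat_events,
  by = c("PX_ID", "unique_date", "unique_event")
) %>%
  arrange(PX_ID, unique_date)
```

Events present in only one source end up with `NA` in the other source's columns, which is what the later fill step addresses.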
12
Processing (R)

Pivot tx_hr and join with full_list

Applies the same pivot_longer and pivot_wider transformations to tx_hr dataset, then joins it with full_list using common keys.

pivot_longer, pivot_wider, left_join
tidyr, dplyr
heart_data_pipeline.Rmd
13
Processing (R)

Coalesce duplicate measurement variables

Uses coalesce to combine multiple variables containing the same measurement type into single unified variables, such as combining HemoSbp, EcmoWithoutHemoSbp, McsdWithoutHemoSbp into systolicBP.

coalesce, mutate
dplyr
heart_data_pipeline.Rmd
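
A small sketch of the coalescing pattern, using the systolic blood pressure example named above. The values are invented.

```r
library(dplyr)

# Each row has the measurement recorded in a different source column
forms <- tibble(
  HemoSbp = c(95, NA, NA),
  EcmoWithoutHemoSbp = c(NA, 88, NA),
  McsdWithoutHemoSbp = c(NA, NA, 102)
)

# Take the first non-missing value across the three columns
forms <- forms %>%
  mutate(systolicBP = coalesce(HemoSbp, EcmoWithoutHemoSbp, McsdWithoutHemoSbp))
```

`coalesce` takes its arguments in priority order, so the column listed first wins when more than one is populated.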
14
Processing (R)

Resolve duplicate measurements on same date

Identifies cases where a single candidate has several different values for a variable on the same date. The first submission keeps its date unchanged; later submissions are re-dated to their Change_Dt, and the date_changed variable marks the rows that were adjusted.

group_by, mutate, ifelse
dplyr
heart_data_pipeline.Rmd
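
One possible reading of this rule is sketched below on a single measurement column. It uses `dplyr::if_else` rather than base `ifelse` because `ifelse` strips the Date class; that substitution, and the use of `systolicBP` as the example variable, are choices made for this illustration.

```r
library(dplyr)

# Candidate 1 has two different values recorded for the same date
obs <- tibble(
  PX_ID = c(1, 1, 2),
  unique_date = as.Date(c("2019-03-01", "2019-03-01", "2019-03-01")),
  Change_Dt = as.Date(c("2019-03-01", "2019-03-04", "2019-03-01")),
  systolicBP = c(95, 110, 100)
)

resolved <- obs %>%
  group_by(PX_ID, unique_date) %>%
  arrange(Change_Dt, .by_group = TRUE) %>%
  # Flag every submission after the first when the values conflict
  mutate(date_changed = row_number() > 1 & n_distinct(systolicBP) > 1) %>%
  ungroup() %>%
  # Re-date the flagged rows to their Change_Dt
  mutate(unique_date = if_else(date_changed, Change_Dt, unique_date))
```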
15
Processing (R)

Create status_criteria variable

Combines information from multiple indicator variables to create a single status_criteria variable that describes the criteria for each candidate's assigned status.

mutate, case_when
dplyr
heart_data_pipeline.Rmd
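
The `case_when` pattern might look like this. The indicator columns `on_vaecmo` and `on_iabp` and the resulting labels are hypothetical; the pipeline's actual indicators and categories are not listed in the description.

```r
library(dplyr)

# Toy indicator columns (names are hypothetical)
cand <- tibble(
  on_vaecmo = c(TRUE, FALSE, FALSE),
  on_iabp   = c(FALSE, TRUE, FALSE)
)

# Collapse the indicators into one descriptive variable;
# conditions are evaluated in order and the first match wins
cand <- cand %>%
  mutate(status_criteria = case_when(
    on_vaecmo ~ "VA-ECMO",
    on_iabp   ~ "IABP",
    TRUE      ~ NA_character_
  ))
```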
16
Processing (R)

Pivot old policy justification forms

Applies the same pivot transformations to old policy era justification forms and joins them with full_list, applying the same coalescing and deduplication steps.

pivot_longer, pivot_wider, coalesce, left_join
tidyr, dplyr
heart_data_pipeline.Rmd
17
Processing (R)

Fill missing values with LOCF

Fills missing values using last observation carried forward method for time-varying variables, and bidirectional fill for non-time-varying variables from cand_thor and tx_hr.

fill, group_by
tidyr, dplyr
heart_data_pipeline.Rmd
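
A sketch of the two fill directions. Here `systolicBP` stands in for a time-varying variable and `CAN_GENDER` for a non-time-varying one; both choices are assumptions for the example.

```r
library(dplyr)
library(tidyr)

# Three dated rows for one candidate, with gaps
full_list <- tibble(
  PX_ID = c(1, 1, 1),
  unique_date = as.Date(c("2019-01-01", "2019-02-01", "2019-03-01")),
  systolicBP = c(NA, 95, NA),
  CAN_GENDER = c(NA, "F", NA)
)

filled <- full_list %>%
  arrange(PX_ID, unique_date) %>%
  group_by(PX_ID) %>%
  fill(systolicBP, .direction = "down") %>%    # LOCF: only carry values forward
  fill(CAN_GENDER, .direction = "downup") %>%  # static: fill in both directions
  ungroup()
```

Grouping by `PX_ID` before filling keeps one candidate's values from leaking into the next candidate's rows.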
18
Processing (R)

Filter to desired timepoints

Filters full_list to include only rows corresponding to desired event types by using grepl pattern matching on the unique_event variable.

filter, grepl
dplyr, base
heart_data_pipeline.Rmd
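
The event filter could be sketched as below. The event names and the `desired_events` vector are invented; the description only states that `grepl` is matched against `unique_event`.

```r
library(dplyr)

# Toy event rows
full_list <- tibble(
  PX_ID = c(1, 1, 1),
  unique_event = c("CAN_LISTING_DT", "CANHX_BEGIN_DT", "REC_TX_DT")
)

# Hypothetical list of wanted events, combined into one regular expression
desired_events <- c("CAN_LISTING_DT", "REC_TX_DT")
pattern <- paste(desired_events, collapse = "|")

timepoints <- full_list %>%
  filter(grepl(pattern, unique_event))
```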
19
Processing (R)

Calculate days between observations

Uses a for loop to iterate over each PX_ID group and calculates the number of days elapsed between successive rows for each candidate/recipient.

for, diff, as.numeric, group_by
base, dplyr
heart_data_pipeline.Rmd
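
A minimal sketch of the per-candidate loop. The output column name `days_since_prev` is hypothetical, and the first row of each candidate is left as `NA` by assumption.

```r
library(dplyr)

# Toy rows, sorted by candidate and date
full_list <- tibble(
  PX_ID = c(1, 1, 1, 2),
  unique_date = as.Date(c("2019-01-01", "2019-01-11", "2019-02-10", "2019-05-01"))
) %>%
  arrange(PX_ID, unique_date)

full_list$days_since_prev <- NA_real_

for (id in unique(full_list$PX_ID)) {
  rows <- which(full_list$PX_ID == id)
  # diff() on Dates yields day gaps; the first row has no predecessor
  gaps <- as.numeric(diff(full_list$unique_date[rows]))
  full_list$days_since_prev[rows] <- c(NA, gaps)
}
```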
20
Parallel
Response (R)

Save as RData file

Saves the final processed dataset as an RData file for use in R environments.

save
base
heart_data_pipeline.Rmd
Response (R)

Export as CSV file

Exports the final processed dataset as a CSV file for broader accessibility and use in other tools.

write.csv
base
heart_data_pipeline.Rmd
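
Both export paths can be sketched in a few lines. The output file names and the use of a temporary directory are placeholders, not the pipeline's actual destinations.

```r
# Toy final dataset
full_list <- data.frame(PX_ID = c(1, 2), systolicBP = c(95, 110))

out_dir <- tempdir()
rdata_path <- file.path(out_dir, "full_list.RData")
csv_path <- file.path(out_dir, "full_list.csv")

# RData keeps R column types; CSV is readable by other tools
save(full_list, file = rdata_path)
write.csv(full_list, csv_path, row.names = FALSE)
```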

Analyzed 2/19/2026, 9:10:00 PM