heart_data_pipeline (public repository, branch: main)

AI Architecture

Trigger

The user executes the R Markdown document with a specified data release folder.

Processing (R)

Import SAF datasets

Loads multiple SRTR Standard Analysis Files (SAFs), stored in SAS format, from the specified release folder. These include cand_thor, stathist_thor, tx_hr, the justification forms, and the risk stratification data.

read_sas, list.files

haven, base

heart_data_pipeline.Rmd
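The import step can be sketched as follows. This is a hypothetical helper, not the pipeline's own code. It assumes the release folder contains `.sas7bdat` files named after each dataset (e.g. `cand_thor.sas7bdat`). It reads each file into a list named after its file.

```r
library(haven)

# Hypothetical helper (assumption: SAFs are .sas7bdat files in one folder).
# Returns a list of data frames named after their files, e.g. saf$cand_thor.
read_saf_release <- function(release_dir) {
  paths <- list.files(release_dir, pattern = "\\.sas7bdat$",
                      full.names = TRUE, ignore.case = TRUE)
  # Name each path by its lower-cased file name without the extension
  names(paths) <- tolower(tools::file_path_sans_ext(basename(paths)))
  # lapply() keeps the names of `paths`, so the result is a named list
  lapply(paths, read_sas)
}
```

Usage would look like `saf <- read_saf_release("path/to/release")` followed by `cand_thor <- saf$cand_thor`. The folder path here is a placeholder.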

Processing (R)

Convert date-time variables

Converts the Change_Dt variables from date-time values to plain dates (R Date class) so that dates are consistent across datasets.

as.Date, mutate

dplyr, lubridate

heart_data_pipeline.Rmd
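Here is a minimal sketch of the conversion. It assumes the relevant columns are the ones whose names end in `Change_Dt`; the helper name `to_date_columns` is hypothetical. Applied to a date-time value, `as.Date()` keeps only the calendar day.

```r
library(dplyr)

# Hypothetical helper: convert every *Change_Dt column from date-time to Date.
# The column-name pattern is an assumption; adjust the selector if needed.
to_date_columns <- function(df) {
  mutate(df, across(ends_with("Change_Dt"), ~ as.Date(.x)))
}
```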

Processing (R)

Remove invalid JustId entries

Filters out risk stratification records that lack a valid (non-missing) JustId identifier.

filter, is.na

dplyr

heart_data_pipeline.Rmd
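Based on the listed functions (`filter`, `is.na`), the step most likely treats "valid" as "non-missing". A sketch under that assumption follows; the function name is illustrative.

```r
library(dplyr)

# Keep only risk stratification rows that carry a JustId.
drop_missing_justid <- function(risk_strat) {
  filter(risk_strat, !is.na(JustId))
}
```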

Processing (R)

Deduplicate justification forms

Keeps only one justification form per candidate per day. When several forms share the same date, the one submitted last is retained.

group_by, filter, max

dplyr

heart_data_pipeline.Rmd
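The "latest form per candidate per day" rule maps naturally onto a grouped `filter()` with `max()`. In this sketch, `form_date` (the calendar day) and `submitted_at` (the submission time) are hypothetical column names. If two forms share the exact maximum timestamp, both are kept.

```r
library(dplyr)

# Hypothetical columns: form_date (day of the form), submitted_at (submission time).
keep_last_form_per_day <- function(forms) {
  forms %>%
    group_by(PX_ID, form_date) %>%
    filter(submitted_at == max(submitted_at)) %>%  # latest submission that day
    ungroup()
}
```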

Processing (R)

Join datasets using PX_ID

Performs multiple joins to establish PX_ID as the common linking variable across all imported datasets including cand_thor, stathist_thor, tx_hr, and justification forms.

left_join, inner_join, full_join

dplyr

heart_data_pipeline.Rmd
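The three join types behave differently when a PX_ID appears in only one table. The toy tables below stand in for cand_thor and tx_hr; their non-key columns are hypothetical. The example illustrates the difference between `left_join()` and `inner_join()`.

```r
library(dplyr)

# Toy stand-ins for cand_thor and tx_hr (non-key columns are hypothetical).
cand_thor <- data.frame(PX_ID = c(1, 2, 3), listed = c("A", "B", "C"))
tx_hr     <- data.frame(PX_ID = c(2, 3), tx_center = c("X", "Y"))

# left_join keeps every candidate; unmatched candidates get NA tx columns.
listed_with_tx <- left_join(cand_thor, tx_hr, by = "PX_ID")

# inner_join keeps only candidates that also appear in tx_hr.
transplanted <- inner_join(cand_thor, tx_hr, by = "PX_ID")
```

A `full_join()` would additionally keep tx_hr rows with no matching candidate.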

Processing (R)

Remove irrelevant variables

Removes columns deemed irrelevant: those whose names end with 'St', 'Type', or 'Perf', or start with 'CandHist', 'SenData', or 'CurrTher'.

select, starts_with, ends_with

dplyr

heart_data_pipeline.Rmd
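A sketch of the column drop is shown below. One detail is worth checking in the real code: tidyselect's `starts_with()` and `ends_with()` ignore case by default. That means `ends_with("St")` would also drop a column such as `Forecast`. This sketch sets `ignore.case = FALSE`; whether the pipeline does the same is not stated in the source.

```r
library(dplyr)

# Drop columns by name pattern; ignore.case = FALSE makes "St" match only "St".
drop_irrelevant_columns <- function(df) {
  select(df,
         -ends_with(c("St", "Type", "Perf"), ignore.case = FALSE),
         -starts_with(c("CandHist", "SenData", "CurrTher"), ignore.case = FALSE))
}
```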

Processing (R)

Apply listing date filter

Excludes candidates listed prior to the specified filter_date (default January 1, 2010).

filter, mdy

dplyr, lubridate

heart_data_pipeline.Rmd
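The following sketches the listing-date cutoff, with the default date built by `lubridate::mdy()`. It assumes the listing date lives in a Date column called `CAN_LISTING_DT`; treat that name as an assumption. Note that `filter()` also drops rows whose listing date is missing.

```r
library(dplyr)
library(lubridate)

filter_date <- mdy("01/01/2010")  # default cutoff from the step description

# CAN_LISTING_DT is an assumed column name for the listing date.
apply_listing_filter <- function(cand, cutoff = filter_date) {
  filter(cand, CAN_LISTING_DT >= cutoff)  # keep listings on or after the cutoff
}
```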

Parallel

Processing (R)

Apply age filter

Excludes candidates younger than 18 at the time of listing.

filter

dplyr

heart_data_pipeline.Rmd
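The adult-only filter is sketched below. `CAN_AGE_AT_LISTING` (age in years at listing) and the `min_age` argument are hypothetical names used for illustration. Candidates with a missing age are dropped by `filter()`.

```r
library(dplyr)

# Hypothetical column CAN_AGE_AT_LISTING holds age in years at listing.
apply_age_filter <- function(cand, min_age = 18) {
  filter(cand, CAN_AGE_AT_LISTING >= min_age)
}
```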

Processing (R)

Apply organ type filter

Excludes lung-only transplant recipients and, depending on the include_multi_organ_recipients parameter, multi-organ recipients.

filter

dplyr

heart_data_pipeline.Rmd
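The optional exclusion can be expressed as a conditional second `filter()`. In this sketch, the `organ` code column (with `"LU"` meaning lung-only) and the logical `multi_organ` flag are hypothetical; the real SAF variables may differ.

```r
library(dplyr)

# Hypothetical columns: organ (e.g. "HR", "LU", "HL") and multi_organ (logical).
apply_organ_filter <- function(df, include_multi_organ_recipients = FALSE) {
  df <- filter(df, organ != "LU")      # always drop lung-only candidates
  if (!include_multi_organ_recipients) {
    df <- filter(df, !multi_organ)     # optionally drop multi-organ recipients
  }
  df
}
```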

Processing (R)

Remove high-missingness variables

Removes variables from cand_thor and tx_hr whose share of missing values exceeds missingness_threshold.

select, where, is.na, mean

dplyr

heart_data_pipeline.Rmd
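The listed functions suggest the idiom `select(where(~ mean(is.na(.x)) ...))`, since `mean(is.na(x))` is the fraction of missing values in a column. This sketch treats `missingness_threshold` as a proportion between 0 and 1. If the pipeline stores it as a percentage, it would need dividing by 100. The default of 0.5 is hypothetical.

```r
library(dplyr)

# Keep columns whose fraction of NA values is at most the threshold (0-1 scale).
drop_high_missingness <- function(df, missingness_threshold = 0.5) {
  select(df, where(~ mean(is.na(.x)) <= missingness_threshold))
}
```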

Processing (R)

Pivot cand_thor and stathist_thor to long form

Transforms cand_thor and stathist_thor datasets from wide to long form by pivoting all date columns, creating unique_date and unique_event variables for each date-related event.

pivot_longer, ends_with

tidyr

heart_data_pipeline.Rmd
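The wide-to-long pivot turns each date column into its own row. The column name becomes `unique_event` and the date becomes `unique_date`. This sketch assumes date columns share a `_DT` suffix (the toy column names are illustrative). It also drops events with no date, which may or may not match the pipeline.

```r
library(dplyr)
library(tidyr)

# Toy wide table: one row per candidate, one column per dated event.
cand <- data.frame(
  PX_ID = c(1, 2),
  CAN_LISTING_DT = as.Date(c("2015-01-05", "2016-03-10")),
  CAN_REM_DT = as.Date(c("2015-06-01", NA))
)

# One row per (candidate, event); events without a date are dropped.
long <- cand %>%
  pivot_longer(ends_with("_DT"),
               names_to = "unique_event", values_to = "unique_date",
               values_drop_na = TRUE)
```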

Processing (R)

Pivot back to wide and full_join

Pivots the long-form cand_thor and stathist_thor back to wide form, then performs a full_join using PX_ID, unique_date, and unique_event to create the full_list dataset.

pivot_wider, full_join

tidyr, dplyr

heart_data_pipeline.Rmd
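Joining on all three keys means that an event recorded in both tables lands on one row. An event recorded in only one table becomes its own row with NAs for the other table's columns. The toy tables and their non-key columns below are illustrative, not the real SAF layouts.

```r
library(dplyr)

# Toy event-level tables (event names and status columns are illustrative).
cand_events <- data.frame(
  PX_ID = c(1, 1),
  unique_date = as.Date(c("2015-01-05", "2015-06-01")),
  unique_event = c("CAN_LISTING_DT", "CAN_REM_DT"),
  init_status = c("1A", "1A")
)
stathist_events <- data.frame(
  PX_ID = c(1, 1),
  unique_date = as.Date(c("2015-01-05", "2015-03-02")),
  unique_event = c("CAN_LISTING_DT", "CANHX_BEGIN_DT"),
  canhx_status = c("1A", "1B")
)

# Matched events share a row; unmatched events from either side are kept.
full_list <- full_join(cand_events, stathist_events,
                       by = c("PX_ID", "unique_date", "unique_event")) %>%
  arrange(PX_ID, unique_date)
```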

Processing (R)

Pivot tx_hr and join with full_list

Applies the same pivot_longer and pivot_wider transformations to the tx_hr dataset, then joins it with full_list on the common keys.

pivot_longer, pivot_wider, left_join

tidyr, dplyr

heart_data_pipeline.Rmd

Processing (R)

Coalesce duplicate measurement variables

Uses coalesce to merge several variables that record the same measurement into a single variable, for example combining HemoSbp, EcmoWithoutHemoSbp, and McsdWithoutHemoSbp into systolicBP.

coalesce, mutate

dplyr

heart_data_pipeline.Rmd
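`coalesce()` returns, row by row, the first non-missing value among its arguments. The argument order therefore sets precedence when more than one source variable is filled, as in the second row below. The values are made up for illustration.

```r
library(dplyr)

# Made-up values; row 2 has two filled sources, so argument order decides.
vitals <- data.frame(
  HemoSbp            = c(110, NA, NA),
  EcmoWithoutHemoSbp = c(NA, 95, NA),
  McsdWithoutHemoSbp = c(NA, 100, 88)
)

vitals <- vitals %>%
  mutate(systolicBP = coalesce(HemoSbp, EcmoWithoutHemoSbp, McsdWithoutHemoSbp))
# systolicBP is 110, 95, 88
```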

Processing (R)

Resolve duplicate measurements on same date

Identifies cases where a candidate has several different values for the same variable on the same date. The first submission keeps its date; later submissions are re-dated to their Change_Dt, and each adjustment is flagged in the date_changed variable.

group_by, mutate, ifelse

dplyr

heart_data_pipeline.Rmd
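The sketch below shows one reading of this rule for a single hypothetical measurement column `value`. It assumes that "first submission" means the earliest `Change_Dt`. It uses `dplyr::if_else()` rather than base `ifelse()`, because `ifelse()` drops the Date class and would return plain numbers. The grouping variable is updated only after `ungroup()`.

```r
library(dplyr)

# Hypothetical columns: value (one measurement), Change_Dt (form change date).
resolve_same_day_conflicts <- function(df) {
  df %>%
    arrange(PX_ID, unique_date, Change_Dt) %>%   # earliest submission first
    group_by(PX_ID, unique_date) %>%
    # Flag later rows only when the same day holds conflicting values
    mutate(date_changed = n_distinct(value, na.rm = TRUE) > 1 & row_number() > 1) %>%
    ungroup() %>%
    # if_else() keeps the Date class; base ifelse() would not
    mutate(unique_date = if_else(date_changed, Change_Dt, unique_date))
}
```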

Processing (R)

Create status_criteria variable

Combines information from multiple indicator variables to create a single status_criteria variable that describes the criteria for each candidate's assigned status.

mutate, case_when

dplyr

heart_data_pipeline.Rmd
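`case_when()` checks conditions in order and assigns the label of the first one that holds. Rows where a condition evaluates to NA fall through to later conditions. The indicator names (`ecmo`, `iabp`, `inotropes`) and labels here are hypothetical placeholders, not the pipeline's actual criteria.

```r
library(dplyr)

# Hypothetical 0/1 indicators; earlier conditions take precedence.
add_status_criteria <- function(df) {
  df %>%
    mutate(status_criteria = case_when(
      ecmo == 1      ~ "ECMO",
      iabp == 1      ~ "IABP",
      inotropes == 1 ~ "Inotropes",
      TRUE           ~ "Other"
    ))
}
```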

Processing (R)

Pivot old policy justification forms

Applies the same pivot transformations to the old-policy-era justification forms, joins them with full_list, and repeats the coalescing and deduplication steps.

pivot_longer, pivot_wider, coalesce, left_join

tidyr, dplyr

heart_data_pipeline.Rmd

Processing (R)

Fill missing values with LOCF

Fills missing values using last observation carried forward (LOCF) for time-varying variables, and a bidirectional fill for non-time-varying variables from cand_thor and tx_hr.

fill, group_by

tidyr, dplyr

heart_data_pipeline.Rmd
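This is a sketch of the two fill modes within each candidate. Time-varying columns are only carried forward (`.direction = "down"`), so values before the first observation stay NA. Static columns are filled both ways; here that is `"downup"`, though the pipeline could equally use `"updown"`. The helper and column names are illustrative.

```r
library(dplyr)
library(tidyr)

# time_varying / static: character vectors of column names (illustrative).
fill_missing <- function(df, time_varying, static) {
  df %>%
    arrange(PX_ID, unique_date) %>%
    group_by(PX_ID) %>%                                     # never fill across candidates
    fill(all_of(time_varying), .direction = "down") %>%     # LOCF only
    fill(all_of(static), .direction = "downup") %>%         # both directions
    ungroup()
}
```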

Processing (R)

Filter to desired timepoints

Filters full_list to include only rows corresponding to desired event types by using grepl pattern matching on the unique_event variable.

filter, grepl

dplyr, base

heart_data_pipeline.Rmd
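Several event patterns can be matched in one `grepl()` call by joining them with `|` into a single regular expression. The patterns below are hypothetical examples; the pipeline's actual event list is not given in the source.

```r
library(dplyr)

# Hypothetical event patterns, combined into one regex with "|".
keep_timepoints <- function(full_list,
                            patterns = c("LISTING", "TX", "JUSTIFICATION")) {
  filter(full_list, grepl(paste(patterns, collapse = "|"), unique_event))
}
```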

Processing (R)

Calculate days between observations

Uses a for loop over each PX_ID group to calculate the number of days elapsed between successive rows for each candidate or recipient.

for, diff, as.numeric, group_by

base, dplyr

heart_data_pipeline.Rmd
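The same quantity can also be computed without an explicit loop by applying `diff()` within each PX_ID group. The following is a grouped, vectorised alternative to the for loop described above, not the pipeline's own code. It assumes each candidate's first row should get NA; the column name `days_since_prev` is hypothetical.

```r
library(dplyr)

# Days since the previous row for the same PX_ID (NA for each first row).
add_days_between <- function(full_list) {
  full_list %>%
    arrange(PX_ID, unique_date) %>%
    group_by(PX_ID) %>%
    mutate(days_since_prev = c(NA, as.numeric(diff(unique_date)))) %>%
    ungroup()
}
```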

Parallel

Response (R)

Save as RData file

Saves the final processed dataset as an RData file for use in R environments.

save

base

heart_data_pipeline.Rmd

Response (R)

Export as CSV file

Exports the final processed dataset as a CSV file for broader accessibility and use in other tools.

write.csv

base

heart_data_pipeline.Rmd
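The two output steps run in parallel and can be sketched together. The helper name and the file names `full_list.RData` and `full_list.csv` are hypothetical. `row.names = FALSE` prevents `write.csv()` from adding an unnamed index column.

```r
# Hypothetical helper: write the final dataset in both output formats.
export_final <- function(full_list, out_dir = ".") {
  # save() stores the object under its name, so load() restores `full_list`
  save(full_list, file = file.path(out_dir, "full_list.RData"))
  write.csv(full_list, file.path(out_dir, "full_list.csv"), row.names = FALSE)
}
```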

Analyzed 2/19/2026, 9:10:00 PM