heart_data_pipeline
AI Architecture
Trigger
The user executes the R Markdown document with a specified data release folder.
Import SAF datasets
Loads multiple SRTR SAF files, stored in SAS format, from the specified release folder, including cand_thor, stathist_thor, tx_hr, justification forms, and risk stratification data.
Convert date-time variables
Converts Change_Dt variables from date-time format to simple date format for consistency across datasets.
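The conversion step can be illustrated with a minimal sketch. The pipeline itself is an R Markdown document, so this Python version only mirrors the logic; the helper name `to_date` and the sample row are hypothetical, and only the `Change_Dt` name comes from the description above.

```python
from datetime import date, datetime


def to_date(value):
    """Return a plain date for date-time input; leave dates and None unchanged."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.date()
    return value


# Hypothetical row for illustration.
row = {"PX_ID": 1, "Change_Dt": datetime(2019, 3, 4, 14, 30)}
row["Change_Dt"] = to_date(row["Change_Dt"])
print(row["Change_Dt"])  # 2019-03-04
```

Normalizing to plain dates means that later comparisons and joins on date keys are not affected by the time of day.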
Remove invalid JustId entries
Filters out entries from risk stratification data that lack a valid JustId identifier.
Deduplicate justification forms
Filters justification forms to keep only one form per candidate per day, retaining the form submitted last when multiple exist for the same date.
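A minimal Python sketch of this deduplication rule follows. It is an illustration of the logic rather than the pipeline's R code, and the field names `candidate_id` and `submitted_at` are assumptions chosen for the example.

```python
from datetime import datetime


def keep_last_form_per_day(forms):
    """Keep one form per (candidate, calendar day): the latest submission."""
    latest = {}
    for form in forms:
        # Hypothetical fields: candidate_id and submitted_at (a datetime).
        key = (form["candidate_id"], form["submitted_at"].date())
        if key not in latest or form["submitted_at"] > latest[key]["submitted_at"]:
            latest[key] = form
    return list(latest.values())


forms = [
    {"candidate_id": 1, "JustId": 10, "submitted_at": datetime(2019, 5, 1, 9, 0)},
    {"candidate_id": 1, "JustId": 11, "submitted_at": datetime(2019, 5, 1, 16, 0)},
    {"candidate_id": 1, "JustId": 12, "submitted_at": datetime(2019, 5, 2, 8, 0)},
]
kept = keep_last_form_per_day(forms)
print([f["JustId"] for f in kept])  # [11, 12]
```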
Join datasets using PX_ID
Performs multiple joins to establish PX_ID as the common linking variable across all imported datasets including cand_thor, stathist_thor, tx_hr, and justification forms.
Remove irrelevant variables
Removes columns deemed irrelevant, including those whose names end with 'St', 'Type', or 'Perf', or start with 'CandHist', 'SenData', or 'CurrTher'.
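The name-based column removal could look roughly like the sketch below. This is a Python illustration, not the pipeline's R code; it assumes that 'CandHist', 'SenData', and 'CurrTher' are prefixes and that 'St', 'Type', and 'Perf' are suffixes, and the sample column names other than HemoSbp are made up.

```python
# Assumed reading of the rule: three prefixes and three suffixes.
DROP_PREFIXES = ("CandHist", "SenData", "CurrTher")
DROP_SUFFIXES = ("St", "Type", "Perf")


def is_irrelevant(column):
    """True if the column name matches one of the drop patterns."""
    return column.startswith(DROP_PREFIXES) or column.endswith(DROP_SUFFIXES)


def drop_irrelevant(row):
    """Return a copy of the row without the irrelevant columns."""
    return {name: value for name, value in row.items() if not is_irrelevant(name)}


# Hypothetical column names for illustration.
row = {"PX_ID": 1, "HemoSbp": 110, "HemoSbpSt": "A", "CandHistExample": 0, "ExampleType": 2}
print(drop_irrelevant(row))  # {'PX_ID': 1, 'HemoSbp': 110}
```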
Apply listing date filter
Excludes candidates listed prior to the specified filter_date (default January 1, 2010).
Apply age filter
Excludes candidates younger than 18 at the time of listing.
Apply organ type filter
Excludes lung-only transplant recipients and optionally excludes multi-organ recipients based on include_multi_organ_recipients parameter.
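The three cohort filters above (listing date, age, organ type) can be combined into a single predicate. The sketch below is a Python illustration of the logic only; the field names `listing_date`, `age_at_listing`, and `organs` are hypothetical, and the default value of `include_multi_organ_recipients` is an assumption, since the description does not state it.

```python
from datetime import date


def keep_candidate(candidate, filter_date=date(2010, 1, 1),
                   include_multi_organ_recipients=True):
    """Apply the listing date, age, and organ type filters to one record."""
    if candidate["listing_date"] < filter_date:
        return False  # listed before the cutoff
    if candidate["age_at_listing"] < 18:
        return False  # pediatric at listing
    organs = candidate["organs"]  # hypothetical: a set of organ names
    if organs == {"lung"}:
        return False  # lung-only
    if len(organs) > 1 and not include_multi_organ_recipients:
        return False  # multi-organ, excluded on request
    return True


candidates = [
    {"PX_ID": 1, "listing_date": date(2015, 6, 1), "age_at_listing": 40, "organs": {"heart"}},
    {"PX_ID": 2, "listing_date": date(2008, 2, 1), "age_at_listing": 55, "organs": {"heart"}},
    {"PX_ID": 3, "listing_date": date(2016, 1, 9), "age_at_listing": 12, "organs": {"heart"}},
    {"PX_ID": 4, "listing_date": date(2017, 3, 3), "age_at_listing": 61, "organs": {"lung"}},
]
print([c["PX_ID"] for c in candidates if keep_candidate(c)])  # [1]
```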
Remove high-missingness variables
Removes variables from cand_thor and tx_hr whose percentage of missing values exceeds missingness_threshold.
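As a rough illustration of the threshold rule, the Python sketch below drops every column whose share of missing values is strictly greater than the threshold. It assumes the threshold is expressed in percent and that missing values are represented as `None`; the column names are invented.

```python
def drop_high_missingness(rows, missingness_threshold):
    """Drop columns whose percentage of None values exceeds the threshold."""
    if not rows:
        return []
    n = len(rows)
    keep = [
        col for col in rows[0]
        if 100 * sum(r[col] is None for r in rows) / n <= missingness_threshold
    ]
    return [{col: r[col] for col in keep} for r in rows]


# Hypothetical data: column "b" is 75% missing, column "c" is 50% missing.
rows = [
    {"a": 1, "b": None, "c": None},
    {"a": 2, "b": None, "c": None},
    {"a": 3, "b": None, "c": 7},
    {"a": 4, "b": 9, "c": 8},
]
print(list(drop_high_missingness(rows, 50)[0]))  # ['a', 'c']
```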
Pivot cand_thor and stathist_thor to long form
Transforms cand_thor and stathist_thor datasets from wide to long form by pivoting all date columns, creating unique_date and unique_event variables for each date-related event.
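The wide-to-long step can be pictured with the sketch below, which turns each date column of a row into its own row carrying `unique_event` (the source column name) and `unique_date` (its value). This is a Python illustration of what a pivot over date columns does, not the pipeline's code; the date column names are hypothetical, and dropping missing dates is an assumption.

```python
from datetime import date


def pivot_dates_longer(rows, date_columns):
    """Emit one long-form row per non-missing date column of each input row."""
    long_rows = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if k not in date_columns}
        for col in date_columns:
            if row.get(col) is None:
                continue  # assumption: missing dates produce no event row
            long_rows.append({**fixed, "unique_event": col, "unique_date": row[col]})
    return long_rows


# Hypothetical wide row with two date columns.
wide = [{"PX_ID": 1, "listing_dt": date(2018, 1, 5), "removal_dt": date(2018, 9, 2)}]
for r in pivot_dates_longer(wide, ["listing_dt", "removal_dt"]):
    print(r["PX_ID"], r["unique_event"], r["unique_date"])
```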
Pivot back to wide and full_join
Pivots the long-form cand_thor and stathist_thor back to wide form, then performs a full_join using PX_ID, unique_date, and unique_event to create the full_list dataset.
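A full join on the three keys keeps every event from either dataset and fills the columns that one side lacks with missing values. The Python sketch below illustrates that behavior under two simplifying assumptions: each key combination occurs at most once per side, and the two sides share no non-key columns. The non-key column names are invented.

```python
from datetime import date

KEYS = ("PX_ID", "unique_date", "unique_event")


def full_join(left, right, keys=KEYS):
    """Full outer join of two row lists on the given key columns."""
    merged = {}
    for row in list(left) + list(right):
        key = tuple(row[k] for k in keys)
        merged.setdefault(key, {}).update(row)
    # Collect every column seen so that unmatched rows get explicit None values.
    columns = []
    for row in merged.values():
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{col: row.get(col) for col in columns} for row in merged.values()]


d1, d2 = date(2019, 1, 1), date(2019, 2, 1)
cand = [{"PX_ID": 1, "unique_date": d1, "unique_event": "listing", "cand_value": "a"}]
stat = [
    {"PX_ID": 1, "unique_date": d1, "unique_event": "listing", "stat_value": "x"},
    {"PX_ID": 1, "unique_date": d2, "unique_event": "status_change", "stat_value": "y"},
]
full_list = full_join(cand, stat)
print(len(full_list))  # 2
```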
Pivot tx_hr and join with full_list
Applies the same pivot_longer and pivot_wider transformations to tx_hr dataset, then joins it with full_list using common keys.
Coalesce duplicate measurement variables
Uses coalesce to combine multiple variables containing the same measurement type into single unified variables, such as combining HemoSbp, EcmoWithoutHemoSbp, McsdWithoutHemoSbp into systolicBP.
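Coalescing takes the first non-missing value from a list of candidates. The Python sketch below illustrates the idea with the three systolic blood pressure columns named above; the order of precedence and the removal of the source columns afterwards are assumptions, not details confirmed by the description.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None


# Source columns named in the description; the precedence order is assumed.
SBP_SOURCES = ("HemoSbp", "EcmoWithoutHemoSbp", "McsdWithoutHemoSbp")


def add_systolic_bp(row):
    """Replace the three source columns with one unified systolicBP column."""
    out = {k: v for k, v in row.items() if k not in SBP_SOURCES}
    out["systolicBP"] = coalesce(*(row.get(col) for col in SBP_SOURCES))
    return out


row = {"PX_ID": 1, "HemoSbp": None, "EcmoWithoutHemoSbp": 95, "McsdWithoutHemoSbp": None}
print(add_systolic_bp(row))  # {'PX_ID': 1, 'systolicBP': 95}
```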
Resolve duplicate measurements on same date
Identifies instances where a single candidate has multiple different values for a variable on the same date. The date of the first submission is kept unchanged, the dates of later submissions are set to Change_Dt, and each change is recorded in the date_changed variable.
Create status_criteria variable
Combines information from multiple indicator variables to create a single status_criteria variable that describes the criteria for each candidate's assigned status.
Pivot old policy justification forms
Applies the same pivot transformations to old policy era justification forms and joins them with full_list, applying the same coalescing and deduplication steps.
Fill missing values with LOCF
Fills missing values using the last observation carried forward (LOCF) method for time-varying variables, and a bidirectional fill for non-time-varying variables from cand_thor and tx_hr.
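The two fill strategies can be sketched as follows. This Python illustration works on one candidate's values, already sorted by date, with `None` standing in for a missing value; the function names are hypothetical, and filling down before filling up is an assumed order.

```python
def locf(values):
    """Last observation carried forward: fill each gap with the previous value."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_both_directions(values):
    """Fill forward first, then fill any leading gaps backward."""
    forward = locf(values)
    return locf(forward[::-1])[::-1]


print(locf([None, 3, None, None, 4, None]))   # [None, 3, 3, 3, 4, 4]
print(fill_both_directions([None, "M", None]))  # ['M', 'M', 'M']
```

LOCF never fills a gap that precedes the first observation, which is why a time-varying variable can still start with missing values, while a bidirectional fill propagates a constant attribute to every row of the candidate.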
Filter to desired timepoints
Filters full_list to include only rows corresponding to desired event types by using grepl pattern matching on the unique_event variable.
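In R, grepl returns TRUE for each element that matches a regular expression. A comparable filter in Python, shown here only to illustrate the logic, uses `re.search`; the event names and the pattern are hypothetical.

```python
import re


def filter_events(rows, pattern):
    """Keep rows whose unique_event matches the regular expression."""
    regex = re.compile(pattern)
    return [
        row for row in rows
        # Missing events never match, as with grepl on NA.
        if row["unique_event"] is not None and regex.search(row["unique_event"])
    ]


# Hypothetical event names for illustration.
full_list = [
    {"PX_ID": 1, "unique_event": "listing_dt"},
    {"PX_ID": 1, "unique_event": "status_change_dt"},
    {"PX_ID": 1, "unique_event": "transplant_dt"},
    {"PX_ID": 1, "unique_event": None},
]
kept = filter_events(full_list, "listing|transplant")
print([r["unique_event"] for r in kept])  # ['listing_dt', 'transplant_dt']
```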
Calculate days between observations
Uses a for loop to iterate over each PX_ID group and calculates the number of days elapsed between successive rows for each candidate/recipient.
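The per-candidate loop can be sketched as below. This is a Python illustration of the calculation, not the pipeline's R loop; the output column name `days_since_previous` is hypothetical, and leaving the first row of each candidate as missing is an assumption.

```python
from datetime import date
from itertools import groupby


def add_days_since_previous(rows):
    """Add the number of days since the previous row of the same PX_ID."""
    ordered = sorted(rows, key=lambda r: (r["PX_ID"], r["unique_date"]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r["PX_ID"]):
        previous = None
        for row in group:
            # Assumption: the first observation of a candidate has no gap.
            gap = None if previous is None else (row["unique_date"] - previous).days
            out.append({**row, "days_since_previous": gap})
            previous = row["unique_date"]
    return out


rows = [
    {"PX_ID": 1, "unique_date": date(2021, 1, 11)},
    {"PX_ID": 1, "unique_date": date(2021, 1, 1)},
    {"PX_ID": 1, "unique_date": date(2021, 1, 31)},
    {"PX_ID": 2, "unique_date": date(2021, 5, 1)},
]
print([r["days_since_previous"] for r in add_days_since_previous(rows)])
# [None, 10, 20, None]
```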
Save as RData file
Saves the final processed dataset as an RData file for use in R environments.
Export as CSV file
Exports the final processed dataset as a CSV file for broader accessibility and use in other tools.
Analyzed 2/19/2026, 9:10:00 PM