
Long Island Hospital: Excel Analytical Analysis of the Discharge Data
A Complete Interactive, App-like, Power BI Report
Tech Stack
Microsoft Excel
Project
Analyze the hospital discharge data to uncover insights that inform strategic decisions and initiatives: descriptive report, financial analysis, payment typology analysis, and operational efficiency and quality of care analysis
Benefits
Leveraging Excel means more data-based cooperation as it is still a cross-department, cross-domain popular tool.
Goals
Developing Reporting, Visualization & Data Analysis skills using Excel.
I first made a research online to get more variables on the water-saving plants listed in the first public dataset. I then prepped them and modeled out a star schema that I then used to into visualizations that built up an application report that is easy-to-use.
Source:
I used the book from Alex Holloway, "Data Analysis in Microsoft Excel - Guided Project - Healthcare", 2024, to learn and complete this project.
Indeed, I have a strong motivation and interest in integrating the healthcare industry, especially the Rambam Research campus here in Israel Haifa.
This is an opportunity for me to show my motivation and interest for the healthcare field and at the same time, show and improve my Excel skills.
Starting with public data: Long Island Nassau/Suffolk Inpatient Discharge Data
The dataset was downloaded here, from the New-York health data portal. It is exclusively from the Long Island Nassau and Suffolk hospitals and concerns only Inpatient Discharge data. Year 2015.
It is originally comprising ~339K rows from 34 columns for a total weight of ~125MB.
Understanding the Variables
Right away I am facing expressions that are not familiar to me. So I had to do some research on these terms to know what they mean.
Operating Certificate Number :
It is the numeric identifier the New York State Department of Health issues to a licensed health facility.
If a provider changed ownership or merged, the same facility may have multiple OCNs over time (or one OCN reused); check existence years in the Health Facilities dataset.
The values are all 7 digits numbers. None of them has a leading 0, it is good to go.


Age Group :
There are 5 age groups:
0 - 17
18 - 29
30 - 49
50 - 69
70 or older
Each group ends at its age year upper limit, meaning that in the 0 - 17 group, the oldest people are 17 years old plus all the days leading to the day before they turn 18.
Zip Code - 3 Digits :
The zip code values are limited to 3 digits (called "coarse" zip code) to intentionally protect patient privacy.
The value OOS means "Out Of State", meaning the patient's address is not from the New-York state.
The value Blanks could mean 2 main reasons :
the population of the zip area is under 20 000, this avoids small cell identification
abortion records, which demanded an anonymous zip code
Zip codes are typically treated as a category variable, not numerical. So this will be a text data type, not an integer.
The "Race" Variable :
There are 4 distinct races in the dataset:
Black / African American
Individuals who identify as Black or of African descent. This follows typical demographic labeling used in U.S. health datasets.
White
Individuals who identify as White (of European, Middle Eastern, or North African origin in many datasets).
Note: some datasets separate Middle Eastern / North African (MENA) but many US health datasets keep them under "White" unless explicitly separated.
Multi-racial
Individuals who identify with two or more races. This is used when respondents reported more than one race (for example, White + Asian).
Other Race
Individuals whose race does not fit the predefined categories or who explicitly chose "Other" and may have provided free-text. This is a catch-all.
The Ethnicity Variable
While Race is giving the racial identity, Ethnicity describes the cultural and/or the linguistic identity of the patient.
Four values are listed: Multi-ethnic, Not Span/Hispanic, Spanish/Hispanic, and Unknown.
There may be a need to create a variable that combines Race and Ethnicity.
Length of Stay
The values are all integers, except for the maximum, "120+". This is the cap value and refers to what is known as the LOS, Long Over-Stay. The value of 1 means same day, the patient got in and out on the same day.
For analysis the value 120+ will be changed to 120 and explicitly mentioned as a cap, a maximum that hides higher values.
Type of Admission
This is the administrative category under which a patient was admitted to the hospital. The variable helps distinguish planned (elective) admissions from unplanned/emergency events and identifies special groups (newborns, trauma cases).
Elective
A planned admission scheduled in advance (e.g., scheduled surgery, elective procedures). The patient and hospital had time to prepare; admission is not the result of a sudden emergency.
Emergency
An unplanned admission that follows presentation to the emergency department for acute, potentially life‑threatening conditions requiring immediate evaluation or treatment. In many datasets this is the most common “unplanned” category.
Urgent
A middle category between elective and emergency. Urgent admissions require timely attention (often within 24 hours) but are not immediate life‑threatening emergencies. Examples: worsening but not immediately fatal conditions requiring admission.
Trauma
Admission following a traumatic injury and usually associated with activation of the trauma team or admission to a trauma service/center. Trauma admissions may originate from the field (ambulance) or directly from the Emergency Department of the hospital (ED). In some systems trauma is a sub-type of emergency; in others it’s recorded separately.
Newborn
Admissions of newborn infants (typically births).
These records are special: payer, diagnosis codes, LOS, and outcomes differ; newborns are often analyzed separately from adult admissions.
Not Available
Indicates the admission type was not recorded or intentionally suppressed. Treat this as a separate category rather than null-blindly dropping rows.
Patient Disposition
Patient Disposition records where the patient went after the inpatient stay ended (place of discharge) or the administrative outcome of the episode (e.g., left Against Medical Advice or AMA, and expired). It helps classify outcomes, measure transfers to other levels of care, and calculate post-acute utilization.
Post-acute utilization refers to the use of healthcare services that follows an acute-care hospital stay. These services help patients transition out of the hospital with continued medical and rehabilitative care to promote recovery and regain independence.
Another Type Not Listed
Catch-all for disposition values not represented by the predefined options.
Cancer Center or Children's Hospital
Transfer to a specialized hospital (oncology center or pediatric hospital). This is a transfer to another acute care facility rather than discharge to home.
Age checks: newborns and pediatrics may have different disposition patterns — verify Cancer Center or Children's Hospital aligns with younger ages where appropriate.
Court/Law Enforcement
Patient released into custody (police/jail). Important for legal/forensic workups and quality metrics where follow-up may be constrained.
Critical Access Hospital
Transfer to or discharge to a Critical Access Hospital (a federally-designated small rural hospital). Treated as an inpatient transfer destination rather than home.
Expired
The patient died during the admission (in-hospital death). Usually final disposition and must be treated as such in mortality analyses.
Facility with Custodial/Supportive Care
Non-skilled long-term care settings (assisted living, group homes, supportive housing). Distinct from skilled nursing and rehab facilities.
Federal Health Care Facility
Transfer to a federal facility (e.g., VA hospital, military hospital). Considered a transfer destination.
Home or Self Care
Discharged to the patient’s residence with no formal home health services. Often coded as the standard ‘discharged to home’ outcome.
Home with Home Health Services
Discharged to home with organized home health nursing/therapy services. Useful to identify post‑acute support.
Hospice - Home
Discharged to hospice care in the patient’s home.
Hospice - Medical Facility
Discharged to hospice provided at an inpatient hospice facility or hospice unit in another facility.
Inpatient Rehabilitation Facility (IRF)
Transfer to an acute inpatient rehabilitation hospital/unit (higher-intensity rehab than SNF).
Expect longer LOS prior to SNF/IRF transfers
Left Against Medical Advice (AMA)
Patient departed the hospital against clinical advice. Flag for readmission risk analyses and quality/risk adjustment.
Expect higher readmission risk for AMA.
Medicaid Certified Nursing Facility
Transfer to a nursing home certified to accept Medicaid (often long-term care / custodial with some clinical nursing support).
Medicare Certified Long Term Care Hospital (LTCH)
Transfer to a Medicare-certified long-term care hospital — a distinct subset of hospitals that manage prolonged acute care.
Psychiatric Hospital or Unit of Hospital
Transfer to psychiatric inpatient care.
Short-term Hospital
Transfer to another short-term acute care hospital (general hospital). Different from specialized or federal facilities.
Skilled Nursing Home (SNF)
Transfer to a skilled nursing facility providing therapy and skilled nursing care (post-acute care). SNF is clinically different from custodial care.
Expect longer LOS prior to SNF/IRF transfers
CCS Diagnosis Code
This is a long list of 1 to 3 digits numerical codes, ranging from 1 to 917, without continuity. CSS stands for Clinical Classification Software, which maps ICD diagnosis codes into clinically meaningful buckets so you can analyze diagnoses at a higher level instead of thousands of ICD codes.
The maximum of 917 means that the CSS is not using the simple 1–285 single-level CCS; it’s either an expanded/refined classification, a local/state-specific index, or an internal numbering of many mapped rows.


CCS Diagnosis Description :
The values are the corresponding description of each CCS Diagnosis Code.
As seen in the picture here:
code 240 is for "Burns",
657 for "Mood disorders" and so on.
CCS Procedure Codes :
CCS Procedure Code groups raw procedure codes (ICD procedure codes) into clinically meaningful categories using AHRQ’s Clinical Classifications Software for Procedures (CCS-Procedure). For ICD‑9‑CM procedure codes, the standard HCUP/AHRQ procedure CCS contains up to 231 categories. Seeing numeric codes up to ~231 is expected for that grouper.
The value 999 is not a standard CCS procedure category; it is typically used in datasets as a sentinel for ‘Unknown/Unclassified/Other/Not mapped’. It is an indicator that a procedure could not be mapped, or that the field was suppressed, or that the record is a special code (e.g., ambulatory procedure not applicable). Always check the SPARCS / dataset data dictionary for the exact meaning.
In this dataset, the list of codes is made of contiguous numbers up to 229. Then 230 is missing and then we get 999. Because of these specificities, the procedure codes that are used are from the ICD-9-CM codes.
The code 230 is not a full ICD-10 procedure code; it is part of the diagnosis code F10.230, which stands for "Alcohol dependence with withdrawal, uncomplicated". Its absence suggests that the hospitals do not have this procedure in place.
The code 999 was used in the older ICD-9-CM system, which has been replaced in the U.S. by ICD-10. 999 was part of the range 996-999, which described "Complications of surgical and medical care, not elsewhere classified". In our dataset, 999 corresponds to the CCS procedure description value of "UNGROUPABLE".
CCS Procedure Description :
These are the corresponding values of the CCS Procedure Codes. 0 corresponds to the value "NO PROC", 134 to "CESAREAN SECTION", 180 to "OTHER CT SCAN", and so on.
APR DRG Codes :
These acronyms mean All Patient Refined Diagnosis Related Groups. It is a patient classification system that groups inpatient stays by clinical similarity and resource use.
APR‑DRG was developed by 3M Health Information Systems (proprietary) and is widely used for case‑mix, cost analysis, severity adjustment, and benchmarking.
For this dataset, the list goes from the number 3 to 956, mostly not contiguous.
Severity of Illness (SOI) and Risk of Mortality (ROM) subclasses: APR‑DRG adds 4-level subclasses (1 to 4) for SOI and ROM. These are stored as separate fields in the dataset: APR Severity of Illness Code and APR Risk of Mortality.
APR DRG Descriptions :
These are the descriptions that correspond to each APR DRG code. 753 is for "BIPOLAR DISORDERS", 540 is for "CESAREAN DELIVERY" and so on.
APR MDC Code :
The APR-MDC code is a small integer (typically 0–25) that groups an APR-DRG into one of the broad body-system / clinical categories (for example “Diseases of the Respiratory System”, “Circulatory System”, “Pregnancy/Childbirth”, etc.).
APR MDC Descriptions :
The APR-MDC Description is corresponding description of the APR MDC code. Code 22 is for "Burns", 19 for "Mental Diseases and Disorders", and so on.
APR Severity of Illness Code & APR Severity of Illness Description :
These are simply 4 numbers from 1 to 4, ranging the severity on the scale of Minor, Moderate, Major and Extreme.
APR Risk of Mortality:
Similarly to the APR Severity of Illness, the same scale is used here: Minor, Moderate, Major and Extreme.
APR Medical Surgical Description:
The 3 values of this variable are: Medical, Surgical and Not Applicable.
Not Applicable covers classifications not covered in the 2 previous, like newborns, maternity, psychiatric, rehabilitation, or certain administrative records like outpatient-only, observation, or other non-inpatient contexts.
Abortion Edit Indicator:
This indicates if the inpatient went through an abortion. Note that there is no "Y" value, indicating that none of the inpatient did. This is a privacy measure taken by the system in place and does not mean that no abortion was recorded.
Emergency Department Indicator:
This indicates if the Emergency Department was involved during the inpatient visit.
Total Charges and Total Costs:
Total Charges is the amount billed to the inpatient. The amount in the Total Costs is the cost incurred by the hospital in delivering the care.
The ratio between the two is 4, meaning most of the Charges amounts are the result of the cost amount multiplied by 4.


