Predicting hospital outcomes for patients with severe acute respiratory infections is critical for risk stratification and resource planning, yet heterogeneous electronic health record (EHR) data, class imbalance, and evolving clinical practice present persistent methodological challenges for machine learning (ML) approaches. We conducted a retrospective cohort study using EHR data harmonized to the OMOP common data model from the National COVID Cohort Collaborative (N3C; May 2020-June 2025), including 263,619 adults hospitalized with COVID-19 across 51 contributing sites. We developed penalized linear regression (elastic net), random forest, XGBoost, and multilayer perceptron (MLP) models to predict hospital length of stay (LOS) and mortality (in-hospital and 60-day), using demographics, comorbidities, prior healthcare utilization, COVID-19 vaccination status, and hospital site as predictors. Missing data were handled via multiple imputation by chained equations (MICE), and class imbalance was addressed using the synthetic minority oversampling technique (SMOTE). Model performance was evaluated using area under the ROC curve (AUROC), Brier score, calibration plots, and decision curve analysis, following the TRIPOD reporting framework. Mortality prediction achieved moderate discrimination across all models (test AUROC = 0.71-0.73 for in-hospital mortality; 0.72-0.73 for 60-day all-cause mortality). Models trained without SMOTE achieved the highest AUROCs but assigned virtually no patients to the mortality class at the default 0.5 threshold. SMOTE improved recall and F1 score at the cost of reduced AUROC and precision. LOS was poorly explained by available structured predictors (best R² = 0.059). Remdesivir-treated patients (n = 103,536; 39.3%) were older, had higher comorbidity burden, and had higher unadjusted mortality than untreated patients. Common structured EHR features offer moderate utility for mortality risk stratification in hospitalized COVID-19 patients but are insufficient for LOS prediction.
The consistent SMOTE-related tradeoff between discrimination and calibration underscores the need to report threshold-dependent metrics alongside AUROC in clinical ML studies, with implications for operational planning during future respiratory disease emergencies.
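To make the reporting point concrete, the sketch below (toy numbers in plain Python; not the study's data, models, or pipeline) shows how a classifier on an imbalanced outcome can post a perfect AUROC while assigning no patients to the positive class at the default 0.5 threshold, which is why threshold-dependent metrics must accompany AUROC.

```python
def auroc(y, p):
    """AUROC via the rank (Mann-Whitney U) formulation:
    the probability a random positive is scored above a random negative."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((pi > ni) + 0.5 * (pi == ni) for pi in pos for ni in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    """Brier score: mean squared error of predicted probabilities."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def recall_at(y, p, threshold=0.5):
    """Recall (sensitivity) after dichotomizing at a threshold."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    return tp / sum(y)

# Hypothetical imbalanced cohort: 2 deaths among 10 patients. The model
# ranks both deaths highest (ideal discrimination) but its probabilities
# never exceed 0.5, so the default threshold flags nobody.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.45, 0.40, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05]

print(auroc(y, p))      # 1.0 — perfect ranking of deaths above survivors
print(recall_at(y, p))  # 0.0 — no patient crosses the 0.5 threshold
print(brier(y, p))      # low score despite zero recall at 0.5
```

Oversampling the minority class (as SMOTE does) shifts predicted probabilities upward, recovering recall at 0.5 but typically distorting calibration, which is the tradeoff summarized above.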