Early detection of Mild Cognitive Impairment (MCI) via speech is hampered by the inability of current models to generalize robustly across diverse cognitive tasks. Existing high-performing systems on the TAUKADIAL Challenge often exhibit a critical performance collapse when tested across all three picture description tasks. To overcome this limitation, we present a language-specific, early multimodal fusion framework built on the English subset of the TAUKADIAL dataset. Our model integrates three parallel feature streams: acoustic embeddings from the Whisper encoder, linguistic markers from a RoBERTa model fine-tuned via LoRA, and UMAP-reduced eGeMAPS prosodic features. Trained and evaluated across all three tasks, the system achieved an Unweighted Average Recall (UAR) of 0.87, establishing a new state of the art for the English subset. Crucially, this performance demonstrates superior generalization and stability compared to the challenge baseline (UAR: 0.60) and previous task-optimized models. This work validates the necessity of targeted, multimodal optimization for creating robust, non-invasive speech-based MCI biomarkers.
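To make the early-fusion design concrete, the following is a minimal PyTorch sketch of a classifier of the kind the abstract describes. All specifics here are illustrative assumptions rather than the reported configuration: the embedding widths (768 for the Whisper and RoBERTa streams, 16 UMAP components for the eGeMAPS functionals), the MLP head, and the class/variable names are not taken from the paper.

```python
# Minimal sketch of early multimodal fusion, assuming three pre-extracted
# feature streams. Dimensions and the head architecture are illustrative.
import torch
import torch.nn as nn


class EarlyFusionMCIClassifier(nn.Module):
    def __init__(self, whisper_dim=768, roberta_dim=768, prosody_dim=16, hidden=256):
        super().__init__()
        fused_dim = whisper_dim + roberta_dim + prosody_dim
        self.head = nn.Sequential(
            nn.Linear(fused_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 2),  # binary output: MCI vs. normal cognition
        )

    def forward(self, whisper_emb, roberta_emb, prosody_feats):
        # Early fusion: concatenate all three streams into one vector
        # before any classification layers are applied.
        fused = torch.cat([whisper_emb, roberta_emb, prosody_feats], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    model = EarlyFusionMCIClassifier()
    # Stand-in tensors for a batch of 4 recordings. In practice these would be
    # pooled Whisper encoder states, the sentence embedding from a
    # LoRA-adapted RoBERTa, and UMAP-reduced eGeMAPS features, respectively.
    whisper_emb = torch.randn(4, 768)
    roberta_emb = torch.randn(4, 768)
    prosody_feats = torch.randn(4, 16)
    logits = model(whisper_emb, roberta_emb, prosody_feats)
    print(logits.shape)  # torch.Size([4, 2])
```

The defining property of early fusion, as opposed to late (decision-level) fusion, is that the acoustic, linguistic, and prosodic representations are concatenated into a single vector before classification, allowing the head to learn cross-modal interactions directly.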