<p dir="ltr">Traditional deep learning models often struggle in few-shot learning scenarios, where only a handful of labeled examples are available per class.</p><p dir="ltr">While the Contrastive Language-Image Pre-training (CLIP) model demonstrates impressive zero-shot capabilities, its performance in few-shot scenarios remains limited.</p><p dir="ltr">Existing few-shot adaptation methods focus primarily on exploiting the small labeled dataset itself, which caps how much improvement they can achieve.</p><p dir="ltr">To overcome this limitation, we introduce SSAT-Adapter, a novel framework that leverages CLIP's language understanding to generate informative auxiliary tasks, improving CLIP's performance and adaptability in few-shot settings.</p>
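<p dir="ltr">As background for the zero-shot capability mentioned above, the sketch below shows how CLIP classifies an image by scoring it against natural-language prompts, with no task-specific training. It assumes the openai/CLIP package, an illustrative image path, and illustrative class prompts; it is background only, not the SSAT-Adapter method.</p>
<pre><code>
# Minimal sketch of CLIP zero-shot classification (background only; not SSAT-Adapter).
# Assumes the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git)
# and an example image path "example.jpg"; both are illustrative.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Class names are turned into natural-language prompts; CLIP scores the image
# against each prompt using its joint image-text embedding space.
class_names = ["dog", "cat", "car"]  # illustrative labels
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, prompts)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(class_names, probs[0])))  # predicted class probabilities
</code></pre>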