
Bowen Chen: SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks

Poster
Posted on 2025-10-16, 23:52, authored by Bowen Chen, Yun Sing Koh, Gill Dobbie
Traditional deep learning models often struggle in few-shot learning scenarios, where labeled data is scarce. While the Contrastive Language-Image Pre-training (CLIP) model demonstrates impressive zero-shot capabilities, its performance in few-shot settings remains limited. Existing methods focus on extracting more from the small labeled dataset, which caps the achievable improvement. To overcome this, we introduce SSAT-Adapter, a novel framework that leverages CLIP's language understanding to generate informative auxiliary tasks, improving CLIP's performance and adaptability in few-shot settings.
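The abstract does not spell out SSAT-Adapter's training details, so as a rough illustration only, the sketch below shows the kind of adapter-style few-shot pipeline such a framework builds on (in the spirit of CLIP-Adapter): a small residual module is trained over frozen CLIP features against frozen class-prompt embeddings. The Adapter module, the alpha mixing ratio, and the dummy features standing in for frozen CLIP encoder outputs are all illustrative assumptions, not the authors' implementation, and the auxiliary-task generation described in the abstract is not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Lightweight residual adapter over frozen CLIP image features.
    Hypothetical sketch; SSAT-Adapter would add auxiliary-task
    objectives on top of a setup like this."""
    def __init__(self, dim: int = 512, reduction: int = 4, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha  # residual mixing ratio (assumed value)
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        adapted = self.fc(feat)
        # Blend adapted and original features, then re-normalise
        # so cosine-similarity logits remain well scaled.
        out = self.alpha * adapted + (1 - self.alpha) * feat
        return F.normalize(out, dim=-1)

def clip_logits(image_feat: torch.Tensor, text_feat: torch.Tensor,
                scale: float = 100.0) -> torch.Tensor:
    """Cosine-similarity logits between image and class-prompt embeddings."""
    return scale * image_feat @ text_feat.t()

# Toy usage: random unit vectors stand in for frozen CLIP encoder
# outputs, so the script runs without downloading a model.
torch.manual_seed(0)
n_shot, n_class, dim = 16, 5, 512
image_feat = F.normalize(torch.randn(n_shot * n_class, dim), dim=-1)
text_feat = F.normalize(torch.randn(n_class, dim), dim=-1)  # one prompt per class
labels = torch.arange(n_class).repeat_interleave(n_shot)

adapter = Adapter(dim)
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
for _ in range(100):  # few-shot fine-tuning: only the adapter is trained
    logits = clip_logits(adapter(image_feat), text_feat)
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Freezing both CLIP encoders and training only the small adapter keeps the number of learned parameters proportionate to the few labeled examples available, which is the standard motivation for adapter-style few-shot methods.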

Publisher

University of Auckland
