In connection with the ISCA SLaTE 2025 Workshop and Cambridge University Press & Assessment, we are happy to introduce the Speak & Improve Challenge 2025 to the speech and language learning community. Our goal is to advance spoken language assessment and feedback technology by releasing a rich new dataset and proposing a set of associated tasks.
The challenge offers a unique opportunity with the pre-release of the Speak & Improve Corpus 2025 from Cambridge University Press & Assessment. This dataset is derived from the Cambridge English Speak & Improve L2 (second language) English speaking practice tool and contains annotated recordings of a wide variety of L2 English learner speech on open (spontaneous) speaking tasks.
The challenge consists of four tasks designed to advance spoken language technology and improve automated language learning assessment and feedback: Spoken Language Assessment (SLA); Spoken Grammatical Error Correction Feedback (SGECF); Automatic Speech Recognition (ASR); and Spoken Grammatical Error Correction (SGEC). Each task has a closed track and an open track. Participants may enter as many tasks as they like.
The Automatic Speech Recognition (ASR) task aims to advance speech recognition for L2 English learners' speech, with a focus on pronunciation, fluency, and accents.
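ASR output is commonly compared against a reference transcript using word error rate (WER). The challenge's own scoring procedure is not described here, so the following is only a minimal sketch of the standard WER calculation. The function name, the lowercase and whitespace normalisation, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and splitting on
    whitespace; a real scoring setup may normalise differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical learner utterance: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("i have been there", "i has been"))
```

Because insertions are counted as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.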
In the Spoken Language Assessment (SLA) task, systems evaluate learners' spoken responses and predict scores that closely align with human assessments. Key language features such as pronunciation, fluency, intonation, and grammatical accuracy will be assessed.
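One way to quantify how closely predicted scores align with human assessments is to compute the root-mean-square error (RMSE) and the Pearson correlation between the two sets of scores. The sketch below assumes numeric scores on a shared scale. The function names and example values are hypothetical, and the challenge's official metrics are not stated here.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], human: Sequence[float]) -> float:
    """Root-mean-square error between predicted and human scores."""
    if len(predicted) != len(human) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - h) ** 2 for p, h in zip(predicted, human))
    return math.sqrt(squared / len(predicted))


def pearson(predicted: Sequence[float], human: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and human scores."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_h)


# Hypothetical scores for four responses (illustrative values only).
system_scores = [3.5, 4.0, 2.5, 5.0]
human_scores = [3.0, 4.5, 2.5, 5.0]
print(rmse(system_scores, human_scores))
print(pearson(system_scores, human_scores))
```

RMSE penalises the size of each disagreement, while correlation only reflects whether the system ranks responses similarly to the human raters, so the two are usually reported together.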
In the Spoken Grammatical Error Correction (SGEC) task, participants will focus on identifying and correcting grammatical errors in spoken language, including errors in tense usage, subject-verb agreement, and sentence structure.
The Spoken Grammatical Error Correction Feedback (SGECF) task focuses on providing clear, actionable feedback on grammatical errors or spoken disfluencies, enhancing the usability of language-learning tools.
To encourage broad participation and innovation, each challenge task has two tracks: a closed track and an open track.
Participants can choose to participate in one or more tasks in either track. Baseline systems will be provided for each task to help participants in their development efforts.
Participants will be provided with a dataset from the Speak & Improve L2 English speaking practice tool, which includes annotated responses to a range of speaking tasks across proficiency levels from CEFR A2 to C1.
To keep the focus on open speaking tasks, the read-aloud Part 2 data will not be released as part of this Challenge.
The rules for using external data sources differ between the closed and open tracks. Participants in the closed track are limited to the released data, while those in the open track may use publicly available external data sources.