Dr. Si Chen's lab has successfully secured a GRF grant! 🎉🎊👏

Project Details

Funding Scheme: General Research Fund

Project Number: 15604425

Project Title(English): The integration of two modalities: gesture and speech prosody in Cantonese focus marking in autistic and typical populations

Project Title(Chinese) :

Principal Investigator(English): Prof Chen, Si

Principal Investigator(Chinese) :

Department: Dept of Chinese and Bilingual Studies

Institution: The Hong Kong Polytechnic University

Co - Investigator(s) :

Dr Bosker, Hans

Dr Chan, Angel Wing-Shan

Prof Chen, Zhuoming

Dr Ge, Haoyan

Prof Li, Bin

Prof Sheng, Li

Miss Tang, Tempo Po-yi

Dr Wayland, Ratree

Panel: Humanities, Social Sciences

Subject Area: Psychology and Linguistics

Exercise Year: 2025 / 26

Fund Approved: 654,000

Project Status: On-going

Completion Date :

Abstract as per original application

(English/Chinese): Abnormal speech prosody and weak quality in manual gesture production have been widely reported in autistic individuals. However, in addition to the deficits found in separate modalities, autistic individuals may have difficulties in integrating gesture and speech prosody. Due to this potential deficit, intervention plans targeting separate modalities may not be enough. A better understanding of the relationship between gesture and speech prosody may lead to an intervention plan that improves the integration and employs the effects of gesture to boost the use of speech prosody. It is thus high time that we study 1) the temporal relationship between speech prosody and gesture in Cantonese focus marking by typical and autistic populations. It is hypothesized that synchronization is planned between gesture and speech prosody for the typical population, but autistic children may show difficulties in the temporal alignment between the two as the Weak Central Coherence theory posits that autistic children focus on uni-sensory information at the expense of multisensory perception; 2) the effects of gesture on the acoustic realization of speech prosody by typical and autistic populations. It is hypothesized that gesture may boost prosodic prominence in focus marking by the typical population. However, autistic children may show reduced boosting effects due to difficulties in the integration across modes. Thirty-six autistic children, 36 typically developing children aged from 10 to 12 and 36 adults will complete three experiments eliciting focus marking in speech prosody with and without gesture. Acoustic cues will be extracted using Praat and annotation of gesture that accompanies the focal word in each utterance will be performed in ELAN. Four measurements will be carried out to measure the temporal relationship. Statistical modelling will be further applied to all the extracted data.
This project will be among the first to investigate the potential deficits in the integration of the visual and auditory modalities in focus marking especially among autistic children. It enhances our understanding of the temporal relationship between speech prosody and gesture, as well as how gesture may affect the acoustic realization of speech prosody in both typical and autistic populations. Therefore, the project may provide potentially valuable information for the diagnosis of autism and the design of intervention plans. Based on the findings of this study, the intervention plan will be more comprehensive as it addresses the temporal alignment of two modes and gesture may be employed to improve acoustic realization of focus marking in speech.

N/A

Research Outcome

Layman's Summary of Completion Report: Not yet submitted

Last Updated at 2025/07/02

Our former postdoc has also received a GRF grant. 🎉🎊👏

Funding Scheme: General Research Fund

Project Number: 15603425

Project Title(English): The role of AI in courtroom evidence evaluation: showcasing the impact in forensic voice comparison in Hong Kong

Project Title(Chinese) :

Principal Investigator(English): Dr WANG, Bruce Xiao

Principal Investigator(Chinese) :

Department: Department of English and Communication

Institution: The Hong Kong Polytechnic University

Co - Investigator(s) :

Dr Hughes, Vincent

Panel: Humanities, Social Sciences

Subject Area: Psychology and Linguistics

Exercise Year: 2025 / 26

Fund Approved: 986,808

Project Status: On-going

Completion Date :

Abstract as per original application

(English/Chinese): The integration of Artificial Intelligence (AI) into various sectors, including healthcare, education, and forensic sciences, is transforming our daily lives. In forensic science, particularly in biometric recognition such as facial and voice recognition, AI adoption has surged. Forensic voice comparison (FVC) is a critical area where AI-integrated automatic speaker recognition (ASR) systems are increasingly used to compare voice recordings from unknown offenders and known suspects. Despite the advancements in ASR technology, which has shown high accuracy in controlled conditions, the reliability of these systems under varying real-world conditions remains a significant concern. This distinction between accuracy and reliability is crucial, especially in forensic contexts where errors can lead to wrongful convictions or acquittals, as well as miscarriages of justice. This research aims to investigate the reliability of AI-integrated ASR systems in forensic contexts, focusing on the three most commonly spoken languages in Hong Kong: Cantonese, English, and Mandarin. We will examine how various conditions, such as speech feature extraction, natural variability in a speaker’s voice over time, and the choice of AI models, impact the reliability of these systems. The study will also investigate how these conditions might cause errors, such as misidentifying different speakers as the same person, leading to miscarriages of justice. The proposed study will be conducted in two working stages. In the first stage, we will focus on system reliability under different conditions, addressing whether ASR systems with better accuracy also ensure greater reliability concerning speech features, speech style, speech duration, and sample size. We will also investigate whether these attributes are influenced by specific languages or accents.
In the second stage, we will investigate how speech samples from the same or different speakers interact with and respond to various AI-integrated ASR systems. We aim to investigate how the strength of evidence varies for individual speakers using different systems and what systematic properties of individual speakers make them more or less challenging for ASR systems to distinguish. The research will deliver new knowledge on the reliability of AI-integrated ASR systems in forensic contexts, providing empirical evidence on how various conditions and individual speaker characteristics impact system performance. This knowledge is crucial for forensic experts to demonstrate the reliability and accuracy of the methods employed in forensic comparison, ensuring that forensic evidence evaluation in court is based on reliable and consistent methods, thereby minimizing the risk of miscarriages of justice.

N/A

Research Outcome

Layman's Summary of Completion Report: Not yet submitted

Last Updated at 2025/07/02

Congratulations to Dr. Yike Yang on obtaining an FDS grant!

Project Reference No: UGC/FDS15/H09/24

Title: Effects of Diverse Training Paradigms on Enhancing Comprehensibility of Cantonese Speech in Immigrants

PI: Dr YANG Yike

Funding Period: 18 months (Jan 2025 to June 2026)

Amount Awarded: HK$649,868

Abstract:

Although there is an increasing number of people learning a second language (L2), it is widely accepted that attainment of native pronunciation is unlikely for post-puberty L2 learners. From a more practical point of view, L2 learners and teachers should focus more on the comprehensibility of L2 speech, rather than the accent. One feature of L2 learning is the lack of sufficient exposure to the L2, even in the immigration setting. Thus, the proposed study will examine the effects of various short-term training paradigms on the enhancement of comprehensibility in immigrants’ L2 Cantonese, in the hope of providing effective training methods for immigrants to compensate for a lack of sufficient exposure.

This proposed study has three aims: (1) test the effects of different training methods on L2 comprehensibility enhancement; (2) systematically examine the effects of different training methods on lexical tone production and perception; and (3) combine both acoustic and perceptual measurements for analysis of L2 Cantonese speech. This study will recruit immigrants with no prior knowledge of Cantonese before arriving in Hong Kong and will prepare different training methods to enhance the comprehensibility of their L2 Cantonese. To investigate the training effects, the participants’ performance on various tasks will be tested before and after training sessions.

As the first attempt to systematically investigate the effects of different training methods on Cantonese tone production and perception, this study will provide insight into training effectiveness and advance our theoretical knowledge of L2 speech learning. Furthermore, the results of this research will also inform language teachers of the optimal training method for Cantonese tones, allowing teachers to revise their syllabi and pedagogies when teaching L2 learners of Cantonese.