MSc thesis project proposal
[2024] An FPGA Accelerator of OpenAI Whisper Speech Recognizer
Introduction
In the evolving landscape of human-machine interaction, Automatic Speech Recognition (ASR) systems play a pivotal role by enabling machines to understand and process human speech. AI-driven ASR technologies such as OpenAI's Whisper have significantly advanced the field, offering high accuracy and versatility across many languages and contexts. However, real-time use of such systems is often hindered by computational latency. This project proposes the development of a hardware accelerator for the Whisper ASR system on a Zynq Field-Programmable Gate Array (FPGA) platform, with an ambitious target latency of 10 milliseconds to support seamless real-time speech recognition.
Project Aims and Objectives
The primary aim of this thesis is to design and implement a hardware accelerator for the Whisper speech recognition system, optimizing for speed and efficiency. The objectives are as follows:
- Performance Optimization: Achieving a target latency of 10 milliseconds for speech recognition to enable real-time processing.
- Hardware Design: Utilizing the Verilog hardware description language to develop an FPGA-based accelerator tailored for the Whisper ASR system.
- Evaluation and Benchmarking: Assessing the accelerator's latency (in milliseconds), throughput (in operations per second, OPS), accuracy (as word error rate, WER), and power consumption, each compared against software-based GPU implementations.
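As an illustration of the accuracy metric named above, the following is a minimal sketch of a WER computation, assuming whitespace tokenization and no text normalization (an actual evaluation would typically normalize case and punctuation first). The function name `word_error_rate` is hypothetical and not part of any Whisper tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Running the same function on the transcripts produced by the accelerator and by the software baseline would show whether hardware-oriented changes such as reduced numerical precision affect recognition accuracy.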
Assignment
Research and Methodology
The project will involve:
- Literature Review: Conducting an in-depth review of existing ASR systems, with a focus on hardware accelerators and FPGA implementations.
- Hardware Design: Designing the FPGA accelerator using Verilog, considering the unique computational requirements of the Whisper ASR system.
- System Integration and Testing: Integrating the hardware accelerator with the Whisper system and conducting tests to evaluate performance metrics.
- Performance Analysis: Comparing the FPGA accelerator's performance with software implementations, analyzing the differences in latency, accuracy, and power efficiency.
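When planning the design and analysis steps above, it can help to express the 10 millisecond latency target as a clock-cycle budget and as a required sustained throughput. The sketch below shows only the arithmetic. The 200 MHz clock and the operation count per inference are hypothetical placeholders, not measured or specified values, and the function names are illustrative.

```python
def cycle_budget(latency_s: float, clock_hz: float) -> int:
    """Number of clock cycles available within the latency target."""
    return int(round(latency_s * clock_hz))


def required_throughput_ops(ops_per_inference: float, latency_s: float) -> float:
    """Sustained operations per second needed to finish within the target."""
    return ops_per_inference / latency_s


if __name__ == "__main__":
    TARGET_LATENCY_S = 0.010      # 10 ms target from the proposal
    ASSUMED_CLOCK_HZ = 200e6      # hypothetical FPGA fabric clock
    ASSUMED_OPS = 1e9             # hypothetical operations per inference

    print(cycle_budget(TARGET_LATENCY_S, ASSUMED_CLOCK_HZ))
    print(required_throughput_ops(ASSUMED_OPS, TARGET_LATENCY_S))
```

Replacing the placeholders with the chosen device clock and the operation count of the selected Whisper model size gives a first estimate of how much parallelism the accelerator would need.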
Expected Outcomes
This thesis is expected to deliver:
- A fully functional FPGA-based hardware accelerator for the Whisper ASR system, achieving real-time speech recognition with a target latency of 10 milliseconds.
- A detailed analysis of the system's performance, including comparisons with software-based ASR systems to highlight improvements.
Requirements
- Background in digital signal processing and, preferably, machine learning.
- Experience with FPGA development, particularly using Verilog for hardware description.
Contact
dr. Chang Gao
Electronic Circuits and Architectures Group
Department of Microelectronics
Last modified: 2024-06-11