MSc thesis project proposal

[2024] An FPGA Accelerator of OpenAI Whisper Speech Recognizer

Introduction

Automatic Speech Recognition (ASR) systems play a pivotal role in human-machine interaction by enabling machines to understand and process human speech. AI-driven ASR models such as OpenAI's Whisper have significantly advanced the field, offering high accuracy and versatility across many languages and contexts. However, real-time use of such models is often hindered by computational latency. This project proposes a hardware accelerator for the Whisper ASR system on the Zynq Field-Programmable Gate Array (FPGA) platform, with an ambitious target latency of 10 milliseconds to enable seamless real-time speech recognition.

Project Aims and Objectives

The primary aim of this thesis is to design and implement a hardware accelerator for the Whisper speech recognition system, optimizing for speed and efficiency. The objectives are as follows:

Performance Optimization: Achieving a target latency of 10 milliseconds for speech recognition to enable real-time processing.
Hardware Design: Utilizing the Verilog hardware description language to develop an FPGA-based accelerator tailored for the Whisper ASR system.
Evaluation and Benchmarking: Assessing the accelerator's latency (in milliseconds), throughput (in operations per second, OPS), accuracy (in word error rate, WER), and power consumption against GPU-based software implementations.
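
As an illustration of the accuracy metric named above, the following is a minimal sketch of a word error rate computation based on word-level edit distance. The function name and the whitespace tokenization are assumptions made for this example; a real evaluation would normally also normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the accelerator's transcripts and the software baseline's transcripts with the same function on the same audio would show whether hardware-oriented changes such as reduced numerical precision affect accuracy.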

Assignment

Assignment: Research and Methodology

The project will involve:

Literature Review: Conducting an in-depth review of existing ASR systems, with a focus on hardware accelerators and FPGA implementations.
Hardware Design: Designing the FPGA accelerator using Verilog, considering the unique computational requirements of the Whisper ASR system.
System Integration and Testing: Integrating the hardware accelerator with the Whisper system and conducting tests to evaluate performance metrics.
Performance Analysis: Comparing the FPGA accelerator's performance with software implementations, analyzing differences in latency, accuracy, and energy efficiency.
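
The comparison step could be summarized with two derived figures: speedup and energy per inference. The sketch below is only an illustration. The `Measurement` type and the `compare` function are hypothetical names, and the sketch assumes that average power and end-to-end latency have been measured for both platforms on the same input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One platform's measured figures for a single inference (hypothetical type)."""
    latency_ms: float  # end-to-end latency in milliseconds
    power_w: float     # average power draw in watts

    @property
    def energy_mj(self) -> float:
        # watts * milliseconds = millijoules
        return self.power_w * self.latency_ms


def compare(baseline: Measurement, accelerator: Measurement) -> dict[str, float]:
    """Return how many times faster and more energy-efficient the accelerator is."""
    return {
        "speedup": baseline.latency_ms / accelerator.latency_ms,
        "energy_reduction": baseline.energy_mj / accelerator.energy_mj,
    }
```

Values above 1 favor the accelerator. Reporting energy per inference alongside raw power avoids penalizing a platform that draws more power but finishes much sooner.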

Expected Outcomes

This thesis is expected to deliver:

A fully functional FPGA-based hardware accelerator for the Whisper ASR system, achieving real-time speech recognition with a target latency of 10 milliseconds.
A detailed analysis of the system's performance, including comparisons with software-based ASR systems to highlight improvements.
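
To give a feel for what the 10 millisecond target means in hardware terms, the sketch below converts a latency budget into clock cycles and audio samples. The 200 MHz fabric clock in the usage note is a hypothetical value chosen only for illustration; the achievable clock depends on the design and the device. The 16 kHz figure is the sample rate of Whisper's input audio.

```python
def cycle_budget(latency_us: int, clock_mhz: int) -> int:
    """Clock cycles available within the latency budget.

    Microseconds times megahertz gives cycles directly, so the
    computation stays in exact integer arithmetic.
    """
    return latency_us * clock_mhz


def samples_in_window(latency_us: int, sample_rate_hz: int) -> int:
    """Number of whole audio samples that arrive during the latency budget."""
    return latency_us * sample_rate_hz // 1_000_000
```

For example, `cycle_budget(10_000, 200)` gives the number of cycles available per 10 ms at the assumed 200 MHz clock, and `samples_in_window(10_000, 16_000)` gives the number of new 16 kHz audio samples arriving in that time.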

Requirements

Background in digital signal processing and, preferably, machine learning.
Experience with FPGA development, particularly using Verilog for hardware description.

Contact

dr. Chang Gao

Electronic Circuits and Architectures Group

Department of Microelectronics

Last modified: 2024-06-11