
基于GPU的软件无线电并行算法与系统结构关键技术研究

Parallel Algorithms and System Architecture for Software Radio on GPUs

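【Author】 Li Rongchun (李荣春)

【Supervisor】 Dou Yong (窦勇)

【Author information】 National University of Defense Technology (国防科学技术大学), Computer Science and Technology, 2014, PhD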

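【Abstract (Chinese original, translated)】 With the advance of information technology, broadband wireless communication is changing the way people live, and its convenience is available anytime and anywhere. Rapidly evolving communication protocols, however, require mobile terminals to support multiple communication modes, such as personal area, local area, metropolitan area, and wide area networks. Software radio supports multiple communication modes and protocols without changing the hardware structure. Among these wireless protocols, the channel capacity of MIMO grows linearly with the smaller of the transmit and receive antenna counts, which has made MIMO one of the most widely used wireless technologies. OFDM, with its high spectral efficiency, its resistance to multipath fading, and its ease of implementation, has likewise become indispensable. Most advanced wireless protocols today incorporate both techniques, forming MIMO-OFDM wireless communication. Practical use demands high data rates, high fidelity, low latency, and multi-user support, so a wireless system needs not only high throughput and processing capability but also a low bit error rate. The MIMO-OFDM physical layer has three main classes of algorithms: channel error-correction encoding and decoding, OFDM, and MIMO detection. This thesis proposes fine-grained software radio communication algorithms on the GPU platform and uses GPUs to build a real-time software radio communication system. The main work is as follows.
1) CuSora, a heterogeneous software radio platform with the CPU as controller and the GPU as baseband processor, is proposed. The platform combines the Sora radio platform with a GPU. It uses the Sora radio front end and overall system framework to send and receive wireless signals and to perform MAC-layer processing, while the GPU handles the physical layer, so that throughput and error performance meet the requirements of the wireless protocols. CuSora also includes a MAC-layer controller that switches among multi-mode, multi-standard protocols.
2) GPU-based fine-grained parallel encoding and decoding algorithms and GPU accelerator structures are proposed. For the widely used convolutional, Turbo, and LDPC codes, the thesis analyzes the computational characteristics of each code, selects improved algorithms suited to the GPU, and proposes a fine-grained parallel algorithm for each. These algorithms achieve good parallelism while using error-performance protection mechanisms to limit the effect of parallelization on the bit error rate. High-throughput encoders and decoders with good error performance are implemented for all three codes on Fermi-architecture GPUs. Relative to general-purpose processor implementations, the three GPU decoders gain two to three orders of magnitude in performance and outperform other comparable decoders.
3) GPU-based fine-grained parallel OFDM algorithms and GPU accelerator structures are proposed. For the common OFDM algorithms (modulation/demodulation, synchronization, and channel estimation), the thesis analyzes the computational characteristics of the three classes and proposes a fine-grained parallel algorithm for each. Beyond achieving good parallelism, the parallel algorithms preserve the correctness of the sample information carried by each subcarrier and the reorganization and exchange of sample information among subcarriers. Accelerators for the three classes of OFDM algorithms are implemented on Fermi-architecture GPUs with several optimization methods. Relative to the real-time throughput required by the wireless protocols, the GPU accelerators achieve one to two orders of magnitude higher throughput.
4) A GPU-based fine-grained parallel MIMO detection algorithm and GPU accelerator structure are proposed. For the PIC MIMO detection algorithm, the thesis analyzes the computational characteristics and proposes a fine-grained parallel algorithm that, beyond achieving good parallelism, recovers the complex sample information of each received vector. An accelerator for the parallel PIC detection algorithm is implemented on a Fermi-architecture GPU. Relative to a general-purpose processor, the GPU accelerator gains two orders of magnitude in throughput and outperforms other current MIMO detectors.
5) A GPU-based software radio prototype system is proposed and implemented. For the widely used WiFi (802.11a) and WiMAX (802.16d) protocols, the thesis analyzes the chain of algorithm modules in each physical layer and proposes a GPU-based parameterized software radio system structure for OFDM wireless communication. The GPU-based parallel encoding/decoding, OFDM, and MIMO detection algorithms above are combined to implement the physical layers of both protocols. Built on the CuSora platform, the prototype system achieves real-time wireless transmission. Relative to the Sora software radio system, the prototype's transmission rate improves by 10% to 30%, and the throughput of each algorithm module generally exceeds that of the Sora implementation. The modules of the CuSora-based system also outperform comparable modules implemented on other current CPU, DSP, and FPGA platforms.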
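The abstract's statement that MIMO capacity grows linearly with the smaller antenna count can be made concrete with the standard textbook expression for an i.i.d. Rayleigh-fading channel. This is a sketch under assumptions, not a formula from the thesis: the symbols $N_t$ (transmit antennas), $N_r$ (receive antennas), $\rho$ (average receive SNR), and $\mathbf{H}$ (the $N_r \times N_t$ channel matrix) are introduced here for illustration, and the channel is assumed known only at the receiver.

```latex
% Ergodic capacity in bits/s/Hz with equal power allocation across transmit antennas
C = \mathbb{E}\!\left[ \log_2 \det\!\left( \mathbf{I}_{N_r} + \frac{\rho}{N_t}\,\mathbf{H}\mathbf{H}^{\mathsf{H}} \right) \right]

% High-SNR behaviour: the pre-log factor is the smaller antenna count
C \approx \min(N_t, N_r)\,\log_2 \rho + O(1), \qquad \rho \to \infty
```

Under these assumptions, each additional antenna pair adds roughly one more $\log_2 \rho$ term, which is the linear scaling the abstract refers to.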

【Abstract】 With the development of information technology, wide-band wireless communication (WWC) is changing the way people live, and its convenience is available everywhere. Mobile terminals, however, must support various wireless protocols, such as personal area, local area, metropolitan area, and wide area networks. Software-defined radio addresses this need: a wireless communication system can support multiple communication modes and protocols without altering the hardware architecture. Among wireless communication technologies, multiple-input multiple-output (MIMO) systems let data throughput increase linearly with the minimum number of antennas at the transmitter and receiver, which makes MIMO one of the most attractive techniques in wireless communication. Orthogonal frequency-division multiplexing (OFDM) is also a key technique for WWC because of its high spectral efficiency and its ability to combat multipath fading. Many advanced WWC protocols now integrate both techniques, forming MIMO-OFDM WWC, which is characterized by high throughput, high fidelity, low latency, and multi-user support. A WWC system therefore needs not only high throughput and processing capability but also excellent bit error ratio (BER) performance. The PHY layer of MIMO-OFDM WWC has three types of algorithms: channel encoding/decoding algorithms, OFDM algorithms, and MIMO detection algorithms. This thesis presents fine-grained parallel algorithms for software radio and proposes a real-time software radio system built around the graphics processing unit (GPU). The main contributions are as follows.
(1) A software radio system named CuSora is presented. In CuSora, the central processing unit (CPU) serves as the controller and the GPU as the baseband processor. Sora is a software radio platform developed by Microsoft Research Asia, and CuSora combines the Sora platform with GPU processors. The radio-frequency (RF) signal is received from the air, and frequency down-conversion and analog-to-digital conversion (ADC) are performed by the Sora front end. CuSora reuses the software framework and MAC layer of Sora to control the radio platform. Meanwhile, CuSora uses the GPU as the modem processor for high-speed PHY signal processing, so that throughput and BER performance meet the requirements of the wireless communication protocols. A multi-mode software controller is also designed in CuSora to support multiple protocols.
(2) Fine-grained parallel encoding and decoding algorithms for forward error correction (FEC) codes, together with encoder and decoder architectures, are presented. Three popular FEC codes (convolutional codes, Turbo codes, and LDPC codes) are analyzed to identify their computational characteristics. Revised encoding and decoding algorithms suited to the GPU platform are chosen, and parallel versions of them are presented. The proposed parallel algorithms exploit a high degree of parallelism and use efficient guarding schemes to limit the effect on BER performance. The encoders and decoders for the three FEC codes are implemented on GPUs with the Fermi architecture. The decoder throughput is about two orders of magnitude higher than that of CPU implementations and exceeds that of existing GPU-based FEC decoders.
(3) Fine-grained parallel algorithms and a GPU accelerator architecture for the OFDM algorithms are presented. Three types of OFDM algorithms (modulation/demodulation, synchronization, and channel estimation) are analyzed to identify their computational characteristics. Fine-grained parallel algorithms for these modules are then presented, which exploit a high degree of parallelism while ensuring that each sub-carrier carries correct samples and can exchange samples with other sub-carriers. The GPU accelerators for the three OFDM modules are implemented on GPUs with the Fermi architecture. Several optimization methods are applied, and the throughput is about two orders of magnitude higher than that of CPU implementations.
(4) A fine-grained parallel algorithm and a GPU accelerator architecture for MIMO detection are presented. The PIC MIMO detector is chosen for its low complexity and high BER performance. The fine-grained parallel algorithm for this detector exploits a high degree of parallelism while correctly recovering the original transmitted sample vectors. The GPU accelerator for the PIC detector is implemented on GPUs with the Fermi architecture. Several optimization methods are applied, and the throughput is about two orders of magnitude higher than that of CPU implementations.
(5) A GPU-based software radio prototype system is presented and implemented. The transceiver modules in the physical layer of two wireless protocols, WiFi (802.11a) and WiMAX (802.16d), are analyzed, and a GPU-based parameterized software-defined radio (SDR) framework for OFDM is presented. The encoders, decoders, OFDM modules, and MIMO detectors presented in this thesis are combined to form the physical layer of the two protocols. These modules are accelerated on CuSora, and real wireless communication is achieved between two CuSora platforms. Compared with the Sora platform, the data rate of wireless communication on CuSora improves by about 10% to 30%. The throughput of each module on CuSora exceeds that of the Sora platform and of the CPU, DSP, and FPGA implementations reported in other works.
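Item (3) describes OFDM modulation and demodulation as work that parallelizes across sub-carriers. The following is a minimal sequential Python sketch of that data flow, not the thesis's GPU implementation. Assumptions made for illustration: the function names (`dft`, `ofdm_modulate`, `ofdm_demodulate`) and the `QPSK` table are hypothetical, the 64-point transform and 16-sample cyclic prefix follow 802.11a numerology, and every one of the 64 bins carries data (802.11a itself uses only a subset of them for data and pilots).

```python
import cmath

# Hypothetical QPSK mapping used only for this illustration.
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def dft(x, inverse=False):
    """Naive O(N^2) DFT. Each output bin k depends only on the input vector,
    so the outer loop is the part a GPU could assign to one thread per bin."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def ofdm_modulate(symbols, cp_len):
    """Map frequency-domain symbols to time samples and prepend a cyclic prefix."""
    time = dft(symbols, inverse=True)
    return time[len(time) - cp_len:] + time


def ofdm_demodulate(samples, n_fft, cp_len):
    """Drop the cyclic prefix and return the frequency-domain symbols."""
    return dft(samples[cp_len:cp_len + n_fft])


if __name__ == "__main__":
    symbols = [QPSK[(i % 2, (i // 2) % 2)] for i in range(64)]
    tx = ofdm_modulate(symbols, cp_len=16)
    rx = ofdm_demodulate(tx, n_fft=64, cp_len=16)
    worst = max(abs(a - b) for a, b in zip(rx, symbols))
    print(f"samples per OFDM symbol: {len(tx)}, worst round-trip error: {worst:.2e}")
```

A practical implementation would replace the naive transform with an FFT; the quadratic version is used here only to keep the per-sub-carrier independence visible.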
