Group 243

From ECE Department Wiki

Jump to: navigation, search

Contents

Program Compression for Embedded Systems

Members









Kyaw Min
Andrew Aarestad
Joseph Thomas

Introduction

Embedded systems are everywhere nowadays. From cellphones to toasters and microwave ovens, they are part of our everyday lives and improve the quality of living. With so much use seen by embedded systems there is considerable research done to optimize performance, power usage, and conserve memory while at the same time cutting down cost. Due to the fact that most embedded systems are mass produced, the gain from even the simplest optimization or will lead to substantial cost reduction.

One of the techniques used to improve the performance of a processor is implementing a compression scheme on the data busses. Our goal was to research further into this area. Specifically, we developed and analyzed two code compression techniques for use in the Texas Instruments TMS320C6000 architecture's instruction caches.

Three benefits could come from using compression in embedded systems. First is a performance boost due to increased cache hits which reduces main memory accesses. Second is a reduced memory footprint, which is the main goal of most compression in terms of simply reducing the size the information takes up. Lastly there is energy savings incurred from fewer calls to the main memory, which reduces energy consumptions from the bus traffic. We will explore further into these areas.

Requirements

  • Research current studies and proposals in code compression techniques
  • Design a code compression algorithm that could be implemented in the TMS320C6x architecture
  • Study the effects of compression on overall system performance
  • Write formal paper detailing the implementation of the compression scheme and discussing the strengths and weaknesses of the developed designs

Research

Processor Architecture

We spent quite a bit of time learning the processor architecture of the TMS320C6x. Of most interest to us were the caches, instruction cycle, and the VLIW(very long instruction word) architecture. The TMS contains a split 2-level cache system, and our focus was on the intruction caches. There is 4Kb L1 and a unified 64Kb L2 cache for us to work with. The VLIW architecture of this particular DSP refers to the fact that each opcode is 32 bits long and the processor simultaneously accesses 8 of these opcodes at a time in the form of a packet. Our goal is to compress these 32-bit instructions.

Code Compression Algorithms

Initially we were not completely familiar with compression methods so we had to research some techniques. The two kinds of compression methods are static and dynamic. Static is a type of compression where the compression algorithm doesnt care about the information but compresses simply based on its algorithm. The goal of this method is mostly the reduction of a memory footprint. An example of this type of compression is WinZip. The other method is called dynamic compression. In this case the compression is dependant upon the information that is compressed. An example of this is a dictionary algorithm where a dictionary of the compression information is kept for decompression. This is the method we will use in our algorithm.

Simulators

Finally, our analysis could not be complete without some testing to gather data about our project. After much searching, we had decided to use a third-party simulator of the TMS developed by Mr. Vinodh Cuppu in order to complement our own simulator. This simulator, called c6xsim, is a cycle-by-cycle simulator of the entire instruction fetch and decode cycle of the TMS320C6x processor.

Compression

Our first goal for this project was to develop an algorithm for compressing programs. We decided to compress the entire program using a static dictionary type compression. This will allow us to fit twice as many instructions into the cache than normal. Our dictionary compresses the instructions from 32-bits to 16-bits.

The algorithm goes through the opcodes and whenever it encounters an instruction that has not been encountered before, it adds the opcode into a table, or dictionary. Then the 32-bit opcode in the original code is replaced with a 16-bit index into the dictionary. Whenever an instruction that has already been added to the table is encountered, the algorithm simply replaces the opcode with the reference of the instruction. See Figure 1 for a simplified visual animation depicting the algorithm After the compression the table is appended to the end of the code with a reference word(2 bytes)to indicate where the dictionary is located.

In the best case scenario the program achieves a compression rate of somewhat above 50% if the program consists of a few instructions repeated very heavily. In the worst case the compressed program actually expands due to the fact that if a program does not have many repeating instructions then the 16-bit index of the opcode will be included as well the 32-bit actual opcode within the dictionary. As shown in the following figure, larger programs gain larger compression while smaller programs might even have expansion.


Caption:Figure 2: Compression Ratios

Decompression

Once compression was achieved, we needed a way to analyze the affects our compression would have on the TMS6x system. Thus we designed two theoretical systems of decompression in order to simulate the gains our compression would create and whether it would be a meaningful venture.

Our two schemes differ in terms of the location in which the decompression occurs. In our first scheme, the decompression blocks sits between the L1 and L2 cache. Thus L2 and main memory would contain compressed instructions. In the second scheme the deompression is between the L1 cache and the CPU itself. This means both levels of cache and main memory will contain compressed instructions. The result of this is that each cache will be able to hold about twice as much instructions as it could previously. We expect this system to increase the cache hit rates to increase significantly.


Caption:Figure 3: Decompression Block Diagram

Simulation

Image:243_instruction_delay.jpg

Image:243_L1_misses.jpg

Image:243_L2_misses.jpg

Downloads

Thanks To

Personal tools