Hamming project

This is a DRAFT article. This page is under development.

 

I have not been working on this project for more than 10 years. However I think this might be useful to somebody who is working on the Hamming codes.

I am not an expert of this field and it has been more than 10 years I wrote the C++ code.

I would recommend this project for mathematicians as a feed for though.

 

Calculation of quasi perfect block codes

The title has intentionally become so complex. I was searching for an answer for a simple question, and it required a lot of research and calculation.

The question: how can we select from the n bit numbers those, which are at least h hamming distance from each other? For example from the 7 bit numbers (128pcs) those, which are at least 3 Hamming distance? In this case we will see later that there are 16 numbers, and this is the maximum. However, for 11 bit and 3 Hamming distance it is not so simple to give the answer.

My motivation was just curiosity. If we know these numbers, then we can use this in embedded and ambient systems for defining a code, each number as a codeword of a command. Because of the 3 hamming distance, 1 bit error could be corrected. 

I haven’t found the answer, so I started to calculate these numbers using my algorithm. I have written a C++ code for calculating the vectors, which consist of numbers at least 3 hamming distance from each other, and for example found 139 of the 2048, 11 bit numbers, for which this is true. If we take the 11 bit numbers example, there must be less than 170 such numbers (2048/12), which are at least 3 hamming distance from each other. This is an estimation for the maximum, but hard to tell, what is the maximum length of the vector. 

 

The details of the problem

        In coding theory we are assigning to a word a code word, which can be fixed length or variable length. If the code word is a fixed length word, we call this a block code.

 

Hamming code

Hamming codes are linear block codes. The linearity means that there is a matrix „H” with which by a matrix multiplication we can generate the code words from the uncoded words. This is a kxn matrix, where k is the number of the uncoded words and n is the number of the code words.

The Hamming code is an error correcting code, because at the decoding if in a code word there is h/2 or less error, where h is the Hamming distance of the code words, then it the closest code word can be determined and error correcting is possible this way.

Another important property of the Hamming codes is that the following formula is true for them:

n = 2n-k -1; and n ≤ 2n-k -1 , In the hamming bound there is a equality.

(n is the number of the code words, k is the number of the uncoded words, the code is binary)

If this equality is true than the code is called perfect code. This means that the code words are maximally fill out the space, if we would add another code word, it would certainly closer to another code word than 3 Hamming distance.

From the practice point of view it is not necessary that a code should be perfect, it is enough if it fills out the space well. Then we can talk about quasi perfect codes.

Because of the above detailed equality requirement, the Hamming codes exist just at certain bit numbers. Where the first number is the length of the code word, and the second number is the length of the uncoded words. E.g.: (3,1) (7,4) (15,11) (31,26). For these numbers there exist the perfect code.

We can generate quasi perfect codes in a different way, e.g. if we add a parity bit to the Hamming codes. For example in case of a H(7,4) code we get an 8 bit code, which is non perfect, because  8 ≠ 28-4 -1;  however the hamming distance is 4 bit, which is able to detect 2 errors. [1.]

 

 

A C++ code for searching quasi perfect codes

Algorithm

 

Our goal is to select the code words from the n bit numbers which are at least h Hamming distance from each other. We will find the code words in a naive way. We can identify every code word as a point (1..N) in graph. If there is an edge between two points, this means that the hamming distance is larger than h.

We can represent the graph with an adjacency matrix. [5.] This matrix has a 1 in every row, where, the Hamming distance is larger between two numbers, than h.

In the figure we are numbering the words from 1..12. 

Figure 001 

 

The adjacency matrix is symmetric, it is not a directed graph. There would be enough to store one half of the matrix, however because of the later operations the whole matrix is stored by the program.

 

For example for n=3 bit and h=2 code distance first we calculate all the Hamming distances from 000 to 111, and where the hamming distance is larger than or equal to h=2, there will be a 1 in the adjacency matrix.

After this we will search a complete sub graph in the graph, so called clique. This drives us to NPcomplete problem. In a graph to determine the maximum clique size it is an NP complete problem [5.]. This is sad.

We will use a brute force algorithm for searching for the maximum cliques.

During the search we start a clique search from each single edge. This is containing the following steps:

The edge, as a clique, we start to extend, we check all the graph points, if they are adjacent with the existing clique (sub graph). If we found a point which is adjacent with the already found clique, we extend the clique.  

Based on the above matrix, for example starting with the [5,6] [101,110] edge, we can make the following steps:

 

We make a logical and operation on every element of the clique in the adjacency matrix, and we get a vector, „U”. In this U vector there are 1-s, where we can extend the clique.

As we make this extending method from each single edge, we will get a large number (equal to the original number of edges in the adjacency matrix) of cliques. Then we select the maximum cliques, and these are written as an output to a file. 

The code searching program cab be devided the following parts:

  • Calculation of Haming distances, generation of the adjacency matrix.
  • Searching for cliques
  • Selecting the largest cliques
  • Displaying the results / writing into file.

 

Results so far (PC – x86):

 

The program is generating a text file based on the N code words, and in every row there is a n=log2N bit binary numbers vestor. (These are the maximum clique int he graph, which we found.)

In these vectors all of the coordinates are at least 3 Hamming distance from each other.

 

The content of these vectors, depending on n (=log2N):

  • 139 db 11 bit code words (estiamation: max 170 db, based on the Hamming bound)
  • 68 db 10 bit code words (estimation: max 93 db)
  • 36 db 9 bit code words (estimation: max 51 db)
  • 20 db 8 bit code words (estimation: max 28)
  • 16 db 7 bit code words (estimation: max 16 lehet, perfect but non linear codes)
  • 8 db 6 bit code words (estimation: max 9)
  • 4 db 5 bit code words (estimation: max 5)

 

The algorithm can be further developed and the HW architecture as well.

I could reduce the running time of the existing C++ code to 1.25%, that the vector class which  has been extended, and the memory allocation has been rationalised, because initially at every vector. insert operation – by inserting a new vector – it has allocated a memory area increased by one, and then copied the new content to here. This was if course very slow, in the new program the memory is allocated as blocks. Furthermore, in the new program the and operation is made on a 128 bit word in the adjacency matrix, utilising the Intel processors SSE instructions from Pentium III. This means 2,9 times acceleration, according to the measurements.

For the  N=2048 (11 bit, 3 Hamming distance) my computer has calculated for 120 hours, using both cores. The found vectors were 139 length.

It would be interesting to find for an arbitrary n bit and h Hamming distance to find the longest vectors. which are containing n bit numbers and these numbers are at least h Hamming distance from each other.

Also it would be interesting to implement a FPGA based HW, which is more effective in the calculation, as the program can run parallel.

The comments in the C++ code are in Hungarian and I am not developing this  any more.

 

Results 

10 bit Numbers, 3 bit Hamming distance

Maximal Vector length: 68.

Decimal values (Example Vector):

0 7 30 42 52 57 73 82 85 99 108 127 139 140 145 165 178 198 216 224 269 275 280 289 294 324 330 368 386 404 415 424 449 487 505 510 539 557 567 590 625 634 662 669 675 686 696 709 723 745 756 789 811 818 828 835 854 857 869 872 903 905 922 932 945 972 976 994

 

11 bit Numbers, 3 bit Hamming distance

Maximal Vector length: 139.

Decimal values (Example vector)

0 7 30 42 52 57 73 82 85 99 108 127 139 140 145 165 178 198 216 224 269 275 280 289 294 324 330 368 386 404 415 424 449 487 505 510 539 557 567 590 625 634 662 669 675 686 696 709 723 745 756 789 811 818 828 835 854 857 869 872 903 905 922 932 945 972 976 994 1 053 1 071 1 075 1 115 1 125 1 142 1 144 1 175 1 178 1 190 1 193 1 212 1 219 1 229 1 236 1 258 1 265 1 291 1 302 1 324 1 333 1 338 1 351 1 361 1 372 1 378 1 385 1 413 1 422 1 433 1 443 1 456 1 480 1 490 1 508 1 537 1 542 1 560 1 598 1 623 1 632 1 643 1 661 1 679 1 717 1 723 1 753 1 758 1 767 1 772 1 778 1 823 1 831 1 849 1 869 1 882 1 902 1 907 1 908 1 920 1 939 1 948 1 962 1 965 1 974 1 990 1 995 2 005 2 017 2 040 2 047

Further results please see in the files / Results.

Haming 0 X 08 SSE Public
Plain text – 9.1 KB 182 downloads
Vector H
Plain text – 5.1 KB 159 downloads
Vector 2 H
Plain text – 6.3 KB 155 downloads
Results
Archive – 149.5 KB 153 downloads

 

References:

 

[1.] Buttyán Levente, Györfi László, Győri Sándor, Vajda István

Kódolástechnika jegyzet

http://www.crysys.hu/courses/kodolastechnika/bscinfkod.pdf

(2011-05-07)

 

[2.] Gina Sanders

Perfect codes

http://courses.csusm.edu/math540ak/codes.pdf

(2011-05-07)

 

[3.] Kiss W. Emil

Hibajavító kódok,

http://www.cs.elte.hu/~ewkiss/bboard/07o.el/Alg3_el_art_10.pdf

(2011-05-07)

 

[4.] Láng Csabáné

Hamming kód, jegyzet

http://compalg.elte.hu/~zslang/Hkod3f.pdf

(2011-05-07)

 

[5.] Katona Gyula, Recski András, Szabó Csaba

Számítástudomány alapjai

 

[6.] Tímár András Magas szintű szintézis (High Level Synthesis)

http://www.eet.bme.hu/~timar/data/hls.pdf

(2011-05-10)

 

[7.] Poppe András, Czirkos Zoltán, Rendszerszintű tervezés, System C, Catapult C

http://www.eet.bme.hu/~czirkos/systemc/eloadas.pdf

 

[8.] Philip Koopman, Tridib Chakravarty

Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks

http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf 

(2011-05-07)

 

[9.] Pong P. Chu: hardware design using VHDL: coding for efficiency, portability, and Scalability,

7.5.5, Hamming distance circuit

 

Other references

 

[10.] Gál Tibor

Nagytelesítményű mikroprocesszoros rendszerek tárgy jegyzetei, MMX, SSE

 

[11.] Microsoft MSDNA library, SSE utasítások

http://msdn.microsoft.com/en-us/library

(2011-05-07)

 

**[12.] Intel SSE4 programming reference

http://www.developers.net/intelisdshowcase/view/2550

(2011-05-07)

 

[13.] Intel® compiler options for SSE generation (SSE2, SSE3, SSE3_ATOM, SSSE3, SSE4.1, SSE4.2, AVX) and processor-specific optimizations

http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/

(2011-05-07)

 

[14.] Peter J. Ashenden VHDL Cookbook

http://tams-www.informatik.uni-hamburg.de/vhdl/doc/cookbook/VHDL-Cookbook.pdf

(2011-05-07)

 

[15.] Weijun Zhang VHDL Tutorial: Learn by Example

http://esd.cs.ucr.edu/labs/tutorial/

(2011-05-07)

 

[16.] Paras Mehta 2003 VHDL Syntax Reference

http://webdocs.cs.ualberta.ca/~amaral/courses/329/labs/VHDL_Reference.html

(2011-05-07)

 

[17.] VHDL TUTORIAL using Xilinx’s

WEBPACK and ModelSim

http://www.eng.ucy.ac.cy/theocharides/Courses/ECE408/hw1.pdf

(2011-05-07)