Another popular technique is Montgomery modular multiplication [10]. The troublesome reduction of S mod M we have seen in the first method is removed here. However, this comes at a small price. We actually compute (A×B×2-n) mod M instead of (A×B) modM. We pre-scale (i.e multiply) the inputs A and B first with 2n. Then we do the Montgomery step. We get ((A×2n)×(B×2n) ×2-n) = (A×B×2n) mod M. At the end, we post-multiply by 2-n mod M to remove the unwanted 2n to get (A×B) mod M. The computation of (A×B×2-n) mod M is simple which does not need modulo reduction of S as we have seen earlier. We only need to add B to S and if LSB of the Sum (S+B) is 1 add M or else do not add. The result will have LSB zero so that we score it off meaning dividing by 2 to get the new S. Thus n steps are required to divide by 2n. We illustrate Montgomery algorithm considering the same previous example.

Consider finding A×B mod M = 1001×1011 mod 1101. We multiply first both A and B by 16 since n = 4 and find mod 13. We get Aʹ = 1, Bʹ = 7. We need to find (0001×0111×2-4) mod 13 =(00000111×2-4) mod 13 using Montgomery algorithm. Since LSB is 1, add 13 to 7 to get 20 which when divided by 2 (right shifting by one bit) yields (00001010)2. Since LSB is zero, we need not add 13 but we shift by one bit to get (0000101)2. Since LSB is 1, we haveto add 13 to get (0010010)2 when right shifted by one bit gives 001001. Since LSB is 1, we add 13 and shift right by 1 bit to obtain 1011. Note that we have obtained now (0001×0111×2-4) mod 13 = 11.

At this stage, we have (A×B×2n) mod M. We can continue four more times to get the actual result by getting rid of the 2n factor. The answer is 8.

So far we have seen only multiplication modulo M. In cryptography, we need exponentiation mod M. We need to find (XY) mod M. Here we use the square and multiply technique. Consider Y a binary number say 13. We can get X13 mod M = {[(X8)mod M] × [(X4) mod M] ×[(X)mod M]} mod M. Thus, we need to have all powers of X reduced mod M which are squares of the previous one. Then we choose to multiply some of them only depending on the exponent. For example we have computed X mod M, X2 mod M, X4modM and X8 mod M. But since exponent 13 does not have 2 in the binary expansion, we ignore the X2 mod M value for multiplication. We use it only for getting X4 mod M by squaring. In general, for 1024 bit exponents as are needed today in RSA cryptography, we need to perform 1023 modulo squarings and in the worst case 1023 modulo multiplications all mod M. Today, smartcards, RFIDs or mobile handsets have got powerful cryptoengines optimized to realize these operations or use Java software routines using Barrett reduction or Montgomery algorithm. We have limited ourselves to basics here. Several techniques for speeding up the modulo multiplication and exponentiation do exist which consider more bits at a time or even words at a time see for example [11].

### 3. Forward Conversion

We need to find residue mod mi for the input big binary number. One simple and obvious method is to divide the given number X by mi to get the remainder. This needs division operation which is cumbersome compared to multiplication or addition or subtraction. We choose moduli of special type so that this can be done easily. This stems from the basic fact we know for many years. Suppose we need 4567 mod 9. We suggest adding all digits till you get one digit. We have (4+5+6+7)9 = (22)9 = 4. We have used the property that 10 mod 9 = 1, 100 mod 9 = 1 and 1000 mod 9 = 1. Hence place value mod 9 of all digits is 1. Hence the residue mod 9 is obtained by adding all the digits. We repeat till we get a single digit answer less than 9. If it is 9, the answer is zero. We use the same idea for any modulus, but we need to deal with binary numbers.

Given a modulus mi, we find its order meaning p such that 2p = 1 mod mi. Example consider mi = 7. We have 23 = 1mod7, 24 = 2mod7, 25 = 4mod7, 26 = 1mod7 etc. The residues of powers of 2 (i.e. 2x mod m) have a periodic behaviour. They keep repeating after the period. Thus, if we have a binary number (0110100010011001100)2 = (214,220)10 = 344CC (Hex) and want to find residue mod 7, divide the word into 3 bit fields add all of them. If the sum is word of length more than 3 bits repeat the procedure. Following the above, we have 0 110 100 010 011 001 100 mod 7 = 10 100 mod 7 = 6. If you want residue mod 9, we have another interesting behavior: 20mod9 = 1. 21 mod 9 = 2, 22 mod 9 = 4, 23 mod 9 = 8, 24mod9 = 7, 25mod9 = 5, 26 mod9 =1, 27 mod 9 = 2,28 mod 9 = 4, 29mod9 = 8, 210mod9 = 7,211mod9 = 5, 212mod9 = 1. Here again the order or period is 6 since 26mod9 = 1. We can observe one more characteristic. There is a half-period also since residues repeat after half-period but with negative sign. The residues are 1,2, 4, -1,-2,-4, 1,2,4, -1,-2,-4, 1,2…. . We can take advantage of this. We can add all even 3-bit fields and all odd 3-bit fields and subtract the former from the latter to get the residue mod9. For the example above, 0 110 100 010 011 001 100, sum of odd fields is 1011 an sum of even fields is 1001 and Sum of odd fields – sum of even fields = 2. The reader may verify that 214220 mod 9 =2. This simplicity of forward conversion is the reason why many multi-moduli RNS today use powers of two related moduli sets. Some examples are {31, 32,33}, {31, 32, 33, 65}, {15, 16, 17, 31} etc.

### 4. Reverse conversion

This is a difficult problem. In RNS, by looking at residues, we cannot compare two numbers or know the sign of a given number easily unless we convert the RNS number to binary number system. Hence it is important to perform fast reverse conversion i.e. residue to binary conversion.

Several tens of papers have been written on the subject and several techniques have been developed. We consider two most important techniques (a) Chinese Remainder theorem (CRT) and (b) Mixed Radix conversion (MRC). Using these we can solve the problem we started with in Section 2.

For our problem we have the RNS {3, 5,7} and the given residues are (2, 3, 2).We illustrate CRT for a three moduli set {m1, m2, m3} with given residues (r1, r2,r3). CRT states that the decoded number is X = (r1W1+r2W2+r3W3) mod M where M = m1m2m3.(1)

The values Wi are called weights. These weights are defined as . Note that and the symbol stands for multiplicative inverse of Mi with respect to modulus mi. This means that αi×Mi when divided by mi yields a remainder 1.

We will now use CRT for our problem. The various Mi values are M1 = 5×7 = 35, M2 = 3×7 = 21, M3 = 3×5 = 15. Now we have . Next, we find the weights as W1 = 70, W2 = 21 and W3 = 15. From(1), next we obtain X = (2×70+3×21+2×15) mod (105) = (233) mod (105) = 23. Thereader can verify that residues of 23 for moduli 3, 5, 7 are as given.

The problem with CRT is that the sum in (1) before mod M reduction can be as many times M as the number of moduli and we need to reduce it mod M which is cumbersome. But the advantage of CRT it is very fast since weighing the inputs with weights Wi is done in parallel whereas only addition of these weighted products and modulo M reduction needs time because M is a large number.

We next consider MRC technique. This is sequential and takes (n-1) steps for a n-moduli RNS. The binary number X corresponding to given residues is written in Mixed Radix form as

X = d 0 + d 1 m 1 + d 2 m 1 m 2 + d 3 m 1 m 2 m 3 + . + d n m 1 m 2 m 3 . m n − 1(2)