Цитата(sergey sva @ Jul 15 2015, 09:32)

быстрее алгритм Radix 4 division
CODE
#include <stdio.h>
#include <stdlib.h>
/* This code assumes that unsigned short is 16 bits, and unsigned int is 32 bits */
unsigned short radix4_division_u16(unsigned short divisor, unsigned short dividend)
{
unsigned short quotient = 0;
unsigned char i = 16; /* Number of bits to process */
if(dividend == 0)
{
printf("Divide by zero");
exit(0);
}
while(i > 0)
{
unsigned char j;
unsigned int d1,d2,d3;
i -= 2;
d1 = dividend << i;
d2 = dividend << (i+1);
d3 = d1+d2;
if(divisor < d1)
j = 0;
else if(divisor < d2) {
j = 1;
divisor -= d1;
} else if(divisor < d3) {
j = 2;
divisor -= d2;
} else {
j = 3;
divisor -= d3;
}
quotient = (quotient << 2)+j;
}
return quotient;
}
int main(int c, char *v[])
{
int i,j;
for(i = 0; i < 256*256; i++)
{
for(j = 1; j < 256*256; j++)
{
unsigned k = radix4_division_u16(i,j);
if(i/j != k)
{
printf("Error with %i/%i != %i\n",i,j,k);
exit(0);
}
}
if(i%650 == 0)
printf("%2.2f%% tested\n",i*100.0/256/256);
}
return 0;
}
алгоритм по шагам:
CODE
Each step
Inputs:
Existing quotient
Current dividend
Divisor
Clock signal
Bit offset of bits to generate
Constant (generic) information also needed:
Width of Divisor
Width of Quotient
Bit offset of bits to generate
Outputs:
updated quotient
updated dividend
Divisor
It is possible to reduce the length of comparisons - rather than comparing n2-2 bits only 4 bits need subtraction - the lowwer order bits can just be ORed together.
Equations for first step of a/b:
Comparing against divisor
x1 <= (0 & a(n-1 ... n-2)))
y1 <= (OR(b(n-1 ... 2)) & b(1 ... 0))
out1 <= (x-y)(1 ... 0) & a(n-3 ... 0)
Comparing against 2*divisor
x2 <= (0 & a(n-1))
y2 <= (OR(b(n-1 ... 1)) & b(0))
out2 <= (x-y)(0) & a(n-2 ... 0)
Comparing against 3*divisor
sum <= (0 & b(1 ... 0)) + (0 & b(0) & 0)
x3 <= (0 & a(n-1) & a(n-2))
y3 <= (OR(d(n-1 ... 1),sum(2)) & sum(1 ... 0))
out3 <= (x-y)(1 ... 0) & a(n-3 ... 0)
which 'out' is passed to the next stage is selected based on which of out1, out2 or out3 were positive (if any).
If you have n bits, you need 'n/2' stages, and if you number stages from (n/2-1) (dealing to the high bits) to 0 (dealing to bits 1 and 0) the equations become more generic:
For stage i (need to verify!):
Comparing against divisor
x1 <= (0 & a(n-1 ... 2i) )
y1 <= (OR(b(n-1 ... n-2i)) & b(n-2i ... 0))
out1 <= (x-y)(n/2-i-1 ... 0) & a(2i-1 ... 0)
Comparing against 2*divisor
x2 <= (0 & a(n-1 ... 2i+1) )
y2 <= (OR(b(n-1 ... n-2i+1)) & b(0))
out2 <= (x-y)(n/2-2i-2..0) & a(2i-1 ... 0)
Comparing against 3*divisor
sum <= (0 & b(n-2i-1 .. 0)) + (0 & b(n-2i-2 ... 0) & 0)
x3 <= (0 & a(n-1 ... 2i))
y3 <= (OR(d(n-1 ... n-2i),sum(2)) & sum(1 ... 0))
out3 <= (x-y)(n/2-i-1 ... 0) & a(2i-1 ... 0)
PS насчет скорости и кол-во ресурсов не знаю не реализовывал, но вроде считает раза в 2 быстрее (если не изменяет память) предложенного ранее реализации
PS PS есть Radix более высокого прядка , например Radix8
PS PS PS алгоритмы деления на константу - рекомендую к просмотру
If it doesn't work in simulation, it won't work on the board.
"Ты живешь в своих поступках, а не в теле. Ты — это твои действия, и нет другого тебя" Антуан де Сент-Экзюпери повесть "Маленький принц"