Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
305 views
in Technique[技术] by (71.8m points)

signal processing - Understanding behaviour of MATLAB's convn

I'm doing convolution of some tensors.

Here is small test in MATLAB:

    ker= rand(3,4,2);
    a= rand(5,7,2);
    c=convn(a,ker,'valid');
    c11=sum(sum(a(1:3,1:4,1).*ker(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker(:,:,2)));
    c(1,1)-c11  % not equal!

The third line performs a N-D convolution with convn, and I want to compare the result of the first row, first column of convn with computing the value manually. However, my computation in comparison to convn is not equal.

So what is behind MATLAB's convn? Is my understanding of tensor convolution is wrong?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You almost have it correct. There are two things slightly wrong with your understanding:

  1. You chose valid as the convolution flag. This means that the output returned from the convolution has its size so that when you are using the kernel to sweep over the matrix, it has to fit comfortably inside the matrix itself. Therefore, the first "valid" output that is returned is actually for the computation at location (2,2,1) of your matrix. This means that you can fit your kernel comfortably at this location, and this corresponds to position (1,1) of the output. To demonstrate, this is what a and ker look like for me using your above code:

    >> a
    
    a(:,:,1) =
    
    0.9930    0.2325    0.0059    0.2932    0.1270    0.8717    0.3560
    0.2365    0.3006    0.3657    0.6321    0.7772    0.7102    0.9298
    0.3743    0.6344    0.5339    0.0262    0.0459    0.9585    0.1488
    0.2140    0.2812    0.1620    0.8876    0.7110    0.4298    0.9400
    0.1054    0.3623    0.5974    0.0161    0.9710    0.8729    0.8327
    
    
    a(:,:,2) =
    
    0.8461    0.0077    0.5400    0.2982    0.9483    0.9275    0.8572
    0.1239    0.0848    0.5681    0.4186    0.5560    0.1984    0.0266
    0.5965    0.2255    0.2255    0.4531    0.5006    0.0521    0.9201
    0.0164    0.8751    0.5721    0.9324    0.0035    0.4068    0.6809
    0.7212    0.3636    0.6610    0.5875    0.4809    0.3724    0.9042
    
    >> ker
    
    ker(:,:,1) =
    
    0.5395    0.4849    0.0970    0.3418
    0.6263    0.9883    0.4619    0.7989
    0.0055    0.3752    0.9630    0.7988
    
    
    ker(:,:,2) =
    
    0.2082    0.4105    0.6508    0.2669
    0.4434    0.1910    0.8655    0.5021
    0.7156    0.9675    0.0252    0.0674
    

    As you can see, at position (2,2,1) in the matrix a, ker can fit comfortably inside the matrix and if you recall from convolution, it is simply a sum of element-by-element products between the kernel and the subset of the matrix at position (2,2,1) that is the same size as your kernel (actually, you need to do something else to the kernel which I will reserve for my next point - see below). Therefore, the coefficient that you are calculating is actually the output at (2,2,1), not at (1,1,1). From the gist of it though, you already know this, but I wanted to put that out there in case you didn't know.

  2. You are forgetting that for N-D convolution, you need to flip the mask in each dimension. If you remember from 1D convolution, the mask must be flipped in the horizontally. What I mean by flipped is that you simply place the elements in reverse order. An array of [1 2 3 4] for example would become [4 3 2 1]. In 2D convolution, you must flip both horizontally and vertically. Therefore, you would take each row of your matrix and place each row in reverse order, much like the 1D case. Here, you would treat each row as a 1D signal and do the flipping. Once you accomplish this, you would take this flipped result, and treat each column as a 1D signal and do the flipping again.

    Now, in your case for 3D, you must flip horizontally, vertically and temporally. This means that you would need to perform the 2D flipping for each slice of your matrix independently, you would then grab single columns in a 3D fashion and treat those as 1D signals. In MATLAB syntax, you would get ker(1,1,:), treat this as a 1D signal, then flip. You would repeat this for ker(1,2,:), ker(1,3,:) etc. until you are finished with the first slice. Bear in mind that we don't go to the second slice or any of the other slices and repeat what we just did. Because you are taking a 3D section of your matrix, you are inherently operating over all of the slices for each 3D column you extract. Therefore, only look at the first slice of your matrix, and so you need to do this to your kernel before computing the convolution:

    ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
    

    flipdim performs the flipping on a specified axis. In our case, we are doing it vertically, then taking the result and doing it horizontally, and then again doing it temporally. You would then use ker_flipped in your summation instead. Take note that it doesn't matter which order you do the flipping. flipdim operates on each dimension independently, and so as long as you remember to flip all dimensions, the output will be the same.


To demonstrate, here's what the output looks like with convn:

c =

    4.1837    4.1843    5.1187    6.1535
    4.5262    5.3253    5.5181    5.8375
    5.1311    4.7648    5.3608    7.1241

Now, to determine what c(1,1) is by hand, you would need to do your calculation on the flipped kernel:

ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
c11 = sum(sum(a(1:3,1:4,1).*ker_flipped(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker_flipped(:,:,2)));

The output of what we get is:

c11 =

    4.1837

As you can see, this verifies what we get by hand with the calculation done in MATLAB using convn. If you want to compare more digits of precision, use format long and compare them both:

>> format long;
>> disp(c11)

   4.183698205668000

>> disp(c(1,1))

   4.183698205668001

As you can see, all of the digits are the same, except for the last one. That is attributed to numerical round-off. To be absolutely sure:

>> disp(abs(c11 - c(1,1)));

   8.881784197001252e-16

... I think a difference of an order or 10-16 is good enough for me to show that they're equal, right?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...