Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling

NicolasRobidoux
Posts: 1944
Joined: 2010-08-28T11:16:00-07:00
Authentication code: 8675308
Location: Montreal, Canada

Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling

Post by NicolasRobidoux »

If you have set things up so that the alignment is the same throughout the computation, a number of the 16 texels are outside of the discs and consequently always have coefficient 0, which means they can be dropped. (Maybe this requires reflections of the data to put it in "standard position"---like is done in Nohalo---and so on to make it work and consequently is not worth it.)
Last edited by NicolasRobidoux on 2014-06-08T01:01:59-07:00, edited 2 times in total.

Post by NicolasRobidoux »

One last "manic perfectionist" thing: For some of the positions, you know ahead of time whether they are within 1 or farther than 1. So you could use a special weight function for these special cases and skip some branches for these "indexes".

Post by NicolasRobidoux »

You write beautifully clear code.
Hyllian
Posts: 17
Joined: 2014-06-06T04:28:29-07:00
Authentication code: 6789


Post by Hyllian »

NicolasRobidoux wrote: Suggestion: Compute the weights using the formulas used in ImageMagick's resize.c

Code:

if (x < 1.0)
  return(resize_filter->coefficient[0]+x*(x*(resize_filter->coefficient[1]+x*resize_filter->coefficient[2])));
if (x < 2.0)
  return(resize_filter->coefficient[3]+x*(resize_filter->coefficient[4]+x*(resize_filter->coefficient[5]+x*resize_filter->coefficient[6])));
return(0.0);
and save one flop.
It saved me like 20% of processing time! Indeed it works! Thanks.

Post by NicolasRobidoux »

Hyllian wrote: It saved me like 20% of processing time! Indeed it works! Thanks.
Standard polynomial evaluation trick: Horner's rule.
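For anyone following along, here is a minimal C sketch of the Horner form next to the naive power form. The branch expressions follow the resize.c snippet quoted above. The struct `CubicBC`, the function names, and the coefficient ordering are assumptions made for illustration, and the coefficient formulas are the standard Mitchell-Netravali ones.

```c
#include <assert.h>

/* Hypothetical container (not ImageMagick's type) for the seven nonzero
   coefficients of a Mitchell-Netravali (B,C) cubic.
   c[0..2]: piece for 0 <= x < 1 (constant, x^2, x^3; the x term is zero).
   c[3..6]: piece for 1 <= x < 2 (constant, x, x^2, x^3). */
typedef struct {
  double c[7];
} CubicBC;

/* Standard Mitchell-Netravali coefficients for given B and C. */
CubicBC cubic_bc_init(double B, double C)
{
  CubicBC f;
  f.c[0] = (6.0 - 2.0 * B) / 6.0;
  f.c[1] = (-18.0 + 12.0 * B + 6.0 * C) / 6.0;
  f.c[2] = (12.0 - 9.0 * B - 6.0 * C) / 6.0;
  f.c[3] = (8.0 * B + 24.0 * C) / 6.0;
  f.c[4] = (-12.0 * B - 48.0 * C) / 6.0;
  f.c[5] = (6.0 * B + 30.0 * C) / 6.0;
  f.c[6] = (-B - 6.0 * C) / 6.0;
  return f;
}

/* Naive form: every power of x is multiplied out separately. */
double cubic_bc_naive(const CubicBC *f, double x)
{
  assert(x >= 0.0);  /* x is a distance */
  if (x < 1.0)
    return f->c[0] + f->c[1] * x * x + f->c[2] * x * x * x;
  if (x < 2.0)
    return f->c[3] + f->c[4] * x + f->c[5] * x * x + f->c[6] * x * x * x;
  return 0.0;
}

/* Horner form: nested multiply-adds, same value with fewer multiplications. */
double cubic_bc_horner(const CubicBC *f, double x)
{
  assert(x >= 0.0);  /* x is a distance */
  if (x < 1.0)
    return f->c[0] + x * (x * (f->c[1] + x * f->c[2]));
  if (x < 2.0)
    return f->c[3] + x * (f->c[4] + x * (f->c[5] + x * f->c[6]));
  return 0.0;
}
```

Both functions return the same weights up to rounding; how much time the nested form saves depends on the compiler and hardware.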

Post by NicolasRobidoux »

NicolasRobidoux wrote: One last "manic perfectionist" thing: For some of the positions, you know ahead of time whether they are within 1 or farther than 1. So you could use a special weight function for these special cases and skip some branches for these "indexes".
What I mean is this:
I assume that you fix things so that the sampling point is within the convex hull of the four central input pixel locations within the 4x4. (I could figure this from your code but I'm too lazy.)
If so, you know right off the bat that these four closest input pixels cannot be at more than a distance of 2 (because sqrt(1+1)=sqrt(2)<2). This means that the third branch of the weight computation is not applicable to the four "inner" input pixels.
You also know right off the bat that the outer input pixels (the 16-4=12 that are not discussed above) cannot be at a distance that is less than 1. This means that the first branch of the weight computation is not applicable to the 12 "outer" input pixels.
Now, the weight computation for all input pixels has only two branches, instead of three. You should be able to exploit this to make the code faster. (This may require computing contributions one position at a time instead of looping. That is, getting speed out of this may require manually unrolling the loop that goes over all 16 input pixel positions.)
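A minimal C sketch of the two specialized weight functions, assuming the sampling point lies in the unit square spanned by the four central pixels. The Catmull-Rom coefficients (B=0, C=0.5) are used only as a concrete example, and all function names are hypothetical. The distance is assumed to be computed by the caller.

```c
#include <assert.h>

/* Piece valid for 0 <= x < 1 (Catmull-Rom, Horner form). */
double piece_near(double x)
{
  return 1.0 + x * (x * (-2.5 + x * 1.5));
}

/* Piece valid for 1 <= x < 2 (Catmull-Rom, Horner form). */
double piece_far(double x)
{
  return 2.0 + x * (-4.0 + x * (2.5 + x * -0.5));
}

/* General weight: three branches. */
double weight_general(double x)
{
  if (x < 1.0)
    return piece_near(x);
  if (x < 2.0)
    return piece_far(x);
  return 0.0;
}

/* Four "inner" pixels: distance is at most sqrt(2) < 2,
   so the "return 0" branch can never be taken. */
double weight_inner(double x)
{
  assert(x >= 0.0 && x < 2.0);
  return x < 1.0 ? piece_near(x) : piece_far(x);
}

/* Twelve "outer" pixels: distance is at least 1,
   so the first branch can never be taken. */
double weight_outer(double x)
{
  assert(x >= 1.0);
  return x < 2.0 ? piece_far(x) : 0.0;
}
```

Each specialized function agrees with `weight_general` on its own range of distances; whether dropping a branch is measurably faster has to be checked on the target hardware.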
P.S.
This comment is not specifically about the unrolling, but unless your library/compiler is really smart, you should probably organize

Code:

color = mul(weights[0], float4x3(c00, c10, c20, c30));
color+= mul(weights[1], float4x3(c01, c11, c21, c31));
color+= mul(weights[2], float4x3(c02, c12, c22, c32));
color+= mul(weights[3], float4x3(c03, c13, c23, c33));
like this:

Code:

color1 = mul(weights[0], float4x3(c00, c10, c20, c30));
color2 = mul(weights[1], float4x3(c01, c11, c21, c31));
color3 = mul(weights[2], float4x3(c02, c12, c22, c32));
color4 = mul(weights[3], float4x3(c03, c13, c23, c33));
color = ( color1 + color2 ) + ( color3 + color4 );
The reason for this is that, just like I suggested with min and max, you are splitting the operations into two parallel tracks, which are merged at the end. This is a standard trick, the name of which I forget. (In standard C, it's basically "Use multiple accumulators to minimize latency." If you visualise the components of the computation as (very short) trees, it's basically a red-black trick.) Having split things like this, it's easy to integrate the advice I give at the top: One now has a natural split of the input data and weights in four groups of four, which gives 4+12 painlessly.
(Hopefully, I am not making incorrect assumptions about your computing environment. This is how I'd go at things if I were working with an HLSL programmer.)
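In plain C, the "multiple accumulators" idea can be sketched as follows. This is only an illustration: the function names and the flat 16-element layout are assumptions, and scalar values stand in for the colour vectors of the shader.

```c
#include <assert.h>

/* Single accumulator: every addition has to wait for the previous one. */
double weighted_sum_chain(const double *w, const double *p, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; ++i)
    acc += w[i] * p[i];
  return acc;
}

/* Four independent accumulators, merged pairwise at the end.
   The four running sums do not depend on each other, so the
   hardware is free to overlap them. */
double weighted_sum_split(const double *w, const double *p, int n)
{
  assert(n % 4 == 0);  /* this sketch handles whole groups of four only */
  double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
  for (int i = 0; i < n; i += 4) {
    a0 += w[i] * p[i];
    a1 += w[i + 1] * p[i + 1];
    a2 += w[i + 2] * p[i + 2];
    a3 += w[i + 3] * p[i + 3];
  }
  return (a0 + a1) + (a2 + a3);
}
```

The two functions can differ in the last bits because floating-point addition is not associative. The split version also needs more live registers, so it is worth measuring rather than assuming a gain.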
P.S. I don't like playing Sudoku, but I love doing this kind of optimization puzzle :)

Post by Hyllian »

Nicolas, for some reason, your last optimization trick actually made the code slower (119 vs 129 cycles), measured using nvshaderperf. OTOH, the Horner's rule one was very good (119 vs 143 cycles).
Maybe my Cg compiler is smart for some of these tricks already, and dumb for others.

A question for you: that jinc2 filter I made, technically, should I call it ewa-lanczos2sharp?

Post by NicolasRobidoux »

Sounds like you're running out of "registers". "Red-black" tricks generally use more memory. This includes my initial suggestion about computing min and max as

Code:

min(min(.,.),min(.,.))
instead of

Code:

min(.,min(.,min(.,.)))
If it's not too much to ask, could you try

Code:

kolor = mul(weights[0], float4x3(c00, c10, c20, c30));
color = mul(weights[1], float4x3(c01, c11, c21, c31));
kolor += mul(weights[2], float4x3(c02, c12, c22, c32));
color += mul(weights[3], float4x3(c03, c13, c23, c33));
color += kolor;
?
Last edited by NicolasRobidoux on 2014-06-08T08:06:45-07:00, edited 2 times in total.

Post by NicolasRobidoux »

Hyllian wrote: A question for you: that jinc2 filter I made, technically, should I call it ewa-lanczos2sharp?
It's not really what I call EWA Lanczos2Sharp because it does not use Jinc and it does not use one of my standard deblurs.
It's a deblurred EWA Sinc-windowed Sinc 2-lobe. <- Too long for a short name.
So I don't know.

Post by Hyllian »

NicolasRobidoux wrote: If it's not too much to ask, could you try

Code:

kolor = mul(weights[0], float4x3(c00, c10, c20, c30));
color = mul(weights[1], float4x3(c01, c11, c21, c31));
kolor += mul(weights[2], float4x3(c02, c12, c22, c32));
color += mul(weights[3], float4x3(c03, c13, c23, c33));
color += kolor;
?
Sure, but no gain (113 vs 113 cycles).

I couldn't get rid of jaggies using ewa-cubic. The clown image at 4x I've got is this (B=0.0, C=0.5, Catmull-Rom):

Image

Post by NicolasRobidoux »

EWA Catmull-Rom is super jaggy. Some people have liked it for downsampling but I've never liked it, up or down.
Try EWA RobidouxSoft:

Code:

B = (9-3*sqrt(2))/7 = 0.67962275898295921
C = (1-B)/2 = 0.1601886205085204

Post by Hyllian »

NicolasRobidoux wrote: EWA Catmull-Rom is super jaggy. Some people have liked it for downsampling but I've never liked it, up or down.
Try EWA RobidouxSoft:

Code:

B = (9-3*sqrt(2))/7 = 0.67962275898295921
C = (1-B)/2 = 0.1601886205085204
Very soft, indeed. A bit too blurry:
Image

I have the feeling we can't get the ewa-lanczos quality using cubic.

Post by NicolasRobidoux »

Don't give up too fast.
Let's first try Keys cubics: Once you choose B, set C=(1-B)/2.
Start with Mitchell which is the Keys with B = 1/3.
Then, vary B until you're happy with what you get.
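To make the one-parameter search concrete, here is a small C sketch that generates Keys cubics from B. The type `KeysCubic` and the function names are hypothetical. The [0,1] range check reflects the usual range between Catmull-Rom (B=0) and the cubic B-spline (B=1) and is an assumption, not a hard requirement.

```c
#include <assert.h>

/* A (B,C) pair on the Keys line B + 2C = 1. */
typedef struct {
  double B;
  double C;
} KeysCubic;

/* Given B, return the Keys cubic with C = (1-B)/2. */
KeysCubic keys_from_b(double B)
{
  assert(B >= 0.0 && B <= 1.0);  /* conventional range, assumed here */
  KeysCubic k;
  k.B = B;
  k.C = (1.0 - B) / 2.0;
  return k;
}

/* Fill out[0..n-1] with n evenly spaced Keys cubics from b_lo to b_hi,
   for example to render one test image per candidate. */
void keys_sweep(double b_lo, double b_hi, int n, KeysCubic *out)
{
  assert(n >= 2);
  for (int i = 0; i < n; ++i) {
    double t = (double) i / (double) (n - 1);
    out[i] = keys_from_b(b_lo + t * (b_hi - b_lo));
  }
}
```

For reference, B = 0 gives Catmull-Rom (C = 0.5), B = 1/3 gives Mitchell (C = 1/3), and B = (9-3*sqrt(2))/7 gives RobidouxSoft (C = 0.1601886205085204).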

Post by Hyllian »

NicolasRobidoux wrote: Don't give up too fast.
Let's first try Keys cubics: Once you choose B, set C=(1-B)/2.
Start with Mitchell which is the Keys with B = 1/3.
Then, vary B until you're happy with what you get.
No dice!

There isn't a single Keys config that comes close to this ewa-lanczos (WA=0.4, WB=0.9) quality:

Image

I think there is a need to derive new cubic functions that switch between x=1.1 and x=1.3, and not at 1.0 and 2.0, which are the default switch points. But they need to be splines (so the first derivative must be continuous at the switch point). I can't just change the switch points using the current cubic functions, because a discontinuity would arise. Just an idea.
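The discontinuity can be checked numerically. The C sketch below takes the two pieces of the standard Mitchell-Netravali (B,C) cubic and measures the jump in value and in slope if the switch between them were placed at some point s. It does not derive the new cubics, it only shows why moving the switch point alone does not work. All names are hypothetical.

```c
#include <assert.h>

/* Power-basis coefficients (a0 + a1*x + a2*x^2 + a3*x^3) of both pieces. */
typedef struct {
  double p[4];  /* piece normally used for 0 <= x < 1 */
  double q[4];  /* piece normally used for 1 <= x < 2 */
} Pieces;

/* Standard Mitchell-Netravali coefficients for given B and C. */
Pieces bc_pieces(double B, double C)
{
  Pieces f;
  f.p[0] = (6.0 - 2.0 * B) / 6.0;
  f.p[1] = 0.0;
  f.p[2] = (-18.0 + 12.0 * B + 6.0 * C) / 6.0;
  f.p[3] = (12.0 - 9.0 * B - 6.0 * C) / 6.0;
  f.q[0] = (8.0 * B + 24.0 * C) / 6.0;
  f.q[1] = (-12.0 * B - 48.0 * C) / 6.0;
  f.q[2] = (6.0 * B + 30.0 * C) / 6.0;
  f.q[3] = (-B - 6.0 * C) / 6.0;
  return f;
}

/* Value of a cubic (Horner form). */
double poly(const double a[4], double x)
{
  return a[0] + x * (a[1] + x * (a[2] + x * a[3]));
}

/* First derivative of a cubic. */
double dpoly(const double a[4], double x)
{
  return a[1] + x * (2.0 * a[2] + x * 3.0 * a[3]);
}

/* Jump in value if the switch from p to q were placed at x = s. */
double value_jump(const Pieces *f, double s)
{
  assert(s >= 0.0);
  return poly(f->q, s) - poly(f->p, s);
}

/* Jump in first derivative if the switch were placed at x = s. */
double slope_jump(const Pieces *f, double s)
{
  assert(s >= 0.0);
  return dpoly(f->q, s) - dpoly(f->p, s);
}
```

With the Mitchell values B = C = 1/3, both jumps vanish at s = 1.0 up to rounding, while at s = 1.2 the value jumps by about 0.04, so new coefficients would have to be derived for any other switch point.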

Post by NicolasRobidoux »

I persist in thinking that if you vary B and C (possibly without sticking to Keys cubics) you'll find a combination that compares.
The only thing that could make a comparable result unreachable with 4x is that you extend your disc up to radius 2.5.
Last edited by NicolasRobidoux on 2014-06-09T00:35:26-07:00, edited 1 time in total.