## BlendPixelNonPremult libwebpdemux

Started by guga, July 12, 2024, 01:44:04 PM

#### guga

Some considerations about the BlendPixelNonPremult function in the libwebpdemux library

The src is the front (top) image
The dst is the background (bottom) image

In the webp library (in the demuxer), the function BlendPixelNonPremult is responsible for merging 2 different images, taking into account their alpha values. The main problem with this function is that it is a bit slow and leads to some inaccuracy.
The original function makes heavy use of the div instruction, and also makes 3 internal calls to another function, "BlendChannelNonPremult", which also uses an internal _aulldiv function (from ntdll or the compiler's own library).
After porting the function to a mathematical equation, the general approach is as follows:

1) The general approach:

Blended = (256*(A*x+B*y)/(256*(x+y)-(x*y)))/2
Simplifying: 128*(A*x+B*y)/(256*(x+y)-(x*y))
where:
A = Pixel on the src_channel in the range [0, 255].
B = Pixel on the dst_channel in the range [0, 255].
x = Alpha on the src channel in the range [0, 255].
y = Alpha on the dest channel  in the range [0, 255].

The problem with that is that it has a maximum result of 254. So, A, B, x, y = 255 leads us to: 254.0077821011673
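A quick numeric check of the 256-based formula, written as a Python sketch. The function name `blend_256` is only illustrative and does not come from the library.

```python
def blend_256(a, b, x, y):
    """Evaluate 128*(A*x + B*y) / (256*(x + y) - x*y) as written above."""
    return 128 * (a * x + b * y) / (256 * (x + y) - x * y)

# All four inputs at 255: the result stays just above 254 and never reaches 255.
peak = blend_256(255, 255, 255, 255)
print(peak)
```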

Since a pixel has a range of 0 to 255, we need to adapt the algorithm to the limits of a pixel.
In order to achieve a perfect blending (x,y, A, B and the Blending result are all inside 0 to 255), we should do:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y)) which is the same as:
= (A*x+B*y)/(2*(x+y-(x*y/255)))
where:
A = Pixel on the src_channel in the range [0, 255].
B = Pixel on the dst_channel in the range [0, 255].
x = Alpha on the src channel in the range [0, 255].
y = Alpha on the dest channel  in the range [0, 255].
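To check that the two 255-based forms agree and that the result now reaches 255, here is a small Python sketch. The names `blend_255` and `blend_255_alt` are illustrative only, and both functions are undefined when x = y = 0 (the denominator is zero).

```python
def blend_255(a, b, x, y):
    """(255/2)*(A*x + B*y) / (255*(x + y) - x*y)."""
    return 127.5 * (a * x + b * y) / (255 * (x + y) - x * y)

def blend_255_alt(a, b, x, y):
    """Equivalent form: (A*x + B*y) / (2*(x + y - x*y/255))."""
    return (a * x + b * y) / (2 * (x + y - x * y / 255))

full = blend_255(255, 255, 255, 255)      # all inputs at their maximum
sample = blend_255(200, 50, 128, 64)      # arbitrary test values
sample_alt = blend_255_alt(200, 50, 128, 64)
print(full, sample, sample_alt)
```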

2) Specific cases:

a) Alpha in src and destination are the same:

When x = y, we have:

Blended = (255/2)*(A*x+B*x)/(255*(x+x)-(x*x))
Simplifying leads us to: (255 *(A + B))/(2 *(510-x))
127.5*(A+B)/(510-x)

This is particularly interesting because it will result in a perfect blending, especially for opaque images.
For example, if x = 255, the formula is simply (A+B)/2. So, since we have a range from 0 to 255 for alpha (x), we can establish the minimum and maximum values, and also a table to retrieve the blending values faster when the alpha on both images is the same.

Minimum Value (alpha = 0) = (A+B)/4
Maximum Value (alpha = 255) = (A+B)/2
Ok, then we have some ratio multiplied by (A+B), leading us to:
Blended = k*(A+B), where
k = 127.5/(510-alpha)

So, we can have the following table to use for the values of k:
x = 0, k = 1/4 = 255/1020 ... 1020 = 255*4-0
x = 1, k = 255/1018 .... 1018 = (255*4-2)
x = 2, k = 255/1016 .... 1016 = (255*4-4)
x = 3, k = 85/338 = 255/1014.... 1014 = (255*4-6)
x = 4, k = 255/1012.... 1012 = (255*4-8)
...
x = 200, k = 51/124 = 255/620.... 620 = (255*4-400)
x = 227, k = 255/566.... 566 = (255*4-454)
...
x = 255, k = 1/2 = 255/510.... 510 = (255*4-510)

Ok, from the values above we conclude that
k = 255/(255*4-Alpha*2) = 127.5/(510-Alpha)

So, we can have a KTable such as:
k0 = 127.5/(510-0)
k1 = 127.5/(510-1)
k2 = 127.5/(510-2)
k3 = 127.5/(510-3)
...
k255 = 127.5/(510-255)

In this way, the blended pixel is given by:
Blended = k(n) * (A+B)
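The table idea can be sketched in Python as follows. `K_TABLE` and `blend_equal_alpha` are illustrative names, and the values are computed at import time instead of being typed in by hand.

```python
# k(n) = 127.5 / (510 - n) for every alpha value n in [0, 255]
K_TABLE = [127.5 / (510 - alpha) for alpha in range(256)]

def blend_equal_alpha(a, b, alpha):
    """Blend two channel values when source and destination alpha are equal."""
    return K_TABLE[alpha] * (a + b)

print(K_TABLE[0], K_TABLE[255])
print(blend_equal_alpha(10, 200, 100))
```

With this, the equal-alpha case costs one table lookup, one addition and one multiplication per channel, with no division at run time.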

b) Alpha in src is zero (Transparent)

When alpha in source is zeroed, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*0+B*y)/(255*(0+y)-(0*y))
= B/2

So, if we use the equation it implies that the resultant blending image is half of the destination

But, in image processing, when Alpha = 0, it means the full image is transparent. Therefore, despite the resultant value of the equation, in fact, all that will be shown is the destination image, since the source is fully transparent.

So, on such cases, the blended value is simply:

Blended = Dst. So, the full destination image is the result.

c) Alpha in destination is zero (Transparent)

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*x+B*0)/(255*(x+0)-(x*0))
= A/2

So, if we use the equation it implies that the resultant blending image is half of the source
But, in image processing, when Alpha = 0, it means the full image is transparent. Therefore, despite the resultant value of the equation, in fact, all that will be shown is the Source image, since the destination is fully transparent.

So, on such cases, the blended value is simply:

Blended = Src. So, the full source image is the result.

d) Alpha in src is 255 (Opaque)

When alpha in source is 255, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*255+B*y)/(255*(255+y)-(255*y))
= A/2 + B*y/510

So, if we use the equation it implies that the resultant blending image is half of the source plus some ratio applied on the destination

But, in image processing, when Alpha = 255, it means the full image is opaque. Therefore, despite the resultant value of the equation, in fact, all that will be shown is the source image, since the source is fully opaque.

So, on such cases, the blended value is simply:

Blended = Src. So, the full resultant image is the src itself.

e) Alpha in dst is 255 (Opaque)

When alpha in destination is 255, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*x+B*255)/(255*(x+255)-(x*255))
= B/2 + A*x/510

So, if we use the equation it implies that the resultant blending image is half of the destination plus some ratio applied on the source

But, in image processing, when Alpha = 255, it means the full image is opaque. Since what is opaque is the background (dst/bottom) image, it will be merged with the foreground image using this equation.
So, on such cases, the blended value is simply:

Blended = B/2 + A*x/510

Which is the most common blending mode, especially for watermarks and other types of superposition of an image.
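Putting cases b) to e) together with the general formula, a per-channel reference version could look like the Python sketch below. The function name `blend_channel` is hypothetical, and truncating the result with `int()` is an assumption about how the final value is rounded.

```python
def blend_channel(a, b, x, y):
    """Blend one channel: a/x = source value/alpha, b/y = destination value/alpha."""
    if x == 0:                 # case b: transparent source, show destination
        return b
    if y == 0 or x == 255:     # cases c and d: show source
        return a
    if y == 255:               # case e: opaque destination
        return int(b / 2 + a * x / 510)
    # general formula; x == y is just a special case of it
    return int(127.5 * (a * x + b * y) / (255 * (x + y) - x * y))

print(blend_channel(100, 200, 128, 255))
print(blend_channel(100, 50, 128, 128))
```

Checking the transparent source first means the general formula is never reached with x = y = 0, so its denominator cannot be zero.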

The pixels must be in RGBA format
[RGBA:
RGBA.Red: B\$ 0
RGBA.Green: B\$ 0
RGBA.Blue: B\$ 0
RGBA.Alpha: B\$ 0]
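For reference, this is how such a pixel maps to a 32-bit value, as a Python sketch. It assumes a little-endian dword with red in the lowest byte and alpha in the highest, which is consistent with taking the alpha by shifting the pixel right by 24. `unpack_rgba` and `pack_rgba` are illustrative names.

```python
import struct

def unpack_rgba(pixel):
    """Split a 32-bit pixel into its (red, green, blue, alpha) bytes."""
    return tuple(struct.pack("<I", pixel))

def pack_rgba(r, g, b, a):
    """Build the 32-bit pixel from four byte values."""
    return struct.unpack("<I", bytes((r, g, b, a)))[0]

pixel = pack_rgba(0x33, 0x22, 0x11, 0x80)
print(hex(pixel), unpack_rgba(pixel), pixel >> 24)
```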

I'm rebuilding (or trying to rebuild) the whole library in RosAsm, and this is the resulting function I've got so far

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### guga

continuation...

Necessary data
`[kBlendNonPremultTbl: kBlendNonPremultTbl.Data0: F\$  0.25 ; (127.5/(510-0) kBlendNonPremultTbl.Data1: F\$  0.25049115913556 ; (127.5/(510-1) kBlendNonPremultTbl.Data2: F\$  0.2509842519685 ; (127.5/(510-2) kBlendNonPremultTbl.Data3: F\$  0.25147928994083 ; (127.5/(510-3) kBlendNonPremultTbl.Data4: F\$  0.25197628458498 ; (127.5/(510-4) kBlendNonPremultTbl.Data5: F\$  0.25247524752475 ; (127.5/(510-5) kBlendNonPremultTbl.Data6: F\$  0.25297619047619 ; (127.5/(510-6) kBlendNonPremultTbl.Data7: F\$  0.25347912524851 ; (127.5/(510-7) kBlendNonPremultTbl.Data8: F\$  0.25398406374502 ; (127.5/(510-8) kBlendNonPremultTbl.Data9: F\$  0.25449101796407 ; (127.5/(510-9) kBlendNonPremultTbl.Data10: F\$  0.255 ; (127.5/(510-10) kBlendNonPremultTbl.Data11: F\$  0.25551102204409 ; (127.5/(510-11) kBlendNonPremultTbl.Data12: F\$  0.25602409638554 ; (127.5/(510-12) kBlendNonPremultTbl.Data13: F\$  0.25653923541248 ; (127.5/(510-13) kBlendNonPremultTbl.Data14: F\$  0.2570564516129 ; (127.5/(510-14) kBlendNonPremultTbl.Data15: F\$  0.25757575757576 ; (127.5/(510-15) kBlendNonPremultTbl.Data16: F\$  0.2580971659919 ; (127.5/(510-16) kBlendNonPremultTbl.Data17: F\$  0.25862068965517 ; (127.5/(510-17) kBlendNonPremultTbl.Data18: F\$  0.25914634146342 ; (127.5/(510-18) kBlendNonPremultTbl.Data19: F\$  0.25967413441955 ; (127.5/(510-19) kBlendNonPremultTbl.Data20: F\$  0.26020408163265 ; (127.5/(510-20) kBlendNonPremultTbl.Data21: F\$  0.26073619631902 ; (127.5/(510-21) kBlendNonPremultTbl.Data22: F\$  0.26127049180328 ; (127.5/(510-22) kBlendNonPremultTbl.Data23: F\$  0.26180698151951 ; (127.5/(510-23) kBlendNonPremultTbl.Data24: F\$  0.26234567901235 ; (127.5/(510-24) kBlendNonPremultTbl.Data25: F\$  0.26288659793814 ; (127.5/(510-25) kBlendNonPremultTbl.Data26: F\$  0.26342975206612 ; (127.5/(510-26) kBlendNonPremultTbl.Data27: F\$  0.2639751552795 ; (127.5/(510-27) kBlendNonPremultTbl.Data28: F\$  0.26452282157676 ; (127.5/(510-28) kBlendNonPremultTbl.Data29: F\$  0.26507276507277 ; 
(127.5/(510-29) kBlendNonPremultTbl.Data30: F\$  0.265625 ; (127.5/(510-30) kBlendNonPremultTbl.Data31: F\$  0.26617954070981 ; (127.5/(510-31) kBlendNonPremultTbl.Data32: F\$  0.26673640167364 ; (127.5/(510-32) kBlendNonPremultTbl.Data33: F\$  0.26729559748428 ; (127.5/(510-33) kBlendNonPremultTbl.Data34: F\$  0.26785714285714 ; (127.5/(510-34) kBlendNonPremultTbl.Data35: F\$  0.26842105263158 ; (127.5/(510-35) kBlendNonPremultTbl.Data36: F\$  0.26898734177215 ; (127.5/(510-36) kBlendNonPremultTbl.Data37: F\$  0.26955602536998 ; (127.5/(510-37) kBlendNonPremultTbl.Data38: F\$  0.27012711864407 ; (127.5/(510-38) kBlendNonPremultTbl.Data39: F\$  0.27070063694268 ; (127.5/(510-39) kBlendNonPremultTbl.Data40: F\$  0.27127659574468 ; (127.5/(510-40) kBlendNonPremultTbl.Data41: F\$  0.27185501066098 ; (127.5/(510-41) kBlendNonPremultTbl.Data42: F\$  0.2724358974359 ; (127.5/(510-42) kBlendNonPremultTbl.Data43: F\$  0.27301927194861 ; (127.5/(510-43) kBlendNonPremultTbl.Data44: F\$  0.27360515021459 ; (127.5/(510-44) kBlendNonPremultTbl.Data45: F\$  0.2741935483871 ; (127.5/(510-45) kBlendNonPremultTbl.Data46: F\$  0.27478448275862 ; (127.5/(510-46) kBlendNonPremultTbl.Data47: F\$  0.27537796976242 ; (127.5/(510-47) kBlendNonPremultTbl.Data48: F\$  0.27597402597403 ; (127.5/(510-48) kBlendNonPremultTbl.Data49: F\$  0.2765726681128 ; (127.5/(510-49) kBlendNonPremultTbl.Data50: F\$  0.27717391304348 ; (127.5/(510-50) kBlendNonPremultTbl.Data51: F\$  0.27777777777778 ; (127.5/(510-51) kBlendNonPremultTbl.Data52: F\$  0.27838427947598 ; (127.5/(510-52) kBlendNonPremultTbl.Data53: F\$  0.27899343544858 ; (127.5/(510-53) kBlendNonPremultTbl.Data54: F\$  0.2796052631579 ; (127.5/(510-54) kBlendNonPremultTbl.Data55: F\$  0.28021978021978 ; (127.5/(510-55) kBlendNonPremultTbl.Data56: F\$  0.28083700440529 ; (127.5/(510-56) kBlendNonPremultTbl.Data57: F\$  0.28145695364238 ; (127.5/(510-57) kBlendNonPremultTbl.Data58: F\$  0.2820796460177 ; (127.5/(510-58) 
kBlendNonPremultTbl.Data59: F\$  0.28270509977827 ; (127.5/(510-59) kBlendNonPremultTbl.Data60: F\$  0.28333333333333 ; (127.5/(510-60) kBlendNonPremultTbl.Data61: F\$  0.28396436525613 ; (127.5/(510-61) kBlendNonPremultTbl.Data62: F\$  0.28459821428571 ; (127.5/(510-62) kBlendNonPremultTbl.Data63: F\$  0.28523489932886 ; (127.5/(510-63) kBlendNonPremultTbl.Data64: F\$  0.28587443946188 ; (127.5/(510-64) kBlendNonPremultTbl.Data65: F\$  0.28651685393258 ; (127.5/(510-65) kBlendNonPremultTbl.Data66: F\$  0.28716216216216 ; (127.5/(510-66) kBlendNonPremultTbl.Data67: F\$  0.28781038374718 ; (127.5/(510-67) kBlendNonPremultTbl.Data68: F\$  0.28846153846154 ; (127.5/(510-68) kBlendNonPremultTbl.Data69: F\$  0.2891156462585 ; (127.5/(510-69) kBlendNonPremultTbl.Data70: F\$  0.28977272727273 ; (127.5/(510-70) kBlendNonPremultTbl.Data71: F\$  0.29043280182232 ; (127.5/(510-71) kBlendNonPremultTbl.Data72: F\$  0.29109589041096 ; (127.5/(510-72) kBlendNonPremultTbl.Data73: F\$  0.29176201372998 ; (127.5/(510-73) kBlendNonPremultTbl.Data74: F\$  0.29243119266055 ; (127.5/(510-74) kBlendNonPremultTbl.Data75: F\$  0.29310344827586 ; (127.5/(510-75) kBlendNonPremultTbl.Data76: F\$  0.29377880184332 ; (127.5/(510-76) kBlendNonPremultTbl.Data77: F\$  0.29445727482679 ; (127.5/(510-77) kBlendNonPremultTbl.Data78: F\$  0.29513888888889 ; (127.5/(510-78) kBlendNonPremultTbl.Data79: F\$  0.29582366589327 ; (127.5/(510-79) kBlendNonPremultTbl.Data80: F\$  0.29651162790698 ; (127.5/(510-80) kBlendNonPremultTbl.Data81: F\$  0.2972027972028 ; (127.5/(510-81) kBlendNonPremultTbl.Data82: F\$  0.29789719626168 ; (127.5/(510-82) kBlendNonPremultTbl.Data83: F\$  0.29859484777518 ; (127.5/(510-83) kBlendNonPremultTbl.Data84: F\$  0.29929577464789 ; (127.5/(510-84) kBlendNonPremultTbl.Data85: F\$  0.3 ; (127.5/(510-85) kBlendNonPremultTbl.Data86: F\$  0.30070754716981 ; (127.5/(510-86) kBlendNonPremultTbl.Data87: F\$  0.30141843971631 ; (127.5/(510-87) kBlendNonPremultTbl.Data88: F\$  
0.3021327014218 ; (127.5/(510-88) kBlendNonPremultTbl.Data89: F\$  0.30285035629454 ; (127.5/(510-89) kBlendNonPremultTbl.Data90: F\$  0.30357142857143 ; (127.5/(510-90) kBlendNonPremultTbl.Data91: F\$  0.30429594272076 ; (127.5/(510-91) kBlendNonPremultTbl.Data92: F\$  0.30502392344498 ; (127.5/(510-92) kBlendNonPremultTbl.Data93: F\$  0.30575539568345 ; (127.5/(510-93) kBlendNonPremultTbl.Data94: F\$  0.30649038461539 ; (127.5/(510-94) kBlendNonPremultTbl.Data95: F\$  0.30722891566265 ; (127.5/(510-95) kBlendNonPremultTbl.Data96: F\$  0.30797101449275 ; (127.5/(510-96) kBlendNonPremultTbl.Data97: F\$  0.30871670702179 ; (127.5/(510-97) kBlendNonPremultTbl.Data98: F\$  0.30946601941748 ; (127.5/(510-98) kBlendNonPremultTbl.Data99: F\$  0.31021897810219 ; (127.5/(510-99) kBlendNonPremultTbl.Data100: F\$  0.3109756097561 ; (127.5/(510-100) kBlendNonPremultTbl.Data101: F\$  0.31173594132029 ; (127.5/(510-101) kBlendNonPremultTbl.Data102: F\$  0.3125 ; (127.5/(510-102) kBlendNonPremultTbl.Data103: F\$  0.31326781326781 ; (127.5/(510-103) kBlendNonPremultTbl.Data104: F\$  0.314039408867 ; (127.5/(510-104) kBlendNonPremultTbl.Data105: F\$  0.31481481481482 ; (127.5/(510-105) kBlendNonPremultTbl.Data106: F\$  0.31559405940594 ; (127.5/(510-106) kBlendNonPremultTbl.Data107: F\$  0.31637717121588 ; (127.5/(510-107) kBlendNonPremultTbl.Data108: F\$  0.31716417910448 ; (127.5/(510-108) kBlendNonPremultTbl.Data109: F\$  0.31795511221945 ; (127.5/(510-109) kBlendNonPremultTbl.Data110: F\$  0.31875 ; (127.5/(510-110) kBlendNonPremultTbl.Data111: F\$  0.31954887218045 ; (127.5/(510-111) kBlendNonPremultTbl.Data112: F\$  0.32035175879397 ; (127.5/(510-112) kBlendNonPremultTbl.Data113: F\$  0.32115869017632 ; (127.5/(510-113) kBlendNonPremultTbl.Data114: F\$  0.3219696969697 ; (127.5/(510-114) kBlendNonPremultTbl.Data115: F\$  0.32278481012658 ; (127.5/(510-115) kBlendNonPremultTbl.Data116: F\$  0.32360406091371 ; (127.5/(510-116) kBlendNonPremultTbl.Data117: F\$  0.32442748091603 
; (127.5/(510-117) kBlendNonPremultTbl.Data118: F\$  0.32525510204082 ; (127.5/(510-118) kBlendNonPremultTbl.Data119: F\$  0.32608695652174 ; (127.5/(510-119) kBlendNonPremultTbl.Data120: F\$  0.32692307692308 ; (127.5/(510-120) kBlendNonPremultTbl.Data121: F\$  0.32776349614396 ; (127.5/(510-121) kBlendNonPremultTbl.Data122: F\$  0.32860824742268 ; (127.5/(510-122) kBlendNonPremultTbl.Data123: F\$  0.32945736434109 ; (127.5/(510-123) kBlendNonPremultTbl.Data124: F\$  0.33031088082902 ; (127.5/(510-124) kBlendNonPremultTbl.Data125: F\$  0.33116883116883 ; (127.5/(510-125) kBlendNonPremultTbl.Data126: F\$  0.33203125 ; (127.5/(510-126) kBlendNonPremultTbl.Data127: F\$  0.33289817232376 ; (127.5/(510-127) kBlendNonPremultTbl.Data128: F\$  0.33376963350785 ; (127.5/(510-128) kBlendNonPremultTbl.Data129: F\$  0.33464566929134 ; (127.5/(510-129) kBlendNonPremultTbl.Data130: F\$  0.33552631578947 ; (127.5/(510-130) kBlendNonPremultTbl.Data131: F\$  0.33641160949868 ; (127.5/(510-131) kBlendNonPremultTbl.Data132: F\$  0.33730158730159 ; (127.5/(510-132) kBlendNonPremultTbl.Data133: F\$  0.33819628647215 ; (127.5/(510-133) kBlendNonPremultTbl.Data134: F\$  0.33909574468085 ; (127.5/(510-134) kBlendNonPremultTbl.Data135: F\$  0.34 ; (127.5/(510-135) kBlendNonPremultTbl.Data136: F\$  0.34090909090909 ; (127.5/(510-136) kBlendNonPremultTbl.Data137: F\$  0.34182305630027 ; (127.5/(510-137) kBlendNonPremultTbl.Data138: F\$  0.34274193548387 ; (127.5/(510-138) kBlendNonPremultTbl.Data139: F\$  0.34366576819407 ; (127.5/(510-139) kBlendNonPremultTbl.Data140: F\$  0.3445945945946 ; (127.5/(510-140) kBlendNonPremultTbl.Data141: F\$  0.34552845528455 ; (127.5/(510-141) kBlendNonPremultTbl.Data142: F\$  0.34646739130435 ; (127.5/(510-142) kBlendNonPremultTbl.Data143: F\$  0.34741144414169 ; (127.5/(510-143) kBlendNonPremultTbl.Data144: F\$  0.34836065573771 ; (127.5/(510-144) kBlendNonPremultTbl.Data145: F\$  0.34931506849315 ; (127.5/(510-145) kBlendNonPremultTbl.Data146: F\$  
0.35027472527473 ; (127.5/(510-146) kBlendNonPremultTbl.Data147: F\$  0.35123966942149 ; (127.5/(510-147) kBlendNonPremultTbl.Data148: F\$  0.35220994475138 ; (127.5/(510-148) kBlendNonPremultTbl.Data149: F\$  0.35318559556787 ; (127.5/(510-149) kBlendNonPremultTbl.Data150: F\$  0.35416666666667 ; (127.5/(510-150) kBlendNonPremultTbl.Data151: F\$  0.35515320334262 ; (127.5/(510-151) kBlendNonPremultTbl.Data152: F\$  0.35614525139665 ; (127.5/(510-152) kBlendNonPremultTbl.Data153: F\$  0.35714285714286 ; (127.5/(510-153) kBlendNonPremultTbl.Data154: F\$  0.35814606741573 ; (127.5/(510-154) kBlendNonPremultTbl.Data155: F\$  0.35915492957747 ; (127.5/(510-155) kBlendNonPremultTbl.Data156: F\$  0.36016949152542 ; (127.5/(510-156) kBlendNonPremultTbl.Data157: F\$  0.36118980169972 ; (127.5/(510-157) kBlendNonPremultTbl.Data158: F\$  0.36221590909091 ; (127.5/(510-158) kBlendNonPremultTbl.Data159: F\$  0.36324786324786 ; (127.5/(510-159) kBlendNonPremultTbl.Data160: F\$  0.36428571428571 ; (127.5/(510-160) kBlendNonPremultTbl.Data161: F\$  0.36532951289398 ; (127.5/(510-161) kBlendNonPremultTbl.Data162: F\$  0.36637931034483 ; (127.5/(510-162) kBlendNonPremultTbl.Data163: F\$  0.36743515850144 ; (127.5/(510-163) kBlendNonPremultTbl.Data164: F\$  0.36849710982659 ; (127.5/(510-164) kBlendNonPremultTbl.Data165: F\$  0.3695652173913 ; (127.5/(510-165) kBlendNonPremultTbl.Data166: F\$  0.37063953488372 ; (127.5/(510-166) kBlendNonPremultTbl.Data167: F\$  0.37172011661808 ; (127.5/(510-167) kBlendNonPremultTbl.Data168: F\$  0.37280701754386 ; (127.5/(510-168) kBlendNonPremultTbl.Data169: F\$  0.37390029325513 ; (127.5/(510-169) kBlendNonPremultTbl.Data170: F\$  0.375 ; (127.5/(510-170) kBlendNonPremultTbl.Data171: F\$  0.37610619469027 ; (127.5/(510-171) kBlendNonPremultTbl.Data172: F\$  0.37721893491124 ; (127.5/(510-172) kBlendNonPremultTbl.Data173: F\$  0.37833827893175 ; (127.5/(510-173) kBlendNonPremultTbl.Data174: F\$  0.37946428571429 ; (127.5/(510-174) 
kBlendNonPremultTbl.Data175: F\$  0.38059701492537 ; (127.5/(510-175) kBlendNonPremultTbl.Data176: F\$  0.38173652694611 ; (127.5/(510-176) kBlendNonPremultTbl.Data177: F\$  0.38288288288288 ; (127.5/(510-177) kBlendNonPremultTbl.Data178: F\$  0.38403614457831 ; (127.5/(510-178) kBlendNonPremultTbl.Data179: F\$  0.38519637462236 ; (127.5/(510-179) kBlendNonPremultTbl.Data180: F\$  0.38636363636364 ; (127.5/(510-180) kBlendNonPremultTbl.Data181: F\$  0.38753799392097 ; (127.5/(510-181) kBlendNonPremultTbl.Data182: F\$  0.38871951219512 ; (127.5/(510-182) kBlendNonPremultTbl.Data183: F\$  0.38990825688073 ; (127.5/(510-183) kBlendNonPremultTbl.Data184: F\$  0.39110429447853 ; (127.5/(510-184) kBlendNonPremultTbl.Data185: F\$  0.39230769230769 ; (127.5/(510-185) kBlendNonPremultTbl.Data186: F\$  0.39351851851852 ; (127.5/(510-186) kBlendNonPremultTbl.Data187: F\$  0.39473684210526 ; (127.5/(510-187) kBlendNonPremultTbl.Data188: F\$  0.39596273291926 ; (127.5/(510-188) kBlendNonPremultTbl.Data189: F\$  0.39719626168224 ; (127.5/(510-189) kBlendNonPremultTbl.Data190: F\$  0.3984375 ; (127.5/(510-190) kBlendNonPremultTbl.Data191: F\$  0.39968652037618 ; (127.5/(510-191) kBlendNonPremultTbl.Data192: F\$  0.40094339622642 ; (127.5/(510-192) kBlendNonPremultTbl.Data193: F\$  0.40220820189274 ; (127.5/(510-193) kBlendNonPremultTbl.Data194: F\$  0.40348101265823 ; (127.5/(510-194) kBlendNonPremultTbl.Data195: F\$  0.40476190476191 ; (127.5/(510-195) kBlendNonPremultTbl.Data196: F\$  0.40605095541401 ; (127.5/(510-196) kBlendNonPremultTbl.Data197: F\$  0.4073482428115 ; (127.5/(510-197) kBlendNonPremultTbl.Data198: F\$  0.40865384615385 ; (127.5/(510-198) kBlendNonPremultTbl.Data199: F\$  0.40996784565916 ; (127.5/(510-199) kBlendNonPremultTbl.Data200: F\$  0.41129032258065 ; (127.5/(510-200) kBlendNonPremultTbl.Data201: F\$  0.4126213592233 ; (127.5/(510-201) kBlendNonPremultTbl.Data202: F\$  0.41396103896104 ; (127.5/(510-202) kBlendNonPremultTbl.Data203: F\$  
0.41530944625407 ; (127.5/(510-203) kBlendNonPremultTbl.Data204: F\$  0.41666666666667 ; (127.5/(510-204) kBlendNonPremultTbl.Data205: F\$  0.41803278688525 ; (127.5/(510-205) kBlendNonPremultTbl.Data206: F\$  0.41940789473684 ; (127.5/(510-206) kBlendNonPremultTbl.Data207: F\$  0.42079207920792 ; (127.5/(510-207) kBlendNonPremultTbl.Data208: F\$  0.42218543046358 ; (127.5/(510-208) kBlendNonPremultTbl.Data209: F\$  0.42358803986711 ; (127.5/(510-209) kBlendNonPremultTbl.Data210: F\$  0.425 ; (127.5/(510-210) kBlendNonPremultTbl.Data211: F\$  0.42642140468227 ; (127.5/(510-211) kBlendNonPremultTbl.Data212: F\$  0.42785234899329 ; (127.5/(510-212) kBlendNonPremultTbl.Data213: F\$  0.42929292929293 ; (127.5/(510-213) kBlendNonPremultTbl.Data214: F\$  0.43074324324324 ; (127.5/(510-214) kBlendNonPremultTbl.Data215: F\$  0.43220338983051 ; (127.5/(510-215) kBlendNonPremultTbl.Data216: F\$  0.43367346938776 ; (127.5/(510-216) kBlendNonPremultTbl.Data217: F\$  0.43515358361775 ; (127.5/(510-217) kBlendNonPremultTbl.Data218: F\$  0.43664383561644 ; (127.5/(510-218) kBlendNonPremultTbl.Data219: F\$  0.43814432989691 ; (127.5/(510-219) kBlendNonPremultTbl.Data220: F\$  0.43965517241379 ; (127.5/(510-220) kBlendNonPremultTbl.Data221: F\$  0.44117647058824 ; (127.5/(510-221) kBlendNonPremultTbl.Data222: F\$  0.44270833333333 ; (127.5/(510-222) kBlendNonPremultTbl.Data223: F\$  0.44425087108014 ; (127.5/(510-223) kBlendNonPremultTbl.Data224: F\$  0.4458041958042 ; (127.5/(510-224) kBlendNonPremultTbl.Data225: F\$  0.44736842105263 ; (127.5/(510-225) kBlendNonPremultTbl.Data226: F\$  0.44894366197183 ; (127.5/(510-226) kBlendNonPremultTbl.Data227: F\$  0.45053003533569 ; (127.5/(510-227) kBlendNonPremultTbl.Data228: F\$  0.45212765957447 ; (127.5/(510-228) kBlendNonPremultTbl.Data229: F\$  0.45373665480427 ; (127.5/(510-229) kBlendNonPremultTbl.Data230: F\$  0.45535714285714 ; (127.5/(510-230) kBlendNonPremultTbl.Data231: F\$  0.45698924731183 ; (127.5/(510-231) 
kBlendNonPremultTbl.Data232: F\$  0.45863309352518 ; (127.5/(510-232) kBlendNonPremultTbl.Data233: F\$  0.46028880866426 ; (127.5/(510-233) kBlendNonPremultTbl.Data234: F\$  0.46195652173913 ; (127.5/(510-234) kBlendNonPremultTbl.Data235: F\$  0.46363636363636 ; (127.5/(510-235) kBlendNonPremultTbl.Data236: F\$  0.46532846715329 ; (127.5/(510-236) kBlendNonPremultTbl.Data237: F\$  0.46703296703297 ; (127.5/(510-237) kBlendNonPremultTbl.Data238: F\$  0.46875 ; (127.5/(510-238) kBlendNonPremultTbl.Data239: F\$  0.47047970479705 ; (127.5/(510-239) kBlendNonPremultTbl.Data240: F\$  0.47222222222222 ; (127.5/(510-240) kBlendNonPremultTbl.Data241: F\$  0.47397769516729 ; (127.5/(510-241) kBlendNonPremultTbl.Data242: F\$  0.47574626865672 ; (127.5/(510-242) kBlendNonPremultTbl.Data243: F\$  0.47752808988764 ; (127.5/(510-243) kBlendNonPremultTbl.Data244: F\$  0.47932330827068 ; (127.5/(510-244) kBlendNonPremultTbl.Data245: F\$  0.4811320754717 ; (127.5/(510-245) kBlendNonPremultTbl.Data246: F\$  0.48295454545455 ; (127.5/(510-246) kBlendNonPremultTbl.Data247: F\$  0.48479087452472 ; (127.5/(510-247) kBlendNonPremultTbl.Data248: F\$  0.48664122137405 ; (127.5/(510-248) kBlendNonPremultTbl.Data249: F\$  0.48850574712644 ; (127.5/(510-249) kBlendNonPremultTbl.Data250: F\$  0.49038461538462 ; (127.5/(510-250) kBlendNonPremultTbl.Data251: F\$  0.49227799227799 ; (127.5/(510-251) kBlendNonPremultTbl.Data252: F\$  0.49418604651163 ; (127.5/(510-252) kBlendNonPremultTbl.Data253: F\$  0.49610894941634 ; (127.5/(510-253) kBlendNonPremultTbl.Data254: F\$  0.498046875 ; (127.5/(510-254) kBlendNonPremultTbl.Data255: F\$  0.5] ; (127.5/(510-255)[<16 Float_Half4: F\$ 0.5, 0.5, 0.5, 0.5][<16 Float_510_Inv4: F\$ (1/510.0), F\$ (1/510.0), F\$ (1/510.0), F\$ (1/510.0)][<16 Float255: F\$ 255][<16 Float_HalfPixel: F\$ 127.5]`
SSE2 macros used:

```
[SSE2_CONV4BYTE_TO_4FLOAT | punpcklbw #1 #1 | punpcklwd #1 #1 | psrld #1 24 | cvtdq2ps #1 #1]

; using values from 0 to 3
[SHUFFLE | (255 - ((#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4))]

; using values from 3 to 0
[SHUFFLE_INV | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Sieekmanski

[SSE_INVERT_DWORDS 27]      ; invert the order of dwords

[SSE2_CONV_BYTE_TO_4BYTE | punpcklbw #1 #1 | punpcklwd #1 #1]

[SSE2_EXTRACT_FIRST_BYTES | movdqu xmm7 #1 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7]

[pshufd | pshufd #1 #2 #3]
[shufps | shufps #1 #2 #3]
[shufpd | shufpd #1 #2 #3]
[pshuflw | pshuflw #1 #2 #3]
[pshufhw | pshufhw #1 #2 #3]

[SSE_CONV_4FLOAT_TO_4INT | cvttps2dq #1 #1]
```
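The two SHUFFLE macros only compute the 8-bit immediate for the shuffle instructions. A Python sketch of the same arithmetic makes the values easy to check. The function names are illustrative, and the arguments `p1` to `p4` simply stand for the macro parameters #1 to #4.

```python
def shuffle_inv(p1, p2, p3, p4):
    """Same arithmetic as the SHUFFLE_INV macro."""
    return (p1 << 6) | (p2 << 4) | (p3 << 2) | p4

def shuffle(p1, p2, p3, p4):
    """Same arithmetic as the SHUFFLE macro: 255 minus the SHUFFLE_INV value."""
    return 255 - shuffle_inv(p1, p2, p3, p4)

print(shuffle(0, 0, 0, 0))      # immediate used to broadcast the alpha
print(shuffle(0, 3, 3, 3))
print(shuffle_inv(0, 1, 2, 3))  # same value as SSE_INVERT_DWORDS
```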

#### guga

continuation...

Function:
```
Proc BlendPixelNonPremult:
    Arguments @Src, @Dst
    Local @Src_A
    Uses edx, ecx

    xorps xmm5 xmm5
    mov esi D@Dst
    mov eax D@Src | shr eax 24 | movzx eax al | mov D@Src_A eax ; Alpha of Source A
    ..If eax <> 0
        mov edx D@Dst | shr edx 24 | movzx edx dl ; Alpha of Destination A
        .If_Or dl = 0, al = 255 ; alpha in destination = 0 or alpha in src = 255
            mov eax D@Src
        .Else_If dl = 255 ; Alpha in dst is 255 (Opaque)
            ; calculate B/2 + A*x/510
            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
            ; get alpha from src
            pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
            mulps xmm6 X$Float_510_Inv4
            mulps xmm0 xmm6 ; A*x/510
            mulps xmm1 X$Float_Half4 ; B/2
            addps xmm0 xmm1
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0 ; convert them to integer
            SSE2_EXTRACT_FIRST_BYTES xmm0   ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            movd eax xmm0
            ; now we need to place the original value of Src Alpha back in its place
            mov edx D@Src
            shr edx 24 | shl edx 24
            shl eax 8 | shr eax 8
            or eax edx
        .Else_If al <> dl
            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
            ; get alpha from src and dst
            pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
            pshufd xmm7 xmm1 {SHUFFLE 0,0,0,0} ; xmm7 = alpha from dst = y
            ; Let's do (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
            ; calculate denominator = (255*(x+y)-(x*y))
            addss xmm6 xmm7 | mulss xmm6 X$Float255
            pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
            ; now do x*y, but put the result in xmm7
            mulss xmm7 xmm6
            ; invert again
            pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
            subss xmm6 xmm7
            movups xmm5 X$Float_HalfPixel | divss xmm5 xmm6 ; now we have as a multiplicand: (255/2)/(255*(x+y)-(x*y))
            pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; all 3 floats have the same value to be multiplied. The Alpha will remain 0, since we xored xmm5 at the beginning
            ;pshufd xmm5 xmm5 {SHUFFLE 3,3,3,3} ; all 3 floats have the same value to be multiplied. The Alpha will remain 0, since we xored xmm5 at the beginning
            ; restore the original alpha values
            pshufd xmm6 xmm6 {SHUFFLE 1,1,1,1}
            pshufd xmm7 xmm7 {SHUFFLE 1,1,1,1}
            ; calculate (A*x+B*y) = The rest of our numerator
            mulps xmm0 xmm6 ; A*x
            mulps xmm1 xmm7 ; B*y
            addps xmm0 xmm1 ; (A*x+B*y)
            ; multiply by our new numerator from xmm5
            mulps xmm0 xmm5
            ; and finally, convert them to integer
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
            ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            SSE2_EXTRACT_FIRST_BYTES xmm0
            movd eax xmm0
            ; now we need to place the original value of Src Alpha back in its place
            mov edx D@Src_A
            shl edx 24
            or eax edx
        .Else
            ; src alpha = dst alpha
            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1
            movss xmm5 X$kBlendNonPremultTbl+eax*4
            pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; all 3 floats have the same value to be multiplied. The Alpha will remain 0, since we xored xmm5 at the beginning
            ; calculate (A+B)*k
            addps xmm0 xmm1 ; (A+B)
            mulps xmm0 xmm5 ; (A+B)*k
            ; and finally, convert them to integer
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
            ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            SSE2_EXTRACT_FIRST_BYTES xmm0
            movd eax xmm0
            ; now we need to place the original value of Src Alpha back in its place
            shl edx 24
            or eax edx
        .End_If
    ..Else
        ; alpha in src = 0
        mov eax D@Dst
    ..End_If
EndP
```

#### guga

Also, the other function used, BlendPixelPremult, is faster with simple x86 opcodes.

```
; called in BlendPixelRowPremult
; A + B*(1-x/255)
Proc BlendPixelPremult:
    Arguments @Src, @Dst
    Uses edx, ecx, esi

    mov eax D@Src
    mov edx D@Dst
    mov ecx eax | shr ecx 24 | movzx ecx cl | neg ecx | add ecx 256
    mov esi edx | and esi 0FF00FF | imul esi ecx | shr esi 8 | and esi 0FF00FF
    shr edx 8 | and edx 0FF00FF | imul edx ecx | and edx 0FF00FF00
    or esi edx
    add eax esi
EndP
```
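For anyone who wants to test the integer version outside the assembler, here is a line-by-line Python port of the listing above. The name `blend_pixel_premult` and the mask constant are illustrative, and the final mask only emulates the 32-bit wraparound of the `add`.

```python
MASK_RB = 0x00FF00FF  # selects bytes 0 and 2 of the pixel

def blend_pixel_premult(src, dst):
    """src + dst * (256 - src_alpha) / 256, two channels at a time."""
    scale = 256 - ((src >> 24) & 0xFF)
    low = (((dst & MASK_RB) * scale) >> 8) & MASK_RB       # bytes 0 and 2
    high = (((dst >> 8) & MASK_RB) * scale) & 0xFF00FF00   # bytes 1 and 3
    return (src + (low | high)) & 0xFFFFFFFF

print(hex(blend_pixel_premult(0x80000000, 0xFF808080)))
```

An opaque source returns the source unchanged, and a fully zero source returns the destination unchanged, which is a quick sanity check for any port of this routine.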