## BlendPixelNonPremult libwebpdemux

Started by guga, July 12, 2024, 01:44:04 PM

#### guga

Some considerations about the BlendPixelNonPremult function in the libwebpdemux library

The src is the front (top) image
The dst is the background (bottom) image

In the WebP library (in the demuxer), the function BlendPixelNonPremult is responsible for merging 2 different images, taking their alpha values into account. The main problem with this function is that it is a bit slow and leads to some inaccuracy.
The original function makes heavy use of the div instruction, and also makes 3 internal calls to another function, "BlendChannelNonPremult", which in turn uses an internal _aulldiv function (from ntdll or from the compiler's own library).
After porting the function to a mathematical equation, the result is as follows:

1) The general approach:

Blended = (256*(A*x+B*y)/(256*(x+y)-(x*y)))/2
Simplifying: 128*(A*x+B*y)/(256*(x+y)-(x*y))
where:
A = Pixel on the src_channel in the range [0, 255].
B = Pixel on the dst_channel in the range [0, 255].
x = Alpha on the src channel in the range [0, 255].
y = Alpha on the dest channel  in the range [0, 255].

The problem is that this form has a maximum result of about 254. With A, B, x, y = 255 it gives 254.0077821011673.

Since a pixel has a range of 0 to 255, we need to adapt the algorithm to the limits of a pixel.
In order to achieve a perfect blending (x,y, A, B and the Blending result are all inside 0 to 255), we should do:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y)) which is the same as:
= (A*x+B*y)/(2*(x+y-(x*y/255)))
where:
A = Pixel on the src_channel in the range [0, 255].
B = Pixel on the dst_channel in the range [0, 255].
x = Alpha on the src channel in the range [0, 255].
y = Alpha on the dest channel  in the range [0, 255].
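
A quick numeric check of the two forms above, as a minimal Python sketch. The function names `blend_256` and `blend_255` are made up for this illustration; they only evaluate the equations exactly as written in this post, in floating point.

```python
def blend_256(A, B, x, y):
    """First form: 128*(A*x + B*y) / (256*(x + y) - x*y)."""
    return 128 * (A * x + B * y) / (256 * (x + y) - x * y)


def blend_255(A, B, x, y):
    """Rescaled form: (255/2)*(A*x + B*y) / (255*(x + y) - x*y)."""
    return 127.5 * (A * x + B * y) / (255 * (x + y) - x * y)


def blend_255_alt(A, B, x, y):
    """Equivalent rewrite: (A*x + B*y) / (2*(x + y - x*y/255))."""
    return (A * x + B * y) / (2 * (x + y - x * y / 255))


# Both forms divide by zero when x = y = 0, so that input is not evaluated here.
print(blend_256(255, 255, 255, 255))  # about 254.0078, never reaches 255
print(blend_255(255, 255, 255, 255))  # 255.0
```

With all inputs at 255, the first form stops just above 254 while the rescaled form reaches 255, which is the reason for switching from 256 to 255.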

2) Specific cases:

a) Alpha in src and destination are the same:

When x = y, we have:

Blended = (255/2)*(A*x+B*x)/(255*(x+x)-(x*x))
Simplifying leads us to: (255*(A+B))/(2*(510-x))
= 127.5*(A+B)/(510-x)

This case is particularly interesting because it results in a perfect blending, especially for opaque images.
For example, when x = 255 the formula is simply (A+B)/2. Since alpha (x) ranges from 0 to 255, we can establish the minimum and maximum values, and also build a table to retrieve the blending factor faster when the alpha of both images is the same.

Minimum Value (alpha = 0) = (A+B)/4
Maximum Value (alpha = 255) = (A+B)/2
Ok, then we have some ratio multiplied by (A+B), leading us to:
Blended = k*(A+B), where
k = 127.5/(510-alpha)

So, we can have the following table to use for the values of k:
x = 0, k = 1/4 = 255/1020 ... 1020 = 255*4-0
x = 1, k = 255/1018 .... 1018 = (255*4-2)
x = 2, k = 255/1016 .... 1016 = (255*4-4)
x = 3, k = 85/338 = 255/1014.... 1014 = (255*4-6)
x = 4, k = 255/1012.... 1012 = (255*4-8)
...
x = 200, k = 51/124 = 255/620.... 620 = (255*4-400)
x = 227, k = 255/566.... 566 = (255*4-454)
...
x = 255, k = 1/2 = 255/510.... 510 = (255*4-510)

Ok, from the values above we conclude that
k = 255/(255*4-Alpha*2) = 127.5/(510-Alpha)

So, we can have a KTable such as:
k0 = 127.5/(510-0)
k1 = 127.5/(510-1)
k2 = 127.5/(510-2)
k3 = 127.5/(510-3)
...
k255 = 127.5/(510-255)

In this way, the blended pixel is given by:
Blended = k(n) * (A+B)
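
The table can be generated instead of typed by hand. Below is a small Python sketch, assuming only the k formula above; `K_TABLE` and `blend_equal_alpha` are hypothetical names used for this example.

```python
# k(n) = 127.5 / (510 - n) for every possible alpha value n in [0, 255].
K_TABLE = [127.5 / (510 - alpha) for alpha in range(256)]


def blend_equal_alpha(A, B, alpha):
    """Blend one channel when src and dst share the same alpha."""
    return K_TABLE[alpha] * (A + B)


print(K_TABLE[0])    # 0.25
print(K_TABLE[255])  # 0.5
print(blend_equal_alpha(100, 200, 255))  # 150.0, i.e. (A+B)/2
```

The lookup replaces the per-pixel division with a single multiplication, which is the whole point of the table.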

b) Alpha in src is zero (Transparent)

When alpha in source is zeroed, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*0+B*y)/(255*(0+y)-(0*y))
= B/2

So, if we use the equation, it implies that the resulting blended image is half of the destination.

But, in image processing, Alpha = 0 means the image is fully transparent. Therefore, despite the value given by the equation, all that will be shown is the destination image, since the source is fully transparent.

So, in such cases, the blended value is simply:

Blended = Dst. So, the full destination image is the result.

c) Alpha in destination is zero (Transparent)

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*x+B*0)/(255*(x+0)-(x*0))
= A/2

So, if we use the equation, it implies that the resulting blended image is half of the source.
But, in image processing, Alpha = 0 means the image is fully transparent. Therefore, despite the value given by the equation, all that will be shown is the source image, since the destination is fully transparent.

So, in such cases, the blended value is simply:

Blended = Src. So, the full source image is the result.

d) Alpha in src is 255 (Opaque)

When alpha in source is 255, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*255+B*y)/(255*(255+y)-(255*y))
= A/2 + B*y/510

So, if we use the equation, it implies that the resulting blended image is half of the source plus some ratio applied to the destination.

But, in image processing, Alpha = 255 means the image is fully opaque. Therefore, despite the value given by the equation, all that will be shown is the source image, since the source is fully opaque.

So, in such cases, the blended value is simply:

Blended = Src. So, the full resulting image is the src itself.

e) Alpha in dst is 255 (Opaque)

When alpha in destination is 255, we have the following mathematical result:

Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
= (255/2)*(A*x+B*255)/(255*(x+255)-(x*255))
= B/2 + A*x/510

So, if we use the equation, it implies that the resulting blended image is half of the destination plus some ratio applied to the source.

But, in image processing, Alpha = 255 means the image is fully opaque. Since what is opaque is the background (dst/bottom) image, it will be merged with the foreground image using this equation.
So, in such cases, the blended value is simply:

Blended = B/2 + A*x/510

This is the most common blending mode, especially for watermarks and other kinds of image superposition.
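
To see the cases a) to e) together, here is a per-channel Python sketch. `blend_channel` is a hypothetical name, and the order of the tests is an assumption: it follows the order used by the assembly routine posted later in this thread (src alpha 0 first, then dst alpha 0 or src alpha 255, then dst alpha 255). That order matters when two cases overlap, for example x = y = 255, where this sketch returns the source.

```python
def blend_channel(A, B, x, y):
    """Blend one channel. A/x = src value/alpha, B/y = dst value/alpha."""
    if x == 0:                # case b: source fully transparent
        return float(B)
    if y == 0 or x == 255:    # cases c and d: dst transparent or src opaque
        return float(A)
    if y == 255:              # case e: destination opaque
        return B / 2 + A * x / 510
    if x == y:                # case a: same alpha on both sides
        return 127.5 * (A + B) / (510 - x)
    # general case
    return 127.5 * (A * x + B * y) / (255 * (x + y) - x * y)


print(blend_channel(10, 200, 0, 128))    # 200.0 (destination wins)
print(blend_channel(100, 200, 51, 255))  # 110.0 (B/2 + A*x/510)
```

The results are floats. How they are truncated and packed back into a pixel is a separate step.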

The pixels must be in RGBA format
[RGBA:
RGBA.Red: B\$ 0
RGBA.Green: B\$ 0
RGBA.Blue: B\$ 0
RGBA.Alpha: B\$ 0]

I'm rebuilding (or trying to rebuild) the whole library in RosAsm, and this is the resulting function I've got so far.

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### guga

continuation...

Necessary data

[kBlendNonPremultTbl:
kBlendNonPremultTbl.Data0: F\$  0.25 ; (127.5/(510-0)
kBlendNonPremultTbl.Data1: F\$  0.25049115913556 ; (127.5/(510-1)
kBlendNonPremultTbl.Data2: F\$  0.2509842519685 ; (127.5/(510-2)
kBlendNonPremultTbl.Data3: F\$  0.25147928994083 ; (127.5/(510-3)
kBlendNonPremultTbl.Data4: F\$  0.25197628458498 ; (127.5/(510-4)
kBlendNonPremultTbl.Data5: F\$  0.25247524752475 ; (127.5/(510-5)
kBlendNonPremultTbl.Data6: F\$  0.25297619047619 ; (127.5/(510-6)
kBlendNonPremultTbl.Data7: F\$  0.25347912524851 ; (127.5/(510-7)
kBlendNonPremultTbl.Data8: F\$  0.25398406374502 ; (127.5/(510-8)
kBlendNonPremultTbl.Data9: F\$  0.25449101796407 ; (127.5/(510-9)
kBlendNonPremultTbl.Data10: F\$  0.255 ; (127.5/(510-10)
kBlendNonPremultTbl.Data11: F\$  0.25551102204409 ; (127.5/(510-11)
kBlendNonPremultTbl.Data12: F\$  0.25602409638554 ; (127.5/(510-12)
kBlendNonPremultTbl.Data13: F\$  0.25653923541248 ; (127.5/(510-13)
kBlendNonPremultTbl.Data14: F\$  0.2570564516129 ; (127.5/(510-14)
kBlendNonPremultTbl.Data15: F\$  0.25757575757576 ; (127.5/(510-15)
kBlendNonPremultTbl.Data16: F\$  0.2580971659919 ; (127.5/(510-16)
kBlendNonPremultTbl.Data17: F\$  0.25862068965517 ; (127.5/(510-17)
kBlendNonPremultTbl.Data18: F\$  0.25914634146342 ; (127.5/(510-18)
kBlendNonPremultTbl.Data19: F\$  0.25967413441955 ; (127.5/(510-19)
kBlendNonPremultTbl.Data20: F\$  0.26020408163265 ; (127.5/(510-20)
kBlendNonPremultTbl.Data21: F\$  0.26073619631902 ; (127.5/(510-21)
kBlendNonPremultTbl.Data22: F\$  0.26127049180328 ; (127.5/(510-22)
kBlendNonPremultTbl.Data23: F\$  0.26180698151951 ; (127.5/(510-23)
kBlendNonPremultTbl.Data24: F\$  0.26234567901235 ; (127.5/(510-24)
kBlendNonPremultTbl.Data25: F\$  0.26288659793814 ; (127.5/(510-25)
kBlendNonPremultTbl.Data26: F\$  0.26342975206612 ; (127.5/(510-26)
kBlendNonPremultTbl.Data27: F\$  0.2639751552795 ; (127.5/(510-27)
kBlendNonPremultTbl.Data28: F\$  0.26452282157676 ; (127.5/(510-28)
kBlendNonPremultTbl.Data29: F\$  0.26507276507277 ; (127.5/(510-29)
kBlendNonPremultTbl.Data30: F\$  0.265625 ; (127.5/(510-30)
kBlendNonPremultTbl.Data31: F\$  0.26617954070981 ; (127.5/(510-31)
kBlendNonPremultTbl.Data32: F\$  0.26673640167364 ; (127.5/(510-32)
kBlendNonPremultTbl.Data33: F\$  0.26729559748428 ; (127.5/(510-33)
kBlendNonPremultTbl.Data34: F\$  0.26785714285714 ; (127.5/(510-34)
kBlendNonPremultTbl.Data35: F\$  0.26842105263158 ; (127.5/(510-35)
kBlendNonPremultTbl.Data36: F\$  0.26898734177215 ; (127.5/(510-36)
kBlendNonPremultTbl.Data37: F\$  0.26955602536998 ; (127.5/(510-37)
kBlendNonPremultTbl.Data38: F\$  0.27012711864407 ; (127.5/(510-38)
kBlendNonPremultTbl.Data39: F\$  0.27070063694268 ; (127.5/(510-39)
kBlendNonPremultTbl.Data40: F\$  0.27127659574468 ; (127.5/(510-40)
kBlendNonPremultTbl.Data41: F\$  0.27185501066098 ; (127.5/(510-41)
kBlendNonPremultTbl.Data42: F\$  0.2724358974359 ; (127.5/(510-42)
kBlendNonPremultTbl.Data43: F\$  0.27301927194861 ; (127.5/(510-43)
kBlendNonPremultTbl.Data44: F\$  0.27360515021459 ; (127.5/(510-44)
kBlendNonPremultTbl.Data45: F\$  0.2741935483871 ; (127.5/(510-45)
kBlendNonPremultTbl.Data46: F\$  0.27478448275862 ; (127.5/(510-46)
kBlendNonPremultTbl.Data47: F\$  0.27537796976242 ; (127.5/(510-47)
kBlendNonPremultTbl.Data48: F\$  0.27597402597403 ; (127.5/(510-48)
kBlendNonPremultTbl.Data49: F\$  0.2765726681128 ; (127.5/(510-49)
kBlendNonPremultTbl.Data50: F\$  0.27717391304348 ; (127.5/(510-50)
kBlendNonPremultTbl.Data51: F\$  0.27777777777778 ; (127.5/(510-51)
kBlendNonPremultTbl.Data52: F\$  0.27838427947598 ; (127.5/(510-52)
kBlendNonPremultTbl.Data53: F\$  0.27899343544858 ; (127.5/(510-53)
kBlendNonPremultTbl.Data54: F\$  0.2796052631579 ; (127.5/(510-54)
kBlendNonPremultTbl.Data55: F\$  0.28021978021978 ; (127.5/(510-55)
kBlendNonPremultTbl.Data56: F\$  0.28083700440529 ; (127.5/(510-56)
kBlendNonPremultTbl.Data57: F\$  0.28145695364238 ; (127.5/(510-57)
kBlendNonPremultTbl.Data58: F\$  0.2820796460177 ; (127.5/(510-58)
kBlendNonPremultTbl.Data59: F\$  0.28270509977827 ; (127.5/(510-59)
kBlendNonPremultTbl.Data60: F\$  0.28333333333333 ; (127.5/(510-60)
kBlendNonPremultTbl.Data61: F\$  0.28396436525613 ; (127.5/(510-61)
kBlendNonPremultTbl.Data62: F\$  0.28459821428571 ; (127.5/(510-62)
kBlendNonPremultTbl.Data63: F\$  0.28523489932886 ; (127.5/(510-63)
kBlendNonPremultTbl.Data64: F\$  0.28587443946188 ; (127.5/(510-64)
kBlendNonPremultTbl.Data65: F\$  0.28651685393258 ; (127.5/(510-65)
kBlendNonPremultTbl.Data66: F\$  0.28716216216216 ; (127.5/(510-66)
kBlendNonPremultTbl.Data67: F\$  0.28781038374718 ; (127.5/(510-67)
kBlendNonPremultTbl.Data68: F\$  0.28846153846154 ; (127.5/(510-68)
kBlendNonPremultTbl.Data69: F\$  0.2891156462585 ; (127.5/(510-69)
kBlendNonPremultTbl.Data70: F\$  0.28977272727273 ; (127.5/(510-70)
kBlendNonPremultTbl.Data71: F\$  0.29043280182232 ; (127.5/(510-71)
kBlendNonPremultTbl.Data72: F\$  0.29109589041096 ; (127.5/(510-72)
kBlendNonPremultTbl.Data73: F\$  0.29176201372998 ; (127.5/(510-73)
kBlendNonPremultTbl.Data74: F\$  0.29243119266055 ; (127.5/(510-74)
kBlendNonPremultTbl.Data75: F\$  0.29310344827586 ; (127.5/(510-75)
kBlendNonPremultTbl.Data76: F\$  0.29377880184332 ; (127.5/(510-76)
kBlendNonPremultTbl.Data77: F\$  0.29445727482679 ; (127.5/(510-77)
kBlendNonPremultTbl.Data78: F\$  0.29513888888889 ; (127.5/(510-78)
kBlendNonPremultTbl.Data79: F\$  0.29582366589327 ; (127.5/(510-79)
kBlendNonPremultTbl.Data80: F\$  0.29651162790698 ; (127.5/(510-80)
kBlendNonPremultTbl.Data81: F\$  0.2972027972028 ; (127.5/(510-81)
kBlendNonPremultTbl.Data82: F\$  0.29789719626168 ; (127.5/(510-82)
kBlendNonPremultTbl.Data83: F\$  0.29859484777518 ; (127.5/(510-83)
kBlendNonPremultTbl.Data84: F\$  0.29929577464789 ; (127.5/(510-84)
kBlendNonPremultTbl.Data85: F\$  0.3 ; (127.5/(510-85)
kBlendNonPremultTbl.Data86: F\$  0.30070754716981 ; (127.5/(510-86)
kBlendNonPremultTbl.Data87: F\$  0.30141843971631 ; (127.5/(510-87)
kBlendNonPremultTbl.Data88: F\$  0.3021327014218 ; (127.5/(510-88)
kBlendNonPremultTbl.Data89: F\$  0.30285035629454 ; (127.5/(510-89)
kBlendNonPremultTbl.Data90: F\$  0.30357142857143 ; (127.5/(510-90)
kBlendNonPremultTbl.Data91: F\$  0.30429594272076 ; (127.5/(510-91)
kBlendNonPremultTbl.Data92: F\$  0.30502392344498 ; (127.5/(510-92)
kBlendNonPremultTbl.Data93: F\$  0.30575539568345 ; (127.5/(510-93)
kBlendNonPremultTbl.Data94: F\$  0.30649038461539 ; (127.5/(510-94)
kBlendNonPremultTbl.Data95: F\$  0.30722891566265 ; (127.5/(510-95)
kBlendNonPremultTbl.Data96: F\$  0.30797101449275 ; (127.5/(510-96)
kBlendNonPremultTbl.Data97: F\$  0.30871670702179 ; (127.5/(510-97)
kBlendNonPremultTbl.Data98: F\$  0.30946601941748 ; (127.5/(510-98)
kBlendNonPremultTbl.Data99: F\$  0.31021897810219 ; (127.5/(510-99)
kBlendNonPremultTbl.Data100: F\$  0.3109756097561 ; (127.5/(510-100)
kBlendNonPremultTbl.Data101: F\$  0.31173594132029 ; (127.5/(510-101)
kBlendNonPremultTbl.Data102: F\$  0.3125 ; (127.5/(510-102)
kBlendNonPremultTbl.Data103: F\$  0.31326781326781 ; (127.5/(510-103)
kBlendNonPremultTbl.Data104: F\$  0.314039408867 ; (127.5/(510-104)
kBlendNonPremultTbl.Data105: F\$  0.31481481481482 ; (127.5/(510-105)
kBlendNonPremultTbl.Data106: F\$  0.31559405940594 ; (127.5/(510-106)
kBlendNonPremultTbl.Data107: F\$  0.31637717121588 ; (127.5/(510-107)
kBlendNonPremultTbl.Data108: F\$  0.31716417910448 ; (127.5/(510-108)
kBlendNonPremultTbl.Data109: F\$  0.31795511221945 ; (127.5/(510-109)
kBlendNonPremultTbl.Data110: F\$  0.31875 ; (127.5/(510-110)
kBlendNonPremultTbl.Data111: F\$  0.31954887218045 ; (127.5/(510-111)
kBlendNonPremultTbl.Data112: F\$  0.32035175879397 ; (127.5/(510-112)
kBlendNonPremultTbl.Data113: F\$  0.32115869017632 ; (127.5/(510-113)
kBlendNonPremultTbl.Data114: F\$  0.3219696969697 ; (127.5/(510-114)
kBlendNonPremultTbl.Data115: F\$  0.32278481012658 ; (127.5/(510-115)
kBlendNonPremultTbl.Data116: F\$  0.32360406091371 ; (127.5/(510-116)
kBlendNonPremultTbl.Data117: F\$  0.32442748091603 ; (127.5/(510-117)
kBlendNonPremultTbl.Data118: F\$  0.32525510204082 ; (127.5/(510-118)
kBlendNonPremultTbl.Data119: F\$  0.32608695652174 ; (127.5/(510-119)
kBlendNonPremultTbl.Data120: F\$  0.32692307692308 ; (127.5/(510-120)
kBlendNonPremultTbl.Data121: F\$  0.32776349614396 ; (127.5/(510-121)
kBlendNonPremultTbl.Data122: F\$  0.32860824742268 ; (127.5/(510-122)
kBlendNonPremultTbl.Data123: F\$  0.32945736434109 ; (127.5/(510-123)
kBlendNonPremultTbl.Data124: F\$  0.33031088082902 ; (127.5/(510-124)
kBlendNonPremultTbl.Data125: F\$  0.33116883116883 ; (127.5/(510-125)
kBlendNonPremultTbl.Data126: F\$  0.33203125 ; (127.5/(510-126)
kBlendNonPremultTbl.Data127: F\$  0.33289817232376 ; (127.5/(510-127)
kBlendNonPremultTbl.Data128: F\$  0.33376963350785 ; (127.5/(510-128)
kBlendNonPremultTbl.Data129: F\$  0.33464566929134 ; (127.5/(510-129)
kBlendNonPremultTbl.Data130: F\$  0.33552631578947 ; (127.5/(510-130)
kBlendNonPremultTbl.Data131: F\$  0.33641160949868 ; (127.5/(510-131)
kBlendNonPremultTbl.Data132: F\$  0.33730158730159 ; (127.5/(510-132)
kBlendNonPremultTbl.Data133: F\$  0.33819628647215 ; (127.5/(510-133)
kBlendNonPremultTbl.Data134: F\$  0.33909574468085 ; (127.5/(510-134)
kBlendNonPremultTbl.Data135: F\$  0.34 ; (127.5/(510-135)
kBlendNonPremultTbl.Data136: F\$  0.34090909090909 ; (127.5/(510-136)
kBlendNonPremultTbl.Data137: F\$  0.34182305630027 ; (127.5/(510-137)
kBlendNonPremultTbl.Data138: F\$  0.34274193548387 ; (127.5/(510-138)
kBlendNonPremultTbl.Data139: F\$  0.34366576819407 ; (127.5/(510-139)
kBlendNonPremultTbl.Data140: F\$  0.3445945945946 ; (127.5/(510-140)
kBlendNonPremultTbl.Data141: F\$  0.34552845528455 ; (127.5/(510-141)
kBlendNonPremultTbl.Data142: F\$  0.34646739130435 ; (127.5/(510-142)
kBlendNonPremultTbl.Data143: F\$  0.34741144414169 ; (127.5/(510-143)
kBlendNonPremultTbl.Data144: F\$  0.34836065573771 ; (127.5/(510-144)
kBlendNonPremultTbl.Data145: F\$  0.34931506849315 ; (127.5/(510-145)
kBlendNonPremultTbl.Data146: F\$  0.35027472527473 ; (127.5/(510-146)
kBlendNonPremultTbl.Data147: F\$  0.35123966942149 ; (127.5/(510-147)
kBlendNonPremultTbl.Data148: F\$  0.35220994475138 ; (127.5/(510-148)
kBlendNonPremultTbl.Data149: F\$  0.35318559556787 ; (127.5/(510-149)
kBlendNonPremultTbl.Data150: F\$  0.35416666666667 ; (127.5/(510-150)
kBlendNonPremultTbl.Data151: F\$  0.35515320334262 ; (127.5/(510-151)
kBlendNonPremultTbl.Data152: F\$  0.35614525139665 ; (127.5/(510-152)
kBlendNonPremultTbl.Data153: F\$  0.35714285714286 ; (127.5/(510-153)
kBlendNonPremultTbl.Data154: F\$  0.35814606741573 ; (127.5/(510-154)
kBlendNonPremultTbl.Data155: F\$  0.35915492957747 ; (127.5/(510-155)
kBlendNonPremultTbl.Data156: F\$  0.36016949152542 ; (127.5/(510-156)
kBlendNonPremultTbl.Data157: F\$  0.36118980169972 ; (127.5/(510-157)
kBlendNonPremultTbl.Data158: F\$  0.36221590909091 ; (127.5/(510-158)
kBlendNonPremultTbl.Data159: F\$  0.36324786324786 ; (127.5/(510-159)
kBlendNonPremultTbl.Data160: F\$  0.36428571428571 ; (127.5/(510-160)
kBlendNonPremultTbl.Data161: F\$  0.36532951289398 ; (127.5/(510-161)
kBlendNonPremultTbl.Data162: F\$  0.36637931034483 ; (127.5/(510-162)
kBlendNonPremultTbl.Data163: F\$  0.36743515850144 ; (127.5/(510-163)
kBlendNonPremultTbl.Data164: F\$  0.36849710982659 ; (127.5/(510-164)
kBlendNonPremultTbl.Data165: F\$  0.3695652173913 ; (127.5/(510-165)
kBlendNonPremultTbl.Data166: F\$  0.37063953488372 ; (127.5/(510-166)
kBlendNonPremultTbl.Data167: F\$  0.37172011661808 ; (127.5/(510-167)
kBlendNonPremultTbl.Data168: F\$  0.37280701754386 ; (127.5/(510-168)
kBlendNonPremultTbl.Data169: F\$  0.37390029325513 ; (127.5/(510-169)
kBlendNonPremultTbl.Data170: F\$  0.375 ; (127.5/(510-170)
kBlendNonPremultTbl.Data171: F\$  0.37610619469027 ; (127.5/(510-171)
kBlendNonPremultTbl.Data172: F\$  0.37721893491124 ; (127.5/(510-172)
kBlendNonPremultTbl.Data173: F\$  0.37833827893175 ; (127.5/(510-173)
kBlendNonPremultTbl.Data174: F\$  0.37946428571429 ; (127.5/(510-174)
kBlendNonPremultTbl.Data175: F\$  0.38059701492537 ; (127.5/(510-175)
kBlendNonPremultTbl.Data176: F\$  0.38173652694611 ; (127.5/(510-176)
kBlendNonPremultTbl.Data177: F\$  0.38288288288288 ; (127.5/(510-177)
kBlendNonPremultTbl.Data178: F\$  0.38403614457831 ; (127.5/(510-178)
kBlendNonPremultTbl.Data179: F\$  0.38519637462236 ; (127.5/(510-179)
kBlendNonPremultTbl.Data180: F\$  0.38636363636364 ; (127.5/(510-180)
kBlendNonPremultTbl.Data181: F\$  0.38753799392097 ; (127.5/(510-181)
kBlendNonPremultTbl.Data182: F\$  0.38871951219512 ; (127.5/(510-182)
kBlendNonPremultTbl.Data183: F\$  0.38990825688073 ; (127.5/(510-183)
kBlendNonPremultTbl.Data184: F\$  0.39110429447853 ; (127.5/(510-184)
kBlendNonPremultTbl.Data185: F\$  0.39230769230769 ; (127.5/(510-185)
kBlendNonPremultTbl.Data186: F\$  0.39351851851852 ; (127.5/(510-186)
kBlendNonPremultTbl.Data187: F\$  0.39473684210526 ; (127.5/(510-187)
kBlendNonPremultTbl.Data188: F\$  0.39596273291926 ; (127.5/(510-188)
kBlendNonPremultTbl.Data189: F\$  0.39719626168224 ; (127.5/(510-189)
kBlendNonPremultTbl.Data190: F\$  0.3984375 ; (127.5/(510-190)
kBlendNonPremultTbl.Data191: F\$  0.39968652037618 ; (127.5/(510-191)
kBlendNonPremultTbl.Data192: F\$  0.40094339622642 ; (127.5/(510-192)
kBlendNonPremultTbl.Data193: F\$  0.40220820189274 ; (127.5/(510-193)
kBlendNonPremultTbl.Data194: F\$  0.40348101265823 ; (127.5/(510-194)
kBlendNonPremultTbl.Data195: F\$  0.40476190476191 ; (127.5/(510-195)
kBlendNonPremultTbl.Data196: F\$  0.40605095541401 ; (127.5/(510-196)
kBlendNonPremultTbl.Data197: F\$  0.4073482428115 ; (127.5/(510-197)
kBlendNonPremultTbl.Data198: F\$  0.40865384615385 ; (127.5/(510-198)
kBlendNonPremultTbl.Data199: F\$  0.40996784565916 ; (127.5/(510-199)
kBlendNonPremultTbl.Data200: F\$  0.41129032258065 ; (127.5/(510-200)
kBlendNonPremultTbl.Data201: F\$  0.4126213592233 ; (127.5/(510-201)
kBlendNonPremultTbl.Data202: F\$  0.41396103896104 ; (127.5/(510-202)
kBlendNonPremultTbl.Data203: F\$  0.41530944625407 ; (127.5/(510-203)
kBlendNonPremultTbl.Data204: F\$  0.41666666666667 ; (127.5/(510-204)
kBlendNonPremultTbl.Data205: F\$  0.41803278688525 ; (127.5/(510-205)
kBlendNonPremultTbl.Data206: F\$  0.41940789473684 ; (127.5/(510-206)
kBlendNonPremultTbl.Data207: F\$  0.42079207920792 ; (127.5/(510-207)
kBlendNonPremultTbl.Data208: F\$  0.42218543046358 ; (127.5/(510-208)
kBlendNonPremultTbl.Data209: F\$  0.42358803986711 ; (127.5/(510-209)
kBlendNonPremultTbl.Data210: F\$  0.425 ; (127.5/(510-210)
kBlendNonPremultTbl.Data211: F\$  0.42642140468227 ; (127.5/(510-211)
kBlendNonPremultTbl.Data212: F\$  0.42785234899329 ; (127.5/(510-212)
kBlendNonPremultTbl.Data213: F\$  0.42929292929293 ; (127.5/(510-213)
kBlendNonPremultTbl.Data214: F\$  0.43074324324324 ; (127.5/(510-214)
kBlendNonPremultTbl.Data215: F\$  0.43220338983051 ; (127.5/(510-215)
kBlendNonPremultTbl.Data216: F\$  0.43367346938776 ; (127.5/(510-216)
kBlendNonPremultTbl.Data217: F\$  0.43515358361775 ; (127.5/(510-217)
kBlendNonPremultTbl.Data218: F\$  0.43664383561644 ; (127.5/(510-218)
kBlendNonPremultTbl.Data219: F\$  0.43814432989691 ; (127.5/(510-219)
kBlendNonPremultTbl.Data220: F\$  0.43965517241379 ; (127.5/(510-220)
kBlendNonPremultTbl.Data221: F\$  0.44117647058824 ; (127.5/(510-221)
kBlendNonPremultTbl.Data222: F\$  0.44270833333333 ; (127.5/(510-222)
kBlendNonPremultTbl.Data223: F\$  0.44425087108014 ; (127.5/(510-223)
kBlendNonPremultTbl.Data224: F\$  0.4458041958042 ; (127.5/(510-224)
kBlendNonPremultTbl.Data225: F\$  0.44736842105263 ; (127.5/(510-225)
kBlendNonPremultTbl.Data226: F\$  0.44894366197183 ; (127.5/(510-226)
kBlendNonPremultTbl.Data227: F\$  0.45053003533569 ; (127.5/(510-227)
kBlendNonPremultTbl.Data228: F\$  0.45212765957447 ; (127.5/(510-228)
kBlendNonPremultTbl.Data229: F\$  0.45373665480427 ; (127.5/(510-229)
kBlendNonPremultTbl.Data230: F\$  0.45535714285714 ; (127.5/(510-230)
kBlendNonPremultTbl.Data231: F\$  0.45698924731183 ; (127.5/(510-231)
kBlendNonPremultTbl.Data232: F\$  0.45863309352518 ; (127.5/(510-232)
kBlendNonPremultTbl.Data233: F\$  0.46028880866426 ; (127.5/(510-233)
kBlendNonPremultTbl.Data234: F\$  0.46195652173913 ; (127.5/(510-234)
kBlendNonPremultTbl.Data235: F\$  0.46363636363636 ; (127.5/(510-235)
kBlendNonPremultTbl.Data236: F\$  0.46532846715329 ; (127.5/(510-236)
kBlendNonPremultTbl.Data237: F\$  0.46703296703297 ; (127.5/(510-237)
kBlendNonPremultTbl.Data238: F\$  0.46875 ; (127.5/(510-238)
kBlendNonPremultTbl.Data239: F\$  0.47047970479705 ; (127.5/(510-239)
kBlendNonPremultTbl.Data240: F\$  0.47222222222222 ; (127.5/(510-240)
kBlendNonPremultTbl.Data241: F\$  0.47397769516729 ; (127.5/(510-241)
kBlendNonPremultTbl.Data242: F\$  0.47574626865672 ; (127.5/(510-242)
kBlendNonPremultTbl.Data243: F\$  0.47752808988764 ; (127.5/(510-243)
kBlendNonPremultTbl.Data244: F\$  0.47932330827068 ; (127.5/(510-244)
kBlendNonPremultTbl.Data245: F\$  0.4811320754717 ; (127.5/(510-245)
kBlendNonPremultTbl.Data246: F\$  0.48295454545455 ; (127.5/(510-246)
kBlendNonPremultTbl.Data247: F\$  0.48479087452472 ; (127.5/(510-247)
kBlendNonPremultTbl.Data248: F\$  0.48664122137405 ; (127.5/(510-248)
kBlendNonPremultTbl.Data249: F\$  0.48850574712644 ; (127.5/(510-249)
kBlendNonPremultTbl.Data250: F\$  0.49038461538462 ; (127.5/(510-250)
kBlendNonPremultTbl.Data251: F\$  0.49227799227799 ; (127.5/(510-251)
kBlendNonPremultTbl.Data252: F\$  0.49418604651163 ; (127.5/(510-252)
kBlendNonPremultTbl.Data253: F\$  0.49610894941634 ; (127.5/(510-253)
kBlendNonPremultTbl.Data254: F\$  0.498046875 ; (127.5/(510-254)
kBlendNonPremultTbl.Data255: F\$  0.5] ; (127.5/(510-255)

[<16 Float_Half4: F\$ 0.5, 0.5, 0.5, 0.5]
[<16 Float_510_Inv4: F\$ (1/510.0), F\$ (1/510.0), F\$ (1/510.0), F\$ (1/510.0)]
[<16 Float255: F\$ 255]
[<16 Float_HalfPixel: F\$ 127.5]

SSE2 macros used:

[SSE2_CONV4BYTE_TO_4FLOAT | punpcklbw #1 #1 | punpcklwd #1 #1 | psrld #1 24 | cvtdq2ps #1 #1]

; using values from 0 to 3
[SHUFFLE | (255 - ((#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4))]

; using values from 3 to 0
[SHUFFLE_INV | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Sieekmanski
[SSE_INVERT_DWORDS 27]      ; invert the order of dwords
[SSE2_CONV_BYTE_TO_4BYTE | punpcklbw #1 #1 | punpcklwd #1 #1]
[SSE2_EXTRACT_FIRST_BYTES | movdqu xmm7 #1 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7]

[pshufd | pshufd #1 #2 #3]
[shufps | shufps #1 #2 #3]
[shufpd | shufpd #1 #2 #3]
[pshuflw | pshuflw #1 #2 #3]
[pshufhw | pshufhw #1 #2 #3]
[SSE_CONV_4FLOAT_TO_4INT | cvttps2dq #1 #1]
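
For readers who do not use RosAsm macros, the two SHUFFLE macros only compute the 8-bit immediate for pshufd/shufps. The Python sketch below mirrors the macro arithmetic and emulates pshufd on a list of 4 lanes (lane 0 first). The names `shuffle`, `shuffle_inv` and `pshufd` are chosen for this example only.

```python
def shuffle(a, b, c, d):
    """SHUFFLE macro: each argument v selects source lane 3 - v."""
    return 255 - ((a << 6) | (b << 4) | (c << 2) | d)


def shuffle_inv(a, b, c, d):
    """SHUFFLE_INV macro: each argument is the source lane itself."""
    return (a << 6) | (b << 4) | (c << 2) | d


def pshufd(src, imm8):
    """Emulate PSHUFD: result lane i = src lane given by bits 2i..2i+1."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


lanes = [10, 20, 30, 40]                    # lane 0 .. lane 3 (lane 3 = alpha)
print(pshufd(lanes, shuffle(0, 0, 0, 0)))   # [40, 40, 40, 40] broadcast alpha
print(pshufd(lanes, shuffle(0, 3, 3, 3)))   # [10, 10, 10, 40]
print(pshufd(lanes, 27))                    # [40, 30, 20, 10] SSE_INVERT_DWORDS
```

So `{SHUFFLE 0,0,0,0}` gives the immediate 255 (copy lane 3 everywhere), and `{SHUFFLE 0,3,3,3}` gives 192 (copy lane 0 into the three low lanes and keep lane 3).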

#### guga

continuation...

Function:

Proc BlendPixelNonPremult:
Arguments @Src, @Dst
Local @Src_A
Uses edx, ecx

xorps xmm5 xmm5
mov eax D@Src | shr eax 24 | movzx eax al | mov D@Src_A eax ; x = Alpha of Source
..If eax <> 0

mov edx D@Dst | shr edx 24 | movzx edx dl ; Alpha of Destination A

.If_Or dl = 0, al = 255 ; alpha in destination = 0 or alpha in src = 255

mov eax D@Src

.Else_If dl = 255 ; Alpha in dst is 255 (Opaque)

; calculate B/2 + A*x/510

movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
; get alpha from src
pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
mulps xmm6 X\$Float_510_Inv4
mulps xmm0 xmm6 ; A*x/510
mulps xmm1 X\$Float_Half4 ; B/2
addps xmm0 xmm1 ; B/2 + A*x/510
SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0 ; convert them to integer
SSE2_EXTRACT_FIRST_BYTES xmm0   ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
movd eax xmm0
; now we need to place the original value of Src Alpha back on its place
mov edx D@Src
shr edx 24 | shl edx 24
shl eax 8 | shr eax 8
or eax edx

.Else_If al <> dl

movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
; get alpha from src and dst
pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
pshufd xmm7 xmm1 {SHUFFLE 0,0,0,0} ; xmm7 = alpha from dst = y

; Let's do (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))

; calculate the denominator = (255*(x+y)-(x*y))
addss xmm6 xmm7 | mulss xmm6 X\$Float255
pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
; now do x*y, but put the result in xmm7
mulss xmm7 xmm6
; invert again
pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
subss xmm6 xmm7
movss xmm5 X\$Float_HalfPixel | divss xmm5 xmm6 ; now we have as a multiplicand: (255/2)/(255*(x+y)-(x*y))

pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; the 3 color floats get the same multiplier. The alpha lane stays 0, since movss from memory clears the upper lanes

; restore the original alpha values
pshufd xmm6 xmm6 {SHUFFLE 1,1,1,1}
pshufd xmm7 xmm7 {SHUFFLE 1,1,1,1}

; calculate (A*x+B*y) = The rest of our numerator
mulps xmm0 xmm6 ; A*x
mulps xmm1 xmm7 ; B*y
addps xmm0 xmm1 ; A*x+B*y

; multiply by the factor held in xmm5
mulps xmm0 xmm5

; and finally, convert them to integer
SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
SSE2_EXTRACT_FIRST_BYTES xmm0
movd eax xmm0
; now we need to place the original value of Src Alpha back on its place
mov edx D@Src_A
shl edx 24
or eax edx
.Else
; src alpha = dst alpha
movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0
movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1
movss xmm5 X\$kBlendNonPremultTbl+eax*4
pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; the 3 color floats get the same multiplier. The alpha lane stays 0, since movss from memory clears the upper lanes
; calculate (A+B)*k
addps xmm0 xmm1 ; A+B
mulps xmm0 xmm5 ; (A+B)*k

; and finally, convert them to integer
SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
SSE2_EXTRACT_FIRST_BYTES xmm0
movd eax xmm0

; now we need to place the original value of Src Alpha back on its place
shl edx 24
or eax edx

.End_If
..Else
; alpha in src = 0
mov eax D@Dst
..End_If

EndP
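
To make the data movement in the routine easier to follow, here is a Python sketch of the unpack / truncate / repack steps, applied to the equal-alpha branch only. It assumes the pixel is a 32-bit value with Red in the low byte and Alpha in the high byte, as in the RGBA structure shown earlier. The helper names are hypothetical, and floating point here is Python's double precision rather than the single precision used by the SSE code.

```python
def bytes_to_lanes(pixel):
    """Like SSE2_CONV4BYTE_TO_4FLOAT: one float per byte, lane 0 = low byte."""
    return [float((pixel >> (8 * i)) & 0xFF) for i in range(4)]


def lanes_to_pixel(lanes):
    """Like cvttps2dq + SSE2_EXTRACT_FIRST_BYTES: truncate, then pack low bytes."""
    out = 0
    for i, value in enumerate(lanes):
        out |= (int(value) & 0xFF) << (8 * i)
    return out


def blend_pixel_equal_alpha(src, dst):
    """Equal-alpha branch: k*(A+B) on the 3 color lanes, alpha put back after."""
    alpha = (src >> 24) & 0xFF
    k = 127.5 / (510 - alpha)
    s, d = bytes_to_lanes(src), bytes_to_lanes(dst)
    color = [(s[i] + d[i]) * k for i in range(3)] + [0.0]  # alpha lane = 0
    return lanes_to_pixel(color) | (alpha << 24)


print(hex(blend_pixel_equal_alpha(0x80FF8040, 0x80004020)))  # 0x80554020
```

The alpha lane is multiplied by 0 on purpose, so the final `or` with the shifted alpha cannot collide with leftover bits.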

#### guga

Also, the other function used, BlendPixelPremult, is faster with simple x86 opcodes.

; called in BlendPixelRowPremult
; A + B*(1-x/255)
Proc BlendPixelPremult:
Arguments @Src, @Dst
Uses edx, ecx, esi

mov eax D@Src
mov edx D@Dst
mov ecx eax | shr ecx 24 | movzx ecx cl | neg ecx | add ecx 256
mov esi edx | and esi 0FF00FF | imul esi ecx | shr esi 8 | and esi 0FF00FF
shr edx 8 | and edx 0FF00FF | imul edx ecx | and edx 0FF00FF00
or esi edx