BlendPixelNonPremult libwebpdemux

Started by guga, July 12, 2024, 01:44:04 PM


guga

Some considerations about the BlendPixelNonPremult function in the libwebpdemux library

    The src is the front (top) image
    The dst is the background (bottom) image


    In the WebP library (in the demuxer), the function BlendPixelNonPremult merges two images, taking their alpha values into account. The main problems with this function are that it is a bit slow and that it leads to some inaccuracy.
    The original function makes heavy use of the div instruction and also makes 3 internal calls to another function, "BlendChannelNonPremult", which in turn uses an internal _aulldiv function (from ntdll or from the compiler's own runtime library).
    After porting the function to a mathematical equation, the result is as follows:

    1) The general approach:
   
        Blended = (256*(A*x+B*y)/(256*(x+y)-(x*y)))/2
        Simplifying: 128*(A*x+B*y)/(256*(x+y)-(x*y))
        where:
            A = Pixel on the src_channel in the range [0, 255].
            B = Pixel on the dst_channel in the range [0, 255].
            x = Alpha on the src channel in the range [0, 255].
            y = Alpha on the dest channel  in the range [0, 255].

    The problem with this is that its maximum result is 254. With A, B, x, y = 255 we get: 254.0077821011673
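
    To check that ceiling numerically, here is a minimal Python sketch of the 256-based formula above. The function name blend_256 is only illustrative (it is not part of libwebp or of the RosAsm port), and it works on plain floats rather than on the integer arithmetic of the original C code.

```python
def blend_256(a, b, x, y):
    """256-based formula: 128*(A*x + B*y) / (256*(x + y) - x*y)."""
    denom = 256 * (x + y) - x * y
    return 128 * (a * x + b * y) / denom


# Every input at its maximum still stays below 255 (about 254.0078).
peak = blend_256(255, 255, 255, 255)
print(peak)
```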
   
    Since a pixel has a range of 0 to 255, we need to adapt the algorithm to the limits of a pixel.
    To achieve a perfect blend (x, y, A, B and the blended result all inside 0 to 255), we should use:

    Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y)) which is the same as:
            = (A*x+B*y)/(2*(x+y-(x*y/255)))
        where:
            A = Pixel on the src_channel in the range [0, 255].
            B = Pixel on the dst_channel in the range [0, 255].
            x = Alpha on the src channel in the range [0, 255].
            y = Alpha on the dest channel  in the range [0, 255].   
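
    As a rough check of the 255-based formula and its simplified form, the Python sketch below evaluates both. The names blend_255 and blend_255_alt are made up for this illustration. Note that both forms divide by zero when x = y = 0, so that combination has to be handled separately.

```python
def blend_255(a, b, x, y):
    """(255/2)*(A*x + B*y) / (255*(x + y) - x*y); undefined for x == y == 0."""
    return 127.5 * (a * x + b * y) / (255 * (x + y) - x * y)


def blend_255_alt(a, b, x, y):
    """Simplified form: (A*x + B*y) / (2*(x + y - x*y/255))."""
    return (a * x + b * y) / (2 * (x + y - x * y / 255))


# With every input at 255, the result now reaches the top of the pixel range.
top = blend_255(255, 255, 255, 255)
print(top)
```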

    2) Specific cases:
   
        a) Alpha in src and destination are the same:
           
            When x = y, we have:
           
                Blended = (255/2)*(A*x+B*x)/(255*(x+x)-(x*x))
                Simplifying lead us to: (255 *(A + B))/(2 *(510-x))
                                        127.5*(A+B)/(510-x)
       
            This is particularly interesting because it results in a perfect blend, especially for opaque images.
            For example, when x = 255 the formula is simply (A+B)/2. Since Alpha (x) ranges from 0 to 255, we can establish the minimum and maximum values and also build a table to retrieve the blending factor faster when the alpha of both images is the same.
           
            Minimum Value (alpha = 0) = (A+B)/4
            Maximum Value (alpha = 255) = (A+B)/2
            Ok, then we have some ratio multiplied by (A+B), leading us to:
            Blended = k*(A+B), where
                      k = 127.5/(510-alpha)
       
            So, we can have the following table to use for the values of k:
            x = 0, k = 1/4 = 255/1020 ... 1020 = 255*4-0
            x = 1, k = 255/1018 .... 1018 = (255*4-2)
            x = 2, k = 255/1016 .... 1016 = (255*4-4)
            x = 3, k = 85/338 = 255/1014.... 1014 = (255*4-6)
            x = 4, k = 255/1012.... 1012 = (255*4-8)
            ...
            x = 200, k = 51/124 = 255/620.... 620 = (255*4-400)
            x = 227, k = 255/566.... 566 = (255*4-454)
            ...
            x = 255, k = 1/2 = 255/510.... 510 = (255*4-510)
           
            Ok, from the values above we conclude that
            k = 255/(255*4-Alpha*2) = 127.5/(510-Alpha)
           
            So, we can have a KTable such as:
            k0 = 127.5/(510-0)
            k1 = 127.5/(510-1)
            k2 = 127.5/(510-2)
            k3 = 127.5/(510-3)
            ...
            k255 = 127.5/(510-255)
       
            This way, the blended pixel is given by:
            Blended = k(n) * (A+B)
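
            The table can be generated rather than typed by hand. The Python sketch below builds it from k = 127.5/(510-alpha) and compares the lookup against the full formula for equal alphas. K_TABLE and blend_equal_alpha are names chosen for this example only.

```python
# One factor per alpha value: k(alpha) = 127.5 / (510 - alpha)
K_TABLE = [127.5 / (510 - alpha) for alpha in range(256)]


def blend_equal_alpha(a, b, alpha):
    """Blend one channel when src and dst share the same alpha."""
    return K_TABLE[alpha] * (a + b)


def blend_general(a, b, x, y):
    """Full 255-based formula, used here only as a cross-check."""
    return 127.5 * (a * x + b * y) / (255 * (x + y) - x * y)


# Largest difference between the table lookup and the full formula.
worst = max(
    abs(blend_equal_alpha(200, 40, n) - blend_general(200, 40, n, n))
    for n in range(1, 256)
)
print(worst)
```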
       
        b) Alpha in src is zero (Transparent)
       
            When alpha in source is zeroed, we have the following mathematical result:
           
            Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
                    = (255/2)*(A*0+B*y)/(255*(0+y)-(0*y))
                    = B/2
       
            So, if we use the equation, it implies that the resulting blended pixel is half of the destination.

            But, in image processing, Alpha = 0 means the image is fully transparent. Therefore, despite the value given by the equation, all that will be shown is the destination image, since the source is fully transparent.

            So, in such cases, the blended value is simply:

            Blended = Dst. So, the full destination image is the result.
       

        c) Alpha in destination is zero (Transparent)

            Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
                    = (255/2)*(A*x+B*0)/(255*(x+0)-(x*0))
                    = A/2
       
            So, if we use the equation, it implies that the resulting blended pixel is half of the source.
            But, in image processing, Alpha = 0 means the image is fully transparent. Therefore, despite the value given by the equation, all that will be shown is the source image, since the destination is fully transparent.

            So, in such cases, the blended value is simply:

            Blended = Src. So, the full source image is the result.
       
        d) Alpha in src is 255 (Opaque)
       
            When alpha in source is 255, we have the following mathematical result:
           
            Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
                    = (255/2)*(A*255+B*y)/(255*(255+y)-(255*y))
                    = A/2 + B*y/510
       
            So, if we use the equation, it implies that the resulting blended pixel is half of the source plus some ratio applied to the destination.

            But, in image processing, Alpha = 255 means the image is fully opaque. Therefore, despite the value given by the equation, all that will be shown is the source image, since the source is fully opaque.

            So, in such cases, the blended value is simply:

            Blended = Src. So, the full resulting image is the src itself.

        e) Alpha in dst is 255 (Opaque)
       
            When alpha in destination is 255, we have the following mathematical result:
           
            Blended = (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))
                    = (255/2)*(A*x+B*255)/(255*(x+255)-(x*255))
                    = B/2 + A*x/510
       
            So, if we use the equation, it implies that the resulting blended pixel is half of the destination plus some ratio applied to the source.

            But, in image processing, Alpha = 255 means the image is fully opaque. Since what is opaque is the background (dst/bottom) image, it is merged with the foreground image using this equation.
            So, in such cases, the blended value is simply:

            Blended = B/2 + A*x/510

            This is the most common blending mode, especially for watermarks and other kinds of image overlay.
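
The special cases a) to e) can be put together as one per-channel function. The Python sketch below is only a model of the case analysis above, not the libwebp code. The name blend_channel is hypothetical, and truncating with int() is an assumption about how the float result is turned back into a pixel value.

```python
def blend_channel(a, b, x, y):
    """Blend one color channel. a, x = src value and alpha; b, y = dst value and alpha."""
    if x == 0:                 # case b: source fully transparent
        return b
    if y == 0 or x == 255:     # cases c and d: dst transparent or src opaque
        return a
    if y == 255:               # case e: destination opaque
        return int(b / 2 + a * x / 510)
    if x == y:                 # case a: equal alphas, table factor k
        return int(127.5 / (510 - x) * (a + b))
    # general case
    return int(127.5 * (a * x + b * y) / (255 * (x + y) - x * y))


# Example: a semi-transparent source channel over an opaque destination channel.
print(blend_channel(100, 200, 51, 255))
```

The order of the tests matters: the transparent and opaque shortcuts are checked first, so the general formula never sees x = y = 0.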


The pixels must be in RGBA format
[RGBA:
 RGBA.Red: B$ 0
 RGBA.Green: B$ 0
 RGBA.Blue: B$ 0
 RGBA.Alpha: B$ 0]
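
To make the byte layout concrete: with Red as the first byte in memory, a little-endian dword load puts Red in the lowest 8 bits and Alpha in bits 24 to 31. The Python helpers below (unpack_rgba and pack_rgba, names invented for this sketch) show that layout.

```python
def unpack_rgba(pixel):
    """Split a 32-bit pixel into (red, green, blue, alpha); red is the lowest byte."""
    return (
        pixel & 0xFF,
        (pixel >> 8) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 24) & 0xFF,
    )


def pack_rgba(red, green, blue, alpha):
    """Rebuild the 32-bit pixel from its four byte-sized channels."""
    return red | (green << 8) | (blue << 16) | (alpha << 24)


channels = unpack_rgba(0x80402010)
print(channels)
```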

I'm rebuilding (or trying to rebuild) the whole library in RosAsm, and this is the resulting function I've got so far.


Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

continuation...

Necessary data

[kBlendNonPremultTbl:
 kBlendNonPremultTbl.Data0: F$  0.25 ; (127.5/(510-0)
 kBlendNonPremultTbl.Data1: F$  0.25049115913556 ; (127.5/(510-1)
 kBlendNonPremultTbl.Data2: F$  0.2509842519685 ; (127.5/(510-2)
 kBlendNonPremultTbl.Data3: F$  0.25147928994083 ; (127.5/(510-3)
 kBlendNonPremultTbl.Data4: F$  0.25197628458498 ; (127.5/(510-4)
 kBlendNonPremultTbl.Data5: F$  0.25247524752475 ; (127.5/(510-5)
 kBlendNonPremultTbl.Data6: F$  0.25297619047619 ; (127.5/(510-6)
 kBlendNonPremultTbl.Data7: F$  0.25347912524851 ; (127.5/(510-7)
 kBlendNonPremultTbl.Data8: F$  0.25398406374502 ; (127.5/(510-8)
 kBlendNonPremultTbl.Data9: F$  0.25449101796407 ; (127.5/(510-9)
 kBlendNonPremultTbl.Data10: F$  0.255 ; (127.5/(510-10)
 kBlendNonPremultTbl.Data11: F$  0.25551102204409 ; (127.5/(510-11)
 kBlendNonPremultTbl.Data12: F$  0.25602409638554 ; (127.5/(510-12)
 kBlendNonPremultTbl.Data13: F$  0.25653923541248 ; (127.5/(510-13)
 kBlendNonPremultTbl.Data14: F$  0.2570564516129 ; (127.5/(510-14)
 kBlendNonPremultTbl.Data15: F$  0.25757575757576 ; (127.5/(510-15)
 kBlendNonPremultTbl.Data16: F$  0.2580971659919 ; (127.5/(510-16)
 kBlendNonPremultTbl.Data17: F$  0.25862068965517 ; (127.5/(510-17)
 kBlendNonPremultTbl.Data18: F$  0.25914634146342 ; (127.5/(510-18)
 kBlendNonPremultTbl.Data19: F$  0.25967413441955 ; (127.5/(510-19)
 kBlendNonPremultTbl.Data20: F$  0.26020408163265 ; (127.5/(510-20)
 kBlendNonPremultTbl.Data21: F$  0.26073619631902 ; (127.5/(510-21)
 kBlendNonPremultTbl.Data22: F$  0.26127049180328 ; (127.5/(510-22)
 kBlendNonPremultTbl.Data23: F$  0.26180698151951 ; (127.5/(510-23)
 kBlendNonPremultTbl.Data24: F$  0.26234567901235 ; (127.5/(510-24)
 kBlendNonPremultTbl.Data25: F$  0.26288659793814 ; (127.5/(510-25)
 kBlendNonPremultTbl.Data26: F$  0.26342975206612 ; (127.5/(510-26)
 kBlendNonPremultTbl.Data27: F$  0.2639751552795 ; (127.5/(510-27)
 kBlendNonPremultTbl.Data28: F$  0.26452282157676 ; (127.5/(510-28)
 kBlendNonPremultTbl.Data29: F$  0.26507276507277 ; (127.5/(510-29)
 kBlendNonPremultTbl.Data30: F$  0.265625 ; (127.5/(510-30)
 kBlendNonPremultTbl.Data31: F$  0.26617954070981 ; (127.5/(510-31)
 kBlendNonPremultTbl.Data32: F$  0.26673640167364 ; (127.5/(510-32)
 kBlendNonPremultTbl.Data33: F$  0.26729559748428 ; (127.5/(510-33)
 kBlendNonPremultTbl.Data34: F$  0.26785714285714 ; (127.5/(510-34)
 kBlendNonPremultTbl.Data35: F$  0.26842105263158 ; (127.5/(510-35)
 kBlendNonPremultTbl.Data36: F$  0.26898734177215 ; (127.5/(510-36)
 kBlendNonPremultTbl.Data37: F$  0.26955602536998 ; (127.5/(510-37)
 kBlendNonPremultTbl.Data38: F$  0.27012711864407 ; (127.5/(510-38)
 kBlendNonPremultTbl.Data39: F$  0.27070063694268 ; (127.5/(510-39)
 kBlendNonPremultTbl.Data40: F$  0.27127659574468 ; (127.5/(510-40)
 kBlendNonPremultTbl.Data41: F$  0.27185501066098 ; (127.5/(510-41)
 kBlendNonPremultTbl.Data42: F$  0.2724358974359 ; (127.5/(510-42)
 kBlendNonPremultTbl.Data43: F$  0.27301927194861 ; (127.5/(510-43)
 kBlendNonPremultTbl.Data44: F$  0.27360515021459 ; (127.5/(510-44)
 kBlendNonPremultTbl.Data45: F$  0.2741935483871 ; (127.5/(510-45)
 kBlendNonPremultTbl.Data46: F$  0.27478448275862 ; (127.5/(510-46)
 kBlendNonPremultTbl.Data47: F$  0.27537796976242 ; (127.5/(510-47)
 kBlendNonPremultTbl.Data48: F$  0.27597402597403 ; (127.5/(510-48)
 kBlendNonPremultTbl.Data49: F$  0.2765726681128 ; (127.5/(510-49)
 kBlendNonPremultTbl.Data50: F$  0.27717391304348 ; (127.5/(510-50)
 kBlendNonPremultTbl.Data51: F$  0.27777777777778 ; (127.5/(510-51)
 kBlendNonPremultTbl.Data52: F$  0.27838427947598 ; (127.5/(510-52)
 kBlendNonPremultTbl.Data53: F$  0.27899343544858 ; (127.5/(510-53)
 kBlendNonPremultTbl.Data54: F$  0.2796052631579 ; (127.5/(510-54)
 kBlendNonPremultTbl.Data55: F$  0.28021978021978 ; (127.5/(510-55)
 kBlendNonPremultTbl.Data56: F$  0.28083700440529 ; (127.5/(510-56)
 kBlendNonPremultTbl.Data57: F$  0.28145695364238 ; (127.5/(510-57)
 kBlendNonPremultTbl.Data58: F$  0.2820796460177 ; (127.5/(510-58)
 kBlendNonPremultTbl.Data59: F$  0.28270509977827 ; (127.5/(510-59)
 kBlendNonPremultTbl.Data60: F$  0.28333333333333 ; (127.5/(510-60)
 kBlendNonPremultTbl.Data61: F$  0.28396436525613 ; (127.5/(510-61)
 kBlendNonPremultTbl.Data62: F$  0.28459821428571 ; (127.5/(510-62)
 kBlendNonPremultTbl.Data63: F$  0.28523489932886 ; (127.5/(510-63)
 kBlendNonPremultTbl.Data64: F$  0.28587443946188 ; (127.5/(510-64)
 kBlendNonPremultTbl.Data65: F$  0.28651685393258 ; (127.5/(510-65)
 kBlendNonPremultTbl.Data66: F$  0.28716216216216 ; (127.5/(510-66)
 kBlendNonPremultTbl.Data67: F$  0.28781038374718 ; (127.5/(510-67)
 kBlendNonPremultTbl.Data68: F$  0.28846153846154 ; (127.5/(510-68)
 kBlendNonPremultTbl.Data69: F$  0.2891156462585 ; (127.5/(510-69)
 kBlendNonPremultTbl.Data70: F$  0.28977272727273 ; (127.5/(510-70)
 kBlendNonPremultTbl.Data71: F$  0.29043280182232 ; (127.5/(510-71)
 kBlendNonPremultTbl.Data72: F$  0.29109589041096 ; (127.5/(510-72)
 kBlendNonPremultTbl.Data73: F$  0.29176201372998 ; (127.5/(510-73)
 kBlendNonPremultTbl.Data74: F$  0.29243119266055 ; (127.5/(510-74)
 kBlendNonPremultTbl.Data75: F$  0.29310344827586 ; (127.5/(510-75)
 kBlendNonPremultTbl.Data76: F$  0.29377880184332 ; (127.5/(510-76)
 kBlendNonPremultTbl.Data77: F$  0.29445727482679 ; (127.5/(510-77)
 kBlendNonPremultTbl.Data78: F$  0.29513888888889 ; (127.5/(510-78)
 kBlendNonPremultTbl.Data79: F$  0.29582366589327 ; (127.5/(510-79)
 kBlendNonPremultTbl.Data80: F$  0.29651162790698 ; (127.5/(510-80)
 kBlendNonPremultTbl.Data81: F$  0.2972027972028 ; (127.5/(510-81)
 kBlendNonPremultTbl.Data82: F$  0.29789719626168 ; (127.5/(510-82)
 kBlendNonPremultTbl.Data83: F$  0.29859484777518 ; (127.5/(510-83)
 kBlendNonPremultTbl.Data84: F$  0.29929577464789 ; (127.5/(510-84)
 kBlendNonPremultTbl.Data85: F$  0.3 ; (127.5/(510-85)
 kBlendNonPremultTbl.Data86: F$  0.30070754716981 ; (127.5/(510-86)
 kBlendNonPremultTbl.Data87: F$  0.30141843971631 ; (127.5/(510-87)
 kBlendNonPremultTbl.Data88: F$  0.3021327014218 ; (127.5/(510-88)
 kBlendNonPremultTbl.Data89: F$  0.30285035629454 ; (127.5/(510-89)
 kBlendNonPremultTbl.Data90: F$  0.30357142857143 ; (127.5/(510-90)
 kBlendNonPremultTbl.Data91: F$  0.30429594272076 ; (127.5/(510-91)
 kBlendNonPremultTbl.Data92: F$  0.30502392344498 ; (127.5/(510-92)
 kBlendNonPremultTbl.Data93: F$  0.30575539568345 ; (127.5/(510-93)
 kBlendNonPremultTbl.Data94: F$  0.30649038461539 ; (127.5/(510-94)
 kBlendNonPremultTbl.Data95: F$  0.30722891566265 ; (127.5/(510-95)
 kBlendNonPremultTbl.Data96: F$  0.30797101449275 ; (127.5/(510-96)
 kBlendNonPremultTbl.Data97: F$  0.30871670702179 ; (127.5/(510-97)
 kBlendNonPremultTbl.Data98: F$  0.30946601941748 ; (127.5/(510-98)
 kBlendNonPremultTbl.Data99: F$  0.31021897810219 ; (127.5/(510-99)
 kBlendNonPremultTbl.Data100: F$  0.3109756097561 ; (127.5/(510-100)
 kBlendNonPremultTbl.Data101: F$  0.31173594132029 ; (127.5/(510-101)
 kBlendNonPremultTbl.Data102: F$  0.3125 ; (127.5/(510-102)
 kBlendNonPremultTbl.Data103: F$  0.31326781326781 ; (127.5/(510-103)
 kBlendNonPremultTbl.Data104: F$  0.314039408867 ; (127.5/(510-104)
 kBlendNonPremultTbl.Data105: F$  0.31481481481482 ; (127.5/(510-105)
 kBlendNonPremultTbl.Data106: F$  0.31559405940594 ; (127.5/(510-106)
 kBlendNonPremultTbl.Data107: F$  0.31637717121588 ; (127.5/(510-107)
 kBlendNonPremultTbl.Data108: F$  0.31716417910448 ; (127.5/(510-108)
 kBlendNonPremultTbl.Data109: F$  0.31795511221945 ; (127.5/(510-109)
 kBlendNonPremultTbl.Data110: F$  0.31875 ; (127.5/(510-110)
 kBlendNonPremultTbl.Data111: F$  0.31954887218045 ; (127.5/(510-111)
 kBlendNonPremultTbl.Data112: F$  0.32035175879397 ; (127.5/(510-112)
 kBlendNonPremultTbl.Data113: F$  0.32115869017632 ; (127.5/(510-113)
 kBlendNonPremultTbl.Data114: F$  0.3219696969697 ; (127.5/(510-114)
 kBlendNonPremultTbl.Data115: F$  0.32278481012658 ; (127.5/(510-115)
 kBlendNonPremultTbl.Data116: F$  0.32360406091371 ; (127.5/(510-116)
 kBlendNonPremultTbl.Data117: F$  0.32442748091603 ; (127.5/(510-117)
 kBlendNonPremultTbl.Data118: F$  0.32525510204082 ; (127.5/(510-118)
 kBlendNonPremultTbl.Data119: F$  0.32608695652174 ; (127.5/(510-119)
 kBlendNonPremultTbl.Data120: F$  0.32692307692308 ; (127.5/(510-120)
 kBlendNonPremultTbl.Data121: F$  0.32776349614396 ; (127.5/(510-121)
 kBlendNonPremultTbl.Data122: F$  0.32860824742268 ; (127.5/(510-122)
 kBlendNonPremultTbl.Data123: F$  0.32945736434109 ; (127.5/(510-123)
 kBlendNonPremultTbl.Data124: F$  0.33031088082902 ; (127.5/(510-124)
 kBlendNonPremultTbl.Data125: F$  0.33116883116883 ; (127.5/(510-125)
 kBlendNonPremultTbl.Data126: F$  0.33203125 ; (127.5/(510-126)
 kBlendNonPremultTbl.Data127: F$  0.33289817232376 ; (127.5/(510-127)
 kBlendNonPremultTbl.Data128: F$  0.33376963350785 ; (127.5/(510-128)
 kBlendNonPremultTbl.Data129: F$  0.33464566929134 ; (127.5/(510-129)
 kBlendNonPremultTbl.Data130: F$  0.33552631578947 ; (127.5/(510-130)
 kBlendNonPremultTbl.Data131: F$  0.33641160949868 ; (127.5/(510-131)
 kBlendNonPremultTbl.Data132: F$  0.33730158730159 ; (127.5/(510-132)
 kBlendNonPremultTbl.Data133: F$  0.33819628647215 ; (127.5/(510-133)
 kBlendNonPremultTbl.Data134: F$  0.33909574468085 ; (127.5/(510-134)
 kBlendNonPremultTbl.Data135: F$  0.34 ; (127.5/(510-135)
 kBlendNonPremultTbl.Data136: F$  0.34090909090909 ; (127.5/(510-136)
 kBlendNonPremultTbl.Data137: F$  0.34182305630027 ; (127.5/(510-137)
 kBlendNonPremultTbl.Data138: F$  0.34274193548387 ; (127.5/(510-138)
 kBlendNonPremultTbl.Data139: F$  0.34366576819407 ; (127.5/(510-139)
 kBlendNonPremultTbl.Data140: F$  0.3445945945946 ; (127.5/(510-140)
 kBlendNonPremultTbl.Data141: F$  0.34552845528455 ; (127.5/(510-141)
 kBlendNonPremultTbl.Data142: F$  0.34646739130435 ; (127.5/(510-142)
 kBlendNonPremultTbl.Data143: F$  0.34741144414169 ; (127.5/(510-143)
 kBlendNonPremultTbl.Data144: F$  0.34836065573771 ; (127.5/(510-144)
 kBlendNonPremultTbl.Data145: F$  0.34931506849315 ; (127.5/(510-145)
 kBlendNonPremultTbl.Data146: F$  0.35027472527473 ; (127.5/(510-146)
 kBlendNonPremultTbl.Data147: F$  0.35123966942149 ; (127.5/(510-147)
 kBlendNonPremultTbl.Data148: F$  0.35220994475138 ; (127.5/(510-148)
 kBlendNonPremultTbl.Data149: F$  0.35318559556787 ; (127.5/(510-149)
 kBlendNonPremultTbl.Data150: F$  0.35416666666667 ; (127.5/(510-150)
 kBlendNonPremultTbl.Data151: F$  0.35515320334262 ; (127.5/(510-151)
 kBlendNonPremultTbl.Data152: F$  0.35614525139665 ; (127.5/(510-152)
 kBlendNonPremultTbl.Data153: F$  0.35714285714286 ; (127.5/(510-153)
 kBlendNonPremultTbl.Data154: F$  0.35814606741573 ; (127.5/(510-154)
 kBlendNonPremultTbl.Data155: F$  0.35915492957747 ; (127.5/(510-155)
 kBlendNonPremultTbl.Data156: F$  0.36016949152542 ; (127.5/(510-156)
 kBlendNonPremultTbl.Data157: F$  0.36118980169972 ; (127.5/(510-157)
 kBlendNonPremultTbl.Data158: F$  0.36221590909091 ; (127.5/(510-158)
 kBlendNonPremultTbl.Data159: F$  0.36324786324786 ; (127.5/(510-159)
 kBlendNonPremultTbl.Data160: F$  0.36428571428571 ; (127.5/(510-160)
 kBlendNonPremultTbl.Data161: F$  0.36532951289398 ; (127.5/(510-161)
 kBlendNonPremultTbl.Data162: F$  0.36637931034483 ; (127.5/(510-162)
 kBlendNonPremultTbl.Data163: F$  0.36743515850144 ; (127.5/(510-163)
 kBlendNonPremultTbl.Data164: F$  0.36849710982659 ; (127.5/(510-164)
 kBlendNonPremultTbl.Data165: F$  0.3695652173913 ; (127.5/(510-165)
 kBlendNonPremultTbl.Data166: F$  0.37063953488372 ; (127.5/(510-166)
 kBlendNonPremultTbl.Data167: F$  0.37172011661808 ; (127.5/(510-167)
 kBlendNonPremultTbl.Data168: F$  0.37280701754386 ; (127.5/(510-168)
 kBlendNonPremultTbl.Data169: F$  0.37390029325513 ; (127.5/(510-169)
 kBlendNonPremultTbl.Data170: F$  0.375 ; (127.5/(510-170)
 kBlendNonPremultTbl.Data171: F$  0.37610619469027 ; (127.5/(510-171)
 kBlendNonPremultTbl.Data172: F$  0.37721893491124 ; (127.5/(510-172)
 kBlendNonPremultTbl.Data173: F$  0.37833827893175 ; (127.5/(510-173)
 kBlendNonPremultTbl.Data174: F$  0.37946428571429 ; (127.5/(510-174)
 kBlendNonPremultTbl.Data175: F$  0.38059701492537 ; (127.5/(510-175)
 kBlendNonPremultTbl.Data176: F$  0.38173652694611 ; (127.5/(510-176)
 kBlendNonPremultTbl.Data177: F$  0.38288288288288 ; (127.5/(510-177)
 kBlendNonPremultTbl.Data178: F$  0.38403614457831 ; (127.5/(510-178)
 kBlendNonPremultTbl.Data179: F$  0.38519637462236 ; (127.5/(510-179)
 kBlendNonPremultTbl.Data180: F$  0.38636363636364 ; (127.5/(510-180)
 kBlendNonPremultTbl.Data181: F$  0.38753799392097 ; (127.5/(510-181)
 kBlendNonPremultTbl.Data182: F$  0.38871951219512 ; (127.5/(510-182)
 kBlendNonPremultTbl.Data183: F$  0.38990825688073 ; (127.5/(510-183)
 kBlendNonPremultTbl.Data184: F$  0.39110429447853 ; (127.5/(510-184)
 kBlendNonPremultTbl.Data185: F$  0.39230769230769 ; (127.5/(510-185)
 kBlendNonPremultTbl.Data186: F$  0.39351851851852 ; (127.5/(510-186)
 kBlendNonPremultTbl.Data187: F$  0.39473684210526 ; (127.5/(510-187)
 kBlendNonPremultTbl.Data188: F$  0.39596273291926 ; (127.5/(510-188)
 kBlendNonPremultTbl.Data189: F$  0.39719626168224 ; (127.5/(510-189)
 kBlendNonPremultTbl.Data190: F$  0.3984375 ; (127.5/(510-190)
 kBlendNonPremultTbl.Data191: F$  0.39968652037618 ; (127.5/(510-191)
 kBlendNonPremultTbl.Data192: F$  0.40094339622642 ; (127.5/(510-192)
 kBlendNonPremultTbl.Data193: F$  0.40220820189274 ; (127.5/(510-193)
 kBlendNonPremultTbl.Data194: F$  0.40348101265823 ; (127.5/(510-194)
 kBlendNonPremultTbl.Data195: F$  0.40476190476191 ; (127.5/(510-195)
 kBlendNonPremultTbl.Data196: F$  0.40605095541401 ; (127.5/(510-196)
 kBlendNonPremultTbl.Data197: F$  0.4073482428115 ; (127.5/(510-197)
 kBlendNonPremultTbl.Data198: F$  0.40865384615385 ; (127.5/(510-198)
 kBlendNonPremultTbl.Data199: F$  0.40996784565916 ; (127.5/(510-199)
 kBlendNonPremultTbl.Data200: F$  0.41129032258065 ; (127.5/(510-200)
 kBlendNonPremultTbl.Data201: F$  0.4126213592233 ; (127.5/(510-201)
 kBlendNonPremultTbl.Data202: F$  0.41396103896104 ; (127.5/(510-202)
 kBlendNonPremultTbl.Data203: F$  0.41530944625407 ; (127.5/(510-203)
 kBlendNonPremultTbl.Data204: F$  0.41666666666667 ; (127.5/(510-204)
 kBlendNonPremultTbl.Data205: F$  0.41803278688525 ; (127.5/(510-205)
 kBlendNonPremultTbl.Data206: F$  0.41940789473684 ; (127.5/(510-206)
 kBlendNonPremultTbl.Data207: F$  0.42079207920792 ; (127.5/(510-207)
 kBlendNonPremultTbl.Data208: F$  0.42218543046358 ; (127.5/(510-208)
 kBlendNonPremultTbl.Data209: F$  0.42358803986711 ; (127.5/(510-209)
 kBlendNonPremultTbl.Data210: F$  0.425 ; (127.5/(510-210)
 kBlendNonPremultTbl.Data211: F$  0.42642140468227 ; (127.5/(510-211)
 kBlendNonPremultTbl.Data212: F$  0.42785234899329 ; (127.5/(510-212)
 kBlendNonPremultTbl.Data213: F$  0.42929292929293 ; (127.5/(510-213)
 kBlendNonPremultTbl.Data214: F$  0.43074324324324 ; (127.5/(510-214)
 kBlendNonPremultTbl.Data215: F$  0.43220338983051 ; (127.5/(510-215)
 kBlendNonPremultTbl.Data216: F$  0.43367346938776 ; (127.5/(510-216)
 kBlendNonPremultTbl.Data217: F$  0.43515358361775 ; (127.5/(510-217)
 kBlendNonPremultTbl.Data218: F$  0.43664383561644 ; (127.5/(510-218)
 kBlendNonPremultTbl.Data219: F$  0.43814432989691 ; (127.5/(510-219)
 kBlendNonPremultTbl.Data220: F$  0.43965517241379 ; (127.5/(510-220)
 kBlendNonPremultTbl.Data221: F$  0.44117647058824 ; (127.5/(510-221)
 kBlendNonPremultTbl.Data222: F$  0.44270833333333 ; (127.5/(510-222)
 kBlendNonPremultTbl.Data223: F$  0.44425087108014 ; (127.5/(510-223)
 kBlendNonPremultTbl.Data224: F$  0.4458041958042 ; (127.5/(510-224)
 kBlendNonPremultTbl.Data225: F$  0.44736842105263 ; (127.5/(510-225)
 kBlendNonPremultTbl.Data226: F$  0.44894366197183 ; (127.5/(510-226)
 kBlendNonPremultTbl.Data227: F$  0.45053003533569 ; (127.5/(510-227)
 kBlendNonPremultTbl.Data228: F$  0.45212765957447 ; (127.5/(510-228)
 kBlendNonPremultTbl.Data229: F$  0.45373665480427 ; (127.5/(510-229)
 kBlendNonPremultTbl.Data230: F$  0.45535714285714 ; (127.5/(510-230)
 kBlendNonPremultTbl.Data231: F$  0.45698924731183 ; (127.5/(510-231)
 kBlendNonPremultTbl.Data232: F$  0.45863309352518 ; (127.5/(510-232)
 kBlendNonPremultTbl.Data233: F$  0.46028880866426 ; (127.5/(510-233)
 kBlendNonPremultTbl.Data234: F$  0.46195652173913 ; (127.5/(510-234)
 kBlendNonPremultTbl.Data235: F$  0.46363636363636 ; (127.5/(510-235)
 kBlendNonPremultTbl.Data236: F$  0.46532846715329 ; (127.5/(510-236)
 kBlendNonPremultTbl.Data237: F$  0.46703296703297 ; (127.5/(510-237)
 kBlendNonPremultTbl.Data238: F$  0.46875 ; (127.5/(510-238)
 kBlendNonPremultTbl.Data239: F$  0.47047970479705 ; (127.5/(510-239)
 kBlendNonPremultTbl.Data240: F$  0.47222222222222 ; (127.5/(510-240)
 kBlendNonPremultTbl.Data241: F$  0.47397769516729 ; (127.5/(510-241)
 kBlendNonPremultTbl.Data242: F$  0.47574626865672 ; (127.5/(510-242)
 kBlendNonPremultTbl.Data243: F$  0.47752808988764 ; (127.5/(510-243)
 kBlendNonPremultTbl.Data244: F$  0.47932330827068 ; (127.5/(510-244)
 kBlendNonPremultTbl.Data245: F$  0.4811320754717 ; (127.5/(510-245)
 kBlendNonPremultTbl.Data246: F$  0.48295454545455 ; (127.5/(510-246)
 kBlendNonPremultTbl.Data247: F$  0.48479087452472 ; (127.5/(510-247)
 kBlendNonPremultTbl.Data248: F$  0.48664122137405 ; (127.5/(510-248)
 kBlendNonPremultTbl.Data249: F$  0.48850574712644 ; (127.5/(510-249)
 kBlendNonPremultTbl.Data250: F$  0.49038461538462 ; (127.5/(510-250)
 kBlendNonPremultTbl.Data251: F$  0.49227799227799 ; (127.5/(510-251)
 kBlendNonPremultTbl.Data252: F$  0.49418604651163 ; (127.5/(510-252)
 kBlendNonPremultTbl.Data253: F$  0.49610894941634 ; (127.5/(510-253)
 kBlendNonPremultTbl.Data254: F$  0.498046875 ; (127.5/(510-254)
 kBlendNonPremultTbl.Data255: F$  0.5] ; (127.5/(510-255)


[<16 Float_Half4: F$ 0.5, 0.5, 0.5, 0.5]
[<16 Float_510_Inv4: F$ (1/510.0), F$ (1/510.0), F$ (1/510.0), F$ (1/510.0)]
[<16 Float255: F$ 255]
[<16 Float_HalfPixel: F$ 127.5]


SSE2 macros used:

[SSE2_CONV4BYTE_TO_4FLOAT | punpcklbw #1 #1 | punpcklwd #1 #1 | psrld #1 24 | cvtdq2ps #1 #1]

; using values from 0 to 3
[SHUFFLE | (255 - ((#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4))]

; using values from 3 to 0
[SHUFFLE_INV | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Sieekmanski
[SSE_INVERT_DWORDS 27]      ; invert the order of dwords
[SSE2_CONV_BYTE_TO_4BYTE | punpcklbw #1 #1 | punpcklwd #1 #1]
[SSE2_EXTRACT_FIRST_BYTES | movdqu xmm7 #1 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7 | psrldq xmm7 3 | orpd #1 xmm7]

[pshufd | pshufd #1 #2 #3]
[shufps | shufps #1 #2 #3]
[shufpd | shufpd #1 #2 #3]
[pshuflw | pshuflw #1 #2 #3]
[pshufhw | pshufhw #1 #2 #3]
[SSE_CONV_4FLOAT_TO_4INT | cvttps2dq #1 #1]
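
The SHUFFLE macro is easy to misread because it counts the lanes backwards (255 minus the usual immediate). The Python sketch below computes the immediate the same way and models what pshufd does with it on a list of 4 lanes (index 0 = lowest dword). The function names are illustrative only.

```python
def shuffle(a, b, c, d):
    """Immediate built by the SHUFFLE macro (reversed lane numbering)."""
    return 255 - ((a << 6) | (b << 4) | (c << 2) | d)


def shuffle_inv(a, b, c, d):
    """Immediate built by the SHUFFLE_INV macro (native lane numbering)."""
    return (a << 6) | (b << 4) | (c << 2) | d


def pshufd(lanes, imm8):
    """Model of pshufd: result lane i takes source lane (imm8 >> 2*i) & 3."""
    return [lanes[(imm8 >> (2 * i)) & 3] for i in range(4)]


pixel = ["R", "G", "B", "A"]
print(pshufd(pixel, shuffle(0, 0, 0, 0)))  # alpha broadcast to every lane
print(pshufd(pixel, 27))                   # SSE_INVERT_DWORDS reverses the lanes
```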


guga

continuation...

Function:

Proc BlendPixelNonPremult:
    Arguments @Src, @Dst
    Local @Src_A
    Uses edx, ecx

    xorps xmm5 xmm5
    mov esi D@Dst
    mov eax D@Src | shr eax 24 | movzx eax al | mov D@Src_A eax ; Alpha of Source A
    ..If eax <> 0

        mov edx D@Dst | shr edx 24 | movzx edx dl ; Alpha of Destination A

        .If_Or dl = 0, al = 255 ; alpha in destination = 0 or alpha in src = 255

            mov eax D@Src

        .Else_If dl = 255 ; Alpha in dst is 255 (Opaque)

            ; calculate B/2 + A*x/510

            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
            ; get alpha from src
            pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
            mulps xmm6 X$Float_510_Inv4
            mulps xmm0 xmm6 ; A*x/510
            mulps xmm1 X$Float_Half4 ; B/2
            addps xmm0 xmm1
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0 ; convert them to integer
            SSE2_EXTRACT_FIRST_BYTES xmm0   ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            movd eax xmm0
            ; now we need to place the original value of Src Alpha back on its place
            mov edx D@Src
            shr edx 24 | shl edx 24
            shl eax 8 | shr eax 8
            or eax edx

        .Else_If al <> dl

            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0 ; A
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1 ; B
            ; get alpha from src and dst
            pshufd xmm6 xmm0 {SHUFFLE 0,0,0,0} ; xmm6 = alpha from src = x
            pshufd xmm7 xmm1 {SHUFFLE 0,0,0,0} ; xmm7 = alpha from dst = y

            ; Let's compute (255/2)*(A*x+B*y)/(255*(x+y)-(x*y))

            ; calculate the denominator = (255*(x+y)-(x*y))
            addss xmm6 xmm7 | mulss xmm6 X$Float255
            pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
            ; now do x*y, but put the result in xmm7
            mulss xmm7 xmm6
            ; invert again
            pshufd xmm6 xmm6 SSE_INVERT_DWORDS ; invert the order
            subss xmm6 xmm7
            movss xmm5 X$Float_HalfPixel | divss xmm5 xmm6 ; now we have as a multiplicand: (255/2)/(255*(x+y)-(x*y)). movss (instead of movups) zeroes the 3 upper floats of xmm5

            pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; the 3 color floats get the same multiplier. The Alpha float stays 0, since the upper floats of xmm5 are zero
            ;pshufd xmm5 xmm5 {SHUFFLE 3,3,3,3} ; alternative: all 4 floats get the same multiplier

            ; restore the original alpha values
            pshufd xmm6 xmm6 {SHUFFLE 1,1,1,1}
            pshufd xmm7 xmm7 {SHUFFLE 1,1,1,1}

            ; calculate (A*x+B*y) = The rest of our numerator
            mulps xmm0 xmm6 ; A*x
            mulps xmm1 xmm7 ; B*y
            addps xmm0 xmm1 ; (A*x+B*y)

            ; multiply by the scale factor from xmm5
            mulps xmm0 xmm5

            ; and finally, convert them to integer
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
            ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            SSE2_EXTRACT_FIRST_BYTES xmm0
            movd eax xmm0
            ; now we need to place the original value of Src Alpha back on its place
            mov edx D@Src_A
            shl edx 24
            or eax edx
        .Else
            ; src alpha = dst alpha
            movd xmm0 D@Src | SSE2_CONV4BYTE_TO_4FLOAT xmm0
            movd xmm1 D@Dst | SSE2_CONV4BYTE_TO_4FLOAT xmm1
            movss xmm5 X$kBlendNonPremultTbl+eax*4
            pshufd xmm5 xmm5 {SHUFFLE 0,3,3,3} ; the 3 color floats get the same multiplier. The Alpha float stays 0, since movss from memory zeroes the upper floats of xmm5
            ; calculate (A+B)*k
            addps xmm0 xmm1 ; (A+B)
            mulps xmm0 xmm5 ; (A+B)*k

            ; and finally, convert them to integer
            SSE_CONV_4FLOAT_TO_4INT xmm0 xmm0
            ; convert the 1st byte on each one of the 4 dwords to a single dword in xmm0
            SSE2_EXTRACT_FIRST_BYTES xmm0
            movd eax xmm0

            ; now we need to place the original value of Src Alpha back on its place
            shl edx 24
            or eax edx

        .End_If
    ..Else
        ; alpha in src = 0
        mov eax D@Dst
    ..End_If

EndP


guga

Also, the other function used, BlendPixelPremult, is faster with simple x86 opcodes.

; called in BlendPixelRowPremult
; A + B*(1-x/255)
Proc BlendPixelPremult:
    Arguments @Src, @Dst
    Uses edx, ecx, esi

    mov eax D@Src
    mov edx D@Dst
    mov ecx eax | shr ecx 24 | movzx ecx cl | neg ecx | add ecx 256
    mov esi edx | and esi 0FF00FF | imul esi ecx | shr esi 8 | and esi 0FF00FF
    shr edx 8 | and edx 0FF00FF | imul edx ecx | and edx 0FF00FF00
    or esi edx
    add eax esi

EndP
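
For anyone who wants to check the integer trick without an assembler, here is a Python model of the routine above. It follows the same steps: scale = 256 - src alpha, red/blue and green/alpha multiplied in pairs through the 0FF00FF mask. The function names are chosen for this sketch, and the final mask to 32 bits stands in for the register width.

```python
MASK = 0x00FF00FF


def channelwise_multiply(pix, scale):
    """Scale all four channels by scale/256, two channels per multiplication."""
    rb = ((pix & MASK) * scale) >> 8      # red and blue
    ag = ((pix >> 8) & MASK) * scale      # green and alpha, already shifted up
    return (rb & MASK) | (ag & 0xFF00FF00)


def blend_pixel_premult(src, dst):
    """src + dst * (256 - src_alpha) / 256, on premultiplied 32-bit pixels."""
    src_a = (src >> 24) & 0xFF
    return (src + channelwise_multiply(dst, 256 - src_a)) & 0xFFFFFFFF


# A half-transparent premultiplied source over an opaque gray destination.
print(hex(blend_pixel_premult(0x80404040, 0xFF808080)))
```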