@ZERICO2005 commented Sep 30, 2025

Added signed and unsigned multiply-high routines, which can be used to optimize division by a constant. __smulhu is optimized; the rest are not yet well optimized. They use exactly the same calling convention as the regular multiplication routines. The remaining routines can be optimized in later PRs.
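
To illustrate the division-by-a-constant use case, here is a minimal C sketch of the general technique, not code from this PR. `mulhu16` is a hypothetical host-side model of what `__smulhu` computes, and `div10_u16` and `div10_mismatches` are illustrative names. The constant 0xCCCD is ceil(2^19 / 10), so a multiply-high followed by a 3-bit shift should reproduce `x / 10` for every 16-bit `x`.

```c
#include <stdint.h>

/* Host-side model of a 16-bit unsigned multiply-high: the upper 16 bits
   of the 32-bit product. Hypothetical helper, not the PR's assembly. */
static uint16_t mulhu16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x / 10 without a divide instruction.
   0xCCCD == ceil(2^19 / 10), so (x * 0xCCCD) >> 19 == x / 10.
   The multiply-high supplies the >> 16; the remaining >> 3 is cheap. */
static uint16_t div10_u16(uint16_t x)
{
    return (uint16_t)(mulhu16(x, 0xCCCDu) >> 3);
}

/* Compare against ordinary division for all 65536 inputs and
   return the number of mismatches. */
static unsigned div10_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        if (div10_u16((uint16_t)x) != (uint16_t)(x / 10u)) {
            bad++;
        }
    }
    return bad;
}
```

Other divisors need their own magic constant and shift, and some need an extra correction step, so an exhaustive check like `div10_mismatches` is worth keeping next to any hand-picked constant.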

__smulhu   :         HL = ((uint32_t)         HL * (uint32_t)      BC) >> 16
__imulhu   :        UHL = ((uint48_t)        UHL * (uint48_t)     UBC) >> 24
__lmulhu   :      E:UHL = ((uint64_t)      E:UHL * (uint64_t)   A:UBC) >> 32
__i48mulhu :    UDE:UHL = ((uint96_t)    UDE:UHL * (uint96_t) UIY:UBC) >> 48
__llmulhu  : BC:UDE:UHL = ((uint128_t)BC:UDE:UHL * (uint128_t) (SP64)) >> 64

__smulhs   :         HL = ((int32_t)          HL * (int32_t)       BC) >> 16
__imulhs   :        UHL = ((int48_t)         UHL * (int48_t)      UBC) >> 24
__lmulhs   :      E:UHL = ((int64_t)       E:UHL * (int64_t)    A:UBC) >> 32
__i48mulhs :    UDE:UHL = ((int96_t)     UDE:UHL * (int96_t)  UIY:UBC) >> 48
__llmulhs  : BC:UDE:UHL = ((int128_t) BC:UDE:UHL * (int128_t)  (SP64)) >> 64
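
The signed and unsigned variants differ only by a correction term, which the following C sketch shows for the 16-bit case. It is a host-side model under stated assumptions, not necessarily how the signed routines in this PR are implemented. All function names are hypothetical, and results are returned as raw 16-bit patterns to avoid implementation-defined signed shifts.

```c
#include <stdint.h>

/* Model of the unsigned high half: upper 16 bits of the 32-bit product. */
static uint16_t mulhu16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Model of the signed high half, computed directly by widening.
   Returns the two's-complement bit pattern of the upper 16 bits. */
static uint16_t mulhs16_bits(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)p >> 16);
}

/* Same result derived from the unsigned high half:
   mulhs(a, b) = mulhu(a, b) - (a < 0 ? b : 0) - (b < 0 ? a : 0)  (mod 2^16) */
static uint16_t mulhs16_from_mulhu(int16_t a, int16_t b)
{
    uint16_t ua = (uint16_t)a;
    uint16_t ub = (uint16_t)b;
    uint16_t hi = mulhu16(ua, ub);
    if (a < 0) hi = (uint16_t)(hi - ub);
    if (b < 0) hi = (uint16_t)(hi - ua);
    return hi;
}

/* Compare both models on a strided sample of inputs and
   return the number of mismatches. */
static unsigned mulhs_mismatches(void)
{
    unsigned bad = 0;
    for (int a = -32768; a <= 32767; a += 251) {
        for (int b = -32768; b <= 32767; b += 127) {
            if (mulhs16_bits((int16_t)a, (int16_t)b) !=
                mulhs16_from_mulhu((int16_t)a, (int16_t)b)) {
                bad++;
            }
        }
    }
    return bad;
}
```
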

Code size and timing of the unsigned routines:
__smulhu   :  32 bytes |  33F +  12R +   9W +  17
__imulhu   : 117 bytes | 118F +  39R +  38W +  37
__lmulhu   : 1 call to __llmulu
__i48mulhu :  93 bytes | 902F + 246R + 182W + 344
__llmulhu  : 4 calls to __llmulu

__bmulhu was not added since it is just mlt bc \ ld a, b (and the 8-bit calling convention is not well defined).

__llmulhu is a lot faster than compiler-generated code because it avoids expensive 64-bit shift operations. A few micro-optimizations could be applied to each routine, but the real performance boost would come from, for example, calling __u32_mul_u32_to_u64 instead of __llmulu.
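
As a rough illustration of why a 32x32 -> 64 primitive fits here, the C sketch below computes the high 64 bits of a 64x64 product from four partial products. This is the textbook decomposition, offered as one plausible reading of the "4 calls" structure rather than a transcription of the PR's assembly. `mul32x32` is a hypothetical stand-in for a routine like `__u32_mul_u32_to_u64`, whose real calling convention is not shown here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32x32 -> 64 multiply primitive. */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* High 64 bits of the 128-bit product a * b, built from four
   32x32 -> 64 partial products and no 64-bit data shifts beyond
   selecting 32-bit halves. */
static uint64_t mulhu64(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t ll = mul32x32(a_lo, b_lo);
    uint64_t lh = mul32x32(a_lo, b_hi);
    uint64_t hl = mul32x32(a_hi, b_lo);
    uint64_t hh = mul32x32(a_hi, b_hi);

    /* Column at bit 32: three 32-bit values, so the sum cannot
       overflow 64 bits. Its upper half is the carry into bit 64. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

Each partial product only needs 32-bit operands, so calling a full 64x64 multiply for it does more work than necessary.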

@ZERICO2005 ZERICO2005 marked this pull request as draft September 30, 2025 03:02