Update BitBlt support (primarily for 64-bit ARM) #565
base: Cog
Commits on May 4, 2021
Commit 9ebc245
Correct various "#if ENABLE_FAST_BLT" to "#ifdef"
ENABLE_FAST_BLT is typically not assigned a value even when it is defined, so the "#if" form is technically a syntax error.
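A minimal sketch of the difference, assuming the macro is defined with no value as the message describes. The variable `fast_blt_enabled` and the helper `fastBltEnabled` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Defined without a value, as in a bare "#define ENABLE_FAST_BLT". */
#define ENABLE_FAST_BLT

/* "#if ENABLE_FAST_BLT" would expand to "#if" with no expression here,
 * which the preprocessor rejects. "#ifdef" only asks whether the macro
 * is defined, so it works whether or not a value was given. */
#ifdef ENABLE_FAST_BLT
static const int fast_blt_enabled = 1;
#else
static const int fast_blt_enabled = 0;
#endif

/* Hypothetical helper: reports which branch was compiled in. */
int fastBltEnabled(void)
{
    return fast_blt_enabled;
}
```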
Commit 38c4283
Don't assume sourcePPW is valid on entry to copyBitsFallback
This is not the case when being called from "fuzz" or "bench" test applications. It may also not be accurate if a fast path has been synthesised from a combination of copyBitsFallback and one or more other fast paths.
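One way to avoid relying on a caller-supplied value is to recompute it on entry. The sketch below assumes that sourcePPW means "source pixels per 32-bit word"; the `BlitParams` struct and `refreshSourcePPW` are hypothetical and not taken from the patch.

```c
#include <assert.h>

/* Hypothetical parameter block; field names mirror the commit message. */
typedef struct {
    int sourceDepth;  /* bits per source pixel: 1, 2, 4, 8, 16 or 32 */
    int sourcePPW;    /* pixels per 32-bit word; may be stale on entry */
} BlitParams;

/* Recompute sourcePPW from sourceDepth instead of trusting the caller. */
void refreshSourcePPW(BlitParams *p)
{
    assert(p->sourceDepth > 0 && p->sourceDepth <= 32);
    p->sourcePPW = 32 / p->sourceDepth;
}
```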
Commit 43ce975
Fallback routines need extra help to detect intra-image operations
In some places, sourceForm and destForm were being compared to determine which code path to follow. However, when being called from fuzz or other test tools, these structures aren't used to pass parameters, so the pointers haven't been initialised and default to 0, so the wrong code path is followed. Detect such cases and initialise them from sourceBits and destBits instead, since these will perform the same under equality tests.
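The approach described above might look roughly like the following. This is only a sketch: the `BlitOperands` struct, the use of plain pointers for the form references, and the exact detection condition (both references being 0) are assumptions, not the patch's actual code.

```c
#include <assert.h>

/* Hypothetical operand block; names follow the commit message. */
typedef struct {
    const void *sourceForm;  /* 0 when called from fuzz or bench tools */
    const void *destForm;
    const void *sourceBits;
    const void *destBits;
} BlitOperands;

/* If the form references were never set, substitute the bits pointers,
 * which compare equal exactly when source and destination are the same
 * image. */
void ensureFormsComparable(BlitOperands *op)
{
    if (op->sourceForm == 0 && op->destForm == 0) {
        op->sourceForm = op->sourceBits;
        op->destForm = op->destBits;
    }
}

/* The equality test that selects the intra-image code path. */
int isIntraImage(const BlitOperands *op)
{
    return op->sourceForm == op->destForm;
}
```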
Commit df667ca
Remove invalid shortcut in rgbComponentAlphawith
This shortcut is triggered more frequently than it used to be, due to improvements in copyLoop() that avoid buffer overruns.
Commit 51df83e
Fix bug in 32-bit ARM fast paths
When classed as "wide" because each line is long enough to warrant pipelined prefetching as we go along, the inner loop is unrolled enough that there is at least one prefetch instruction per iteration. Loading the source image can only be done in atoms of 32-bit words due to big-endian packing, so when destination pixels are 8 times wider (or more) than source pixels, the loads happen less frequently than the store atoms (quadwords) and a conditional branch per subblock is required to decide whether to do a load or not, depending on the skew and the number of pixels remaining to process. The 'x' register is only updated once per loop, so an assembly-time constant derived from the unrolling subblock number needs to be factored in, but since the number of pixels remaining decreases as the subblock number increases, this should have been a subtraction. In practice, since only the least-significant bits of the result matter, addition and subtraction behave the same when the source:destination pixel ratio is 8, so the only operations affected were 1->16bpp, 2->32bpp and 1->32bpp. The exact threshold that counts as "wide" depends on the prefetch distance that was selected empirically, but typically would require an operation that is several hundred pixels wide.
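The observation that addition and subtraction can coincide when only the low bits matter is ordinary modular arithmetic: x + c and x - c agree under a mask of 2^k - 1 whenever 2c is a multiple of 2^k. The sketch below demonstrates this in isolation. The constants used with it are illustrative only and are not taken from the ARM assembly, and both function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 when x + c and x - c agree in the bits selected by mask.
 * Unsigned arithmetic wraps, so x - c is well defined even when c > x. */
int lowBitsAgree(unsigned int x, unsigned int c, unsigned int mask)
{
    return ((x + c) & mask) == ((x - c) & mask);
}

/* Returns 1 when the two forms agree for every x below limit. */
int agreeForAll(unsigned int c, unsigned int mask, unsigned int limit)
{
    unsigned int x;
    for (x = 0; x < limit; x++) {
        if (!lowBitsAgree(x, c, mask))
            return 0;
    }
    return 1;
}
```

For example, with a 5-bit mask (31), an offset of 16 gives the same low bits either way, while an offset of 8 does not, which is the kind of case where using the wrong sign becomes visible.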
Commit a64c5b6
In fastPathDepthConv (which combines sourceWord colour-depth conversion with another fast path for another combinationRule at a constant colour depth) and fastPathRightToLeft, the temporary buffer could overflow, corrupting other local variables, if the last chunk of a pixel row was 2048 bytes (or just under). This was most likely to happen with 32bpp destination images and widths of about 512 pixels.
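The message does not show the fix itself. As a sketch of the invariant involved, the code below splits a row into chunks that never exceed a fixed temporary buffer. The 2048-byte size is assumed from the figure quoted above, and `TEMP_BUFFER_BYTES`, `nextChunkBytes` and `countChunks` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size, based on the 2048-byte figure in the message. */
#define TEMP_BUFFER_BYTES 2048

/* Size of the next chunk: whatever remains, capped at the buffer size. */
size_t nextChunkBytes(size_t remainingBytes)
{
    return remainingBytes < TEMP_BUFFER_BYTES ? remainingBytes
                                              : TEMP_BUFFER_BYTES;
}

/* Walks a row chunk by chunk, checking the bound before each use. */
size_t countChunks(size_t rowBytes)
{
    size_t chunks = 0;
    while (rowBytes > 0) {
        size_t n = nextChunkBytes(rowBytes);
        assert(n <= TEMP_BUFFER_BYTES);  /* the invariant that was broken */
        rowBytes -= n;
        chunks++;
    }
    return chunks;
}
```

A 512-pixel row at 32bpp is exactly 512 * 4 = 2048 bytes, which is why that width sits right on the boundary.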
Commit b577ab2
Fix corruption bugs with wide 1bpp source images
For images that were wide enough to invoke intra-line preloads, there was a register clash between the preload address calculation and one of the registers holding the deskewed source pixels (this only occurred once per destination cacheline).
Commit 9084c17
Fix type of halftone array for 64-bit targets
The halftone array is accessed using a hard-coded multiplier of 4 bytes, therefore each element needs to be 32 bits wide on every platform. `sqInt` is not appropriate for this use, since it is a 64-bit type on 64-bit platforms. Rather than unilaterally introduce C99 stdint types, use `unsigned int` since this will be 32-bit on both current fast path binary targets.
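To illustrate why the element width matters, the sketch below indexes an array with a hard-coded 4-byte stride, as a fast path might. `HalftoneWord` and `halftoneAt` are hypothetical names, and the compile-time check uses a pre-C99 idiom in keeping with the decision to avoid stdint types.

```c
#include <assert.h>

/* Element type chosen to be 32 bits on the targets in question. */
typedef unsigned int HalftoneWord;

/* Compile-time check: the array size is negative, and compilation
 * fails, if the element is not exactly 4 bytes wide. */
typedef char halftone_word_is_4_bytes[sizeof(HalftoneWord) == 4 ? 1 : -1];

/* Mimics a fast path that steps through the array using a fixed
 * multiplier of 4 bytes rather than sizeof(element). */
HalftoneWord halftoneAt(const HalftoneWord *base, unsigned int row)
{
    const unsigned char *p = (const unsigned char *)base + row * 4u;
    return *(const HalftoneWord *)p;
}
```

With an 8-byte element type, the same 4-byte stride would land in the middle of an element for every odd row.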
Commit 2b0279a
Detect and add a new fast path flag for effective-1bpp colour maps
Sometimes, colour maps are used such that all entries except the first contain the same value. Combined with the fact that only source colour 0 uses colour map entry 0 (any other colours for which all non-0 bits would otherwise be discarded during index generation are forced to use entry 1 instead), this effectively acts as a 2-entry (or 1bpp) map, depending on whether the source colour is 0 or not. This is far more efficiently coded in any fast path by a test against zero than by a table lookup: it frees up 2 KB, 16 KB or 128 KB of data cache space, depending on whether a 9-, 12- or 15-bit colour map was used. There is an up-front cost to scanning the colour map to see if its entries are of this nature; however, in most "normal" colour maps, this scan will be aborted quickly.
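A minimal sketch of such a scan, assuming the colour map is an array of 32-bit entries. The function names `isEffectively1bpp` and `mapPixel1bpp` are hypothetical and the real detection code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every entry from index 1 onwards holds the same value.
 * The loop exits at the first mismatch, so an ordinary colour map is
 * usually rejected after only a few comparisons. */
int isEffectively1bpp(const unsigned int *map, size_t entries)
{
    size_t i;
    assert(entries >= 2);
    for (i = 2; i < entries; i++) {
        if (map[i] != map[1])
            return 0;
    }
    return 1;
}

/* With such a map, the lookup reduces to a test against zero. */
unsigned int mapPixel1bpp(const unsigned int *map, unsigned int sourcePixel)
{
    return sourcePixel == 0 ? map[0] : map[1];
}
```

The cache figures follow from 4-byte entries: a 9-bit map has 512 entries (2 KB), a 12-bit map has 4096 (16 KB) and a 15-bit map has 32768 (128 KB).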
Commit e22ae0b
C fast path for 32bpp alphaBlend
This runs approx 2.6x faster when benchmarked on Cortex-A72 in AArch64.
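For orientation, a scalar 32bpp alpha blend can be sketched as below. This is a generic non-premultiplied "source over destination" blend on ARGB words, not the fast path from this commit: the rounding, the alpha-channel handling and the names `blendChannel` and `alphaBlend32` are all assumptions, and a real fast path would typically process several channels at once.

```c
#include <assert.h>

/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded to nearest. */
static unsigned int blendChannel(unsigned int s, unsigned int d,
                                 unsigned int a)
{
    return (s * a + d * (255u - a) + 127u) / 255u;
}

/* Blend a 32-bit ARGB source pixel over a destination pixel using the
 * source alpha. The result alpha blends full opacity (255) with the
 * destination alpha, which is one common convention. */
unsigned int alphaBlend32(unsigned int src, unsigned int dst)
{
    unsigned int a = (src >> 24) & 0xFFu;
    unsigned int r = blendChannel((src >> 16) & 0xFFu, (dst >> 16) & 0xFFu, a);
    unsigned int g = blendChannel((src >> 8) & 0xFFu, (dst >> 8) & 0xFFu, a);
    unsigned int b = blendChannel(src & 0xFFu, dst & 0xFFu, a);
    unsigned int outA = blendChannel(0xFFu, (dst >> 24) & 0xFFu, a);
    return (outA << 24) | (r << 16) | (g << 8) | b;
}
```

A fully opaque source replaces the destination and a fully transparent source leaves it unchanged, which makes those two cases useful sanity checks.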
Commit e44e2c8
Commit 45649aa
Commit 405f35b
Commit dac723f
Apply scalar halftoning to colour map entries instead for 32bpp destination
This makes better use of existing fast paths, and applies to all platforms.
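The idea can be sketched as follows, assuming that halftoning is a bitwise AND of each mapped pixel with a single (scalar) halftone word and that each 32bpp destination pixel occupies one whole word. The function name `applyHalftoneToMap` is hypothetical and this is not the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Fold a scalar halftone word into a copy of the colour map once, so
 * that the per-pixel work is a plain colour-map lookup that existing
 * fast paths already handle. */
void applyHalftoneToMap(const unsigned int *map, unsigned int *halftonedMap,
                        size_t entries, unsigned int halftoneWord)
{
    size_t i;
    for (i = 0; i < entries; i++)
        halftonedMap[i] = map[i] & halftoneWord;
}
```

Looking up a pixel in the halftoned map then gives the same result as looking it up in the original map and masking afterwards, but the masking cost is paid once per map entry rather than once per pixel.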
Commit 80cd2da
Commit e4a27ec
Commit 10d8a11