
FEX - FEX-2405 Latest Release

Published by Sonicadvance1 6 months ago

Read the blog post at FEX-Emu's Site!

One month older and we have a new release for FEX. This month is a little bit slower for user facing features but behind the scenes we had a large amount of refactoring to facilitate future improvements.

Support OpenGL and Vulkan thunking with forwarding X11

A thorn in FEX's side has been forwarding X11 calls when thunking OpenGL and Vulkan. This has caused us pain since X11's API is a fairly leaky
abstraction that exposes data symbols which FEX's thunking can't accurately thunk. We now instead redirect the X11 Display object directly in
OpenGL and Vulkan. This not only reduces the amount of code we need to thunk, but is also required for us to eventually thunk 32-bit libraries.

This may change behaviour in some games when thunks are enabled, so testing for regressions will be useful.

Enable Enhanced REP MOVS when memcpy TSO is disabled

Following on from last month's optimization of the memcpy instructions, we now enable the Enhanced REP MOVS CPUID bit when memcpy TSO emulation is
disabled. This means that glibc will take advantage of the optimization when it is doing memcpy and memset operations. This is a minor performance
improvement depending on the application.
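
For reference, Enhanced REP MOVSB/STOSB is reported in CPUID leaf 7, sub-leaf 0, EBX bit 9. The sketch below shows how a guest could test that bit and how an emulator might gate it on a setting. The function names and the memcpy_tso_enabled parameter are illustrative assumptions, not FEX's actual code.

```python
ERMS_BIT = 1 << 9  # CPUID.(EAX=7, ECX=0):EBX bit 9 = Enhanced REP MOVSB/STOSB


def guest_sees_erms(ebx):
    """Return True if an EBX value from CPUID leaf 7 advertises ERMS."""
    return bool(ebx & ERMS_BIT)


def emulated_leaf7_ebx(base_ebx, memcpy_tso_enabled):
    """Hypothetical helper: advertise ERMS only when memcpy TSO emulation is off."""
    if memcpy_tso_enabled:
        return base_ebx & ~ERMS_BIT
    return base_ebx | ERMS_BIT


if __name__ == "__main__":
    print(guest_sees_erms(emulated_leaf7_ebx(0, memcpy_tso_enabled=False)))
    print(guest_sees_erms(emulated_leaf7_ebx(0, memcpy_tso_enabled=True)))
```

A libc that checks this bit can then pick its REP MOVS based copy routines, which is where the gain described above would come from.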

Implement support for SMSW instruction

This instruction isn't too notable since all it does on recent x86 CPUs is return the same data no matter what, but on legacy CPUs it was useful for
checking if x87 was supported. As this is considered a system level instruction, FEX didn't implement it originally but we finally found a game that
uses it. The original Far Cry game from 2004 uses this instruction for some reason. Now that we have implemented the instruction the game at least
gets to the menus but seems to still stall out when going in-game. Kind of neat!

Far Cry running under FEX
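
As background, SMSW stores the machine status word, which is the low 16 bits of CR0. A minimal sketch of how legacy software might interpret it is below. The bit positions are the architectural CR0 ones, while the helper names and the sample value 0x0033 are only illustrative and are not claimed to be what FEX returns.

```python
# Low CR0 bits that are visible through SMSW.
CR0_PE = 1 << 0  # protected mode enabled
CR0_MP = 1 << 1  # monitor coprocessor
CR0_EM = 1 << 2  # x87 emulation: set means no usable x87 unit
CR0_TS = 1 << 3  # task switched
CR0_ET = 1 << 4  # extension type
CR0_NE = 1 << 5  # native x87 error reporting


def decode_msw(msw):
    """Return the names of the status bits set in a 16-bit machine status word."""
    names = ["PE", "MP", "EM", "TS", "ET", "NE"]
    return [name for bit, name in enumerate(names) if msw & (1 << bit)]


def x87_available(msw):
    """Legacy-style check: x87 is usable when the EM bit is clear."""
    return not (msw & CR0_EM)


if __name__ == "__main__":
    sample = 0x0033  # illustrative value only
    print(decode_msw(sample), x87_available(sample))
```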

Fix disabling TSO emulation on some stack accesses

When emulating the x86 memory model, we can get away with not emulating it when a thread accesses its stack. This works since we know that stack
accesses are usually not shared between threads. While we usually disabled the TSO emulation in these cases, we had accidentally missed some cases.
This means there are some performance improvements for "free."

Fix ADC and SBC instructions

Back in FEX-2403 we landed some optimizations for these instructions. It turns out that we had inadvertently introduced broken behaviour in an
edge case. Most games didn't hit this edge case so they were fine, but it completely broke rendering in the Steam release of Final Fantasy 7.
With that fixed, we now get correct rendering again!

Final Fantasy 7 under FEX
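
For readers who want to poke at this class of bug, here is a small reference model of x86 ADC and SBB (the x86 counterpart of SBC) for narrow operand sizes. It is only a sketch for comparing results and flags against; the release notes don't say which inputs were affected, so the example values are simply typical carry-in edge cases.

```python
def adc(a, b, carry_in, bits):
    """Reference x86 ADC: returns (result, CF, OF, ZF) for the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b + carry_in
    result = full & mask
    cf = full > mask
    # Signed overflow: operands share a sign and the result's sign differs.
    of = bool(~(a ^ b) & (a ^ result) & sign)
    return result, cf, of, result == 0


def sbb(a, b, borrow_in, bits):
    """Reference x86 SBB: returns (result, CF, OF, ZF) for the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a - b - borrow_in
    result = full & mask
    cf = full < 0
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    of = bool((a ^ b) & (a ^ result) & sign)
    return result, cf, of, result == 0


if __name__ == "__main__":
    print(adc(0xFF, 0x00, 1, 8))   # carry-in alone wraps the 8-bit result
    print(sbb(0x00, 0xFF, 1, 8))   # subtrahend plus borrow wraps past 8 bits
```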

Option to disable half-barrier TSO emulation on unaligned atomics

On x86 most atomic instructions don't care about alignment, with a feature that Intel calls "Split-locks" to even support unaligned atomics that cross
a cacheline. On ARM we don't have this luxury, where all atomic instructions need to be naturally aligned to their size. Newer ARM CPUs added a
feature that allows unaligned atomics inside of a 16-byte region, but if the data crosses the edge then FEX needs to handle this instruction in a
different way. We backpatch the instruction stream with a "half-barrier" access which still emulates the x86 memory model but is exceedingly heavy.

Now we have an option to convert these "half-barrier" implementations into non-atomic accesses. While this doesn't match true x86 behaviour, it can
accelerate some games that heavily abuse unaligned atomic memory accesses. So if a game is running slow, this is an option to try out!

FEXConfig option
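
To make the alignment cases above concrete, the sketch below classifies an access by whether it stays inside a 16-byte region or spills across a 64-byte cacheline. The function names and category labels are made up for illustration; they are not FEX identifiers.

```python
def crosses_boundary(addr, size, granule):
    """True if the bytes [addr, addr + size) span more than one granule."""
    return addr // granule != (addr + size - 1) // granule


def classify_atomic(addr, size):
    """Classify an atomic access of `size` bytes at `addr` (labels are illustrative)."""
    if addr % size == 0:
        return "aligned"
    if not crosses_boundary(addr, size, 16):
        return "unaligned-within-16B"   # the case newer ARM CPUs can handle directly
    if not crosses_boundary(addr, size, 64):
        return "crosses-16B"            # needs the slower fallback described above
    return "crosses-cacheline"          # the x86 "split-lock" situation


if __name__ == "__main__":
    for addr in (0x1000, 0x1004, 0x100C, 0x103C):
        print(hex(addr), classify_atomic(addr, 8))
```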

Refactor code using clang-format

As alluded to at the start, the FEX codebase has now been completely refactored using clang-format to ensure a more consistent coding experience. We
provide a helper script and CI enforcement to ensure this stays in place. This took quite a bit of work to ensure the feature was up to everyone's
standards. Major thanks to everyone that worked on this!

Eliminate cross-block liveness for instructions

This is preparation work for future JIT improvements and speedups. Cross-block liveness makes our register allocator more complex and by removing this
from our JIT we can start working on improving the register allocator. Look forward to register allocator improvements in the coming months.
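
As a rough illustration of what "cross-block liveness" means, the sketch below finds values that are defined in one block and used in another. The dictionary based IR here is entirely hypothetical and much simpler than FEX's real IR; it only shows the property that the new validation pass is said to check for.

```python
def cross_block_live_values(blocks):
    """Return the set of values used in a block other than the one defining them.

    `blocks` maps a block name to {"defs": set_of_values, "uses": set_of_values}.
    """
    defined_in = {}
    for name, info in blocks.items():
        for value in info["defs"]:
            defined_in[value] = name

    crossing = set()
    for name, info in blocks.items():
        for value in info["uses"]:
            # Values with no known definition are treated as local to the user.
            if defined_in.get(value, name) != name:
                crossing.add(value)
    return crossing


if __name__ == "__main__":
    ir = {
        "entry": {"defs": {"t0", "t1"}, "uses": {"t0"}},
        "loop": {"defs": {"t2"}, "uses": {"t1", "t2"}},
    }
    print(cross_block_live_values(ir))
```

When this set is empty for every function, a register allocator can treat each block independently, which is the simplification the paragraph above is pointing at.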

Support arm64ec function calls

This changes how some of FEX's JIT infrastructure works in order to support the ARM64EC Windows ABI. This will get used in conjunction with Wine's
ARM64EC support. While still a work in progress, this is starting to become interesting as Wine continues to implement more features to handle this
case.

Video game showcase

YouTube Link

Raw Changes

FEX Release FEX-2405

Allocator
Fixes compiling on Fedora 40 (https://github.com/FEX-Emu/FEX/commit/7b88b0f2711b1ee9f2f47f7547b6ee3d2a184d90)
AllocatorHooks
Mark JIT code memory as EC code on ARM64EC (https://github.com/FEX-Emu/FEX/commit/bb24e1419c5ef07ee9d93b2d77fa08c8253902cf)
AppConfig
Disable libGL forwarding for steamwebhelper (https://github.com/FEX-Emu/FEX/commit/02ebb6e32074ca26d60727b998d39fe04a2a2465)
Arm64
Adds another TSO hack to disable half-barrier TSO (https://github.com/FEX-Emu/FEX/commit/1069cabad068c2b2e0aedae3ac874936c0f621d3)
CI
Drop use of obsolete ENABLE_X86_HOST_DEBUG setting (https://github.com/FEX-Emu/FEX/commit/c126b209f12c8f0cf23afe6d0ff0ab2d1d4036ec)
CPUID
Removes FEX version string from CPU model name (https://github.com/FEX-Emu/FEX/commit/faa494c28884af61f5520bb20ec3a1f0e4948c21)
Fix inverted RDTSCP check (https://github.com/FEX-Emu/FEX/commit/9781b957d06b5f61f1b3e140d3a12580d1707072)
Enable enhanced rep movs in more situations (https://github.com/FEX-Emu/FEX/commit/68e543c81dcb33325ee4c12b699034ed16cbe484)
DebugData
Remove header (https://github.com/FEX-Emu/FEX/commit/1616b4e77cdafe85742b496918bcc34a4f2782bb)
FEXCore
Support x64 -> arm64ec calls (https://github.com/FEX-Emu/FEX/commit/1a8b61b9fc153be667a9342cac9f536e6efe3d11)
FHU
Switch over win32 file operations to std::filesystem (https://github.com/FEX-Emu/FEX/commit/ae5e388e1a8d47148990505ca1c480adac2e08e6)
IR
Remove VectorZero (https://github.com/FEX-Emu/FEX/commit/a0f2cae1cbf999d47915d94fa9ce938adad763b1)
JIT
fix neon vec4 faddv (https://github.com/FEX-Emu/FEX/commit/fe70ec727782471f89ffbc88739d1724aa29f49c)
fix ShiftFlags shuffles (https://github.com/FEX-Emu/FEX/commit/376936c808fba5e852c387a48e447b4d721ce05f)
Adds support for spilling/Filling GPRPair (https://github.com/FEX-Emu/FEX/commit/eedb120fd0c25bf02cffe88d9814aa1ae16bb3ae)
LookupCache
Track ARM64EC page state in the code cache (https://github.com/FEX-Emu/FEX/commit/f0dad86332850c75243e987933c6b4a5d3b6dfb7)
OpcodeDispatcher
Implement support for SMSW (https://github.com/FEX-Emu/FEX/commit/20bd86473b7b7e669cced6655f81cfa101fcbaff)
Fixes disabling TSO access on RSP SIB stores (https://github.com/FEX-Emu/FEX/commit/1ba678f6316dda36db46ea7b676a2530f8d4d7ce)
Add helper for making segment offset addresses (https://github.com/FEX-Emu/FEX/commit/66cbb6673271ed4396a4c05c6ed2030a6d1cd185)
RCLSE
disable store-after-store optimization (https://github.com/FEX-Emu/FEX/commit/5cb11aed3d046fff1d7fa0b9fb1a206408bb0c11)
X87
Simplify constant loading for FLD family (https://github.com/FEX-Emu/FEX/commit/271700e9f6da257028a1d3816f5921ee69e7d8d7)
Misc
Library Forwarding: Support libGL/libvulkan without forwarding libX11 (https://github.com/FEX-Emu/FEX/commit/b33e0e38394960c49d85a88ef3d5dc73b9afa5f3)
clang-format: allow overriding clang-format (https://github.com/FEX-Emu/FEX/commit/3fda47e870c7220a64b02e1210ecf3361ee4da2a)
Fix 8/16-bit ADC/SBC (https://github.com/FEX-Emu/FEX/commit/9c6f749ebeea4d02d8db67a965830f62c9ff1107)
Library Forwarding: Fix issues with libGL's fake X11 dependency (https://github.com/FEX-Emu/FEX/commit/81a420680579ed97e9a2ab478de7c5d39b32b4e1)
Factor out SetWriteCursorBefore (https://github.com/FEX-Emu/FEX/commit/a0bf6a4255695c5fd651f22fd90795ff2ca95c83)
WOW64 backend code invalidation fixes (https://github.com/FEX-Emu/FEX/commit/81c219c2129e43a6210ab2bb994d371b365003be)
Clang Format file and script (https://github.com/FEX-Emu/FEX/commit/53bed6de6a5bed0b1896b5a7a4b189c4820a80f9)
CI workflow to check clang-format usage on pull requests (https://github.com/FEX-Emu/FEX/commit/f9097c0a92ea500a3814711c55fb45768562fcf6)
Add second reformat to git blame ignore file (https://github.com/FEX-Emu/FEX/commit/55c054bd8e764325ae5fcba901b21b132f3822f1)
Reformat until fixed-point (https://github.com/FEX-Emu/FEX/commit/07e58f8db745b140b7b13641e8e427a606363a0d)
Create .git-blame-ignore-revs with whole-tree reformat sha (https://github.com/FEX-Emu/FEX/commit/7e97db8d9879db18dd9f1467cee2845348f7cb35)
Remove trace of clang-tidy experiment from CMakeLists.txt (https://github.com/FEX-Emu/FEX/commit/7614ac9f146d60cb002fb575df533ceb1b906c56)
Whole-tree reformat (https://github.com/FEX-Emu/FEX/commit/1d806ac21848937126fb0b486ee5ad363518b5c4)
Move const to the left in preparation for reformatting (https://github.com/FEX-Emu/FEX/commit/028c220041f5cebb714b698055f69f1e0017164e)
Enable jemalloc for ARM64EC (https://github.com/FEX-Emu/FEX/commit/a9b7ad841c428d28077878a43992bfaa759cc5f4)
Validate that we have no crossblock liveness (https://github.com/FEX-Emu/FEX/commit/e91e1d553308a110942b6f540a5d4bf68f8772c0)
Eliminate xblock liveness for shifts (https://github.com/FEX-Emu/FEX/commit/cbfa426b594e78abf10505329acffbfab2e330f5)

FEX - FEX-2404

Published by Sonicadvance1 6 months ago

Read the blog post at FEX-Emu's Site!

After last month having an absolute ton of improvements, this month of changes is going to look positively tiny in comparison. We have some good new
options for tinkering with FEX's behaviour and more performance improvements. Let's get into it!

Implement more memory model emulation toggles

The biggest performance hit with FEX's x86 emulation has always been emulating the memory model of x86. ARM has added various extensions over the years to make this emulation faster but it still isn't enough.

FEAT_LSE - Adds a bunch of atomic memory instructions
- Original ARMv8.0 doesn't support this. Massive impact on performance.
FEAT_LSE2 - Adds unaligned atomics (within a 16-byte granule) to improve performance of x86 atomics.
- Doesn't quite cover the full 64-byte cacheline of unaligned atomics that x86 supports
FEAT_LRCPC - Adds new load instructions which match the x86 memory model
FEAT_LRCPC2 - Adds even more loadstore instructions which match x86 instructions
FEAT_LRCPC3 - Adds even more, including vector loadstore instructions
- No hardware today supports this extension
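
To check which of these extensions a machine reports, one option on Linux is to look at the Features line in /proc/cpuinfo. The sketch below assumes the hwcap flag names used by the arm64 Linux kernel (atomics, uscat, lrcpc, ilrcpc, lrcpc3). The function name and the sample line are made up for illustration, and this is not necessarily how FEX itself detects features.

```python
# Assumed mapping from architecture feature to the arm64 Linux hwcap flag name.
FEATURE_FLAGS = {
    "FEAT_LSE": "atomics",
    "FEAT_LSE2": "uscat",
    "FEAT_LRCPC": "lrcpc",
    "FEAT_LRCPC2": "ilrcpc",
    "FEAT_LRCPC3": "lrcpc3",
}


def detect_features(features_line):
    """Map a /proc/cpuinfo 'Features' line to the extensions listed above."""
    flags = set(features_line.split(":", 1)[-1].split())
    return {feature: flag in flags for feature, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    # Hypothetical sample line, not taken from any specific device.
    sample = "Features : fp asimd aes crc32 atomics cpuid asimdrdm lrcpc dcpop asimddp"
    for feature, present in detect_features(sample).items():
        print(feature, "yes" if present else "no")
```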

Even with this set of extensions, emulating x86's memory model can incur close to a 10x performance hit. The impact is felt most in games because they use vector instructions very heavily, and without the FEAT_LRCPC3 extension there are no vector loadstore instructions that match the x86 memory model.
With this in mind, we are introducing some sub-options around emulating x86's TSO memory model to try and lessen the impact when we can get away with it. These new options can be found in the FEXConfig Hacks tab.

These two new options are only available for toggling when TSO emulation is enabled. If your CPU supports FEAT_LRCPC and FEAT_LRCPC2 then a recommended configuration is to keep the TSO Enabled option enabled, but disable the Vector and Memcpy options.
While this will incur a performance hit compared to disabling TSO emulation, it is significantly more stable to keep TSO emulation on.
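
The recommendation above can be written down as a tiny decision helper. The key names below are placeholder labels for the three toggles, not the real FEX configuration keys, and the function returns nothing when the text above makes no recommendation.

```python
def recommended_tso_settings(has_lrcpc, has_lrcpc2):
    """Return the suggested toggles, or None when no suggestion applies.

    Keys are placeholder labels, not actual FEX option names.
    """
    if has_lrcpc and has_lrcpc2:
        return {
            "tso_enabled": True,          # keep memory model emulation on
            "vector_tso_enabled": False,  # skip it for vector loadstores
            "memcpy_tso_enabled": False,  # skip it for REP MOVS / REP STOS
        }
    return None


if __name__ == "__main__":
    print(recommended_tso_settings(has_lrcpc=True, has_lrcpc2=True))
```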

If you still need more performance, then it may be beneficial to turn off TSO emulation entirely. It's unstable though! It's incorrect emulation to gain speed!

Vector TSO enabled

This option enables emulating the memory model around vector loadstore instructions. This has a HUGE performance impact even on latest generation hardware.

Memcpy TSO enabled

This option enables emulating the memory model around x86's REP MOVS and REP STOS instructions. These are used for doing memory copying and memory setting respectively.

The impact of this option depends heavily on the application. This is because most memcpy and memset functions actually use vectors to modify the memory.

JIT core improvements

Once again this month there has been a focus on JIT optimizations, although this time it might be hard to see what is improving. Overall in benchmarks
there has been roughly a 3% performance improvement, with a mixture of this month's improvements being foundational work to lower JIT compile time
overhead in the coming months. As usual there is too much to dive into each change individually so we'll just have a list.

Optimize LOOP/N/E
Negate more inline constants
Optimize PF calculation using integers rather than vectors
Optimize CLC
Optimize cmpxchg
A bunch of instructions cleaned up and rewritten to remove small amounts of overhead
Improves 32-bit address mode accesses
Implements support for prefetch and rdpid instructions
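
As an example of the kind of change in this list, the x86 parity flag (PF) is set when the low byte of a result has an even number of set bits, which can be computed with plain integer operations. The sketch below uses a common XOR-fold trick; it illustrates the idea and is not necessarily the instruction sequence FEX emits.

```python
def parity_flag(result):
    """Return the x86 PF value (1 = even number of set bits in the low byte)."""
    low = result & 0xFF
    low ^= low >> 4   # fold the high nibble into the low nibble
    low ^= low >> 2   # fold down to two bits
    low ^= low >> 1   # bit 0 is now the XOR of all eight bits
    return (~low) & 1  # XOR of the bits is 1 for odd parity, so invert it


if __name__ == "__main__":
    for value in (0x00, 0x01, 0x03, 0x1FF):
        print(hex(value), parity_flag(value))
```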

Optimize memcpy and memset IR operations when TSO emulation is disabled

Speaking of the previous optimization, we have now optimized the implementation of the memcpy and memset instructions to be significantly faster. Sometimes a compiler will inline these instructions, which was causing upwards of 5% of CPU time to be spent doing memory copies.

With this optimization in place we have benchmarked an improvement from 2-3GB/s up to 88GB/s! That'll teach that code to be slow.
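
One detail worth illustrating is why the raw changes mention falling back to a slow path for overlaps within 32 bytes. REP MOVSB with the direction flag clear copies one byte at a time from low to high addresses, so an overlapping copy gives a different result than a copy that reads a wide chunk before writing it. The sketch below models both on a bytearray; the helper names are made up and the wide copy only stands in for a hypothetical fast path.

```python
def rep_movsb(memory, dst, src, count):
    """Forward byte-by-byte copy, like REP MOVSB with the direction flag clear."""
    for i in range(count):
        memory[dst + i] = memory[src + i]


def wide_copy(memory, dst, src, count):
    """Stand-in for a fast path that reads the whole source before writing."""
    chunk = bytes(memory[src:src + count])
    memory[dst:dst + count] = chunk


if __name__ == "__main__":
    a = bytearray(b"AB......")
    b = bytearray(b"AB......")
    rep_movsb(a, dst=2, src=0, count=6)   # source and destination overlap by 4 bytes
    wide_copy(b, dst=2, src=0, count=6)
    print(a, b)
```

The byte-wise copy keeps re-reading bytes it has just written, so the two results differ, and a fast path has to detect close overlaps and defer to the byte-accurate version.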

Fix memory leaks in thread creation

FEX had a memory leak where it would leak some thread stacks when threads shut down. This has now been resolved, which lowers memory usage for
long running applications that shut down threads. In particular this makes Steam consume less RAM.

We have more memory leaks to solve as we move forward but they are significantly less severe than this.

A ton of small cleanups in the code

This month has had a lot of code cleanup in FEX, but it isn't user facing so it isn't very interesting. Let it be known, though, that something
like half the commits this month were cleaning up or restructuring various bits of code, work which doesn't get much of a spotlight.

Raw Changes

FEX Release FEX-2404

Allocator
Cleanup StealMemoryRegions implementation (https://github.com/FEX-Emu/FEX/commit/8a3d08e1d87c946ef3c5f349dca09004dfcfb8e6)
ConstProp
drop dead code (https://github.com/FEX-Emu/FEX/commit/202a60b77a1b4ba38c20584d09c00541e6a9e7a5)
ELFParser
Stop using a VLA (https://github.com/FEX-Emu/FEX/commit/32ec4a3c817c1d5d44ec96b00ab520d2e80ded23)
Externals
Update Catch2 to v3.5.3 (https://github.com/FEX-Emu/FEX/commit/b892da72f3081da5f20adeb6bb6d6eea9b17f161)
FEXCore
Fixes priority of FEX_APP_CONFIG (https://github.com/FEX-Emu/FEX/commit/7786c23405a538c2ae8a7aa1e803f9445ec01169)
Move nearly all IR definitions to internal (https://github.com/FEX-Emu/FEX/commit/e2a095372ec1a4980d54f0a6f04aade8a6ff382f)
Moves CodeLoader to frontend (https://github.com/FEX-Emu/FEX/commit/3bed305660668fd1891643ceaff320d6644869df)
Moves CPUBackend definition internal (https://github.com/FEX-Emu/FEX/commit/f6639c35945d4f3dd8f2f8efe8af40fce02a6054)
Remove DebugStore map (https://github.com/FEX-Emu/FEX/commit/2ad170b7d787b5740615bfab161e117e99a84e94)
Adds more TSO control levers (https://github.com/FEX-Emu/FEX/commit/24fd28ed9e3e4011a611df4de5dd752463be8f4a)
Removes vestigial mman SMC checking (https://github.com/FEX-Emu/FEX/commit/542f45463027f34f52085a805dc8f5b60e400429)
Fallback to the memcpy slow path for overlaps within 32 bytes (https://github.com/FEX-Emu/FEX/commit/167896dc9d97ebe431fde1220392cb2e702081dd)
Add non-atomic Memcpy and Memset IR fast paths (https://github.com/FEX-Emu/FEX/commit/7dcacfe9909488365035fff2606db20c363d1576)
FEXLoader
Add a way to sleep a process on startup (https://github.com/FEX-Emu/FEX/commit/624bc3fce5e49b86205974b041ba7b478837c6be)
Add some debug-only tracking for FEX owned FDs (https://github.com/FEX-Emu/FEX/commit/79454ed8a6932214de02f5db205a1b0b4e87ecc3)
InstcountCI
Adds a block that is causing panic spilling (https://github.com/FEX-Emu/FEX/commit/150af80f3fbf19f0e4cc5117f2eadaafb397ccf0)
IoctlEmulation
Add missing nouveau ioctl (https://github.com/FEX-Emu/FEX/commit/6d94d794091c1e04400863bf8e1a2f8a95245a85)
JIT
Optimize pmovmaskb with a named vector constant (https://github.com/FEX-Emu/FEX/commit/ab8ee64352e79ca70be1e0933de7ebe0fcdfd166)
Linux
Expose support for v6.8 (https://github.com/FEX-Emu/FEX/commit/e33a76a2ad960d650dc46666eb9d7cb53911ec08)
Threads
Fixes a stack memory leak for pthreads (https://github.com/FEX-Emu/FEX/commit/3d31291c3d9fbdcd0eb843ba0f9c9455f24c2c6a)
OpcodeDispatcher
clean up shifts (https://github.com/FEX-Emu/FEX/commit/c43af8e975ced26fb61f534aefd00ad96fd90ef9)
drop ZeroMultipleFlags (https://github.com/FEX-Emu/FEX/commit/b632f7215c059bb29a72c86e7041196eacbd0ff5)
eliminate xblock liveness for rcl/rcr (https://github.com/FEX-Emu/FEX/commit/cd9ffd204579371a885303e2c9fed69c3c57948b)
eliminate branch in cmpxchg pair (https://github.com/FEX-Emu/FEX/commit/aa26b6288e5071c12aac96f06f8c8f258d17e6c6)
Fixes 32-bit mode LOOP RCX register usage (https://github.com/FEX-Emu/FEX/commit/76983476b9be2e99a81306219260ffafbb9ae8b6)
optimize LOOP/N/E (https://github.com/FEX-Emu/FEX/commit/8852d94416ec14a0fa22dbf35dea2f1c72d4b2f7)
Implement support for the various prefetch instructions (https://github.com/FEX-Emu/FEX/commit/2a9fcc6a66d8f0dbb9a38370ad06af8bab4a33de)
Implement rdpid (https://github.com/FEX-Emu/FEX/commit/ba3029b1f66abb0716f0506ceadeb38952b2b88c)
RA
drop dead block interference code (https://github.com/FEX-Emu/FEX/commit/f2d001e7219b2af2474bcc630caffee28828b677)
Removes VLA usage (https://github.com/FEX-Emu/FEX/commit/67baff8a57029efe9f43c02ead2d4b0baa3b4e37)
Adds RIP when a block panic spills (https://github.com/FEX-Emu/FEX/commit/a8b59c16d6d27fdcf7d30d8490ff63597e7250ea)
RCLSE
Optimize store-after-store (https://github.com/FEX-Emu/FEX/commit/ca6b2e43e6de07ed29c1debfe49c2ffa2ead2ea3)
Telemetry
Allow redirecting directory that data is written to (https://github.com/FEX-Emu/FEX/commit/970d5d5b1340ec64f1b083553b333946498e0646)
Adds tracker for non-canonical memory access crash (https://github.com/FEX-Emu/FEX/commit/7f90ca53f7439624064daea59cc55e75af74b18f)
Rename old file instead of copying (https://github.com/FEX-Emu/FEX/commit/002ca360f80bb59bee904ebc2b16404d61398550)
Misc
Negate more to inline constants (https://github.com/FEX-Emu/FEX/commit/aa8d04c341764734b851e4cbddbf81a6b304dc90)
Minor cleanups around flags (https://github.com/FEX-Emu/FEX/commit/37f2b417e48bd263cee31a001c4fe13e877d445c)
Use scalar integer code to calculate PF (https://github.com/FEX-Emu/FEX/commit/bd0b5eceb8227b38f34f53e8708fa24f04db8ceb)
Eliminate xblock liveness with rep cmp/lod/scas (https://github.com/FEX-Emu/FEX/commit/e8abc88702dc6feb1710ed1f306f0551a76b9df6)
rewrite ROL/ROR (https://github.com/FEX-Emu/FEX/commit/29c6281e113e1d16a7dff7030eeb0ab49206525f)
Fix reference to out of bounds address in offsetof (https://github.com/FEX-Emu/FEX/commit/4214d9bda0e8b9f69785b776da0eddd8510f4257)
optimize clc (https://github.com/FEX-Emu/FEX/commit/b1ddd8cd3b53f0866f04668240a927daf25f18fd)
Moves FHU TypeDefines to FEXCore includes (https://github.com/FEX-Emu/FEX/commit/5c29c9d464cf4cb32d34d7c852f9b1e0babd93c5)
Optimize cmpxchg with flagm (https://github.com/FEX-Emu/FEX/commit/2a625a467bb1af9bfcdfc96d803b33eb5e6f5e18)
Eliminate crossblock liveness in xsave/xrstor (https://github.com/FEX-Emu/FEX/commit/d25ace43aaf09ae25bf83c07ac574692df6f31d8)
rewrite Demon Addition Adjust (DAA) and other demonic opcodes (https://github.com/FEX-Emu/FEX/commit/7b74ca19315c4abfa02d8c630d9e1aed85acf4d3)
Library Forwarding/vulkan: Fix query of vkCreateInstance function pointer (https://github.com/FEX-Emu/FEX/commit/4ea63059400951b5b8f81e58cebafbab4fc7eb21)
Put <20M in double quotes to avoid truncate error (https://github.com/FEX-Emu/FEX/commit/1450c92b60db10a1244d85acecec8b6a61d526e3)
Removes false termux support (https://github.com/FEX-Emu/FEX/commit/0c24aea27eb29fa3f66cb636941284eeeae02648)
Optimize DF representation (https://github.com/FEX-Emu/FEX/commit/cd2a6ce820ca46db8e38413b3cf372ccbb6a457c)
Library Forwarding: Don't map float/double to fixed-size integers (https://github.com/FEX-Emu/FEX/commit/4e269d8b801cb6e6ff2e0387da98770ee46cba2a)
Disable assert in release (https://github.com/FEX-Emu/FEX/commit/c37a12e80615d0b4759d9a62f9282809e2f1ccf5)
Improve 32bit ld/st addressing mode propagation (https://github.com/FEX-Emu/FEX/commit/ff0c7637c9aacf0ef3fb6e9d17cdaf6de13991e7)
Library Forwarding: Fix accidental data copying when converting from host to guest layout (https://github.com/FEX-Emu/FEX/commit/26a66790abacc65e16ded97484e8ef1e8ff55495)
unittests
ASM
Adds a test for overlapping memcpy using rep movs (https://github.com/FEX-Emu/FEX/commit/6ce366ef35780b783489f79995408d91a3ac3048)
Implements a unit test for https://github.com/FEX-Emu/FEX/pull/3478 (https://github.com/FEX-Emu/FEX/commit/caff3cb799f0e8a9be34d9321f556ce844781d3b)

FEX - FEX-2403

Published by Sonicadvance1 8 months ago

Read the blog post at FEX-Emu's Site!

Welcome back to another new tagged version of FEX-Emu! This month we have quite a few important bug fixes and optimizations, so let's get right into
it!

Steam fix

As of Steam's February 27th update there was a fairly major change to how Steam starts its embedded Chromium instance.
With this most recent change it is now run inside of the Steam Linux Runtime environment. In turn Steam has disabled the sandbox feature of the
Chromium instance because it is incompatible. FEX was already disabling this sandbox and forcibly passing in the argument to disable it.

Chromium really didn't like the argument being passed in twice and it was causing it to crash early. We have now removed our application profile and
let Steam configure the arguments as required.

As a side effect of Steam updating their version of Chromium, some users have noted that they are experiencing problems with GPU acceleration on
Raspberry Pi systems. This is seemingly a video driver problem and unrelated to this crash that was fixed. It is currently unknown if we can fix this
problem, as it is working fine on Tegra and Snapdragon systems.

Rootfs images updated

FEX's rootfs images have been updated to include the latest versions of Mesa, gfxreconstruct, and Renderdoc. The major change here is having Mesa
updated to 24.0 as the other two packages are mostly for developers.

Fix a potential hang on forking with memory allocations

We have fixed a known hang that occurs when a process is forking while another thread is allocating memory. This tended to occur as a hang when
running Proton applications. While this fixes one hang, we still have another one that sporadically happens that we haven't tracked down. While the
occurrence is relatively rare, it's good to watch the process trees if the program is stuck waiting on a futex.

A bunch of CPU optimizations

As per usual, this last month has added a bunch of CPU optimizations! We have noted up to a 14% performance improvement in one benchmark and an
average of around 4% in Geekbench. We need to commend our developers for hammering out these optimizations, since even a small optimization can
have a big impact on games that abuse a particular feature.

Use FlagM SETF8/16 for INC/DEC
Optimize LOCK DEC
Optimize ADC/SBC
Fuse add+cmn in to adds
Misc other optimizations
Optimize less than 32-bit add and sub

Small timestamp counter scaling

Recently we have found out that some games rely on an x86 CPU's RDTSC instruction operating at GHz frequencies. While making this assumption is not
a good idea, it is relatively common for x86 CPUs to have a very high frequency cycle counter, with most laptop CPUs operating in the 1-2GHz
range while desktop CPUs can go up to 3GHz in our testing.

Unreal Engine 5 has a new work graph system that spin-loops CPUs for a fixed number of cycles, expecting to not spin for very long. While this is
relatively okay at 1GHz, since it is only a few nanoseconds, when an ARM CPU's cycle counter gets added to the mix it starts encountering problems.

The primary problem here is that all 64-bit Snapdragon processors ship with a fixed rate 19.2MHz cycle counter. This continues all the way to their
latest flagship, the Snapdragon 8 Gen 3. Additionally, other ARM devices we have tested like the Nvidia Jetson ARM boards and Apple M1 also ship a
similarly low cycle counter. So while Unreal Engine will only spin-loop for a measly 1597 cycles, on an ARM board this takes ~51,000 nanoseconds but
on an x86 PC it only takes 591 nanoseconds! This was causing games to burn CPU time unnecessarily and run slower than they should!

To compensate for these slow cycle counters on most ARM devices, we are now scaling the value we return to the applications by multiplying it
by 128! This makes Snapdragon cycle counters behave more like a 2457MHz cycle counter, but with a 128 cycle granularity. This improves
the FPS in Tekken 8 and will also improve performance in all other Unreal Engine 5 games. There may be other games affected as well!
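
The arithmetic behind the scaling is easy to check. The sketch below uses the 19.2MHz counter and the 128x factor from the text; the helper names are made up, and the 2.7GHz x86 frequency is an assumption chosen because it reproduces the 591 nanosecond figure quoted above.

```python
SCALE = 128                  # multiplier from the release notes
SNAPDRAGON_HZ = 19_200_000   # fixed rate counter frequency from the release notes
ASSUMED_X86_HZ = 2.7e9       # assumption: reproduces the quoted 591 ns figure


def scaled_tsc(raw_counter, scale=SCALE):
    """Value handed to the guest: always a multiple of `scale`."""
    return raw_counter * scale


def effective_mhz(counter_hz, scale=SCALE):
    """Apparent counter frequency as seen by the guest, in MHz."""
    return counter_hz * scale / 1e6


def spin_ns(cycles, counter_hz):
    """Wall time needed for the counter to advance by `cycles` ticks."""
    return cycles / counter_hz * 1e9


if __name__ == "__main__":
    print(effective_mhz(SNAPDRAGON_HZ))            # apparent MHz after scaling
    print(spin_ns(1597, ASSUMED_X86_HZ))           # spin time on the assumed x86 counter
    print(spin_ns(1597, SNAPDRAGON_HZ * SCALE))    # spin time with the scaled counter
```

With the scaled counter a 1597 cycle spin lands in the same ballpark as on x86, at the cost of the counter only ever advancing in steps of 128.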

As a side note, a 1GHz cycle counter is now mandated by the ARMv8.6 and ARMv9.1 specs! So this problem will soon go away as new SoCs get on to the market.

Introduce ARM64EC static register allocation mapping

As part of the ongoing effort to support WINE's Arm64EC code, we have changed the order in which our registers get allocated to more closely match
what the Arm64EC ABI wants for register layout. Matching what Arm64EC wants for the register layout means that when the JIT jumps out to some code,
we shuffle less data around, which gives a performance improvement. The Linux side of code doesn't need this, so this only happens when building as a
WINE module.

32-bit thunking improvements

This last month has had an exciting milestone for 32-bit thunking! We have landed support for thunking Wayland on 32-bit, which means that with our
previously implemented Vulkan thunking, we can now run some games using Wayland plus Vulkan and Zink thrown into the mix! In particular we have been
able to test that Super Meat Boy works in this configuration! We still have more work to do before X11 and GLX work with 32-bit thunking so stay
tuned to the future!

Memory leak fix

FEX had an issue with long running processes leaking memory. This showed up in applications that would start hundreds of threads and tear them down
over and over. Steam is one of these long running processes that would starve the system of memory if left open overnight. This is because the
program spins up helper threads fairly aggressively and then shuts them down.

We have fixed one major memory leak but we still have a few more to go before it's nailed down!

Syscall passthrough optimization

One important thing to be wary of when running games is syscall overhead. Every time an ioctl or other syscall is made, FEX can incur significant
overhead compared to running the application natively. Additionally if we are passing syscalls through to glibc helpers then this can add more
overhead and sometimes introduce bad behaviour.

This month we spent some time looking at how syscalls are handled when we know that we can pass the data directly to the kernel. This allows us to
more quickly add new system calls when the kernel adds them, and ensures they are as fast as possible. With this optimization in place FEX now
directly emits small syscall handlers per syscall and jumps directly to the kernel if possible. This lowers CPU overhead for the most common syscalls,
thus removing emulation overhead. While FEX's syscalls were already fairly low overhead, this just improves the situation further!

Raw Changes

FEX Release FEX-2403

ASM
Another sign extend bug in https://github.com/FEX-Emu/FEX/pull/3421 (https://github.com/FEX-Emu/FEX/commit/b7984e86511cb923c77fc95aa448c098e3e5d757)
Arm64
Stop moving source in atomic swap (https://github.com/FEX-Emu/FEX/commit/98572b9e23c0935b4fa1012b748371de564eca4d)
Arm64Emitter
Introduce ARM64EC SRA mappings (https://github.com/FEX-Emu/FEX/commit/6ec628fa31c319cefcf02a72d4a56d537483e253)
CMake
Define _M_ARM_64EC when building for ARM64EC (https://github.com/FEX-Emu/FEX/commit/3e5694bd0632a35b6c8119eb794955c46ea5eec3)
FEXCore
Expose AbsoluteLoopTopAddress to the frontend (https://github.com/FEX-Emu/FEX/commit/d4be2dc6366471ba6ef567251c634012f977ed94)
Add a frontend pointer to InternalThreadState (https://github.com/FEX-Emu/FEX/commit/5769ffbba71ea6f660e827088ebd0ef7611f82f1)
FEXLoader
Allocate the second 4GB of virtual memory when executing 32-bit (https://github.com/FEX-Emu/FEX/commit/0b34035085474b980e162075799af1122aa35078)
FileFormatCheck
Fixes FD leak (https://github.com/FEX-Emu/FEX/commit/0505b30d34311c7f593df2a12140e1e8d799cae3)
InstCountCI
enable preserve_all (https://github.com/FEX-Emu/FEX/commit/2f9449cb5aad16288278ffcc31443fcb4befe8f8)
add FMOD block (https://github.com/FEX-Emu/FEX/commit/cc9c80d79f6125b0fc2e16739c354cb496498f9b)
add The Witcher 3 block (https://github.com/FEX-Emu/FEX/commit/60e8da05cd4079e45ba451b2cd26846aed41a7b1)
InstcountCI
Add a monster of a game block (https://github.com/FEX-Emu/FEX/commit/f41674bb7d22dac29dd863be81ea065c695f952e)
Adds a vector addition loop from bytemark (https://github.com/FEX-Emu/FEX/commit/44d1502b5aaeb4d0b62a56ffdae31df947a64cee)
Adds addressing limitations to instcountci (https://github.com/FEX-Emu/FEX/commit/cf067994f359586a5b42f4bfe1301c731329cbcf)
Linux
Converts passthrough syscalls to direct passthrough handlers (https://github.com/FEX-Emu/FEX/commit/b74de5305627feb0b5a91630699cddf7f8736a7f)
More safe stack cleanup for clone (https://github.com/FEX-Emu/FEX/commit/9687ac51f0edc5f9a2732e8492600dc25115f69a)
Make sure to destroy thread object when thread shuts down (https://github.com/FEX-Emu/FEX/commit/a7c7fe4a3516936bc0eaea4337c2273b9c1e4781)
OpcodeDispatcher
Don't use AddShift with no shift (https://github.com/FEX-Emu/FEX/commit/d24446ed139b6740d5e606ddc05b7028b5fecaf8)
RedundantFlagCalculationElimination
fix missing NEG case (https://github.com/FEX-Emu/FEX/commit/32a4abbea7e5a2e925452ec56b61828fcfde5eaa)
Syscalls
Fix SourcecodeMap generation for GDB JIT integration (https://github.com/FEX-Emu/FEX/commit/7e2f20cabb72c02756044fae55c637cb4b6897b5)
Misc
Removes steamwebhelper config (https://github.com/FEX-Emu/FEX/commit/d66a83a98fa48ba23217c11f6e5d45ba22588e10)
Use SETF8/16 for 8/16-bit INC/DEC (https://github.com/FEX-Emu/FEX/commit/ea7d1697b1c0068075afd3ea876cd19a79696842)
Library Forwarding: Update Vulkan definitions to v1.3.278 (https://github.com/FEX-Emu/FEX/commit/2dd922c3ce2aa4a54efef524748297afa2a73f6a)
Optimize lock dec (https://github.com/FEX-Emu/FEX/commit/009ae55ff090596bdb1c9ae12848d871321b6e44)
Misc little opts (https://github.com/FEX-Emu/FEX/commit/f27e2246e2df740dd19c2271c84590bb6592fd57)
Fix reserving range check (https://github.com/FEX-Emu/FEX/commit/4779fb74dea9ecac4d17e42d7e18278cab782c47)
Update xxhash to v0.8.2 (https://github.com/FEX-Emu/FEX/commit/139367d2489abc88408f66a4640edd84e65cdb30)
Optimize SBC (https://github.com/FEX-Emu/FEX/commit/49e798ab2b1a7e0b69e55eed313dd5a5b7356d66)
Update vixl (https://github.com/FEX-Emu/FEX/commit/8c0d5c6583adff1c858ad5ecefbf1eed225db341)
Capture a 64-bit process trying to jump to 32-bit syscall handler (https://github.com/FEX-Emu/FEX/commit/946c805d84b7cd375885e2b489ea3cbb6ca88be1)
Track unittest dependencies through to the custom target (https://github.com/FEX-Emu/FEX/commit/118b8b200eab0b5893885cd98e0fbf34f6890e9f)
Adds a unittest for a bug from https://github.com/FEX-Emu/FEX/pull/3421 (https://github.com/FEX-Emu/FEX/commit/aa9d7c5629c626ea8c9a2ee3d1b798e4b2dfd6b3)
Optimize ADC (https://github.com/FEX-Emu/FEX/commit/5f16f357afe2599c4d3927f026413721091fce4f)
Fixes zero register flag generation (https://github.com/FEX-Emu/FEX/commit/0ef72bf118bf1694777c2de4d6dc0501d50792b8)
Adds MGRR hottest block on render thread (https://github.com/FEX-Emu/FEX/commit/49ca0e2181e6e2a11cc646813faa01b4745ed239)
Fuse add + cmn -> adds (https://github.com/FEX-Emu/FEX/commit/d8a18687e847646a977a55e026d40e84a6ea2035)
Moves SignalDelegator TLS tracking to the frontend (https://github.com/FEX-Emu/FEX/commit/9b93495d45261ad640a5f2de922b067bfd7cc066)
Moves JITSymbol allocation (https://github.com/FEX-Emu/FEX/commit/59ec88f48d9014887609e0c89f5b51d0730b6e43)
Fix instcountci (https://github.com/FEX-Emu/FEX/commit/90256730c36ff87717ebe8f9cc5bc3f1121387ef)
Simplify CalculateAF (https://github.com/FEX-Emu/FEX/commit/5378ae2e7620f8a1d76e8d0dcb73b3622ec90e31)
Library Forwarding: Add support for 32-bit Wayland (https://github.com/FEX-Emu/FEX/commit/66feea9e8e4b6456d448da85bfff2ecb78ca94b6)
Implement small TSC scaling (https://github.com/FEX-Emu/FEX/commit/2bcd2858511c511010a4a6dfed8b25787a94c632)
Cleanup NZCV metadata (https://github.com/FEX-Emu/FEX/commit/9c38332e7ea9a42bd1caabe6cae26e7d41ddf552)
Fixes VDSO crash in 64-bit code (https://github.com/FEX-Emu/FEX/commit/bbac014d6d56febb60880f57b822df877921fdef)
Library Forwarding: Allocate packed arguments on the guest stack if needed (https://github.com/FEX-Emu/FEX/commit/9cab746aa71a7b378d3e8beae3fcc39ad87ddb3b)
Library Forwarding: Disable struct padding for packed arguments (https://github.com/FEX-Emu/FEX/commit/b888bb5ce5156655e9263fd8403e458ef685608d)
Optimize 8/16-bit adds/subs (https://github.com/FEX-Emu/FEX/commit/47218254a10cfe68b4fef578862ea01a27f6450c)

FEX - FEX-2402

Published by Sonicadvance1 8 months ago

Read the blog post at FEX-Emu's Site!

Welcome back everyone! After last month's cancelled release, and with this month's release being a bit late, we have a lot of changes to cover.

More JIT performance improvements

A lot of the work these past two months has gone into optimizing our JIT further. We ran Geekbench and Bytemark against these changes, which showed a marginal
performance improvement in these benchmarks, with Bytemark showing the biggest improvement of 16% in one sub-benchmark. A lot of the performance
improvements target real-world applications rather than benchmarks, so games tend to see more of an improvement than the benchmarks do.

As usual, explaining each individual optimization would take too long, so we're going to spam out a bunch in a list.

Removes a vtable indirection for syscalls
Fix RCL/RCR wraparound behaviour
Remove process-wide lock in JIT
Fixes syscall rcx/r11 state
Optimize SIB address calculation from three instructions to one
Optimize TST instruction with -1
Optimize TST more
Improve XCHG instructions
Optimize rotates
Optimize CDQ
Optimize shifts
Optimize PTEST, VTESTP, PDEP
Optimize SHA256 instructions to remove spilling
Optimize CMPXCHG
Stop zero extending a bunch of instructions where it doesn't matter
Optimize ANDN
Optimize a bunch of instructions using NZCV flags
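
The list above doesn't say what was wrong with the RCL/RCR wraparound, so as background only, here is a minimal sketch of the architectural behaviour these instructions need to match. The rotate count is first masked (5 bits, or 6 bits for 64-bit operands) and then wraps modulo the operand size plus one, since the carry flag takes part in the rotation. The function name `rcl` is ours for illustration and is not taken from FEX.

```python
def rcl(value: int, carry: int, count: int, size: int) -> tuple[int, int]:
    """Rotate `value` left through the carry flag, x86 RCL style.

    Returns (result, carry_out). `size` is the operand width in bits.
    """
    mask = (1 << size) - 1
    # The count is masked to 5 bits (6 bits for 64-bit operands)...
    count &= 0x3F if size == 64 else 0x1F
    # ...and the rotation then wraps modulo size + 1, because the carry flag
    # acts as one extra bit of the rotated value.
    count %= size + 1
    width = size + 1
    wide = ((carry & 1) << size) | (value & mask)  # CF sits above the top bit
    rotated = ((wide << count) | (wide >> (width - count))) & ((1 << width) - 1)
    return rotated & mask, rotated >> size
```

For 8-bit and 16-bit operands the wraparound is the interesting part: an 8-bit RCL by 9 is a full trip around the 9-bit value and changes nothing.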

Fix glibc clone usage of CLONE_CLEAR_SIGHAND

Newer glibc versions starting with 2.38 have started using this new clone flag for executing a program. We also fixed this in 2312.1 but can now make
a note of it.

Fix VDSO symbol fetching on ARM64

This is a fairly minor change, but it fixes what can be a big performance hit. When FEX was querying for VDSO interface functions we were using the wrong names on
ARM64. Since the wrong names were used, this meant we always fell back to the slower glibc implementation of functions. This in particular fixes a
performance hit when games call clock_gettime excessively.
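
A rough sketch of why the wrong names mattered: the Linux vDSO exports its functions under architecture-specific symbol names (`__vdso_*` on x86-64, `__kernel_*` on ARM64), so a lookup with the x86-64 names on an ARM64 host finds nothing and silently lands on the fallback. The `resolve` helper and the `exported` dictionary standing in for the vDSO symbol table are hypothetical and only model the lookup; they are not FEX's code.

```python
# Symbol names the Linux vDSO exports on each architecture.
VDSO_SYMBOLS = {
    "x86_64": {
        "clock_gettime": "__vdso_clock_gettime",
        "gettimeofday": "__vdso_gettimeofday",
    },
    "aarch64": {
        "clock_gettime": "__kernel_clock_gettime",
        "gettimeofday": "__kernel_gettimeofday",
    },
}


def resolve(function, naming_arch, exported, fallback):
    """Look up `function` in the (simulated) vDSO symbol table `exported`.

    `naming_arch` selects which architecture's naming scheme is used for the
    query. A miss is not an error: the caller just gets the slower fallback.
    """
    name = VDSO_SYMBOLS[naming_arch][function]
    return exported.get(name, fallback)


# Simulated ARM64 host: only the __kernel_* names exist.
arm64_vdso = {"__kernel_clock_gettime": "fast vDSO path"}
wrong = resolve("clock_gettime", "x86_64", arm64_vdso, "slow glibc path")
right = resolve("clock_gettime", "aarch64", arm64_vdso, "slow glibc path")
```

Because the miss is silent, nothing breaks functionally; the only symptom is that every call takes the slow path.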

Fix Proton again

Sometime in December there were some changes to Valve's Proton layer which broke it under FEX. This has now been fixed.

Expose Linux 6.6

With some relatively minor changes we now support reporting kernel version 6.6 to the guest application. This gives us a range from v5.0 to v6.6 now.

Workaround hang when process is forking

A long-standing bug in FEX is that sometimes a process can hang when it is forking, usually to execute another program. We have now worked around this
issue to an extent that lets the application continue. It's not a full fix because a crash can still occur, but a crash is easier to notice than a
program hanging forever in the background.

Commonize some WOW64 code to share with ARM64ec

In preparation for sharing some code with FEX built for ARM64ec, we have moved some Windows code to a common location so that it can be shared.

An absolute ton of work went in to thunking

Over the past couple months this has been one of the more active projects within FEX. Today FEX has support for thunking 64-bit x86 libraries across
to ARM64. A significant portion of this work is doing analysis of API interfaces in order to allow thunking 32-bit x86 libraries over to ARM64
libraries with data repacking. This isn't yet complete but since a ton of work has gone in to this, we wanted to call it out.

NOTE: Memory leak on long-running processes like Steam

We have found a memory leak when a process shuts down a thread that has been around for quite a while. We only identified this memory leak this last month and it hasn't been fixed yet.
We are hoping to fix this bug for the next release, but be aware that long-running processes like Steam have a relatively aggressive memory leak. This is exacerbated by how Steam spins up threads for
doing work, which makes this application particularly heavy.

Raw Changes

FEX Release FEX-2402

Arm64
Removes a vtable indirection in syscalls (https://github.com/FEX-Emu/FEX/commit/743df8dfae711432ab2b613e2ba12f93c44144b6)
BranchOps
Fix unused-variable warning (https://github.com/FEX-Emu/FEX/commit/f515b1eeb126bace3e5cb5375d97d0c203439399)
ArmEmitter
Support single use forward labels (https://github.com/FEX-Emu/FEX/commit/8c3163096b2c99b3d7eeec46433bf30edcdf07d8)
CPUID
Removes Init and just uses constructor (https://github.com/FEX-Emu/FEX/commit/f5997a084efe6392736aeaeff23db8f4764cfc0b)
Config
Fixes JSON parsing of "ArgumentHandler" types (https://github.com/FEX-Emu/FEX/commit/bd1e029f35abfa36d587f98ae9046b8b67176460)
Dispatcher
Convert GetCompileBlockPtr to using PMF helper (https://github.com/FEX-Emu/FEX/commit/82ce76b1d304784a03cc02744175ae0909cbab4a)
Removes unused asserting CompileBlock function (https://github.com/FEX-Emu/FEX/commit/5b4e9c69074b2061369ae8d9788fbd0efc28c245)
Externals
Update xbyak to v7.02 and switch away from fork (https://github.com/FEX-Emu/FEX/commit/9da08b40bd5d35e65472370373da9ca75b2bfc65)
FEX
Removes legacy kernel 32-bit allocator (https://github.com/FEX-Emu/FEX/commit/de2cd469a225bbd5f556e5d7861271e4a50234ce)
FEXConfig
Initialize paths before trying to read configuration files (https://github.com/FEX-Emu/FEX/commit/79526b9c9e049f28a6e8c076adec705dbe146e15)
FEXCore
Fix RCL/RCR shift wraparound behaviour (https://github.com/FEX-Emu/FEX/commit/c0be9742728dafef33ef506796e5d6dfd55d4408)
Use TMP1-4 for values that need preserving across spills (https://github.com/FEX-Emu/FEX/commit/0e97f8fb2bb9c28607c787f9a8b36909acc57d5d)
Decompose some std::function usage to regular pointers (https://github.com/FEX-Emu/FEX/commit/615cfe024624a8fae522e799cc3e52e310f3b985)
Pass thread object to HandleUnalignedAccess (https://github.com/FEX-Emu/FEX/commit/d488592eda8cc1c2b0653ee3bb1067d369c1cc18)
Removes SRA option, it's now permanently enabled (https://github.com/FEX-Emu/FEX/commit/5467c3e47881a94463483bb382cb1a3d1e89e11b)
Removes context wide and map lookup (https://github.com/FEX-Emu/FEX/commit/eea2e7bb576e5145a8f1eb6189486060b73cbdd9)
Optimize HostFeatures and CPUID feature calculation (https://github.com/FEX-Emu/FEX/commit/0071c1bda27f6d7082c6b5299bb635c597798c45)
Warn if MDWE is set (https://github.com/FEX-Emu/FEX/commit/00669a1c8947880160f040d2ebb6c3113c962e09)
Changes ParentThread ownership from the CTX to the frontend, take 2 (https://github.com/FEX-Emu/FEX/commit/b4b8e81f24a0512c46c4423735d06fdb7f0e764f)
Describe exit function linking object with a structure (https://github.com/FEX-Emu/FEX/commit/93ec676ce810bd36cfcbc4fcda98cc1e15db2a02)
Removes stale references to x86 JIT (https://github.com/FEX-Emu/FEX/commit/86654907bf7a8f4091d4619ea24d94eff753b722)
Removes old InternalThreadState header (https://github.com/FEX-Emu/FEX/commit/12b72f908b4e16f0e65f927e98e2c8ff7d96108b)
Moves OS thread creation to the frontend (https://github.com/FEX-Emu/FEX/commit/0a4e064da4ff901e42940e1890c14ffdcda620a9)
Moves XID check to the frontend (https://github.com/FEX-Emu/FEX/commit/7524029a066b7e227300bcea43b2cea7c49519e1)
FEXLinuxTests
Fix build warnings (https://github.com/FEX-Emu/FEX/commit/d34302a968e20b4a9a8f7f97d3c48b9c61531717)
FEXLoader
Moves thread management to the frontend (https://github.com/FEX-Emu/FEX/commit/5e26b77a7c0ecc640fc84918de287b600f03a53a)
Temporarily disable CLONE_CLEAR_SIGHAND (https://github.com/FEX-Emu/FEX/commit/26c9d5d427999ea329124894497c8de059ff137d)
Fix incorrect format strings (https://github.com/FEX-Emu/FEX/commit/3a5ac39bb1c8ad49e5062dc07d5763ff5c5bf47a)
GdbServer
Fixes crash on gdb detach (https://github.com/FEX-Emu/FEX/commit/a1cf14f2d9fb5ebd2b6ee4d61c404bab4375a93a)
HostFeatures
Supports runtime disabling of preserve_all (https://github.com/FEX-Emu/FEX/commit/235f32ce8cd41438c41312f7686c4d148ab98c77)
InstCountCI
Fixes test to not use relative data (https://github.com/FEX-Emu/FEX/commit/3db31a655c88616b5970ad4839c804ddf3d552c4)
JIT
Fixes broken register in VTBX1 (https://github.com/FEX-Emu/FEX/commit/98419839550cdffbe51bfb1fd4aef60e859a1fc3)
Jitarm64
Implements spin-loop futex for JIT blocks (https://github.com/FEX-Emu/FEX/commit/750b0b70bc5bbdefb23d00b54a0d08e14db459f1)
Linux
Decouple thread object creation and tracking (https://github.com/FEX-Emu/FEX/commit/5e5984a29b9f6852d247398aff96d1c3d19d509b)
Implements a fault safe memcpy routine (https://github.com/FEX-Emu/FEX/commit/8e3d4a3e0251bdebeb0408f87b415532bb580bc9)
Adjust when clone allocates stack memory (https://github.com/FEX-Emu/FEX/commit/bd1305270832ba447c72844208534da5afdd31e4)
OpcodeDispatcher
Fixes syscall rcx/r11 generation (https://github.com/FEX-Emu/FEX/commit/56d8080ec952676080104020ee9f4e79a2541690)
Initial support for runtime long-mode switch (https://github.com/FEX-Emu/FEX/commit/4b3792196fa32773270c50e2ea8d4d1db81c9b53)
Fixes flags generation in imul (https://github.com/FEX-Emu/FEX/commit/3d2cbc5d08ff1549d872ef71aac31a44a843a9ca)
Optimize SIB addr calculation (https://github.com/FEX-Emu/FEX/commit/81c85d73b28642ae9d4c9978a6aecc0b60f26634)
PassManager
Removes unused exit handler (https://github.com/FEX-Emu/FEX/commit/12923ba1b784bce0b1cd35baaf50715055f19056)
Scripts
More changes to InstallFEX script (https://github.com/FEX-Emu/FEX/commit/b613576ac42bfb48374a7af81e1e3e8ec663ab8f)
Updates InstallFEX with supported Ubuntu versions (https://github.com/FEX-Emu/FEX/commit/eb5cf1a667b5648f7737fefdef45c4f6b5a3c512)
SpinLockWait
Fixes unexpected lock success (https://github.com/FEX-Emu/FEX/commit/472a701e2b3e2a4df2271f048fea04d1521849d5)
SpinWaitLock
Removes unused variable in spin-loop fallback (https://github.com/FEX-Emu/FEX/commit/a4a1d607d5f9ed2f271520f260a9f5041cc39af9)
TestHarnessRunner
Move to its own tool folder (https://github.com/FEX-Emu/FEX/commit/68d6cf5f1446541ec81f02ed51df71654c92a200)
ThreadManager
StealAndDropActiveLocks in the child forked process (https://github.com/FEX-Emu/FEX/commit/930d2650f81b4e3788d13efadb4b3413d49f31bc)
Thunks
Add workarounds for pointers not readable by 32-bit guests (https://github.com/FEX-Emu/FEX/commit/60b0852cde83233bef2c92a36a0e846e75c7e799)
Allow querying customized functions through vkGetInstanceProcAddr (https://github.com/FEX-Emu/FEX/commit/86c6ca3049d9f77d4677ebe8cf1ee5104cd9da14)
Tools
Adds new FEXpidof tool (https://github.com/FEX-Emu/FEX/commit/be4d1a88603559fd464e2f145c58627408e7768f)
Moves IRLoader to independent folder (https://github.com/FEX-Emu/FEX/commit/9dda9605294b2f0bc28d1228bc67dd635bd614ac)
VDSO
Fixes symbol fetching for a few symbols (https://github.com/FEX-Emu/FEX/commit/4333261639e66125075e1ce3f675ef2bf110ab0a)
Windows
Commonise WOW64 logic that can be shared with ARM64EC (https://github.com/FEX-Emu/FEX/commit/8ff4b525fcb838e29a3ab184e0bf329487f3ccbe)
X86Tables
Converts tables to be mostly consteval (https://github.com/FEX-Emu/FEX/commit/ec89a00a9234d5913e356f9f81cfabc4c4518336)
Misc
Fix https://github.com/FEX-Emu/FEX/issues/3419 (https://github.com/FEX-Emu/FEX/commit/780b48620bbb10daab729f02d4cf82ed29e563a1)
Add NZCV+PF/AF optimization pass (https://github.com/FEX-Emu/FEX/commit/df3d6938ae8cba911f1ba10f11d1d7591a4c31ad)
Fixes one mutex hang (https://github.com/FEX-Emu/FEX/commit/ba41da7da0501a5d08469f7536cd8807c415f1f4)
Optimize test -1 (https://github.com/FEX-Emu/FEX/commit/806e5b8bcf7690f93e25a468d267e5aa14269daa)
Optimize TST (https://github.com/FEX-Emu/FEX/commit/4331753ca0aed7846515488f648fdebda2551e4c)
Clean up access to possible nullptr (https://github.com/FEX-Emu/FEX/commit/cdcc432399806e02cbdc729b05d20224203a55de)
Revert "Add cmake option DISABLE_CLANG_PRESERVE_ALL" (https://github.com/FEX-Emu/FEX/commit/435b67aa0c7db0872013470b31d63a83d5860446)
Revert "Revert "FEXLoader: Moves thread management to the frontend"" (https://github.com/FEX-Emu/FEX/commit/a41ebe2d85382ddc5c7065ab5e12ca105897c4fd)
Eliminate spilling in sha256rnds2 and sha1rnds4 (https://github.com/FEX-Emu/FEX/commit/557cb593e14a528cdb80325c0f57e766f655c200)
Revert https://github.com/FEX-Emu/FEX/pull/3303 (https://github.com/FEX-Emu/FEX/commit/6993f4fd8dbe73b78050be201c5cb7ea529b437d)
Improve XCHG operations (https://github.com/FEX-Emu/FEX/commit/920a8db3696f59af5112043090914a6b40215f85)
Library Forwarding: Handle cross-architecture differences of integer types (https://github.com/FEX-Emu/FEX/commit/9c37c0f1c30cf375e646f3c8941a2bb6188fd043)
Optimize CDQOp (https://github.com/FEX-Emu/FEX/commit/cec1814a09eb5b4c3c87631d36fdfca31c24088e)
Optimize rotates (https://github.com/FEX-Emu/FEX/commit/4d49ac7c3d2257ae60f538e586f6673f8840e0cf)
Code cleanup - mainly dead store removal; NFC (https://github.com/FEX-Emu/FEX/commit/6d13d9fb56418727a7530cec82909809d10e07b6)
Optimize shifts a bit (https://github.com/FEX-Emu/FEX/commit/ae7dc250db74f89265840296c52eab47ecf8a404)
Optimize bit manipulation instructions (https://github.com/FEX-Emu/FEX/commit/f4086b25e626e8d7f160c7ba0493cdabbacea57d)
Add cmake option DISABLE_CLANG_PRESERVE_ALL (https://github.com/FEX-Emu/FEX/commit/37a611bcd6bcc2bc0929fecb01e7651f50359dc0)
Library Forwarding: Don't attempt custom repacking for non-struct types (https://github.com/FEX-Emu/FEX/commit/2884337d85c2d69ac48bb06befcf5d0b3a4366ee)
Check that path arguments to TestHarnessRunner exist (https://github.com/FEX-Emu/FEX/commit/3036f3b3ff7f71764f13fe6afc773d8428b0a89a)
Allow upper garbage on a bunch of instructions (https://github.com/FEX-Emu/FEX/commit/fa3352004ec3801f9b50f66e6dfc73c2b58a9364)
Fix typos; NFC (https://github.com/FEX-Emu/FEX/commit/bc67910ee4e5138aaa29801804fca179d0609873)
Optimize PTEST and VTESTP (https://github.com/FEX-Emu/FEX/commit/31a4158957f15a93b266549417f8d5840a74f2ea)
Optimize PDEP (https://github.com/FEX-Emu/FEX/commit/58f3d3caf5db10163078af2f2a0521cc30af4bf3)
Library Forwarding: Flip order of nullptr check and custom repacking (https://github.com/FEX-Emu/FEX/commit/d04f3a288e46c05ea86e0d9bff1f52dee72df491)
Library Forwarding: Test interaction of struct repacking and assume_compatible_data_layout (https://github.com/FEX-Emu/FEX/commit/3bfb4bedd5614de401a5263dc27495b4b683f7c2)
Improvements to the Dockerfile (https://github.com/FEX-Emu/FEX/commit/162733130cdf5c916bd18321927cd9a9f351792b)
Library Forwarding: Implement assisted struct repacking (https://github.com/FEX-Emu/FEX/commit/6efc4a967c5a4920c3a64479bf90ca7699e424c4)
Optimize GPR cmpxchg (https://github.com/FEX-Emu/FEX/commit/f956f008eaf33884392043ddf75c131bd75dfd40)
Library Forwarding: Implement automatic argument repacking (https://github.com/FEX-Emu/FEX/commit/832072362bae3b857c4645b792f6ac69f74f4897)
Fixes some new glibc allocations that cropped up (https://github.com/FEX-Emu/FEX/commit/dae16aaf18c1afe6621c3570caf9561fd98d5952)
Linux uprev to v6.6 (https://github.com/FEX-Emu/FEX/commit/c333aac4f9021036f636e3bf96ba1c6f05795e3b)
Revert "FEXLoader: Moves thread management to the frontend" (https://github.com/FEX-Emu/FEX/commit/db7d7a6bd75a31dc1edb8cd9ed19afb87a28ec89)
FEXCore interface cleaning (https://github.com/FEX-Emu/FEX/commit/04a88ed3abd140ee6ac3e367182c8655b0455878)
Removes IRLoader, unittests, and public interface (https://github.com/FEX-Emu/FEX/commit/f785b38e4da6447a0e252a16546e47bc592c8389)
Support disabling SHA in CPUID (https://github.com/FEX-Emu/FEX/commit/058e691ef122075cb556b36166bc5b2e7501592e)
Library Forwarding: Emit layout helpers to allow repacking struct data (https://github.com/FEX-Emu/FEX/commit/d806db53ecff84cf4253e6a45e9c94c9d821941b)
Fixes Proton again (https://github.com/FEX-Emu/FEX/commit/b1c3737aef7a9bd9f15eddfd721d1b7edd780726)
rm andn masking (https://github.com/FEX-Emu/FEX/commit/0fe5e3d1e783906f6962b3b781638befb79ac248)
instcountci
Adds panicspill test from steamwebhelper (https://github.com/FEX-Emu/FEX/commit/b9378856da20af5179ee8d46d3998a54d96ac219)

FEX - FEX-2312

Published by Sonicadvance1 11 months ago

Read the blog post at FEX-Emu's Site!

We're back with another month of changes. After last month being a bit slower, we're back in the swing of implementing more optimizations and bug fixes. No dilly dallying, let's get right in to it!

More optimizations this month!

Once again this month has a whole bunch of optimizations that are very exciting! We will lightly go over the changes to talk about what changed.

Keep guest SF/ZF/CF/OF flags resident in host NZCV

This is one of the bigger optimizations this month. A bit of backstory is needed for what this optimization is for. x86 has a flags register called EFLAGS which contains quite a few random bits of information. The subflags we care about here are the SF, ZF, CF, and OF flags inside of it. These flags are typically set by ALU operations to describe the result. So something like an integer Add will set ZF if the result was zero, SF if the result has the sign bit set, CF if a carry occurred, and OF if the operation overflowed. These are usually quite cheap for the CPU to calculate by itself, but manually calculating the flags usually takes a few additional instructions each.

The original implementation inside of FEX for calculating these flags would spend the additional instructions and calculate each one manually. This
would usually end up with a dozen or so of additional instructions for calculating flags. While FEX would typically optimize out the calculations if
they weren't used, it would still add CPU time when we couldn't.

Luckily ARM also has a flags register called NZCV which maps almost perfectly to x86's EFLAGS. This lets us optimize these instruction implementations
to instead use the ARM flags directly. This has a couple of effects: not only does it remove the instructions from our code generation, it also has the
knock-on effect that the flags now stay inside of NZCV, which reduces memory accesses. A multi-hit combo for improving performance.

While not all x86 instructions map their flags registers 1:1, this has a fairly significant performance uplift in most situations!
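
To make the "almost perfectly" concrete, here is a small model of the correspondence: N, Z, C, V line up with SF, ZF, CF, OF for additions, while for subtractions ARM's carry is the inverse of x86's borrow-style CF. The function names are ours for illustration; this models the architectural flag definitions and is not FEX's implementation, and how FEX deals with the inverted carry is not described in these notes.

```python
def arm_add_with_carry(a, b, carry_in, bits=32):
    """Model of ARM flag-setting addition: returns (result, N, Z, C, V)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b + carry_in
    result = full & mask
    n = int(bool(result & sign))
    z = int(result == 0)
    c = int(full > mask)
    # Signed overflow: both inputs share a sign that the result does not.
    v = int(bool((a ^ result) & (b ^ result) & sign))
    return result, n, z, c, v


def arm_adds(a, b, bits=32):
    return arm_add_with_carry(a, b, 0, bits)


def arm_subs(a, b, bits=32):
    # ARM subtracts by adding the bitwise NOT of b plus one.
    return arm_add_with_carry(a, ~b, 1, bits)


def x86_flags_from_nzcv(n, z, c, v, is_sub=False):
    # For subtraction ARM sets C when there was NO borrow, while x86 sets CF
    # when there WAS one, so the carry has to be inverted.
    return {"SF": n, "ZF": z, "CF": (c ^ 1) if is_sub else c, "OF": v}
```

For example, `x86_flags_from_nzcv(*arm_subs(1, 2)[1:], is_sub=True)` reports a set CF, matching what `sub` does on x86 when a borrow occurs.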

Dedicate registers for PF/AF

Related to the previous change, x86 has two flags stored inside of EFLAGS that don't have a direct equivalent on ARM CPUs. These two
flags are fairly uncommon but instructions will still generate them. These flags have the additional problem that they are fairly costly to calculate,
with one of them requiring a GPR population count instruction, which ARM doesn't even support until the new CSSC instruction extension. While in
most cases the result of these flags isn't used, the overhead of calculating them can add up a bit. This is why we are now dedicating two registers to
these flags to reduce their overhead as much as possible!
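
For reference, this is what the two flags compute architecturally: PF reflects the parity of the low byte of the result (hence the population count), and AF is the carry or borrow out of bit 3. The sketch below uses illustrative names of our own and says nothing about how FEX lays these values out in its dedicated registers.

```python
def parity_flag(result: int) -> int:
    # PF is set when the low 8 bits of the result hold an even number of 1s.
    return int(bin(result & 0xFF).count("1") % 2 == 0)


def aux_carry_flag(a: int, b: int, result: int) -> int:
    # AF is the carry (or borrow) out of bit 3. XORing both inputs with the
    # result leaves exactly the carries that flowed into each bit position,
    # so bit 4 of that value is the carry out of bit 3.
    return ((a ^ b ^ result) >> 4) & 1
```

The same `aux_carry_flag` expression covers both addition and subtraction, as long as `result` is the actual sum or difference.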

Misc optimizations

Optimize BT/BTC/BTS/BTR
Optimize shifts/rotates
Optimize selects & branches & more nzcv goodies
Optimize three sha instructions
Make "not" not garbage
Optimize memcpy and memset when direction is compile time constant

With all these optimizations in place this month we have a fairly significant performance uplift!

<-- Geekbench and bytemark graphs -->

While Geekbench is showing a fairly modest 17.6% performance uplift, bytemark is showing up to a 60% performance uplift! Over the course of
the last three months we have had benchmarks that have improved by over 100%! These improvements can be seen in games as well, with some CPU heavy
games having had their FPS improve by over 2x. A lot of the games we tested have even changed from being CPU limited to GPU limited on our Lenovo X13s
laptops! We are looking forward to when new laptops based on Snapdragon X
Elite are released in the middle of next year!

Various bug fixes

In addition to performance improvements, we have some bug fixes this month.

Fixed corruption in the JIT
- Caused corruption with x87 heavy games
Fixes integer multiply corrupting results
- Corrupted some register state, which was breaking the game Dungeon Defenders

Support extracting erofs images

One of the features that FEXRootFSFetcher was missing was the ability to extract erofs images once downloaded. This was because we didn't know that
erofs-utils provided an application for extracting these images without FUSE. Turns out the developers put an extractor inside of their fsck
application that we had completely missed! Now if a user wants to extract an x86 rootfs image for lower overhead, they can do this directly from our
FEXRootFSFetcher tool.

Preparation for improving gdbserver

GDBServer is a socket interface that GDB supports for remotely debugging applications. One of the harder things about working on FEX-Emu is
debugging an application running under it. GDBServer is a way to improve this situation so that GDB can remotely connect to a FEX process.

There's a bunch of work this month towards cleaning up this interface and getting it to work correctly. While it is still not quite usable for
debugging, we are working towards this so applications can actually be debugged!
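
As background on what such a socket interface exchanges, the GDB remote serial protocol frames every packet as `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits. The helpers below are a simplified sketch with names of our own; they skip the protocol's escaping and run-length encoding, and they are not taken from FEX's gdbserver.

```python
def frame_packet(payload: str) -> bytes:
    """Wrap a payload in GDB remote protocol framing: $payload#xx."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


def parse_packet(packet: bytes) -> str:
    """Validate the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    data, received = packet[1:-3], int(packet[-2:], 16)
    if sum(data) % 256 != received:
        raise ValueError("checksum mismatch")
    return data.decode("ascii")
```

A real stub also has to escape `$`, `#`, and `}` inside payloads and acknowledge packets with `+` or `-`, which this sketch leaves out.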

Improvements to WOW64 compatibility for newer WINE

Newer versions of WINE have changed some behaviour around WOW64 support. So this month we have added support for some of this newer behaviour. Thanks
again to Bylaws for implementing this!

FEX rootfs image updates

This month we are updating our rootfs images to incorporate the latest Mesa 23.3.0 release that
occurred a few days ago. We have updated our Ubuntu 22.04, 23.04, 23.10, ArchLinux, and Fedora 38 images with this latest version of Mesa. As usual if
there are any issues, let us know so we can sort them out.

Raw Changes

FEX Release FEX-2312

Arm64Emitter
Dedicate registers for PF/AF (https://github.com/FEX-Emu/FEX/commit/9b646746b21e0ec4f566fa46033798b3e40a9da3)
Fixes warning (https://github.com/FEX-Emu/FEX/commit/bf147f47b5bc5480b757c887891ff5449152f382)
Config
Removes Threads option (https://github.com/FEX-Emu/FEX/commit/3c7335713dc29b2ca86a58145784c358ac5f5f96)
Dispatcher
Fixes corruption when spilling SRA registers (https://github.com/FEX-Emu/FEX/commit/f6b1434d63f1074f9ea6823b7fd91355fb00413c)
EmulatedFiles
Stop relying on O_TMPFILE (https://github.com/FEX-Emu/FEX/commit/2e24f34a3f87fb55335792101e238f5128170bab)
FEX
Only pass CPU tunables to FEXCore and FEXLoader (https://github.com/FEX-Emu/FEX/commit/b4eeb9637569973c68672b86587f013d3672761d)
FEXCore
Removes GetProgramStatus (https://github.com/FEX-Emu/FEX/commit/c8ef77c15f1792709c71f5ee94e4eab421fff56d)
Removes InitializeContext API (https://github.com/FEX-Emu/FEX/commit/b35fadf7e314e4619222b5df84801fa358350d0a)
Fixes passing arguments to ABI helpers (https://github.com/FEX-Emu/FEX/commit/d0f54bcb234ea2f5fa4856865cf3838beae05f9c)
Removes Get/SetCPUState (https://github.com/FEX-Emu/FEX/commit/7de66ac3a46aea8ebb1f858bdd7d6d43b69c4712)
Optimize memcpy and memset when direction is compile time constant (https://github.com/FEX-Emu/FEX/commit/85a1c1ff25d5187de301388bde95c889033eb15f)
Removes FEX_PACKED from CPUState (https://github.com/FEX-Emu/FEX/commit/8e892ece59c97d15cac90e6db70bd869ba2415a4)
Moves debug strings to gdbserver (https://github.com/FEX-Emu/FEX/commit/3f02d7c6658fc592aea866d07e474f24edc41e43)
Start changing how thread creation works (https://github.com/FEX-Emu/FEX/commit/f328fca8806a82a22568c38f11f3141a27330a92)
Moves more SignalDelegator functions to the frontend (https://github.com/FEX-Emu/FEX/commit/6e8af295c54bfe3647c1bb797e2f98204eb8341e)
Removes x86 DebugInfo table (https://github.com/FEX-Emu/FEX/commit/8015ce20995d0bcd3e0c4dae8da6ceb9cd6211f1)
Removes GetExitReason (https://github.com/FEX-Emu/FEX/commit/bdf4089264e2e279909a3a937d7a2a8b64d9ec2f)
Disables RPRES until AFP is audited and enabled (https://github.com/FEX-Emu/FEX/commit/b02711399805d0f7ad7e6a3794993af22ed18dc6)
Fixes imul returning garbage data (https://github.com/FEX-Emu/FEX/commit/389c6b11dd316946a1a903c699471de7300c703c)
Work around broken preserve_all support in Windows clang (https://github.com/FEX-Emu/FEX/commit/98f9a65202ef599714366b255d01a6aa887ffacf)
FEXLoader
Wire up gdbserver in the frontend (https://github.com/FEX-Emu/FEX/commit/efc5eb2933ec83a91d478dd9f5c2453c9199cc93)
FEXRootFSFetcher
Supports extracting erofs images (https://github.com/FEX-Emu/FEX/commit/aa1344aadd86ff22d16cf833877ebcd578774210)
GdbServer
Switch over to a unix domain socket (https://github.com/FEX-Emu/FEX/commit/0357bb23e8ee1a8ed94c8e99d52e74a5ad7e33a9)
IR
Moves remaining NZCV operations to use DestSize (https://github.com/FEX-Emu/FEX/commit/b619f381b368ced49c45e2437bb491c1bb21966c)
IRDumper
Fixes missing conditional name (https://github.com/FEX-Emu/FEX/commit/8892580c4133da558d7bf46af1798dd34d28edc2)
InstCountCI
Moves Sonic Mania code to 32-bit file (https://github.com/FEX-Emu/FEX/commit/470615b8960b83c7629f79985d4a3dfa80cb77f8)
InstructionCountCI
Remove Optimal flags (https://github.com/FEX-Emu/FEX/commit/4a31b619fa505fe8365b18fd2cab8a1be8a69e83)
JIT
Fixes crash in TestNZ (https://github.com/FEX-Emu/FEX/commit/a8ab8bbe8e45f54a9af030beed8853751df10e0c)
OpcodeDispatcher
Optimize three sha instructions (https://github.com/FEX-Emu/FEX/commit/0e1e4c16b1fdadf2010af180bea93d4fc7f39aed)
Make "not" not garbage (https://github.com/FEX-Emu/FEX/commit/3767f3633db8dd18dd43b5cd04bbb85742668fa2)
ScopedSignalMask
Clean up API and use std::unique_lock/shared_lock (https://github.com/FEX-Emu/FEX/commit/8726c8fb7377509bb569826f13cb96ba44c0814c)
Thunks
libX11
Change lock functions to nullptr by default (https://github.com/FEX-Emu/FEX/commit/db63241fd4671f1270e99f15d07a2cc78670daa2)
Misc
Optimize BT/BTC/BTS/BTR (https://github.com/FEX-Emu/FEX/commit/250ffb6d23508ecbf3cfa37a4f4810102185f73f)
Improvements for WOW64 compat with newer wine (https://github.com/FEX-Emu/FEX/commit/43cf2e4e2c2fdb5b448609f407bd8e1e6b405780)
More GdbServer improvements (https://github.com/FEX-Emu/FEX/commit/1c115096c41eee22c9e457ea2ed1853f2213956b)
Optimize shifts/rotates (https://github.com/FEX-Emu/FEX/commit/c1d5fae018b26685fddf7e73ddbff283c0a0b18e)
Optimize selects & branches & more nzcv goodies (https://github.com/FEX-Emu/FEX/commit/1e2d059890483001f12a9525e523b51818e32073)
Keep guest SF/ZF/CF/OF flags resident in host NZCV (https://github.com/FEX-Emu/FEX/commit/af3253947e021f29adefb8f42b41fbf779c63e1c)
Add helper for deriving ops by opcode (https://github.com/FEX-Emu/FEX/commit/3f1f7faf3460038b553f593719e1a132e9810023)
unittests
ASM
Adds unittest found from Ender Lilies that crashed with NZCV (https://github.com/FEX-Emu/FEX/commit/996a4c023c12c7ce9c55df018ab943da34872250)

FEX - FEX-2311

Published by Sonicadvance1 12 months ago

Read the blog post at FEX-Emu's Site!

Another month gone by and another FEX release out the door! This last month was a bit less busy, as most of our team spent a week in Spain
to take part in XDC 2023! We did still have the rest of the month to do some work though, so let's get
to the changes!

Small bug fixes

This month we fixed a couple of bugs which could have caused spurious crashes! In fact, while testing some upcoming performance optimizations, we fixed
a few unrelated bugs that were crashing Steam periodically! Always nice to see a bunch of little work that just improves the software, even if it
isn't a single big fix.

Fix register corruption when jumping out of JIT
Fixes double munmap which would cause spurious pointer unmaps
- Fixes crashes when a program would shut down a thread
Implements RPRES support and fixes an implementation issue with ARM's new RPRES feature
- RPRES gives us the ability to do reciprocals in one instruction instead of using ARM's divide instruction.
- The bug would have caused invalid data to be returned
- No CPU supports this yet luckily
Fixed issue with *at syscalls not working with absolute paths
- Broke Proton's pressure-vessel in weird and unique ways
Fixes bug with named enum argument parser
- This is used to override CPU features with the FEX_HOSTFEATURES option so typically not hit

32-bit thunking infrastructure

While 32-bit thunking is not yet in place, and this month it still isn't fully integrated, some of the code has been landing to work towards this
goal. In order to do 32-bit thunking the right way we are spending a bunch of time ensuring that we have a proper data layout analysis system in place
that is based on clang to do a couple of things. This analysis will let us do automatic translations of data structures from 32-bit into 64-bit and
also alert us if something needs to be manually translated. This needs to be in place because otherwise we can end up in a situation where we
unknowingly corrupt data and it would be a nightmare to find. So this month we now have the ability to annotate our thunk definitions and start having
clang work for us. While not complete, some of this work has already gotten thunking working for 32-bit Super Meat Boy! It's getting there!
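
To illustrate the kind of mismatch this analysis has to catch, consider a hypothetical C struct `struct Example { int32_t id; long size; void *data; };`. On 32-bit x86 both `long` and pointers are 4 bytes, while on 64-bit ARM64 they are 8 bytes and need padding in front of them. The layouts and the repacking helper below are a hand-written sketch of that one struct, not output of FEX's clang-based tooling.

```python
import struct

# Hypothetical layouts of: struct Example { int32_t id; long size; void *data; };
GUEST_32 = struct.Struct("<iiI")    # 32-bit x86: 4 + 4 + 4 = 12 bytes
HOST_64 = struct.Struct("<i4xqQ")   # ARM64: 4 + 4 padding + 8 + 8 = 24 bytes


def repack_guest_to_host(raw: bytes) -> bytes:
    """Convert one guest-layout struct into the host layout.

    `size` is sign-extended (long -> 64-bit long) and `data` is zero-extended
    (32-bit pointer -> 64-bit pointer) by the differing format codes.
    """
    ident, size, data_ptr = GUEST_32.unpack(raw)
    return HOST_64.pack(ident, size, data_ptr)
```

Passing the 12-byte guest buffer straight to a 64-bit library would make it read `size` and `data` from the wrong offsets, which is exactly the silent corruption the paragraph above warns about.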

NZCV usage preparation

A big performance improvement that FEX is working on is to use the CPU's flags to directly emulate the x86 flags when possible. This is a long and
arduous task but the performance improvements will be huge once the code lands! A bunch of prep work this month has landed to start down this path but
we're going to need to let this sit in the oven for a bit longer. Check back next month to see if we get there!

Minor optimizations

With XDC being in the middle of the month, it caused most of the bigger work to be delayed so we have a bunch of smaller things this month!

Minor optimization to bfi/bfxil
- Removes one or two instructions for some instruction translations
Optimize atomic fetch operations in to atomic if the result isn't used
- Removes a couple of instructions if the resulting fetch data isn't used.
Implements support for ARM's new AFP extension
- Currently disabled until we can audit the codebase to ensure we aren't corrupting anything
- Lets us remove an insert after every scalar operation to match SSE behaviour
Optimize palignr that behaves like a move
- Compilers shouldn't use this, but now we optimize it to a move
Optimize pblendw
- A fairly uncommon instruction but now its implementation is basically as fast as it can be
Optimize blendps
- We had already optimized blendpd last month, so this time was to optimize the 32-bit version
- Fairly commonly used so should improve perf in some games
Optimize dpps and dppd
- These instructions do a dot product and a broadcast of their result but we couldn't find a game using it heavily
- So while this is now optimal, this is unlikely to affect any real game
Optimize some 3DNow! instructions
- 3DNow! is a really old instruction extension that is basically only used in some really old games
- All of these instruction implementations are basically as fast as we can make them now, which is good!
Optimize direction flag pointer offset calculation
- This converts a three instruction calculation down to one and stops using a ternary selection
- This happens with x86's repeat instructions, which typically happens for memcpy and memset
- Used a lot but is a minimal improvement.
A few other random bits and bobs!
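
Two of the items above are easy to show in isolation. The following is a minimal Python sketch, not FEX's implementation. `palignr` models the architectural behaviour of the 128-bit instruction to show why a zero immediate is just a move, and the two `df_offset_*` helpers contrast a ternary selection with a selection-free form. Keeping the direction flag as +1 or -1 is an assumption made purely for this illustration, and all function names are hypothetical.

```python
MASK128 = (1 << 128) - 1

def palignr(dst, src, imm):
    """128-bit PALIGNR: concatenate dst:src, shift right by imm bytes,
    and keep the low 128 bits."""
    concat = (dst << 128) | src
    return (concat >> (8 * imm)) & MASK128

def df_offset_select(df_set, size):
    """Pointer step for a string instruction using a ternary selection."""
    return -size if df_set else size

def df_offset_mul(df_sign, size):
    """Same step when the flag is kept as +1 (forward) or -1 (backward),
    an assumed representation that needs no selection."""
    return df_sign * size

dst = 0x0123456789ABCDEF_0011223344556677
src = 0xFEDCBA9876543210_8899AABBCCDDEEFF

# With a zero immediate nothing is shifted in from dst: the result is src.
assert palignr(dst, src, 0) == src

# Both forms of the direction flag step agree.
assert df_offset_select(True, 4) == df_offset_mul(-1, 4)
assert df_offset_select(False, 4) == df_offset_mul(+1, 4)
```

The same model shows the other edge cases of `palignr`: an immediate of 16 returns `dst` unchanged, and an immediate of 32 or more returns zero.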

AVX optimizations!

While no hardware supports our AVX implementation today, we have optimized a smattering of instruction translations so that they are ready once
hardware supports what we need.

256-bit VExtr, VFCADD, VURAvg, VFDiv, VSMax, VSMin, VUMax, VUMin
Removes a bunch of truncating moves
- If we know an AVX instruction is operating at 128-bit width, we can remove a redundant move which speeds things up!

Raw Changes

FEX Release FEX-2311

ARMEmitter
Fix GPR fill mask in FillStaticRegs (https://github.com/FEX-Emu/FEX/commit/3702e513f55be29cb38d4f4558386a4a75901a45)
Arm64
Minor optimization to bfxil and bfi (https://github.com/FEX-Emu/FEX/commit/8181e53727f83a867252c5a3a3593bd088d4d4ee)
ArmEmitter
Adds sized Scalar 1 source and 2 source helpers (https://github.com/FEX-Emu/FEX/commit/2e1389b25efc8f61bb7b709ee68f543288a1c9da)
CPUID
Adds some missing cpu core names (https://github.com/FEX-Emu/FEX/commit/ff3f7345b63523005a7713ade29c0bc81f64bcee)
Config
Fixes string enum parser with multiple arguments (https://github.com/FEX-Emu/FEX/commit/bbd20b47bad8d6b77b7fd74660fed3bf11d9e89d)
External
Remove a spurious license (https://github.com/FEX-Emu/FEX/commit/26ee63cc2470445f954e01e17e3ebfc46d0d6670)
FEXCore
Removes gdb pause check handler (https://github.com/FEX-Emu/FEX/commit/d4a6b031ea4445a8182e1a42a11f7388a06bc37b)
Fixes bug in vector ZextAndMaskingElimination pass (https://github.com/FEX-Emu/FEX/commit/a261d9909e45a5b57bf0a58d5b733c56d72969cc)
Removes a warning about assume discarding side-effects (https://github.com/FEX-Emu/FEX/commit/462fff2c6771e8b62fa54b8c52f2d8b279acb6a0)
Renames raw FLAGS location names to signify they can't be used directly (https://github.com/FEX-Emu/FEX/commit/8dab35cbf84e7e260a635def141f5822ef94c18c)
Implements support for RPRES (https://github.com/FEX-Emu/FEX/commit/b2a8b0ca121211e6a110a40a5025d53a3c0256ec)
Support crypto extensions in HostFeatures override (https://github.com/FEX-Emu/FEX/commit/fc70fc35068fdfe5934d035edc01bb03b22f6a03)
FileLoading
Updates helper to load file that is backed by memory (https://github.com/FEX-Emu/FEX/commit/190f7c27e026d72622f51d15341391008a95402e)
IR
Changes over to automated IR dispatch generation (https://github.com/FEX-Emu/FEX/commit/6543a80ff91e10e97bcdd6f3b1c260b68d432ef5)
FEXLinuxTests
Adds a unittest for eflags and signals around a inlined syscall (https://github.com/FEX-Emu/FEX/commit/b92e716d0c419e0f2467247f913ac8121370cf77)
Compile tests with masm=intel (https://github.com/FEX-Emu/FEX/commit/b45023bedf0f58945eee1a605bdadd657f771b10)
Temporarily limit thunk test execution to 64-bit guests (https://github.com/FEX-Emu/FEX/commit/1ea40ae6767591cc07d65634eb0e467b257c1408)
FEXLoader
Query runtime page size (https://github.com/FEX-Emu/FEX/commit/65e8d094ef822bed84c6db8d7bb5627ae9dd3d9d)
GDBServer
Preparation work to get this moved to the frontend (https://github.com/FEX-Emu/FEX/commit/e91c5ff906bd7335787ebca06f6ac1c0454db8d7)
GdbServer
Fixes returning thread names (https://github.com/FEX-Emu/FEX/commit/18065199e323c10cc93ac10e896b9c64a58fb59e)
IR
Print assert code for IR EmitValidation (https://github.com/FEX-Emu/FEX/commit/5431aa5a28a7aefa975c9241372683b8139a127a)
Optimize unused result atomic fetch mop to just atomic mop (https://github.com/FEX-Emu/FEX/commit/14e5ea1e22e64a767c676953bf5f1c44798c432e)
Adds scalar vector insert operations (https://github.com/FEX-Emu/FEX/commit/6253f4f708ea3d4282e63697a5092e780894a0bc)
InstCountCI
Update rounds{s,d} classification (https://github.com/FEX-Emu/FEX/commit/a287f2a18944e946bfa76ed0e5ac81232e38436f)
Adds two missing variants of movd/movq (https://github.com/FEX-Emu/FEX/commit/8538f5bac4876e8be6061cffa6e0fbab9234795d)
Support disabling flagm extensions (https://github.com/FEX-Emu/FEX/commit/6db2125b4126284ffa26870908e021a33b1d2779)
Adds some multi instruction tests (https://github.com/FEX-Emu/FEX/commit/483423674aa670b20d3b5468dae0ff0e3ec717ba)
Support multiple instructions in the tests (https://github.com/FEX-Emu/FEX/commit/f036a0b84f4d933191755b128023771365bdf8e1)
Adds missing atomic tests (https://github.com/FEX-Emu/FEX/commit/a5f82a57fa185295fb0463ee3d1a9d5fde6a3b32)
Fixes recursive tests with same filename (https://github.com/FEX-Emu/FEX/commit/9c36d1061bccaf3b8f08c28d69a0032926445816)
Support overriding AFP features (https://github.com/FEX-Emu/FEX/commit/5a3cc7b469d4acb61f962d365770d6d99f434f81)
JIT
Implements Print support for vixl sim (https://github.com/FEX-Emu/FEX/commit/5b70209728a6a50d44a63731d9b5116fb46256da)
JITArm64
Fixes double munmap issue that was causing crashes (https://github.com/FEX-Emu/FEX/commit/8ee5b5cf50e78264b89f309c4a0b9cae6ecc5535)
Fixes bug in rpres scalar operations (https://github.com/FEX-Emu/FEX/commit/8f8f37684a2b31231bfe647cc25b7f03aded391f)
Linux
Fixes issue with *at syscalls with absolute paths not working (https://github.com/FEX-Emu/FEX/commit/cf9c2aa72c109477744e5a0a2471c8447b753a18)
Fixes warning in 32-bit clock_settime (https://github.com/FEX-Emu/FEX/commit/b4ddf36582985fdf19cd820423826041f40ea7b4)
OpcodeDispatcher
Optimize palignr with zero immediate (https://github.com/FEX-Emu/FEX/commit/9612b2fe4b10ea81c36b7e5b8640a1211b8b814d)
Optimize pblendw (https://github.com/FEX-Emu/FEX/commit/ef5503f0b79f1caf4959eb3a0fa976f5f3f07e18)
Optimize blendps (https://github.com/FEX-Emu/FEX/commit/f45722d2cdc61c2851f94d1ff646e9f8026486fb)
Optimize 128-bit DPPS and DPPD (https://github.com/FEX-Emu/FEX/commit/77d92872bcf8b784229fa3bd99aa30b6015c4003)
Optimize a few 3DNow! operations (https://github.com/FEX-Emu/FEX/commit/4045bfd1872a6cd5baca07a5a8f6df91dc69e634)
Allow garbage in upper bits for more ALU ops (https://github.com/FEX-Emu/FEX/commit/f5822f83b0292f210cc2c90525d051eda57398cd)
Optimize DF pointer offset calculation (https://github.com/FEX-Emu/FEX/commit/63e4c3682def5b51e77eeab80710d88c82994cf1)
Handle SSE vector moves into themselves a little better (https://github.com/FEX-Emu/FEX/commit/5c93a085d274aad40fc5423fdf0c0c1e5dfc21cf)
Remove unnecessary 128-bit truncating moves from StoreResult (https://github.com/FEX-Emu/FEX/commit/ef321e4bf8997f4c03b07437cb51889f9d314aa4)
Put extra LoadSource options in a struct (https://github.com/FEX-Emu/FEX/commit/6d39f369b01c5f850daade090e3a558cba09aae3)
Remove redundant moves from rorx (https://github.com/FEX-Emu/FEX/commit/efb479f88fa2970b944ff584fc411de0f90f89b9)
Optimizes < 32-bit register push (https://github.com/FEX-Emu/FEX/commit/cc558fd5dc85d74b800426c87afd8867174eed48)
TestHarnessRunner
Don't hardcode stack allocation to 4096 bytes (https://github.com/FEX-Emu/FEX/commit/0f3d14e7c05cf92fdd753e89740595f155280410)
Thunks
Print error if guest-provided callbacks are called asynchronously (https://github.com/FEX-Emu/FEX/commit/e305a9a0d5d24c5f600333b763eee4d4328604a3)
Skip data layout analysis for types that are always assumed compatible (https://github.com/FEX-Emu/FEX/commit/a379d507298df917e3c221b89672822c30145941)
Fix function pointer support on 32-bit (https://github.com/FEX-Emu/FEX/commit/39c5ab1c81e0f01e27ef20672532ab8b6158c355)
Annotate pointer parameters throughout all thunked libraries (https://github.com/FEX-Emu/FEX/commit/5bf790324ca4a669eb824c013b3097ef2fe30951)
Add new pointer annotations to assist data layout analysis (https://github.com/FEX-Emu/FEX/commit/978f607dd9b33845362aa53720750eae403e6ac3)
Oops deleted an entry point (https://github.com/FEX-Emu/FEX/commit/e0ef32e0bfb60840779264b59c7276c62debbc6f)
Fixes missing vulkan definitions (https://github.com/FEX-Emu/FEX/commit/cb53a704bae0dfa4863de0fe49fe08b4633bced8)
xcb
Drop unused and incomplete support for asynchronous callbacks (https://github.com/FEX-Emu/FEX/commit/15c825f362b6996866ddaa2de13179116e1b8a25)
VectorOps
Handle SVE VExtr a little better (https://github.com/FEX-Emu/FEX/commit/2e694412f40b447f894b7d2b0d0c7cebb86f5df5)
Handle SVE VFCADD a little better (https://github.com/FEX-Emu/FEX/commit/1cb8e4891c3e8964b0f0556d80b09825f6a2ea6e)
Handle SVE VURAvg a little better (https://github.com/FEX-Emu/FEX/commit/3c5c23bf3668f771feb44ddde2da4991487fbd4e)
Handle SVE VFDiv a little better (https://github.com/FEX-Emu/FEX/commit/93792577ebfb1f389c73c96445eb7a8a0aac26d5)
Handle SVE VSMax/VSMin and VUMax/VUMin paths a little better (https://github.com/FEX-Emu/FEX/commit/8238de024f6138b8eef94360fb24c31a7b505ec6)
Misc
Preparatory patches for nzcv (https://github.com/FEX-Emu/FEX/commit/5103f2d92b9ff7458d4018cd308657c4dd0c6056)
Prep commits for NZCV modelling (https://github.com/FEX-Emu/FEX/commit/e018917f76066c22e42179fa7b4bd5362348b457)
(https://github.com/FEX-Emu/FEX/commit/8f246b206bcd174eef021bf0169fafccfa18d82f)
(https://github.com/FEX-Emu/FEX/commit/c612fa8f2f65a4c0bf0462f4c925fede0d0ce42d)
github
Enables Vixl simulator on x86 host for instcountci (https://github.com/FEX-Emu/FEX/commit/9ba78c977165e53da13880d21290196661e6b2ad)
unittests
Adds test for bug from https://github.com/FEX-Emu/FEX/pull/3162 (https://github.com/FEX-Emu/FEX/commit/c77a3d673cd54898aa4cbc1af0b9397d4fbd5ce9)
ASM
Adds unittest for implicit flag clobber for https://github.com/FEX-Emu/FEX/pull/3254 (https://github.com/FEX-Emu/FEX/commit/65b7d4007e715faf7d67a3465e4d9f9d77541b85)

FEX - FEX-2310

Published by Sonicadvance1 about 1 year ago

Read the blog post at FEX-Emu's Site!

Welcome back to another monthly release for FEX-Emu. You might be thinking that after last month's optimizations that we wouldn't have much to show
for this month. Well you would be wrong! We optimized even more! Let's get in to it!

More instruction optimizations!

As stated last month, we introduced Instruction Count CI which has allowed us to do targeted optimizations of our code. Once again we have optimized so
many instructions that it would be impossible to go through each individual change. Check our detailed change log if you want to see all the
instructions optimized. Let's just look at the final benchmark numbers compared to last month.

<- Geekbench 5 versus last month ->
<- Bytemark versus last month ->

Let's talk about the Geekbench 5.4 results first since they don't look very impressive at first glance. While we are only showing a ~13%
performance improvement, the problem with this result is that this number is an aggregate of multiple smaller benchmarks. Looking at the breakdown of
all the subtests there are some that have improved by up to 66%! This is of course because some benchmarks take advantage of some instructions that we
optimized more heavily than others. Luckily this improvement also scales to video games as well.
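
As a rough illustration of how an aggregate can hide large subtest gains, here is a small Python sketch. The per-subtest speedups are invented for the example and are not FEX's measurements, and treating the aggregate as a geometric mean of the subtests is an assumption.

```python
from statistics import geometric_mean

# Invented per-subtest speedups (1.66 means 66% faster); not FEX's data.
subtest_speedups = [1.66, 1.30, 1.10, 1.05, 1.05, 1.02, 1.02, 1.00]

# Assumed aggregation: geometric mean across the subtests.
aggregate = geometric_mean(subtest_speedups)
best = max(subtest_speedups)

print(f"aggregate improvement: {(aggregate - 1) * 100:.1f}%")
print(f"best subtest improvement: {(best - 1) * 100:.1f}%")
```

With numbers shaped like these, one subtest can improve by 66% while the aggregate only moves by a fraction of that, because most subtests barely touch the instructions that were optimized.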

The Bytemark improvements are a bit hard to make out: some numbers are hardly changed at all while a couple stand out as huge improvements. This
mostly comes down to some very specific instruction optimizations that significantly improved performance in a couple of tests and the rest don't show
up as much.

With this month's optimizations and last month's combined, the results end up being significantly more interesting. Some Geekbench results are
showing an average of 50% to 65% higher performance, sometimes even higher. Some benchmark results show nearly 2x the performance compared to
before! These numbers translate very well to gaming
performance where some games have more than doubled their FPS over the past couple months.

We're not slowing down either, we still have a ton of optimizations to go on our march to get our emulation close to native performance.

Support preserve_all for interpreter fallbacks

We're calling out this particular optimization for three reasons.

It improves performance of x87 heavy code
It only works with the super recently released Clang 17
wine packages in FEX's rootfs use x87 heavily in some instances.

Let's talk about what this optimization is and how it improves performance. In Clang 17 they added support for a new function calling ABI called
preserve_all. x86 has supported this ABI for a very long time but it is a new addition for Arm64. This ABI breaks convention from the regular AAPCS64
ABI in that if a small function needs to use more registers then it needs to first save pretty much any of them, unlike AAPCS64 where it has a bunch
of registers free for using. This is beneficial for FEX's JIT since we can save significant time by not saving any state when we need to jump out of
the JIT and execute x87 softfloat code.

In particular this manifests as upwards of a 200% performance improvement in some microbenchmarks around x87 code! While this advantage is quite
significant, the only way to take advantage of it is to compile FEX with Clang 17. Since this compiler release came out only last month, pretty much
no distros have adopted it so it is unlikely to be used soon. In a few months time, or years depending on distro, they should naturally upgrade their
compiler stack and free performance improvements will happen.

As a fairly major side note to this excursion, FEX has found that the 32-bit wine packages compiled in Canonical's repository use x87
heavily in some instances. This causes some really bad performance issues with some 32-bit games and installers. It is recommended to use Proton where
you can here since it compiles its 32-bit libraries with SSE optimizations instead which work significantly better.

FEX-Emu may look to provide its own wine packages in the future with this same optimization in place to help alleviate some of this burden. Until then
it is recommended to use FEX's x87 reduced precision mode to try and alleviate some of the overhead.

Fixes a bug when chrooting in to rootfs

Quite a few months ago FEX-Emu changed some behaviour around chrooting in to the FEX rootfs. While chrooting isn't generally advised, it's the only
option if a user wants to modify the rootfs. While we provide some scripts inside of our rootfs images to facilitate this, they have been broken for
a few months.

We have now fixed this bug in both FEX-Emu and the scripts inside of our rootfs images. So if you want to modify packages inside of the image you will
now be able to do so again. Make sure to update your image to get the new scripts!

Remove x86-64 JIT and Interpreter

This has been a long time coming in the FEX-Emu project. We have had support for an IR interpreter and x86-64 host JIT for compatibility testing since
the project's inception. It has always been the case that if these CPU backends get in the way of the ARM64 JIT that they would get removed.

That time has finally come. Due to some upcoming changes around how flags are getting represented in FEX's JIT and the general burden of
implementing FEX's IR operations three times, often undoing an x86->Arm64 translation to go back to x86, these backends have been deemed too much of
a burden and have been removed. This is a necessary step for our ARM64 JIT to gain more performance that we will be gaining in the coming months!

We are looking forward to future ARM platforms that can take Radeon GPUs through PCIe slots to regain a platform which can test RADV directly, but
until that point we will have to make do with our current devices.

Instruction Count CI on x86-64 hosts

While we removed our x86-64 JIT, we do have a fun addition to our instruction count CI. Now developers that don't have an Arm64 device handy can
still run the Instruction Count CI and attempt to optimize implementations. This is as simple as building FEX on an x86-64 device with the Vixl
disassembler and simulator enabled and you will be able to optimize to your heart's content!

We've got a need for JIT speed! Let's go fast!

Implement first optimizations using 128-bit SVE

This is a fairly minor change but previously FEX was not using any 128-bit SVE instructions. This is primarily because there aren't really any SVE
supporting devices in the consumer market, even though Snapdragon hardware theoretically supports it. 128-bit SVE adds a couple of optimizations that
we can use.

Wide-element shifts
Index instruction for generating simple index masks

While these are fairly simple initially, they take some translations from six instructions down to one or two, depending on the case. This is a fairly
minor change, but it is good to note that FEX is now taking advantage of SVE if it is available!
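
For readers unfamiliar with the index instruction mentioned above, here is a minimal Python model of what it produces: each element is a start value plus its element number times a step, truncated to the element size. This is a sketch of the instruction's behaviour as we understand it, not FEX code, and the function name is hypothetical.

```python
def sve_index(start, step, elem_bits, vector_bits=128):
    """Model of an index-style vector fill: element i = start + i * step,
    wrapped to elem_bits. vector_bits=128 matches the 128-bit case."""
    count = vector_bits // elem_bits
    mask = (1 << elem_bits) - 1
    return [(start + i * step) & mask for i in range(count)]

# Sixteen byte indices in one step, e.g. as input to a table lookup.
print(sve_index(0, 1, 8))
# Four 32-bit elements counting up by two.
print(sve_index(0, 2, 32))
```

Building the same constant without such an instruction generally means loading it from memory or assembling it from several moves, which is where the instruction savings come from.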

Adds WOW64 frontend

This has been a long time coming, with us adding initial mingw support back in FEX-2305. FEXCore now supports being built with a brand new WOW64 WINE
frontend. While currently not being utilized, this will allow WINE to integrate FEX directly in to its WOW64 layer for running both x86 and x86-64
applications on Arm64 host devices.

This is a very substantial change to how WINE integrates with FEX, since today FEX-Emu just runs the full x86-64 WINE process and eats the overhead of
emulating everything WINE needs to do. With the WOW64 layer now implemented, a bunch of the WINE code can now be Arm64 native code and when it needs
to execute application code it just jumps back to the emulator. This is similar to how Windows natively handles its emulation through its "XTA" layer.
Sadly today this is only wired up to work through a 32-bit x86 part of the layer. We still need to get set up to support Wine when it inevitably
supports Wow64 for x86_64->Arm64.

Big shout out to ByLaws for implementing support for this! We look forward to future Wine integration work landing!

Implement thunking support for wayland-client and zink

We have some improvements to thunking this month! As we are working towards supporting thunking more code, we implemented some features to get
wayland-client thunking wired up. While this support is early, it is enough to get Super Meat Boy up and running using wayland and zink overrides
within a Wayland environment. We look forward to additional thunking improvements going forward so that performance can be improved everywhere.

Raw Changes

FEX Release FEX-2310

AppConfig
Removes Steam config (02da6d6)
Arm64
Fixes inline syscalls (4e9a114)
Optimize wide shifts slightly for 64-bit OpSize (f5c4e28)
Recover two unused vector vector temporary registers (90f7937)
ALUOps
Remove spills in PEXT (4604c01)
VectorOps
Elide moves where applicable in 128-bit VSQXTUN2 (fd1b639)
Improve handling of 128-bit vector VInsElement (950a8db)
Elide moves in ASIMD VUShrNI2 if possible (b3269f2)
Assert VTMP1 and VTMP2 are sequential in VTBL2 (8168a49)
Fix SVE aliasing-path move in VSShr (ffb5876)
CI
Run tests with <30s runtime first (e1eb151)
CPUID
Enabled Enhanced REP MOVSB/STOSB (6fe643d)
Config
Fixes core sanitization (da3e172)
ConstProp
Fixes unscaled signed 9-bit range (72d092e)
DeadContextStoreElimination
Silence unused function warning (773e946)
ELFCodeLoader
Expose FSGSBase in getauxval HWCAP2 (fbc4bda)
FEX
Moves Linux utils and adds spdx (ba56e51)
Common
Adds SPDX identifier (9f5f09b)
Tools
Adds SPDX identifier (ddf4b5c)
FEXCore
Support CpuState relative vector named constants (3413eb3)
Merge Arm64Dispatcher in to Dispatcher (935b3a3)
Removes x86 JIT. (879b41c)
Removes vestigial Interpreter code (65b6df9)
Support preserve_all ABI for interpreter fallbacks (fea72ce)
Gut interpreter (7d99eb0)
Adds SPDX identifier (67680d7)
Implements support for shifted bitwise ops (d578256)
Disable Enhanced REP MOVSB if Atomic TSO is enabled (0604336)
Defer setting x87 softflow rounding mode until use (9866e23)
Minor optimization to StoreRegisterSRA (6fdf2f9)
Include
Adds SPDX identifier (d86f41e)
JitSymbols
Buffer writes to reduce overhead (2ea2300)
FEXServerClient
Adds back ServerSocketPath config option (e795ec6)
FHU
Prepend SPDX identifier (3b188b7)
FileManagement
Fix inverted boolean check for procfs/interpreter support (615ab8d)
Github
Changes jobs to have unique names (220761a)
HostFeatures
Fix x86 CLWB support check (6e08ac6)
Detect FlagM/2 (98f1487)
IR
Changes Select operation to not have implicit sizes (a8c1720)
Changes crc32 operation to always return a 32-bit result. (6d9b524)
RCLSE: Partially reenables the RCLSE pass (879fcdc)
InstCountCI
Enable running on x86 hosts (a1a709f)
Support f64 reduced precision mode tests (5eed24a)
Fail CI if there was any difference. (93aeb15)
Adds negative immediate primary tests (c38beff)
Add log before compiling instruction (1804b00)
Adds missing instructions from Secondary OpSize tables (750d909)
OpcodeDispatcher
Don't mask logic op inputs (d1d3de8)
Optimize lock btr (2b7e1d1)
Optimize reconstructing FSW (5fc8699)
Removes non-explicit SelectCC function (43fd159)
Improve output of SHLX/SHRX/SARX (ad8b0c6)
Improve output of MULX (e574cfe)
Handle RORX corner cases better (647629a)
Optimize cmov (c8e7c34)
Optimize CRC32 (96bbd01)
Optimize 16-bit MOVBE (f84a264)
Optimize blendp{s,d} (92824f5)
Optimize pins{b,w,d,q} (213d3c4)
Optimize pextr{b,w} (d4c6749)
Optimize shufpd (655cee0)
Implement shufps with VTBL2 in worst case (31d8283)
Optimize a bunch of shufps variants (cfe620a)
Optimize 32-bit bswap (ebdca02)
Optimize NOP vector move (f7e652b)
Minor optimization to BT/BTC/BTR/BTS (48521a4)
Update 32/64-bit RCL for operating size (950007c)
Update 32/64-bit RCR for operating size (d029394)
PassManager
Optimize out CPUID and XGetBV calls (234e029)
RCLSE
Optimize redundant store->load operations (6dc5c0d)
Scripts
Update generate_doc_outline for moved FEXCore (9ff5544)
Thunks
Only build guest target for libfex_thunk_test if FEXLinuxTests are enabled (507cf82)
Analyze data layout to detect platform differences (48fa4f1)
Fix AddressSanitizer build (1439874)
Update Vulkan thunk to v1.3.261.1 (76d4637)
Avoid recompiling thunk interfaces on FEXLoader changes (533f359)
Minor restructuring and small cleanups (be07254)
wayland
Add support for APIs required by zink and Super Meat Boy (0d9dce9)
Tools
Fixes usage of waitpid in the face of EINTR (dda5861)
X87F64
Implement FABS with vector instruction (3a25dd6)
Use Bfe for rounding mode, FCHS use float instruction (ccfd770)
Misc
Minor AVX optimizations (3ba1c79)
Optimize ASCII flags (6b4ff4a)
Use adcs (ca87d86)
Optimize 8/16-bit CF calculation (8b3881b)
Optimize PF calculation in lahf (19a7b51)
More opts to the dispatcher + 1 to the JIT (bee9730)
Requiem for the x86 jit (86ad35c)
Add WOW64 JIT frontend (797c890)
Optimize reconstructing x87, harder (5444810)
Make x87 FCMOV slightly less terrible (65d558b)
Minor/flag opts (8b52308)
InstCountCI/VEX_map3: Add missing zeroing vperm2f128/vperm2i128 test cases (3d0b664)
Inline constant with PF calculation (9152fb0)
Optimize out carry invert for DEC (b6922df)
unittests
Instruct CTest to print output from tests on failure (e32601f)
Add test thunk library (000fb2e)
ASM
Implements tests for vpgatherqd/vgatherqps (ab4642a)
Implements tests for vpgatherqq/vgatherqpd (d94e5ce)
Implements tests for vpgatherdq/vgatherpq (dad7086)
Implements tests for vpgatherdd/vgatherps (85da0f0)
Adds unit test caught by #3153 (0d12cce)

FEX - FEX-2309.1

Published by Sonicadvance1 about 1 year ago

Hotfix patch to fix a bug around accessing files!

FEX - FEX-2309

Published by Sonicadvance1 about 1 year ago

Read the blog post at FEX-Emu's Site!

Last month we hinted that we didn't get all optimizations in that we wanted. There's more of that this month but we have also had an entire month to
push optimizations in. This month was a whirlwind of optimizations improving performance all over the place because of one feature that landed:
Instruction Count continuous integration! Let's dive in to what this is.

Instruction Count CI

This is a major feature that we added last month that doesn't directly affect users but is such a huge quality of life improvement to our developers
that we need to discuss what it is. At its core, InstCountCI is a database (Actually JSON) of x86 instructions that FEX-Emu supports and shows how
that instruction gets converted to Arm64 instructions. This is in textual format for easily reading these instruction implementations and updating
quickly when the implementation changes. This has had a profound effect on our developers, who can't help but look at poor instruction
implementations and find ways to optimize them.

<- Optimized versus non-optimized picture ->

As you can see in the example, one very complex instruction that was not optimal before has now translated in to something much more reasonable.
So far this has nerdsniped at least half a dozen developers in to finding more optimal implementations of these instruction translations!

Some design considerations of this must be understood when looking at FEX's instruction implementations though. The most important thing to remember
is that these implementations are looking at the instruction in a vacuum. These are translated as only single instruction entities, so any sort of
multi-instruction optimization is not going to be visible in this CI system. Additionally this isn't getting run on hardware in our CI, so
implementations that are close on instruction count may have wildly different performance characteristics depending on the hardware. So while it is a
good guide for getting eyes on the assembly, there still needs to be some knowledge as to what the translation is doing to ensure it's both fast and
correct.
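
To make the idea of a per-instruction database concrete, here is a minimal Python sketch of comparing two snapshots and reporting which translations changed size. The field names (`Count`, `Asm`) and the placeholder assembly strings are hypothetical and are not FEX's real InstCountCI schema.

```python
import json

# Hypothetical database snapshots. The field names and the assembly
# strings are placeholders for illustration, not FEX's real schema.
BEFORE = json.loads("""
{
  "palignr xmm0, xmm1, 0": {"Count": 2, "Asm": ["<shuffle>", "<move>"]},
  "add eax, ebx":          {"Count": 1, "Asm": ["<add>"]}
}
""")
AFTER = json.loads("""
{
  "palignr xmm0, xmm1, 0": {"Count": 1, "Asm": ["<move>"]},
  "add eax, ebx":          {"Count": 1, "Asm": ["<add>"]}
}
""")

def check_consistent(db):
    """Every entry's count must match its listed host instructions."""
    return all(entry["Count"] == len(entry["Asm"]) for entry in db.values())

def changed_counts(before, after):
    """Map each x86 instruction whose host instruction count changed
    to an (old, new) pair."""
    return {
        inst: (before[inst]["Count"], after[inst]["Count"])
        for inst in before
        if inst in after and before[inst]["Count"] != after[inst]["Count"]
    }

print(changed_counts(BEFORE, AFTER))
```

Because each entry covers a single instruction in isolation, a diff like this only shows per-instruction changes. It says nothing about multi-instruction sequences or about how the result performs on real hardware, which matches the caveats above.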

This CI system was used heavily this last month for what our next topic is.

Optimization Extravaganza!

With InstCountCI in place, we can now quantify optimizations going in to the FEX CPU JIT without accidentally compromising performance of other
instructions. With this in place we have had an absolute ton of CPU optimizations land in our JIT, enough that if we went through them all it would
take longer than all of our previous progress reports!

Instead of going through each individual change, let's just discuss the main optimizations that have landed. The bulk of the optimizations has been
making sure the translation from SSE instructions to Arm64's ASIMD instructions is more optimal. This is because reasoning about vector
optimizations is easier in this instance, and also because games more heavily abuse vector instructions than regular desktop applications. There were
other optimizations like some flag generation instructions becoming more optimal and eliminating redundant move instructions as well!

Let's take a look at the bytemark results.

<- Bytemark graphs ->

There's some surprising uplift in numbers here! Even more so since bytemark shouldn't heavily utilize SSE instructions so this is more just coming
from general optimizations that occurred. Let's take a look at another benchmark for fun.

<- Geekbench 5.4.0 graph ->

Whoa, that is a surprising uplift in one month! Geekbench actually has some benchmarks that use vector operations so they can get more
improvements than expected. We should expect even more performance once we
start optimizing more non-vector instruction translations!

As for gaming benchmarks, we're not going to do any in this blog post, but we have been told that due to various optimizations this month
Portal performance has gained 30% and Oblivion has gained 50%. Big improvements towards making games feel better when playing them. The main concern
here is that the Adreno 690 in our Lenovo X13s test systems is actually quite unstable during testing, so finding suitable games that are CPU bound without crashing
the kernel driver is surprisingly difficult. Most of the lighter games that don't crash the MSM kernel driver are already running at hundreds of FPS
anyway so it isn't interesting.

A fun quirk of optimizing vector operations this month, we have finally landed our first optimizations that use ARM's SVE instruction set when
operating at 128-bit width. Turns out there are a few optimizations that can be done here aside from implementing AVX with the 256-bit version! I'm
sure we will see more of these as we continue optimizing.

Remove most implicit sized IR operations

Continuing from the last topic, this is one of the main changes that allows us to start working on non-vector instruction optimizations. FEX's IR
around general purpose ALU operations has a history of using implicit sized IR operations. This means we would check the size of the incoming data
sources and make an assumption for what the operating size of the whole thing should be. While this worked, it has been an absolute thorn in our side
for years at this point. Any time we would make a seemingly innocuous change it would subtly change the behaviour of some IR operations as a new size
propagates through the stack. Now that all of these operations explicitly state their operating size at generation time there is less room for error.

This follows how our vector operations worked, where all of these were explicitly sized from the start and have had significantly fewer issues over
time.

With this change in place we can start optimizing general purpose ALU operations with less worry about breaking the world.
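
The difference between implicit and explicit operating sizes can be sketched in a few lines of Python. This is a toy model rather than FEX's IR, and the `Value`, `add_implicit` and `add_explicit` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    bits: int  # width of the value in bits
    data: int  # payload, already wrapped to that width

def add_implicit(a, b):
    """Operating size is guessed from the widest incoming source."""
    bits = max(a.bits, b.bits)
    return Value(bits, (a.data + b.data) & ((1 << bits) - 1))

def add_explicit(bits, a, b):
    """Operating size is stated by the caller at generation time."""
    return Value(bits, (a.data + b.data) & ((1 << bits) - 1))

a = Value(32, 0xFFFFFFFF)
b32 = Value(32, 1)
b64 = Value(64, 1)  # the same operand after an unrelated change widened it

# Implicit sizing: the result silently changes from a 32-bit wraparound
# to a 64-bit sum when one source gets wider.
print(add_implicit(a, b32), add_implicit(a, b64))

# Explicit sizing: the 32-bit add stays a 32-bit add either way.
print(add_explicit(32, a, b32), add_explicit(32, a, b64))
```

In the implicit version a seemingly innocuous change to one operand's width changes the result of an operation further down the stack, which is the class of bug that explicit sizes rule out.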

Mingw work

Some more work landed this month towards getting WINE WOW64 support wired up: adding a toolchain file to help facilitate cross compiling, no longer
saving and restoring the x18 platform register, and various other things. While full support isn't yet merged, there's
a lot of preliminary work landing so we can support this. While this work is very early, it is already showing significant performance improvements
for Windows native games. A game like Bioshock Infinite is already running faster than FEX emulating x86 WINE fully! Look forward to future
improvements and integrations as this gets wired up!

Raw Changes

FEX Release FEX-2309

ARM64
Optimize vector zeroing (https://github.com/FEX-Emu/FEX/commit/eaed5c47040d98d3444880308cdcea0759660b15)
ARMEmitter
Handle SVE load and broadcast quadword groups (https://github.com/FEX-Emu/FEX/commit/a9dea29f03f0903e75075854fee3b9c919bcf459)
Handle SVE load and broadcast element group (https://github.com/FEX-Emu/FEX/commit/710a3928ff5954003b29003a51bce6f94877ca71)
Handle load/store multiple structures (scalar plus scalar) groups (https://github.com/FEX-Emu/FEX/commit/eda67eb0a5770e581197e38e6a60a8adf0e15c20)
Handle SVE ADR (https://github.com/FEX-Emu/FEX/commit/139dd4ccbcc48dc3acdf291bee81327316af9641)
Handle SVE CPY (immediate) (https://github.com/FEX-Emu/FEX/commit/72357e50a237c2b5acde1aa0e40fdea045e9b84f)
Migrate off vixl float utils (https://github.com/FEX-Emu/FEX/commit/5a0a6dd0ca42c102f4aa4e435d606a800fce6931)
Handle SVE FCPY (predicated) (https://github.com/FEX-Emu/FEX/commit/8fce13386af824075df1209b049d56e382c0c4bc)
Remove resolved TODO comment (https://github.com/FEX-Emu/FEX/commit/5de7eeea2072f97deb1335993711fb54ac665eae)
Handle contiguous first fault load (scalar plus scalar) group (https://github.com/FEX-Emu/FEX/commit/0109e88082f8d761d2690667db1ef38eb7e19075)
Handle SVE FP multiply-add long groups (https://github.com/FEX-Emu/FEX/commit/6f4a23dd152f2207aeb4ee4734d30789b2f6ebde)
Arm64
Only allocate vixl::Decoder if enabled (https://github.com/FEX-Emu/FEX/commit/2d78b1fbcaa98e5d09ddb975487d815faee7d69d)
Optimize AES operations by caching a zero register (https://github.com/FEX-Emu/FEX/commit/7f997387b0231f9c367d0ed0de105c083ed13b8b)
Optimize AESKeyGenAssist (https://github.com/FEX-Emu/FEX/commit/02b891c0febd7f5816b0a4141028f9f21dc04ec8)
Optimize VFMin/VFMax (https://github.com/FEX-Emu/FEX/commit/0819338dbf8d8d112f12a8aeff1e87a6ef45cc6c)
Optimize SVE VInsElement (https://github.com/FEX-Emu/FEX/commit/1f2c5fc6c64813c3478b8672b847599797fefbec)
Stop abusing orr in LoadConstant (https://github.com/FEX-Emu/FEX/commit/6d562f8b3bfdbc7f109e0a8231f2864c74ab8e7e)
Optimize non-optimal BFI move case (https://github.com/FEX-Emu/FEX/commit/1029bb1faeb0899ab4e4cf7bdf561c5a2114ada3)
Optimize CacheLine{Clear,Clean} (https://github.com/FEX-Emu/FEX/commit/1343c14db049f8a9b3a9c2bea4865387b83374bd)
Adds stats to the disassembly (https://github.com/FEX-Emu/FEX/commit/53ac8abce924b5e8b2558b9522ceaf97232d7dbf)
Implement first SVE-128bit optimization (https://github.com/FEX-Emu/FEX/commit/fe351353f6e20b87114e9dfb69811db6cb2982c4)
Remove erroneous LoadConstant (https://github.com/FEX-Emu/FEX/commit/c4c7620ed5a293a0d1c17e5b5a8b5131d2794624)
ConversionOps
Remove redundant moves in AdvSIMD VInsGPR (https://github.com/FEX-Emu/FEX/commit/6e4765d48b15317df70233a8eb7fc6934c72d1fc)
Add missing half-precision conversions to scalar functions (https://github.com/FEX-Emu/FEX/commit/172c8f3ba6ac8d4dc9a0e9f7f741a3707bfdf289)
Add scalar support to Vector_FToI (https://github.com/FEX-Emu/FEX/commit/a62ba75ede98de3a6f208a72ef44d09052f601e3)
EncryptionOps
Use MOVI reg, #0 to zero vectors (https://github.com/FEX-Emu/FEX/commit/4286d44e92833927ff05d24832886d5284863b7e)
VectorOps
Remove redundant moves in SVE VExtr when possible (https://github.com/FEX-Emu/FEX/commit/4297e13fcf50ed63679412682c71eda150c93235)
Remove redundant moves from VSQXTN2/VSQXTUN2/VSQSHL/VSRSHR (https://github.com/FEX-Emu/FEX/commit/5b8a0f1e0d7ac7e6051746d4818954981b72e02f)
Remove redundant moves from SVE variable/immediate/vector shifts when possible (https://github.com/FEX-Emu/FEX/commit/926b8c2c97f6bcc697f3a23660f0f42706c593fc)
Remove redundant moves from SVE BSL when possible (https://github.com/FEX-Emu/FEX/commit/ec6548e3028997f108dfcc449d9fab1fb45f26ab)
Remove redundant moves from SVE V{S,U}Min/V{S,U}Max when possible (https://github.com/FEX-Emu/FEX/commit/350bca97c6a1ea3cc8b2ac5e881457190fb51625)
Remove redundant moves from SVE VFMin/VFMax when possible (https://github.com/FEX-Emu/FEX/commit/226405880f37a913b28c25cbade33c482b45fea2)
Remove moves from SVE VFDiv if possible (https://github.com/FEX-Emu/FEX/commit/da098d8204644c49c5979c4ed39a2b098b59c7e4)
Remove redundant moves from SVE VURAvg if possible (https://github.com/FEX-Emu/FEX/commit/2501ebc1cdd955427c36761603be4e6b17dd2e30)
Remove redundant move in VFRSqrt SVE path (https://github.com/FEX-Emu/FEX/commit/6ad053a1e6574412fbed8f2311e5e5ea41a859ec)
Arm64Emitter
Stop saving and restoring platform register (https://github.com/FEX-Emu/FEX/commit/affbcd22417decd8ba05a0b48256cd815d01cb47)
Ensure that 128-bit predicate is generated with SVE (https://github.com/FEX-Emu/FEX/commit/49b8b7cd2c6bf7d6844682e220a7131622c25fb5)
CMake
Add mingw toolchain file (https://github.com/FEX-Emu/FEX/commit/f1aa6208eb54943ee508bdade9ef0d53d2df76be)
Config
Minor changes (https://github.com/FEX-Emu/FEX/commit/3885bc42f6457cbab231329138374f991e45e27d)
ConstProp
Adds constpool distance heuristic (https://github.com/FEX-Emu/FEX/commit/67a26a0e98f915be4fffe5a2f2ee36d3eb1db31f)
Fix set-but-not-used mask variable (https://github.com/FEX-Emu/FEX/commit/2f0c690d5437bbd6cbb3b63406204fc38c29f826)
Context
Adds helper to reconstruct and consume packed EFLAGS (https://github.com/FEX-Emu/FEX/commit/ea965810e5c0dd7b89c8800e56fee3d9323fee9a)
Externals
Update Catch2 to v2.13.10 (https://github.com/FEX-Emu/FEX/commit/e025d325314fad25cd1bf28398ea624beed8a25e)
Update fmt to 10.1.0 (https://github.com/FEX-Emu/FEX/commit/b0ec4197bae08d88487c5e1f7b211576b05843a6)
FEX
Adds instruction count CI (https://github.com/FEX-Emu/FEX/commit/6c7371af60160d1365a016b8252749dde6597dc7)
Create a CommonTools static library (https://github.com/FEX-Emu/FEX/commit/f3182036bcab26933abcaa5233b145557e79898a)
FEXCore
Rework X87 tag word handling (https://github.com/FEX-Emu/FEX/commit/8bae58dcc3b57949607b9274153d11dfef0e9bd4)
Allows disabling telemetry at runtime (https://github.com/FEX-Emu/FEX/commit/d738538a34001bea000b40737cd81034d1b1edd8)
Adds telemetry around legacy segment register setting (https://github.com/FEX-Emu/FEX/commit/a523858f667c7b36b6262f9721dfeb172fa3f22f)
Allow for interrupting the JIT on block entry (https://github.com/FEX-Emu/FEX/commit/fc84f6b345918beee3867cb7d3ef4519cb38e7d6)
Fixes bug with 32-bit adcx (https://github.com/FEX-Emu/FEX/commit/da17e24996207074edf59ecdd7bda655e31972cb)
Fixes Arm64 stats disassembly (https://github.com/FEX-Emu/FEX/commit/099f29f1ed6b398b92af79e4f5b5ea3aa59bf6cf)
Fixes vector shifts by zero (https://github.com/FEX-Emu/FEX/commit/c77ed78f5a7755dae586910e00836c4f61fcfffb)
IR
Fixes bug in IRDumper without specification (https://github.com/FEX-Emu/FEX/commit/6cb0f52e9487e6ab7eaf57839f681ffa370b5149)
FEXInterpeter
Fixes compilation when telemetry is disabled (https://github.com/FEX-Emu/FEX/commit/5cc30bdf1072092829f1dcf872f866f2937cfa1b)
FEXInterpreter
Supports procfs/interpreter (https://github.com/FEX-Emu/FEX/commit/016c3c0f07e3c66f0a3666eb0485e52cb48e81a3)
Filemanagement
Optimize GetSelf using a string_view (https://github.com/FEX-Emu/FEX/commit/9f3730f251c20c12fccc2f55309c1d5c630430f5)
GIthub
Only enable InstCountCI on an ARM platform (https://github.com/FEX-Emu/FEX/commit/df3d4efc80340c24a9fa3f54940b2eca48f3a231)
Github
Adds a CI runner for 128-bit SVE testing (https://github.com/FEX-Emu/FEX/commit/969ad9b3b063ac5acdf10f92bbf322592143c6cf)
IR
Remove phi nodes (https://github.com/FEX-Emu/FEX/commit/16f826c18e8f05551101d518dd2ae1f496b41376)
Adds printer for OpSize (https://github.com/FEX-Emu/FEX/commit/2c2081c9778dde2c2c17000cd76fddb39e4328e3)
Adds IR::OpSize to IRDumper (https://github.com/FEX-Emu/FEX/commit/e1c2033fa9fac9927be6af77199cc6f80787fbc8)
Removes implicit sized add (https://github.com/FEX-Emu/FEX/commit/b40f784da0600357cc6d12b62e8909a7c286c4fa)
Removes implicit sized bfe (https://github.com/FEX-Emu/FEX/commit/351c0eee4252cf817c60c36dde61c09ce48fec6c)
Removes implicit sized and (https://github.com/FEX-Emu/FEX/commit/8239f8aa272d18543b628ad70c90eb5e93af5fbd)
Removes implicit sized sub (https://github.com/FEX-Emu/FEX/commit/59a4d15907154cc01f665e75e15411c758ecdb2a)
Removes bfi from variable size (https://github.com/FEX-Emu/FEX/commit/227ba9f9fc898532d15ab039c6c972d4f8f851bd)
Removes implicit sized xor (https://github.com/FEX-Emu/FEX/commit/9b55a34e75c19f0bd7ae0b3c2de1d58e484a0d72)
Removes implicit sized andn (https://github.com/FEX-Emu/FEX/commit/e9a3848602591ebb119721cba6523f9a1e88a2dc)
Removes implicit sized or (https://github.com/FEX-Emu/FEX/commit/516f27bff562e6b7b6318e6554ede535895d3dfc)
Removes implicit sized lshr (https://github.com/FEX-Emu/FEX/commit/24a9254e37eca256d0863561070e8c51c73b29d5)
Removes implicit sized lshl (https://github.com/FEX-Emu/FEX/commit/8534d3dfbf964e86f0f75d9c5410661a8bc1295a)
Removes implicit sized mul ops (https://github.com/FEX-Emu/FEX/commit/915e52046cdaa4ace98938c633ef6e52b36ed3f6)
Removes sext IR helper (https://github.com/FEX-Emu/FEX/commit/a55616e4dbf6e1ef510f4460cc39901bff7b0168)
Removes implicit sized {Create,Extract}ElementPair (https://github.com/FEX-Emu/FEX/commit/6f661534a2f7c42eeddba121245e3ee5e6b6fa64)
Convert all Move+Atomic+ALU ops from implicit to explicit size (https://github.com/FEX-Emu/FEX/commit/590345b1256155cacfdea794dd602a729bf8a2c2)
Fixes RAValidation for 32-bit applications (https://github.com/FEX-Emu/FEX/commit/547daf8b6e93d260e2ce07323e6822578c82e4a9)
Implements support for wide scalar shifts (https://github.com/FEX-Emu/FEX/commit/084d102c9ce6f1629a9e19d848e7d27ef577f575)
Allow 128-bit broadcasts in VBroadcastFromMem (https://github.com/FEX-Emu/FEX/commit/425b0347a9e04d6b98054751808a359e1c9775ba)
Add VBroadcastFromMem opcode (https://github.com/FEX-Emu/FEX/commit/cfc63680645bc38395de425f93a1901225c46a71)
Adds Option to run the IRDumper with more configurations (https://github.com/FEX-Emu/FEX/commit/ea8fbc61c2f7e687ce5e54a7e145ca65dcc03bdd)
ConstProp
Ensure that BFI with constant bitfields can optimize to Andn or Or (https://github.com/FEX-Emu/FEX/commit/a2e5c231ae6b233787b157534b15c1a1fd47105c)
InstCountCI
Update for previous changes (https://github.com/FEX-Emu/FEX/commit/b801bacb3663d7c56246f96ebeac68cdb29c4f52)
Test rorx at max mask size (https://github.com/FEX-Emu/FEX/commit/e2ac97f4dbccbc3b5ada14cca24f71bb60727701)
Fix some mislabeled instructions (https://github.com/FEX-Emu/FEX/commit/4443c667ec598ce27422404b8258a8dd718061e6)
Add newline to end of file (https://github.com/FEX-Emu/FEX/commit/f4f9b20f3261b39834e972149ce6ee3c2f990775)
Support encoding expected Arm64 ASM in JSON (https://github.com/FEX-Emu/FEX/commit/dc0cf98a8125af972f732caa876fd949bddb8168)
Update tests from actual ARM64 device (https://github.com/FEX-Emu/FEX/commit/f95ef7092c22fad8d4eebaba4f6f59bc49e51a6d)
Adds RNG support (https://github.com/FEX-Emu/FEX/commit/dac220a6ff5d9b30fc0ae99b4eaa557e52c82893)
Disables tests with unsupported configurations (https://github.com/FEX-Emu/FEX/commit/da334fe7d53e988d9b5f7d3b29ee3590beb78b62)
Ensure output nasm name doesn't conflict (https://github.com/FEX-Emu/FEX/commit/3c99fb84e2bf435104dbf726c9adce20033cd0b2)
Adds primary tables (https://github.com/FEX-Emu/FEX/commit/63f28eae4fd4805e4714524c3e428106d50dea34)
Adds secondary prefix tables (https://github.com/FEX-Emu/FEX/commit/135b9ac4259347c42b74b3d2c8796d8ff684db27)
Adds secondary tables (https://github.com/FEX-Emu/FEX/commit/24d01cd8d2af9edaa5e23e76cb208c0819557e61)
Adds Primary group tables (https://github.com/FEX-Emu/FEX/commit/103e6044fa2cb7d9e4169eee38b234403075e393)
Adds VEX map3 tables (https://github.com/FEX-Emu/FEX/commit/e334278f80c27c735318733af0f6850b9e8c55c8)
Adds VEX map group tables (https://github.com/FEX-Emu/FEX/commit/7fe2d3b3057b7267e690fd3d92daca4ca40eff22)
Adds FEX map2 tables (https://github.com/FEX-Emu/FEX/commit/871434dab946b5e14b1104ddf8555da8eaa78bd0)
Adds VEX map1 tables (https://github.com/FEX-Emu/FEX/commit/a4647982cd307c947eec9793b498e27bd5e43703)
Adds x87 table (https://github.com/FEX-Emu/FEX/commit/1146c42044cf3297d864505f850de30af4a6422a)
Adds H0F38 table (https://github.com/FEX-Emu/FEX/commit/a15934bd5564e65dc0a96be2d0ff1b861d394b5b)
InstructionCountCI
Adds three more instruction tables (https://github.com/FEX-Emu/FEX/commit/6d1fcfce0909e8f10019a5a7c8e05461370fd731)
Interpreter
Tie SSA data elements to supported vector width (https://github.com/FEX-Emu/FEX/commit/e765bd8986976a9fa82aa321b34de56df37475ee)
Use alias for temporary vector data (https://github.com/FEX-Emu/FEX/commit/8b9ee997b212b9806cd3e8a85fa3dd6da273d642)
Linux
Call exit_group when application tries (https://github.com/FEX-Emu/FEX/commit/0195bb6e5a4ecdda5080ff05565036131bdcb4f2)
LogMan
Commonise log level to string conversion (https://github.com/FEX-Emu/FEX/commit/b4d172649e958d37782b0f7f4ff44ae45b534f12)
OpcodeDispatcher
Optimize CMC (https://github.com/FEX-Emu/FEX/commit/c27f69dd6bcd81e091b46a604f9b412e53e5a5b0)
Fixes NZCV and PF flag compacting (https://github.com/FEX-Emu/FEX/commit/b18592f153586f099ce69aba2f382d5d3201ac3d)
Remove final assumptions about small IR operating sizes (https://github.com/FEX-Emu/FEX/commit/8017a91e52110e8f97a245c684a46b93ecaf733a)
Cleans up RFLAGS size handling (https://github.com/FEX-Emu/FEX/commit/62fcf6cbd0ef8a2b3ff4b3d7b3a0ab7c27a35e17)
Optimize calls with push (https://github.com/FEX-Emu/FEX/commit/7c81a0d4fe63d058929d90d1819675c76b21ef76)
Optimize BLENDV when xmm0 is one of the sources (https://github.com/FEX-Emu/FEX/commit/9fb8c95ef450fe7b717d91569c5ac29591dfdb5a)
Removes erroneous debug log (https://github.com/FEX-Emu/FEX/commit/012750f2bbec3d0bcef04fb1d518963e9cc4a664)
Optimize Get{Src,Dst}Size (https://github.com/FEX-Emu/FEX/commit/c1b4c11e54f75f9a7f436f98c2b3c180d395d1da)
Optimizes SSE movmaskps (https://github.com/FEX-Emu/FEX/commit/1d7c2803677476d8648e62d5e32f90aa31176277)
Optimize PSHUF{LW, HW, D}! (https://github.com/FEX-Emu/FEX/commit/db3dc3edf4610c401dd896cdad1ea6dd583952e1)
Optimize 128-bit movmaskpd (https://github.com/FEX-Emu/FEX/commit/bf12f082185c49548d4897e7f8cddbd07f76b8a7)
Optimize movddup from register (https://github.com/FEX-Emu/FEX/commit/1f7d138d2a47a57773488d8a8e889f879c849f82)
Optimize cvtdq2pd from register source (https://github.com/FEX-Emu/FEX/commit/f36f07055a5af6431774f8e28f98e02f16617051)
Optimizes movq (https://github.com/FEX-Emu/FEX/commit/30a1a382c45be95ccb75f8391be69df2a7840b4a)
Optimize nontemporal moves (https://github.com/FEX-Emu/FEX/commit/631655dd8165d7265d86db7151e556aebb6a42f0)
Generate more optimal code for scalar GPR converts (https://github.com/FEX-Emu/FEX/commit/0ef439f830e2e467316c5499282da5d6d7ac4665)
Optimize cvtps2pd (https://github.com/FEX-Emu/FEX/commit/80d871fb18d1c332f330d92e710299a469bb0df9)
Optimize MMX conversion operation (https://github.com/FEX-Emu/FEX/commit/4d58ec10251b7a61b365a1963bdb0de4dc32e67c)
Optimize addsubp{s,d} using fcadd (https://github.com/FEX-Emu/FEX/commit/200dbdd0e62fa994386a3a116cd3ece201ef92e9)
Cache named vector constants in the block (https://github.com/FEX-Emu/FEX/commit/df99b7b9b63314d730b9776f7bd5f63fcf600f16)
Optimize AddSubP{S,D} (https://github.com/FEX-Emu/FEX/commit/ab83ab42ddb5b866c2998600cc36e5558697cec1)
Optimize PMULH{U,}W using new IR operations (https://github.com/FEX-Emu/FEX/commit/66c6f961203bf2c63f75dd494bd4f4bedc440251)
Remove redundant moves from PCLMULQDQ and AES operations (https://github.com/FEX-Emu/FEX/commit/1aa2c534f7a4be43e65cd8a2f16a83f79a2fa61f)
Use new IR ops for pack instructions (https://github.com/FEX-Emu/FEX/commit/ee10153d140a2708f4dc9cb83154622bc15813ae)
Optimizes scalar movd/movq (https://github.com/FEX-Emu/FEX/commit/4b06069c0d735b1122120a398cb7b7e043397183)
Remove redundant moves from remaining AVX ops (https://github.com/FEX-Emu/FEX/commit/b646f4b78106d6924cf96ef2f4df0e6d4fd91d4c)
Remove redundant moves from VPACKUSOP/VPACKSSOp (https://github.com/FEX-Emu/FEX/commit/a40526a541f198e99563dd434dd7f0c56bb00c46)
Remove unnecessary moves from AVX move ops where applicable (https://github.com/FEX-Emu/FEX/commit/86ef6fe48d290235466eefce8619cb89397e4fcd)
Remove unnecessary moves from AVXExtendVectorElements (https://github.com/FEX-Emu/FEX/commit/6624f50abf8c20af1d5cb698bfa11c071e21ddf3)
Remove unnecessary moves in AVXVFCMPOp (https://github.com/FEX-Emu/FEX/commit/cd1f401363e2d15290537b0f6c34e1cb0c716762)
Remove redundant moves in AVX blend special cases (https://github.com/FEX-Emu/FEX/commit/76430baf888b7a5f994d59b56a06c5042c762023)
Optimize MOVHP{S,D} (https://github.com/FEX-Emu/FEX/commit/819fe110daadcfb69c3857aef77e523818c93f8d)
Remove unnecessary moves from AVX inserts (https://github.com/FEX-Emu/FEX/commit/adfd6787c0bcd6dd3c82518af57aac4a301bc974)
Optimize MOVLP{S,D} loads (https://github.com/FEX-Emu/FEX/commit/bb2f7107cdc26f37f3c778e6a15dbbf540f15eda)
Remove unnecessary moves from AVX register shifts (https://github.com/FEX-Emu/FEX/commit/5d44a445ddd9c64f146746cb72545d4454b205ea)
Remove redundant moves from AVX immediate shifts (https://github.com/FEX-Emu/FEX/commit/fb65fb29c7debe4f97c0294efe3c8021c7db43ad)
Remove unnecessary moves from AVX conversion operations (https://github.com/FEX-Emu/FEX/commit/36a54183f532dd8f21fbd8b3263d84b7d1cd0d6d)
Remove unnecessary moves from AVXVariableShiftImpl (https://github.com/FEX-Emu/FEX/commit/ead141fd904c5cce4afec0e36bafefb860905397)
Remove unnecessary move from VPHMINPOSUW (https://github.com/FEX-Emu/FEX/commit/14144523f7324567ad2e40802b033d366059ce73)
Optimize phminposuw (https://github.com/FEX-Emu/FEX/commit/ed7f1b017d19f7d5c8037107dd8096e350fdaa36)
Optimize PFNACC (https://github.com/FEX-Emu/FEX/commit/6c7933e7b10f653e82446e78fa03c482887b3fc7)
Optimize hsubp (https://github.com/FEX-Emu/FEX/commit/364f08460420de07c982d7dd7a139554b8512dd2)
Remove redundant move from VPSIGN (https://github.com/FEX-Emu/FEX/commit/ffa8f1e3dcea6a08eeda2bddd9a08fc2c298f6fa)
Optimize pmuludq (https://github.com/FEX-Emu/FEX/commit/9df94d8a938fdb02b00c309c43ff2e4bca61a9c7)
Remove redundant move in AVXVectorScalarALUOpImpl (https://github.com/FEX-Emu/FEX/commit/71984fc0ea11b7171606ab9bc96abf075aba8d08)
Remove redundant moves in AVXVectorALUOp (https://github.com/FEX-Emu/FEX/commit/3c88671cca0d37afb69b05bcdf34b5adbf903914)
Optimize pmaddwd (https://github.com/FEX-Emu/FEX/commit/185e3bfcb6513e1ff795142fbb23880ab1774778)
Optimize phsub (https://github.com/FEX-Emu/FEX/commit/3c49b3238a3fedd3e5c8de025d8bc8f0da80788e)
Handle zero immediate shifts better (https://github.com/FEX-Emu/FEX/commit/3a2a576c35bbb7781cbe55d8e7af2d03e21a4d0a)
Optimizes mpsadbw (https://github.com/FEX-Emu/FEX/commit/8ee6262e5e427380e80971d0045cc20eba4f63b0)
Optimize SSE/AVX pmaddubsw (https://github.com/FEX-Emu/FEX/commit/34a7feffe8b2bbba94287755d8731fcda99fd524)
Implement support for push IR operation (https://github.com/FEX-Emu/FEX/commit/9e4888c6a13fa4ca1a3a9896db05083ea6b91a79)
Improve SHA1MSG1 output (https://github.com/FEX-Emu/FEX/commit/277345d2a5d1420384f7cfa3058ba001553ff3b9)
Minor optimization around clearing flags (https://github.com/FEX-Emu/FEX/commit/34722348e87ea3c5cbac1098607909a936de1cf6)
Eliminate redundant moves in {AVX}VectorRound (https://github.com/FEX-Emu/FEX/commit/b973c193beb42b3287cd09f26f80e68521eb364d)
Eliminate unnecessary moves in {AVX}VFCMPOp (https://github.com/FEX-Emu/FEX/commit/4c409ea47da91df4eaeca0789800e4bfd0ded19d)
Remove unnecessary moves in {AVX}VectorUnaryOp (https://github.com/FEX-Emu/FEX/commit/ac53913c3766cc54af754f7ae6e49b95111aad52)
Remove extraneous moves in {V}CVTSD2SS/{V}CVTSS2SD (https://github.com/FEX-Emu/FEX/commit/2224c23c79bcd07788036c2aa29a6b3577bcb868)
Remove unnecessary moves in {AVX}VectorScalarALUOp (https://github.com/FEX-Emu/FEX/commit/6ce380d7a9fb5a9e60ee5ed5797af9cfb9e0ee73)
Remove redundant moves from {V}CVTSD2SI/{V}CVTSS2SI (https://github.com/FEX-Emu/FEX/commit/8dade7eea1dada405f3217d05eed8abb39ae2b93)
Remove some extraneous MOVs from VMOVSD/VMOVSS (https://github.com/FEX-Emu/FEX/commit/add5baeba52ec30c08d804a982698f885f79e92a)
Handle broadcasting cases in VPERMQ/VPERMPD (https://github.com/FEX-Emu/FEX/commit/db60a2fd4bb85422d0a6a9c3c6c10d1f3f648992)
Improve {V}PSRLDQ shift by 0 (https://github.com/FEX-Emu/FEX/commit/f09d9af3dbc39050848e4de32ba077945aeafa9b)
Remove unnecessary conditionals in {V}PSLLIOp (https://github.com/FEX-Emu/FEX/commit/461ca6fe7c738383f289d391971af3d5737ae7e4)
Improve VMOVDDUP output (https://github.com/FEX-Emu/FEX/commit/a4a68b47ce5d02021946e28fdaf03cc303da6b7a)
Improve output of {V}MOVSLDUP/{V}MOVSHDUP (https://github.com/FEX-Emu/FEX/commit/f70b6f37a2449f2d8d07c02c4f8d33d2e39429d7)
Remove unused variable in AVXVectorUnaryOpImpl (https://github.com/FEX-Emu/FEX/commit/6580796169a516c6f2f9f873af555e001d28d2b7)
Eliminate unnecessary moves in AVXVectorUnaryOpImpl (https://github.com/FEX-Emu/FEX/commit/0e52158cef950c6ca909f46fb5e4fa307de231b7)
Flags
Update SHLimm to use Opsize upfront (https://github.com/FEX-Emu/FEX/commit/2d2217669986e804bbe5140dead2603dbd817b79)
Update ShiftLeft to use Opsize upfront (https://github.com/FEX-Emu/FEX/commit/338cb199a0d3e9824374fd47f7f502522ac66833)
SignalDelegator
Fix build with telemetry disabled (https://github.com/FEX-Emu/FEX/commit/81046efcd350dc24dc80f53a4d08f5ecc4466347)
Allow getting the internal configuration (https://github.com/FEX-Emu/FEX/commit/6960fca256d4bfb312a078d27a09986daebdbfe4)
Syscalls
Fix telemetry with exit_group (https://github.com/FEX-Emu/FEX/commit/f0ab9603e1c08c1a3c67e66910b34c2a7424547f)
X86Tables
Optimize MOVLPD stores (https://github.com/FEX-Emu/FEX/commit/42200bf7b6abab6839b812a140975f1f2f71353a)
X8764
Ensure frndint uses host rounding mode (https://github.com/FEX-Emu/FEX/commit/7c6660f634a6d8faef064486af840784c042114f)
Misc
Defer PF calculation completely (https://github.com/FEX-Emu/FEX/commit/04228952fac198827b6b17544372427df596af9f)
Remove ABINoPF option (https://github.com/FEX-Emu/FEX/commit/8184c5542425d170bd34097af262a4ad57d16942)
Defer second XOR for AF (https://github.com/FEX-Emu/FEX/commit/a8bc6bbb2e8f0f3c6ade42787513a411ec26f778)
Stop zeroing undefined flags (https://github.com/FEX-Emu/FEX/commit/252ca88b3e9d654e1292aff861522581def1e10e)
Defer AF extract (https://github.com/FEX-Emu/FEX/commit/486f0ba1e35f2ad05b3b9bc5a692063b967e02a3)
Optimize ADD flag calculation (https://github.com/FEX-Emu/FEX/commit/ee8092bdc82eea349d6cfc9800788f7fac51a186)
Remove implicit sized IR ops part atomic (https://github.com/FEX-Emu/FEX/commit/bc1e89d91dd3c82f9e90dd383d626d7dd5a51c96)
Remove implicit sized IR ops part 1 (https://github.com/FEX-Emu/FEX/commit/5415e4c95fa9bf3cc334c395d2e3649035857939)
IR/Passes/RA: Enable SRA for 32-bit GPRs (https://github.com/FEX-Emu/FEX/commit/6f2b3e76ac92a4d1fdebef45a5cf0340bc235448)
x86_64/MemoryOps: Fix mislabeled IR op messages (https://github.com/FEX-Emu/FEX/commit/c5f358b47adfc7afe00f7aeeddbd8fb28b5f601c)
Optimize PSIGN and VBSL (https://github.com/FEX-Emu/FEX/commit/2d7a3a578e988fc1a2fbb0297c8f8c2b663e9bc8)
Support for Config.json loading on WIN32 (https://github.com/FEX-Emu/FEX/commit/d3f0c7e969b16826cf574135c8090212cbbcc8f9)
Move External/FEXCore/ to FEXCore/ (https://github.com/FEX-Emu/FEX/commit/c9856daaeee65bbfd0aa9c88957312e7773e5159)
Various warning fixes (https://github.com/FEX-Emu/FEX/commit/9d26af95ab419ddd433db1590ac3672958472bf3)

FEX - FEX-2308

Published by Sonicadvance1 about 1 year ago

Read the blog post at FEX-Emu's Site!

Whoa jeez, another month already? We've had our heads down working hard this last month, trying to make FEX-Emu the greatest x86/x86-64 emulator on
Linux. A huge focus this month is optimizations because of course what we want is to go fast. We're all cats and we've got the zoomies.

Every day we're optimizing

As said before, this month has been an absolute mess of optimizations as we've been optimizing the project as thoroughly as possible. We could spend
another month talking about the optimizations that we did this last month, so let's blast through what we did. First let's show a graph for how much
FEX has improved over this last month.

Look at those numbers! Some benchmarks from bytemark have cracked the 200% mark! While a couple benchmarks do have regressions, we're pretty sure that
we know what they are and they will be rectified soon. These are the sorts of optimizations that can be felt in real games though.

So let's quickly run through some of the optimizations that landed this last month.

Switch to using half-barriers for memory accesses

When ARM hits an unaligned atomic memory access, we previously wrapped that load or store in two slow barrier instructions. We can now safely use only
one barrier on one half of the instruction! This makes unaligned accesses quite a bit quicker.

Optimize x87 memory accesses

This removes a couple instructions when we access 80-bit floats.

Only clear icache for code

Some large code blocks can generate a decent amount of metadata that doesn't need an icache clear. Skipping it can remove a bit of stutter.

Const prop BFI operation

Sometimes when a BFI instruction has constants in it, we can remove the BFI instruction.
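
As a rough illustration of the idea (not FEX's actual pass), the sketch below models BFI as inserting the low `width` bits of a source into a destination at bit `lsb`. When the inserted bits are a known constant of all zeros or all ones, the insert collapses to a single AND-NOT or OR with a mask. The function names and the tuple-based result are hypothetical.

```python
MASK64 = (1 << 64) - 1

def bfi(dst, src, lsb, width):
    """Reference semantics: insert the low `width` bits of src into dst at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & MASK64

def simplify_bfi_const_src(src_const, lsb, width):
    """Pick a cheaper op when the inserted bits are a known constant.

    Returns (op_name, mask) for the all-zeros and all-ones cases, or None
    when the BFI has to stay. Hypothetical helper, not FEX's API.
    """
    low = (1 << width) - 1
    field = low << lsb
    bits = src_const & low
    if bits == 0:
        return ("andn", field)  # result is dst & ~field
    if bits == low:
        return ("or", field)    # result is dst | field
    return None

def apply(op, dst, mask):
    """Evaluate the replacement op so it can be compared against bfi()."""
    if op == "andn":
        return dst & ~mask & MASK64
    return (dst | mask) & MASK64
```

Comparing `apply(*simplify_bfi_const_src(...), dst)` against `bfi(...)` for a few inputs is a quick way to convince yourself that the replacement is equivalent.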

Optimize vector TSO loadstores

Vector operations typically need an additional add on their address if the offset can't fit in the instruction encoding's immediate field. We missed the
optimization for the case in which the immediate offset CAN actually fit. This commonly removes an instruction per vector loadstore.
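
To make the missed case concrete, here is a small sketch of the kind of check involved. It assumes the AArch64 `LDR` unsigned-offset form for 128-bit registers, where the offset must be a non-negative multiple of 16 no larger than 65520. The function names, the scratch register `x9`, and the pseudo-assembly strings are hypothetical, and FEX's real check may consider other encodings.

```python
def fits_unsigned_scaled_imm(offset, access_bytes):
    """True if offset fits a 12-bit immediate scaled by the access size."""
    return (
        offset >= 0
        and offset % access_bytes == 0
        and offset // access_bytes < 4096
    )

def emit_vector_load(base, offset, access_bytes=16):
    """Return pseudo-assembly lines for a 128-bit load from base + offset."""
    if fits_unsigned_scaled_imm(offset, access_bytes):
        # The offset is folded into the load itself: one instruction.
        return [f"ldr q0, [{base}, #{offset}]"]
    # The offset does not fit, so the address is computed first.
    return [f"add x9, {base}, #{offset}", "ldr q0, [x9]"]
```

With these assumptions an offset of 32 needs a single instruction, while an offset of 24 (not a multiple of 16) still needs the extra add.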

Use TST instead of CMN

Sometimes CMN hits a slow path on Cortex-A57, so this is a minor win there.

Optimize xor reg, reg

x86's universally agreed upon instruction for generating zero in a register is xor. This instruction isn't actually optimal on ARM hardware. We now
emit a move of constant zero, which gets optimized to a register rename on most ARM hardware.
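
A minimal sketch of recognizing the idiom, assuming guest registers have already been mapped to host register names. The function name and the pseudo-assembly strings are hypothetical, and x86 flag updates are deliberately left out.

```python
def lower_xor(dst, src):
    """Return pseudo AArch64 for x86 `xor dst, src` (flag updates omitted)."""
    if dst == src:
        # Zeroing idiom: the result is always 0, whatever the old value was.
        return [f"mov {dst}, #0"]
    return [f"eor {dst}, {dst}, {src}"]
```

For example, `lower_xor("w4", "w4")` yields a constant move, while `lower_xor("w4", "w5")` keeps the exclusive-or.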

More instructions optimized

These mostly just make the implementations use fewer instructions, which makes them faster. There will be way more of this in the coming month.

rotate flag calculations
phsubsw/phaddsw
cmpxchg8b/16b
psad*
8-bit, 16-bit rcr
fcmov
shld/shrd
movss
maskmovdqu
maskmovq
phminposuw
fild
PF flag calculation optimization
Optimizing packing RFLAGS
Optimize ADD/ADC OF flag packing

Fixes bug in SSE4.2 pcmpestri

This was causing Java applications to crash. Combined with a different bug that we fixed last month, Java now works to an extent. It still crashes on
shutdown, which is interesting, and not all games are expected to work. But good luck testing random Java games!

Pack NZCV flags

This is the first step towards FEX generating x86 flags in a more optimal way. These flags match the ARM flags fairly closely and can be emulated in a
more optimal way if we pack them together. This is likely what causes the regression in bytemark, but since this is an intermediate step it is
expected to go away with the next optimization after this. Look forward to future optimizations that make this faster!
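
As a sketch of what packing means here, the snippet below places x86's SF, ZF, CF and OF at the bit positions ARM uses for N, Z, C and V (bits 31 down to 28 of the NZCV register). The function names are hypothetical and this is not FEX's internal representation. Note that the two architectures disagree on the meaning of carry after a subtraction, which this sketch ignores.

```python
N_BIT, Z_BIT, C_BIT, V_BIT = 31, 30, 29, 28

def pack_nzcv(sf, zf, cf, of):
    """Pack x86 SF/ZF/CF/OF (each 0 or 1) at the ARM N/Z/C/V bit positions."""
    return (sf << N_BIT) | (zf << Z_BIT) | (cf << C_BIT) | (of << V_BIT)

def unpack_nzcv(word):
    """Recover (sf, zf, cf, of) from a packed word."""
    return tuple((word >> bit) & 1 for bit in (N_BIT, Z_BIT, C_BIT, V_BIT))
```

Keeping the four flags in one word with this layout means a single value can be moved to or from the host flags instead of four separate ones.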

Remove weak symbol declarations in thunks

A bug that cropped up in thunks has been a crash that occurs when trying to use thunks from Ubuntu's PPA system. This has been a major thorn in FEX's
side for months because once you rebuild the project locally, it would never reproduce. The problem stems from the fact that clang would decide that
it can inline a "weak" symbol if its implementation is visible. This would only occur on Canonical's ARM builders, potentially due to whatever device
they compile the code on. This would cause our thunks to crash almost immediately if a user tried them from the PPA system. We have now worked
around this clang quirk, which fixes thunks when enabled from the PPA system.

Mingw build work

As part of FEX's effort towards supporting running as a WINE dll, we have been slowly adding support for compiling FEXCore as a Windows DLL.
This month we have removed a bunch of Linux assumptions and API usages from FEXCore and moved them to the frontend FEXInterpreter application. In doing
so, FEXCore can now be compiled using llvm-mingw as a WINE-specific DLL. This is completely unusable for users today but lays the groundwork for
what will eventually become a WoW64 integration. We have also added mingw building of FEXCore to our CI to ensure it doesn't get
broken.
To be clear, even though this work allows us to compile as a Windows DLL, this doesn't allow us to run under Windows. FEX still does a bunch of things
that are Linux specific inside of the code.

ARMEmitter cleanups

Another improvement that doesn't affect our users, but it deserves a shout-out on behalf of our developers. @Lioncache
has spent a good amount of time this last month adding missing instructions and aliases to our AArch64 code emitter. While our code emitter covers a
decent amount of the AArch64 instruction space, it takes time to ensure full coverage. Whenever we're writing code for our JIT and an instruction is
missing, it slows down whatever we are working on. So kudos for improving our coverage because it makes everyone's lives easier.

Implement missing accept4, recvmmsg, sendmmsg for 32-bit socketcall

In a recent Steam client update, it started using accept4 for some background thing. This would cause it to spam a bunch of logs when failing to
accept some connection. This was a simple fix for a few missing system calls, and Steam is no longer complaining loudly.
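
For context, 32-bit Linux multiplexes socket operations through a single `socketcall(call, args)` system call, and the three operations named above use call numbers 18, 19 and 20 in the kernel's `linux/net.h`. The sketch below shows a dispatcher of that shape. The table and handler names are hypothetical, and returning `-ENOSYS` for a missing handler is a placeholder choice, not necessarily what FEX does.

```python
import errno

# Call numbers as defined in Linux's include/uapi/linux/net.h.
SOCKETCALL_OPS = {
    5: "accept",
    18: "accept4",
    19: "recvmmsg",
    20: "sendmmsg",
}

def dispatch_socketcall(call, args, handlers):
    """Route a 32-bit socketcall(call, args) to a per-operation handler."""
    name = SOCKETCALL_OPS.get(call)
    if name is None or name not in handlers:
        # Placeholder error for an operation without a handler.
        return -errno.ENOSYS
    return handlers[name](*args)
```

An operation that is missing from `handlers` fails on every call, which is the kind of repeated failure that shows up as log spam in the calling application.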

Fix variadic packing in X11 thunking

WINE had broken X11 thunking for all of FEX's history without any indication as to why. We never had time to look in to this but this last month we
finally hit a game that crashed which made this easier to debug. This bug occurred because WINE is one of the few applications that pass more than
seven arguments through a few variadic API calls. This triggered a bug in FEX's variadic repacking code once we started packing the arguments on to
the stack. With this fixed, WINE X11 thunking now works in significantly more games. This means that both OpenGL and Vulkan applications can be
thunked under WINE.

Fixes dead context store elimination pass

This optimization pass removes redundant stores to FEX's CPU context state. While this usually doesn't save much, it can improve performance for some
edge cases in FEX's JIT. While this is a performance optimization, it likely won't affect many things.
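
A minimal sketch of this kind of pass, assuming a single straight-line block in which every access has the same size. The tuple-based op format is hypothetical and is not FEX's IR. Walking the block backwards, a store is dead if a later store writes the same context offset with no load of that offset in between.

```python
def eliminate_dead_context_stores(ops):
    """Drop stores that are overwritten before anything reads them.

    `ops` is a list of ("store", offset, value) or ("load", offset) tuples.
    """
    kept = []
    overwritten = set()  # offsets stored later with no load in between
    for op in reversed(ops):
        if op[0] == "store":
            if op[1] in overwritten:
                continue  # a later store wins and nothing read this one
            overwritten.add(op[1])
        else:
            # A load makes the preceding store to this offset observable.
            overwritten.discard(op[1])
        kept.append(op)
    kept.reverse()
    return kept
```

A correct pass has to be conservative: anything that might observe the context (a load, or in a real JIT a call or a possible fault) must keep earlier stores alive.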

Fix 16-bit POPA instruction

This instruction was accidentally zero extending the 16-bit value in to the 32-bit register. We now insert the 16 bits as expected, leaving the upper
half of the register untouched. This fixes an issue with OpenAL in some cases.
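
The difference between the two behaviours can be shown with plain integer arithmetic. The function names below are hypothetical and only model the register update, not the stack handling.

```python
def pop16_zero_extend(reg32, value16):
    """Buggy behaviour: the popped value replaces the whole 32-bit register."""
    return value16 & 0xFFFF

def pop16_insert(reg32, value16):
    """Expected behaviour: only the low 16 bits of the register change."""
    return (reg32 & 0xFFFF0000) | (value16 & 0xFFFF)
```

With a register holding 0xDEADBEEF and a popped value of 0x1234, the first function returns 0x00001234 while the second returns 0xDEAD1234.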

Raw Changes

ARMEmitter
Add missing atomic aliases (https://github.com/FEX-Emu/FEX/commit/68cb6e61d1c0e9cb0b057dd5790f3105545cf8fe)
Add cinc/cinv/csetm aliases (https://github.com/FEX-Emu/FEX/commit/30ab4d3f5855d2422a369f2e0f80a6b3675c639e)
Add ngc/ngcs aliases (https://github.com/FEX-Emu/FEX/commit/eebcbfda965dbe408d6c06e99162a8df13bb3856)
Add bfc/bfxil aliases (https://github.com/FEX-Emu/FEX/commit/d2bca9b997c8180d3a3b1b9250f4d11a45ae8fd7)
Add sbfiz/tst/ubfiz aliases (https://github.com/FEX-Emu/FEX/commit/4681061011dc7df961747116b407ed3fdde21def)
Finish off remaining SVE Integer Wide Immediate - Unpredicated categories (https://github.com/FEX-Emu/FEX/commit/2fc6542d15ad57c3574d6c06d13552597c70fd86)
Implement cmn alias (https://github.com/FEX-Emu/FEX/commit/1ce0ea8f3b9a0ae35b21193ac387b6931725b66f)
Arm64
Switch to using half barriers (https://github.com/FEX-Emu/FEX/commit/94273fbf4f8397d680eb5d5d30b6c3db64a07c9a)
Fixes LR corruption in 128-bit divides (https://github.com/FEX-Emu/FEX/commit/5821175ddb55cf8bdccaddd59c75ff702a12953c)
Optimize {Load,Store}ContextIndexed address generation (https://github.com/FEX-Emu/FEX/commit/536b2ed495f88173deefc1ec90c643caf8010fe0)
Only clear icache for code (https://github.com/FEX-Emu/FEX/commit/0674dfab0a41ed6165e6e7d52abaf1f7749da0c3)
Emitter: Handle LD1{}/LDFF1{} Vector + Immediate encodings (https://github.com/FEX-Emu/FEX/commit/421214e7235698318a4908eaeb7c4ff5bc844a4f)
Emitter
Add remaining missing SVE predicate range assertions (https://github.com/FEX-Emu/FEX/commit/64c72430bf28dece1c428857b1dcff3661cc50af)
Deduplicate some more SVE implementations pt. 2 (https://github.com/FEX-Emu/FEX/commit/0a1820d44478e639dbed92ed868cb783964aefd1)
Deduplicate some more SVE implementations (https://github.com/FEX-Emu/FEX/commit/9c175da0e2997fef8e44c3f36ad17a06f2578139)
Reorganize some base opcode and assert locations (https://github.com/FEX-Emu/FEX/commit/0be68a54d0bbd88bfb0a24ef4a61a9060d023c3e)
Simplify SVE immediate shift helper (https://github.com/FEX-Emu/FEX/commit/e689c6fcfa48cd1f17ea1ae7cdf2867ebb6b2677)
Collapse encoding cases for indexed dup (https://github.com/FEX-Emu/FEX/commit/d6697fce32463a1fd16dedf60c0ac91affbfa4b9)
Handle SVE FP convert precision group (https://github.com/FEX-Emu/FEX/commit/e633ef7cf55711b8507fe32062cef97df8bd82a5)
Handle SVE FP arithmetic with immediate (predicated) group (https://github.com/FEX-Emu/FEX/commit/754bc188135cc10daa7a19158818ad8eeedbcd3e)
Handle SVE XAR (https://github.com/FEX-Emu/FEX/commit/842b71cf8318c3c7b0405512fbf99d04efc185be)
Add helper for encoding SVE shift immediates (https://github.com/FEX-Emu/FEX/commit/41b3c52663a4512981cb86d344fd4031d7378bb4)
Deduplicate some opcode values (https://github.com/FEX-Emu/FEX/commit/d7200e2a1ebc521199c267b2da20048db35b021e)
Handle unsized contiguous STR variants (https://github.com/FEX-Emu/FEX/commit/46d3f283b87bc58b2410ddf728132857109b8f92)
Handle ST1{*} scatter store variants (https://github.com/FEX-Emu/FEX/commit/724a8e13bf9d06e135d58fa62cfb3db896078061)
Simplify SVEMemOperand data union (https://github.com/FEX-Emu/FEX/commit/699c3f5762a8e15d1bd5c0edf58e8076a999be85)
Handle LD1{}/LDFF1{} scalar + vector load variants (https://github.com/FEX-Emu/FEX/commit/5c27febbc2cda29c51a331992720ebaafa3db0be)
VectorOps
Hoist asserts out of VInsElement cases (https://github.com/FEX-Emu/FEX/commit/5b261b0d2e8af7331a3913b2406443f91a1f980e)
ArmEmitter
Support 32-bit bitmask moves (https://github.com/FEX-Emu/FEX/commit/d9b52fd67df400a98eb78dea18376763a686d6f5)
CMake
Allow overriding linker (https://github.com/FEX-Emu/FEX/commit/0d6837f1a1c42197c507ecd3530098cd354330ac)
Fix pkg version extraction for xcb (https://github.com/FEX-Emu/FEX/commit/96b428dcaab1f146ff0f22c2e817f7a0499c5353)
Stop installing fmt (https://github.com/FEX-Emu/FEX/commit/2c3361be9e0295d6d00ce7d2cdbe862737dbe691)
ConstProp
Handle constant Bfi (https://github.com/FEX-Emu/FEX/commit/29dc77c44bb367b089c5c58c90e2638162b80118)
Fix shift mask in const-prop (https://github.com/FEX-Emu/FEX/commit/daeba0625f6f20cd6aa0fb10001d54db511fccab)
Context
Pull out some long std::function declarations into aliases (https://github.com/FEX-Emu/FEX/commit/7562ff2308a460096928b7baccbaef4ebdef70ad)
FEXCore
Ensure that the man page follows DESTDIR (https://github.com/FEX-Emu/FEX/commit/d196709162dee5d44a2c89fb9d27c401924893da)
Remove erroneous asserts in the project (https://github.com/FEX-Emu/FEX/commit/f5262446a00868c454b88db6c9f794ce219d4e86)
Removes unused TLS variable (https://github.com/FEX-Emu/FEX/commit/6455c4817a4353833037ae4a180039aa4d89893c)
Fixes WIN32 compiling again (https://github.com/FEX-Emu/FEX/commit/2283c73faef847e283721b1406bfe1d65e80df18)
Minor cleanup (https://github.com/FEX-Emu/FEX/commit/462feec2a623cab0af182052c70ee8e64c909f9c)
Config
Adds support for enum mask configuration array (https://github.com/FEX-Emu/FEX/commit/2c91b5cff60fdc744b7623547db5a1a2c93ee272)
FEXLinuxTests
Fixes race in smc-mt-2 (https://github.com/FEX-Emu/FEX/commit/af35e18979d9748ae86b6d4a20e6e7597221589a)
FEXRootFSFetcher
Make verification percent easier to read (https://github.com/FEX-Emu/FEX/commit/7765bbc7b87f4fc4b4a5cb373fe7a461e93aa238)
FHU
Avoid calling faccessat on WIN32 (https://github.com/FEX-Emu/FEX/commit/b34401bb33534d559894371630766bf326a9ed01)
Fix WIN32 getcpu implementation (https://github.com/FEX-Emu/FEX/commit/70d54122b2f938a30eb2c0db368738bcdd2d56df)
Frontend
Remove unused ModRMDecoded instance (https://github.com/FEX-Emu/FEX/commit/f9a9645ef310d6d5191ecf80d9a5aa5025397570)
Remove redundant lookups in BranchTargetInMultiblockRange() (https://github.com/FEX-Emu/FEX/commit/c81b1c34324e9c13fcbd31c462242ae37a540c62)
Handle VSIB byte (https://github.com/FEX-Emu/FEX/commit/4a7fa7f2bcd850ee6074aa3fc2c9fdf9ba0443a7)
Github
Adds mingw build test workflow (https://github.com/FEX-Emu/FEX/commit/0f748bd7247e959106cfdcf9acc06b90d7d96026)
IR
Expand to 16-bit opcodes (https://github.com/FEX-Emu/FEX/commit/003c88e53700b68fe124e1ec0521d5dc09ad02a4)
Print SSA values as %123 instead of %ssa123 (https://github.com/FEX-Emu/FEX/commit/599b64e9753fe82eddfe9082d69f7a49f0d3d321)
Optimize vector TSO loadstore address calculation (https://github.com/FEX-Emu/FEX/commit/810c7d926c0cfab5e302834d27dde0ef7cb79983)
Passes
Fixes DeadStoreElimination pass (https://github.com/FEX-Emu/FEX/commit/ee66985ae08f562a6bcd8d6aea0e744a4ac37fe5)
JIT
Use TST instead of CMN (https://github.com/FEX-Emu/FEX/commit/5a53c9231b6616e63a4ef3c62a7bc784f68c2e94)
Jit
Add block links directly through the lookup cache on thread exit (https://github.com/FEX-Emu/FEX/commit/8d8b64d2b73e4657bc581d361e370c54f404288d)
Linux
Implement {recv,send}mmsg inside of socketcall (https://github.com/FEX-Emu/FEX/commit/0a5db6c404dd14acd8ffbcc811e802ca41e25358)
Implement accept4 inside of socketcall (https://github.com/FEX-Emu/FEX/commit/7c4e4c440970adb6cf5621bbd08a09d13cc73f87)
LongDivideRemovalPass
Remove unused variable (https://github.com/FEX-Emu/FEX/commit/759747b3c5e01d371153b2f43b80f44a9f26db2f)
Mingw
Fixes compiling again (https://github.com/FEX-Emu/FEX/commit/dc8f063a4be554f7b30706b28c0b640093fa8f47)
OpcodeDispatcher
Optimize rotates (https://github.com/FEX-Emu/FEX/commit/25b2af14fdebade8afdc52a5a5bdb40083689862)
Optimize phsubsw/phaddsw (https://github.com/FEX-Emu/FEX/commit/76059949eab5fec27d397e0526b589aaf5d355f0)
Optimize CMPXCHG{8B,16B} final comparison (https://github.com/FEX-Emu/FEX/commit/6e15c9c213e7812a7c602b437ee97b217037396b)
Optimize PSAD* to use vuabdl{2,} (https://github.com/FEX-Emu/FEX/commit/173b70d191500c10f42c33401b92a3f176b964a1)
Fixes SHRD by immediate OF flag calculation (https://github.com/FEX-Emu/FEX/commit/52d7efda1030d25a9c9bb655291f7d662cbf7bdc)
Optimize 8/16-bit RCR (https://github.com/FEX-Emu/FEX/commit/072f027885b53d384d43a4e714eb47bddb92689e)
Another FCMov minor optimization (https://github.com/FEX-Emu/FEX/commit/f7b7997c77917d1f53da84ec3d9f872d84bbea90)
Handle VSIB byte (https://github.com/FEX-Emu/FEX/commit/6979dc9c4e5d36bd9db7b0c15516293b6500d735)
Remove unused member variables and reorganize (https://github.com/FEX-Emu/FEX/commit/be71886990ae02fc92e45efa71f03f3ae8ba54af)
Minor optimization to phminposuw (https://github.com/FEX-Emu/FEX/commit/d3a27951060725188a4a89ab7d3bb7da14f89d3e)
Optimize 32/64-bit SH{L,R}D with extr (https://github.com/FEX-Emu/FEX/commit/3cd6c2d91ab96d47e1f4fdc1a024440926510022)
Minor optimization to FILD (https://github.com/FEX-Emu/FEX/commit/1b1e9e0fb5888091dfc15d932ed32f64914c3e18)
Narrow use of LoadXMMRegister in StoreResult_WithOpSize (https://github.com/FEX-Emu/FEX/commit/68a2441e65c6c0ea6c5517d79122d2fde8a1550c)
Fix and optimize PF calculation (https://github.com/FEX-Emu/FEX/commit/1d7b4bb522612f163072b1269293d5726feac7c8)
Optimize ADD/ADC OF flag packing (https://github.com/FEX-Emu/FEX/commit/9722c4c5a4f26ff0f20c4811c3c3798090063583)
Remove spurious bfe with flag storing (https://github.com/FEX-Emu/FEX/commit/ddd6dbfdcc4daf1a0b602ca01a87e9c23f28c929)
Optimize MOVSS to register (https://github.com/FEX-Emu/FEX/commit/7f2557e3221f7d517b84b2f495f29c3fc74ee4e1)
Optimize MOVSS to memory destination (https://github.com/FEX-Emu/FEX/commit/0121e858f280b0a270445e344b94a2a85f683146)
Optimize some shifts size masking (https://github.com/FEX-Emu/FEX/commit/98eda5e16373221aa5d36cc76958c08a2e35054e)
Fixes bug with pcmpestri (https://github.com/FEX-Emu/FEX/commit/592935790e1525684950a5f32b1b61ea5cb244b2)
Optimize GetPackedRFLAG (https://github.com/FEX-Emu/FEX/commit/8a4bfba47c94ea43c6e7313f512d4e8d09b2bab4)
Optimize MASKMOVDQU and MASKMOVQ (https://github.com/FEX-Emu/FEX/commit/69ea03f0eb4f01a47404d27625f3deffb600c992)
RCLSE
Rename Node to ValueNode (https://github.com/FEX-Emu/FEX/commit/d83960d4ddca7af4ecd8ff5f30bbeeb65c2f4c49)
SignalDelegator
Moves last TLS variable to the frontend (https://github.com/FEX-Emu/FEX/commit/b86abfbccf1573d906884974621c649f31fbc481)
StructPackVerifier
Fixes missing cursorkind again (https://github.com/FEX-Emu/FEX/commit/296adf183003d1dbe7b7ae937e3cf2839a2e717d)
Thunks
Set bitness flags for 64-bit guests (https://github.com/FEX-Emu/FEX/commit/96aa0a844e8ebba8405b61e78d65c31cc8e4a8ba)
Remove weak symbol definitions (https://github.com/FEX-Emu/FEX/commit/ff3b40400c3ab6c379600afad62826c26c321aed)
Fixes thunks in non-multiarch (https://github.com/FEX-Emu/FEX/commit/9def04c705b187580c6bf4786067edf8a9e778a7)
X11
Fixes variadic packing and callbacks. (https://github.com/FEX-Emu/FEX/commit/82295b29435f98e521ac8f4b9645c1b708686e04)
xcb
Fixes typo in version check. (https://github.com/FEX-Emu/FEX/commit/8c53a373bfbde1a5b314571cd9ecea03f689e93f)
ThunksDB
Adds X11 dependency to XCB (https://github.com/FEX-Emu/FEX/commit/597da880353b18b9a4e52d435dc347266d22a206)
X86Tables
Fixes CLZero destination address (https://github.com/FEX-Emu/FEX/commit/5d0b2060e28010c9f8c44f6b1e611b053722cb38)
Adds spaceship operator to couple op types (https://github.com/FEX-Emu/FEX/commit/20aaad15e496234673cdc9b3eac1804d3b363093)
Misc
Fix 16-bit popa insertion behaviour (https://github.com/FEX-Emu/FEX/commit/21eb6e03c724175ae169bd1532b86c59bcaead06)
Pack NZCV flags (https://github.com/FEX-Emu/FEX/commit/91bd3aa62a7bf4be94a2c032bb9b3d7a73fc7fd3)
Preserve PF across zero shift (https://github.com/FEX-Emu/FEX/commit/abf5e8c6a535662b1439454df39354556c670bcf)
Optimize xor %eax, %eax (https://github.com/FEX-Emu/FEX/commit/77c88ffe53bbdabc9b6a304f4d7d57c2e56b637e)
github-actions: Adds timeout to upload results (https://github.com/FEX-Emu/FEX/commit/6baee3b7c10f0d61072acd65b3db082de59e831c)
fix spelling errors (https://github.com/FEX-Emu/FEX/commit/22f95e627dbed91a0673865838171640f292b08a)
unittests
Adds missing header (https://github.com/FEX-Emu/FEX/commit/573f3396473f74f3d761b5de9ec56227c9f9c724)
gvisor
Adds all socket tests to flakes (https://github.com/FEX-Emu/FEX/commit/6f2452e2eab3eb440c8f6127ae91806aedb11515)

FEX - FEX-2307

Published by Sonicadvance1 over 1 year ago

Read the blog post at FEX-Emu's Site!

This release we had a bit of a slower month as some larger pieces were being worked on, but we still have some good stuff that is worth talking about.

Implement per-instruction RIP reconstruction

This was a fairly curious bug that FEX encountered. When trying to run the game Ultimate Chicken Horse, the game would crash very early in its startup.
While investigating the game we determined that this was one of the first games we tested that uses Unity Engine's AOT scripting reflection(?) mechanism. This codepath seemingly heavily relies on either tagged pointers or some other
mechanism that causes a SIGSEGV when accessing it the first time. After that point the Unity AOT will catch the SIGSEGV and depending on the RIP of
the instruction, it will change behaviour. One of the problems with FEX is that on synchronous faults like SIGSEGV, we don't yet support full state
reconstruction. Since it seems like this only relies on RIP being correct, we can fairly easily wire this up and get Ultimate Chicken Horse running!
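
As a rough illustration of the idea (a toy model, not FEX's actual data structures), per-instruction RIP reconstruction can be thought of as a lookup from the faulting host code offset back to the guest instruction that produced it. The table layout, the addresses, and the function name below are all hypothetical.

```python
import bisect

# Hypothetical per-block table: (host code offset, guest RIP) for each guest
# instruction, sorted by host offset. A JIT would record this while emitting code.
BLOCK_ENTRIES = [
    (0x00, 0x401000),
    (0x0C, 0x401003),
    (0x20, 0x401008),
]

def reconstruct_rip(host_offset: int) -> int:
    """Return the guest RIP whose emitted host code contains host_offset."""
    offsets = [off for off, _ in BLOCK_ENTRIES]
    index = bisect.bisect_right(offsets, host_offset) - 1
    if index < 0:
        raise ValueError("host offset precedes the block")
    return BLOCK_ENTRIES[index][1]
```

With a table like this, a fault anywhere inside the host code of one guest instruction resolves to that instruction's RIP, which is all a RIP-sensitive SIGSEGV handler needs.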

AVX work completed!

This last month FEX has done the last remaining work to implement AVX. This month the remaining SSE4.2 instructions were finished,
and the prerequisite XSAVE and XRSTOR instructions were implemented. Although the feature is effectively complete, we aren't enabling the
CPUID bit yet. We want to investigate a potential crash that has cropped up in Java games due to the extension first, and additionally we want
to finish up AVX2 work and enable them both in one step! Next month is looking to be the first version with AVX and AVX2 support in the source.

Fix 32-bit robust futex fetching

This issue has been a thorn in our sides for quite a while now. Usually this only ever manifested as an issue if the user was running Steam using
FEX's official PPA binaries in their setup. Once the user tried running Steam, it would
crash with a really obscure message about "Fatal error: futex robust_list not initialized by pthreads." This was something that would then never
reproduce if the code was rebuilt locally.

With a bit of poking around and using a local pbuilder version of FEX we were finally able to reproduce the error. Turns out FEX was writing a 64-bit
pointer back in to the result when the application tried querying the robust list pointer, overwriting part of the stack and corrupting its data.
This falls under one of the circumstances of "How did this ever work!?" but now with it resolved, theoretically Steam should finally work for our
users that are using the PPA build of FEX. Enjoy~!
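
To make the failure mode concrete, here is a small sketch of what happens when an 8-byte pointer is stored into a slot that a 32-bit guest sized for 4 bytes. The buffer stands in for the guest stack and the neighbouring value is made up for illustration; this is not FEX's code.

```python
import struct

def store_pointer(stack: bytearray, offset: int, value: int, width: int) -> None:
    """Write a little-endian pointer of `width` bytes (4 or 8) into the buffer."""
    fmt = "<I" if width == 4 else "<Q"
    struct.pack_into(fmt, stack, offset, value)

# A 32-bit guest reserves 4 bytes for the result, followed by unrelated data.
stack = bytearray(struct.pack("<II", 0, 0xDEADBEEF))
store_pointer(stack, 0, 0x1000, width=8)   # buggy: 64-bit write
corrupted = struct.unpack_from("<I", stack, 4)[0]

stack = bytearray(struct.pack("<II", 0, 0xDEADBEEF))
store_pointer(stack, 0, 0x1000, width=4)   # fixed: 32-bit write
intact = struct.unpack_from("<I", stack, 4)[0]
```

The 64-bit store replaces the neighbouring word with the upper half of the pointer, while the 32-bit store leaves it alone.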

Fix application hangs due to mutexes being locked on forks

This has been a very spicy bug that has been haunting FEX for years at this point. Whenever a modern application wants to execute a process it
will use a combination of fork and execve. Fork might end up being a vfork, or might end up being a clone syscall that does the same thing. Regardless,
fork in a threaded environment has some very strict requirements: the child can basically only do an execve afterwards. vfork even adds an
additional restriction that it can't corrupt the stack at all because it's sharing memory space.

The problem with this approach is that even if the application is only ever going to call execve after the fact, FEX needs to do a bunch of
bookkeeping or additional JIT emission and execution. This causes the problem that FEX's mutexes might end up being in an unknown state going in to a
fork, which will cause this new child process to hang indefinitely on the mutex.

To work around this issue, FEX will now globally lock all mutexes that matter, do the fork, and then immediately unlock the mutexes on the parent
side. On the child process FEX needs to be a bit mean to these mutexes, resetting them to zero to ensure no thread is holding the mutex. While this is
fairly heavy-handed this dramatically reduces how frequently FEX hangs when fork is used.
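
The lock, fork, unlock-or-reset pattern can be sketched as follows. This is a minimal Linux-only illustration using Python's threading locks; the class, the list of mutexes, and the function name are hypothetical and do not mirror FEX's internals.

```python
import os
import threading

class ForkSafeMutex:
    """Wrapper whose lock can be replaced wholesale in a forked child."""
    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()

    def reset_in_child(self):
        # Heavy-handed: discard the inherited state, nobody can hold it here.
        self._lock = threading.Lock()

MUTEXES = [ForkSafeMutex(), ForkSafeMutex()]

def fork_with_locked_mutexes() -> int:
    """Fork with every mutex held; return the child's exit status."""
    for mutex in MUTEXES:
        mutex.acquire()          # known state going into the fork
    pid = os.fork()
    if pid == 0:
        for mutex in MUTEXES:
            mutex.reset_in_child()
        ok = all(not mutex.locked() for mutex in MUTEXES)
        os._exit(0 if ok else 1)
    for mutex in MUTEXES:
        mutex.release()          # parent unlocks immediately
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Holding every lock across the fork guarantees that no other thread is mid-update, which is what makes the reset in the child safe.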

Specifically Steam tended to launch a bunch of background processes which would hang indefinitely, causing Proton games or downloads to never
continue. This should pretty much entirely be fixed!

Stop using faccessat2 to emulate faccessat

This was an oops on our part. faccessat2 was added in Linux kernel 5.8, so if your device was running an older kernel this syscall would /never/ work.
We didn't notice this since most of our devices are running a new enough kernel that faccessat2 just worked.

Thanks to the user that found this problem!

Handle xattr syscalls with overlayfs rootfs

Turns out that FEX had missed the various syscalls that access files to get xattr information. This was causing weird failures where some applications
would say that a file doesn't exist purely because it was in the rootfs overlay only.
Sadly Linux doesn't support *at variants of these syscalls so they aren't quite as fast as native execution, but that's fine.
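
A sketch of the kind of path fix-up this implies: look in the rootfs overlay first, fall back to the host path, then issue the path-based xattr call. The function names are hypothetical and the lookup is simplified (no symlink or relative-path handling); `os.getxattr` is the standard Linux-only Python wrapper.

```python
import os

def resolve_guest_path(rootfs: str, guest_path: str) -> str:
    """Prefer the copy inside the rootfs overlay, else use the host path."""
    candidate = os.path.join(rootfs, guest_path.lstrip("/"))
    return candidate if os.path.lexists(candidate) else guest_path

def guest_getxattr(rootfs: str, guest_path: str, name: str) -> bytes:
    """Path-based xattr query that honours the overlay (Linux only)."""
    return os.getxattr(resolve_guest_path(rootfs, guest_path), name)
```

Without the first step, a file that exists only in the overlay is looked up on the host and reported as missing.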

Fix conflicting ARM64 register allocation

A couple months ago we added one more register to our register allocation for slightly more optimal register allocation. This broke a game called
Osmos under FEX. This is purely a bug but in resolving it, we likely fixed crashes in various
applications that we didn't notice before. Oops!

RootFS additions

This month we have a couple new rootfs images on our server that have been hotly requested! We now have an ArchLinux rootfs image and a Fedora 38
rootfs image. These haven't been as thoroughly tested as our Ubuntu images so if you find any problems with them, make sure to let us know on our
Discord.

Raw Changes

Arm64
Fixes paranoidtso option for CPUs that support LRCPC/2 (https://github.com/FEX-Emu/FEX/commit/e5189d63a2cc82427efb277b8f7bab1e4d6b3d26)
Fixes GPR pair allocation to get one pair back (https://github.com/FEX-Emu/FEX/commit/16f70022226f9be26e8f5914d8e021413aebbd3e)
Fixes register pair conflict. (https://github.com/FEX-Emu/FEX/commit/7c4729678beea090b9176a4b31821568c8b4d65d)
Context
Removes dead AddVirtualMemoryMapping function (https://github.com/FEX-Emu/FEX/commit/c3e123df259a7caeb3dbf6785af3c43eb69aba44)
Emitter
Adds support for CSSC (https://github.com/FEX-Emu/FEX/commit/8047007a7abea220ea46d9884064943297ae2483)
External
Update jemalloc trees (https://github.com/FEX-Emu/FEX/commit/f8721992c2384305a660afbbeb3ffda4aa1a856a)
jemalloc
Updates external jemallocs (https://github.com/FEX-Emu/FEX/commit/1a4d5a1abb6522757d73073dd3ea091557235344)
Externals
Update fmt to 10.0.0 (https://github.com/FEX-Emu/FEX/commit/7ee6fc0d7fbbfc8e30c42b02f239ca923d4bcb47)
FEXServerClient
Ensure server socket is created with SOCK_CLOEXEC (https://github.com/FEX-Emu/FEX/commit/e86a792189c0b8ab2002770460b8c5a90a36fed3)
FHU
Workaround libstdc++ version 13+ bug (https://github.com/FEX-Emu/FEX/commit/8a4c5bcc65c4e4ba3c4dc4ceab6bf9d24fff871c)
IR
Move VPCMPESTRX REX handling to OpcodeDispatcher (https://github.com/FEX-Emu/FEX/commit/f39163b1e153005bc261bfcab86708c495452cd4)
Pad IROp_Header to be 32-bit in width (https://github.com/FEX-Emu/FEX/commit/fe06f1b15161d256d45df47a62b953a60827b86c)
JIT
Implement support for per-instruction RIP reconstruction (https://github.com/FEX-Emu/FEX/commit/9dcc1deec04bdbb6b768d53f2942c0a0e69c77f5)
Linux
Fixes hangs due to mutexes locked while fork happens. (https://github.com/FEX-Emu/FEX/commit/e72fa02897b94beca1ee69cf1f6f869a72cf9efb)
Handle xattr syscalls with emulated paths. (https://github.com/FEX-Emu/FEX/commit/5a53931b92e4faae88ab2efbed1293e8f49f16cb)
Stop using faccessat2 for faccessat emulation (https://github.com/FEX-Emu/FEX/commit/f444b03317d493008a86119526669a32c8402461)
Remove warning that isn't necessary anymore (https://github.com/FEX-Emu/FEX/commit/cac798574af44ed5db301978fe2a1c697312069f)
Optimize CalculateHostKernelVersion (https://github.com/FEX-Emu/FEX/commit/1506a192296f77ca172f2e03dde7d655cef02492)
OpcodeDispatcher
Ensure MXCSR is saved/restored with FXSAVE/FXRSTOR (https://github.com/FEX-Emu/FEX/commit/66d4206cd71f31636217feab9e35038fa64939b7)
Handle XSAVE/XRSTOR (https://github.com/FEX-Emu/FEX/commit/a082161d724214bf618d14b5dd5b53416b595d13)
Scripts
Disable using catchsegv if it doesn't exist (https://github.com/FEX-Emu/FEX/commit/d6c9b549dff9adfe865c8304e17de3e0a344d8ce)
VectorFallbacks
Fix PCMPSTR fallback ZF/SF flag setting (https://github.com/FEX-Emu/FEX/commit/9b5e1c44c8bfc7ec542a8b1b978640635c0c461e)
Misc
Some small fixes for android building (https://github.com/FEX-Emu/FEX/commit/d2032da452f79fe33015aae23e9c2211f04eb320)
Move config layers to the frontend (https://github.com/FEX-Emu/FEX/commit/2997257d6d8759afa2d9de8d5512b1f455eec0ed)
unittests
Add include search path for asm tests (https://github.com/FEX-Emu/FEX/commit/cb8bf1add6ddba15aa9519a64b0ee530ab26f299)
x32
Thread
Fixes robust futex fetching (https://github.com/FEX-Emu/FEX/commit/e652399fd30b2f66d89afcd8601a4270459eae5d)

FEX - FEX-2306

Published by Sonicadvance1 over 1 year ago

Read the blog post at FEX-Emu's Site!

Another interesting month of changes for FEX-Emu! While this release is shorter than last, this also only has a month of work rather than two. We had
some great work done this month, including a bunch of plumbing that most people won't notice. Let's see what changed!

Adds support for hardware TSO memory emulation prctl

Emulating the x86 memory model is the number one thing that slows down FEX emulation today. Apple Silicon supports this memory model in hardware which
is why Rosetta on macOS can get amazing performance. With some recent changes from the
Asahi developers, FEX can ask the hardware to enable the TSO emulation bit. If the kernel reports back that hardware TSO memory is enabled, FEX can
take a more lax approach to its memory emulation, getting an automatic speedup for Asahi systems.

Additionally, not only is this a speed-up, it's required for correct emulation. When this feature isn't supported by the hardware, FEX needs to emulate
the memory model using atomics and LRCPC instructions. This absolutely demolishes performance so it is usually recommended to disable the emulation to
get "free" performance. The issue with this is that it can cause instability and crashes in the most awkward and peculiar of ways. We even found out in this last
month that Unity games with their complex buffer management are highly likely to crash due to old cachelines of data hanging around. The only fix is
to emulate the memory model using hardware TSO flags or our atomic/LRCPC path. Sadly, ARM Cortex's hardware LRCPC implementation is barely any
faster than atomics.

Steamwebhelper crash fix

New beta versions of Steam have started relying on AT_EXECFN existing. FEX didn't previously emulate this auxv value, which was causing it to crash. With this fixed, steamwebhelper is now working again.
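
For readers unfamiliar with the auxiliary vector, here is a small sketch that decodes it and checks for AT_EXECFN. It assumes the native layout of (type, value) pairs of unsigned longs; the helper names are hypothetical, while the AT_NULL and AT_EXECFN values are the standard Linux constants.

```python
import struct

AT_NULL = 0     # terminates the auxiliary vector
AT_EXECFN = 31  # pointer to the pathname used to execute the program

def parse_auxv(raw: bytes) -> dict:
    """Decode (type, value) pairs of native unsigned longs until AT_NULL."""
    entries = {}
    pair = struct.Struct("LL")
    for offset in range(0, len(raw) - pair.size + 1, pair.size):
        a_type, a_val = pair.unpack_from(raw, offset)
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries

def has_execfn(raw: bytes) -> bool:
    """True if the vector carries an AT_EXECFN entry."""
    return AT_EXECFN in parse_auxv(raw)
```

On Linux the raw bytes for the current process can be read from /proc/self/auxv, so an application that expects AT_EXECFN can be checked against what the emulator actually provides.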

More AVX work!

This has been a long time coming and we're almost there finally. After these changes FEX only needs to fix a few implementation bugs in the string
operations, and implement the XSAVE instructions before allowing AVX emulation. In addition to that, FEX is also almost able to support AVX2; the
only instructions left to implement are the gather load instructions!

Implements support for XGETBV

This is a fairly simple instruction as it lets the application query which CPU features are enabled. An application needs to check this before
enabling any AVX usage.
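
The check that applications typically perform can be sketched like this. The bit positions are the architectural ones (CPUID leaf 1 ECX bits 27 and 28, XCR0 bits 1 and 2); the function takes the register values as plain integers since Python can't execute CPUID or XGETBV itself, and its name is made up for this example.

```python
CPUID_1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE, so XGETBV is usable
CPUID_1_ECX_AVX = 1 << 28      # CPU advertises AVX
XCR0_SSE = 1 << 1              # XMM state enabled by the OS
XCR0_AVX = 1 << 2              # upper YMM state enabled by the OS

def avx_usable(cpuid_1_ecx: int, xcr0: int) -> bool:
    """Mirror the usual detection order: CPUID first, then XGETBV(0)."""
    if not (cpuid_1_ecx & CPUID_1_ECX_OSXSAVE):
        return False  # XGETBV itself would fault without OSXSAVE
    if not (cpuid_1_ecx & CPUID_1_ECX_AVX):
        return False
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed
```

This is why XGETBV has to exist before the AVX CPUID bit can be exposed: a program that sees the bit will execute the instruction next.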

Handle PCMPESTRM, PCMPISTRM and AVX variants

These are the remaining string instructions that FEX has implemented. While mostly implemented there are still a couple of edge case behaviours that
aren't quite correct and just need to be fixed.

Implement support for deferring asynchronous signals

This change has been a long time coming to make FEX's JIT faster in the face of handling asynchronous signals. The issue is that FEX needs to enter
code regions that are effectively "uninterruptible" until they are complete. This is basically a reentrancy problem where a piece of code executing could
lock a mutex, or non-atomically update a container's data, then when a signal occurs it will jump out of the code and potentially come back to this
corrupted state.

As an initial workaround to this problem, FEX would just disable all signals in each of these "signal-deferring regions." This had the overhead that
every region would have a system call going in to it and then another one coming out of it. If these regions were entered infrequently then
it would be a non-issue; but as is commonly the case, FEX's JIT needs to be wrapped in these signal blocks. If a game is executing a bunch of code,
this means we can be doing thousands of additional system calls per second which adds up as direct overhead.

With this new change, FEX marks that it is in a signal-deferring region with some very cheap memory accesses and if a signal doesn't occur the
overhead is negligible. In the case that a signal does occur, it will get stored to a queue, FEX will finish its signal-deferring region, and then
come back to handle the signal.
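
The bookkeeping described above can be modelled in a few lines. This is a toy model with no real signals involved: `deliver` stands in for the point where an asynchronous signal would arrive, and the class and attribute names are hypothetical.

```python
from collections import deque

class DeferredSignals:
    """Toy model of a signal-deferring region: a counter plus a queue."""
    def __init__(self, handler):
        self.handler = handler
        self.defer_depth = 0      # cheap memory write instead of a syscall
        self.pending = deque()

    def __enter__(self):
        self.defer_depth += 1
        return self

    def __exit__(self, *exc):
        self.defer_depth -= 1
        if self.defer_depth == 0:
            # Region finished: now handle whatever arrived in the meantime.
            while self.pending:
                self.handler(self.pending.popleft())
        return False

    def deliver(self, signo: int) -> None:
        """Called where an asynchronous signal would arrive."""
        if self.defer_depth > 0:
            self.pending.append(signo)   # queue it, handle later
        else:
            self.handler(signo)
```

When no signal arrives, entering and leaving a region costs only an increment and a decrement, which is the whole point compared to masking signals with system calls.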

This mostly works because asynchronous signals don't have guarantees about the timeliness of the signal being delivered. Sadly this can result in
signal queue depths being subtly incorrect but we are monitoring the situation to know if any game is affected. All in all this finally makes it so
FEX can be straced without being overwhelmed and improves stutter problems!

Grand Theft Auto 5 AVX fix

FEX was accidentally reporting support for the BMI1 and BMI2 CPU instructions. These extensions require that AVX is also implemented.
This was causing Grand Theft Auto 5 to crash early trying to use AVX. We will now only report these extensions if emulated AVX is supported, which fixes this game.

Make vfork wait for the process to exit

FEX's previous implementation of vfork actually behaved like fork. The difference between these two syscalls is fairly subtle. In the case of vfork, the parent process will end up sleeping until the child either exits or executes execve.
We were instead treating this like a fork, where the parent continues immediately without waiting. While no known issues were encountered, it is good
to ensure this behaviour is correct for future work.

Getdents optimization

This classic syscall is used for querying directory contents. FEX needs to emulate this syscall since 64-bit applications now use a new syscall called
getdents64. FEX's original implementation was fairly slow due to a misunderstanding as to how this syscall worked. It would create a temporary working
buffer and copy data around a couple of times. The new implementation is able to use the buffer provided by the application and only does some
minor fixups, which makes the overhead fairly light. This improves performance when an application is doing heavy folder scanning, which mostly means
it improves Proton startup time.
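
One plausible shape for such a fixup, sketched under the assumption that the host fills the application's buffer with getdents64-style records and the guest expects the legacy layout. The only differences on a 64-bit layout are that the name starts one byte earlier and d_type lives in the last byte of the record. The helper names are hypothetical and this is not FEX's implementation.

```python
import struct

# linux_dirent64 header: d_ino (u64), d_off (s64), d_reclen (u16), d_type (u8)
DIRENT64_HEADER = struct.Struct("<QqHB")

def make_dirent64(ino: int, off: int, d_type: int, name: bytes) -> bytes:
    """Build one 8-byte aligned getdents64 record (helper for the example)."""
    raw = DIRENT64_HEADER.pack(ino, off, 0, d_type) + name + b"\0"
    reclen = (len(raw) + 7) & ~7
    raw = raw.ljust(reclen, b"\0")
    return raw[:16] + struct.pack("<H", reclen) + raw[18:]

def dirent64_to_dirent_in_place(buf: bytearray, length: int) -> None:
    """Rewrite each record to the legacy layout without a temporary buffer."""
    offset = 0
    while offset < length:
        _ino, _off, reclen, d_type = DIRENT64_HEADER.unpack_from(buf, offset)
        if reclen == 0:
            raise ValueError("corrupt record")
        name_start = offset + 19
        name_end = buf.index(b"\0", name_start, offset + reclen)
        name = bytes(buf[name_start:name_end + 1])        # keep the NUL
        buf[offset + 18:offset + 18 + len(name)] = name   # shift name left by one
        buf[offset + reclen - 1] = d_type                 # d_type moves to the end
        offset += reclen
```

Each record is touched once and its length never changes, so offsets computed by the kernel stay valid for the application.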

Minor optimizations

There were a handful of minor optimizations that improve performance so minorly that it falls within noise, but they are nice to have.

Optimize ARM64 thunk trampolines

This is a very small optimization that changes an indirect load in to a PC relative load, removing a single data dependency.

Minor x87 FCMOV optimization

FEX was duplicating a mask from a GPR in to a vector register using two instructions and now it only uses one instruction.

Optimize ADC/ADD OF flag calculation

This was a small mistake where a bitwise negate was using two instructions instead of one.

Optimize EFLAG unpacking

Each time FEX was unpacking the EFLAG register it was using four instructions per bit of the flag. This has now been improved to only two, cutting
flag unpacking to 82% of its original size in some edge cases.
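
For reference, "unpacking" here means splitting the packed EFLAGS value into individual flag bits. Conceptually each flag is one shift and one mask, as in the sketch below. The bit positions are the architectural ones; the function names are just for illustration and say nothing about which ARM64 instructions FEX emits.

```python
# Architectural bit positions of the x86 status and direction flags in EFLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "DF": 10, "OF": 11}

def unpack_eflags(eflags: int) -> dict:
    """Extract each flag with one shift and one mask."""
    return {name: (eflags >> bit) & 1 for name, bit in FLAG_BITS.items()}

def pack_eflags(flags: dict) -> int:
    """Inverse operation; bit 1 of EFLAGS is reserved and always reads as 1."""
    value = 1 << 1
    for name, bit in FLAG_BITS.items():
        value |= (flags.get(name, 0) & 1) << bit
    return value
```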

Support emulated Linux kernel versions up to 6.2

FEX used to cap the reported kernel version at 5.18. Now we can report up to 6.2 with this change. 6.3 is going to be harder since it
introduces a new prctl that FEX needs to work around.
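
The capping behaviour can be pictured as a simple clamp on the host's version. This is only a sketch: the constant and function names are hypothetical, and only the major and minor components are considered.

```python
import re

MAX_EMULATED_KERNEL = (6, 2)  # upper bound mentioned in the text above

def parse_release(release: str) -> tuple:
    """Turn a uname release such as '6.5.0-14-generic' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def reported_kernel(host_release: str) -> tuple:
    """Report the host version, capped at the newest emulated version."""
    return min(parse_release(host_release), MAX_EMULATED_KERNEL)
```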

Video game showcase

As mentioned previously, Unity engine games have issues without TSO emulation. Here is a clip of Hollow Knight running under FEX at full speed on a
Lenovo X13s. Even with the overhead of emulating the x86-TSO memory model, this game runs remarkably well. With x86-TSO emulation disabled this game
would have crashed a few seconds in.

Raw Changes

AOTIR
Stop passing a mutex around. It's already guarded (https://github.com/FEX-Emu/FEX/commit/f47caf48c63e55eff7c7a43d36051effca4d2894)
ARM64
Fixes SRA disabled codepath (https://github.com/FEX-Emu/FEX/commit/dc65a5ef8cbde03352b37cf0527467c83b942cab)
CPUID
Only enable BMI1 and BMI2 if AVX is supported (https://github.com/FEX-Emu/FEX/commit/cc7a56b1a6f7b37f40123066c01f622ce6bdaa5d)
Context
Remove debug namespace (https://github.com/FEX-Emu/FEX/commit/5be798e9e6e1a1db4d578140ecf5ea33baf36026)
ELFCodeLoader
Fixes missing AT_EXECFN (https://github.com/FEX-Emu/FEX/commit/00dc373bb9d9370b82972707416c4c49f7b92f3b)
FEXConfig
Removes Emulated CPU cores option (https://github.com/FEX-Emu/FEX/commit/1a9b6a89f445dd27b9abf623df59386efd78b28d)
FEXCore
Implements support for xgetbv (https://github.com/FEX-Emu/FEX/commit/737f917838384eaedd96c836760d1041c4d0672a)
Support Wine syscalls (https://github.com/FEX-Emu/FEX/commit/77e8be12153ac8bc3074d92e93c40f0375d17d5b)
Convert Core and Telemetry over to fextl::file::File (https://github.com/FEX-Emu/FEX/commit/5674d3a871e42a40768be42bef5f6c45da6a3a5b)
Adds support for hardware x86-TSO prctl (https://github.com/FEX-Emu/FEX/commit/ed69eb9f6f7806cda2be24ba737db2e578a770b7)
FEXLoader
Allow simulated kernel version up to 6.2 (https://github.com/FEX-Emu/FEX/commit/ada226bbb4e9805b67b5d15f771524d670551e70)
FEXRootFSFetcher
Support rolling release distros (https://github.com/FEX-Emu/FEX/commit/9473025b1856398a10bc36c3b468feefc96c96fe)
IRDumper
Fixes ssa number in arguments. (https://github.com/FEX-Emu/FEX/commit/0f4a5edf4fc72c47743ac3ae8c8b96ffa3ca8c54)
InstallFEX
Updates helper install script for Ubuntu 23.04 (https://github.com/FEX-Emu/FEX/commit/02f15f40998f0e10f77e2142744dfdb6aac44ec1)
Linux
Make vfork act more similar to how it should. (https://github.com/FEX-Emu/FEX/commit/95b7592241b3c8c9789b465e037cf0f014a8cbe2)
OpcodeDispatcher
Optimize ADC/ADD OF flag calculation (https://github.com/FEX-Emu/FEX/commit/5b5808218b506ea19a06d6a83c4e47d36b49cefc)
Optimize EFLAG unpacking (https://github.com/FEX-Emu/FEX/commit/69181d438cec324fdb4c866e79f58d3e779dd215)
Handle PCMPESTRM/VPCMPESTRM (https://github.com/FEX-Emu/FEX/commit/182010ca9776b5f8a05711a854485b310c96417a)
Handle PCMPISTRM/VPCMPISTRM (https://github.com/FEX-Emu/FEX/commit/e9244680aa1116441a2d7316053e90daa2db2e19)
Removes a warning that cropped up. (https://github.com/FEX-Emu/FEX/commit/e03b859c200533db9752d5fca439bbd9f4344446)
Syscalls
Optimize getdents{64,} (https://github.com/FEX-Emu/FEX/commit/de0f3984e9262ee222018de1120a1298b27295c6)
Telemetry
Save on signal terminate (https://github.com/FEX-Emu/FEX/commit/c9d1f0d75a2483edb261f1d4edf80a6d3645eb4d)
Thunks
Mostly reverts https://github.com/FEX-Emu/FEX/pull/2672 (https://github.com/FEX-Emu/FEX/commit/6017a9135a7e2b9c54d72815d70411dea1156c7e)
Optimize ARM64 trampoline (https://github.com/FEX-Emu/FEX/commit/ce4e991a6e6a3e9c0e0cdc7d0f85272da76cd137)
Tools
Removes visual debugger (https://github.com/FEX-Emu/FEX/commit/09997cff9c0d78a6c3587dc4d39c72dc8bbe98d0)
X87
Super minor FCMOV optimization (https://github.com/FEX-Emu/FEX/commit/4e01452a6504ce839859043ec677c96b666210b7)
Misc
Implement support for deferred asynchronous signals (https://github.com/FEX-Emu/FEX/commit/8bc33e95c1452a2499c95ba7c1b56c9d6bb4b14e)
Wine TestHarnessRunner support (https://github.com/FEX-Emu/FEX/commit/0ad6f98a8c872cdd127a9896b0dc0939b11da4bd)
github
Updates some actions to v3 (https://github.com/FEX-Emu/FEX/commit/52f64a0c7be32a443b0b24deb0e0e5e9baa05ede)
unittests
Adds step to remove stale SHM regions. (https://github.com/FEX-Emu/FEX/commit/0a4bf10ba52dc843020c0710c662e74eae1baa8f)

FEX - FEX-2305

Published by Sonicadvance1 over 1 year ago

Read the blog post at FEX-Emu's Site!

Welcome back to another release of FEX-Emu! We had cancelled last month's release due to a large amount of code churn happening. In order to ensure
the highest quality of stability we were forced to do so. Now we're back with an even lengthier release this month, so buckle up because there were a
large number of changes that happened.

More AVX Work!

These last two months have been a wild ride towards implementing AVX. @Lioncache has been burning down a ton of
instructions to get everything in place for AVX emulation.

New instructions implemented

PCMPISTRI/VPCMPISTRI
VPMASKMOVD/VPMASKMOVQ
VCVTPD2PS/VCVTPS2PD
VCVTSD2SS/VCVTSS2SD
PCMPESTRI/VPCMPESTRI
VMPSADBW
VPSLLVD/VPSLLVQ
VPSRLVD/VPSRLVQ
VCVTSI2SD/VCVTSI2SS
VPINSRB/VPINSRD/VPINSRQ/VPINSRW
VPSADBW
VTESTPD/VTESTPS
VPMADDUBSW
VPMOVMSKB
VMASKMOVPD/VMASKMOVPS

That's a whole bunch of instructions implemented! We have now implemented nearly all the instructions required for AVX.
The two major instructions remaining before AVX can be exposed are the SSE4.2 instructions VPCMPISTRI and VPCMPESTRM. This is because these two
instructions also have AVX versions, so they are required in order to support AVX.

We are getting really close and once this feature is done, we can quickly move on to finishing support for AVX2, F16C, and the fused
multiply-accumulate extensions. At that point our CPU emulation will be effectively "feature-complete" for everything that games will care about in
the short-term. Exciting times!

llvm-mingw and WINE support

This is a very big change that has been coming down the pipe for a while now. We have been mostly working behind the scenes to get FEX-Emu wired up so
that it can be compiled as a Windows shared library. This last month is where this work has finally come to a head and most of the work is in place
for this.

How this works is that FEX-Emu has a library called FEXCore that gets compiled as both a shared library and a static library. This is where all the CPU emulation
happens and it tries to be mostly OS agnostic, while everything that is Linux specific lives in the frontend called FEXInterpreter. It is FEXCore
that can now be compiled as a Windows AArch64 PE library. While this isn't useful to end users today, it means that WINE can link to this
library for emulating x86/x86-64 on AArch64 platforms. It should be noted that there are still some Linux assumptions strewn about the code, so this
isn't a generic solution for emulation on a true Windows platform. We're writing this support specifically for WINE today.

Converting away from C++ containers that allocate memory

This is the significant change that caused us to cancel last month's release. While @Neobrain was writing code to
support 32-bit library thunking, they had discovered a very big problem. FEX-Emu has long overridden the glibc memory allocation routines in order for
us to ensure that FEX can allocate memory when emulating 32-bit applications. We discovered that this overriding also extends to system libraries that
we load in after the fact. This meant that any time libGL would allocate memory, it would end up being a 64-bit pointer and there was nothing we could
do about it.

The workaround for this problem is to stop overriding the system allocators, which will allow shared libraries to allocate memory that can safely be
used by the 32-bit guest. But this also has the problem that FEX would then run out of memory when executing 32-bit applications. This is due to a
quirk that FEX-Emu needs to allocate all the memory on the system before executing 32-bit applications.

The new workaround is to replace usage of every C++ container that allocates memory with FEX's own container that will use its own allocator. This was
an exceedingly invasive change that touches almost everything in our codebase. With the pain done, FEX now can use its own internal allocators while
system libraries will use the regular glibc allocator as expected. See more about the limitations of this in our
documentation.

Re-enable glibc allocator hooking again

Okay, the previous paragraph was a ruse; FEX-Emu needed to actually override the glibc allocator again. In this case FEX-Emu will actually have three
allocators active at any given moment.

FEX-Emu uses jemalloc for its internal allocator.
The system allocator is overridden with another jemalloc allocator.
The guest application's glibc allocator is untouched.

The problems start occurring when a pointer is shared between thunks and the guest application. If one allocator tries to free a pointer from a
different allocator then fireworks occur. The way around this is to use a jemalloc function to determine if it owns the pointer and choose which
allocator to end up freeing the pointer from. This is particularly painful with X11 thunking because pointers are passed between client and server in
a very laissez-faire fashion. This may not stay around in the future but it is a necessary evil for now.
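
The ownership check can be pictured with the toy sketch below. It does not use jemalloc; `Arena`, `Owns` and `FreeFromOwner` are hypothetical names, and each arena just owns one fixed buffer so that ownership reduces to an address-range test. The point is only the routing: ask each allocator whether it owns the pointer before freeing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Toy bump allocator that owns a single fixed buffer.
// It stands in for a real allocator that can answer "is this pointer mine?".
class Arena {
public:
  void* Allocate(std::size_t size) {
    size = (size + 15) & ~std::size_t{15}; // keep 16-byte alignment
    if (Offset + size > sizeof(Buffer)) return nullptr;
    void* p = Buffer + Offset;
    Offset += size;
    ++Live;
    return p;
  }

  // Ownership is an address-range test against the arena's buffer.
  bool Owns(const void* p) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto base = reinterpret_cast<std::uintptr_t>(Buffer);
    return addr >= base && addr < base + sizeof(Buffer);
  }

  // A bump allocator cannot reuse memory; we only track the live count.
  void Free(void* p) {
    assert(Owns(p));
    --Live;
  }

  int LiveAllocations() const { return Live; }

private:
  alignas(16) unsigned char Buffer[4096];
  std::size_t Offset = 0;
  int Live = 0;
};

// Route a free to whichever allocator owns the pointer.
// Returns false when neither owns it (e.g. it came from the guest's allocator).
inline bool FreeFromOwner(Arena& internal, Arena& hooked, void* p) {
  if (internal.Owns(p)) { internal.Free(p); return true; }
  if (hooked.Owns(p)) { hooked.Free(p); return true; }
  return false;
}
} // namespace sketch
```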

JIT Optimizations and improvements

Reclaim static assigned registers on 32-bit

This allows us to use 8 more general purpose registers and 8 more floating point registers with 32-bit applications. Depending on the game this can
improve performance by a decent margin. We have seen upwards of 20% performance uplift in various games due to it.

Fix Visual C++ redistributable crashing

This was a really annoying bug, where every.single.time. that Proton would run, it would try to install the C++ runtime at least four times. The user
would be required to kill the processes after they were installed. This was fairly egregious because we had thought it was fixed months ago and didn't
realize that it wasn't actually fixed. Depending on the version of the Visual C++ redistributable and Proton it would still occur.

Root-causing this issue revealed that the redistributable uses Windows' structured exception handling to catch the case when it passes a null pointer
to strlen, which results in a SIGSEGV on the Linux side. FEX was incorrectly saving and restoring state when this occurred, which caused it to
infinitely loop and crash. Now that this is fixed, these install correctly and Proton doesn't try doing it on every single run.

Implement REP MOVS as a memcpy

This instruction behaves like a fairly fast memory copy on the CPU. We now convert this over to an internal memory copy operation.
Similar to last month where we converted an instruction to a memset, implementing this instruction as an IR operation improves its performance
many times over. In real games this usually translates to an FPS improvement of a few percent, which is a nice uplift.
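
For readers unfamiliar with the instruction, here is a minimal reference model of what REP MOVS does: it copies a count of fixed-size elements, walking upward or downward in memory depending on the direction flag. The function name and parameters are made up for this sketch, and it is not how FEX's IR operation is implemented. With the direction flag clear and non-overlapping buffers, the result matches a plain `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {
// Reference model of REP MOVS (hypothetical helper, not FEX code).
// dst/src point at the FIRST element to copy; with backward == true the
// following elements are taken from lower addresses (direction flag set).
inline void RepMovs(std::uint8_t* dst, const std::uint8_t* src,
                    std::size_t count, std::size_t elem_size, bool backward) {
  assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t off = i * elem_size;
    std::uint8_t* d = backward ? dst - off : dst + off;
    const std::uint8_t* s = backward ? src - off : src + off;
    std::memcpy(d, s, elem_size); // copy one element
  }
}
} // namespace sketch
```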

Fix restoring of AVX state

While not actually being utilized today (Except due to a bug), @AndreRH found out that we were accidentally failing to
restore AVX register state when a signal handler returned. It's surprising that this wasn't noticed earlier but it could have resulted in some really
bad floating point state.

Remove double syscall overhead on filesystem accesses

When FEX opens a file, it first needs to check whether the file exists in the overlayfs-style rootfs image we provide. If the
file exists we will redirect the file to be opened from the rootfs instead of the host filesystem. We had an issue where, if the file didn't exist, we
would check for it again by accident before accessing the host file. This meant that one syscall turned into three. With this fix in place
we now only turn it into two.

If you're running a rootfs image off of a particularly slow drive (or a network share) then this can shave a decent amount of time off of load times.
This was particularly noticeable when running a Proton game under Steam because they will access a ton of files before starting up.
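
The lookup order can be sketched as below: one existence check against the rootfs, then a single open on whichever path was chosen. `ResolvePath` and the `exists` callback are hypothetical and stand in for the real filesystem calls; this only illustrates the "check once, then open" flow, not FEX's actual file manager.

```cpp
#include <cassert>
#include <functional>
#include <string>

namespace sketch {
// Decide which path to open for a guest path (hypothetical helper).
// `exists` stands in for the one syscall that probes the rootfs overlay.
inline std::string ResolvePath(const std::string& rootfs,
                               const std::string& guest_path,
                               const std::function<bool(const std::string&)>& exists) {
  assert(!guest_path.empty() && guest_path[0] == '/');
  const std::string candidate = rootfs + guest_path;
  // Exactly one existence check; the caller then performs the open itself.
  if (exists(candidate)) {
    return candidate; // file is provided by the rootfs image
  }
  return guest_path;  // fall through to the host filesystem
}
} // namespace sketch
```

Counting the calls made through `exists` shows one probe per lookup, so each guest open costs two operations in total: the probe and the open.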

Adds default DRM ioctl interface

This is a fairly basic change. Instead of breaking when hitting an unknown ioctl, pass it to the kernel and hope for the best. This is mostly so Asahi
and other drivers can test things under FEX without pushing patches to us for downstream support.
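
A minimal sketch of that fallback, assuming a simple table of known commands: anything not in the table is handed to a passthrough handler instead of being treated as an error. `IoctlDispatcher` and its members are hypothetical names for illustration, and the passthrough here is just a callback rather than a real kernel call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

namespace sketch {
using Handler = std::function<int(std::uint32_t cmd, void* arg)>;

// Dispatches known ioctl commands to their handlers and forwards
// everything else to a default passthrough (hypothetical design).
class IoctlDispatcher {
public:
  explicit IoctlDispatcher(Handler passthrough)
    : Passthrough(std::move(passthrough)) {
    assert(Passthrough); // a default handler is required
  }

  void Register(std::uint32_t cmd, Handler handler) {
    Known[cmd] = std::move(handler);
  }

  int Dispatch(std::uint32_t cmd, void* arg) const {
    const auto it = Known.find(cmd);
    if (it != Known.end()) {
      return it->second(cmd, arg); // command we know how to translate
    }
    return Passthrough(cmd, arg);  // unknown: pass along and hope for the best
  }

private:
  std::unordered_map<std::uint32_t, Handler> Known;
  Handler Passthrough;
};
} // namespace sketch
```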

Add support for thunking Wayland

This doesn't affect most users today, but adding support for thunking Wayland means that applications which use it can sanely use this thunk in the
future. SDL applications today might be able to take advantage of it but it is fairly fresh. We're looking forward to the inevitable Wayland and WINE
utilization to let things move away from X11.

Fixed 32-bit clock_nanosleep

There was a fairly nasty implementation detail where a 32-bit application trying to sleep with this syscall would actually consume a CPU core to 100%.
While fairly uncommon, this allows the game Alwa's Awakening to not burn a CPU core while running.

Add a bunch of functions to FEX's ARMEmitter

Not really a user facing feature but our code emitter has gained a bunch of new instruction support. This will be used in the future for our AVX2
implementation and various things. So it's good to have.

Raw Changes

ARM64Dispatcher
Fix compiling with mingw (https://github.com/FEX-Emu/FEX/commit/a33443db62c0016822a065e5e72ecdba49d74d06)
ARM64Emitter
Ensure platform register is saved on win32 (https://github.com/FEX-Emu/FEX/commit/73ede9d0009228e0c722d373f2858652db572d4d)
ARMEmitter
Handle SVE2 integer add/subtract wide category (https://github.com/FEX-Emu/FEX/commit/c9fb9c4cae5cdda3c56e005d4c54f16ca994e53e)
Handle SVE Integer Multiply-Add - Unpredicated group (https://github.com/FEX-Emu/FEX/commit/7747ac8de8d4b3aa0aef55b8184281988e601127)
Fix treating 32-bit elements as 64-bit with ld1w (https://github.com/FEX-Emu/FEX/commit/100b4d4a5b6b7d6af964f2d348f67b36f8882acb)
Finish off SVE2 Integer - Predicated group (https://github.com/FEX-Emu/FEX/commit/719803bc5a66cd218a905c0f10ad44041b80adb3)
Handle SVE Inc/Dec by Predicate Count category (https://github.com/FEX-Emu/FEX/commit/22936144ceb10c933451467f09caaec78242b651)
Handle SVE Element Count category (https://github.com/FEX-Emu/FEX/commit/7f0cccfbe0a7203dec2010fcb8e3ca17e7d3b21d)
Handle SVE predicate count group (https://github.com/FEX-Emu/FEX/commit/aea90287eb5a07088b022efbfc23681ec41c3a9e)
Handle floating-point multiply add (indexed) groups (https://github.com/FEX-Emu/FEX/commit/73446806724755ff393377be220217124ced4554)
Handle SVE Floating Point Multiply-Add group (https://github.com/FEX-Emu/FEX/commit/3275dabd85ce31848412e8ef4df0b980fc7caff1)
Handle SVE floating-point compare with zero group (https://github.com/FEX-Emu/FEX/commit/11d396ca060d4eef536ef59f35a834aaa9dc0668)
Handle SVE Floating-point Serial Reduction (Predicated) group (https://github.com/FEX-Emu/FEX/commit/cd40a85567eab50623d28e9fe477c9e343fcf249)
Handle SVE Floating Point Unary Operations - Unpredicated group (https://github.com/FEX-Emu/FEX/commit/9b70c1d4acc12e9fb67ad7688162501519ab963b)
Handle LDR (vector) (https://github.com/FEX-Emu/FEX/commit/5e8d9601edd7ff5a804ff4a8951310507ab75d1e)
Move op and assertions into SVEFloatArithmeticPredicated helper (https://github.com/FEX-Emu/FEX/commit/1c3af30e96cbbea889ab6ea4422a0360bcf5fd9e)
Handle SVE Write FFR group (https://github.com/FEX-Emu/FEX/commit/3b74e34b47cb4e008c23bbb193f4143f1571f7d4)
Move SVEBitwiseShiftbyVector into private section (https://github.com/FEX-Emu/FEX/commit/054430f0044306d528193e50ea16b146c909b942)
Remove unused helper functions (https://github.com/FEX-Emu/FEX/commit/94e0591601bfd318c54b3aeb57121e18d9ebf8e6)
Fully handle SVE Integer Misc - Unpredicated (https://github.com/FEX-Emu/FEX/commit/3bd4b23af79dd38f3fadd68ca68124f94908a64f)
Simplify uses of IsXOrWRegister (https://github.com/FEX-Emu/FEX/commit/060745f98b102541c4c2d3fab878fab1bf5e2bd6)
ASIMDOps
Remove unnecessary template constraints (https://github.com/FEX-Emu/FEX/commit/2321d2ad9730428b289c87a4404cc2801379dea2)
ScalarOps
Move base opcodes into helper functions (https://github.com/FEX-Emu/FEX/commit/4b8ac0f8c6237f49a57e643904e74b48b07bd9c2)
Allocator
Adds VirtualAlloc with memory Base hint function (https://github.com/FEX-Emu/FEX/commit/ba45bf4ae72cc89ed2a31a3d262879c07f49c577)
Ensure uses of aligned allocations use aligned_free (https://github.com/FEX-Emu/FEX/commit/c140dd7da82c5707f49a99e68ad3a9b245e58906)
Remove pointer indirection overhead (https://github.com/FEX-Emu/FEX/commit/18154183ad665250cfa3b0312ed4d18150debcd3)
Disable glibc sbrk allocations (https://github.com/FEX-Emu/FEX/commit/e4fadd6992b403c4470ef2fd9a88418b9ac82e93)
AllocatorHooks
Adds some mingw allocator helpers (https://github.com/FEX-Emu/FEX/commit/86e09a00f0e9930967d206d703102d748facebf0)
Arm64
Reclaim SRA registers on 32-bit (https://github.com/FEX-Emu/FEX/commit/3a81efdb28817d9ce75e2c22fe6ac0cbb39ea7ff)
VectorOps
Remove a few unnecessary EORs from comparisons in SVE path (https://github.com/FEX-Emu/FEX/commit/55d65f3aead34647054980200b4d0d78598c058f)
Handle 64-bit elements in VSShr (https://github.com/FEX-Emu/FEX/commit/72df6a26d56b247a4b46f4dce5af836b20c78615)
Remove unnecessary EOR in VAddP/VFAddP SVE path (https://github.com/FEX-Emu/FEX/commit/1f512b098ed123e3aaf015d5c53702a9c761fbae)
Arm64Emitter
Replace x18 usage with x30 (https://github.com/FEX-Emu/FEX/commit/121d9fda2d7a7a08b8973d820ce7e87f5ad9aeb0)
CI
Adds glibc faulting testing (https://github.com/FEX-Emu/FEX/commit/27c03f98d84e9d10ce426e874c89ce128e731dd1)
CMake
Get past configuration when mingw is used (https://github.com/FEX-Emu/FEX/commit/aba42570a19f3bf6edbe251039c1819f95c3352b)
CPUID
Disable RDTSCP under wine (https://github.com/FEX-Emu/FEX/commit/f7d827a26a79ed5649a8007e372367e866b2243f)
Fix std::min type cast (https://github.com/FEX-Emu/FEX/commit/fbc5d583a2b6a926474a8ef90b6cd207bb0cf281)
Enable FAST REP MOVS (https://github.com/FEX-Emu/FEX/commit/6e0a1a5f14abff0b3a782f9b9c5eadc9c525cbf4)
CPUInfo
Add mingw helper for CalculateNumberOfCPUs (https://github.com/FEX-Emu/FEX/commit/1fad26d72f4771c8f14a8c91ae1ecd7221e17e34)
Switch away from using get_nprocs_conf (https://github.com/FEX-Emu/FEX/commit/7f243ee08a44b75373493ebbb1cad2a7523d7c81)
Config
Move path generation to the frontend (https://github.com/FEX-Emu/FEX/commit/da126141d320a656b52de300d42e094863c6e366)
Core
Add a new log message for unsupported instruction (https://github.com/FEX-Emu/FEX/commit/88dba60bee23be1a82267fb87702f01ba62f1c30)
CoreState
Fix SynchronousFaultData padding type (https://github.com/FEX-Emu/FEX/commit/68599bf1247d23d79c326e4b5a8ce03e9810ad9e)
Dispatcher
Disable signal handling under mingw (https://github.com/FEX-Emu/FEX/commit/0fa4390e47a70638b16c08534cc6a2733e0d4999)
Fixes restoring of AVX state (https://github.com/FEX-Emu/FEX/commit/28c168ea0ce5ac40abea2217d248c92eb04e2d8b)
Docs
Adds programming concerns documentation (https://github.com/FEX-Emu/FEX/commit/bfd606ec3dc3d3b9f537ce1d4f34d89a51636590)
FEXConfig
Fixes misalignment when advanced option is changed. (https://github.com/FEX-Emu/FEX/commit/4f20dba5058be450a07fd0f7115cfeb1b588cd05)
FEXCore
Moves SIGBUS handler to FEXCore/Utils (https://github.com/FEX-Emu/FEX/commit/86af1f6a68bf981659965078ad7183bb47118995)
Compile without exceptions (https://github.com/FEX-Emu/FEX/commit/37b5bc49c6fc2ae914f11f0cb7e3169835420c50)
Stop exposing the x86 table data symbols (https://github.com/FEX-Emu/FEX/commit/590422b2952d52dd6cd706ac86780c9f359bf6d4)
Fixup cmake file for mingw (https://github.com/FEX-Emu/FEX/commit/b5420f5db311daee1d29c5fe71bb0b0bd814f400)
Switch to xbyak for CPUID fetch helpers. (https://github.com/FEX-Emu/FEX/commit/7a774a8d801ca2f6e276bed63487eca0ea5e73d0)
Stop leaking AVX configuration state (https://github.com/FEX-Emu/FEX/commit/5c62ea21f460a423947cc860212b69205594e70b)
IR_INC dependency on FEXCore_Base (https://github.com/FEX-Emu/FEX/commit/1771d086ed39533d30c1e1fa9390d34187bbedfb)
Adds fexctl container alias objects (https://github.com/FEX-Emu/FEX/commit/a8ed2af658b519cc4dc6e208b70e5f5b471f70f8)
FEXLoader
Don't build with mingw (https://github.com/FEX-Emu/FEX/commit/892c07a5ed2f6295698e5436eabaaac634014157)
Move to Tools folder (https://github.com/FEX-Emu/FEX/commit/307158d42528d8ebf3eabfe618dfc2c3b5d6c92a)
Move ELFCodeLoader2 to remove 2 (https://github.com/FEX-Emu/FEX/commit/eadce2854fca59627ad777881e39dd7756f27c3b)
FEXServerClient
Insert missing padding in message packet (https://github.com/FEX-Emu/FEX/commit/73caa6725fffa9c49f989d6e3fb6e0061b8b5b4d)
FHU
FS
Create WIN32 helpers for some functions. (https://github.com/FEX-Emu/FEX/commit/af15277fc41b87ce242b43622c5e4d67deb41cd8)
FM
Removes double syscall issue with GetEmulatedFDPath (https://github.com/FEX-Emu/FEX/commit/b2a3c6a04364ad965a75bde6e9ca0347af13d692)
Remove unused FD to Name mapping (https://github.com/FEX-Emu/FEX/commit/1aeab04c9f8c8fc467b003496a51d2ca2deda539)
FileLoading
Add WIN32 specific loading path (https://github.com/FEX-Emu/FEX/commit/c94268789b8af00b74efca577ea4ce947b02e29f)
Frontend
Remove errant header (https://github.com/FEX-Emu/FEX/commit/4bffdc634546edc0f099061fe70d729e7540fb10)
GdbServer
Disable under mingw (https://github.com/FEX-Emu/FEX/commit/f673afc38ffe7902612b60d066d4a23bd4595d4f)
HostFeatures
Mark DCZID related utilities as [[maybe_unused]] (https://github.com/FEX-Emu/FEX/commit/5e105e86af89ae8bc99c1dc2db127381190a00eb)
IR
Passes
Changes vector to fextl (https://github.com/FEX-Emu/FEX/commit/2d05ffe3aa9e220b408641c7f371d54aac690040)
Interpreter
Separate fallback OpHandler from F80 fallbacks (https://github.com/FEX-Emu/FEX/commit/6e52a16ef385df3cd60f4ad09d206f8a6709be40)
IntrusiveIRList
Ensure this is using the FEX allocator (https://github.com/FEX-Emu/FEX/commit/82f7bb282f296934f5fbad236f996e2e5eea1c02)
Ioctl
Add default handler for drm (https://github.com/FEX-Emu/FEX/commit/db706bb28fb15ea9c6867b64c1d68e0e619c5e60)
Ensure DRM name check uses strncmp (https://github.com/FEX-Emu/FEX/commit/c1dab3e6cc9309c30d621bcb0789401e1300fbbd)
LogManager
Remove unused handler (https://github.com/FEX-Emu/FEX/commit/1e4a6d432c8b0a07ef85107b6bd0f37479401074)
LookupCache
Removes unnecessary recursive lock_guard (https://github.com/FEX-Emu/FEX/commit/6dfea8a80fdf7e0e48ef995f8f97c23211cf3dcd)
Netstream
Disable on WIN32 (https://github.com/FEX-Emu/FEX/commit/2ff509610331a9075c0f3baa748e17af530fbb88)
ObjectCache
Ensure correctly packed config option (https://github.com/FEX-Emu/FEX/commit/059472fcef39c4e363f5cf9f475a167b658a4fc3)
OpDispatcher
Implements REP MOVS as Memcpy IR op (https://github.com/FEX-Emu/FEX/commit/11c6f97643232c5a147b48374666f943376d44f1)
OpcodeDispatcher
Simplify PCMPXSTRIOpImpl (https://github.com/FEX-Emu/FEX/commit/30974fb2c95cd31d7f5fa4522a406bab40dd6633)
Handle PCMPISTRI/VPCMPISTRI (https://github.com/FEX-Emu/FEX/commit/88247141d7155d05d7cc1dede6e055f30fb078a4)
Handle VPMASKMOVD/VPMASKMOVQ (https://github.com/FEX-Emu/FEX/commit/238ffdf8938b55211929a47f7fb5427a49344601)
Handle VCVTPD2PS/VCVTPS2PD (https://github.com/FEX-Emu/FEX/commit/35f192b6fd6da78eb3cbfed1b749b84eb0821622)
Handle VCVTSD2SS/VCVTSS2SD (https://github.com/FEX-Emu/FEX/commit/512d6d0069b0b2db8ece9e9b660d51d25020f8ac)
Handle PCMPESTRI/VPCMPESTRI (https://github.com/FEX-Emu/FEX/commit/a12802e74cb19c286dbf7be65a830359ba944554)
Move usages of And(Not( to Andn (https://github.com/FEX-Emu/FEX/commit/f55a653e9937101e054e3165f7b209874fed89d2)
Implement support for 32-bit SALC instruction (https://github.com/FEX-Emu/FEX/commit/df354e37ddaeef2b828bb0491850780cf7225295)
Handle store variants of VMASKMOVPD/VMASKMOVPS (https://github.com/FEX-Emu/FEX/commit/cf66643c60da8b94916fa6a95ed5a3c7f9e751ae)
Handle load variants of VMASKMOVPD/VMASKMOVPS (https://github.com/FEX-Emu/FEX/commit/5cdde0bef16ddbc6858f2f809796d3acb23117ac)
Handle VMPSADBW (https://github.com/FEX-Emu/FEX/commit/2c2abc550b473107692c62060739ea5a4869c73a)
Handle VPSLLVD/VPSLLVQ (https://github.com/FEX-Emu/FEX/commit/54e784742cde7a393f9dd513c4e5e71e9f2caa87)
Handle VPSRLVD/VPSRLVQ (https://github.com/FEX-Emu/FEX/commit/91f056bd2dd0a5fc5f78ae17135b1077fb542559)
Handle VCVTSI2SD/VCVTSI2SS (https://github.com/FEX-Emu/FEX/commit/41477db7aa6d0a4834c94b7e12b11d37917d5589)
Handle VPINSRB/VPINSRD/VPINSRQ/VPINSRW (https://github.com/FEX-Emu/FEX/commit/8633528cee20b4e9d596d3972abb7345e7339fca)
Handle VPSADBW (https://github.com/FEX-Emu/FEX/commit/1087c45b6c824cf2b6ab3262d59b76dc221c6ed7)
Handle VTESTPD/VTESTPS (https://github.com/FEX-Emu/FEX/commit/6517f7eb30d7f2b5500f57ba3305b396d7b98edf)
Handle VPMADDUBSW (https://github.com/FEX-Emu/FEX/commit/c14f4355b71c39175d2fc0c9d0b2843dc351d687)
Handle VPMOVMSKB (https://github.com/FEX-Emu/FEX/commit/6645e68c9554a772d9f8b171d6196168f68774eb)
Pass full register size through VExtractToGPR (https://github.com/FEX-Emu/FEX/commit/e92cfc4d7a77c3962b830ecfe6bd95a2d8fb9d45)
RA
Use FindFirstSetBit helper (https://github.com/FEX-Emu/FEX/commit/cfc1aa593bc847438fa6ff651d52fce68b6c5e60)
SignalDelegator
Make sure to save and restore InSyscallInfo (https://github.com/FEX-Emu/FEX/commit/456e9dbdea9a38db3590e9b10c33e94407f8a927)
Calculate siginfo layout (https://github.com/FEX-Emu/FEX/commit/886c562882d5351a18383114d9dee8c7f8e3255c)
Moves all signal handling to the frontend (https://github.com/FEX-Emu/FEX/commit/9432a84cb4fbe6d9328e1162492ca65bcdb26bc4)
Cleanup unused functions (https://github.com/FEX-Emu/FEX/commit/ad332e3c30736077aeb2a26f5c7eb5add2700b2f)
StringConv
Convert to conversion functions that don't use std::string (https://github.com/FEX-Emu/FEX/commit/c5da0e7ac1f23436e4a054955ad1979f34f9f5b8)
StringUtils
Stop allocating TrimTokens (https://github.com/FEX-Emu/FEX/commit/f45ea1e0f6453f4c885d7122ca69a24476e6651a)
Telemetry
Disable on WIN32 (https://github.com/FEX-Emu/FEX/commit/cbe55b0765da60a52ce00d70a50aa71a57f189e5)
Threads
Moves pthread logic to FEXLoader (https://github.com/FEX-Emu/FEX/commit/874ae5b0fc3733d433a76ecb7e65e1087e01eefe)
Adds SetThreadName helper (https://github.com/FEX-Emu/FEX/commit/797737a84d7e91a3f1cbf5c3a06f0dd6ac2cd896)
Disable glibc allocator fault testing with exit (https://github.com/FEX-Emu/FEX/commit/6d4cef723a1c6ce35523c064093db6bf2463746f)
Thunks
Fixes xcb helper thread creation (https://github.com/FEX-Emu/FEX/commit/542aeed7b9227da6db60f1ccb4d43c011b785458)
Disable under mingw (https://github.com/FEX-Emu/FEX/commit/4a11111abde89419426bcbcb87ed68555e6b6932)
Enable ccache if available (https://github.com/FEX-Emu/FEX/commit/c03ed529e60ef63b6510fa7e65bbdcd6b32ec952)
Make xcb's callback more robust. (https://github.com/FEX-Emu/FEX/commit/e8bf7a1a4650d2812ee76f8a1341943abe8bc327)
VEXTables
Adds a missing class of AVX instructions (https://github.com/FEX-Emu/FEX/commit/a351620c6021826dc7e6758df9e8302002c2eeb6)
Misc
Win32 memory allocation fixes (https://github.com/FEX-Emu/FEX/commit/468f7471e1a47f77e13a9fd5b23780fa7e92a1dc)
Get mingw compiling libFEXCore (https://github.com/FEX-Emu/FEX/commit/7552ad29fa4dc8e8faa54037407467eb635cd4ff)
Disable AOT and object cache under mingw (https://github.com/FEX-Emu/FEX/commit/361e684c6430395fa98b4656a08da4bb536dc7bc)
Disable Break/INT operations on mingw (https://github.com/FEX-Emu/FEX/commit/4c74913edf03e898d0252694bd7f22e613c47c17)
llvm-mingw: Fix SoftFloat compiling (https://github.com/FEX-Emu/FEX/commit/2b5ddb6b939def826b56516542db8024d304d38a)
Add in jemalloc glibc hooking again (https://github.com/FEX-Emu/FEX/commit/1a91d849f03ce9725539e772bb96cc3cefefa879)
Update drm headers to v6.2 (https://github.com/FEX-Emu/FEX/commit/1c2fd72c84eb94ff0b712721e50168a8dc4b03aa)
cpp-optparse: Update to latest optparse (https://github.com/FEX-Emu/FEX/commit/7916281ee7209a9120be9b83aee8a03c2d2b7c68)
More glibc allocation removals. (https://github.com/FEX-Emu/FEX/commit/f806ca688c8004919e965df95bf54eb30a6d7113)
Move FEX away from the remaining glibc allocations that we can (https://github.com/FEX-Emu/FEX/commit/aac4e25ca4a95654a44a3dca2c735193b0173fcb)
Add support for thunking Wayland (https://github.com/FEX-Emu/FEX/commit/21838fe03f09a0e44e2a74b7e7f1064d4becc9f3)
Convert most std::string over to fextl (https://github.com/FEX-Emu/FEX/commit/9183cf144f6b99cc17582aacf565413ac85c6060)
Convert most things to fextl (https://github.com/FEX-Emu/FEX/commit/e3f6ef6b1827bff16f3309f34a86128662f9a1bc)
fextl
:fmt: Remove fwrite usage (https://github.com/FEX-Emu/FEX/commit/d7bc0370ee83b1336dd3e448ad7c38e5cf1c9cff)
Bulk merge (https://github.com/FEX-Emu/FEX/commit/bba8716c7e255105ea48c795d392f9e60cc2eed3)
memory
Don't allow arrays in fextl::make_unique (https://github.com/FEX-Emu/FEX/commit/65fa4958907042a0d22f79bb595fb2c32e18b6c9)
gvisor
Disable timerfd test (https://github.com/FEX-Emu/FEX/commit/780491d61b50915f2d5708082b0b55a41d475131)
mingw
Disable compiling Common/Linux/Tools (https://github.com/FEX-Emu/FEX/commit/d7f9c7ece29e3de6218661d276f6def0ae5ce38b)
unittests
Add missing VPMASKMOVQ store test (https://github.com/FEX-Emu/FEX/commit/e71f3e898f66414718b7732025aa77b26143e56f)
Change alignment directive in 256-bit VPSADBW test to 32 (https://github.com/FEX-Emu/FEX/commit/d1ece88bed621563631eadb0af0a9cdc53846979)
gcc
Disable mcount_pic test (https://github.com/FEX-Emu/FEX/commit/a78860c194ca2bb52f967416116ff43382141600)
x32
Fixes clock_nanosleep syscall (https://github.com/FEX-Emu/FEX/commit/1f44037d9a22723b80abf9bc6409ef8f3bf9716c)

FEX - FEX-2303

Published by Sonicadvance1 over 1 year ago

Read the blog post at FEX-Emu's Site!

Oh jeez, another month already? I guess it's time for another FEX-Emu release. Let's pick a commit, spin the roulette wheel, and hope for the best!
Surely that's how releases work?

Rootfs images are now on a new CDN!

While this is something that doesn't directly impact FEX when running applications, it's a problem that most of our users need to deal with when
installing FEX. Our previous CDN which was hosting our x86 images had a fair number of problems that couldn't be solved. The main issue that affected
users was that it was slow to download the images and depending where you were in the world, it could have an unstable connection. This resulted in
gigabyte sized files taking forever to download or never at all!

This month we have switched our CDN to a service that has worldwide data replication across multiple dataservers. This improves the speed at which
users can download our prebuilt images. Going from an average of 20MB/s to over 300MB/s is a significant boost. In addition to that, the connection is
significantly more stable to the far corners of the world. Something that doesn't affect users at all is that this new CDN is also
significantly lower cost than what we were using before. This was unexpected but it's a nice bonus that this CDN is an improvement in every regard,
including cost.

This month's code changes

With that out of the way, onward to this month's changes.

Optimize REP STOS instruction in to inline memset

This is an instruction that x86 offers that behaves similarly to a memory set operation. It behaves slightly differently since this allows you to set
the memory by element size, and you can also choose the direction in which the memory is set. In particular this instruction tends to get used for
zeroing out memory. The latest x86 CPUs have even optimized this instruction to be as fast as possible. Previously FEX had decomposed this instruction
in to a complex series of code blocks that was inefficient for our JIT and everything surrounding it. Now we instead convert this to a single IR
operation called MemSet which exposes the semantics of how the instruction works, allowing our IR to be cleaner and the backend to decompose it in
a more optimal fashion. Currently we emit a fairly trivial loop that handles this memory set operation. ARM has recently announced that future CPUs
are going to support a memory set instruction that is very similar to the 8-bit REP STOS which will make this implementation even faster!

As seen in this graph, FEX is nowhere near a native implementation. It's important to note that even without writing "optimal" codegen, this change
has still given FEX up to an 11% performance improvement on its implementation. This work was primarily focused on improving the IR, so we can now
optimize the code that the JIT emits significantly more easily! Getting closer to native is likely something to come in the
future.
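
To make the element-size and direction semantics of REP STOS concrete, here is a small reference model written as the kind of trivial loop described above. The helper name and signature are invented for this sketch and say nothing about how FEX's MemSet IR operation is actually lowered. With one-byte elements and a forward direction it behaves like `memset`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Reference model of REP STOS (hypothetical helper, not FEX code).
// Stores the low `elem_size` bytes of `value` into `count` elements,
// starting at dst and moving up (backward == false) or down (backward == true).
inline void RepStos(std::uint8_t* dst, std::uint64_t value,
                    std::size_t count, std::size_t elem_size, bool backward) {
  assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t off = i * elem_size;
    std::uint8_t* element = backward ? dst - off : dst + off;
    // x86 is little-endian: the least significant byte goes first.
    for (std::size_t b = 0; b < elem_size; ++b) {
      element[b] = static_cast<std::uint8_t>(value >> (8 * b));
    }
  }
}
} // namespace sketch
```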

Add config option to hide hypervisor CPUID bit

We encountered the first game that has anti-virtual machine code and refuses to run if it thinks it is running in a VM. While FEX isn't a virtual
machine, we expose this CPUID bit so software that cares can use it as a hint to query FEX specific CPUID information. Now that this game has stumbled
upon this issue, we added a configuration profile to disable this CPUID bit for the game. If any other games also pick up on this issue then we will
need more profiles.
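
On x86 the "hypervisor present" flag is bit 31 of ECX in CPUID leaf 1. The sketch below shows how a config switch could gate that bit; the function and parameter names are hypothetical and this is not FEX's actual CPUID code.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// CPUID.01H:ECX bit 31 is the "hypervisor present" bit.
constexpr std::uint32_t kHypervisorBit = 1u << 31;

// Build the ECX value reported for leaf 1 (hypothetical helper).
// `hide_hypervisor` models the per-application config option.
constexpr std::uint32_t Leaf1Ecx(std::uint32_t base_features, bool hide_hypervisor) {
  return hide_hypervisor ? (base_features & ~kHypervisorBit)
                         : (base_features | kHypervisorBit);
}

// What anti-VM code typically inspects.
constexpr bool LooksVirtualized(std::uint32_t ecx) {
  return (ecx & kHypervisorBit) != 0;
}

// Tiny self-check showing both settings of the option.
inline void SelfCheck() {
  assert(LooksVirtualized(Leaf1Ecx(0, false)));
  assert(!LooksVirtualized(Leaf1Ecx(0, true)));
}
} // namespace sketch
```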

Proton and pressure-vessel startup optimizations

One of this month's efforts has been improving the time it takes for Proton to start up. pressure-vessel is the project that is used to set up the
Proton execution environment, which takes a while overall. One of the hardest things about Proton is that it executes thousands of programs and does an
absolute ton of filesystem accesses. ARM devices typically don't have the highest performance filesystems, which makes one part of this hard, but also
FEX's filesystem overlay adds overhead to this. Additionally one of FEX's shortcomings currently is that every application execution must JIT fresh
code every time it restarts. Since pressure-vessel starts so many programs, a lot of the time is just spent emitting code to memory. There were a few
optimizations that went towards making this faster this month.

With the couple of optimizations in place we managed to shave a second off of the start-up time, cutting the execution from 9.7 seconds down to 8.7
seconds. In the case of running on an Apple M1, execution is now down to 7 seconds. Almost all of this time improvement comes from faster syscall
wrapping and the remaining CPU time is code JIT and execution. It'll only get faster in the future!

Fix a race condition with syscall emulation

While this is a fairly minor change, we fixed a race condition around system calls which would consistently cause crashes when Steam was starting up.
Every piece of work that improves stability just makes the whole emulation experience so much better and needs to be celebrated!

Signal frame improvements!

A significant problem with using FEX is the debugging experience when something breaks. We spent a good amount of time this month improving how FEX
sets up its signal frames when the guest application hits a fault. Since we weren't following traditional signal frame generation, tooling around
backtracing was broken in most cases. We have now reworked this so that libSegFault will now work to give FEX a backtrace of the application's
state when it crashes.

We will be shipping a new rootfs which includes x86 and x86-64 libraries for libSegFault so that if users want to debug a crashing application, they
can try and get a backtrace.

AVX work continues

Another month, another bunch of AVX work that has been implemented.

Instructions implemented

VPHSUBSW
VHSUBPD/VHSUBPS
VPERMILPD/VPERMILPS
VPERMD/VPERMPS
VPHADDSW
VPTEST
VPMOVSD/VPMOVSS
VSHUFPD/VSHUFPS
VPSHUFD/VPSHUFHW/VPSHUFLW
VPSHUFB
VPALIGNR
VEXTRACTF128/VEXTRACTI128
VPBLENDVB/VBLENDVPD/VBLENDVPS
VBLENDPD/VPBLENDW

As you can see, a lot of new instructions are now implemented. This now leaves us with about thirty more instructions that need to be implemented
before we can start advertising the features on hardware that supports 256-bit SVE2. This is significant as we keep finding more and more games that
require AVX to run.

ARM emitter cleanups

This is another change that isn't user facing, but it is always nice to point out janitorial tasks that have been done. When we switched over to using
our own code emitter there were some design choices and implementations that weren't quite optimal. This usually manifested as developer pain when using
the emitter but was a necessary evil since we wanted to get rid of VIXL's assembler as fast as possible. @Lioncache
spent some time this month cleaning up a lot of the dirty code in the emitter, in some cases making it slightly faster as well. This is always greatly
appreciated as it reduces maintenance burden when working in the JIT.

They also implemented an absolute ton of new instruction emitter functions which previously didn't exist. While we don't use these yet, we will likely
use them at some point which will make our lives easier in the future.

New development machines for our developers

Just recently a new Snapdragon laptop has gotten working OpenGL and Vulkan drivers up and running! We are gifting each of our developers one of these
great machines in order to ensure we have testing platforms for all the OpenGL 4, DXVK, and VKD3D applications we want to be running! Kudos to all the
developers that worked on bringing this hardware up so quickly!

Raw Changes

ARMEmitter
Tidy up some assertion handling (https://github.com/FEX-Emu/FEX/commit/e7069f9f959418360abc3d061f3c60550b0a73ea)
Remove predicate implicit conversion operators (https://github.com/FEX-Emu/FEX/commit/41731e26804fd06379b89d4a7a74c5b3e9bb0fd5)
Make second sxtw parameter a WRegister (https://github.com/FEX-Emu/FEX/commit/e71e3ec9303b8b55ad2a2cc5d749ba5dc8ccd340)
Remove implicit conversions from Register/XRegister/WRegister (https://github.com/FEX-Emu/FEX/commit/378e0692b9cb6eb25b8558b7e98c600e1806e31e)
Remove predicate uint32_t conversion operators (https://github.com/FEX-Emu/FEX/commit/e869b2fe67c8f07a6dfa106a235c672ce1ad37ab)
Remove most implicit conversion operators for vector register types (https://github.com/FEX-Emu/FEX/commit/0f45318040cc69d879625d974cb94dd65e742ccb)
Make VRegister constructor explicit (https://github.com/FEX-Emu/FEX/commit/21fbcef0bde2e093bb68f55dd1d866fdf847f081)
Handle sequential registers in lists nicer (https://github.com/FEX-Emu/FEX/commit/ef020837672c6b75d737f79857d581356ebb642f)
Simplify size handling Advanced SIMD 3 different group (https://github.com/FEX-Emu/FEX/commit/24904f48c414e708bb67ab6de19f214202aac2d4)
Simplify advanced SIMD copy (https://github.com/FEX-Emu/FEX/commit/e65b429c835f2279a041917aee7cb10991a3610b)
Centralize handling for unsigned offset load-stores (https://github.com/FEX-Emu/FEX/commit/1832cc80d62642a465c78f62ca3fda2ef5bf0219)
Handle SVE Integer Compare - Scalars group (https://github.com/FEX-Emu/FEX/commit/fe1faf9ebe208e5e0f86d853237753b254417aa6)
Finish off SVE Predicate Misc group (https://github.com/FEX-Emu/FEX/commit/165db37c8d532a022675dc04873d453d20d22107)
Handle SVE partition break categories (https://github.com/FEX-Emu/FEX/commit/4d655218ab1b343bcf58fb62df753607779ff03f)
Handle SVE integer compare with wide elements category (https://github.com/FEX-Emu/FEX/commit/0a8fc2cbef41e07ac60ffe2529207ace59f40928)
Finish off SVE Permute Vector - Predicated group (https://github.com/FEX-Emu/FEX/commit/a4c694ffc77c58f00fb272ccf94fd2b0b60c9b08)
Handle SVE index generation category (https://github.com/FEX-Emu/FEX/commit/e4488b0cfc057739bc6f9e72876fb04f821d26bb)
Handle a few more vector permutation categories (https://github.com/FEX-Emu/FEX/commit/9c256bfe96e93553b7f352bee0bd260ef213deec)
Handle SVE2 Accumulate category (https://github.com/FEX-Emu/FEX/commit/8c8b680640bd492f34baab401037ee8aef0dbef8)
Finish off SVE Misc category (https://github.com/FEX-Emu/FEX/commit/2bd64ad24e10c4099a6ac91dac0cfe8e8e60d92f)
Handle CPY (scalar) and CPY (SIMD&FP, scalar) (https://github.com/FEX-Emu/FEX/commit/3c1ba846f79ade0ce64f2ec5eab95b59b5e02924)
Handle predicated wide shifts (https://github.com/FEX-Emu/FEX/commit/dd2e70e4aa0ea6e10fc18749a0ad968c6bd60e57)
Handle unpredicated wide shifts and unpredicated shifts by immediates (https://github.com/FEX-Emu/FEX/commit/5fd68b6f07b21a429dc85e25f12baa3363856cab)
Handle SVE2 saturating add/subtract category (https://github.com/FEX-Emu/FEX/commit/cb3cfed9c24016a3cc1e7a66ab2a1e891a8d29fb)
Centralize instruction handling for a few categories (https://github.com/FEX-Emu/FEX/commit/582108a68a9d2ed62721942769d3c1f8ec9e3f98)
Fixes some warnings that cropped up. (https://github.com/FEX-Emu/FEX/commit/5da90aac464be9707c4ddbc58e95f229550085c1)
Handle SVE SQDMULH/SQRDMULH (vector) (https://github.com/FEX-Emu/FEX/commit/1089987a292417ff50c5381356fe28d4f6b61a21)
Handle ADDVL/ADDPL and RDVL (https://github.com/FEX-Emu/FEX/commit/d81097482d93d3d4f0fc9f36bb3ddded2f5ed4fe)
Handle MLA/MLS (vector) and MAD/MSB (https://github.com/FEX-Emu/FEX/commit/347abf09ef802575cbcf3270d233fe04f36123a5)
Handle SVE predicated mul/div and finish off integer reduction category (https://github.com/FEX-Emu/FEX/commit/c0bc5d9748691dcfb6aef4efd6866388449681ea)
ASIMDOps
Amend a few error logs (https://github.com/FEX-Emu/FEX/commit/f7f2dc22105b6dfe657a99e19b57ea898912d89b)
Arm64
Fixes a race condition on syscall spilling SRA (https://github.com/FEX-Emu/FEX/commit/f6e2fe1515f7002cd4335b80a0144ec42e0da27f)
VectorOps
Use SVE only with 256-bit op sizes (https://github.com/FEX-Emu/FEX/commit/2f260ae6ad676935d29c314cd3d1477f5e73a2d7)
Use movprfx with VBSL (https://github.com/FEX-Emu/FEX/commit/e8fd8ef3b71552b944a91350534ea2ae2e1a59ac)
Arm64Emitter
Use bit utils wrapper over __builtin_ffs (https://github.com/FEX-Emu/FEX/commit/9e01730c6c35a1168e81efe05117e11194a26526)
CPUID
Adds an config option to hide hypervisor bit (https://github.com/FEX-Emu/FEX/commit/77fad28b698666a2bc01238c0235adc98771289b)
Core
Support Data in JIT buffer header (https://github.com/FEX-Emu/FEX/commit/c7c47a827ad4a385df6a23e410417ada045fb1bc)
Dispatcher
Fixes crash with misalign stack returning from signal (https://github.com/FEX-Emu/FEX/commit/d688026fe4a6aa888b25bf3534c8d579ff6f68e0)
Support reconstructing RIP from block entry (https://github.com/FEX-Emu/FEX/commit/66d879f38787a544b3cc45503f9a5f3eedb5747d)
Fixes guest stack register usage (https://github.com/FEX-Emu/FEX/commit/81a89ab747248e9b88aa3a0578d4913787af3d26)
Minor flags optimization (https://github.com/FEX-Emu/FEX/commit/6047ca9fe2c21082e567df32ffbe63f57bbdeb1c)
ELFCodeLoader
Adds an option to inject libSegFault (https://github.com/FEX-Emu/FEX/commit/a5762b6faa4093ac42f60b92d9b586d067c7efca)
Emitter
ALUOps
Fix typos in log messages (https://github.com/FEX-Emu/FEX/commit/f2aa0026b58c9ec5ee56ecbf66b4f00fe01e16de)
EmulatedFiles
Optimize openat handler (https://github.com/FEX-Emu/FEX/commit/545a216da61df93f374f04235487aa9675638098)
FEXBash
Move to Tools folder (https://github.com/FEX-Emu/FEX/commit/ef6f5d200362e169b158341c029f13a988762ee9)
FEXCore
Removes C wrapper interface (https://github.com/FEX-Emu/FEX/commit/f71f2445db363a760b6889d50101f090628b2c2f)
FEXRootFSFetcher
Update link to rootfs links file (https://github.com/FEX-Emu/FEX/commit/308fa76aa3c8858eb5eab0e1446c1c6aabded034)
FEXServer
Change systemd service environment variable key (https://github.com/FEX-Emu/FEX/commit/d2e0adf54061cf6419054b8f15e0ec0ba3526c40)
FEXServerClient
Fixes instance where FEXServer can create a zombie (https://github.com/FEX-Emu/FEX/commit/70aefc9db21d62b78526650c159cd7805b12b28d)
FileManagement
Fixes Proton (https://github.com/FEX-Emu/FEX/commit/2fca207e14dc12f4d0f6f508b9c96475d903a315)
Skip opening emulated writable files (https://github.com/FEX-Emu/FEX/commit/e310e29898fefc3ab1491703e318e2652dc3d944)
Optimize GetEmulatedFDPath with an FD! (https://github.com/FEX-Emu/FEX/commit/55d3edb8e62d2a62ebefbebbc25873c387595171)
IR
Add VDupFromGPR (https://github.com/FEX-Emu/FEX/commit/b329442c093640d94bb54310762da96d0c204de9)
Allow specifying register size for AES enc/dec ops and PCLMUL (https://github.com/FEX-Emu/FEX/commit/8689038533e0e41ebd4e133b2fd920e8ea62fdd3)
JIT
Adds a JIT data header and tail. (https://github.com/FEX-Emu/FEX/commit/60b76f53cfc4bc02035b032c5ee3742bb94bac66)
OpcodeDispatcher
Optimize REP STOS to MemSet operation (https://github.com/FEX-Emu/FEX/commit/fc38df2ff0dca65b9cadf1353e9bcad48e4df469)
Restrict partial XMM stores to FPRs in StoreResult_WithOpSize (https://github.com/FEX-Emu/FEX/commit/b39a882a2d4341e68f6dacfb4ecf94917a96f4d7)
Remove now unused _VDupElement path in LoadSource_WithOpSize (https://github.com/FEX-Emu/FEX/commit/4d25de31de6d94e55124aabd7fddc0428496409d)
Handle VPHSUBSW (https://github.com/FEX-Emu/FEX/commit/9b23ae91331ade4ae5bed6fadc65a6484334e22e)
Share MOVHPD implementation with MOVHPS (https://github.com/FEX-Emu/FEX/commit/f951a406e6e35ae2353fd7637918e8f13c9e2465)
Handle alignment for MOVAPS a little better (https://github.com/FEX-Emu/FEX/commit/68b2072eab134e9cda35a954943c63e55c2698c9)
Handle VHSUBPD/VHSUBPS (https://github.com/FEX-Emu/FEX/commit/9b123353b3711259b3317aa91b23e11d171525c1)
Handle register variants of VPERMILPD/VPERMILPS (https://github.com/FEX-Emu/FEX/commit/618f5bb869cd9b28fbe70558599f80a09b573fcf)
Handle VPERMD/VPERMPS (https://github.com/FEX-Emu/FEX/commit/645f40bb967fcb58d0dbdefcf0438badf8711cc9)
Handle VPHADDSW (https://github.com/FEX-Emu/FEX/commit/268deddd094ec6e7fdc9f61f5e7d71aa9553541f)
Optimize ALUOp handler (https://github.com/FEX-Emu/FEX/commit/65b2da20d6919e1a50c380939520bfce53f2feb2)
Handle VPTEST (https://github.com/FEX-Emu/FEX/commit/a90f5363dd265da28343269aea2ea81bac304714)
Use VectorZero over VectorImm in InsertPSOpImpl (https://github.com/FEX-Emu/FEX/commit/ab03e595008d9721be7e28854ca2707375bcc805)
Handle VMOVSD/VMOVSS (https://github.com/FEX-Emu/FEX/commit/25f0a03cebf63a34990c560f238de3c0d6e2ffee)
Handle VPMADDWD (https://github.com/FEX-Emu/FEX/commit/efafe0e6e97122d481f1b3c9b0da044a09e16b60)
Handle VSHUFPD/VSHUFPS (https://github.com/FEX-Emu/FEX/commit/35746c76696207b8e5887529122939760351e74a)
Handle VPSHUFD/VPSHUFHW/VPSHUFLW (https://github.com/FEX-Emu/FEX/commit/3ac7b2cddf790571ce6c07250fd761baac164edd)
Handle VPSHUFB (https://github.com/FEX-Emu/FEX/commit/d40812929f11287c800ed29844026ee144fb5b28)
Handle VPALIGNR (https://github.com/FEX-Emu/FEX/commit/a96ad0fc9d988ee01e0234b1c3d3f9ccf2f6a4d1)
Handle VEXTRACTF128/VEXTRACTI128 (https://github.com/FEX-Emu/FEX/commit/e6fc159d8802a01a95e88aac466e884b485366ec)
Handle VPBLENDVB/VBLENDVPD/VBLENDVPS (https://github.com/FEX-Emu/FEX/commit/4ef3066b69c087f31f79dbddfb17c867e4835f61)
Handle VBLENDPD/VPBLENDW (https://github.com/FEX-Emu/FEX/commit/e255f1cdef51776be3d4f5055ef70354f33b11e8)
Scripts
Update fit_native script for X1C/A78C (https://github.com/FEX-Emu/FEX/commit/11c8db5a14b9bbc305174d4ea0cee1e292eefd8a)
Syscalls
Renamed fstatat64 to fstatat_64 (https://github.com/FEX-Emu/FEX/commit/c4b66b41cd6d8f9f1de63dcdcbc58674c5478910)
VEXTables
Remove VPERMIL2PD and VPERMIL2PS entries (https://github.com/FEX-Emu/FEX/commit/5f574fb93509b7ffad6d6720c63659afdee10929)
VectorOps
Remove unnecessary mov in VUShrNI2/VSQXTN2/VSQXTUN2 (https://github.com/FEX-Emu/FEX/commit/b5bc8cd294b39006fb2b0458440f0f0e5e8f36e7)
Only use VBSL 256-bit path if SVE is present (https://github.com/FEX-Emu/FEX/commit/86a6118b6230b8e0c6bf8254e9899a1b1c06c7a2)
Misc
Support user supplied signal restorer. (https://github.com/FEX-Emu/FEX/commit/143ef57141528dc6322a788b0d011a70899fdfe0)
Fix SDL2 directfb includes under Alpine Linux (https://github.com/FEX-Emu/FEX/commit/3bc722ca696126fed76772780dfdcc552aaa6be9)

FEX - FEX-2302

Published by Sonicadvance1 over 1 year ago

Read the blog post at FEX-Emu's Site!

This month certainly passed in the blink of an eye. A lot of good bug fixes this month as usual! Continue reading to find out more.

Fix incorrect operation for cache line clears

When emulating the CLFLUSH instruction, FEX was using the wrong operation to flush cache lines: CVAU instead of CIVAC.
Although this was incorrect, it was hard to find anything that was actually affected by it. With Snapdragon's open source Vulkan driver implementing what is required for VKD3D,
Vulkan tests made the incorrect implementation evident. Switching the implementation is easy and will let VKD3D run without hacks
when the required feature is finished.
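
As a rough illustration of why the two operations are not interchangeable, the sketch below models the AArch64 data cache maintenance operations by what they do to a line. The class and the `matches_clflush` helper are hypothetical names for this example, and treating CLFLUSH as "write back and invalidate, out to the point of coherency" is a simplification made here, not something taken from FEX's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheOp:
    name: str
    cleans: bool       # writes dirty data back towards memory
    invalidates: bool  # drops the line from the cache afterwards
    point: str         # "PoU" (point of unification) or "PoC" (point of coherency)


# Semantics as described by the Arm architecture for these DC operations.
DC_CVAU = CacheOp("DC CVAU", cleans=True, invalidates=False, point="PoU")
DC_CIVAC = CacheOp("DC CIVAC", cleans=True, invalidates=True, point="PoC")


def matches_clflush(op: CacheOp) -> bool:
    """Simplified model (assumption): CLFLUSH writes the line back and
    invalidates it from every level of the cache hierarchy."""
    return op.cleans and op.invalidates and op.point == "PoC"


if __name__ == "__main__":
    for op in (DC_CVAU, DC_CIVAC):
        print(op.name, "matches CLFLUSH:", matches_clflush(op))
```

Under this model only CIVAC both cleans and invalidates the line, which is the behaviour a guest expects from CLFLUSH.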

Bug fixes to 64-bit x87 emulation

A big thanks to CallumDev for finding and fixing these latest bugs in FEX's less accurate x87 emulation. As a
reminder, x87 on original hardware operates on 80-bit float values. ARM doesn't natively support this, so FEX needs to emulate
it using a software floating point library. We have a hack in our configuration that removes this software implementation and operates
on 64-bit doubles instead. This can significantly improve performance in some 32-bit games but may introduce rendering artifacts.
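
To make the precision trade-off concrete, here is a minimal sketch that rounds an integer to a given number of significand bits. The x87 80-bit format carries a 64-bit significand while an IEEE 754 double carries 53 bits, so some values survive in one format and not the other. `round_to_mantissa` is a hypothetical helper written for this example; it is not part of FEX.

```python
def round_to_mantissa(value: int, bits: int) -> int:
    """Round an integer to `bits` significant bits (round-to-nearest-even)."""
    if value == 0:
        return 0
    sign = -1 if value < 0 else 1
    mag = abs(value)
    excess = mag.bit_length() - bits
    if excess <= 0:
        return value  # already representable
    half = 1 << (excess - 1)
    low = mag & ((1 << excess) - 1)  # bits that will be dropped
    kept = mag >> excess
    if low > half or (low == half and kept & 1):
        kept += 1  # round up; ties go to the even neighbour
    return sign * (kept << excess)


if __name__ == "__main__":
    value = 2**53 + 1
    print("as 80-bit extended:", round_to_mantissa(value, 64))
    print("as 64-bit double:  ", round_to_mantissa(value, 53))
    # Python floats are IEEE 754 doubles, so they agree with the 53-bit model.
    print("Python float:      ", int(float(value)))
```

The value 2^53 + 1 is exact with a 64-bit significand but collapses to 2^53 with 53 bits. Small differences of this kind are where the rendering artifacts can come from.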

This month there were many bug fixes:

ALU operations that consume integers converted to floats are fixed
Float comparison that also consumes 16-bit integers fixed
FPREM instruction no longer loops infinitely

With these fixes in place, a large number of games now actually render correctly with this hack enabled. It will be interesting to see how much this
improves performance or battery savings in 32-bit games!

More AVX instructions emulated

With one of FEX's developers taking some time away, this was a little less involved than the last couple of months.
A handful of instructions were still implemented:

VPBLENDD, VBLENDPS, and VPSRAVD

Additionally, while these aren't AVX instructions, we also implemented the CLWB and CLFLUSHOPT instructions. These match their ARM equivalents so it was
mostly an easy implementation that applications can use if they want.

Fix copy and paste error in Arm64 JIT

While this is a fairly minor issue, we had a copy and paste error in FEX's register spilling code. This caused Steam to crash in certain situations,
so this fix helps users who want to run it.

A bunch of minor optimizations

This month had a bunch of small optimizations across the entire project. Alone these are all quite minor, but added together they should remove a couple
of percent of CPU time from FEX's JIT.

Arm64 Dispatcher is slightly faster
CPUID emulation initialization is faster
Optimize File loading, improving config loading time
Frontend instruction decoder is faster
Makes IR operations 1 byte smaller, improving memory usage
Inline IR constants optimization to reduce IR memory size

Fixing thunk symbol override fetching

FEX's thunks had an issue where if a library was loaded, we would only ever fetch relevant symbols from that library directly. While this worked for
our use case, it breaks when wanting to use MangoHud in OpenGL applications. Resolving this issue fixes most things that will override symbols with
LD_PRELOAD.
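
The difference between the old and new behaviour can be sketched as a lookup order. This is a toy model only: the library names, export tables, and the `resolve` helper are hypothetical and say nothing about how FEX's thunk loader is actually written.

```python
def resolve(symbol, search_order):
    """Return (library name, implementation) from the first library that exports `symbol`."""
    for lib_name, exports in search_order:
        if symbol in exports:
            return lib_name, exports[symbol]
    raise KeyError(symbol)


# Hypothetical export tables; the values stand in for function pointers.
OVERLAY = ("overlay.so", {"glXSwapBuffers": "overlay hook"})
LIBGL = ("libGL.so.1", {"glXSwapBuffers": "driver impl", "glClear": "driver impl"})

if __name__ == "__main__":
    # Old behaviour in this model: only the thunked library is consulted.
    print(resolve("glXSwapBuffers", [LIBGL]))
    # New behaviour in this model: preloaded libraries are searched first,
    # so an LD_PRELOAD overlay can hook the call.
    print(resolve("glXSwapBuffers", [OVERLAY, LIBGL]))
```

Symbols the overlay doesn't export still fall through to the real library, which is what lets a preloaded hook wrap only the calls it cares about.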

Update JEMalloc from 5.2.1 to 5.3.0

While this is a fairly minor change, this release of JEMalloc fixes some bugs and improves performance. Small but every performance improvement is
welcome.

Support for execveat with AT_EMPTY_PATH

This is an interesting feature where an application can be executed directly through a file descriptor instead of a file path on disk. The idea is fairly
simple but has some edge cases that may interest some people. For the more technical details of implementing
this, check out the pull request.
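
From the guest application's side, executing through a file descriptor looks roughly like the sketch below. It relies on Python's `os.execve` accepting an open file descriptor on platforms that support it. On Linux with a recent C library this is typically implemented with `execveat` and `AT_EMPTY_PATH`, but that mapping is an assumption about the C library rather than something the sketch controls. `run_by_fd` and `demo` are hypothetical helper names.

```python
import os
import sys


def run_by_fd(path, argv):
    """Open `path`, then execute it through the descriptor in a child process.

    Returns the child's exit code, or -1 if it did not exit normally.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            try:
                # Execute the already-open file; no path lookup happens here.
                os.execve(fd, argv, dict(os.environ))
            finally:
                # Only reached if the exec failed.
                os._exit(127)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    finally:
        os.close(fd)


def demo():
    """Run the current interpreter by descriptor and report its exit code."""
    code = run_by_fd(sys.executable, [sys.executable, "-c", "raise SystemExit(7)"])
    print("child exit code:", code)
```

One of the edge cases is visible even here: the program being executed never sees the path it was launched from, only whatever the caller puts in `argv[0]`.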

Raw Changes

ARMEmitter
Handle integer add/subtract vectors (predicated) instruction class (https://github.com/FEX-Emu/FEX/commit/9d33bba1c831becbb60bcb385aa38838d4f60478)
Handle RMIF, SETF8/SETF16 (https://github.com/FEX-Emu/FEX/commit/a899f9f824272265b5525f0c6611041a369042cb)
Handle SVE floating-point recursive reduction (https://github.com/FEX-Emu/FEX/commit/1cda029ed7b8d07958cfbdce4e04d85e13b07f82)
Add a few missing instructions (https://github.com/FEX-Emu/FEX/commit/2c9f99e5d618ff510c9474b7e2b9eb77ccea652c)
Support helper for long address generation (https://github.com/FEX-Emu/FEX/commit/f8d56a81708d301b41bd293f073c12ef3bf9477e)
Removes some warnings that cropped up (https://github.com/FEX-Emu/FEX/commit/5fd8fdbf5c6c9de5630b7d5d1c5acfb4d108863a)
Arm64
Merge two loads in to an LDP (https://github.com/FEX-Emu/FEX/commit/a28039f7cd5552b9e5d8ecebebc89d8dbaa1d6cd)
Fixes incorrect operation for CacheLineClear (https://github.com/FEX-Emu/FEX/commit/f8d92aa1219e383f294b33cc7367c0fa3d5d0f80)
Use switch statement for op handlers instead of jump table (https://github.com/FEX-Emu/FEX/commit/565ed450aaa8ed767814346795ea405f8228a433)
Fix SpillRegister C&P error (https://github.com/FEX-Emu/FEX/commit/9c93c6ffcda6d9888035acffcd5784529b4c7811)
Fixes large offset spill slots (https://github.com/FEX-Emu/FEX/commit/9acb513393837f288636fd5274005f04d69448cb)
VectorOps
Clamp shift amount to esize-1 for VSShr (https://github.com/FEX-Emu/FEX/commit/9a318cad959ad2aee613e79dce8e070f8a5747ae)
ArmEmitter
Adds two more classes of ASIMD instructions (https://github.com/FEX-Emu/FEX/commit/95e544c8407178fe7313e4400c71ebcafc0471e7)
Adds three more classes of ASIMD instructions (https://github.com/FEX-Emu/FEX/commit/81e0ac7e0b6d282cf48efd085dd291c5ebd3f5b2)
CPUID
Optimize initialization (https://github.com/FEX-Emu/FEX/commit/f614fc6fac0b4e8ea83afdd6f56db92213d6f908)
Config
Fix relative execve applications. (https://github.com/FEX-Emu/FEX/commit/65971effc7f2a63297ca31ebfaa804d365f89ffb)
ConstProp
Pool inline constants (https://github.com/FEX-Emu/FEX/commit/1e90ebb4005c1e386d6e9dda3f9440e9ac7feb03)
Core
Adjust virtual memory size for 32-bit (https://github.com/FEX-Emu/FEX/commit/7f6a620c9e78c0b37a38ff6a017e17bfb582bd23)
Dispatcher
Extract 64-bit signal frame save and restore (https://github.com/FEX-Emu/FEX/commit/65b6b6d5dde858c6ec61894749331091533f67ec)
Fixes x86-64 SA_SIGINFO generation (https://github.com/FEX-Emu/FEX/commit/8dae785e9e6f98ac672851621b075e7066265272)
ELFCodeLoader
Don't use std::random_device for RNG (https://github.com/FEX-Emu/FEX/commit/f5e97f3542d0ec001056770ae40c69fd046ad9f7)
Emitter
Remove unused header (https://github.com/FEX-Emu/FEX/commit/90bcb8c70b87a2a689911ef250f0a57b2a9e5bf1)
External
Update JEMalloc to disable 16k pages (https://github.com/FEX-Emu/FEX/commit/bbf9198cba596d52291b0f44014996b36993e7c6)
Externals
Update jemalloc to 5.3.0 (https://github.com/FEX-Emu/FEX/commit/9322e55a3f6ccb7d6189263b9ee552f9f5346574)
F64
Fix integer immediates for add,mul,div,sub (https://github.com/FEX-Emu/FEX/commit/c2325e1772cfe2bf3d2ca23fc34ab0a156972874)
FEXCore
Fixup 32-bit signal handling (https://github.com/FEX-Emu/FEX/commit/fa1193f14c9fa7740b46378d466c32b22c528bba)
FEXLoader
Adds support for execveat with AT_EMPTY_PATH (https://github.com/FEX-Emu/FEX/commit/dcce9add60550b4f6e912ce145b9d25777b9b808)
Build FEXInterpreter and FEXLoader independently (https://github.com/FEX-Emu/FEX/commit/8974509c529e805c3eaf7f4b6b2cab3f043d3f1b)
FEXRootFSFetcher
Support option to auto select first distro (https://github.com/FEX-Emu/FEX/commit/a7aeb4af7f8f66c2999af1966b9cab0938722013)
FEXServer
Remove POLLREMOVE usage (https://github.com/FEX-Emu/FEX/commit/d2d528222c9efc08f2b147830d9ebe664ed264c9)
FileLoading
Optimize FileLoad (https://github.com/FEX-Emu/FEX/commit/28dd94642a95982df52d983f049cae00091ed0dd)
Frontend
Various optimizations (https://github.com/FEX-Emu/FEX/commit/787b6895e88b5dede429a03219df61a9740dd384)
Github
Add ARM emitter tests to CI (https://github.com/FEX-Emu/FEX/commit/da88c68e1275bb6d846c36e2c24f35f712d9034b)
IR
Removes NumArgs member from IR ops (https://github.com/FEX-Emu/FEX/commit/9403c662a3b2cbb302702c4958c1ee9966e09a0e)
Remove HasDest member (https://github.com/FEX-Emu/FEX/commit/f8e762fcfb05d4ee35706d6f2f6157528ae955b0)
JitSymbols
Fixes file opening and writing (https://github.com/FEX-Emu/FEX/commit/a486797e5900bd2451b6af5b6190031a9c373918)
Fixes a crash that can occur (https://github.com/FEX-Emu/FEX/commit/34e1ba612998ff249db593cf7c6bed067c7b44f3)
Linux
Fixes shebang file execution (https://github.com/FEX-Emu/FEX/commit/477d4b6de8cda894aedf17f91497ff070c93c730)
MContext
Insert a stack cookie with assertions enabled (https://github.com/FEX-Emu/FEX/commit/766435941031a88d796afe43cd359773800579ba)
OpDispatcher
Adds support for CLWB and CLFLUSHOPT (https://github.com/FEX-Emu/FEX/commit/7be2e1ad34eb4624fd849a4746b0300f767dfd8a)
Fixes a few missing GPR/XMM helper usages (https://github.com/FEX-Emu/FEX/commit/4aa984aed953568df4b5163396e724c6910975df)
OpcodeDispatcher
Handle VPBLENDD/VBLENDPS (https://github.com/FEX-Emu/FEX/commit/62e6ada1123c011ac78c9dd1b7d2867141cb939d)
Handle VPSRAVD (https://github.com/FEX-Emu/FEX/commit/fe79f61fc350f68dc3436a36f17b8fa825ffe0dd)
Scripts
Update InstallFEX.py rootfs links (https://github.com/FEX-Emu/FEX/commit/df8704215b0468113b609b056e9a3122ec416249)
Syscalls
Fix out-of-bounds read when handling single-line shebang files (https://github.com/FEX-Emu/FEX/commit/3d29dac1b1163719f5ddea509c7c36a76bf37bf4)
Thunks
Fixes host symbol overrides (https://github.com/FEX-Emu/FEX/commit/9d35bc01c79eb1f7e89a9f5cd2bc151415b12936)
X86Tables
Optimize struct layouts (https://github.com/FEX-Emu/FEX/commit/dfc3297192a31c6bf12aac30000b9bc4cf34f77a)
Misc
X87_F64: Fixes FICOM (https://github.com/FEX-Emu/FEX/commit/afaff9293b4e03fe2bcdfad7d759d2a7b7eaae86)
fix ifdef to use HAS_SYSCALL_TGKILL for tgkill as it was intented (https://github.com/FEX-Emu/FEX/commit/8d0329ddaf0a47586092b062dbb5d101bcbd4038)
fix tgkill (https://github.com/FEX-Emu/FEX/commit/1521e0a248ab559687dd11cdbbba993583fc4989)
Fix FPREM flags calculation in F64 (https://github.com/FEX-Emu/FEX/commit/632add660c7992dd4138f2b88ca8b6e9d6281bf0)
unittests
Adds negative integer x87 tests (https://github.com/FEX-Emu/FEX/commit/ee58c5de1d4a21656012c0a11d4192849b18136a)

FEX - FEX-2301

Published by Sonicadvance1 almost 2 years ago

Read the blog post at FEX-Emu's Site!

Happy new year! A new month brings a new release of FEX-Emu, bringing in the new year.

A large amount of work landed this last month, showing that FEX-Emu isn't slowing down even through the holiday season.

AVX emulation work continues

An absolute ton of work landed this last month towards bringing up AVX emulation. In total there were around 185 new
AVX instructions implemented in FEX-Emu's backend this month. At this point it starts becoming easier to talk about the number of missing instructions
rather than what is implemented.

According to FEX-Emu's instruction decoder tables, we have around 60 more instructions to implement before we can start advertising the feature. Of course
with anything programming related, the last 10% is going to take the longest to implement.

A huge shoutout to @lioncash for smashing out these implementations so quickly. The amount of work going in to this is
extensive.

As a side note for users looking forward to this feature: the implementation currently requires hardware that supports both SVE and SVE2 with a 256-bit register
width. This means that Fujitsu A64FX, Neoverse-V1, and all current consumer class Cortex chips are incapable of taking advantage of AVX once
complete. This is a future proofing implementation for when future hardware becomes available that supports what FEX-Emu needs.
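
The requirement can be written down as a simple predicate. The sketch below is illustrative only: `HostCPU` and `can_use_avx_path` are hypothetical names, the per-CPU feature values come from public specifications rather than from these notes, and the notes don't say whether vectors wider than 256 bits would also qualify, so this sketch requires exactly 256.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostCPU:
    name: str
    has_sve: bool
    has_sve2: bool
    sve_bits: int  # SVE vector length in bits


def can_use_avx_path(cpu: HostCPU) -> bool:
    """SVE and SVE2 must both be present, with 256-bit vectors (assumed exact)."""
    return cpu.has_sve and cpu.has_sve2 and cpu.sve_bits == 256


# Feature values taken from public specifications (an assumption of this sketch).
A64FX = HostCPU("Fujitsu A64FX", has_sve=True, has_sve2=False, sve_bits=512)
NEOVERSE_V1 = HostCPU("Neoverse-V1", has_sve=True, has_sve2=False, sve_bits=256)
CORTEX_X2 = HostCPU("Cortex-X2", has_sve=True, has_sve2=True, sve_bits=128)
FUTURE = HostCPU("hypothetical future core", has_sve=True, has_sve2=True, sve_bits=256)

if __name__ == "__main__":
    for cpu in (A64FX, NEOVERSE_V1, CORTEX_X2, FUTURE):
        print(f"{cpu.name}: {can_use_avx_path(cpu)}")
```

Each existing chip fails for a different reason: the first two lack SVE2, and the consumer core has SVE2 but only 128-bit vectors.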

Implement a new AArch64 code emitter

One thing that has been a stand out performance bottleneck has been how quickly FEX-Emu can emit AArch64 binary code to memory. The project that
FEX-Emu used for this is ARM/Linaro's vixl. This project is a suite of tools including assemblers,
simulators, and disassemblers, and many open source projects use it. It is a very nice project that eases the developer's burden when writing a
JIT that targets ARM devices. Sadly when profiling our code, it turns out that FEX-Emu spends a decent amount of time inside of vixl code due to how
obtusely large it is. Even with Link-Time-Optimization enabled in our code, we can't reduce the overhead incurred from vixl.

With this in mind, FEX-Emu decided to create its own AArch64 code emitter tailored to what the project needs, which is high performance and low
overhead.

As seen in the chart above, the difference in code emission time between vixl and our new emitter is significant: the
Cortex-X1 takes only 68.7% of the time, and the smaller Cortex-A55 only 60.2% of the time. The Cortex-A55's larger win shows
that because vixl executes so much code to emit code, it effectively saturates the icache and
BTB of the poor little CPU core.
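
For readers who prefer speedup factors, the 68.7% and 60.2% figures convert as in the worked arithmetic below. The helper names are hypothetical; the only inputs are the two percentages quoted in the text.

```python
def speedup(time_fraction: float) -> float:
    """Speedup factor when the new code takes `time_fraction` of the old time."""
    return 1.0 / time_fraction


def time_saved_percent(time_fraction: float) -> float:
    """Percentage of the old time that is no longer spent."""
    return (1.0 - time_fraction) * 100.0


if __name__ == "__main__":
    for core, fraction in (("Cortex-X1", 0.687), ("Cortex-A55", 0.602)):
        print(f"{core}: {speedup(fraction):.2f}x faster, "
              f"{time_saved_percent(fraction):.1f}% less time emitting code")
```

These factors cover code emission alone, not the whole translation pipeline.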

Code emission performance isn't the only story that matters here though. We need to showcase how much of an improvement this makes when including the
rest of the translation from x86 code.

Although code emission is only a percentage of our total time spent when translating x86 code, this new emitter yields a fairly massive ~8%
reduction in time spent JITing. This will manifest as reduced stutters when users are running games and generally faster application execution for
short-lived applications.

We're not stopping there of course. Look forward to the coming months as we spend more time optimizing our JIT so it runs even faster!

Initial 32-bit thunk support

A tricky part of FEX-Emu's emulation is that it translates 32-bit x86 applications to run inside of a 64-bit process space. This
is a hard problem to resolve, which is why we don't currently support thunking of libraries when running 32-bit applications. This is the initial work
required to start supporting this use case.
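
One reason this is hard, shown here as an illustration rather than a description of FEX's thunk generator: pointers are 4 bytes wide in the 32-bit guest and 8 bytes wide in the 64-bit host, so any structure that contains a pointer has a different layout on each side and must be repacked when it crosses the boundary. The structure and the `repack_guest_to_host` helper below are hypothetical.

```python
import struct

# Hypothetical C structure: struct buffer { void *data; uint32_t length; };
GUEST32 = struct.Struct("<II")    # 32-bit guest: 4-byte pointer, 4-byte length
HOST64 = struct.Struct("<QI4x")   # 64-bit host: 8-byte pointer, 4-byte length, 4 bytes padding


def repack_guest_to_host(raw: bytes) -> bytes:
    """Convert the guest layout to the host layout, zero-extending the pointer."""
    data_ptr, length = GUEST32.unpack(raw)
    return HOST64.pack(data_ptr, length)


if __name__ == "__main__":
    guest = GUEST32.pack(0x1000, 32)
    host = repack_guest_to_host(guest)
    print("guest size:", len(guest), "host size:", len(host))
    print("host fields:", HOST64.unpack(host))
```

Even this two-field structure doubles in size, and every pointer nested inside a real API structure needs the same treatment.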

While not wired up to any library currently, we are quickly working towards getting Vulkan and OpenGL wired up to this interface so we can accelerate
older 32-bit games.

Various JIT optimizations

There have been various JIT optimizations this month which will improve performance a small amount. These aren't benchmarked since the percentage
improvements are so small that it is likely to fall in to single digit noise.

Optimize inline syscall spilling

When FEX handles a syscall inline with our JIT, we were spilling all of our registers to memory. Now with this optimization correctly working we only
spill exactly what is required, making inline syscalls faster.
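
The idea can be sketched as a set intersection. This is a toy model under stated assumptions: it uses the Linux AArch64 syscall convention (number in x8, arguments in x0 to x5) and assumes the only registers worth spilling are live ones that the syscall setup would overwrite. FEX's real register allocation is not shown in these notes, and `registers_to_spill` is a hypothetical helper.

```python
# Linux AArch64 syscall convention: arguments in x0..x5, syscall number in x8.
SYSCALL_ARG_REGS = ("x0", "x1", "x2", "x3", "x4", "x5")
SYSCALL_NUM_REG = "x8"


def registers_to_spill(live, num_args):
    """Return the live registers that a syscall with `num_args` arguments overwrites."""
    needed = set(SYSCALL_ARG_REGS[:num_args]) | {SYSCALL_NUM_REG}
    return sorted(set(live) & needed)


if __name__ == "__main__":
    live = {"x0", "x4", "x8", "x20"}
    # Spilling everything (the old approach in this model) would save four registers.
    print("spill all:     ", sorted(live))
    # Spilling only what a two-argument syscall touches saves two.
    print("spill required:", registers_to_spill(live, 2))
```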

Optimize generic spilling and filling

When jumping out of the JIT to C code, we need to spill both general purpose registers and vector registers to the stack. With this optimization in place we now
generate roughly half the instructions necessary when doing so.

Optimize SVE register spilling and filling

While not utilized today, this cuts the number of instructions required for spilling SVE registers to a quarter. Should be quite nice for
future hardware.

Zip elements for PHSUB instructions

These horizontal vector instructions behave a little weirdly and our original JIT implementation wasn't quite optimal. Previously we were doing
explicit element inserts to combine the final result. Now we are using the AArch64 Zip instructions which are significantly more optimal.
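
To show why whole-vector operations beat per-element inserts, the sketch below models PHSUBD (32-bit lanes) two ways. Note the swap: the second version uses a de-interleave-then-subtract formulation, which is not the zip-based sequence FEX emits; the notes don't spell that sequence out, so this only illustrates the general idea. Both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def phsubd_per_element(dst, src):
    """Reference model: build the result one element at a time."""
    lanes = list(dst) + list(src)  # low half comes from dst, high half from src
    result = []
    for i in range(0, len(lanes), 2):
        result.append((lanes[i] - lanes[i + 1]) & MASK32)  # one insert per element
    return result


def phsubd_whole_vector(dst, src):
    """Same result using three whole-vector steps instead of per-element inserts."""
    lanes = list(dst) + list(src)
    evens = lanes[0::2]  # de-interleave: even lanes
    odds = lanes[1::2]   # de-interleave: odd lanes
    return [(e - o) & MASK32 for e, o in zip(evens, odds)]  # one vector subtract


if __name__ == "__main__":
    a = [10, 3, 7, 7]
    b = [1, 2, 100, 50]
    print(phsubd_per_element(a, b))
    print(phsubd_whole_vector(a, b))
```

The work in the per-element version grows with the number of lanes, while the whole-vector version stays at a fixed handful of operations.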

Fix global application configurations

This was a bug where we accidentally broke application configurations shipped with the fex-emu package. In particular this caused the steamwebhelper
to break. With this resolved, Steam will work correctly again.

Fix misspelled library names in Thunks Database

While a fairly minor fix, this can have a profound impact on users that are using our thunking infrastructure. Our XCB thunks were incorrectly named,
which meant that if users enabled XCB thunks independently of Vulkan/GL, they wouldn't actually have been enabled.
With this typo fixed, this won't be a concern.

Note that if Vulkan or GL thunks were enabled, then this likely wouldn't have been an issue since X11 would have loaded xcb independently anyway.

Misc

There was a bunch more this month that was smaller and spread out. We don't want to take up too much of your time so if you want to see more, make
sure to check out the detailed change log!

Raw Changes

ARM64
Moves RA functions to header (https://github.com/FEX-Emu/FEX/commit/048daa45793d772223ad1905848081d6a469b687)
Arm64
Rename GetSrcPair, GetDst, and GetSrc (https://github.com/FEX-Emu/FEX/commit/bf7d0f7ed9217d1ba3ba627df85fe66948e624db)
Enables debug option for disassembling the JIT code (https://github.com/FEX-Emu/FEX/commit/03a061339a10fe8bb56e4110946f576b382270a9)
Inline Syscall spill optimization (https://github.com/FEX-Emu/FEX/commit/0ebb15c732ec7f49466248cdf94506ba41344c00)
Optimize SVE register spilling and filling (https://github.com/FEX-Emu/FEX/commit/1ab4471ef9aeb81c83d70f985e7a735a384c0447)
Optimizing spilling and filling (https://github.com/FEX-Emu/FEX/commit/9a8852f9b64fb1fb714807afa9120bb059718057)
Reduce dispatcher to 1 page (https://github.com/FEX-Emu/FEX/commit/65e8bf9d72017d3672aaa3a452009278b723fd3b)
VectorOps
Simplify FADDP result merging (https://github.com/FEX-Emu/FEX/commit/344ec33ba55af4d76da5d4d24678d5f4a9fadbea)
Config
Fixes global application configs (https://github.com/FEX-Emu/FEX/commit/dc9737a3948be8a5c5605aa0938d87fe70bceb93)
Crypto
Explicitly clear upper lane with VPCLMULQDQ (https://github.com/FEX-Emu/FEX/commit/4c013c867fb0c95f5badf35be8f88edae2429660)
Dispatcher
Calculate REG_ERR correctly using ARM ESR_EL1 (https://github.com/FEX-Emu/FEX/commit/4f313f5d4098fd088e1c9fb08bcb0ccb985b3b08)
Frontend
Handle 256-bit destination sizes directly (https://github.com/FEX-Emu/FEX/commit/e8aa79bea93a8e800232c074ad8fb44f7a1fd6f5)
IR
Handle 128-bit VInsElement with SVE (https://github.com/FEX-Emu/FEX/commit/94ae2e3a9ce1963bad1021ec6294bfe1c7d045c0)
LookupCache
Use a PMR map for our Blocklinks with monotonic allocator (https://github.com/FEX-Emu/FEX/commit/b7358b492617b604ddd5155bbf1891f3ed749904)
Optimize cache clearing and allocation (https://github.com/FEX-Emu/FEX/commit/2b6a020c4c80d3d8163b7b5fe17f82f90d726387)
OpCodeDispatcher
Optimize a case of GOT calculation (https://github.com/FEX-Emu/FEX/commit/b42b4e03a4e07b71034eab29a37ab55bc768aff3)
OpcodeDispatcher
Handle immediate variants of VPERMILPD/VPERMILPS (https://github.com/FEX-Emu/FEX/commit/3904a5264fb5a9e347459b5731cac9fff1d1a5ec)
Handle VMASKMOVDQU (https://github.com/FEX-Emu/FEX/commit/c6297edac0afa33fba9ee29017e23060e165b8f4)
Handle VPHSUBD/VPHSUBW (https://github.com/FEX-Emu/FEX/commit/4786ddc44ce338d0727a8bf87938d515651a53dc)
Zip elements instead of for loop insertion in PHSUB (https://github.com/FEX-Emu/FEX/commit/58ec2b2d7f00e637b564fa81349783854a3f014c)
Handle VDPPD/VDPPS (https://github.com/FEX-Emu/FEX/commit/9b8c92e275b85bdb418dcb208e5c87bd0df8eb74)
Handle VINSERTPS (https://github.com/FEX-Emu/FEX/commit/6caf764b7c81fa76465df905ae5cc1c101c4cfe9)
Handle VMOVMSKPD/VMOVMSKPS (https://github.com/FEX-Emu/FEX/commit/faa81f241b6b908d3cf3dd86866c19c7df4fc15e)
Handle VPUNPCKHBW/VPUNPCKHWD/VPUNPCKHDQ/VPUNPCKHQDQ (https://github.com/FEX-Emu/FEX/commit/64cd377e37da3712d08fc6f535f6b46d9c8327c8)
Handle VUNPCKHPD/VUNPCKHPS (https://github.com/FEX-Emu/FEX/commit/138f1fc844b95deb1334a017aad7efcfe12374c6)
Handle VPUNPCKLBW/VPUNPCKLWD/VPUNPCKLDQ/VPUNPCKLQDQ (https://github.com/FEX-Emu/FEX/commit/6bc1c3fc307d77f01ff2eba0ed3210f7643e1ea6)
Handle VUNPCKLPD/VUNPCKLPS (https://github.com/FEX-Emu/FEX/commit/4560c5b73c8cace01367baac7e0add44f402ba4d)
Handle VCVTSS2SI/VCVTTSS2SI/VCVTSD2SI/VCVTTSD2SI (https://github.com/FEX-Emu/FEX/commit/4a884802f8659858ca0d1bf6af66a24d36b8439e)
Handle VCVTPD2DQ/VCVTTPD2DQ/VCVTPS2DQ/VCVTTPS2DQ (https://github.com/FEX-Emu/FEX/commit/f37938576d28a5c9cd408fad556a611a13a0f6ca)
Handle VPMULHRSW (https://github.com/FEX-Emu/FEX/commit/82adc2f93134fb5720691e7f298239249ce9641d)
Handle VPMULHW/VPMULHUW (https://github.com/FEX-Emu/FEX/commit/4a3af8d7f9aacb2a88ca2292c8fc2b36f35a9a5c)
Handle VPHMINPOSUW (https://github.com/FEX-Emu/FEX/commit/9d58514f570097843cf5402bd69e7c3c004844cd)
Handle VPMULDQ/VPMULUDQ (https://github.com/FEX-Emu/FEX/commit/33e8f21ac7b979e93f52a89764dd224fcd645b8b)
Handle VCMPSD/VCMPSS (https://github.com/FEX-Emu/FEX/commit/cecda7bbb6afa7960b5ead34be20aca729338df4)
Remove lingering debug log from VPFCMPOp (https://github.com/FEX-Emu/FEX/commit/ce351282f23d611922ae8799e9800976cb55a749)
Convert runtime assert to static_assert in SHUFOps (https://github.com/FEX-Emu/FEX/commit/345e9b97bfc67ba772831bf6bf87b5d377cf04c9)
Handle VCMPPD/VCMPPS (https://github.com/FEX-Emu/FEX/commit/0c651dd5f831e039695bda5e5616ad95261b9675)
Handle VPSRLDQ (https://github.com/FEX-Emu/FEX/commit/1668db046d3073a39f832930a974ac0a9d3a848e)
Remove unnecessary usage of VMov in VPSLLDQOp (https://github.com/FEX-Emu/FEX/commit/515b3e485b1f2594e1901b5525a3f0a8c9d5be02)
Handle VCVTDQ2PD/VCVTDQ2PS (https://github.com/FEX-Emu/FEX/commit/4aed60ee3d50234111cd362833b6499dadbd669a)
Handle VPSLLDQ (https://github.com/FEX-Emu/FEX/commit/60a2fb163cf2349e54be6a28eb986eaf31f9abd4)
Simplify SHA1MSG1 implementation (https://github.com/FEX-Emu/FEX/commit/d0cb329608b93a66ac373426cab31fcc84d9ba95)
Handle immediate variants of VPSRLD/VPSRLQ/VPSRLW (https://github.com/FEX-Emu/FEX/commit/72a3b182795682193e8b3af4cfc9fa5be6577548)
Handle 128-bit AVX AES instructions (https://github.com/FEX-Emu/FEX/commit/18004512518cc8cea9dcf2e4aad163e15d9b4581)
Handle immediate variants of VPSRAD/VPSRAW (https://github.com/FEX-Emu/FEX/commit/1d9218224f9f8d017e035582dc6324009408d6d1)
Handle VPACKUSDW/VPACKUSWB (https://github.com/FEX-Emu/FEX/commit/6e733bfc22f6cf632a17a7befe067f99fd824683)
Handle VPACKSSDW/VPACKSSWB (https://github.com/FEX-Emu/FEX/commit/01d22849cf86f46bd2b4824058f20a918c29ff04)
Handle vector versions of VPSRA{D, W} (https://github.com/FEX-Emu/FEX/commit/78b53bfa499a790b79f97d72552bb3fa051c1876)
Handle remaining PEXTRW opcode (https://github.com/FEX-Emu/FEX/commit/fabf4530460e48e9f901b57f2736507f85bb9b3b)
Handle VPMULL{D, B} (https://github.com/FEX-Emu/FEX/commit/b26e4109fa9f1601c11e6ffb547b00a641b1e512)
Handle vector variants of VPSRL{D, Q, W} (https://github.com/FEX-Emu/FEX/commit/ad3bf189c018855efc8678329c3251667727a5cf)
Handle VPEXTR{B, D, Q, W}/VEXTRACTPS (https://github.com/FEX-Emu/FEX/commit/c86ba7646c0f6a471d09cfd9ce4f40cb5c443336)
Handle immediate variants of VPSLL{D, Q, W} (https://github.com/FEX-Emu/FEX/commit/c1e301a5edab937b0a8e8044de11bdd901972160)
Handle vector variants of VPSLL{D, Q, W} (https://github.com/FEX-Emu/FEX/commit/58fab721b30acabfa7c27e0a388f456f9f0ddcf4)
Handle VPHADDW/VPHADDD (https://github.com/FEX-Emu/FEX/commit/8ce6c08152dfa1c01476791ddc527b671f76a912)
Handle VPMOVSXB{D, W, Q}/VPMOVSXW{D, Q}/VPMOVSXDQ/VPMOVZXB{D, W, Q}/VPMOVZXW{D, Q}/VPMOVZXDQ (https://github.com/FEX-Emu/FEX/commit/dc2eaf6511c8d3e473fa253f762b61de8630b482)
Narrow memory access with scalar rounding operations (https://github.com/FEX-Emu/FEX/commit/0e233a96f0fe49bcf712a77bd98f836bcc78bf67)
Move template impl to regular function where applicable (https://github.com/FEX-Emu/FEX/commit/4b891d614719167c20f23f1891d0c454860796d2)
Handle VROUNDS{D, S}/VROUNDP{D, S} (https://github.com/FEX-Emu/FEX/commit/1ca356371d43cc82c6fa75f024a7165ffe32c64c)
Handle VINSERTF128/VINSERTI128 (https://github.com/FEX-Emu/FEX/commit/4b2164768f08d716652fb99833b973272b950c69)
Handle VPERM2F128/VPERM2I128 (https://github.com/FEX-Emu/FEX/commit/f3d0fa6f60924da88bb126953475da2a4a1836e3)
Handle VPERMQ/VPERMPD (https://github.com/FEX-Emu/FEX/commit/60a45615df18a42daf58c0f23f04803176c31c7f)
Handle VHADDP{D, S} (https://github.com/FEX-Emu/FEX/commit/ded257c92f01807a3ac35a0fd635b8751378d340)
Handle VPMAXS{B, D, W}/VPMAXU{B, D, W} (https://github.com/FEX-Emu/FEX/commit/9de5840f7abd282d0e3529881597fadc3a8ba1c3)
Handle VPMINS{B, D, W}/VPMINU{B, D, W} (https://github.com/FEX-Emu/FEX/commit/40bab6b58e614132835c6bbbcb4f06f5efffbb04)
Handle VPADDS{B, W}/VPSUBS{B, W} (https://github.com/FEX-Emu/FEX/commit/98a454169da83b92dfcb2bea3ade1db3df48c0d7)
Handle VPADDUS{B, W}/VPSUBUS{B, W} (https://github.com/FEX-Emu/FEX/commit/757602bb1e318955f84017bbcd500d4afb9443aa)
Handle VPSUB{B, D, Q, W} (https://github.com/FEX-Emu/FEX/commit/a90067fb1e330a6e3b0bef966ca6f09cfe88fac3)
Handle VPSIGN{B, D, W} (https://github.com/FEX-Emu/FEX/commit/1bc013d5f0713b15cd329d8ad35c824847760e2c)
Handle VDIVP{D, S}/VDIVS{D, S} (https://github.com/FEX-Emu/FEX/commit/a07a53364074714ecb7e4e7d8685b0bb9ea68612)
Handle VMULP{D, S}/VMULS{D, S} (https://github.com/FEX-Emu/FEX/commit/eefcea49f4eded1e9dd987ca9b9f080d5d13bb46)
Handle VMAXP{D, S}/VMAXS{D, S}/VMINP{D, S}/VMINS{D, S} (https://github.com/FEX-Emu/FEX/commit/d6b137e6b77c9ab547131e06582fe964d2e3bff9)
Handle VSUBP{D, S}/ VSUBS{D, S} (https://github.com/FEX-Emu/FEX/commit/db9039017983d4bf948b6092446d40cd41819a0e)
Handle VRCPPS/VRCPSS (https://github.com/FEX-Emu/FEX/commit/293734408c01538cce8ea47e166e93aeb951708c)
Handle VLDDQU (https://github.com/FEX-Emu/FEX/commit/03fbb923b301750e9bfc8614f17c44d9262a4e80)
Handle VPABS{B, D, W} (https://github.com/FEX-Emu/FEX/commit/a57f3a62641cef42675afec5030daa36386fd978)
Handle VPCMPEQ{B, D, Q, W}/VPCMPGT{B, D, Q, W} (https://github.com/FEX-Emu/FEX/commit/573896d0b719f91dcac37d8bebc02c27f818cdc9)
Handle VRSQRTSS/VRSQRTPS (https://github.com/FEX-Emu/FEX/commit/ab14375a036cb2e8302f038cd54251d6a4a97ae5)
Handle VPBROADCAST{B, D, Q, W}/VBROADCASTI128 (https://github.com/FEX-Emu/FEX/commit/ace90aac953292162e8a6e7a40de3e7f45d5133d)
Handle VBROADCASTSD/VBROADCASTSD/VBROADCASTF128 (https://github.com/FEX-Emu/FEX/commit/2123868a421636f692376406c767cd3afea947f1)
Handle VLDMXCSR/VSTMXCSR (https://github.com/FEX-Emu/FEX/commit/b73aeb8902741f1a022fededacb5db4d70c3fb5a)
Handle VSQRTPD/VSQRTPS/VSQRTSD/VSQRTSS (https://github.com/FEX-Emu/FEX/commit/d965ae03c462472f6d678df0bde5e901678f34b4)
Handle VCOMISD/VCOMISS/VUCOMISD/VUCOMISS (https://github.com/FEX-Emu/FEX/commit/7ac21e794dcddaa9b63ee9d4ffbef32331fbb189)
Handle VPAVGB/VPAVGW (https://github.com/FEX-Emu/FEX/commit/a98920d4e4bba5d31b5f4cfc9881715fdc965722)
Explicitly zero upper lanes (https://github.com/FEX-Emu/FEX/commit/e9aa368a62296a12e1e0e44483bea8fbfd2f39db)
Handle VADDSD/VADDSS (https://github.com/FEX-Emu/FEX/commit/5ac44baa2c2154f1513dd5e51c1c0d14e716e436)
Merge HADDP/PHADD into VectorALUOp (https://github.com/FEX-Emu/FEX/commit/bf86df7a667805f1b54f4c6fb7484e0d950258ce)
Merge PAVGOp with VectorALUOp (https://github.com/FEX-Emu/FEX/commit/3322f8b8901b476bc02a7c636d3939f53854e182)
Merge PADDQOp, PSUBQOp, PADDSOp, PSUBSOp with VectorALUOp (https://github.com/FEX-Emu/FEX/commit/9eaa45f92223ad4eecbe07df0529d8e4d15a7215)
Merge ANDNOp with VectorALUROp (https://github.com/FEX-Emu/FEX/commit/4b1671860296bf68850c340fb4a04d291a6c33d5)
Simplify VANDN (https://github.com/FEX-Emu/FEX/commit/2bf7e09862d4c1e2820fc3fbededdde1e7edf578)
OpcodeHandler
Handle VADDSUBP{D, S} (https://github.com/FEX-Emu/FEX/commit/905eb015c086d6ed651d506b36aeb791df52df99)
ThunkDB
Clean up database loading (https://github.com/FEX-Emu/FEX/commit/12b866c276766be41b254bf685f52520d7c71824)
Thunks
Fix IDE integration (https://github.com/FEX-Emu/FEX/commit/16969fcdad1a0c7a7aa806d10b99ca47677785ff)
ThunksDB
Fix misspelt guest library names (https://github.com/FEX-Emu/FEX/commit/e486833d75af2ec28d76b3c931d23c746e158427)
X86Tables
Restrict CVTDQ2PD and CVTTSD2SI to 64-bit memory accesses (https://github.com/FEX-Emu/FEX/commit/91c00d2cb61f6eeed53db9bdf0636897cb6d169d)
Misc
Create a new ARM64 Emitter and move JIT over to it. (https://github.com/FEX-Emu/FEX/commit/ec55ecdb31a04e16c0290bc1b381cd186ea1b8ad)
Initial 32-bit host thunk feature support (https://github.com/FEX-Emu/FEX/commit/d5f3a091d08f03b27b95544e34306b8ad85b429f)

FEX - FEX-2212

Published by Sonicadvance1 almost 2 years ago

Read the blog post at FEX-Emu's Site!

A lot of good work this month with the highlight being that we have started working on our AVX implementation and started optimizing our IR to be more efficient.

Disable PCLMUL if not supported on host

This carry-less multiplication instruction is only implemented on ARM SoCs that ship the cryptographic extension.
This extension is unsupported on the Raspberry Pi, which was causing applications that use OpenSSL to crash.
Specifically, this fixes Steam running on the Raspberry Pi again.
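
As a rough illustration of what "carry-less multiplication" means and how a host could be probed for it, here is a minimal Python sketch. It is not FEX's code: the function names are hypothetical, and the check assumes the Linux arm64 convention of listing "pmull" on the Features line of /proc/cpuinfo.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries between bit positions
        a <<= 1
        b >>= 1
    return result


def host_has_pmull(cpuinfo_text: str) -> bool:
    """Return True if a 'Features' line in cpuinfo-style text lists pmull."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features" and "pmull" in value.split():
            return True
    return False
```

An emulator in this position would stop advertising PCLMULQDQ to the guest when the host check fails, so that libraries such as OpenSSL fall back to their generic code paths.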

Adds 256-bit support to the remaining IR vector ops

A lot of work went into implementing support for 256-bit operations this month.
With this work in place, our JITs now support 256-bit for all of the IR operations.

Work started on AVX emulation

With the previous work completed for having our JITs support 256-bit operations, work could now start on implementing AVX.
This AVX work is implemented as native SVE 256-bit operations, so the only hardware that can currently execute this partial implementation is Neoverse-V1 CPUs.
The expectation is that as ARM CPUs become more powerful, they will eventually support SVE with 256-bit registers.
It may take a few generations to get hardware that supports this. If ARM CPUs want to run AVX games then they will need to support the equivalent hardware feature set.

Current instructions implemented:

VZEROUPPER, VZEROALL
VMOVAPS, VMOVQ
VMOVNTDQ, VMOVNTDQA, VMOVNTPD, VMOVNTPS
VMOVDQA, VMOVDQU
VMOVAPD, VMOVUPD, VMOVUPS
VMOVLPD, VMOVLPS
VMOVSHDUP, VMOVSLDUP
VMOVHPD, VMOVHPS
VMOVDDUP
VORPD, VORPS, VPOR
VPXOR, VXORPD, VXORPS
VANDPD, VANDPS, VPAND, VANDNPD, VANDNPS, VPANDN
VADDPD, VADDPS, VPADDB, VPADDW, VPADDD, VPADDQ

This is just the beginning of us implementing support for this, stay tuned as we implement the remaining operations over the next few months.
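
To make the 128-bit versus 256-bit distinction concrete, here is a small Python sketch of the guest-visible behaviour of VPADDD. It models a register as a 256-bit integer and is only an illustration under that assumption, not how the JIT emits SVE code. The one architectural detail it relies on is that the VEX-encoded 128-bit form zeroes the upper 128 bits of the destination.

```python
def vpaddd(a: int, b: int, width_bits: int = 256) -> int:
    """Lane-wise 32-bit add over registers modeled as 256-bit Python ints."""
    assert width_bits in (128, 256)
    result = 0
    for lane in range(width_bits // 32):
        shift = lane * 32
        lane_a = (a >> shift) & 0xFFFFFFFF
        lane_b = (b >> shift) & 0xFFFFFFFF
        # Each lane wraps on its own; nothing carries into the next lane.
        result |= ((lane_a + lane_b) & 0xFFFFFFFF) << shift
    # For the 128-bit form, lanes 4..7 were never written, so they stay zero.
    return result
```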

Generate register access IR operations directly

As an original implementation design detail, FEX implemented GPR and XMM register accesses as a generic emulated CPU state access. Once we added
static register allocation we also added an optimization pass to convert these generic accesses into register accesses which directly map to our
static register allocator.

This is a redundant pass since we know upfront which registers are being accessed. With this change we generate register access IR operations
directly and have removed the optimization pass. This removes around 12% of JIT compilation time, which improves responsiveness and lets FEX spend less time
compiling code.

Systemd fixes

While this is a niche use case, some people may be interested in running FEXServer as a systemd user service.
A FEXServer is meant to be a user-wide server that the FEX clients talk to for rootfs and eventually other management.
Using a systemd user service, a FEXServer can be started early, letting it mount the rootfs image and run in the background.
This can be fairly useful, as FEX error logs then show up in journalctl, where you can inspect why a process has crashed.

Add support for steamid based configuration files

As an ongoing effort of documenting which applications can run with FEX's OpenGL and Vulkan thunk libraries, it was determined that some applications
use generic executable names. This means that a configuration file that uses the application name would have erroneously enabled thunks for other
untested applications.

In order to work around this issue, our configuration system now supports an optional steamid-based naming convention for games that are launched from
Steam. With this in place, we now have a repository that contains application configurations that users can install at their leisure. This repository
can be found on GitHub.

As part of the documentation process, all of these configurations must be documented on our Wiki with
testing results to ensure they work.

Implement SGDT

This is a quirky instruction that is emulated on a native x86 system these days. This instruction is a system instruction that is used by the OS for
getting the configuration of the global descriptor table. Linux captures this instruction and returns a configuration that says the table is living in
kernel memory space. While this is already true, an application usually doesn't need to care about this data.

Curiously enough Denuvo uses this instruction in some of their implementations for some reason. With us implementing this instruction, Denuvo games
now get slightly further before they horribly crash.

auxv fixes

When FEX executes an application, it needs to set up an emulated auxv state since this isn't a cross-architecture state.

AT_RANDOM
- This now correctly passes through the host's AT_RANDOM value rather than fixed values
AT_PLATFORM
- Some tooling uses this to determine if it is running as i686 or x86-64
AT_HWCAP/HWCAP2
- This just returns some CPUID values, most applications use CPUID directly instead of this
AT_MINSIGSTKSZ
- The minimum signal stack size is no longer a hardcoded constant
- Applications are supposed to use this to calculate a signal stack size
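
For readers unfamiliar with auxv, the vector is a flat array of (tag, value) machine words terminated by AT_NULL. The Python sketch below decodes such an array, for example the bytes read from /proc/self/auxv. The tag numbers come from the Linux auxvec headers; the function name is hypothetical and the decoder assumes a little-endian layout.

```python
import struct

# Tag values from the Linux UAPI auxvec headers.
AT_NULL = 0
AT_PLATFORM = 15
AT_HWCAP = 16
AT_RANDOM = 25
AT_HWCAP2 = 26
AT_MINSIGSTKSZ = 51


def parse_auxv(raw: bytes, word_size: int = 8) -> dict:
    """Decode an auxiliary vector into a {tag: value} dictionary."""
    fmt = "<QQ" if word_size == 8 else "<II"  # little-endian (tag, value) pairs
    entries = {}
    for tag, value in struct.iter_unpack(fmt, raw):
        if tag == AT_NULL:  # terminator entry
            break
        entries[tag] = value
    return entries
```

Note that the values for AT_RANDOM and AT_PLATFORM are pointers into the process's own memory (16 random bytes and a NUL-terminated string respectively), while AT_HWCAP, AT_HWCAP2 and AT_MINSIGSTKSZ are plain integers.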

Support radeon drm driver in ioctl emulation

Most Radeon GPUs these days use the amdgpu kernel driver, but a user found a hole in our ioctl emulation by using an old Radeon GPU on a Phytium ARM
board.

With this in place, older Radeon cards that use the radeon kernel driver can now have accelerated OpenGL.

Misc optimizations

This month we have had a random smattering of optimizations that improve startup, shutdown, and execve performance. While they don't individually provide a
lot of benefit, small optimizations like these add up to make FEX better over time.

Defer cpuinfo file initialization until first access
- Improves startup time
Use tsl::robin_map for some internal maps
- Improves JIT time, and some minor shutdown performance improvements
Disable multiblock by default
- This causes excessive JIT overhead which makes the experience worse for the user
- Significantly reduces stutters
Improve hot path of file existence checking in syscall wrapping
- During our overlayfs handling, this can be hit quite hard during file accesses
- Improves file IO in applications
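
The deferred cpuinfo initialization in the list above is a standard lazy-initialization pattern. A minimal Python sketch of the idea follows; the class and its fields are hypothetical and the generated text is a placeholder, not what FEX actually emits.

```python
from functools import cached_property


class EmulatedCpuInfo:
    """Hypothetical stand-in for an emulated /proc/cpuinfo file."""

    def __init__(self, core_count: int):
        self.core_count = core_count
        self.build_count = 0  # only here to show how often generation runs

    @cached_property
    def contents(self) -> str:
        # Runs on first access only; later reads reuse the stored string.
        self.build_count += 1
        return "".join(
            f"processor\t: {core}\n\n" for core in range(self.core_count)
        )
```

A process that never opens the emulated file never pays for building it, which is where the startup saving comes from.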

Raw Changes

Arm64
Const on unmodified argument (https://github.com/FEX-Emu/FEX/commit/9ca34ca30668259eceef99bfc102796e9d67e26e)
Minor optimization in AESKEYGENASSIST (https://github.com/FEX-Emu/FEX/commit/c1d118c1d45da2a57ea8fa1fc94a1ab16e433408)
Optimize Break IR op codegen (https://github.com/FEX-Emu/FEX/commit/c7dd6ff28af1c0ff645769120bf4ff3d480ebdd9)
VectorOps
Simplify VMov IR op on SVE (https://github.com/FEX-Emu/FEX/commit/70e6ab5782203dbf72ea6f8c6fdde14ded1043a9)
CMake
Fix typo in clang thunks option. (https://github.com/FEX-Emu/FEX/commit/0030971f6f4f7772e296212a3b3df67a606750ce)
Config
Disable multiblock by default (https://github.com/FEX-Emu/FEX/commit/df25d4e03eb76fec50ba4d6ff41dd3b9b19bce33)
Add support for steamid based configurations. (https://github.com/FEX-Emu/FEX/commit/02ca94e6e685f88c976c3ec200f109fe9eb74beb)
Core
Replace a couple maps with tsl robin_map (https://github.com/FEX-Emu/FEX/commit/57c5761920ff57ce6b0688906cba488ca491e384)
Removes log about migrating to shared memory mode (https://github.com/FEX-Emu/FEX/commit/8b6e9e08c0d012f3868665fa082f3e306d1d42ca)
ELFCodeLoader
Calculate AT_MINSIGSTKSZ (https://github.com/FEX-Emu/FEX/commit/e0fe9167ea1f7321db8c659cac4a825da44c0b89)
Fixes AT_PLATFORM null terminator (https://github.com/FEX-Emu/FEX/commit/d7b0e8469ea1cbb498d70656e8dfdded7993fe2a)
Pass through AT_SECURE (https://github.com/FEX-Emu/FEX/commit/8afc3b8e23338d74d0998ce14c365443710d6e72)
Ensure we set AT_SYSINFO for 32-bit (https://github.com/FEX-Emu/FEX/commit/1d32df91cee1535c1c2f8a33f68ab02859fbc323)
EmulatedFiles
Defer cpuinfo file initialization to first access (https://github.com/FEX-Emu/FEX/commit/8e2b0d10e5589791da9984db66f1be132b957af1)
Externals
Update vixl submodule (https://github.com/FEX-Emu/FEX/commit/f066abc151147cff5218ce78f698cc0c3ef40105)
FEXConfig
Sort named rootfs vector (https://github.com/FEX-Emu/FEX/commit/71f658b07d0acaa603e292a66c9f4b34e2eead91)
FEXLoader
Make IsInterpreterInstalled check less horrible. (https://github.com/FEX-Emu/FEX/commit/1dd5642e468f6f1e0347bf2e8c4883576216b37e)
Disables some AOT shutdown overhead when not enabled (https://github.com/FEX-Emu/FEX/commit/f8b2a0b4d806f66e073c74d8b8b4662044e67b70)
FEXServer
More Systemd fixes (https://github.com/FEX-Emu/FEX/commit/5e5e5a35d92048dd0568e80249bff3c09f295cf9)
FEXServerClient
Disable confusing connection log (https://github.com/FEX-Emu/FEX/commit/cc6306aa32c938d91cd2e595da211450a868d708)
Add some debug logs for when FEX can't connect to se… (https://github.com/FEX-Emu/FEX/commit/3c8da3e3b4e963215a17c5be9215a8ff25c8bb7c)
IR
Handle 256-bit VExtr (https://github.com/FEX-Emu/FEX/commit/5a403b77654fbf8216d91ac5133ca50ca5ad40a3)
Removes the only uses of VSLI and VSRI (https://github.com/FEX-Emu/FEX/commit/7d9ed4e1bf03667ebbc754a3bc444e28052a3026)
Remove VLoadMemElement and VStoreMemElement (https://github.com/FEX-Emu/FEX/commit/9cee0126d7e152c12f755815f28f5bc9a826f568)
Handle 256-bit LoadRegister/StoreRegister (https://github.com/FEX-Emu/FEX/commit/a9c51388cf5ba2288d062a9ebab7270b6ef5296f)
Handle 256-bit VAddV (https://github.com/FEX-Emu/FEX/commit/04d4c5e017a37d06144d88e2f7064249a87358e2)
IntrusiveIRList
Add a utility helper for getting an OrderedNodeWrapper (https://github.com/FEX-Emu/FEX/commit/3c881809f428f5153122a111b5a2a1b0b8d1ed77)
IoctlEmu
Support radeon (https://github.com/FEX-Emu/FEX/commit/fc28062052ece5fd8ec3e5ba323d5b34cffe60fc)
Linux
Improve performance of hot paths in path searching (https://github.com/FEX-Emu/FEX/commit/66e0d46d89cfb0c5f3defd3ad14075cb27a6392d)
OpcodeDispatcher
Handle VADDPD/VADDPS/VPADDB/VPADDW/VPADDD/VPADDQ (https://github.com/FEX-Emu/FEX/commit/8f157e45bb04a7335275d1c9ae30cfbaee01d4b5)
Handle VANDPD/VANDPS/VPAND/VANDNPD/VANDNPS/VPANDN (https://github.com/FEX-Emu/FEX/commit/c37fcf136ababf13b491ae8f1a878cc70f8d1c33)
Handle VORPD/VORPS/VPOR (https://github.com/FEX-Emu/FEX/commit/34e39c996ea0b3c202b98724c5858f6549100974)
Handle VPXOR/VXORPD/VXORPS (https://github.com/FEX-Emu/FEX/commit/4de69029e590153cb32b2c4273dd0dcbc27d8b11)
Handle VZEROUPPER/VZEROALL (https://github.com/FEX-Emu/FEX/commit/a374a9af35ba826c3a63748e317f02029081410d)
Handle VMOVQ (https://github.com/FEX-Emu/FEX/commit/b35c6c6d22934a6625e6e043fafeadc2cf592994)
Handle VMOVNTDQ/VMOVNTDQA/VMOVNTPD/VMOVNTPS (https://github.com/FEX-Emu/FEX/commit/0841ff5feb1b34b8fc93d60ce504b4c5110772ec)
Implement SGDT (https://github.com/FEX-Emu/FEX/commit/9912d41714c974a845425b4ad0cc6d2381c7417a)
Moves all GPR and XMM accesses to direct register accesses (https://github.com/FEX-Emu/FEX/commit/a4556e90cd2b6d33d8a47f26d61e90c44b826b93)
Handle VMOVDQA/VMOVDQU (https://github.com/FEX-Emu/FEX/commit/df761a99cec48430b6a2cf96235be46155010b18)
Handle VMOVDDUP (https://github.com/FEX-Emu/FEX/commit/f8a199af494ff69ff5e8e49eed97ad48a9ef4365)
Handle VMOVSHDUP/VMOVSLDUP (https://github.com/FEX-Emu/FEX/commit/58f35ba4138e22375f4981c966721e8aa6e08351)
Handle VMOVHPD/VMOVHPS (https://github.com/FEX-Emu/FEX/commit/bc20f1e684575238c0da1bb88ff78f45b5a50b12)
Handle VMOVLPD/VMOVLPS (https://github.com/FEX-Emu/FEX/commit/69045db3a9ab8aa12c7fc4aa6edc41a0bb5f7ebd)
Handle VMOVAPD/VMOVUPD/VMOVUPS (https://github.com/FEX-Emu/FEX/commit/2271a90adbad84131351f004492a20346dda4dfd)
Handle VMOVAPS (https://github.com/FEX-Emu/FEX/commit/56ff09f3acff2d33942c55130d9dcbf266e7f820)
Disable PCLMUL if not supported on host (https://github.com/FEX-Emu/FEX/commit/c8293cbbda66b0f1585cddf1cc45b0ca05bd1ffe)
Syscalls
Minor optimization with initialization of syscall definition vector (https://github.com/FEX-Emu/FEX/commit/4d2c4b44231931be1aa63cd0eb345ae35c1b718a)
Thunks
Fix guest targets not being detected by IDEs (https://github.com/FEX-Emu/FEX/commit/91588775690c8063a0f802002923ceb89bb8f56c)
Misc
x86_64/VectorOps: Separate 128-bit/256-bit paths (https://github.com/FEX-Emu/FEX/commit/181d315d2ca0f01b8c0dec62e435754f2a675500)
Systemd fixes (https://github.com/FEX-Emu/FEX/commit/d5f7e616ebfd6559395f90968f2f5ae0ae9b96ff)
InstallFEX.py: Adds support for Kinetic (https://github.com/FEX-Emu/FEX/commit/aa837eddd5650e0cb6f677f4a36d39bc35289371)
Update release process to include AUR (https://github.com/FEX-Emu/FEX/commit/5336f017256af1f938e13f5873ab84b329029fc2)
unittests
Expand VPCLMULQDQ unit test (https://github.com/FEX-Emu/FEX/commit/9fea774a93b3fd2a5c8921c3f250f8c9c4350e2e)

FEX - FEX-2211

Published by Sonicadvance1 almost 2 years ago

Read the blog post at FEX-Emu's Site!

A lot of good changes this month for our users. Both performance and compatibility improvements to be had!

Segment register index optimization

This optimization has been a long time coming, sitting in pull-request limbo since back in April. It caches segment register
addresses so the JIT can generate more optimal memory accesses. While segment registers are mostly gone with x86-64, 32-bit segment registers are
used fairly commonly, with some instructions using them completely implicitly. Without the cache, each access adds overhead to fetch the LDT and GDT
entries for something that typically doesn't change very often.

With this optimization in place, we get an average of 4.3% uplift in 32-bit Bytemark. This performance improvement will be directly felt when
running 32-bit applications.
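
The caching idea can be sketched in a few lines of Python. This is a toy model with hypothetical names, not the FEX implementation: the descriptor table is consulted once when a selector is written, and every later memory access only adds the cached base. The selector layout (index in bits 15..3, table indicator in bit 2) is the architectural one.

```python
class SegmentState:
    """Toy model: resolve a segment base once per selector write."""

    def __init__(self, gdt: dict, ldt: dict):
        self.gdt = gdt            # descriptor index -> base address
        self.ldt = ldt
        self.cached_base = {}     # segment register name -> base address
        self.table_lookups = 0    # counts the slow-path table walks

    def write_selector(self, segment: str, selector: int) -> None:
        index = selector >> 3              # bits 15..3: descriptor index
        use_ldt = bool(selector & 0b100)   # bit 2: table indicator (1 = LDT)
        table = self.ldt if use_ldt else self.gdt
        self.table_lookups += 1
        self.cached_base[segment] = table[index]

    def effective_address(self, segment: str, offset: int) -> int:
        # Hot path: no descriptor table walk, just the cached base.
        return (self.cached_base[segment] + offset) & 0xFFFFFFFF
```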

48-bit Proton Experimental fixes

For a while now FEX has worked with Proton 7.0 and older, but we have had issues running Proton Experimental in some cases.
This was a tricky problem to nail down but we had some good leads. If your ARM device was running its kernel with a 48-bit virtual address space (VA) enabled then Proton
Experimental wouldn't work. On the other hand, if your kernel was compiled using a 36-bit VA then it would run fine. After a few days of debugging, it
turned out that Proton/Wine allocates the lowest 32MB of its stack space, and the kernel by default allocates a 128MB space for the application.

When an application is run natively the stack is allocated at a fixed location in memory. FEX was failing to allocate the stack at the correct
location. By the time Wine's preloader eventually ran, FEX had already allocated JIT code at that fixed location, which Wine would then map over, zeroing the
memory and breaking the FEX JIT. The preloader has done this for a long time and it was by pure chance that we weren't breaking older versions of Wine
and Proton.

With this problem fixed in FEX, we are now able to run triple-A games on AArch64, such as God of War running on a Snapdragon 888.

Even more IR changes preparing for AVX emulation

Once again this month we have an absolute ton of commits from Lioncash working on getting our JIT ready for AVX emulation. Around 25 commits went
towards this, with only about four more IR vector operations left to support before AVX.

Once the JITs support 256-bit operations, we can start working towards emulating the instructions themselves.

Fix thunk crashing due to insufficient stack space

When FEX starts we potentially need to allocate all memory inside of the 48-bit VA space to match how x86-64 only has 47 bits.
This intersected with our stack space, which is supposed to autogrow, but we were allocating over it instead. Now we give the full 128MB stack space to
FEX so it won't crash anymore.

Implements support for remaining BCD instructions

Thanks to @wannacu for implementing the remaining handful of 32-bit BCD instructions. DAA, DAS, AAA, AAS, AAM,
AAD were all missing in FEX's implementation. While BCD is fairly uncommonly used these days, they still managed to find an application that uses
these instructions. With these implemented, FEX should have all of the BCD instructions finally implemented.
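
For readers who have never met these instructions, the Python sketch below models two of them. It follows the operation described for DAA and AAM in the Intel SDM and is written for illustration only, independently of how FEX implements them. The function names and tuple return values are conventions of this sketch.

```python
def daa(al: int, cf: bool, af: bool):
    """Decimal-adjust AL after adding two packed BCD bytes.

    Takes AL and the incoming CF/AF flags, returns (al, cf, af).
    """
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF      # fix up the low decimal digit
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF   # fix up the high decimal digit
        cf = True
    else:
        cf = False
    return al, cf, af


def aam(al: int, base: int = 10):
    """ASCII-adjust after multiply: split AL into (ah, al) digits."""
    return al // base, al % base
```

For example, adding the BCD bytes 0x79 and 0x35 with a plain binary ADD gives 0xAE. Running DAA on that yields 0x14 with the carry set, which reads as the decimal result 114.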

Implement gpuvis timeline profiler support

While not majorly important for users, this is a very good interface for developers wanting to see why a game stuttered and how long code
took to compile. This lets us take advantage of the same interface that GPU profiling events are using to see why a game missed a vsync.

This isn't enabled by default out of concern for taking too much CPU time, so it needs to be enabled with the ENABLE_FEXCORE_PROFILER cmake
option.

Fix ROR OF flag calculation

This is a fairly minor bug since not many things rely on the OF flag specifically. But in our testing of new Proton games, we found out that Denuvo
Anti-Tamper is relying on this edge case behaviour and we messed it up. While this gets Denuvo running slightly farther, it still doesn't quite work
under FEX.
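
As background on the flag in question, here is a Python sketch of the architectural ROR behaviour: CF receives the most significant bit of the result, and OF is the XOR of the result's two most significant bits. OF is only architecturally defined for 1-bit rotates, and this sketch simply applies the same formula for other counts, which is an assumption. Exactly which case FEX previously got wrong is not described in these notes.

```python
def ror(value: int, count: int, width: int = 32):
    """Rotate right; returns (result, cf, of). Flags are None if untouched."""
    mask = (1 << width) - 1
    value &= mask
    masked = count & (63 if width == 64 else 31)  # hardware masks the count
    if masked == 0:
        return value, None, None  # a masked count of zero modifies no flags
    count = masked % width
    result = ((value >> count) | (value << (width - count))) & mask
    cf = (result >> (width - 1)) & 1          # CF = MSB of the result
    of = cf ^ ((result >> (width - 2)) & 1)   # OF = MSB xor next-to-MSB
    return result, cf, of
```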

Fixes FPREM1 C2 flag calculation

FPREM1 sets the C2 flag if the number was too large to reduce in one step, which is usually not the case. Since we calculate the full
remainder, we should never claim to return a partial remainder. This solves an infinite loop in Mono applications that are using SIN/COS math
operations.
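
The infinite loop is easier to see with the usual calling pattern written out. The Python sketch below is an illustration with hypothetical names: it uses doubles rather than x87 80-bit values, and relies on math.remainder, which computes the IEEE 754 remainder that FPREM1 is specified to produce.

```python
import math


def fprem1(st0: float, st1: float):
    """Return (remainder, c2). c2 == False means the reduction is complete."""
    # The full remainder is computed in one go, so C2 is always clear.
    return math.remainder(st0, st1), False


def reduce_fully(x: float, y: float) -> float:
    """How guest code typically drives the instruction: loop until C2 clears."""
    value, partial = fprem1(x, y)
    while partial:  # never taken here; a stuck C2 would spin forever
        value, partial = fprem1(value, y)
    return value
```

If the emulated instruction always reported C2 as set while already returning the final remainder, the loop in reduce_fully would never exit, which matches the symptom described above.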

Claim X87 transcendental ops are in range

X87 will set a flag if a program tries to operate on a value that is out of range for transcendental SIN/COS/TAN operations.
FEX-Emu doesn't actually detect these for performance reasons, so instead we claim these are always in range. While not always true, if they are out of
range then we weren't detecting them anyway. This fixes an issue where glibc would do some fixups to try and bring the value in range, producing invalid
results.

Add missing thunk library versions

This fixes an issue where FEX thunks would try to dlopen development libraries, which are missing on most users' devices.

Fixes indirect thunks with 8+ arguments

This fixes a quite bad crash with OpenGL and Vulkan thunking where every function with 8 or more arguments would be likely to break.
Fixes thunks for a bunch of games.

Add support for disabling thunks in application configurations

This is useful for narrowing down thunk compatibility issues in certain applications. While it is still not recommended to enable thunks globally,
this allows more flexibility when tinkering with them.

Implements four more auxv values

FEX implements most of these values for applications to pull but in some cases we didn't have these set up. Specifically AT_PLATFORM is required
so ldconfig can work correctly. AT_HWCAP/AT_HWCAP2 are used by an application to check for CPU features, and AT_RANDOM is a 128-bit
random number that the kernel provides.

Misc

Quite a few more things that were changed this month, but this report has been going on long enough.

Raw Changes

32bit
Fixes Debug build of VDSO (https://github.com/FEX-Emu/FEX/commit/cf91ab9d5f9a6e559d3d4ffa65972fc2874b39cf)
Allocator
Expand stack space when stealing virtual address space (https://github.com/FEX-Emu/FEX/commit/000677abb67f1a42d185ec3014d6d84d11acd78d)
Arm64
BranchOps
Remove unused std::vector (https://github.com/FEX-Emu/FEX/commit/74e18f43173f83ba514ae97e36c3558510c7abc8)
ConversionOps
Eliminate use of temporary in Vector_FToF (https://github.com/FEX-Emu/FEX/commit/b1d98f4e584e68dc0dfb1b66b2c077bd8cc963cf)
MemoryOps
Merge if statement into switch in ParanoidLoadMemTSO (https://github.com/FEX-Emu/FEX/commit/199649b30fdd0cf944cf0751835efdd26fbe26e6)
Remove lingering unnecessary ptrue instances (https://github.com/FEX-Emu/FEX/commit/40d820fd05ff96222483eff7d0e1db307ad6fed8)
VectorOps
Make use of MOVPRFX where applicable (https://github.com/FEX-Emu/FEX/commit/639d6e60712b94298605e768a1b27cc12fedd9ee)
Simplify SVE VSXTL/VSXTL2/VUXTL/VUXTL2 implementations (https://github.com/FEX-Emu/FEX/commit/136f1e2fc730d43ef59b9de812955befa02e48d1)
ELFCodeLoader
Fixes Proton Experimental on 48-bit VA systems (https://github.com/FEX-Emu/FEX/commit/2fa1a649996044ff4e455dd0ba2cc4c5b2ee14dc)
Implement four more auxv values (https://github.com/FEX-Emu/FEX/commit/fa5322d3f91921b6ed25b76f591a58450577cede)
External
Update vixl submodule (https://github.com/FEX-Emu/FEX/commit/fc6de5f3c0efc3e53451321025a8b13c9df7cb94)
FEXCore
Adds support for a timeline profiler interface (https://github.com/FEX-Emu/FEX/commit/62a24bd38f97b1c0d166240af03532920e165b77)
FEXServer
Be robust against invalid packets. (https://github.com/FEX-Emu/FEX/commit/64eb87e9b5d1f8fae60f312ff51871e23aace646)
Flags
Refine _Bfe's shift (https://github.com/FEX-Emu/FEX/commit/abb44d3327dac790a1d70a6df260b56edf3c3e5d)
IR
Handle 256-bit VInsElement (https://github.com/FEX-Emu/FEX/commit/9e7daf61d052425572218ed59d68bd27c2ded3a1)
Handle 256-bit LoadContextIndexed/StoreContextIndexed (https://github.com/FEX-Emu/FEX/commit/03f0edc5b56af971c17b7456c1c289e6da4c3994)
Handle 256-bit StoreMem/StoreMemTSO/ParanoidStoreMemTSO (https://github.com/FEX-Emu/FEX/commit/70a91ee6ce3b3f1686e1a2d3d594b88df4a629e2)
Handle 256-bit LoadMem/LoadMemTSO/ParanoidLoadMemTSO (https://github.com/FEX-Emu/FEX/commit/8a14f87a4444386618de36d9d8fe976a7313f0e8)
Handle 256-bit LoadContext/StoreContext (https://github.com/FEX-Emu/FEX/commit/d475b0ba9e3c44bd2e1f8ff72b9d72176d4f5699)
Handle 256-bit VTBL1 (https://github.com/FEX-Emu/FEX/commit/2332c4151005fba6065ca9f5cc96a5c259d05740)
Handle 256-bit VInsGPR (https://github.com/FEX-Emu/FEX/commit/b7d9c00dff6dd4bc4db8019096590e85b6a901bb)
Handle 256-bit VExtractToGPR (https://github.com/FEX-Emu/FEX/commit/b3ee5dba0f170a1b6b897214ccf517df9273ad6e)
Handle 256-bit Vector_FToI (https://github.com/FEX-Emu/FEX/commit/7e810233d9b540690476e235ac8fc0a430905558)
Check for invalid conversion masks in Float_FromGPR_S (https://github.com/FEX-Emu/FEX/commit/7291b1072705ac7acb2e0db1c868e21ad3a27ce7)
Handle 256-bit Vector_FtoF (https://github.com/FEX-Emu/FEX/commit/b8f7e4c8ec4be715d485158aa501c54d8a0a110d)
Handle 256-bit Vector_FToZS/Vector_FToS (https://github.com/FEX-Emu/FEX/commit/13003da28988f74cd89014b59c59cba85ee697a7)
Handle 256-bit Vector_SToF (https://github.com/FEX-Emu/FEX/commit/cb17ee98711165e3fc86f80e3fcfdc306fddb62e)
Handle 256-bit VDupElement (https://github.com/FEX-Emu/FEX/commit/27b022d4d923814d1d04f2511a655ce1df5d4b11)
Handle 256-bit VUnZip/VUnZip2 (https://github.com/FEX-Emu/FEX/commit/780e3c7fb7fda77625fe5f6ffa1ba865c0425692)
Handle 256-bit VZip/VZip2 (https://github.com/FEX-Emu/FEX/commit/ab45db166511059fb4dc652197f7fdecd64556d0)
Handle 256-bit VUShrNI/VUShrNI2 (https://github.com/FEX-Emu/FEX/commit/bb38bcb67deb320aeb8edb2ccae484d0f6edfaf7)
Handle 256-bit VSQXTUN/VSQXTUN2 (https://github.com/FEX-Emu/FEX/commit/6cc29125423b307618598ed24d375b6f645ec85b)
Handle 256-bit VSQXTN/VSQXTN2 (https://github.com/FEX-Emu/FEX/commit/1c7d4165ab578105d9d37c42a142b05fa1b0d973)
Handle 256-bit VSMull/VSMull2 (https://github.com/FEX-Emu/FEX/commit/3ac5e0423a42c8658f18c3f9a8d4cffad3f75c37)
Handle 256-bit VUMull/VUMull2 (https://github.com/FEX-Emu/FEX/commit/78a077397e9744be5d3d63abf9402f0fa61b2ba1)
Handle 256-bit VUXTL/VUXTL2 (https://github.com/FEX-Emu/FEX/commit/e9f3a5b3e4423bf693599b8fcc9709849f0bb664)
Handle 256-bit VUABDL (https://github.com/FEX-Emu/FEX/commit/ebc45dff456aeaf3b78ad4d47abb57da5de132d5)
Handle 256-bit VSXTL/VSXTL2 (https://github.com/FEX-Emu/FEX/commit/f14a5ffbbfc4a7fd6a99f8409ce596a66bb49d00)
Interpreter
MiscOps
Remove unused StopThread() function (https://github.com/FEX-Emu/FEX/commit/d2e0dc99de794a2bfc7300dbf15f368c63113284)
OpcodeDispatcher
Fixes ROR imm OF calculation (https://github.com/FEX-Emu/FEX/commit/eca9353b2888aa06899c9115109bc550f03116cc)
Fixes flag calculation on ROR and ROL by immediate (https://github.com/FEX-Emu/FEX/commit/b1e475d81de13ebf5700a30873d0367dca274923)
Fixes FPREM1 C2 flag calculation (https://github.com/FEX-Emu/FEX/commit/4a09a4324f7b22474897f6dd47f6044c7394b128)
Syscalls
Fixes 64-bit mmap and munmap (https://github.com/FEX-Emu/FEX/commit/d1b235dd838256b9215a9c0e5b7124cd04d023ac)
Thunks
Add support for disabling thunks in config (https://github.com/FEX-Emu/FEX/commit/004c3230a48983faa2f6e448f06a5032ee6a9951)
Fixes missing thunk librarie so versions (https://github.com/FEX-Emu/FEX/commit/2272b30a912691eb20f69a81f73a384dd20001f0)
Fixes indirect thunks with 8+ arguments (https://github.com/FEX-Emu/FEX/commit/6b3d8886e5550be196c34754374993ab6d2ffa2d)
Update Vulkan thunk to v1.3.231 (https://github.com/FEX-Emu/FEX/commit/ddc10272a09e7648c31731e05fb8f8324cc36596)
Guest
Enable SSE2 on thunks and set fpmath to sse (https://github.com/FEX-Emu/FEX/commit/671f3e74a43dbd99912ad7a31632e7df493f4e70)
X11
Reorder and sort X11 interface by headers included. (https://github.com/FEX-Emu/FEX/commit/8d373c15b85c8605c5ee04554348f4dfe43e9a88)
libX11
Fix recursive initialize (https://github.com/FEX-Emu/FEX/commit/d8386121a8427d6a4c3696d8744907981fdbf914)
ThunksDB
Fixes Thunks loaded boolean pointer check (https://github.com/FEX-Emu/FEX/commit/b5fb1cb07cda5a6aca35ca20d4f4156018f45a00)
Utils
64BitAllocator
Minor cleanups and optimization for munmap (https://github.com/FEX-Emu/FEX/commit/70a3ceb64ee62bf6bf5d356f25839ae28d1d030a)
X87
Claim incoming float was in the range for trancendental ops (https://github.com/FEX-Emu/FEX/commit/aa5e92bee2f72023b188baabde709d38961170d5)
Misc
Segment register index optimization (https://github.com/FEX-Emu/FEX/commit/ecf489108751ecf86c7934741fe748951d798856)
Implements DAA, DAS, AAA, AAS, AAM and AAD instruction (https://github.com/FEX-Emu/FEX/commit/f26eccd00f0e8ebffda8708ea273aad11f23247c)
Ensure Arm64Emitter uses FEX allocator (https://github.com/FEX-Emu/FEX/commit/ffb4de9fd9320f428e3ff5283f5fa830ee23ffb5)
unittests
Amend mm register usage in H0F38/66_04.asm test (https://github.com/FEX-Emu/FEX/commit/cada0d593c73d38d0dfab807ff37a9954d86e43a)
asm
Adds more extensive FPREM/FPREM1 tests (https://github.com/FEX-Emu/FEX/commit/b726f60afd894d814f2c86694d3164dddd18b4c8)
gvisor
Adds a bunch of tests to flakes (https://github.com/FEX-Emu/FEX/commit/5bef13df946ed993302398512f4e640c709b5988)

FEX - FEX-2210

Published by Sonicadvance1 about 2 years ago

Read the blog post at FEX-Emu's Site!

This month's release was a bit delayed due to the fact that most of FEX-Emu's developers were meeting up physically at the X.Org Developer's
Conference this year! Before we talk about this month's changes we need to spend a bit of time talking about some cool things.

FEX-Emu XDC talk

This year FEX-Emu had a talk to discuss some of the weird interactions with Mesa in an emulated environment. You can see the full talk in the embedded
video.
XDC Talk

At the end of the video we showed a quick demo of (mostly!) Proton games running under FEX-Emu on a Snapdragon 888 device. You can see this demo
directly embedded below.
XDC Sizzle Reel

Ubuntu 22.04 Rootfs Mesa update

We have had to update the Ubuntu 22.04 rootfs image with a newer version of Mesa today. Unfortunately our last update with Mesa 22.2 had a bug in the
Raspberry Pi Vulkan driver which completely broke Vulkan on ALL devices, not just the Raspberry Pi. We have updated the rootfs today with a Mesa git
version of the library to work around this issue. As a benefit, this version of the FEX rootfs includes the new Venus Vulkan 1.3 driver which can be
useful for testing.

Pick up the latest rootfs with the FEXRootFSFetcher tool.

New Lenovo ThinkPad X13s Gen 1 laptops

Last month Lenovo launched a new Snapdragon laptop that is one of the best development platforms that FEX-Emu devs could ask for. This platform is
shipping the Snapdragon 8cx Gen 3 SoC which is one of Qualcomm's most powerful chips. The only downside with this platform currently is that the GPU
doesn't yet work under Linux. There is an ongoing community effort to get the GPU up and running but these Snapdragon chips typically take a while
before support is fully in place.

Once the GPU works then this will be a perfect platform for testing Adreno with the Turnip Vulkan driver and Freedreno. At that point we will be
shipping out these laptops to all of our devs so we have a good Vulkan development platform.

Tweet from @FEX_Emu

FEX-2210

Although most of our developers were at XDC, there is no shortage of code that was merged this last month.

IR changes preparing for AVX emulation

This last month had at least 32 commits preparing our JITs for emulating AVX. While AVX isn't yet wired up, this is still a required step before it is supported. We still require ARM SVE hardware that ships with 256-bit wide registers. This means that current consumer CPUs and the just-announced Neoverse-V2 won't work for our emulation here! This is future-proofing work since more games are requiring AVX to run, but we'll just need to live with the fact that new CPUs will be needed for the latest AAA games to run under FEX.

Support clang for thunks

We added support for building our thunks with clang this release. In particular the Ubuntu PPA is shipping this already. This might give a very minor perf increase but the main thing is removing a hard dependency on GCC.

Add uninstall cmake target

While it is generally advised not to install directly from a source build, users still tend to do this.
We were asked multiple times for an uninstall target, so we finally added this convenience feature.

32-bit VDSO thunking support

This is FEX-Emu's first 32-bit thunk library! This exercises most of the thunking framework to bring this feature to 32-bit, without some of the harder parts that require data repacking. Now that this is proving that our 32-bit thunking is working, it is likely that we will start working towards getting the rest of the thunks supporting 32-bit as well!

IR cleanups

While this isn't directly user facing, it makes the JIT IR a bit easier to handle, which makes the devs' lives easier. We've removed redundant operations.

Add support for vixl simulator in CI

While we are waiting for SVE-256bit hardware to get on the market, we need CI to prove that our implementation is correct. We have once again added the vixl simulator to our source tree.
The vixl simulator supports emulating the SVE instructions at whichever register width you want. While stacking emulators isn't good for performance, it is good for ensuring correct behaviour.
Sadly ARM's simulator doesn't emulate 100% of the operations correctly, so we have had to disable a few of our unit tests. Still, it works well enough that it can pick up major mistakes.

CI functional testing

We have added functional testing of some of our thunks in our CI system. Specifically we are testing our OpenGL and Vulkan thunks to ensure they don't break. Since this is the beginning of functional testing, we currently only run vulkaninfo and glxinfo.
Soon we will be expanding this functional testing to encompass more features which will likely capture even more problems if they come up.

Map ELF files more like the kernel

The kernel has an interesting behaviour around how it maps ELF files in memory. It will always load the dynamic linker at around the highest address
it can. The primary ELF file will be loaded roughly in the middle of the address space with a bit of ASLR bias. We now emulate the same behaviour in
FEX to help with problems when running WINE. While not all the issues are sorted out, this is a good step towards making it more stable.

Fix LLVM ASAN

We had an issue with our ELF loading where LLVM ASAN was breaking due to mixing multiple mmaps in the same space. Simple bug with a simple fix. ASAN
all the things!
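
For readers unfamiliar with the pattern, here is a minimal sketch of mapping a range once and then overlaying pieces of it with MAP_FIXED, which is how the raw changes below describe one of the ELF loader commits ("Map once and then use MAP_FIXED to overwrite"). Whether that exact commit is the ASAN fix isn't stated here, and `ReserveThenOverlay` is a hypothetical name for illustration, not FEX's loader code.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper, not FEX's ELF loader.
// Reserves `total` bytes with a single mmap call, then places one usable
// segment inside that reservation with MAP_FIXED, so everything belonging
// to the image lives inside one contiguous range.
// All sizes and offsets must be multiples of the host page size.
inline void* ReserveThenOverlay(size_t total, size_t seg_offset, size_t seg_size) {
  assert(seg_offset + seg_size <= total);

  // Step 1: one inaccessible reservation covering the whole image.
  void* base = mmap(nullptr, total, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (base == MAP_FAILED) {
    return nullptr;
  }

  // Step 2: overwrite part of the reservation with a readable/writable segment.
  void* seg = mmap(static_cast<char*>(base) + seg_offset, seg_size,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (seg == MAP_FAILED) {
    munmap(base, total);
    return nullptr;
  }
  return base;
}
```

Because the second mmap only ever lands inside memory the first call already owns, it cannot clobber an unrelated mapping that happens to sit nearby.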

SMC deadlock fix

There was a fix to prevent a potential deadlock in our Self-Modifying-Code detection routines. Thanks to the developer that found this!

Lots of misc fixes this month

It would be hard to list all of the misc other fixes that happened this month. Find out more in our raw release notes!

Raw Changes

Arm64
Fixes SVE VectorImm (ad8526852)
Centralize location for register defines (169cfbbee)
VectorOps
Make use of static predicate registers (0fee355ff)
CI
Fixes struct verifier on Ubuntu 20.04 (f97a4afd8)
Adds support for flakes (96fecfd7c)
CMake
Add toolchain file for 32-bit cross-compiler (1ed3ecb40)
Extend AArch64 check to include arm64 (a583ebe59)
Docs
Update Release docs (c987e1ef4)
ELFCodeLoader
Map primary ELF more like the kernel (b44b3401b)
Fixes dynamic non-interpreter ELFs (edca52860)
Map interpreter first (71f7ff510)
ELFCodeloader
Map once and then use MAP_FIXED to overwrite (d68b84bc2)
FEXConfig
Ensure APP_CONFIG_NAME isn't stored in json (8d69f539a)
FEXLinuxTests
Adds missing pthread_cancel flake status (c262362a0)
Migrate to Catch2 (8f70137b1)
Build 32-bit and 64-bit test variants separately (3448c8343)
Use the build system instead of setting up compile flags via source-code annotations (d2138694b)
FEXServer
Fix waiting on kernel version older than 5.3 (8f9d79934)
FHU
Convert to a interface target (dee85f14f)
IR
Handle 256-bit VSMul/VUMul (23dd056b6)
Handle 256-bit VRev64 (c412d073b)
Handle 256-bit VShlI/VUShlI/VUShrI (c2b6aef6f)
Handle 256-bit VSShrS/VUShlS/VUShrS (51214d1be)
Handle 256-bit VFCMPORD/VFCMPUNO (7b4b9a80f)
Handle 256-bit VFCMPLT/VFCMPGT/VFCMPLE (25a8a0077)
Handle 256-bit VFCMPEQ/VFCMPNEQ (a67f7422b)
Handle 256-bit VCMPGT/VCMPGTZ/VCMPLTZ (ed8150cfb)
Handle 256-bit VCMPEQ/VCMPEQZ (462a163ba)
Handle 256-bit VBSL (6374175a6)
Handle 256-bit VSMax/VUMax (8d8b02928)
Handle 256-bit VSMin/VUMin (aa6a49932)
Handle 256-bit VNot (64c4fdccf)
Handle 256-bit VFNeg (d715ffbc8)
Handle 256-bit VNeg (1799d4c67)
Handle 256-bit VFRSqrt (dacd96cab)
Handle 256-bit VFSqrt (ca4d3bf64)
Handle 256-bit VFRecp (ea38b043c)
Handle 256-bit VFMax (a39746df2)
Handle 256-bit VFMin (2367a8e50)
Handle 256-bit VAddP (cb121d7f1)
Handle 256-bit VFDiv (50eba4066)
Handle 256-bit VFMul (447226576)
Handle 256-bit VFSub (3f8b872f1)
Handle 256-bit VFAddP (e573ddc2d)
Handle 256-bit VFAdd (eedbde6f1)
Handle 256-bit VPopcount (4e441e5a0)
Handle 256-bit VAbs (3e287a36c)
Removes Mov IR op (46bde401b)
Removes VExtractElement (01beac495)
Removes unnecessary VBitcast IR op (fcd981e6b)
Removes SplatVector{2,4} (2b9cc9666)
Removes VInsScalarElement (82eba2229)
Interpreter
Handle 256-bit VSShr/VUShl/VUShr (4d6e15d7a)
Use constant for AVX register size where applicable (808e1c033)
Handle 256-bit VMov (412793c21)
Handle 256-bit VAnd/VBic/VOr/VXor (b5cb42924)
JITs
Handle spilling/filling 256-bit vectors (6742e0c37)
Expand max spill slot size to 32 bytes (0d0d116bd)
SMC
Fix possible deadlock (8da9ebc2e)
Scripts
Updates DefinitionExtract (3977e1f29)
StructVerifier
Fixes CI failure (d4b5bf0f7)
ThunkLibs
X11/Xext: Removes two functions that don't exist on 32-bit (cc4c705fc)
Thunks
Add support for building with clang (2b1ef9735)
Adds dependency on linker script (eaddf7f1a)
Implement the Thunk IR op for 32-bit mode (1ea00f68a)
Adds functional thunk testing to CI (a59097763)
Host
Adds bool operator to fex_guest_function_ptr (3237de308)
gen
Use fmt for writing formatted output (704afed97)
libvulkan
Fixes print for 32-bit (d8c2a8271)
VDSO
Fix vsyscall (5cf59408a)
VectorOps
Handle 256-bit VURAvg (977d6dd24)
Handle 256-bit VUMinV (0261ed353)
Extend VSQAdd/VSQSub/VUQAdd/VUQSub (f34f1309a)
Extend VAdd/VSub (0ad52b7d1)
Misc
Add opencl thunk db (b693112c8)
32-bit VDSO support (6f6f3c9dc)
Update vixl external (832a320e2)
Move thunk generator logic from ASTVisitor to ASTFrontendAction (54915f87c)
Add support for the vixl simulator (b36ec152d)
cmake
Adds uninstall target (6adf22761)
unittests
Disable gvisor pselect test (acddc0323)

FEX - FEX-2209

Published by Sonicadvance1 about 2 years ago

Read the blog post at FEX-Emu's Site!

A lot of miscellaneous work this month that isn't directly user facing. We do still have some interesting topics this month that some people will be
interested in.

Simplify StealMemory functions

A fairly significant change this month is reducing the time it takes FEX to set up its memory upon load. FEX needs to do an initial setup of the memory when an application loads
because between x86-64, x86, and AArch64 the memory layouts are significantly different.

Depending on the architecture of the application, FEX needs to allocate a large amount of memory to emulate the x86/x86-64 memory behaviour.

On 32-bit x86

We need to allocate all memory above 32-bit memory space
- This is because we emulate 32-bit applications as a 64-bit AArch64 application

On 64-bit x86-64

We need to allocate all memory in the 48-bit virtual address space
- This is because AArch64 supports the full 48-bit space for the user
- x86-64 userspace only receives 47-bit
- Applications rely on not receiving 48-bit pointers!
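
As a rough sketch of the two cases above, the helper below computes which part of the host address space would need to be reserved so the guest never receives it. The names are hypothetical and this is not FEX's StealMemory code; it assumes an AArch64 host with a 48-bit user address space, as described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only.
struct AddressRange {
  uint64_t Begin; // inclusive
  uint64_t End;   // exclusive
};

// Assumption: the host gives userspace a 48-bit virtual address space.
constexpr unsigned HOST_VA_BITS = 48;

// Returns the host range that has to be reserved up front so that no later
// allocation can hand the guest a pointer outside its own address space.
//   32-bit x86 guest:  GuestVABits = 32 -> reserve [2^32, 2^48)
//   x86-64 guest:      GuestVABits = 47 -> reserve [2^47, 2^48)
inline AddressRange RangeToReserve(unsigned GuestVABits) {
  assert(GuestVABits < HOST_VA_BITS);
  return AddressRange{1ULL << GuestVABits, 1ULL << HOST_VA_BITS};
}
```

For a 32-bit guest that is nearly the entire address space, which is why the cost of setting it up shows up in every application launch.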

From this graph showing the amount of CPU time spent in each routine, we can see a significant reduction in time to execute.
For 32-bit and 64-bit specific operations this results in a ~70x and ~181x reduction in execution time!

How well does this improve execution time in practice though?

This graph shows the total time it takes to run applications fully through. The smallest test applications have shaved off around 75% - 85% of their execution time. The biggest improvement
comes from Proton setting up its execution environment. Proton's underlying execution environment is called pressure-vessel which executes hundreds
of background applications while setting up. This is one of the worst cases for FEX since each independent application execution needs to JIT new code
and handle all of its state setup. This case reduces the execution time from around 21 seconds down to around 17 seconds! This can really
be felt when executing back-to-back Proton instances while testing games!

While this is a significant step in the right direction, FEX still has a ways to go to hit the native execution time of pressure-vessel which can take
as little as one second.

More AVX work

A bunch more work has gone into supporting AVX emulation. This is still preliminary backend work for now.

HostRunner
- Handle upper YMM lanes in sigsegv handler
InterpreterOps
- Extend SSAData size to accommodate 256-bit operations
VectorOps
- Extend VAnd/VBic/VOr/VXor
- Extend VMov
- Extend VectorImm
- Extend VectorZero

Thunks

X11

Some fairly minor changes here that improve usability of thunks with Proton. We added more Xlibint functions to the thunks which fixes X11 thunking
with DXVK. X11 is required for both Vulkan and OpenGL thunking so having this working is necessary when running those games.

Another necessary change for supporting thunks with Wine/Proton is more aggressively supporting X11 functions which require variadic arguments. There
are quite a few of these functions sprinkled around that require this. While we supported these functions with open-coded support up to 7 arguments,
we need to support at least up to 14 arbitrary arguments in some instances. We now have some assembly code in place which can support an arbitrary
number of arguments by packing these in memory the expected way. While this only works for 64-bit integers, it's all that we need for X11.
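
To illustrate the packing idea in C++ rather than assembly, here is a minimal sketch that gathers any number of arguments into one contiguous block of 64-bit slots, in call order. `ToSlot` and `PackVariadic` are hypothetical names, and the real layout FEX builds for the host call may differ.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical illustration; FEX does this in assembly.
// Converts one argument into a single 64-bit slot.
template <typename T>
uint64_t ToSlot(T value) {
  static_assert(sizeof(T) <= sizeof(uint64_t), "argument must fit in one 64-bit slot");
  if constexpr (std::is_pointer_v<T>) {
    return reinterpret_cast<uintptr_t>(value);
  } else {
    static_assert(std::is_integral_v<T>, "only integers and pointers are handled");
    // Signed values are sign-extended to 64 bits by this conversion.
    return static_cast<uint64_t>(value);
  }
}

// Packs every argument into consecutive 64-bit slots, one slot per argument.
template <typename... Args>
std::vector<uint64_t> PackVariadic(Args... args) {
  return {ToSlot(args)...};
}
```

Floating-point arguments are deliberately rejected at compile time, mirroring the 64-bit-integer-only restriction mentioned above.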

With both of these features implemented both OpenGL and Vulkan thunking works with Proton.

VDSO

While this is implemented as a thunk on the FEX side, it behaves slightly differently than normal thunks. It will always be enabled as long as FEX
can load the VDSO-host.so library installed on the system. Due to the nature of VDSO, all applications have a VDSO region provided by the
kernel at all times. FEX wants to provide fast emulation of this "library" since applications abuse it heavily for performance. We noticed this when
running Proton games, which abuse clock_gettime very heavily and were causing significant CPU overhead. Applications were calling this VDSO
function hundreds or thousands of times a second. The thunk significantly lowers the amount of time spent in the kernel for timing functions.
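
The access pattern in question looks roughly like the sketch below: a loop that polls the clock over and over. `NowNs` and `PollClock` are made-up names for illustration. On a native Linux system glibc normally services clock_gettime through the VDSO without entering the kernel.

```cpp
#include <cstdint>
#include <time.h>

// Reads CLOCK_MONOTONIC and returns it as nanoseconds.
inline uint64_t NowNs() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL +
         static_cast<uint64_t>(ts.tv_nsec);
}

// Polls the clock `iterations` times, the way a frame loop might, and
// returns how many nanoseconds elapsed between the first and last reading.
inline uint64_t PollClock(unsigned iterations) {
  const uint64_t start = NowNs();
  uint64_t last = start;
  for (unsigned i = 0; i < iterations; ++i) {
    last = NowNs();
  }
  return last - start;
}
```

If every one of those calls had to go through full syscall emulation instead, the per-call overhead would add up quickly at thousands of calls per second.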

getdents syscall emulation

AArch64 doesn't support this syscall but in most cases applications don't use it. This is because there is a much more modern syscall called
getdents64 that everything uses now. Applications compiled a long time ago are still likely to use the classic syscall. Since AArch64 doesn't have
the classic version, we now emulate it entirely using getdents64, which fixes running applications from CentOS 7.
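
The core of such an emulation is a record conversion, sketched below under some assumptions: it uses the record layouts documented in the getdents(2) man page, handles only the x86-64 flavour of the classic record (8-byte d_ino and d_off), and `ConvertDirents` is a hypothetical name rather than FEX's function. A real implementation also has to respect the size of the caller's buffer, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// linux_dirent64: u64 d_ino, s64 d_off, u16 d_reclen, u8 d_type, char d_name[]
constexpr size_t kReclenOffset = 16;
constexpr size_t kType64Offset = 18;
constexpr size_t kName64Offset = 19;
// x86-64 linux_dirent: ulong d_ino, ulong d_off, u16 d_reclen, char d_name[],
// padding, and d_type stored in the very last byte of the record.
constexpr size_t kNameOldOffset = 18;

// Converts a buffer of getdents64 records into classic getdents records.
inline std::vector<uint8_t> ConvertDirents(const uint8_t* in, size_t in_size) {
  std::vector<uint8_t> out;
  size_t pos = 0;
  while (pos + kName64Offset <= in_size) {
    uint16_t reclen64;
    std::memcpy(&reclen64, in + pos + kReclenOffset, sizeof(reclen64));
    assert(reclen64 > kName64Offset && pos + reclen64 <= in_size);

    const uint8_t type = in[pos + kType64Offset];
    const char* name = reinterpret_cast<const char*>(in + pos + kName64Offset);
    const size_t max_name = reclen64 - kName64Offset;
    const void* nul = std::memchr(name, 0, max_name);
    const size_t name_len = nul ? static_cast<const char*>(nul) - name : max_name;

    // Header + name + NUL + trailing d_type byte, rounded up to 8 bytes.
    size_t reclen = kNameOldOffset + name_len + 1 + 1;
    reclen = (reclen + 7) & ~size_t{7};
    const uint16_t reclen16 = static_cast<uint16_t>(reclen);

    const size_t base = out.size();
    out.resize(base + reclen, 0);
    std::memcpy(&out[base], in + pos, 16); // d_ino and d_off carry over unchanged
    std::memcpy(&out[base + kReclenOffset], &reclen16, sizeof(reclen16));
    std::memcpy(&out[base + kNameOldOffset], name, name_len);
    out[base + reclen - 1] = type;

    pos += reclen64;
  }
  return out;
}
```

A 32-bit guest would need a different header, since its d_ino and d_off fields are only 4 bytes each.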

Misc

Fix compiling without jemalloc
- Thunks are unsupported without jemalloc but we need to keep it compiling
Consolidate generated files to one file per platform
- Nice code cleanup for developers
Minor cleanups for signature-based function pointer thunking
Support direct thunk config in configuration files
- This improves the user experience with enabling thunks for application configurations
- No need for two files to describe one thing now

Raw Changes

64BitAllocator
Fixes a significant state tracking perf problem (https://github.com/FEX-Emu/FEX/commit/123b6728e4a9030d16c58330ee344a0a34659b3b)
Allocator
Simplify StealMemory, make it less chatty with kernel space (https://github.com/FEX-Emu/FEX/commit/04678f84042e22eb57c2ff5a61b188cddaedb83d)
Arm64
JIT
Rename CanUseSVE to HostSupportsSVE (https://github.com/FEX-Emu/FEX/commit/7d8950de40a9759a442aef84cb58983b18daaf71)
CI
Build Thunks (https://github.com/FEX-Emu/FEX/commit/e544591c9c4824a1fdc50fb89373058bcf502571)
FDUtils
Don't make unknown get_fdpath fatal (https://github.com/FEX-Emu/FEX/commit/336dedbed5266c287538d0d33da01537a29b86f1)
FEXRootFSFetcher
Fix crash if curl fails to download rootfs definition file (https://github.com/FEX-Emu/FEX/commit/31fefaae0d4e09954f20605561e37e212dbbd90e)
FEXServer
Support socket path override (https://github.com/FEX-Emu/FEX/commit/a2f4f494a9cdfcfffdba52b35da7a1f9378e635c)
Github
Fix fresh runner rootfs checkout (https://github.com/FEX-Emu/FEX/commit/d6199687f44fb10322ff9876028f88cf2edabd7b)
HostRunner
Handle upper YMM lanes in sigsegv handler (https://github.com/FEX-Emu/FEX/commit/d5c83a2e45a91aaeeecd026ce051b41d62b819e8)
IRLoader
TestHarnessLoader
Don't build if not building tests (https://github.com/FEX-Emu/FEX/commit/097184c3e07eff3af49dce3d9664f77223bb8bce)
InterpreterOps
Extend SSAData size to accomodate 256-bit operations (https://github.com/FEX-Emu/FEX/commit/98dbfbe6548eef70f45425caca45464860ec8be4)
Linux
Emulate classic getdents syscall for x64 and x32 (https://github.com/FEX-Emu/FEX/commit/9de25c200c888aa56fc5e389b65004265613340e)
Syscalls
Use underscored shm syscall names (https://github.com/FEX-Emu/FEX/commit/bbcca80b60e1443cf5604779356fff431057a798)
Termux
Add android-shmem library (https://github.com/FEX-Emu/FEX/commit/0adbe311128f4af35ab514af6bf3e7cb70bf2825)
Thunks
Consolidate all generated code to one file per library per platform (https://github.com/FEX-Emu/FEX/commit/c17da256176a92501305cf672771044e33f6308c)
Adds VDSO thunk library (https://github.com/FEX-Emu/FEX/commit/53623ffa7254e664a2607dc633a1016da0025777)
Minor cleanups for signature-based function pointer thunking (https://github.com/FEX-Emu/FEX/commit/e6acdcc5835ab23f528453bcd237b3259b611b0f)
Support direct thunk config in configuration files (https://github.com/FEX-Emu/FEX/commit/84a95adae13456cd820c8541a09338d98e6153f5)
Fix compile without jemalloc (https://github.com/FEX-Emu/FEX/commit/d5138f509c29869a4b817d0a4ebe0f8c0cda5c09)
X11
Support Variadic stack packing (https://github.com/FEX-Emu/FEX/commit/fbb008e51055ed73e07f4e97415a518da35c38a5)
Adds missing XLibint functions (https://github.com/FEX-Emu/FEX/commit/998a3d83539ee755eaa9ff4742f2c5bf8ffce968)
VectorOps
Extend VAnd/VBic/VOr/VXor (https://github.com/FEX-Emu/FEX/commit/e776f4cd4eee4408bb8c7e02b3088014df29de30)
Extend VMov (https://github.com/FEX-Emu/FEX/commit/e7d7dd13d7edd4fabd9b90eba162d9487ad5b02d)
Extend VectorImm (https://github.com/FEX-Emu/FEX/commit/8439cf410fef9b16efadedcc0a0451858fa58bab)
Extend VectorZero (https://github.com/FEX-Emu/FEX/commit/d03b6a938234a193d655c9f5f04550d29b3618fd)
Misc
New domain. (https://github.com/FEX-Emu/FEX/commit/7f9edbf39e2a88870aebfe4c40bee7a1404f11a4)
x86_64/JIT: Resolve lingering fmt deprecation warning (https://github.com/FEX-Emu/FEX/commit/37ccb1391751a4966ea38772f36d78f0d33a2e63)
cmake
fix incorrect assumption about the value of git's core.abbrev (https://github.com/FEX-Emu/FEX/commit/c03a7fd482b6a21ff5ac87105cd63aac2a12e3a7)
unittests
Support skipping unit tests based on host feature support (https://github.com/FEX-Emu/FEX/commit/1fe6fc3feb63cf33f3db66cd462b287672eaf880)
ThunkLibs
Fix warning about "dangerous" use of tmpnam (https://github.com/FEX-Emu/FEX/commit/12fee91ebbb6f2bb82d805679b985ca91158176c)
